
Topic awaiting preservation: regex guru needed (Page 1 of 1)

 
Hockey
Neurotic (0) Inmate
Newly admitted

From:
Insane since: May 2004

posted 05-12-2004 03:57

Here's my problem:

I need to parse PHP classes using regex, but I have noticed that a function block can contain any number of { } pairs inside it. How can I make sure I match exactly one whole function, so that the regex doesn't stop at the first closing bracket it stumbles on, like the one that ends a for loop?

Does what I am asking make sense?

Basically:

function myFunc()
{
for(...){
if() echo "tezt";
}
}

How would I write a regex that can match a generic function declaration/definition, NOT just the one shown above?

If you know regex well enough to help me out I would appreciate it...especially if you can kind of walk me through the process of what each regex sequence does and finally a composed regex that matches functions...

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 05-12-2004 04:13

I think I get what you are talking about...

Could you explain what you are trying to do a little better? I might be able to help, but I need some more information.

bitdamaged
Maniac (V) Mad Scientist

From: 100101010011 <-- right about here
Insane since: Mar 2000

posted 05-12-2004 06:05

What you are talking about goes a bit beyond regular expressions. There are ways, but you're going to run into a lot of problems. Consider a function with a comment in it, where the brackets don't have to match:

function somthing() {
// (()){ }} }
}



.:[ Never resist a perfect moment ]:.

Tyberius Prime
Paranoid (IV) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 05-12-2004 07:08

This is outside the scope of what regexps can do. The simple rule of thumb here is 'regexps don't count'.

I suggest you find all function starts, and then walk from there one char by char, counting up when you see a { and counting down on }. You are finished when the count comes back to 0 (it starts at 0 before the first {, so that is the second time you see 0). It is pretty basic, but not doable via regexps.

So long,

->Tyberius Prime

Veneficuz
Paranoid (IV) Inmate

From: A graveyard of dreams
Insane since: Mar 2001

posted 05-12-2004 12:51

TP's solution seems like a nice one. If you allow comments inside the functions, you should also take note of that so you don't count brackets inside them (as bitdamaged mentioned).
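One way to handle that, sketched in Python (an assumed language, since the thread names none). The hypothetical helper `find_block_end` counts braces from an opening { but skips over //, # and /* */ comments and over quoted strings. Heredocs and other corner cases are not handled, so treat it as a starting point rather than a full PHP tokenizer.

```python
def find_block_end(source, start):
    """Return the index just past the brace closing the block at `start`.

    Assumes source[start] is the opening '{'. Braces inside comments and
    quoted strings are ignored. Returns None if the block never closes.
    """
    depth = 0
    i = start
    n = len(source)
    while i < n:
        ch = source[i]
        if source.startswith("//", i) or ch == "#":
            # Line comment: skip to the start of the next line.
            newline = source.find("\n", i)
            i = n if newline == -1 else newline + 1
            continue
        if source.startswith("/*", i):
            # Block comment: skip past the closing */.
            close = source.find("*/", i + 2)
            i = n if close == -1 else close + 2
            continue
        if ch in "'\"":
            # Quoted string: skip to the matching quote, stepping over
            # backslash-escaped characters.
            i += 1
            while i < n and source[i] != ch:
                i += 2 if source[i] == "\\" else 1
            i += 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None
```

On bitdamaged's example, the braces in the comment line are skipped and the block ends at the final }.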

Strictly speaking, it is possible to make a regex 'sort of' count; I don't remember what it can and can't do at the moment, since I've never used that feature. But if counting is involved, it is often easier to do it with something other than a regex.

_________________________
"There are 10 kinds of people; those who know binary, those who don't and those who start counting at zero"
- the Golden Ratio - Vim Tutorial -

MajorFracas
Nervous Wreck (II) Inmate

From:
Insane since: Jul 2003

posted 05-12-2004 15:21

While bitdamaged brings up a good point (that anything could be in the file), for a finite group of files, not everything will be in the files.

So, if you can rule out patterns like mismatched braces in comments, maybe you can find something that works. Ignoring the issue of actually matching the braces, is there another pattern that you could use to identify the start and end of a function declaration? For example, if you have consistently applied formatting to your files, maybe a function declaration always starts at the beginning of a line (with no leading whitespace, or with a fixed number of leading whitespace characters).

Also, while it is true that execution blocks (if, for, while, etc.) may be nested to an infinite depth, the truth is that for any given set of code, the nesting depth will be much smaller than infinity. In fact, there are those who espouse the belief that writing deeply nested code is not good and has an adverse effect on the maintainability and readability of the code.

So, if we can assume some reasonable number for nested depth, say 3 or 4, then it is not beyond reason to write a regex that can handle that.

A regex for a function block with nesting depth of 1 might look like this:

code:
{\([^{}]*\|{[^{}]*}\)*}



(I've been writing sed scripts recently so forgive me if the above syntax is not correct...)

Basically, you could interpret the above like so:
Between { and } there will be 0 or more occurrences of 2 possible patterns: either something not in a nested block, [^{}]*, or something within a nested block, {[^{}]*}

To increase the nested depth, repeat the whole pattern within the second sub-pattern above. Something like:

code:
{\([^{}]*\|{\([^{}]*\|{\([^{}]*\|{\([^{}]*\|{[^{}]*}\)*}\)*}\)*}\)*}



This could handle a nested depth of 4 (I think...I'm getting dizzy....)
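Rather than typing the repetition by hand, the same idea can be generated for any fixed depth. Below is a rough sketch in Python's `re` syntax rather than sed's (an assumption, since the thread names no implementation language); `nested_brace_pattern` is a hypothetical helper name. One deliberate change from the pattern above: inside the repeated group it uses a single `[^{}]` instead of `[^{}]*`, which avoids repeating a sub-pattern that can match the empty string.

```python
import re

def nested_brace_pattern(depth):
    """Build a regex for a {...} block containing up to `depth` nested levels."""
    # Innermost level: a block with no braces inside it.
    pattern = r"\{[^{}]*\}"
    for _ in range(depth):
        # One level out: a block whose content is any mix of brace-free
        # characters and complete blocks of the previous level.
        pattern = r"\{(?:[^{}]|" + pattern + r")*\}"
    return pattern

# Usage: match the body of the sample function from the first post.
block = re.compile(nested_brace_pattern(4))
sample = 'function myFunc()\n{\nfor(...){\nif() echo "tezt";\n}\n}\n'
body = block.search(sample).group(0)
```

Anything nested deeper than the chosen depth simply fails to match, and braces inside comments or strings are still counted, so the caveats raised earlier in the thread apply here too.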

BTW, I haven't tried any of the above, so you'll clearly have to do some trial and error to get it to work.

Good Luck.
