Preserved Topic: Regex to extract string (Page 1 of 1)

 
synax
Maniac (V) Inmate

From: Cell 666
Insane since: Mar 2002

posted 01-23-2004 03:33

I'm looking for a regular expression that will extract string literals (for use with egrep). That is, any number of characters between two double quotes, and on the same line. Escaped double quotes are allowed to be part of the string.

I found this: "[^"\\\r\n]*(\\.[^"\\\r\n]*)*" here, but I keep getting the error "syntax error near unexpected token `('".

Can anyone help me here?

"Nothin' like a pro-stabbin' from a pro." -Weadah

[This message has been edited by synax (edited 01-23-2004).]

Tyberius Prime
Paranoid (IV) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-23-2004 11:58

hm... I don't know egrep.
Does that use Perl-style regexps as well?
Most regexp parsers are line-based by default.

Anyhow, I'm not even sure the language you're describing is regular, though right now I don't have the time to attempt to prove that it cannot be handled via regular expressions.
(Well... actually, I can immediately think of a finite state machine to handle it. So it must be regular ;-).
How about
(")(([^"]*\\"[^"]*)+

Skaarjj
Maniac (V) Mad Scientist

From: :morF
Insane since: May 2000

posted 01-23-2004 13:51

you know 'nax...that original regex of yours should work if you put ( and ) around the first *

Hang on...no. That's the weirdest regex syntax I've ever seen. I think what you need to use are multiple streams in your regex...give it options.

But...I see now that's what TP has already shown you...so I'll shut up now

Veneficuz
Paranoid (IV) Inmate

From: A graveyard of dreams
Insane since: Mar 2001

posted 01-23-2004 15:28

The reason you're getting the syntax error near unexpected token `(' is that the regexp should be wrapped in single quotes ('), not double quotes (").
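As a rough sketch of the quoting point: with single quotes the shell hands the whole pattern to grep untouched, so it never tries to parse the ( itself. The sample lines and the temporary file are made up for illustration, and grep -E is used as the equivalent of egrep. The pattern is the one from the first post with the \r\n dropped from the brackets. That change is my own assumption: grep already works line by line, and inside a POSIX bracket expression a backslash is literal, so \r\n there would exclude the letters r and n rather than line breaks.

```shell
# Hypothetical sample input; the contents are invented for this example.
tmp=$(mktemp)
printf '%s\n' 'x = "hello";' 'no literal here' 'y = "a \"b\" c";' > "$tmp"

# Single quotes keep the shell from interpreting ( ) and \ in the pattern.
# grep -E behaves like egrep (extended regular expressions).
matches=$(grep -E '"[^"\\]*(\\.[^"\\]*)*"' "$tmp")

# Print every line that contains a double-quoted string literal.
printf '%s\n' "$matches"
rm -f "$tmp"
```

Only the first and third sample lines should come back, since the second one has no double quotes at all.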

I came up with the following that seems to find every line containing a quoted sentence. If the line only has one ", it won't get registered.

code:
egrep '"([^\\"]*)"' <path>


I think it is possible to get egrep to substitute the selection with something, but I don't have access to the manual here and I don't remember the syntax.

[edit]Found some errors with the above regexp, so it selects some things that shouldn't be selected. But I'll keep the regexp there so you can look at it...[/edit]
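One case where the regexp above picks the wrong thing (possibly not the only one) is a string containing escaped quotes. The sketch below compares it with a variant that accepts a backslash followed by any character inside the string. It relies on the -o option, which prints only the matched text; that is an extension found in GNU and BSD grep rather than something every egrep is guaranteed to have. The sample line is made up.

```shell
# Hypothetical input line with escaped double quotes inside the literal.
line='y = "a \"b\" c";'

# Pattern from this post: it cannot step over a backslash, so the match
# starts at a later quote instead of the real opening quote.
simple=$(printf '%s\n' "$line" | grep -oE '"([^\\"]*)"')

# Variant that treats backslash + any character as part of the string.
escaped=$(printf '%s\n' "$line" | grep -oE '"[^"\\]*(\\.[^"\\]*)*"')

printf 'simple:  %s\n' "$simple"    # prints: simple:  " c"
printf 'escaped: %s\n' "$escaped"   # prints: escaped: "a \"b\" c"
```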

When I try TP's regexp I get an 'Invalid regular expression' error.

_________________________
"There are 10 kinds of people; those who know binary, those who don't and those who start counting at zero"
- the Golden Ratio -

[This message has been edited by Veneficuz (edited 01-23-2004).]

synax
Maniac (V) Inmate

From: Cell 666
Insane since: Mar 2002

posted 01-23-2004 15:33

Veneficuz, thank you! Your regex works perfectly. [Edit: Well, almost perfectly ]

"Nothin' like a pro-stabbin' from a pro." -Weadah

[This message has been edited by synax (edited 01-23-2004).]
