
Preserved Topic: Regular Expression Workshop (PERL BASED) (Page 1 of 1)

 
WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 04-06-2001 22:04

We all know that Perl regular expressions are the standard because they are incredibly powerful. We also all probably know that we have not touched on them in the network. I know that I do not have a fully comprehensive understanding of them, and I use them only occasionally, which is probably very poor on my part, because they are so powerful.

I hope to gain something and give something back to people through this exercise. I have compiled a list of different Perl regular expression operators, and would like anyone with any knowledge at all to help expand this list with greater detail about this area, examples of regular expressions with a good explanation of them, as well as any further help you can offer with them.

The goal of this will be to ultimately place a great tool online for us all in the Gurusnetwork, giving full credit to all who have participated and helped. This is a great opportunity for those who have not yet contributed to the network to easily get their name listed on a hopefully helpful tutorial.

So without much more babbling let me put down my list, and with all of your help we should be able to expand on this and make it work well for everyone.

Notice: I am looking for people to post rather useful information, not simple one-line pieces of code. If you post code, be sure to go through it piece by piece and explain how it works, as well as giving an example and an overall description.

What I have gotten so far:

Matching operation modifiers

i - Do case-insensitive pattern matching
m - Treat string as multiple lines. That is, change "^" and "$" from matching at only the very start or end of the string to the start or end of any line anywhere within the string.
s - Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which it normally would not match.
x - Extend your pattern's legibility by permitting whitespace and comments.
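
To make the four modifiers above concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from anyone's real script.

```perl
use strict;
use warnings;

my $text = "First line\nsecond LINE\n";

# /i: case-insensitive, so "line" also matches "LINE"
my @hits = $text =~ /line/gi;            # ("line", "LINE")

# /m: ^ and $ match at the start and end of every line
my @starts = $text =~ /^(\w+)/gm;        # ("First", "second")

# /s: . also matches newline, so .* can run across lines
my ($span) = $text =~ /First(.*)LINE/s;  # " line\nsecond "

# /x: whitespace and comments inside the pattern are ignored
my ($word) = $text =~ /
    ^ (\w+)     # first word of the string
    \s+ line    # followed by whitespace and "line"
/x;

print scalar(@hits), " @starts $word\n";   # 2 First second First
```

Note that under /x a literal space in the pattern has to be written as \s or escaped, since plain whitespace no longer counts.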

metacharacter egrep-ish meaning

\ - Quote the next metacharacter
^ - Match the beginning of the line
. - Match any character (except newline)
$ - Match the end of the line (or before newline at the end)
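
A small sketch of those four metacharacters in action, assuming nothing beyond core Perl. The test strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $line = "cost: 3.50\n";

# \ quotes the next metacharacter, so \. matches only a literal dot
my $literal_dot = ($line  =~ /3\.50/) ? 1 : 0;   # 1
# an unescaped . matches any character except newline
my $any_char    = ('3x50' =~ /3.50/)  ? 1 : 0;   # 1
# ^ anchors the match at the beginning
my $at_start    = ($line  =~ /^cost/) ? 1 : 0;   # 1
# $ matches at the end, or just before a trailing newline
my $at_end      = ($line  =~ /50$/)   ? 1 : 0;   # 1
# without /s, . refuses to cross the newline
my $no_newline  = ("a\nb" =~ /a.b/)   ? 1 : 0;   # 0

print "$literal_dot $any_char $at_start $at_end $no_newline\n";   # 1 1 1 1 0
```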

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-07-2001 00:20

Love this idea.

# replace all occurrences of foo with bar
s/foo/bar/g

# replace all occurrences of foo with bar in files named *.html, saving the original file with .bak
perl -pi.bak -e 's/foo/bar/g' *.html

# replace all occurrences of foo with bar in files named *.html from this directory on down
perl -pi.bak -e 's/foo/bar/g' `find . -name '*.html' -print`
# those are backticks around the find, and the quotes around *.html keep the shell from expanding it before find sees it

I use this last one damn near every day. Ninja powers...

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-08-2001 05:51

Parentheses are more than grouping, they're also backreferences. So to format 10-digit telephone numbers the way people like them, you could:

$fone = '2125551000'; # unreadable by most humans

$fone =~ s/(\d{3})(\d{3})(\d{4})/\($1\) $2-$3/; # now you've got (212) 555-1000

The first parenthesized subexpression is referred to as $1 in the substitution, the second is $2, etc.

If you haven't parsed the expression elements yourself, \d says "match a digit." \d{3} says "match exactly three digits."

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 04-08-2001 17:15

This is exactly what I was looking for! Linear, keep up the good work. Everyone else, spend 10 minutes to learn something and post it for us. When I get back home I will be doing the same thing.

Thanks for all the help. Let's keep this going, and remember this will get your name into the gurusnetwork.

-mage-

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-09-2001 04:38

One more important matching modifier:

Since we're restricting the discussion to Perl-ish regexps, it's worth noting that the /e modifier in a substitution means to treat the substitution expression as code to be eval()-ed. That can lead to some turbo-powered stuff. I haven't got any spiffy examples to pull out and lay on the table, but think about these applications...

Define a subroutine that takes a key as argument and returns a value (from an array, a db lookup, a DNS query, whatever):
sub get_value {
my ($key) = @_;
## do something useful
return $val;
}

then use the subroutine call in a substitution regexp:

$foo =~ s/(\d{3})/&get_value($1)/e;

That regexp searches for a three digit string (an area code, perhaps) and substitutes the result of calling the function &get_value *with that three digit string as the key*. Very powerful indeed.

You could even run external programs (with system() or backticks) if that's what you need.

Notably, PHP can do this too when compiled with Perl-compatible regexp support.
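
For anyone who wants something runnable, here is a rough sketch of the /e idea. The %city_for hash is a hypothetical stand-in for whatever lookup (db, DNS, etc.) you would really do, and the fall-back to the original key is just one possible choice.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for a db or DNS query
my %city_for = (212 => 'New York', 716 => 'Rochester');

# Return the value for a key, or the key itself when it is unknown
sub get_value {
    my ($key) = @_;
    return exists $city_for{$key} ? $city_for{$key} : $key;
}

my $foo = 'area code 716';
# /e evaluates the replacement as Perl code; $1 holds the three digits
$foo =~ s/(\d{3})/get_value($1)/e;
print "$foo\n";    # area code Rochester
```

Returning the key unchanged for unknown values means an unrecognized number passes through the substitution untouched rather than turning into an empty string.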

Mr. Pecker
Nervous Wreck (II) Inmate

From: Goslow
Insane since: Apr 2001

posted 04-25-2001 17:34

linear, or anyone else.

Any idea how to rename files (under Windows) in a command line script, kind of like your first examples?

Like for mp3s to rename files from <artist> <track> <song>.mp3 to <track> <artist> - <song>.mp3?

--

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-28-2001 15:19

Without some consistent delimiter between <artist> and <song> it would be tough. It would probably be feasible to do something that would work fairly well under the assumption that no artists have numbers in their names (a bad assumption).

Windoze is shell-impaired, so a Perl script (or other prog) is your only hope for a stunt like this.

Let me look at it a while.

avidal
Bipolar (III) Inmate

From: austin, tx, usa
Insane since: Nov 2000

posted 04-28-2001 20:08

linear, you could always write a script that checks the ID3 of the mp3, and names the file using that, and if the file has no id3, then you can leave the name alone

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 04-28-2001 20:20

The space would be the delimiter.

mr.maX
Maniac (V) Mad Scientist

From: Belgrade, Serbia
Insane since: Sep 2000

posted 04-29-2001 17:57

http://www.renatager.de/

BTW WarMage, what would happen if artist/track name contains multiple words?

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-30-2001 00:06

This is a long way from being optimal, but it works for the case you provide, assuming no digits in the artist or track name (crappy assumption). WARNING: Don't bitch to me if this maims your filenames! Practice on a copy! Your mileage may vary! You have been warned!

## rename files from <artist> <track> <song>.mp3 to <track> <artist> - <song>.mp3
# define whatever dir pleases you
$somedir="c:/Program Files/audiograbber/grabs/test/";
# open it or die trying
opendir(DIR, $somedir) || die "can't opendir $somedir: $!";
# read all the files (more than you want for sure, at least includes . and ..)
@allfiles = readdir(DIR);

#narrow down our list
# this regexp matches filenames that end in .mp3, case insensitive
# the dot is a metacharacter, so it's escaped
# the $ token matches end of line
@mp3files = grep (/\.mp3$/i, @allfiles);

# now let's loop through and maim the filenames
foreach (@mp3files) {
# $_ gets set to one filename at a time till we're done
# this is a little ugly to a purist, but keeps readability high
$oldname = $_;
# we'll be transforming $oldname into $newname, so we start with $oldname
$newname = $oldname;

# here goes the fun:
# first subexpression: match all the non-digit characters you can
# then match whitespace
# second subexpression: match two consecutive digits (do CDs have more than 99 tracks??!?)
# then match more whitespace
# third subexpression: match all the non-digit characters you can
# $1, $2 and $3 are backreferences to the three parenthesized subexpressions
# this is a substitution regexp, so $newname is transformed by the magic of the =~ operator
$newname =~ s/(\D*)\s(\d\d)\s(\D*)/$2 $1 - $3/;
#now the file renaming: if there's no match above, $newname and $oldname are equivalent, so this is cool, I think
rename $somedir.$oldname, $somedir.$newname or die "ouch: couldn't rename $!";
# a little user feedback, please
print "$oldname renamed to $newname\n";
}
# and mop up the blood, we're done
closedir DIR;
# whee!

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-30-2001 00:16

The above worked for me on ActivePerl build 509, which is way ancient. It ought to work on any perl you can get to install on Win32.

If you wanted to dork around with ID3 tags, you'd get the MP3::Info perl module, and rock on. But this is a regexp thread....

foreach $dirtydiaper (@diapers) { s/$dirtydiaper/$cleandiaper/g; }

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-30-2001 17:04

OK, this is an improvement, but only works because of the way you want to shuffle the filenames. Perl has three special match variables for regexp matches:

$` refers to the portion of the string preceding the match
$& refers to the portion of the string that matched
$' refers to the portion of the string that follows the match

If we match two digits surrounded by spaces and capture the digits, then $1 is the track number ($& would be the track number plus the two surrounding spaces), $` is the artist, and $' is the song name plus .mp3.

So change the pattern match in the above to
s/\s(\d\d)\s/$1 $` - $'/
and the match should work on any filename that has the track number surrounded by spaces, with no assumptions about characters in the song or artist name.

The reason this is less than totally general is that it doesn't separate all the components into variables you can reassemble like Legos. You're stuck with the filename ending with <song>.mp3, which just happens to be what Mr. Pecker wanted.

Use of those three magic variables is often considered lazy, and carries a significant computational expense, so avoid using them when you can, especially inside loops. But they're part of why Perl has no equal when it comes to this kind of programming.
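
Here is a quick sketch of the three match variables next to a capture-only version that avoids them. The filename is made up, and the capture-only pattern is my own illustration (not the script above), so treat it as an assumption about how you might do it.

```perl
use strict;
use warnings;

my $name = 'Daft Punk 03 Around the World.mp3';   # made-up filename

if ($name =~ /\s(\d\d)\s/) {
    # $` is everything before the match, $' everything after it
    print "artist: $`\n";     # Daft Punk
    print "track:  $1\n";     # 03
    print "song:   $'\n";     # Around the World.mp3
}

# Same shuffle with ordinary captures, avoiding the three magic variables
(my $new = $name) =~ s/^(.*?)\s(\d\d)\s(.*)$/$2 $1 - $3/;
print "$new\n";               # 03 Daft Punk - Around the World.mp3
```

The non-greedy (.*?) at the front makes the pattern settle on the first space-digit-digit-space it finds, which is the same position the shorter pattern would match.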

foreach $dirtydiaper (@diapers) { s/$dirtydiaper/$cleandiaper/g; }

Mr. Pecker
Nervous Wreck (II) Inmate

From: Goslow
Insane since: Apr 2001

posted 04-30-2001 18:07

Sweet merciful crap!
That helps a lot. I got a few tips for shortening another script a bit that I wrote for renaming files to their DOS names (the really messy part that unfortunately remains so), copying them to my car and making the DOS playlist.

I never thought to see if perl had its own rename function, I like this function. It will be my friend.

So will you, you'll be my friend right?

Thanks for the help

--

hyperbole
Paranoid (IV) Inmate

From: Madison, Indiana, USA
Insane since: Aug 2000

posted 04-30-2001 20:08

Regular expressions in Perl are called 'greedy'. This means that if you create an expression like /.*A/, the expression will match the longest string it can find that ends with 'A'. This string may itself contain 'A'. For example, "banana" =~ /.*a/ will match "banana".

You can make matching 'non-greedy' as follows: "banana" =~ /.*?a/ will match "ba".

Note that you can use any of the following:
*? Match 0 or more times
+? Match 1 or more times
?? Match 0 or 1 time
{n}? Match exactly n times
{n,}? Match at least n times
{n,m}? Match at least n but not more than m times
for non-greedy matches.
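
A short sketch of greedy versus non-greedy on something less fruity. The HTML-ish string and variable names are just an assumed example.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# greedy: .* grabs as much as it can, running to the last ">"
my ($greedy) = $html =~ /(<.*>)/;        # the whole string
# non-greedy: .*? stops at the first ">" that lets the match succeed
my ($lazy)   = $html =~ /(<.*?>)/;       # <b>

# {n,m}? prefers the smallest count in the range
my ($few)    = 'aaaaa' =~ /(a{2,4}?)/;   # aa
my ($many)   = 'aaaaa' =~ /(a{2,4})/;    # aaaa

print "$greedy\n$lazy\n$few $many\n";
```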

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 04-30-2001 21:23

non-greedy == parsimonious

Hey, that smells like the perlre man page!


[This message has been edited by linear (edited 05-01-2001).]

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-01-2001 17:01

lookahead assertions let you specify things that should match (or not match) without adding them to $& (the matching portion).

(?=pattern) is a zero-width positive lookahead assertion
"the next thing must be pattern, but don't add it to $&"

(?!pattern) is a zero-width negative lookahead assertion
"the next thing must not be pattern, but don't add it to $&"


Another gem (from _Perl Cookbook_ by Tom Christiansen and Nathan Torkington)

Adding commas (thousands separators) to numbers using regexes with lookahead assertions:

sub commify {
my $text = reverse $_[0];
$text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
return scalar reverse $text;
}

The devious part here is to reverse the string before inserting the commas.

Let's look at the regex:
(\d\d\d) matches three digits, the parens mean save the result of the match in $1
(?=\d) is a zero-width positive lookahead assertion (huh?)
that means "match if the next character is a digit, but don't include that digit in $&"
(?!\d*\.) is a zero-width negative lookahead assertion
that means "the next thing can not be zero or more digits followed by a literal period" which effectively keeps you from inserting commas to the right of a decimal point (to the left in the reversed string!)
$1, is our group of three digits from above, followed by a comma
/g means match and substitute as often as you can

Cool, or what?
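
If you want to try it, here is the same sub with a few calls. The inputs are made up for illustration.

```perl
use strict;
use warnings;

sub commify {
    my $text = reverse $_[0];                   # scalar context: reversed string
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;   # comma after each full group of three
    return scalar reverse $text;                # flip it back
}

print commify('1234567'), "\n";       # 1,234,567
print commify('1234567.891'), "\n";   # 1,234,567.891
print commify('999'), "\n";           # 999
```

The second call is the interesting one: in the reversed string the fractional digits come first, and the negative lookahead is what keeps a comma out of them.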

hyperbole
Paranoid (IV) Inmate

From: Madison, Indiana, USA
Insane since: Aug 2000

posted 05-02-2001 01:09

linear: That is a cute trick! Why is the scalar necessary before the reverse in the return statement?

[This message has been edited by hyperbole (edited 05-02-2001).]

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-02-2001 06:06

Hyperbole: "Why is the scalar necessary before the reverse in the return statement?"

Reverse in list context reverses the list element-wise. We'd get the one-element list with its elements in reverse order, which is to say the same string back, unchanged.

In scalar context (which we force), reverse returns the concatenation of the list elements, reversed bytewise.
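
A tiny sketch of the difference, with made-up values:

```perl
use strict;
use warnings;

my $text = 'abc';

# list context: reverses the (one-element) list, so the string is unchanged
my @list = reverse $text;              # ('abc')

# scalar context: concatenates the arguments and reverses the characters
my $str  = scalar reverse $text;       # 'cba'

# print supplies list context, so without scalar nothing looks reversed
print reverse($text), "\n";            # abc
print scalar(reverse $text), "\n";     # cba
```

Since a return statement hands its expression the caller's context, something like print commify(...) would otherwise call reverse in list context, which is why the scalar is there.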

Ya gotta love Perl.
