Preserved Topic: That automatic-URL thing in UBB/ezBoards... (Page 1 of 1)

PenguinKing
Nervous Wreck (II) Inmate

From: Regina, SK, Canada
Insane since: Apr 2000

posted 05-05-2001 03:48

Heya. You know that automatic-URL-linking feature that appears in UBBs and ezBoards, wherein, if the user types a plaintext web address:

http://some.random.page

... the board automatically inserts the anchors when it's posted and turns it into:

<a href="http://some.random.page">http://some.random.page</a>

Yeah, that. Well, to be brief, how does it work? I've looked at the UBB source code and I think I've found where it happens, but damned if I can figure how. Yep, I'm one o' those sad, strange little men who isn't satisfied with knowing that it works, but just has to know the reason it works as well. Anyone feel masochistic enough to try explaining to a quasi-newbie how that particular bit o' code works?

- Sir Bob.

P.S. Nih!

[This message has been edited by PenguinKing (edited 05-05-2001).]

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 05-05-2001 09:05

It happens because of a regular expression that looks for http:// (or any other variation), takes everything up to the next whitespace character, and then places it into the appropriate format as you stated above.

-mage-

PenguinKing
Nervous Wreck (II) Inmate

From: Regina, SK, Canada
Insane since: Apr 2000

posted 05-06-2001 03:02

Hmmm. Thanks. While we're on the subject, would anyone like to give me a bit of a crash-course in how I'd go about constructing a similar function myself? Let's use a simplified case, say, I wanna take a sentence and surround every word that starts with the letter "a" with asterisks. I'm passingly familiar with PERL (so don't feel you have to break down every line of code and explain every single character's purpose in words of one syllable) but I'm still a bit clueless when it comes to plugging functions and expressions together to accomplish something specific. Any help?

- Sir Bob.

P.S. Nih!

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-06-2001 05:02

$foo =~ s/\b(a.*)\b/*$1*/g;

seems to me like it would do it.

PenguinKing
Nervous Wreck (II) Inmate

From: Regina, SK, Canada
Insane since: Apr 2000

posted 05-06-2001 19:45

Ah. Thank you. That's simpler than I would have expected.

However, I think ye've got a tiny error in there. Correct me if I'm wrong, but as written, wouldn't it put one asterisk before the first word in the string that starts with "a", and a second asterisk at the end of the string, and that's it? If you want asterisks before and after each individual word starting with "a", I'd think it'd be more like:

$foo =~ s/\b(a.*?\ )\b/*$1*/g;

EDIT: Ach, just tried it out; that surrounds the individual words all right (including non-alphanumeric characters), but includes the space at the end. So close, yet so far...

Again, thanks. This will come in quite handy.

- Sir Bob.

P.S. Nih!

[This message has been edited by PenguinKing (edited 05-06-2001).]

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-06-2001 20:35

you're right about making the match non-greedy, so adding ? is appropriate.

I don't know why you'd want the \ in front of the ) though.

#!/usr/bin/perl
$foo = "aunt annie's alligators aggravate a-list foo\n";
$foo =~ s/\b(a.*?)\b/*$1*/g;
print $foo;

outputs this:
*aunt* *annie*'s *alligators* *aggravate* *a*-list foo

so you can see that the \b may not be doing what you really want.

How about:
$foo =~ s/(a\S*)/*$1*/g;

That gets
*aunt* *annie's* *alligators* *aggravate* *a-list* foo

For those following along, my first solution neglected to account for the fact that matches are "greedy," i.e. will try to match the longest of all possible matches.

The ? modifier makes the match parsimonious, i.e. it matches the shortest string possible (the opposite of greedy).

\b is a zero-width word boundary, but perl considers word characters to be alphanumeric plus underscore, so the apostrophe and hyphen count as boundaries and the match stopped short at them.

To fix that, I eliminated the \b atoms and used \S* meaning "zero or more non-whitespace characters."
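
For anyone who wants to see the three patterns from this post side by side, here's a minimal sketch. The variable names ($text, $greedy, $lazy, $nonspace) are made up for illustration, and the test string is the one above without the trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "aunt annie's alligators aggravate a-list foo";

# Greedy: .* runs to the end of the string, then settles on the last \b.
(my $greedy = $text) =~ s/\b(a.*)\b/*$1*/g;

# Non-greedy: .*? stops at the first word boundary, which splits at ' and -.
(my $lazy = $text) =~ s/\b(a.*?)\b/*$1*/g;

# \S*: take everything up to the next whitespace character.
(my $nonspace = $text) =~ s/(a\S*)/*$1*/g;

print "$greedy\n$lazy\n$nonspace\n";
```

One thing to watch with the last pattern: with the \b gone, nothing ties the "a" to the start of a word, so a string like "cat" would come out as "c*at*".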

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-06-2001 20:40

This might suggest
$foo =~ s/(http:\/\/\S*)/<A href="$1">$1<\/A>/g;

or (changing delimiters for readability):
$foo =~ s#(http://\S*)#<A href="$1">$1</A>#g;

but there are some issues. The Perl Cookbook, which I have at work, has a decent solution that accounts for ftp:// urls also.
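
A rough sketch of one way to cover ftp:// and keep trailing punctuation out of the link. This is not the Perl Cookbook's solution; autolink is a hypothetical helper name, and the set of "trailing punctuation" characters is an assumption picked for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap http, https and ftp URLs in anchor tags.
sub autolink {
    my ($text) = @_;
    $text =~ s{
        \b( (?:https?|ftp)://   # scheme, grouped without capturing
            \S+?                # URL body, as few characters as needed
        )
        (?= [.,;:!?)]* (?:\s|\z) )  # stop before trailing punctuation
    }{<a href="$1">$1</a>}gx;
    return $text;
}

print autolink("See http://some.random.page, or ftp://ftp.example.com/pub."), "\n";
```

The lookahead only checks what follows the URL without consuming it, so the comma and the final period stay outside the anchors. The sketch does no HTML escaping, so a real board would need to deal with quotes and angle brackets in the input first.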

timothymcnulty
Neurotic (0) Inmate
Newly admitted

posted 05-06-2001 21:49

holy crap i am lost and i shouldnt be....where was that thread on reg exp??

for anyone...

perl-based: http://www.ozoneasylum.com/Forum12/HTML/000150.html

a little here: http://www.ozoneasylum.com/Forum12/HTML/000165.html

books: http://www.ozoneasylum.com/Forum12/HTML/000152.html

~Age doesn't always bring wisdom. Sometimes age comes alone.~

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-07-2001 04:24

Fasten yer seat belts:

Here's the code in GreyMatter that does this:

$thiscommenttext =~ s#(^

mr.maX
Maniac (V) Mad Scientist

From: Belgrade, Serbia
Insane since: Sep 2000

posted 05-07-2001 17:48

Here's a much simpler regex...

$$text =~ s!((?:https?

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-07-2001 18:43

Mr. Max, yours just builds the character class by excluding bad URL characters, but otherwise it's much the same.

The analysis, for those following along:
! is the delimiter here
The outermost parens are the "memory" parens.

(?:https?

mr.maX
Maniac (V) Mad Scientist

From: Belgrade, Serbia
Insane since: Sep 2000

posted 05-07-2001 19:16

Yeah, I know that it works in much the same way... It's just easier to look at, heh! Anyway, as far as other protocols are concerned, they can be added very easily, but for the sake of simplicity I didn't put them there. The same explanation goes for that problem with punctuation, because regexes take some CPU time, and if they are executed a lot (in forums for example), they should be as simple as possible (that's why every forum has some internal markup language, like UBB Code). Oh, BTW, in your explanation, you didn't mention that using "?:" in (?:bla1
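
For those following along, a minimal sketch of what "?:" changes in general: (?:...) groups without capturing, so it doesn't take up a $1/$2 slot. The $url value and variable names here are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $url = "https://some.random.page";

# Capturing group: the scheme alternation takes $1, the host lands in $2.
my ($scheme, $host) = $url =~ m{^(https?|ftp)://(\S+)};

# Non-capturing group: (?:...) only groups, so the host is now $1.
my ($only_host) = $url =~ m{^(?:https?|ftp)://(\S+)};

print "capturing:     scheme=$scheme host=$host\n";
print "non-capturing: host=$only_host\n";
```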

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-07-2001 19:33

Yeah, I missed it. I meant to write that

The /o modifier is a must when you have to regex-process lots of text, especially with a complicated substitution. You probably shouldn't undertake to do any of the above without doing some benchmarks first--you ought to know roughly how long a page with 100, 1,000 and 10,000 URLs in it will take to munge on your server.

The /i modifier is traditionally thought of as expensive also, so depending on the approach you took to building your character classes, you may win by disabling /i. Just a thought.
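
A minimal benchmarking sketch along those lines, using the core Benchmark module. The test page, the munge helper and the simple \S+ pattern are all made up for the example; swap in your own substitution and page sizes. It precompiles the patterns with qr// rather than using /o, which is a substitution on my part: as far as I know /o only makes a difference when the pattern interpolates variables.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical test page: 1,000 lines, each containing one URL.
my $page = '';
$page .= "Line $_ links to http://some.random.page/$_ today.\n" for 1 .. 1000;

# Precompile both variants once so compilation cost is not what gets timed.
my $with_i    = qr{((?:https?|ftp)://\S+)}i;
my $without_i = qr{((?:https?|ftp)://\S+)};

# Hypothetical helper: link every URL in a copy of the text using $re.
sub munge {
    my ($text, $re) = @_;
    $text =~ s{$re}{<a href="$1">$1</a>}g;
    return $text;
}

# Run each variant 50 times over the page and print the timings.
timethese(50, {
    'with /i'    => sub { munge($page, $with_i) },
    'without /i' => sub { munge($page, $without_i) },
});
```

The numbers will depend entirely on your perl version and server, so treat the output as a relative comparison only.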

[This message has been edited by linear (edited 05-07-2001).]

mr.maX
Maniac (V) Mad Scientist

From: Belgrade, Serbia
Insane since: Sep 2000

posted 05-07-2001 20:26

As far as the /i modifier is concerned, my original code is much different (the actual text is being pre-processed before being inserted in the db, so I don't need some modifiers, like /s for example)...

linear
Paranoid (IV) Inmate

From: other places
Insane since: Mar 2001

posted 05-07-2001 21:15

as to the /s modifier, I don't have a clue why they would want it. Newlines are obviously not legal URL characters, and I don't see the . used anywhere, so it seems unnecessary to me.
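
A quick sketch of why /s shouldn't matter without a dot in the pattern: all /s does is let . match a newline. The strings and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s, "." stops at the newline.
my ($plain) = $text =~ /(f.*)/;

# With /s, "." also matches the newline, so .* runs to the end.
my ($dotall) = $text =~ /(f.*)/s;

# \S never matches a newline, so /s makes no difference to it.
my ($nonspace_plain) = "http://some.random.page\nnext" =~ /(http\S*)/;
my ($nonspace_s)     = "http://some.random.page\nnext" =~ /(http\S*)/s;

printf "plain: %d chars, with /s: %d chars\n", length($plain), length($dotall);
print "same URL either way\n" if $nonspace_plain eq $nonspace_s;
```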

Here's the solution from the version of UBB we have lying around:

$ThePost =~ s/(^