
Preserved Topic: RegExp

 
lallous
Paranoid (IV) Inmate

From: Lebanon
Insane since: May 2001

posted 08-12-2001 11:53

Hello...
Can someone help me build a regexp that returns what's inside the HREF of an "A" tag?

code:
<a prop1='xxx' prop2='yyy' href='image1.jpg' prop3='xxx'>big image</a>



Please also explain the regexp solution.

galaxal
Paranoid (IV) Inmate

From:
Insane since: Oct 2000

posted 08-13-2001 06:38

The easy way is:

First, give the anchor a name: change <a prop1='xxx' prop2='yyy' href='image1.jpg' prop3='xxx'>big image</a> to <a prop1='xxx' prop2='yyy' href='image1.jpg' prop3='xxx' name='theLink'>big image</a>
and then
var str = document.all.theLink.href;
Now the variable [str] holds whatever the href is.


The RegExp way, I think, would be the following. (I am not very good with RegExp, so please correct me if you think this is the long and dumb way. Watch out for spaces: even a single space matters.)

str = "<a prop1='xxx' prop2='yyy' href='image1.jpg' prop3='xxx'>big image</a>"
str = str.match(/href=.*[ >]?/); //.* is greedy, so str now holds everything from [href='image1.jpg' prop3 ... ] to the end of the string.

str += ''; //the match function returns an array; this forces [str] to become a string again.

str = str.replace(/[ >].*/, ""); //cut everything from the first white space or > mark to the end of the string; now str = [href='image1.jpg']

str = str.replace(/href='?"?/, ""); //take out [href=']; you probably see href="somelink" more often than href='somelink', and '?"? makes it work in both cases.

str = str.replace(/['"]/g, ""); //finally, remove any remaining single or double quotes. Only image1.jpg should be left now

(I tested this in IE5 only, but it should work in most cases or need only a little modification)
so you have:

str = "<a prop1='xxx' prop2='yyy' href='image1.jpg' prop3='xxx'>big image</a>"
str = str.match(/href=.*[ >]?/);
str += '';
str = str.replace(/[ >].*/, "");
str = str.replace(/href='?"?/, "");
str = str.replace(/['"]/g, "");

[This message has been edited by galaxal (edited 08-13-2001).]

lallous
Paranoid (IV) Inmate

From: Lebanon
Insane since: May 2001

posted 08-13-2001 08:39

I've built this code, but it isn't very optimal. I'm looking forward to better code that reads the HREF from "A" tags only!

code:
str = "<a href=\"asd.txt\" prop1='aasd'>dadasD</a>";
x = new RegExp("href=\"([^\"]+)\"");
alert(x.exec(str)[1]);
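
A minimal sketch of the "only from an A tag" idea, for illustration only. The function names hrefOf and allHrefs are made up for this example and do not come from any post in the thread. It assumes the href value is always quoted, with either quote style, and it is a regexp shortcut rather than a real HTML parser:

```javascript
// Hypothetical helper: returns the href of the first <a ...> tag, or null.
function hrefOf(html) {
  // <a\b      : the tag name "a" exactly (so <abbr> or <img> do not qualify)
  // [^>]*?    : any other attributes, as few characters as possible, never past ">"
  // (['"])    : the opening quote, remembered as group 1
  // (.*?)\1   : the value, up to the first matching closing quote
  var m = /<a\b[^>]*?\bhref=(['"])(.*?)\1/i.exec(html);
  return m ? m[2] : null;
}

// Hypothetical helper: collects the href of every <a ...> tag in the text.
function allHrefs(html) {
  var re = /<a\b[^>]*?\bhref=(['"])(.*?)\1/ig;
  var found = [];
  var m;
  // With the g flag, each exec call resumes where the previous match ended.
  while ((m = re.exec(html)) !== null) {
    found.push(m[2]);
  }
  return found;
}
```

Using a backreference (\1) for the closing quote is one way to accept both href='...' and href="..." without stripping quotes in a separate step.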



mr.maX
Maniac (V) Mad Scientist

From: Belgrade, Serbia
Insane since: Sep 2000

posted 08-13-2001 09:38

I haven't tested this code thoroughly, but it should work fine...

<SCRIPT TYPE="text/javascript" LANGUAGE="JavaScript">
<!-- ;

// Written by mr.maX, http://www.max.co.yu/

maxText = "<a prop1='xxx' prop2='yyy' href='image1.jpg' prop3='xxx'>big image</a> some text here <a prop1='xxx' prop2='yyy' href='image2.jpg' prop3='xxx'>big image</a> some text here <a prop1='xxx' prop2='yyy' href='image3.jpg' prop3='xxx'>big image</a>";

maxRe = new RegExp ("<A(.+?)HREF=('

galaxal
Paranoid (IV) Inmate

From:
Insane since: Oct 2000

posted 08-13-2001 13:16

these are way better than mine, alright!!

lallous
Paranoid (IV) Inmate

From: Lebanon
Insane since: May 2001

posted 08-14-2001 11:15

Thanks mr. Max! It works fine!

Can you please explain the '.+?' to me?

. = any character
+ = one or more times
? = zero or one time

So what does +? mean: one or more, or zero?
If .+ = one or more, then is .+? = (.+)?, meaning the preceding expression can occur zero or one time?


Slime
Lunatic (VI) Mad Scientist

From: Massachusetts, USA
Insane since: Mar 2000

posted 08-14-2001 13:12

Putting a ? after a + or a * (or certain other quantifiers) means minimal matching. In other words, .+? means match any character one or more times, but only as few times as possible (as opposed to as many times as possible, which is what happens without the question mark). So, if the pattern would work with 2 characters there, or with 20 characters there, it will try to match it with 2 characters, and only go on to 20 if that doesn't work.

Er, that didn't really make sense, did it.

[This message has been edited by Slime (edited 08-14-2001).]

mr.maX
Maniac (V) Mad Scientist

From: Belgrade, Serbia
Insane since: Sep 2000

posted 08-14-2001 19:21

Slime already explained it very well. Anyway, to put it simply, adding ? switches the quantifier to non-greedy mode...
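
A small sketch of the greedy versus non-greedy difference described above. The sample string and variable names are invented for this illustration:

```javascript
var sample = "<b>one</b> and <b>two</b>";

// Greedy: .+ grabs as much as it can, so the match runs to the LAST </b>.
var greedy = sample.match(/<b>.+<\/b>/)[0];   // "<b>one</b> and <b>two</b>"

// Non-greedy: .+? grabs as little as it can, so the match stops at the FIRST </b>.
var lazy = sample.match(/<b>.+?<\/b>/)[0];    // "<b>one</b>"

// .+? is not the same as (.+)? : the first still needs at least one character,
// while the second makes the whole group optional.
var lazyNeedsOne = /^a.+?$/.test("a");        // false
var groupIsOptional = /^a(.+)?$/.test("a");   // true
```

In other words, the ? after a quantifier changes how much the quantifier prefers to take, not whether it is required.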

Slime
Lunatic (VI) Mad Scientist

From: Massachusetts, USA
Insane since: Mar 2000

posted 08-14-2001 21:00

BTW, one little part of that thing Max gave can be simplified just a bit...

('

lallous
Paranoid (IV) Inmate

From: Lebanon
Insane since: May 2001

posted 08-19-2001 11:42

Thanks mr.Max and Slime.

I did some extra reading about RegExps, and I think I'm good. So please ask me a good/hard RegExp question (to see if I still have more to learn about them).

Slime
Lunatic (VI) Mad Scientist

From: Massachusetts, USA
Insane since: Mar 2000

posted 08-19-2001 16:17

Hmm...

write a regexp that will parse the [ url ] UBB code thing. You know, it will turn:

[ url=www.karl.nu/slime/ ]slime's page[ /url ]
(without the spaces inside the brackets)

into:

<a href="http://www.karl.nu/slime/">slime's page</a>

Notice that it should add http:// if no transfer protocol is specified.

lallous
Paranoid (IV) Inmate

From: Lebanon
Insane since: May 2001

posted 08-19-2001 18:11

I would also like to know how to do this with the string .replace method and a RegExp.

code:
txt = ["[ url=http://www.karl.nu/slime/ ]slime's page[ /url ]",
"[url=ftp://ftp.cdrom.com]CDROM's FTP site[/url]",
"[ url=www.ozoneasylum.com]Cool place! [/url]"];
txt = txt.join('');
// A regex literal keeps the backslashes; in a string passed to new RegExp each one would have to be doubled.
reObj = /\[[ ]*?url=([^:]+:\/\/)?([^\] ]+)[ ]*?\](.+?)\[[ ]*?\/url[ ]*?\]/ig;
resultArr = reObj.exec(txt);
while (resultArr != null)
{
// resultArr[1] is undefined (or '' in some old browsers) when no protocol was matched, so test for falsiness.
link = "<a href='" + (resultArr[1] ? resultArr[1] : 'http://') + resultArr[2] + "'>" + resultArr[3] + "</a>";
document.write(link + "<br>");
resultArr = reObj.exec(txt);
}
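
One detail worth checking when a pattern is handed to new RegExp as a string: the string literal consumes one level of backslashes before the regexp engine ever sees the pattern. A short sketch, with variable names made up for this illustration:

```javascript
// Regex literal: \[ and \] reach the regexp engine as escaped brackets.
var fromLiteral = /\[url\]/i;

// String form: each backslash must be doubled to survive the string literal.
var fromString = new RegExp("\\[url\\]", "i");

// Single backslashes are dropped by the string literal, so this pattern
// is really [url], a character class matching one of u, r, or l.
var lostEscapes = new RegExp("\[url\]", "i");

var sameSource = fromLiteral.source === fromString.source; // true
var literalMatchesTag = fromLiteral.test("[URL]");         // true
var literalMatchesR = fromLiteral.test("r");               // false
var lostMatchesR = lostEscapes.test("r");                  // true
```

Writing the pattern as a literal avoids the doubling entirely; the string form is mainly needed when the pattern is assembled at run time.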



lallous
Paranoid (IV) Inmate

From: Lebanon
Insane since: May 2001

posted 08-21-2001 08:45

Slime, mr. Max, anyone?
I still have an unanswered question that I'm sure at least one of you can answer!

lallous
Paranoid (IV) Inmate

From: Lebanon
Insane since: May 2001

posted 09-04-2001 11:34
code:
txt = [
"You can visit [ url=http://www.karl.nu/slime/ ]slime's page[/url] for cool javascript games!<br>",
"And you can also visit [ url=http://www.max.co.yu]MAX's page[/url] to download the latest version of his nice freeware HTML editor<br>",
"Special thanks goes to [ url=www.ozoneasylum.com]The OzoneAsylum[/url] for all the nice help!"
];
txt = txt.join('');
pat = /\[ url=([^:]+:\/\/)?([^\] ]+)[^\]]*\](.+?)[\[]\/url[\]]/ig;
document.write(txt.replace(pat, '<a href="$1$2" target="_blank">$3</a>'));


One more question!
$1 is empty for www.ozoneasylum.com, so the link comes out as: <a href="www.ozoneasylum.com">

Is there a way with regexps to set a default value for an unmatched sub-pattern?

I mean, if $1 is not matched, a default value should be used in its place.
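
One possible approach, offered as a sketch rather than as an answer given in the thread: the replacement string itself has no syntax for defaults, but String.replace also accepts a function as its second argument, and that function can substitute a default for a group that did not match. The function name ubbToHtml is made up for this example, the pattern is loosely adapted from the one above, and very old browsers may not support function replacements:

```javascript
// Hypothetical helper: converts [url=...]label[/url] into an HTML link,
// adding http:// when no protocol was given.
function ubbToHtml(text) {
  var pat = /\[ *url=([a-z]+:\/\/)?([^\] ]+) *\](.+?)\[ *\/url *\]/ig;
  return text.replace(pat, function (match, protocol, address, label) {
    // protocol is undefined (or '' in some older engines) when group 1
    // did not take part in the match, so || supplies the default.
    return '<a href="' + (protocol || 'http://') + address +
           '" target="_blank">' + label + '</a>';
  });
}
```

For example, ubbToHtml("[url=www.ozoneasylum.com]The OzoneAsylum[/url]") produces a link whose href starts with http://, while a URL that already carries ftp:// or http:// keeps its own protocol.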
