Topic awaiting preservation: Regex (again) (Page 1 of 1)

 
redroy
Paranoid (IV) Inmate

From: 1393
Insane since: Dec 2003

posted 02-20-2009 17:26

Hello everyone... I know I only show up when I need help but I figure I'll just keep tapping your knowledge until you say when.

I'm looking to do a preg_replace on a string so that every human-visible instance of the match is replaced, but none that falls within an HTML tag.

For example:

code:
<p><a href="test.html" title="Test titles">Test your skills</a> and test some more</p>

Doing the desired preg_replace for "test" on the above snippet, I would want to replace only the 3rd and 4th occurrences (nothing within an HTML attribute).
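To make the target concrete, here is a minimal sketch of what a plain, tag-unaware preg_replace does to that snippet. The `[$0]` marker and the variable names are only for illustration; they are not from the original post.

```php
<?php
$html = '<p><a href="test.html" title="Test titles">Test your skills</a> and test some more</p>';

// Naive, tag-unaware replacement: every "test" is rewritten, attributes included.
$naive = preg_replace('/test/i', '[$0]', $html);
echo $naive, "\n";
// <p><a href="[test].html" title="[Test] titles">[Test] your skills</a> and [test] some more</p>

// Count all matches, then only those in the text a reader would actually see.
$total = preg_match_all('/test/i', $html);
$visible = preg_match_all('/test/i', strip_tags($html));
echo "$visible of $total matches are visible\n";
// 2 of 4 matches are visible
```

The first two matches sit inside the href and title attributes, which is exactly what the replacement needs to leave alone.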

I found a bit of regex that matches ALL text that is not within an HTML tag (which is what I want), but I don't understand it well enough to adapt it so that it only finds a specific keyword (as opposed to ALL text).

code:
(?<=^|>)[^><]+?(?=<|$)

Any help/pointers would be much appreciated... peace.
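One possible way to narrow that regex to a specific keyword, sketched under assumptions: let the regex find each run of text between tags, then run a second, keyword-only preg_replace on each run through preg_replace_callback. The function name `highlight_visible` is hypothetical, and the highlight span is only an example. This still treats any `>` as the end of a tag, so a literal `>` inside an attribute value, a comment, or a script block would confuse it.

```php
<?php
// Hypothetical helper: highlight $keyword only in text that sits between tags.
function highlight_visible(string $html, string $keyword): string
{
    // Passing "/" to preg_quote() also escapes the pattern delimiter.
    $keywordPattern = '/' . preg_quote($keyword, '/') . '/i';

    return preg_replace_callback(
        // The regex from the post: a run of non-tag text after "^" or ">",
        // ending right before "<" or the end of the string.
        '/(?<=^|>)[^><]+?(?=<|$)/',
        function (array $m) use ($keywordPattern): string {
            // $m[0] is one run of visible text; replace the keyword inside it only.
            return preg_replace(
                $keywordPattern,
                '<span style="background-color: #ffff66;">$0</span>',
                $m[0]
            );
        },
        $html
    );
}

$html = '<p><a href="test.html" title="Test titles">Test your skills</a> and test some more</p>';
echo highlight_visible($html, 'test'), "\n";
```

With the sample snippet, only "Test" in the link text and "test" after the link get wrapped; the href and title attributes come through unchanged.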

Arthurio
Paranoid (IV) Inmate

From: cell 3736
Insane since: Jul 2003

posted 02-21-2009 03:28

This is really a very simple thing conceptually and IMO not suitable for regex.

You can probably get better performance out of a simple algorithm that loops over the text, skips everything from each "<" to the next ">", and concatenates the rest in chunks (not character by character).
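A rough sketch of that kind of loop, assuming every tag is a plain `<...>` with no `>` inside attribute values. The name `map_text_outside_tags` and the callback parameter are hypothetical, and strtoupper merely stands in for whatever replacement is actually wanted.

```php
<?php
// Hypothetical helper: apply $fn to every chunk of text outside <...>.
function map_text_outside_tags(string $html, callable $fn): string
{
    $out = '';
    $pos = 0;
    $len = strlen($html);

    while ($pos < $len) {
        $lt = strpos($html, '<', $pos);
        if ($lt === false) {
            // No more tags: the remainder is plain text.
            $out .= $fn(substr($html, $pos));
            break;
        }
        if ($lt > $pos) {
            // Text before the tag, handled as one chunk rather than per character.
            $out .= $fn(substr($html, $pos, $lt - $pos));
        }
        $gt = strpos($html, '>', $lt);
        if ($gt === false) {
            // Unterminated tag: copy the rest untouched and stop.
            $out .= substr($html, $lt);
            break;
        }
        // The tag itself, copied verbatim.
        $out .= substr($html, $lt, $gt - $lt + 1);
        $pos = $gt + 1;
    }

    return $out;
}

$html = '<p><a href="test.html" title="Test titles">Test your skills</a> and test some more</p>';
echo map_text_outside_tags($html, 'strtoupper'), "\n";
// <p><a href="test.html" title="Test titles">TEST YOUR SKILLS</a> AND TEST SOME MORE</p>
```

Only strpos and substr touch the markup here, so the tags are never seen by a regex at all.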

reisio
Paranoid (IV) Inmate

From: Florida
Insane since: Mar 2005

posted 02-21-2009 04:07

http://htmlparsing.icenine.ca/doku.php
http://oubliette.alpha-geek.com/2004/01/12/bring_me_your_regexs_i_will_create_html_to_break_them
?

redroy
Paranoid (IV) Inmate

From: 1393
Insane since: Dec 2003

posted 02-21-2009 19:04

Thanks for the pointers... those links are just what I needed. Peace my brothers.

redroy
Paranoid (IV) Inmate

From: 1393
Insane since: Dec 2003

posted 02-24-2009 00:07

For those who may want to know or critique, here's what I came up with (it seems to be working properly):

code:
// Build the pattern once; passing "/" makes preg_quote() escape the delimiter too
$pattern = "/" . preg_quote($keyword, "/") . "/i";
$highlight = '<span style="background-color: #ffff66;">$0</span>';

// Only rebuild the string if the keyword occurs in the visible text
if($keyword !== "" && preg_match($pattern, strip_tags($string)))
{
	// Highlight entries
	$length = strlen($string);
	$new_string = "";
	$cur_start = 0;
	while($cur_start < $length)
	{
		// Visible text runs up to the next "<" (or to the end if there is none)
		$html_start = strpos($string, "<", $cur_start);
		$text_end = ($html_start !== false) ? $html_start : $length;
		$new_string .= preg_replace($pattern, $highlight, substr($string, $cur_start, $text_end - $cur_start));
		if($html_start === false) break;
		
		// Copy the tag through its ">" untouched; an unclosed "<" takes the rest of the string
		$html_end = strpos($string, ">", $html_start);
		$cur_start = ($html_end !== false) ? $html_end + 1 : $length;
		$new_string .= substr($string, $html_start, $cur_start - $html_start);
	}
	$new_data = $new_string;
}


