Topic: Scraper not working

Obsessive-Compulsive (I) Inmate

Insane since: Jul 2011

posted 11-09-2011 22:07

I'm not getting any errors. It just isn't working.


$html = file_get_contents("");

preg_match_all(
	'/<div class="listing_content">.*?
	<h3 .*?>
		<a .*?>(.*?)<\/a>
	<span class="listing-address adr">
		<span class="street-address">(.*?)<\/span>
		<span class="city-state">
			<span class="locality">(.*?)<\/span>,
			<span class="region">(.*?)<\/span>
			<span class="postal-code">(.*?)<\/span>
	<span class="business-phone phone">(.*?)<\/span>.*?
	<li><a href="(.*?)">/s',
	$html,
	$posts, // receives one array per matched listing
	PREG_SET_ORDER // group captures by match, so $post[1] is the title
);

foreach ($posts as $post) {
	$listing['title'][]   = $post[1];
	$listing['street'][]  = $post[2];
	$listing['city'][]    = $post[3];
	$listing['state'][]   = $post[4];
	$listing['zip'][]     = $post[5];
	$listing['phone'][]   = $post[6];
	$listing['website'][] = $post[7];

	// do something with data
	echo $post[4];
}



(Edited by PixelMamma on 11-09-2011 22:08)

(Edited by PixelMamma on 11-09-2011 22:11)

(Edited by PixelMamma on 11-09-2011 22:12)

Edit (Tyberius Prime): added code tags

(Edited by Tyberius Prime on 11-10-2011 16:55)

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 11-10-2011 16:59

If you see no errors, $posts is most likely an empty array, which means your regular expression is not matching.
You'll need to take it apart until you have something that does match, then re-add the pieces one at a time to find the one that breaks it.
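
A rough sketch of that take-it-apart approach, in case it helps. The sample markup, the fragment list, and the helper name first_failing_piece are made up for illustration; the real page's HTML is not shown in this thread, so treat the fragments as placeholders for your own.

```php
<?php
// Hypothetical sample markup (assumption): stands in for the real page.
$html = '<div class="listing_content">
  <h3 class="title"><a href="/biz/1">Acme Tools</a></h3>
  <span class="street-address">12 Main St</span>
</div>';

// Fragments of the pattern, in the order they appear in the full regex.
// Each one ends in .*? so it does not care about whitespace before the next tag.
$pieces = array(
    '<div class="listing_content">.*?',
    '<h3 .*?>.*?',
    '<a .*?>(.*?)<\/a>.*?',
    '<span class="street-address">(.*?)<\/span>',
);

/**
 * Grows the pattern one fragment at a time and returns the index of the
 * first fragment that stops it from matching, or -1 if everything matches.
 */
function first_failing_piece(array $pieces, $html)
{
    $pattern = '';
    foreach ($pieces as $i => $piece) {
        $pattern .= $piece;
        $result = preg_match('/' . $pattern . '/s', $html);
        if ($result === false) {
            // A PCRE failure (for example the backtrack limit) is not the same as "no match".
            throw new RuntimeException('PCRE error code ' . preg_last_error());
        }
        if ($result !== 1) {
            return $i;
        }
    }
    return -1;
}

echo first_failing_piece($pieces, $html), "\n";
```

If a fragment with literal line breaks and tabs in it (such as "<h3 .*?>\n\t\t<a .*?>") is the first one to fail, the whitespace between the tags is the likely culprit.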

But honestly, a regular expression is not the right tool for this job: HTML cannot be reliably parsed with a regular expression, and you need a more powerful parser. The one you sketched out is also terribly sensitive to the whitespace between tags, and I'd bet even money that this is the reason it is not matching.

See for some suggestions on how to do this in a robust manner.
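
One possible way to do it with a real parser is PHP's bundled DOM extension (DOMDocument plus DOMXPath). This is only a sketch under assumptions: the sample markup is invented from the class names in the regex above, and first_text, has_class and parse_listings are hypothetical helper names, so the queries will need adjusting to the actual page.

```php
<?php
// Hypothetical sample markup (assumption), modelled on the class names in the regex.
$html = '<div class="listing_content">
  <h3 class="business-name"><a href="/listing/1">Acme Tools</a></h3>
  <span class="listing-address adr">
    <span class="street-address">12 Main St</span>
    <span class="city-state">
      <span class="locality">Springfield</span>,
      <span class="region">IL</span>
      <span class="postal-code">62701</span>
    </span>
  </span>
  <span class="business-phone phone">(555) 010-0000</span>
  <ul><li><a href="http://example.com/">Website</a></li></ul>
</div>';

/** Trimmed text of the first node matching $query below $context, or '' if none. */
function first_text(DOMXPath $xpath, $query, DOMNode $context)
{
    $nodes = $xpath->query($query, $context);
    return ($nodes !== false && $nodes->length > 0)
        ? trim($nodes->item(0)->textContent)
        : '';
}

/** XPath test for "the class attribute contains this token". */
function has_class($class)
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

/** Returns one associative array per listing found in $html. */
function parse_listings($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $listings = array();
    foreach ($xpath->query('//div[' . has_class('listing_content') . ']') as $div) {
        $listings[] = array(
            'title'   => first_text($xpath, './/h3//a', $div),
            'street'  => first_text($xpath, './/span[' . has_class('street-address') . ']', $div),
            'city'    => first_text($xpath, './/span[' . has_class('locality') . ']', $div),
            'state'   => first_text($xpath, './/span[' . has_class('region') . ']', $div),
            'zip'     => first_text($xpath, './/span[' . has_class('postal-code') . ']', $div),
            'phone'   => first_text($xpath, './/span[' . has_class('business-phone') . ']', $div),
            'website' => first_text($xpath, './/li//a/@href', $div),
        );
    }
    return $listings;
}

foreach (parse_listings($html) as $listing) {
    echo $listing['title'], ' (', $listing['state'], ")\n";
}
```

Because the parser builds a tree first, whitespace and attribute order between the tags no longer matter, and a missing field simply comes back as an empty string instead of making the whole listing fail to match.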

so long,

->Tyberius Prime

Nervous Wreck (II) Inmate

Insane since: Jul 2011

posted 11-14-2011 16:21

TY, it's working now.
