Closed Thread Icon

Topic awaiting preservation: Scraper not working (Page 1 of 1) Pages that link to <a href="https://ozoneasylum.com/backlink?for=32442" title="Pages that link to Topic awaiting preservation: Scraper not working (Page 1 of 1)" rel="nofollow" >Topic awaiting preservation: Scraper not working <span class="small">(Page 1 of 1)</span>\

 
PixelMamma
Obsessive-Compulsive (I) Inmate

From:
Insane since: Jul 2011

posted posted 11-09-2011 22:07

I'm not getting any errors. It just isn't working.

code:
<?

$html = file_get_contents("http://www.yellowpages.com/fort-lauderdale-fl/acupunture");

preg_match_all(
    '/
<div class="listing_content">.*?
	<h3 .*?>
		<a .*?>(.*?)<\/a>
	<\/h3>
	<span class="listing-address adr">
		<span class="street-address">(.*?)<\/span>
		<span class="city-state">
			<span class="locality">(.*?)<\/span>,
			<span class="region">(.*?)<\/span>
			<span class="postal-code">(.*?)<\/span>
		<\/span>
	<\/span>
	<span class="business-phone phone">(.*?)<\/span>.*?
	<li><a href="(.*?)">/s',
    $html,
    $posts,
    PREG_SET_ORDER 
);


$listing=array();

foreach ($posts as $post) {
	
$listing['title'][] = $post [ 1 ]; 

$listing['street'][] = $post [ 2 ]; 

$listing['city'][] = $post [ 3 ];
  
$listing['state'][] = $post [ 4 ]; 

$listing['zip'][] = $post [ 5 ]; 

$listing['phone'][] = $post [ 6 ]; 

$listing['website'][] = $post [ 7 ]; 

    // do something with data
	
	echo  $post [ 4 ];
}


print_r($listing)


?>



(Edited by PixelMamma on 11-09-2011 22:08)

(Edited by PixelMamma on 11-09-2011 22:11)

(Edited by PixelMamma on 11-09-2011 22:12)

edit Tyberius prime: added code tags

(Edited by Tyberius Prime on 11-10-2011 16:55)

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted posted 11-10-2011 16:59

If you see no errors, you should see an empty array being output - that means your regular expression is not matching.
You'll need to take it apart until you have something that does match, then readd and find the difference.

But honestly, a regular expression is not the right tool for this job (html can not be reliably parsed with a regular expression - you need a more powerful parser), and the one you sketched out is terrible sensitive to the whitespace between tags (and I'd bet even money on that that's also the reason it is not matching).

See http://stackoverflow.com/questions/3650125/how-to-parse-html-with-php for some suggestions on how to do this in a robust manner.

so long,

->Tyberius Prime

PixelMamma
Nervous Wreck (II) Inmate

From:
Insane since: Jul 2011

posted posted 11-14-2011 16:21

TY, its working now

« BackwardsOnwards »

Show Forum Drop Down Menu