Topic awaiting preservation: Hidden url finder on remote server with cURL slow timeout (Page 1 of 1)

 
Sam
Bipolar (III) Inmate

From: Belgium
Insane since: Oct 2002

posted 04-13-2006 16:02

Hi all,

I've written a PHP script to find a hidden URL on a remote server.
The hidden URL is in the range:

http://www.dummy.com/000001.htm to http://www.dummy.com/999999.htm

I thought the best way was to write a PHP script which validates every link...

This is what I've got so far:

code:
<html>
<?php
// Return true when the server answers with a status that means the page exists.
function url_exists($strURL) {
    $resURL = curl_init();
    curl_setopt($resURL, CURLOPT_URL, $strURL);
    curl_setopt($resURL, CURLOPT_BINARYTRANSFER, 1);
    curl_setopt($resURL, CURLOPT_RETURNTRANSFER, 1); // keep page bodies out of the output
    curl_setopt($resURL, CURLOPT_FAILONERROR, 1);
    curl_setopt($resURL, CURLOPT_TIMEOUT, 10);       // don't hang forever on one URL

    curl_exec($resURL);

    $intReturnCode = curl_getinfo($resURL, CURLINFO_HTTP_CODE);
    curl_close($resURL);

    // 200 OK, 302 redirect and 304 not-modified all count as "found".
    return ($intReturnCode == 200 || $intReturnCode == 302 || $intReturnCode == 304);
}

set_time_limit(0); // the loop needs far more than PHP's default 30 seconds

for ($i = 1; $i <= 999999; $i++) {
    $j = str_pad($i, 6, "0", STR_PAD_LEFT);
    $url = "http://www.dummy.com/" . $j . ".htm";
    if (url_exists($url)) {
        echo $url;
        exit;
    }
}
?>
</html>


Guess what, it's slow and doesn't finish because of script timeouts.
1 - Is this the way to go?
2 - Can I make this script faster?
3 - Will it generate too much traffic on my webserver and will I get into trouble with my provider?

Thanks in advance,
Sam

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 04-13-2006 17:17

This might work, but hitting 10^6 pages ain't ever gonna be fast.

Now, I don't even want to know why this hidden url is there... anyhow, I'd advise doing that only against your own server - you're bound to generate a lot of traffic.

All right, I'd see if the server supports HTTP HEAD instead of HTTP GET (check with a known url, read the RFC if you must),
and I'd probably dump cURL for this and do the talking directly over a TCP socket on port 80, but that's just an off chance.
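A raw-socket HEAD probe along those lines could look like this - just a sketch, assuming the server speaks plain HTTP/1.1 on port 80, with `www.dummy.com` standing in for the real host as in the post above:

```php
<?php
// Build the HEAD request by hand; HEAD returns only the status line and
// headers, so no page body ever crosses the wire.
function build_head_request($host, $path) {
    return "HEAD $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

// Probe one path and return the HTTP status code, or false on failure.
function head_status($host, $path, $timeout = 5) {
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if (!$fp) {
        return false; // could not connect at all
    }
    fwrite($fp, build_head_request($host, $path));
    $statusLine = fgets($fp, 128); // e.g. "HTTP/1.1 200 OK"
    fclose($fp);
    if (preg_match('#^HTTP/\d\.\d (\d{3})#', $statusLine, $m)) {
        return (int)$m[1];
    }
    return false;
}

// head_status('www.dummy.com', '/000001.htm') === 200 means the page exists.
?>
```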

Otherwise, you should probably run this in several threads (or parallel PHP page calls), just to lower the latency.
See... the actual sending of data is negligible in what you are doing - it simply takes about half a second to establish a new TCP connection and ask the server whether there's a page there. But you could probably do 50 of them at the same time.
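One way to get that kind of parallelism without threads is PHP's curl_multi interface, which drives a whole batch of transfers over one loop so the connection latencies overlap instead of adding up. A rough sketch - the batch size of 50 and the `www.dummy.com` URLs are just the numbers from this thread, not tuned values:

```php
<?php
// Build the candidate URLs for one slice of the 000001..999999 range.
function candidate_urls($start, $count) {
    $urls = array();
    for ($i = $start; $i < $start + $count; $i++) {
        $urls[] = "http://www.dummy.com/" . str_pad($i, 6, "0", STR_PAD_LEFT) . ".htm";
    }
    return $urls;
}

// Probe a batch of URLs in parallel; returns the ones that answered 200.
function urls_found(array $urls) {
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD, not GET
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // never echo responses
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    // Drive all transfers until every handle in the batch has finished.
    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh);
        }
    } while ($running > 0);

    $found = array();
    foreach ($handles as $url => $ch) {
        if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
            $found[] = $url;
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $found;
}

// Walk the whole range 50 URLs at a time, stopping at the first hit:
// for ($i = 1; $i <= 999999; $i += 50) {
//     $hits = urls_found(candidate_urls($i, 50));
//     if ($hits) { echo $hits[0]; break; }
// }
?>
```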

Oh, and I'd put a known good url in there first, just to check that your code actually prints the found url.

so long,

->Tyberius Prime
