
Topic awaiting preservation: Problem with parsing remote xml files (Page 1 of 1)

 
Moon Shadow
Paranoid (IV) Inmate

From: Rouen, France
Insane since: Jan 2003

posted 03-26-2006 17:14

Hi everybody,

OK, let me explain the situation. What I want to do is parse several remote XML files with PHP, process the data, and then output it with a nice layout. To do that, I put together an XML parser based on the php.net examples and on a parser Emp wrote some time ago, and adjusted it to suit my needs. It is running fine now and the data is processed correctly, so I have no problems on that side.

What worries me is that 3 times out of 4, I get only part of the XML file. For example, when I try to process a file like this one, what I almost always get is the first three arrays, with a tag missing at the end:

code:
Array
(
    [0] => Array
        (
            [name] => Armania
            [race] => 8
            [class] => 7
            [level] => 8
            [map] => 1
            [zone] => 14
            [ping] => 5
            [ip] => 62.147.129.134
        )

    [1] => Array
        (
            [name] => Nanami
            [race] => 5
            [class] => 8
            [level] => 11
            [map] => 0
            [zone] => 85
            [ping] => 28
            [ip] => 83.112.234.13
        )

    [2] => Array
        (
            [name] => GlenoXx
            [race] => 4
            [class] => 11
            [level] => 21
            [map] => 1
            [zone] => 17
            [ping] => 9
            --- note : missing a tag here ---
        )

)



Typically, the files are located on remote users' computers, so I have to access them with URLs like http://server:port/stat.xml. My first thought was that the remote server was closing the connection, which prevented my script from parsing the whole file. But since the script sometimes manages to parse the whole file, I don't think this is the real reason.

Here's the function I am using :

code:
function run_xml_parser ($xml_file, $start_tag_function, $end_tag_function, $data_function) {

	$xp = xml_parser_create();

	xml_set_element_handler ($xp, $start_tag_function, $end_tag_function);
	xml_set_character_data_handler ($xp, $data_function);
	xml_parser_set_option ($xp, XML_OPTION_CASE_FOLDING, FALSE);
	xml_parser_set_option ($xp, XML_OPTION_SKIP_WHITE, TRUE);

	$file = fopen ($xml_file, "rb");

	while ($xml = fread ($file, 4096)) {
		if (!xml_parse ($xp, $xml, feof ($file))) {
			die("XML parser error: " . xml_error_string (xml_get_error_code ($xp)));
		}
	}

	xml_parser_free ($xp);

}



Does anyone know what I am doing wrong?

----
If wishes were fishes, we'd all cast nets.

(Edited by Moon Shadow on 03-26-2006 17:18)

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 03-26-2006 17:51

Yeah... you're not concatenating the data you read, so you're probably feeding the parser half of an XML string, cut right in the middle of a <tag>...

Try the following (it won't work for huge XML files, but anything under 2 MB should be fine):

code:
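$file = fopen ($xml_file, "rb");
$xml = '';
// Collect the whole file before handing it to the parser.
while ($buffer = fread ($file, 4096)) {
    $xml .= $buffer;
}
// $xml now holds the entire document, so this is the final (and only) chunk.
if (!xml_parse ($xp, $xml, TRUE)) {
    die("XML parser error: " . xml_error_string (xml_get_error_code ($xp)));
}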


in your inner section...

Moon Shadow
Paranoid (IV) Inmate

From: Rouen, France
Insane since: Jan 2003

posted 03-26-2006 21:19

Ohhh thanks TP for pointing that out

Now the data is concatenated and processed correctly for almost all the files, but I still have some problems. Sometimes the function stops reading an XML file after the first 4096 bytes. Of course, there is still data after those 4096 bytes... and the file size is way below 2 MB.

What I don't get is that the error will happen for some file, and when I reload the page, the same file is processed perfectly...

I'm a bit confused now

Do you have any idea what's going wrong?

----
If wishes were fishes, we'd all cast nets.

bitdamaged
Maniac (V) Mad Scientist

From: 100101010011 <-- right about here
Insane since: Mar 2000

posted 03-26-2006 21:37

I believe TP's code won't parse anything more than the first 4096 bytes of a file.

That's just my quick interpretation of the fread docs, and I could very well be wrong. You could try changing 4096 to something greater, or you have a couple of other options, the best probably being this:

code:
$contents = '';
while (!feof($handle)) {
    $contents .= fread($handle, 8192);
}


or the PHP 5+ version:

code:
$contents = stream_get_contents($handle);



.:[ Never resist a perfect moment ]:.

Moon Shadow
Paranoid (IV) Inmate

From: Rouen, France
Insane since: Jan 2003

posted 03-26-2006 23:28

Well, using that piece of code works the same... I still get partial xml files.

I tried using fread with a larger size, it didn't work. I also tried :

code:
$xml = fread ( $handle, filesize ($xml_file) );



And

code:
$xml = file_get_contents($xml_file);



Neither worked. I am really beginning to think that this problem is tied to the remote computers... Most of the time the script works, but sometimes I get partial files for no apparent reason. Maybe they are closing the connection or something, which prevents the script from retrieving the whole file and parsing it.
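
One way to test the "connection closed early" theory is to compare the number of bytes actually received with the Content-Length the server announced. The following is only a sketch under assumptions: the helper names declared_length and looks_complete are made up for illustration, and it only helps if the remote server sends a Content-Length header at all, which the thread does not confirm. With the HTTP wrapper, PHP exposes the response header lines in $http_response_header after a call such as file_get_contents(); here a hand-written header array stands in for it so the example runs without a network.

```php
<?php
// Hypothetical helper: pull the Content-Length value out of raw header
// lines, or return null when the server did not send one.
function declared_length(array $headers): ?int
{
    $length = null;
    foreach ($headers as $line) {
        if (preg_match('/^Content-Length:\s*(\d+)\s*$/i', $line, $m)) {
            $length = (int) $m[1]; // keep the last match (e.g. after a redirect)
        }
    }
    return $length;
}

// Hypothetical helper: true/false when the lengths can be compared,
// null when there is no Content-Length to compare against.
function looks_complete(string $body, array $headers): ?bool
{
    $expected = declared_length($headers);
    return $expected === null ? null : strlen($body) === $expected;
}

// Stand-in for $http_response_header; the values are invented for the demo.
$headers = ['HTTP/1.0 200 OK', 'Content-Type: text/xml', 'Content-Length: 15'];

$full      = looks_complete('<stats></stats>', $headers); // 15 bytes received
$truncated = looks_complete('<stats></st', $headers);     // transfer cut short
$unknown   = looks_complete('<stats></stats>', ['HTTP/1.0 200 OK']);

var_dump($full, $truncated, $unknown);
```

If the lengths disagree, the transfer was cut short; if they agree and the XML is still incomplete, the file was probably already incomplete on the remote machine when it was served.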

I'll try to get more information about this problem... Perhaps another solution would be to use an XSLT parser; I'll look into that as a last resort. If you have other ideas about what is wrong, they are welcome.

Thanks again for the help so far

----
If wishes were fishes, we'd all cast nets.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 03-27-2006 06:47

yeah... http requests can be like that.
but you should easily be able to see if you have a closing </whatever> tag in there and if in doubt retrieve the file again, shouldn't you?

Bitdamaged:
while (!feof($handle)) is a bad idea. You see, feof returns false if $handle isn't valid, so the loop would never end. You'd have to turn that into while ($handle && !feof($handle)) to be safe from endless loops.
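
The guarded loop described above could be wrapped up roughly as follows. This is a minimal sketch, not anyone's actual code from this thread: read_all is a hypothetical helper name, and an in-memory stream stands in for the remote URL so the example runs on its own. Comparing the result of fread() with false, instead of using it directly as the loop condition, also avoids stopping early on a chunk that happens to be falsy, such as the string "0".

```php
<?php
// Hypothetical helper: read everything from an already-open stream.
// Returns null when the handle is invalid or a read fails.
function read_all($handle, int $chunk_size = 8192): ?string
{
    // fopen() returns false on failure; never loop on a bad handle.
    if (!is_resource($handle)) {
        return null;
    }

    $contents = '';
    while (!feof($handle)) {
        $buffer = fread($handle, $chunk_size);
        if ($buffer === false) {
            return null; // read error: report failure instead of looping
        }
        $contents .= $buffer;
    }
    return $contents;
}

// Demonstration on an in-memory stream instead of a remote file.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, str_repeat('x', 10000));
rewind($handle);

$data = read_all($handle);
fclose($handle);

$bad = read_all(false); // what a failed fopen() would hand us

echo strlen($data), "\n";
```

Note that this only guards against endless loops and read errors. If the remote side sends a short file, the function still returns that short file, so a completeness check on the result is needed as well.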

So long,

->Tyberius Prime

Moon Shadow
Paranoid (IV) Inmate

From: Rouen, France
Insane since: Jan 2003

posted 03-29-2006 00:31

I tried that as a last resort :

code:
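$max_loops = 5;
$loops = 0;
$xml = '';	// initialise so the first strpos() check has a string to search

// Re-fetch until the closing root tag shows up or we run out of attempts.
while ( (strpos($xml, "</stats>") === false) && ($loops < $max_loops)) {

	$loops++;
	$handle = fopen ($xml_file, "rb");
	if (!$handle)
		continue;	// could not connect: count the attempt and retry

	$xml = '';

	while ($buffer = fread ($handle, 4096))
		$xml .= $buffer;

	fclose($handle);

}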



It seemed to improve the percentage of files read correctly... but not by much; perhaps it is just my imagination.

I contacted somebody running a server that generates this kind of XML file. He told me it was probably because the XML files I am trying to parse are edited very often. I dunno... Anyway, it was just a personal exercise in setting up an XML parser, so I won't bother trying to make it work all the time.
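
If the files really are rewritten very often, a somewhat stricter variant of the retry idea is to accept a download only when it parses as a complete document, and to pause briefly between attempts so the writer has time to finish. The sketch below rests on assumptions: is_well_formed_xml and fetch_complete_xml are hypothetical names, the pause length is an arbitrary guess, and the fetcher is passed in as a callable so the demonstration can use a fake one instead of touching the network.

```php
<?php
// Hypothetical helper: true when $xml parses as one complete document.
function is_well_formed_xml(string $xml): bool
{
    $xp = xml_parser_create();
    // Third argument true: this is the whole document, so a file that
    // was cut off is reported as an error instead of being waited on.
    return xml_parse($xp, $xml, true) === 1;
}

// Hypothetical helper: fetch $url until the result is well-formed XML.
// $fetch defaults to file_get_contents; $delay_us is the pause between tries.
function fetch_complete_xml(
    string $url,
    int $max_attempts = 5,
    ?callable $fetch = null,
    int $delay_us = 200000
): ?string {
    $fetch = $fetch ?? 'file_get_contents';

    for ($attempt = 1; $attempt <= $max_attempts; $attempt++) {
        $xml = @$fetch($url);
        if (is_string($xml) && is_well_formed_xml($xml)) {
            return $xml;
        }
        if ($attempt < $max_attempts) {
            usleep($delay_us); // give the writer a moment to finish
        }
    }
    return null; // every attempt came back incomplete
}

// Demonstration with a fake fetcher: first a truncated file, then a full one.
$responses = [
    '<stats><player><name>Armania</name>',
    '<stats><player><name>Armania</name></player></stats>',
];
$fake_fetch = function (string $url) use (&$responses) {
    return array_shift($responses);
};

$result = fetch_complete_xml('http://server:port/stat.xml', 3, $fake_fetch, 0);
var_dump($result !== null);
```

Compared with searching for the closing tag by hand, letting the parser decide also rejects files that are cut in the middle of a tag or that contain the closing tag but are damaged elsewhere.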

Again, thank you guys for the help

----
If wishes were fishes, we'd all cast nets.



(Edited by Moon Shadow on 03-29-2006 00:33)

chex
Obsessive-Compulsive (I) Inmate

From:
Insane since: Apr 2006

posted 05-01-2006 11:57

Looks like I'm late, I thought maybe the file is being written to while you're reading it? Can you lock files on another server for reading?
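
A reader fetching the file over HTTP has no way to take a lock on it, but if whoever generates stat.xml can change how it is written, a common workaround is to write the new version to a temporary file and then rename it over the old one. The sketch below is illustrative only: write_atomically is a hypothetical name, nothing in this thread says the generating program is written in PHP, and the "all or nothing" behaviour of rename() is something to rely on for POSIX filesystems within one filesystem rather than everywhere.

```php
<?php
// Hypothetical writer-side helper: replace $path in a single step so
// readers see either the old file or the new one, never a half-written mix.
function write_atomically(string $path, string $contents): bool
{
    // Temp file in the same directory, so rename() stays on one filesystem.
    $tmp = tempnam(dirname($path), 'tmp');
    if ($tmp === false) {
        return false;
    }

    if (file_put_contents($tmp, $contents) !== strlen($contents)) {
        unlink($tmp);
        return false;
    }

    // tempnam() creates the file owner-only; open it up for the web server.
    chmod($tmp, 0644);

    if (!rename($tmp, $path)) {
        unlink($tmp);
        return false;
    }
    return true;
}

// Demonstration in the system temp directory.
$target = sys_get_temp_dir() . '/stat_demo.xml';
$ok = write_atomically($target, '<stats></stats>');
$written = file_get_contents($target);
unlink($target);

echo $written, "\n";
```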

Since it's a remote file could there be a problem with calling xml_parse-ing functions before the file is fully read?
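
On that last question: as far as I know, PHP's XML parser is a streaming parser, and xml_parse() is designed to be called repeatedly with successive pieces of a document, the final argument marking the last piece. A piece may end in the middle of a tag. The sketch below tries to show this; element_names_chunked is a hypothetical helper, and the tiny chunk size is chosen only to force splits inside tags. If this holds, parsing chunk by chunk should not be the problem in itself, and a transfer that ends early would be the more likely cause.

```php
<?php
// Hypothetical helper: feed $xml to the parser $chunk_size bytes at a
// time and collect the names of the elements it opens.
// Returns null if the parser reports an error.
function element_names_chunked(string $xml, int $chunk_size): ?array
{
    $names = [];
    $xp = xml_parser_create();
    xml_parser_set_option($xp, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $xp,
        function ($parser, $name, $attrs) use (&$names) {
            $names[] = $name; // start-tag handler: remember the name
        },
        function ($parser, $name) {
            // end-tag handler: nothing to do in this demo
        }
    );

    $chunks = str_split($xml, $chunk_size);
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // Only the last chunk is flagged as final.
        if (xml_parse($xp, $chunk, $i === $last) !== 1) {
            return null;
        }
    }
    return $names;
}

$doc = '<stats><player><name>Armania</name></player></stats>';

$in_small_chunks = element_names_chunked($doc, 7);    // splits inside tags
$in_one_piece    = element_names_chunked($doc, 4096); // whole document at once
$cut_short       = element_names_chunked(substr($doc, 0, 30), 7);

var_dump($in_small_chunks === $in_one_piece, $cut_short);
```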
