Topic: Reading special characters with FileSystemObject

 
TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 05-09-2006 19:41

I'm trying to read resume.dat generated by uTorrent to get statistical info out of it.
But when I use readAll() and check how much has been read, it always stops at a certain character without any errors. (reading as ASCII)
I don't know which character it is, but I think it might be a white-space or newline character of some sort.

I've had this problem before with a file generated by a script and then read by it again. Then I could solve it by reading/writing it as unicode, but that's not possible now...

Tried searching for ways of getting around this, but failed...

/TwoD

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 05-09-2006 20:34

It would most likely be ascii(0) which tends to be treated as an EOF delimiter.

I am going to assume that you are using VB for this, and I will not be any help for you in that respect since I have never looked too deep into the API.

Dan @ Code Town

TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 05-10-2006 00:20

I'm using JavaScript, sorry I forgot to mention that...

Thanks for your comment, I've still had no luck reading the whole file though.

/TwoD

poi
Paranoid (IV) Inmate

From: Norway
Insane since: Jun 2002

posted 05-10-2006 00:36

I guess you mean JScript, MS proprietary thing, as there is no FileSystemObject in JavaScript or ECMAScript.
I've tried a few times to load binary data in JavaScript, with no success, alas.

You'll certainly have to use another language to parse the binary data into text/plain or text/javascript.

liorean
Bipolar (III) Inmate

From: Umeå, Sweden
Insane since: Sep 2004

posted 05-10-2006 00:48

I suggest you check this out: Eric Lippert's Fabulous Adventures In Coding: Binary Files and the File System Object Do Not Mix

--
var Liorean = {
abode: "http://web-graphics.com/",
profile: "http://codingforums.com/member.php?u=5798"};

TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 05-10-2006 10:14

Poi: Sorry for being so sloppy with the terms, but I think they too refer to it as JavaScript sometimes...

Thanks to a comment on the blog entry liorean posted, I discovered there is more than one way of reading files. (I subscribe to the feed, but somehow missed that entry, and I didn't see it when searching...)

After applying google-fu to this info and reading about its uses, I simply fired up an ADODB Stream to get what I want!
(I've always found the msdn library to be a good reference, but only when you already know where things in it are located.)

Here's what I do:

code:
var str = new ActiveXObject("ADODB.Stream"); // Won't work if run from a browser because of a vulnerability fix (throws a "..can't create object" error), but I don't care since it won't be running in a browser...
str.Type = 2; // adTypeText
str.Open();
str.loadFromFile(".....uTorrent\\resume.dat");
str.Charset = "ASCII";
var txt = str.readText();
str.close();
str = null;
txt = txt.substring(2); // Remove the first two Byte Order Mark characters.
alert(txt); // Had me fooled at first because the text was too long to fit in an alert. But txt.length assured me I had the whole file.


I think this code snippet might become very useful in future scripts too. Too bad there isn't a non-MS-standard for file access out there.

/TwoD

WarMage
Maniac (V) Mad Scientist

From: Rochester, New York, USA
Insane since: May 2000

posted 05-10-2006 15:02

My non-MS-standard is don't use MS.

Dan @ Code Town

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-10-2006 15:26
quote:

Too bad there isn't a non-MS-standard for file access out there



There is. Java and LiveConnect. But then you have to compile a little I/O class that provides javascripters with the two public methods,
readfile and writefile.

And you also have to trick the JVM into thinking it's reading from or writing to a URL instead of the local file system (so don't use FileInputStreams,
use URL objects and their streams instead).

It's discussed here for instance: http://www.webdeveloper.com/java/java_jj_read_write.html

You can copy/paste/compile the code. If it has to run locally as an applet, you should build the URL from the local file system,
otherwise, from any web domain. I think, but am not sure, that a good way to return this would be returning a string.
Maybe a char array.

Anyhow, this will work in any browser where JS/Java communication is enabled. (And this also solves locale issues, as Java
should handle locales transparently, and provides you with hundreds of ways to handle this on your own.)

(Edited by _Mauro on 05-10-2006 15:29)

TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 05-10-2006 16:29
quote:

_Mauro said:

There is. Java and LiveConnect. But then you have to compile a little I/O class
that provides javascripters with the two public methods,readfile and
writefile.
...



Yup, I've used Java to complement JS before, but this time that's not an option since I'm not running it in a browser (note the comment in the code lol).
I'm running it in Samurize, which uses IE's engine to execute either VBScript or JScript. I do have the option to write a plugin in whichever language I know, but I prefer doing it with scripts since they are easy to modify. And I know all the users run the code in the same environment, so I don't have to worry about cross-browser compatibility.

/TwoD

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-10-2006 16:43

Well, IE's script engine also is called the "Windows scripting Host", so it should have the same capabilities in samurize as the engine when run in IE.
Eg. If you can connect to ActiveX objects (like FSO), connecting to java objects should be far easier.

Furthermore, such an easy java snippet can be made java 1.1 compatible and therefore, compatible with the dreaded MS crippled VM,
making it compatible with 90% of Windows systems worldwide.

Then of course, the choice is up to you, but giving this a try doesn't take long.
my 2 cents.

TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 05-10-2006 23:01

Yes, I don't doubt I could do that, but I'll try to keep it to a single file and language as long as possible.
I don't know the exact specifications for the engine when it runs in Samurize, other than I can't keep data in memory between function calls, it runs the engine once for each function call (I think), and the returned value is used as a meter value in Samurize. And that I don't have access to any kind of DOM-like interface...

Think I'll try your approach when I've got this one up and running.

The only problem I have now is displaying a hash correctly.
The hash looks like this:
(0x)76 DB CA D8 E9 B2 04 6C 05 D4 EB 03 D8 07 70 4A 54 76 75 82
in uTorrent, and is stored like that in resume.dat. (confirmed with a HEX editor)
But I read it as ASCII, which is then automatically converted to Unicode (if I'm not totally mistaken) by JS. Somewhere this goes wrong so I can't convert it back and display the hash as it should look.
When I loop through the stored string to get the character codes, this is what I get:
118 91 74 88 105 50 4 108 05 84 107 3 88 7 112 74 84 118 117 2
These are the values returned by myString.charCodeAt(i) for each character in the JS String.

Some of these values I can use, as they can directly represent the HEX value, but some make no sense to me. Like the last character (0x82 / char code 2)...

This is what the whole hash looks like when opened in Notepad: vÛÊØé²lÔëØpJTvu? (I wonder how the Asylum will treat it :/)
And here are the char code values I get if I manually put it in the code and use charCodeAt() on that string:
118 219 202 216 233 178 04 108 05 212 235 03 216 04 112 74 84 118 117 8218

Anyone who knows how to fix this, I'm starting to get confused about all these numbers...
Probably has something to do with the reading of ASCII above code 127...
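
The first list of character codes is consistent with that guess: every value equals the original byte with its high bit cleared (byte & 0x7F), which is what a 7-bit ASCII reader would produce. A minimal sketch to check this against the hash above follows; hashBytes and stripHighBit are names made up for the illustration. The 8218 in the second list is most likely Notepad's Windows-1252 reading of byte 0x82, which maps to the character U+201A.

```javascript
// Bytes of the hash as shown in the hex editor.
var hashBytes = [0x76, 0xDB, 0xCA, 0xD8, 0xE9, 0xB2, 0x04, 0x6C, 0x05, 0xD4,
                 0xEB, 0x03, 0xD8, 0x07, 0x70, 0x4A, 0x54, 0x76, 0x75, 0x82];

// Clear the top bit of every byte, as a 7-bit ASCII reader would.
function stripHighBit(bytes) {
	var out = [];
	for (var n = 0; n < bytes.length; n++) {
		out.push(bytes[n] & 0x7F);
	}
	return out;
}

// Same numbers as the charCodeAt() values read from the file.
var sevenBit = stripHighBit(hashBytes);
```

Because the top bit is thrown away, 0xDB and 0x5B both end up as 91, so the original bytes cannot be rebuilt from a string that was read as ASCII.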

Nurse! Meds please, the numbers are attacking me again!!

/TwoD

poi
Paranoid (IV) Inmate

From: Norway
Insane since: Jun 2002

posted 05-11-2006 20:28

no problem about the confusion between JScript and JavaScript

Regarding your assumption about unicode encoding, if the file is indeed treated as a unicode string, the size of the string should be half that of the original file. If you can get consistent results for several files of various sizes, then you could run a small loop to convert the string into an array, i.e.:

code:
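var UInt8Array = [];
// unicodeString holds the file contents read as 16-bit characters
while( unicodeString.length )
{
	var charCode = unicodeString.charCodeAt(0);
	UInt8Array.push( charCode&255, charCode>>8 ); // low byte first, then high byte
	unicodeString = unicodeString.substr( 1 );
}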
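
To see why the split gives back the original bytes, here is a self-contained sketch that needs no file: it packs a few bytes into 16-bit code units the way a little-endian reader would, then splits them again. packBytes and splitCodeUnits are hypothetical helper names, and an even number of bytes is assumed; what a real reader does with a trailing odd byte is not covered.

```javascript
// Pack pairs of bytes into 16-bit code units (low byte first),
// mimicking what reading raw bytes as little-endian 16-bit text produces.
function packBytes(bytes) {
	var s = "";
	for (var n = 0; n + 1 < bytes.length; n += 2) {
		s += String.fromCharCode(bytes[n] | (bytes[n + 1] << 8));
	}
	return s;
}

// Split every code unit back into its low and high byte.
function splitCodeUnits(str) {
	var out = [];
	for (var n = 0; n < str.length; n++) {
		var code = str.charCodeAt(n);
		out.push(code & 255, code >> 8);
	}
	return out;
}

var original = [0x76, 0xDB, 0xCA, 0xD8, 0x00, 0x82];
var packed = packBytes(original);      // half as many characters as bytes
var restored = splitCodeUnits(packed); // the same bytes, in the same order
```

The indexed loop also avoids calling substr(1) on every pass, which copies the remaining string each time and gets slow on large files.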

Hope that helps

TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 05-12-2006 00:41

WOOT! Thanks Poi!
Now I can read the file in Unicode and split up the characters with your code prior to parsing them. Then I just use charCodeAt(x).toString(16) to get the HEX back for the hashes!
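
One thing to watch with toString(16): it does not pad, so a byte such as 0x04 comes out as "4" and the hash string ends up shorter than 40 digits. A small sketch that pads each byte to two digits follows; toHex is a made-up name, and the input is assumed to be a string holding one byte value per character.

```javascript
// Convert a one-byte-per-character string to upper-case hex,
// always emitting two digits per byte.
function toHex(byteString) {
	var hex = "";
	for (var k = 0; k < byteString.length; k++) {
		var h = byteString.charCodeAt(k).toString(16).toUpperCase();
		hex += (h.length < 2 ? "0" : "") + h;
	}
	return hex;
}

// First eight bytes of the hash quoted earlier in the thread.
var sample = String.fromCharCode(0x76, 0xDB, 0xCA, 0xD8, 0xE9, 0xB2, 0x04, 0x6C);
var sampleHex = toHex(sample); // "76DBCAD8E9B2046C"
```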

*sigh* why do I always forget JS can do bitshifting... maybe I don't use it often enough..

Treating it as Unicode all the way does make more sense, guess I gave up on that too soon when all I got was garbage from the file in Unicode...

/TwoD

liorean
Bipolar (III) Inmate

From: Umeå, Sweden
Insane since: Sep 2004

posted 05-12-2006 02:17
quote:

_Mauro said:
Well, IE's script engine also is called the "Windows scripting Host", so it should have the same capabilities in samurize as the engine when run in IE. Eg.

Actually, no. Iew is one host for the JScript engine, and WSH is another host for the same scripting engine. Note that the scripting engine doesn't have any built in I/O, no networking, no security concept, and in fact no host objects except for the built ins. All that comes from the host environment.

(Built ins means those objects that are part of the actual language - which means basically just those in the ECMAScript spec - not those added by the host environment)

Iew and WSH provide the JScript engine with very different host environments. So do Windows Script Components and ASP. In each of these host environments, the scripting engine is the same, but the host objects may be entirely different, may work in different ways under the hood, may have different security provisions or security systems etc.

--
var Liorean = {
abode: "http://web-graphics.com/",
profile: "http://codingforums.com/member.php?u=5798"};

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-12-2006 08:24

Thanks for the correction. I'm still confused about the JScript "ActiveXObject" though: if it is internal to the JScript engine, as opposed to ECMAScript, then it should be possible to use it anyway to get a Java VM instance.

poi
Paranoid (IV) Inmate

From: Norway
Insane since: Jun 2002

posted 05-12-2006 09:25

TwoD: what? It works? The only time I had to use that was for the currency conversion widget. For some mysterious reason the IMF returns the TSV currency table in unicode. But that trick failed miserably when I tried to apply it to pure binary files loaded with an XHR. JScript being more tied to the OS than JavaScript, it seems logical ... for some extended values of 'logical'

_Mauro
Maniac (V) Inmate

From:
Insane since: Jul 2005

posted 05-12-2006 09:29

Hey, found a nifty widget while fiddling. Some JS-based Java VM detection plus the Java VM instantiated as an ActiveX object, so yeah,
whether WSH or IE is present (or not), Java seems to remain accessible:

http://forum.java.sun.com/thread.jspa?threadID=536912&messageID=2597230

TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 05-12-2006 14:37

Yuppers, it sure does Poi.

Here's the function I use for testing.
It's supported by a few conversion functions for dates, times, sizes etc, and also an algorithm to Bdecode the file contents.

code:
function decodeData(){
	var unicodeString, i, j, n;
	// Read resume.dat with Unicode
	try{
		var str = new ActiveXObject("ADODB.Stream");
		str.Type = 2; // adTypeText
		str.Open();
		str.loadFromFile("....\\uTorrent\\resume.dat");
		str.Charset = "Unicode";
		unicodeString = str.readText();
		str.close();
		str = null;
	}
	catch(er){ return "ERROR"; }
	// Split the Unicode characters, 8 bits rule!
	var UInt8Array = [], charCode;
	for(n = 0; n < unicodeString.length; n++){
		charCode = unicodeString.charCodeAt(n);
		UInt8Array.push( charCode&255, charCode>>8 ); // low byte, then high byte
	}
	// Turn every byte value into a one-character string
	for(n = 0; n < UInt8Array.length; n++){
		UInt8Array[n] = String.fromCharCode(UInt8Array[n]);
	}
	var txt = UInt8Array.join("");
	UInt8Array = null;
	// Unserialize the object =)
	var decoder = new BDecoder();
	decoder.data = txt;
	var result = decoder.decodeEntry();
	// Find out what we got...
	var out = "";
	for(i in result){
		out += i+":\n";
		for(j in result[i]){
			switch(j){
			case "have": case "prio": case "peers":
				continue; // not displayed
			case "info":
				out += "\t"+j+": "+convertHash(result[i][j])+"\n";
				break;
			case "downloaded": case "uploaded": case "blocksize":
				out += "\t"+j+": "+convertSize(result[i][j])+"\n";
				break;
			case "downspeed": case "upspeed":
				out += "\t"+j+": "+convertSize(result[i][j])+"/s\n";
				break;
			case "added_on": case "time":
				out += "\t"+j+": "+convertDate(result[i][j])+"\n";
				break;
			case "runtime": case "seedtime":
				out += "\t"+j+": "+convertTime(result[i][j])+"\n";
				break;
			default:
				out += "\t"+j+": "+result[i][j]+"\n";
			}
		}
	}
	return out;
}

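function convertHash(string){
	var hex = "", byteHex;
	for(var k=0;k<string.length;k++){
		byteHex = string.charCodeAt(k).toString(16).toUpperCase();
		// Pad to two digits so that 0x04 becomes "04" instead of "4"
		hex += (byteHex.length < 2 ? "0" : "") + byteHex;
	}
	return hex;
}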
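
The BDecoder used above is not shown in the thread. For readers who want something to experiment with, here is a minimal sketch of a decoder for the bencoding format used by BitTorrent (integers as i<digits>e, strings as <length>:<bytes>, lists as l...e, dictionaries as d...e). It is not the code from this post: bdecode is a hypothetical function with a different interface, it assumes the input string holds one byte per character, and it does no validation beyond the basics.

```javascript
// Minimal bencode decoder sketch (hypothetical, not the BDecoder used above).
function bdecode(data) {
	var pos = 0;

	function decodeValue() {
		var c = data.charAt(pos);
		if (c === "i") {                          // integer: i<digits>e
			var end = data.indexOf("e", pos);
			if (end === -1) throw new Error("Unterminated integer at " + pos);
			var num = parseInt(data.substring(pos + 1, end), 10);
			pos = end + 1;
			return num;
		}
		if (c === "l") {                          // list: l<values>e
			var list = [];
			pos++;
			while (data.charAt(pos) !== "e") {
				list.push(decodeValue());
			}
			pos++;                                // skip the closing "e"
			return list;
		}
		if (c === "d") {                          // dictionary: d<key><value>...e
			var dict = {};
			pos++;
			while (data.charAt(pos) !== "e") {
				var key = decodeValue();          // keys are byte strings
				dict[key] = decodeValue();
			}
			pos++;                                // skip the closing "e"
			return dict;
		}
		if (c >= "0" && c <= "9") {               // string: <length>:<bytes>
			var colon = data.indexOf(":", pos);
			if (colon === -1) throw new Error("Missing ':' at " + pos);
			var len = parseInt(data.substring(pos, colon), 10);
			var text = data.substr(colon + 1, len);
			pos = colon + 1 + len;
			return text;
		}
		throw new Error("Unexpected character at position " + pos);
	}

	return decodeValue();
}

// Made-up sample input, not real resume.dat content.
var decoded = bdecode("d9:a.torrentd5:filesli1ei2ee4:timei1147440000eee");
```

Truncated input makes the sketch throw rather than loop forever. Integers are parsed into ordinary JavaScript numbers, so values beyond 2^53 would lose precision.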



/TwoD


