
Topic awaiting preservation: how best to parse large amounts of text?

gunder
Nervous Wreck (II) Inmate

From:
Insane since: Apr 2005

posted 08-18-2005 00:54

Hello everyone, I was wondering if anyone could give me some tips on how to parse large amounts of text. I play a strategy game through email: the turn report is sent to me, I write out my orders and send them back, and so on. I normally just do this in Notepad, but I figured I could write a very basic client in JavaScript. There are clients already available, but I would like to write my own for three reasons: the challenge, I don't really like any of the available clients, and I can't install anything at work, where I do most of it on my breaks.

What I have in mind is a text box that I could paste my turn report into, hit a button, have it parsed and then display it in an easier-to-read fashion. I'm OK with creating the nicer display and everything, I'm just trying to find an easier way to parse the text. My current report is over 1500 lines long and getting longer each turn. Here is a small snippet of my report so you can see what I'm working with:

Faction Status:
Tax Regions: 4 (24)
Trade Regions: 6 (10)
Mages: 2 (2)

Errors during turn:
Dalesor Reavers (32264): MOVE: Unit has insufficient movement
points; remaining moves queued.

Events during turn:
Joss (377): Claims $100.
Mernic (1345): Claims $100.
Guards (6394): Gives 80 silver [SILV] to Fighters (6521).

That's just a small portion of the type of stuff I would be dealing with. I'm guessing it would be easiest to use indexOf() and split(), but I'm a little lost as to how to grab all the correct info. For example, under "Faction Status" there are only those three things; the only thing that would change is the numbers. The "Errors during turn" and "Events during turn" sections change constantly, so how could I make sure to grab all of the info each time and make sure that's all I'm grabbing?

I'm sorry if this isn't making much sense, basically I just need to know the best way to parse large amounts of text. The book I have doesn't really cover it and I couldn't find anything too useful through a Google search. If anyone has any ideas I would really appreciate it.

-gunder

TwoD
Bipolar (III) Inmate

From: Sweden
Insane since: Aug 2004

posted 08-18-2005 03:11

I also deal with some pretty big amounts of text in some of my apps.
(I'm going through >150kb javascripts)

I've found Regular Expressions to be fast and efficient, along with indexOf and substring.

For the faction stuff you could use a RegExp.

code:
var workText = textToParse;
var myTaxRegions1 = [];
var myTaxRegions2 = [];
var taxPattern = /Tax Regions: (\d+) \((\d+)\)/;
var match;
while ((match = taxPattern.exec(workText)) !== null) {
	myTaxRegions1.push(parseInt(match[1], 10)); // add the first number to first list
	myTaxRegions2.push(parseInt(match[2], 10)); // add second one to second list
	// cut off the first part of workText since we've already checked there
	workText = workText.substring(match.index + match[0].length);
}
workText = textToParse; // reset before the next search


Same goes for Trade Regions and Mages.
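
As a rough sketch of handling all three status lines in one pass, a single pattern could capture the label along with both numbers. The name `parseFactionStatus` and the `first`/`second` property names are made up for illustration, and it assumes every status line looks like `Label: N (M)` as in the sample report.

```javascript
// Hypothetical helper: collects each "Label: N (M)" status line into an object.
// Assumption: status lines always follow the format shown in the sample report.
function parseFactionStatus(text) {
  // "g" lets exec() walk through every match, "m" lets ^ match at each line start
  const pattern = /^(Tax Regions|Trade Regions|Mages): (\d+) \((\d+)\)/gm;
  const status = {};
  let match;
  while ((match = pattern.exec(text)) !== null) {
    status[match[1]] = {
      first: parseInt(match[2], 10),  // the number before the brackets
      second: parseInt(match[3], 10)  // the number inside the brackets
    };
  }
  return status;
}

const sampleStatus = "Faction Status:\nTax Regions: 4 (24)\nTrade Regions: 6 (10)\nMages: 2 (2)\n";
const factionStatus = parseFactionStatus(sampleStatus);
console.log(factionStatus["Tax Regions"].first, factionStatus["Tax Regions"].second);
```

If the report held several turns, a later match would overwrite an earlier one under the same label, so pushing onto arrays as in the code above would be the better fit there.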

The Error text can be grabbed with substring and indexOf.

code:
var workText = textToParse;
var errorTexts = [];
var header = "Errors during turn:";
var chr = 0;
var headerPos;
while ((headerPos = workText.indexOf(header, chr)) !== -1) {
	// the message starts after the header and its line break
	var start = headerPos + header.length + 1;
	// extract the text from there up to the next empty row (\n\n)
	var end = workText.indexOf("\n\n", start);
	if (end === -1) end = workText.length; // last block in the text, no empty row follows
	errorTexts.push(workText.substring(start, end)); // store the messages as whole chunks of text for more parsing
	chr = end; // move the start position past what was just read
}



Same for Events during turn. But that text will require a bit more detailed parsing later to make the info useful.
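
The more detailed parsing could look something like the sketch below. It assumes each entry starts with `Name (id): ` as in the sample, and that any line not starting that way is a wrapped continuation of the previous message (like the second line of the MOVE error). The name `parseEntries` and the `unit`/`id`/`message` properties are made up for illustration.

```javascript
// Hypothetical helper: turns one block of "Name (id): message" lines into objects.
// Assumption: a line that does not start with "Name (id): " continues the previous message.
function parseEntries(block) {
  const entryStart = /^(.+?) \((\d+)\): (.*)$/;
  const entries = [];
  const lines = block.split(/\r?\n/); // tolerate Windows line endings from a pasted report
  for (const line of lines) {
    const match = entryStart.exec(line);
    if (match) {
      entries.push({ unit: match[1], id: parseInt(match[2], 10), message: match[3] });
    } else if (entries.length > 0 && line.trim() !== "") {
      // wrapped line: glue it onto the message of the entry before it
      entries[entries.length - 1].message += " " + line.trim();
    }
  }
  return entries;
}

const sampleErrors = "Dalesor Reavers (32264): MOVE: Unit has insufficient movement\npoints; remaining moves queued.";
const parsedErrors = parseEntries(sampleErrors);
console.log(parsedErrors[0].unit, parsedErrors[0].id, parsedErrors[0].message);
```

A wrapped line that itself happens to look like `Name (id): ...` would be mistaken for a new entry, so this is only as reliable as that assumption.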

There you've got the basic portions of the text sorted into arrays.
Each index in the array should then correspond to the turn number,
if I understand this correctly and each block of text like this represents a turn in the game.
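
Another way to sort the basic portions, sketched here under some assumptions, is to cut the report at the empty rows and key each chunk by its header line. It assumes every section starts with a header ending in ":" and ends at an empty row, as in the sample; the name `splitSections` is made up for illustration.

```javascript
// Hypothetical helper: maps each section header to the text beneath it.
// Assumptions: sections are separated by an empty row, and the first line
// of each section is its header, ending in ":".
function splitSections(report) {
  const sections = {};
  const blocks = report.replace(/\r\n/g, "\n").split(/\n\s*\n/);
  for (const block of blocks) {
    const trimmed = block.trim();
    if (trimmed === "") continue;
    const newline = trimmed.indexOf("\n");
    const firstLine = (newline === -1 ? trimmed : trimmed.substring(0, newline)).trim();
    if (firstLine.charAt(firstLine.length - 1) !== ":") continue; // no header, skip this chunk
    const header = firstLine.substring(0, firstLine.length - 1);
    sections[header] = newline === -1 ? "" : trimmed.substring(newline + 1);
  }
  return sections;
}

const sampleReport =
  "Faction Status:\nTax Regions: 4 (24)\nTrade Regions: 6 (10)\nMages: 2 (2)\n\n" +
  "Errors during turn:\nDalesor Reavers (32264): MOVE: Unit has insufficient movement\n" +
  "points; remaining moves queued.\n\n" +
  "Events during turn:\nJoss (377): Claims $100.\n";
const reportSections = splitSections(sampleReport);
console.log(Object.keys(reportSections));
```

Each value can then be handed to whatever finer parsing that section needs. Chunks without a header line are skipped in this sketch, so a section whose body contains an empty row would lose everything after it.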

/TwoD

gunder
Nervous Wreck (II) Inmate

From:
Insane since: Apr 2005

posted 08-18-2005 18:44

Thank you for the suggestion, I think that will work pretty well for what I need. I'll play around and if I have any more specific questions I'll ask. Thanks for your help, I really appreciate it.

-gunder
