Closed Thread Icon

Topic awaiting preservation: Database Question (Page 1 of 1) Pages that link to <a href="https://ozoneasylum.com/backlink?for=24810" title="Pages that link to Topic awaiting preservation: Database Question (Page 1 of 1)" rel="nofollow" >Topic awaiting preservation: Database Question <span class="small">(Page 1 of 1)</span>\

 
hyperbole
Paranoid (IV) Inmate

From: Madison, Indiana, USA
Insane since: Aug 2000

posted posted 01-26-2005 18:11

Since we've been discussing databases a lot recently, I have another database question:

I have a database associated with a number of perl scripts used to collect data. There is a table called complete that the scripts use to store status information about the data as it's being collected.

I have been keeping track of the number of transactions per day for several months and noticed a small, but steady decline in the number of transactions per day.

Thinking about the structure of the database, I realized that I don't need to keep around all the information about the status of a record once all the information for that record has been collected. So I split the complete table into two tables. One is the collection_status table with all the flags and other information for each record. The other is completed.

I continue to store the collection status information for each record in the collection_status table while the information is being collected, but once all the data for a record has been collected, I delete the record from the collection_status table and move the ID of the record to the table completed where I only store the IDs. That way I can search the completed table to find out if a record has been completed before gathering its information again.

This one change has more than doubled the number of transaction per day the scripts can process. I made the change a couple of weeks ago, so I don't have enough data to determine if the processes will slow down as the size of the completed table grows.

My question is: After I created the new completed table and got rid of all the status information for the completed records, it occurred to me that I might have been able to get a corresponding increase in speed by simply adding an index to the ID column of the original complete table.

Does anyone know if this is true or would the extra data in each record have cause a slow down in the access time for the table?



.

-- not necessarily stoned... just beautiful.

InI
Maniac (V) Mad Scientist

From: Somewhere over the rainbow
Insane since: Mar 2001

posted posted 01-26-2005 19:22

The poster has demanded we remove all his contributions, less he takes legal action.
We have done so.
Now Tyberius Prime expects him to start complaining that we removed his 'free speech' since this message will replace all of his posts, past and future.
Don't follow his example - seek real life help first.

hyperbole
Paranoid (IV) Inmate

From: Madison, Indiana, USA
Insane since: Aug 2000

posted posted 01-27-2005 17:08

The application layer got a little more complicated because I now have to look in two places for records of completion (collection_status and completed) before I start a new record. That way I don't have more than one script working on gathering the same data at one time.

Other than that the application layer has stayed pretty much the same.

I think you are saying what I'm saying here: Before there was one large table with a lot of data per record. Now there is a small table (completion_status. not many records) with a lot of data per record; and a large table (completed, lots and lots of records) with very little data per record.

The database engine is MySQL.



.

-- not necessarily stoned... just beautiful.


(Edited by hyperbole on 01-27-2005 17:10)

hyperbole
Paranoid (IV) Inmate

From: Madison, Indiana, USA
Insane since: Aug 2000

posted posted 01-31-2005 17:11

I'm going to bump this thread beacuse I was hoping to get a reply from InI (or anyone else) after I answered the questions he asked about the structure of the db.



.

-- not necessarily stoned... just beautiful.


(Edited by hyperbole on 01-31-2005 17:12)

Tyberius Prime
Paranoid (IV) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted posted 01-31-2005 17:21

if you simply had added an index to your id column in completed table (and that was not the primary key in the first place),
you probably would have seen a similar improvment.
The databases are pretty good in organizing data so that it does not have to read all of it to answer queries on indices.

« BackwardsOnwards »

Show Forum Drop Down Menu