Closed Thread Icon

Topic awaiting preservation: Need help with SQL Pronto!!!!!! Please... (Page 1 of 1) Pages that link to <a href="https://ozoneasylum.com/backlink?for=29869" title="Pages that link to Topic awaiting preservation: Need help with SQL Pronto!!!!!! Please... (Page 1 of 1)" rel="nofollow" >Topic awaiting preservation: Need help with SQL Pronto!!!!!! Please... <span class="small">(Page 1 of 1)</span>\

 
paritycheck
Bipolar (III) Inmate

From: you tell me
Insane since: Mar 2004

posted posted 01-14-2008 12:27

Hi guys need help Pronto right here... I have ok two tables. One is basically an email repository and contains email messages - teh other table is an address book and well just contains email addresses.

Ok my email repo table has like 4 columsn called

|to|from|cc|bcc|

Now these columns contain email addresses either one or more separated by commas in the form.. email@email.com, email2@email.com, email3@email.com

However my address book has one column which contains just one email...

WHat I need to do is to figure a sql statement through which I can find out which email addresses in the emailrepos to, from, bcc, cc columns do NOT exist in the addressbook. In this case the tricky part where I'm stuck is that I'm comparing the existance of an element which on its own exists as a group of comma seperated elements against a table which has one element per column...

I need some help on this real quick guys... the address book in this case is HUMONGOUS so scratch the idea of trying to compare the other way around..this doesn't appear to be something that a simple nested query or a WHERE NOT IN statement woudl fix.. Any ideas guys

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted posted 01-14-2008 13:14

yeah. Don't do this in pure sql.

Instead use your favourite scripting language to seperate the commas into
real values, then do a huge 'not in' query .


Otherwise: That is a stupid, foobared database layout. The comma seperated thing belongs into text files,
not into a relational database schema. If you want to store e-mails in a relational db, it would have been a good idea
to store those 1:n fields in a seperate table (and if it must, encode a 'type' to have just one table for
to|from|cc|bcc instead of four). Then this would be actually supported and fast on your database engine...

argo navis
Paranoid (IV) Inmate

From: Switzerland
Insane since: Jul 2007

posted posted 01-14-2008 13:42

Have to disagree, having worked on large-large scale databases very recently : pre-fetch as much as possible inside the database engine itself.
(we had an SQL 2000 DB weighing some 500MB which I had to migrate to 2005 for the United Nations, and a terabyte-bound one
in another assignment for the UE police - NDA but it was about "product tracking" - coughcough)

You can use the "LIKE" sql clause to fetch a value inside a field - for one.

Google SQL + LIKE and get back here - can you use stored procedures and thus, compile them some?
Is it possible to disclose the type of database in use? Sybase and MS SQL are close "friends", Oracle is different, I know
the first ones best, but I've also been tweaking some Oracle stuff.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted posted 01-14-2008 14:11

I' assumed it to still be parity's roundcube project.

Either he has a gazillion users - in which he'd better normalize his data (model a real 1:n relationship with
a secondary table) - and then your 'prefetching' just means 'let the db engine do it's job'.,

or it's just a few users - in which case 'solve in a few minutes using php' beats 'normalize your database & rework
your code', beats 'learn to use mysql string operations to split it into a temporary table (and don't forget to delete the table...)',
and certainly beats using 'like %%' and a full scan of the database - and that would be the wrong way around for his query anyhow .


Parity, either do this (quick, dirty, and will require a certain amount of round tripping between your application and the database)

code:
$arrMails = query_db_fetch_array( "select header_to, header_from... from mails where Whatever");
$arrAddresses = [];
foreach ( $arrMails as $arrMail)
{ 
  $arrTo = split($arrMail['to'],','); //and pray for merce there never is a , in an e-mail address... the rfc is to large for me to exclude it.
  foreach ( $arrTo as $strAddress)
    $arraddresses[$strAddress] = true;
}
$arrAddresses = array_keys($arrAddresses);
$arrAddresses = array_map('mysql_escape_string',$arrAddresses);
$strAddresses = add quoting to each field..., and implode on (',')
$strQuery = "select address, count(address) from addressbook where address in ({$strAddresses})";
...
do something for those with count == 0



or rework your system to have a seperate table of (varchar address,foreign key email_id),
and then do a regular old join operation (which would be the best way to go about this if you have the time)

argo navis
Paranoid (IV) Inmate

From: Switzerland
Insane since: Jul 2007

posted posted 01-14-2008 14:43

What were Tao's words already? "Mellowest prowess mister Prime". [off topic]I'd certainly like to collaborate
with someone who differs from my views and complements them in many ways, let me know about that rfc ,)[/off topic]

paritycheck
Bipolar (III) Inmate

From: you tell me
Insane since: Mar 2004

posted posted 01-15-2008 12:06

Hmmm.... I had a feeling this would be kinda out of my SQL league. For now I've just done a quick and dirty solution by selecting ALL of the emails in the addressbook - upon closer inspection I noticed that the total number of records surmounted above a few thousand and emails were a few hundred in the list... but this again is just a quick fix to show something to my boss in the extremely limited time span he's given me ...

Right now I'm just selecting all emails in teh addressbook dumping them into an array and selecting ALL the emails in teh 4 columns of the messsage repo and dumping that into another array after passing them through umpteen explodes and trim functions and finally running an array_diff on them both to get the nonexisting emails....

It should work for now however if the db records top a few thousands - theeeeen I guess we'll be back to the old drawing board again..... but still I would be interested in finding out hwo this would be accomplished in a more... lets say .... conventional way ...

argo navis
Paranoid (IV) Inmate

From: Switzerland
Insane since: Jul 2007

posted posted 01-15-2008 12:54

I understood the original request incorrectly, but :

SELECT COUNT(*) FROM table_adressbook WHERE emailadress LIKE 'email_in_other_table'

If this returns 0, "email_in_other_table" is nowhere to be found inside the adress book.
Simple.

I recommend you do two things :
- check documentation about the DB engine in use (how to optimize it namely, using indexes optimizes the data files processing differently on different DB engines
and generally is key to optimisations up to 400-500% - this and preparing stored procedures that are precompiled inside the DB if applicable).
- learn the in out of SQL DML (Data Manipulation Language - subset of SQL)
- additionaly, you can grab a book about Merise or UML database modelling, or relational algebra (that one is a bit complex
for an approach like yours - it's great, but it can get tricky).

It should get you "in shape" for quick adaptation to many database technologies (although they all have their specialties -
the only ones which are "close" are Sybase and MS SQL in my experience, at some point they were the same product).

Fwiw.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted posted 01-15-2008 13:06

huh - why to arrays parity?

you have a (small) number of emails - extract their addresses. Then query against your addressbook with where address in (series,of,email,addresses,from,the,mails).
Since you're doing equality comparison, you don't need like and can use a string index on the addressbook table's address field - and you get off with two queries, both of
which can use an index.

Only drawback is that you can't do it totaly in the database, otherwise this will hold up quite a while in terms of performance ( trouble starts when you either query
so many emails that the in(...) is larger than the largest allowed query, or when your addressbook's so huge that the index doesn't help anymore... but that's
a far way off).

argo navis
Paranoid (IV) Inmate

From: Switzerland
Insane since: Jul 2007

posted posted 01-15-2008 14:10

I am not interested in any form of rivalry in my life - I am interested in trying to give options
when a question is asked - in maximizing options for me and others considered "of interest" to be specific, and that's all.

quote:

I would be interested in finding out hwo this would be accomplished in a more... lets say .... conventional way ...



TP's tip is good from a strictly technical perspective, but I am stroking the SQL way because of this very question :
conventional == (no recording with multiple entries separated by commas, that reflects a major DB design problem - email should be entities referred
to by some structure in the db as pointed out earlier).

But given that the db structure problem is there, the statement above, adapted to your SQL engine and db fields, is the "conventional way".
TP's alternative is a good workaround for the time being.

My best advice when trying to limit bugs is : do not let them occur in first place (as much as possible).

There are two approaches that are working, these days, as far as software design is concerned :
- Agile - Analyse, do, test, fiddle, adapt and start over again. Makes for flexibility.
- Try to get the plan covered as much as possible. Makes for robustness.

I have fostered both as "the best way" in the past and was wrong : different problems, different approaches.
But in DB design, and related application design, you'll spare yourself maintenance time by starting off on the right foot.

My 2 cents.

paritycheck
Bipolar (III) Inmate

From: you tell me
Insane since: Mar 2004

posted posted 01-15-2008 14:17

Well.. the thing is that since it is the same roundcubemail project... and thats why I'm actually trying to minimise if not avoid tampering with the roundcube db for fear of messing it somewhere else... apart from that I might give it a go on another version nwo that I've got tortoiseSVN up and running

argo navis
Paranoid (IV) Inmate

From: Switzerland
Insane since: Jul 2007

posted posted 01-15-2008 14:38

Completely understood - what you got here are "two different fishes", both crispy and tasty, a net, and the friggin' manual to go fishing next time around.

« BackwardsOnwards »

Show Forum Drop Down Menu