|
|
Author |
Thread |
Thumper
Paranoid (IV) Inmate
From: Deeetroit, MI. USA Insane since: Mar 2002
|
posted 05-05-2004 18:10
I'm off to discover how a search, wherewhich you specify a zip code or city and a distance radius, returns results that are within a given radius. Has anyone had to do something like this? And if so, where does one start? An example can be found on http://www.forsalebyowner.com . In this site, if you search for something on the home page, you are taken to a list of results where you can then fine tune the search with distance radius and sorting.
Any direction would be most appreciated. I've Googled for a while and will not quit just yet, but I figured a guru could zone me in faster.
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-06-2004 00:12
I've done it, don't have the time to go into detail right now, but basically what you need to do it right is a table with zipcodes and their basic x-y coordinates (taken from a known baseline).
Then you take the clients zipcode & grab the coordinates from the db and calculate the distance from the basline to get the centerpoint, then do the same with the results that match your other criteria and compare each one with the clients number. If the difference is within the radius, show the item.
That's about it.
The hard part is to get hold of the data to work with (zipcodes with corresponding coordinates), we did this locally in sweden and ended up buying the data from the government.
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
(Edited by DmS on 05-06-2004 00:16)
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-06-2004 01:48
I looked into this, but I couldn't find a way of doing it without some third party help. I had a bunch of links to some info, but, alas, don't have access to those links right now.
I *THINK* that this is one that I was looking at.
|
Thumper
Paranoid (IV) Inmate
From: Deeetroit, MI. USA Insane since: Mar 2002
|
posted 05-06-2004 03:17
Ah thanks guys...At least I now know what I'm up against! Seems like a bit more involved than I'd hoped, but it surely steers me in the right direction.
When I figure out the entire process from square one, I'll be sure to post it in the FAQ.
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-06-2004 10:04
I don't know what areas you need to search in but here's a link to what seems to be us-zipcodes as an sql-file for mysql: http://deanspace.org/download/zipcodes.mysql
This uses latitudes and longitudes so you need to calculate these correctly to get distances.
So, here's a link to a place describing these formulas http://www.meridianworlddata.com/HTML7/distances-from-city-to-city-2.asp?guid=B5DF02C4-82FA-4141-9136-55D911A840DB (this site seems like a goldmine of related info...)
If this won't get you going i don't know what will
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
Thumper
Paranoid (IV) Inmate
From: Deeetroit, MI. USA Insane since: Mar 2002
|
posted 05-07-2004 05:39
WOW DmS, thanks for the direction!!! You have uncovered a virtual goldmine indeed!!! Thanks!
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-09-2004 17:50
Yeah, that's some cool info. I started noodling around with it after reading this thread because I'd wanted to do something like this for another project a while back, but couldn't figure it out.
Distance is my little test page. It lets you choose Michigan cities with my county, and shows the distance between the two. I'm going to do the radius stuff next (showing zips within a certain radius distance), but I'm curious as to which is the best way to calculate this.
Do I take a given lat & long and then just run a calculation across the lat & long for each zip? That seems to be the most logical way. But can you do that DURING the select from MySQL, or do you need to read in all the zip info, then cycle through it with a while loop? In my example (and needs), I'm only dealing with about 50 zip codes, but that above linked database has 40,000+, so I'm looking at flexibility.
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-10-2004 10:33
That depends on the math functions in mysql.
When we did it we had east and north coordinates for each zip and did the colculation either in the query using something like this:
code:
select * from zip_codes as z where ( SQRT( POW(z.ost - ".$usersPositionX.", 2) + POW(z.nord - ".$usersPositionY.", 2) ) < ".$userRangeInMeters." ) ";
In order to get this to work you need to first retrieve the coordinates for the users zipCode and use that as input to this select (which is from memory, cannot test it now).
This we did in the sql as we searched ONLY IF the user had given his zip plus a range.
The other way is when you have the coordinates for two zip's and need to know what the distance is between them, that we did on demand.
Note, I don't think this will work with lat & long since we worked with positive x-coordinates in meters counted from the equator and y-coordinates counted from the central meridan (15°48'29".8 East Greenwich.)
Now, I don't understand half of this conversion from lat/long to coordinates but for those who are interested and know their math, this are links to how the swedish geo-institute did it (swedish only, but the math is there), basic info: http://www.lm.se/geodesi/refsys/rt/projektioner_i_rt.htm
formulas: http://www.lm.se/geodesi/kartprojektion/gauss/gauss_konforma_projection.pdf
Have fun
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
poi
Paranoid (IV) Inmate
From: France Insane since: Jun 2002
|
posted 05-10-2004 11:01
DmS: Your query will work independently of the sign of the coordinates since in the end you compute the relative ( to the position given by the user ) coordinate of the cities in the database, and then square its components to get a distance. BTW, to save a lot of useless calculation you can remove the SQRT statement and simply square the $userRangeInMeters value in the query.
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-11-2004 02:30
Well, what I'd like to do is have a user pick their zip code from a <select>, and be able to show other zip codes within x miles of it. All of the zip and their related lat & long are stored in the db. So I just need to do a calulculation to show which are with a certain radius. The radius distance will vary.
So, back to my original question. While the number of zip codes I'll be dealing with is fairly small, that could change. So - can I use a select that does the calculation on the fly?
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-11-2004 09:59
Thanx poi!
Pugzly, That's what we did, we took a users zip + a range and calculated each zip on the fly to determine if the ad was inside or outside the raduis.
It worked for us and we used appr 9000 zips, however with more zips than that and some traffic I'd expect a slowdown on performance.
However, what I'd do today if I was working with 40000 + zips & wanting to do it on the fly, is probably to build a persistant cache in the db.
For each request from a zip & a distance, check if you have it stored, if not query & calculate on the fly and store it in the cache for next time.
This will work as long as you only need to show zipcodes within a radius since the zip's are relatively static.
One plus with this is that you allow the system to build the cache where it's needed, on those combinations that are in demand, no power is wasted on those combos no one asks for. It'll be a bit slow the first run for each combo, but after that you are just polling the cache with "SELECT * from zip_cache where range = '20' AND userZip is '90210' ".
Of course you could create this cache in advance, but you will end up with millions and millions of rows in the db if you pre-calculate all combinations no matter if you need them or not.
You might need another solution if you are connecting data that changes a lot with the zips though, then you risk having the data in the cache growing stale quickly.
The other way, to poll 40000+ rows on each query to calculate it in a while loop, don't really like that alternative either, just as bad as computing the same thing over and over again.
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
poi
Paranoid (IV) Inmate
From: France Insane since: Jun 2002
|
posted 05-11-2004 12:16
Precomputing the distance between all the cities may sound like a good idea at first, but you may consider the HUGE size of the datas. Let's do this for 40.000 cities. We'll need to compute only half of the distances to avoid to store some useless datas.
40.000 x 39.999 / 2 distances = 799.980.000 distances to store
So if in a project you need to to compute the distance of a city to a all the other cities, one thing to consider is to arrange the cities in some circles of say ~500 Km. A city can be in several circles, but that way it's extremely fast to reject the circles and thus hundreds/thousands of cities if they belong to a circle that is too far from the circle of the city given by the user. At least, that's what I'd do
Hope that helps
(Edited by poi on 05-11-2004 12:23)
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-11-2004 15:47
poi, the humongous amount of rows that all possible combos would generate is what turned me to dynamically buildning a cache to precalculate what actually is used.
Your suggestion is smart though!
As long as the end user isn't allowed to input his/her own distance, selected precalculation would be a good solution, it will require you to recalculate if you add a new radius to the options in the form though.
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
MajorFracas
Nervous Wreck (II) Inmate
From: Insane since: Jul 2003
|
posted 05-11-2004 15:53
What's the real performance hit? Running a query against such a large database or performing the calculation in the query?
The appropriate index would help. Obviously, you can't have an index based on the user's current position, but indexing by lat/long could help. (Maybe?)
If the math is the hold up, you could approximate by simply subtracting lat/long from user's current position and make sure each difference is less than radius. Then you could work over this smaller result set and calculate the actual distance using the a*a + b*b < c*c method. (More than 75% of the hits within the 'box' will also be within the 'circle'.)
|
poi
Paranoid (IV) Inmate
From: France Insane since: Jun 2002
|
posted 05-11-2004 16:00
DmS: My idea is to make a sort of hash table to reject thousands of cities of the final ( and exact ) distance calculation.
First, you need to find in which circle(s) belong the city the user entered, then find which circles are in the radius specified by the user, finally compute the distance of the cities in those circles and reject those those who are too far, and voilà. It's damn easy, avoids thousands of useless computations and require a rather small amount of pre-calculations ( I suppose having 128 to 256 circles should be fairly enough )
MajorFracas: Indeed, rejecting the cities outside of the radius box is also a neat idea.
(Edited by poi on 05-11-2004 16:06)
|
MajorFracas
Nervous Wreck (II) Inmate
From: Insane since: Jul 2003
|
posted 05-11-2004 19:07
I had another thought regarding this, similar to your multiple circles, poi. What if you precalculated the distance of each point in the database to 3 known points. Then you could sort of 'triangulate'.
A point that is a candidate for inclusion may be found by comparing the differences in distance to the known points (as compared to the user's location).
Ok. Sounds complicated here, but follow me: Assume our known points are 1, 2 & 3 and the user is at point A and the distance between point A and our fixed points are represented by A1, A2 and A3. A point which we would like to consider, say B, with distances to the known points represented as B1, B2 and B3 may be within the radius R if it satisfies the following:
code:
abs(A1-B1) <= R && abs(A2-B2) <= R && abs(A2-B2) <= R
(where abs() is the absolute value function...)
Conceptually, the B points will be contained by the intersection of 3 arc segments (not technically correct terminology) with A at the center.
If our known points 1, 2 and 3 are well-dispersed in relation to point A and if A is near the center of the triangle formed by 1, 2 and 3, then the area containing the B points will approximate a hexagon. Which may not be a bad approximation of the circle with center A and radius R. As point A gets close to the edge of triangle 123 or even outside it, the 'hexagon' containing the B points will degenerate and look less like a circle.
Of course, you could index the database based on the distances to the known points which would improve the search performance (as nearby points will be near each other according to the order of the index).
Just a thought anyway.
|
poi
Paranoid (IV) Inmate
From: France Insane since: Jun 2002
|
posted 05-11-2004 19:17
And another good idead by MajorFracas
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-11-2004 23:13
Some great info & lot's of good tips here!
I whish I had this info when we did that project 6 months ago...
It would have made my life a LOT easier.
Someone make sure this is archived and/or into the faq when it's done if I forget
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-12-2004 01:53
Ok, this is making my head hurt.
In my scenario, we're dealing with sex offenders. A user enters their zip, and is shown a list of registered sex offenders in their zip (no problems so far). But what I also want to do is show how many are in zip codes within a certain radius. In Michigan, there is a little over 1000 zip codes (if I recall - I'm too lazy to go look). I'm focusing more on the zips in my county (~40), but want the possibility to expand to the state level.
That being the case, the dynamic caching idea is somewhat appealing, although I've never done that before. Can someone recommend a good source of info on doing that?
Also, if I do that, is the theory that I could do the select by first checking for an existing entry in the cache, and, if not found, do a select with the mathematical calculation inside, or should I read in all the zips and do a while loop? A while loop would be easy, but that's not what I'm after. I'm thinking performance.
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-12-2004 11:24
Pugz, if you are looking at around 1000 zip's and don't expect high traffic, do the calculations directly in the sql.
I think that would be both easiest and good enough performance wise.
Should you go the cached-way, then I'd go for calling one function that checks the cache, if nothing is found it retrieves the data with the calculations, loads the cache and returns the result. that way you'll always get it from the cache if it exists.
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-12-2004 13:34
Great - that's what I was thinking.....
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-13-2004 11:16
Just thought I'd note that this is now added to the FAQ: http://www.ozoneasylum.com/21781 together with an example page in PHP.
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-13-2004 14:02
DmS
Thanks for the example code. I'm trying to resolve an error I'm getting on line 44
Warning: Supplied argument is not a valid MySQL result resource in /path/to/radius.php on line 44
Line 44 reads
while($item = mysql_fetch_array($result)){
I'll let you know what I find...
|
DmS
Paranoid (IV) Inmate
From: Sthlm, Sweden Insane since: Oct 2000
|
posted 05-13-2004 20:42
Without looking I'd guess it's probably a nonexisting zipcode, I had a couple of those while writing it.
There are no errorchecking/validation whatsoever in there so you'll have to take care of that.
/Dan
{cell 260}
-{ a vibration is a movement that doesn't know which way to go }-
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-13-2004 21:14
Not a problem. I just tossed that example up real quick before I was out the door....
I'll take a closer look.
|
Pugzly
Paranoid (IV) Inmate
From: 127.0.0.1 Insane since: Apr 2000
|
posted 05-13-2004 21:31
Looks like it had something to do with the connection method....
I came up with http://runningwolf.com/dev/radius4.php after adding some error checking and fixing a couple of typos. This checks the full list of 40,000+ zips, so you guys can beat on this a little.....
|