Topic awaiting preservation: PHP - determining if two .jpg files are the same? |
|
---|---|
Author | Thread |
Paranoid (IV) Inmate From: 127.0.0.1 |
posted 10-31-2005 16:40
I'm doing some screen scraping, and part of the data is a .jpg for each record. If there is no photo for a given record, the source gives a generic "photo not available" image. The file name format matches those that are valid, so I can't verify just by filename. File sizes vary amongst the records, so I'm reluctant to look for each .jpg that matches the generic file just by file size. |
Paranoid (IV) Inmate From: France |
posted 10-31-2005 17:21 |
Maniac (V) Mad Scientist From: 100101010011 <-- right about here |
posted 10-31-2005 18:33
MD5 is the usual tool for this (specifically md5_file), though hashing can be slow depending on the size of the images. |
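As a rough illustration of the md5_file approach mentioned above, here is a minimal sketch. The function name `filesHaveSameContent` is hypothetical, and it assumes both images are already on the local disk. Note that it only detects byte-identical files; two JPEGs that look the same but were encoded differently would not match.

```php
<?php
// Hypothetical helper (not from the thread): returns true when both
// files exist and have byte-identical content according to MD5.
function filesHaveSameContent(string $pathA, string $pathB): bool
{
    if (!is_file($pathA) || !is_file($pathB)) {
        return false;
    }

    // Cheap early exit: files of different sizes cannot be identical.
    if (filesize($pathA) !== filesize($pathB)) {
        return false;
    }

    $hashA = md5_file($pathA);
    $hashB = md5_file($pathB);

    // md5_file() returns false on failure, so guard against that
    // before comparing the two hashes.
    return $hashA !== false && $hashA === $hashB;
}
```

The size check is only there to skip the hashing step early; it does not replace the hash comparison.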
Paranoid (IV) Inmate From: 127.0.0.1 |
posted 11-01-2005 18:42 |
Paranoid (IV) Inmate From: France |
posted 11-01-2005 18:48
Yep. Use GD to compare the color of each pixel of the two images. Before that, resize both images to an arbitrary small size to significantly reduce the number of pixels to compare. |
Paranoid (IV) Inmate From: 127.0.0.1 |
posted 11-01-2005 20:00 |
Maniac (V) Mad Scientist From: 100101010011 <-- right about here |
posted 11-01-2005 21:09
poi's suggestion actually examines, pixel by pixel, whether the two images are identical, so if you know the height and width you'd loop through using something like so |
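The loop from this post is not preserved here. A minimal sketch of what a GD pixel comparison could look like follows, assuming the GD extension is loaded and both images are truecolor (as images returned by `imagecreatefromjpeg` are). The function names `gdImagesIdentical` and `gdThumbnail` are hypothetical and not taken from the thread.

```php
<?php
// Hypothetical helper: compares two GD images pixel by pixel.
// For truecolor images imagecolorat() returns the packed RGB value,
// so comparing the integers compares the colors.
function gdImagesIdentical($imgA, $imgB): bool
{
    $width  = imagesx($imgA);
    $height = imagesy($imgA);

    // Different dimensions means the images cannot be identical.
    if ($width !== imagesx($imgB) || $height !== imagesy($imgB)) {
        return false;
    }

    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            if (imagecolorat($imgA, $x, $y) !== imagecolorat($imgB, $x, $y)) {
                return false;
            }
        }
    }

    return true;
}

// Hypothetical helper: shrinks an image to a small fixed-size thumbnail
// so that far fewer pixels need to be compared afterwards.
function gdThumbnail($img, int $size = 16)
{
    $thumb = imagecreatetruecolor($size, $size);
    imagecopyresampled(
        $thumb, $img,
        0, 0, 0, 0,
        $size, $size,
        imagesx($img), imagesy($img)
    );
    return $thumb;
}

// Possible usage (file names are placeholders):
// $a = gdThumbnail(imagecreatefromjpeg('record.jpg'));
// $b = gdThumbnail(imagecreatefromjpeg('no-photo.jpg'));
// $same = gdImagesIdentical($a, $b);
```

Comparing thumbnails trades accuracy for speed: two different photos could, in principle, shrink to the same thumbnail, so a strict equality test on the full-size images is the safer check when speed is not a concern.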
Paranoid (IV) Inmate From: 127.0.0.1 |
posted 11-01-2005 22:51 |
Paranoid (IV) Inmate From: France |
posted 11-01-2005 23:11
That's it. |
Paranoid (IV) Inmate From: 127.0.0.1 |
posted 11-02-2005 06:56 |
Paranoid (IV) Inmate From: France |
posted 11-02-2005 09:23 |
Paranoid (IV) Inmate From: 127.0.0.1 |
posted 11-02-2005 22:23
No - it's actually turned out to be fairly simple. I found one record with the 'no photo available' image, and got the MD5 hash on it. I then just did a SQL query in my db, and cycled through the records, cURLing the photo, running an MD5 hash on it, and deleting it if it matched (since I can't MD5 them remotely, I have to get them first). |
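The hash-and-delete step described above might be sketched as follows. This assumes the photos have already been downloaded to local paths (the SQL query and cURL fetch are left out), and the function name `deleteFilesMatchingHash` is hypothetical.

```php
<?php
// Hypothetical helper: deletes every file whose MD5 hash equals the
// known hash of the 'no photo available' image, and returns the list
// of paths that were actually deleted.
function deleteFilesMatchingHash(array $paths, string $knownHash): array
{
    $deleted = [];

    foreach ($paths as $path) {
        // Skip anything that is not a regular file.
        if (!is_file($path)) {
            continue;
        }

        // Only record the path if the hash matches and unlink() succeeds.
        if (md5_file($path) === $knownHash && unlink($path)) {
            $deleted[] = $path;
        }
    }

    return $deleted;
}

// Possible usage (paths are placeholders):
// $placeholderHash = md5_file('no-photo.jpg');
// $removed = deleteFilesMatchingHash(glob('photos/*.jpg'), $placeholderHash);
```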