[pHash-support] Getting different distance results locally than the demo site

Clint Fenton clint at originproductions.com
Tue Aug 30 15:05:50 PDT 2011


Hi,

We use pHash in an application to help identify duplicate images and to 
help group similar images.  We access the library via the PHP bindings 
included with the distribution and use the DCT hashing algorithm. 
Overall, it works quite well.

There are, however, rare instances where our install returns radically
different distance results from the pHash demo site.  In all of these
cases, the demo site's result looks correct and our local install's
does not.

Here are two sample images that show the problem:

url: 
http://thislife.s3.amazonaws.com/4e447cb7-1b70-4ef6-abc4-02cf7f000001.jpg
our phash: 10045688968694716518

url: 
http://thislife.s3.amazonaws.com/4e4481ed-80d0-41a5-b02b-02c47f000001.jpg
our phash: 11049976301167908966

our system says distance = 12
phash demo site says distance = 32 (this seems right to me)

Because I can't see what pHash values the demo site computes, I am not
sure whether we are getting incorrect pHash data at hashing time or
whether the distance calculation is coming up wrong.  In either case, I
don't have any idea how to start troubleshooting.
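
As a first sanity check, the distance can be recomputed directly from
the two hash values quoted above, outside of pHash.  The Python sketch
below is only an illustration and is not part of pHash or its PHP
bindings.  It assumes the DCT hash is an unsigned 64-bit value and that
the distance is the Hamming distance (the number of differing bits);
the function name hamming_distance is made up for this example.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two 64-bit hash values.

    Assumption: both hashes are unsigned 64-bit integers, as the
    decimal values quoted in this message suggest.
    """
    mask = (1 << 64) - 1  # keep only the low 64 bits
    return bin((hash_a ^ hash_b) & mask).count("1")


# The two hash values reported by our local install.
hash_1 = 10045688968694716518
hash_2 = 11049976301167908966

distance = hamming_distance(hash_1, hash_2)
print(distance)  # prints 12
```

This gives 12 for the two hashes above, the same value our system
reports.  If the demo site uses the same distance measure, that would
suggest the distance step itself is consistent and that the hash values
are where the two installs differ.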

I just updated all of the packages that pHash uses, rebuilt pHash itself
using the latest release 0.9.4, and rebuilt the PHP module using the
bindings that come with the latest distribution.  I restarted the whole
system with high hopes that this problem would be solved, but it wasn't.

Anyone have any bright ideas?

Any responses are much appreciated,

Clint


