Since I started taking photos 15 years ago, I’ve amassed a pile of images: 25,000 or so photos, sourced from various cameras and phones, plus a handful that were scanned and imported. They have been managed manually, in various flavors of iPhoto, the occasional random application, and, for the past few years, in Aperture. Copies of the photos have lived on various computers and have even been recovered from backups after a rare hard drive death (multi-point backup strategy FTW!).
25,000 images, of which somewhere between 5% and 20% are duplicates. Many of them are straight-up duplicates: copies of the same image with different filenames, the result of merging various libraries or of importing source media to multiple destinations that were later merged. Some are more insidious. Somewhere, something decided to down-res a slew of images and re-import them. Somewhere else, something decided to re-encode all my JPEG images (from before I shot RAW) at the same resolution, but with much higher compression.
Slogging through all those images would take hours. Days, really, as it’d have to be done in my spare time. And given that it’s a task I’ve been avoiding (digging the hole deeper) for more than a decade, it clearly isn’t going to happen soon.
Surely, there has to be a better way. And there is!
A quick search yielded PhotoSweeper.
On the first pass, it quickly eliminated all of the straight-up duplicates, the ones whose contents were identical. This took less than ten minutes on those 25,000 images, and it eliminated nearly 4,000 dupes (and triplets, and the occasional quad).
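PhotoSweeper doesn’t say how its exact-match pass works under the hood, so treat this as a rough sketch of one common approach rather than the app’s actual method: bucket files by size, hash only the files that share a size (SHA-256 here), and report any hash shared by two or more files. The function names and the size-bucketing shortcut are my own illustrative choices.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents hash identically.

    Files are first bucketed by size, so only same-size files get hashed.
    Only groups with two or more members (pairs, triplets, quads...) are returned.
    """
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size can't have an exact twin
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Pointed at a library folder (for example `find_exact_duplicates(Path("~/Pictures").expanduser())`, a hypothetical path), it returns the duplicate groups; deciding which copy in each group to keep is still up to you. Note that this only catches byte-identical files, which is why the second pass matters.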
The second pass is where this software really shines. I configured it to do a content comparison and flag any sets of images that were very similar, but not necessarily identical. In this case, I used the “approximate, align and blur” method: PhotoSweeper re-renders each image as a 144×144 grid of pixels, blurs it slightly, and aligns the edges. The resulting icon-ized images are compared, and any that are similar enough are flagged as potential dupes. From there it is a matter of review-and-compare: the arrow keys navigate between images, and the return key toggles whether or not the image will be trashed.
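For the curious, here is a minimal sketch of the general idea (shrink, blur, compare), not PhotoSweeper’s actual algorithm. It assumes grayscale images represented as rows of 0–255 values. The 144×144 grid comes from the description above; the 3×3 box blur, the mean-absolute-difference metric, and the threshold of 8.0 are hypothetical choices of mine, and the edge-alignment step is omitted entirely.

```python
FINGERPRINT_SIZE = 144      # grid size from the description above
SIMILARITY_THRESHOLD = 8.0  # hypothetical cutoff, on a 0-255 scale

Image = list[list[float]]   # grayscale pixel rows, values 0-255


def downsample(img: Image, size: int = FINGERPRINT_SIZE) -> Image:
    """Shrink an image to size x size by averaging each block of source pixels."""
    h, w = len(img), len(img[0])
    out: Image = []
    for i in range(size):
        r0 = i * h // size
        r1 = max((i + 1) * h // size, r0 + 1)  # always include at least one row
        row = []
        for j in range(size):
            c0 = j * w // size
            c1 = max((j + 1) * w // size, c0 + 1)  # and at least one column
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def box_blur(img: Image) -> Image:
    """Apply a 3x3 mean filter; the window shrinks at the borders."""
    n, m = len(img), len(img[0])
    out: Image = []
    for i in range(n):
        row = []
        for j in range(m):
            vals = [img[r][c]
                    for r in range(max(i - 1, 0), min(i + 2, n))
                    for c in range(max(j - 1, 0), min(j + 2, m))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def fingerprint(img: Image) -> Image:
    """Reduce an image to a small, slightly blurred thumbnail for comparison."""
    return box_blur(downsample(img))


def distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two images' fingerprints."""
    fa, fb = fingerprint(a), fingerprint(b)
    diffs = [abs(x - y) for ra, rb in zip(fa, fb) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def is_similar(a: Image, b: Image, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Flag a pair as potential dupes when their fingerprints are close enough."""
    return distance(a, b) <= threshold
```

Because both images are reduced to the same small grid before comparing, a down-res’d or heavily recompressed copy lands close to its original, while genuinely different photos do not. That is exactly the kind of insidious duplicate the first pass can’t see.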
One click and all the identified dupes are dumped in the trash.
What would have taken days of tedium was reduced to less than an hour. Personally? I would have paid $50 (nay, $100) for this and considered it a bargain. It saved me that much time (frankly, it quickly finished a task I’d been putting off for a decade), and now my remaining organization task is largely one of actually looking at, tagging, and categorizing the photos.
And sharing them with my family. Because that’s what it is all about (for me).