Amazon's Top Rated Items
I developed a set of scripts that scrape Amazon’s product pages. The result of scraping and processing nearly 300,000 products from 26 departments is my own Amazon Top 400 List.
I struggled to come up with a good way to calculate a score for each product based on the number of reviews and average rating given. I wanted this “score” to be a true measure of how good the product was and not to be skewed by products with few reviews available.
Looking at how IMDb does it, I learned about something called a Bayesian Average or a “true Bayesian estimate”. The idea is to estimate the mean using some measure of the larger population in addition to the item's own data, so that an item with only a handful of ratings gets pulled toward the overall average. It’s described very eloquently here.
Here’s my calculation:
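Written out, and assuming it follows the same form as IMDb's weighted rating, the score looks like this:

```latex
\text{score} = \frac{v}{v + m}\, R + \frac{m}{v + m}\, P
```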
Where P is the mean rating across all products, m is the minimum number of reviews to be considered, R is the mean rating of the product and v is the total reviews for the product.
To account for products with very different numbers of reviews, the calculation factors the average rating of all Amazon products (I found it to be 3.53) into each product’s “score”. Based on the outliers I saw in the data, I decided to make 50 reviews the cutoff for being considered for the Top 400 list. It worked great: all of my top 400 items seemed to be highly qualified.
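As a rough sketch, the scoring and the cutoff fit in a few lines of Python. This assumes the IMDb-style weighted form of the Bayesian average. The names `Product`, `bayesian_score` and `top_products`, along with the sample products, are hypothetical and are not the scripts used to build the list.

```python
from typing import List, NamedTuple

GLOBAL_MEAN = 3.53  # P: mean rating across all products, as reported above
MIN_REVIEWS = 50    # m: review cutoff used for the Top 400 list


class Product(NamedTuple):
    # Hypothetical record layout, for illustration only.
    name: str
    mean_rating: float  # R: mean rating of this product
    review_count: int   # v: total reviews for this product


def bayesian_score(mean_rating: float, review_count: int,
                   prior_mean: float = GLOBAL_MEAN,
                   min_reviews: int = MIN_REVIEWS) -> float:
    """Blend the product's own mean with the global mean, weighted by review count."""
    v, m = review_count, min_reviews
    return (v / (v + m)) * mean_rating + (m / (v + m)) * prior_mean


def top_products(products: List[Product], limit: int = 400,
                 min_reviews: int = MIN_REVIEWS) -> List[Product]:
    """Drop products under the review cutoff, then rank the rest by score."""
    eligible = [p for p in products if p.review_count >= min_reviews]
    eligible.sort(
        key=lambda p: bayesian_score(p.mean_rating, p.review_count,
                                     min_reviews=min_reviews),
        reverse=True,
    )
    return eligible[:limit]


# Made-up sample data: a perfect rating with 3 reviews never makes the list.
sample = [
    Product("A", 5.0, 3),
    Product("B", 4.8, 2000),
    Product("C", 4.9, 60),
]
ranked = top_products(sample)
```

With this weighting, a product with few reviews stays close to 3.53, while one with thousands of reviews scores close to its own mean rating. That is why "B" outranks "C" in the sample despite a slightly lower raw average.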
The number one product is a Universal Pistol Loader/Unloader. 1416 out of its 1522 customer reviews are 5 stars! Strangely, there are lots of other firearm-related items in the Top 400 like this one, this and this. I have no idea why this is the case.
There are TONS of cables that made the Top 400 list. Mostly HDMI, but also Ethernet, USB, RCA and others. I’m not sure what compels people to review cables so often and so favorably. I guess if it works, 5 stars.
I was happy to see that many of the classic products I’ve seen in other Top Amazon lists made my list too:
Also, there were a few rather unique items on the list: