Sunday, February 15, 2009

Making User Ratings Add Up

Many Web sites let users rate stuff: products, content, even people. For example, based on user ratings, you might see that Product A has three stars and Product B has four. Yet behind such ratings there is always a methodology. And as we’ve seen with election systems, depending on how you count, the same set of user preferences can lead to different outcomes.

Here is an example about rating products from blogger Evan Miller’s How Not to Sort by Average Rating:

Average rating works fine if you always have a ton of ratings, but suppose item 1 has 2 positive ratings and 0 negative ratings. Suppose item 2 has 100 positive ratings and 1 negative rating. This algorithm puts item two (tons of positive ratings) below item one (very few positive ratings).
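
To make the failure concrete, here is a minimal sketch (in Python, using the counts from Miller's example) of sorting by plain average rating; the barely rated item comes out on top:

    # Sort by plain average rating (positive / total), per Miller's example.
    items = [
        {"name": "item 1", "positive": 2, "negative": 0},
        {"name": "item 2", "positive": 100, "negative": 1},
    ]

    def average_rating(item):
        total = item["positive"] + item["negative"]
        return item["positive"] / total

    # item 1 scores 1.000, item 2 scores ~0.990, so item 1 ranks first.
    for item in sorted(items, key=average_rating, reverse=True):
        print(item["name"], round(average_rating(item), 3))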

Miller then shows a screen capture from Amazon.com. The first product has a single rating, which happens to be five stars. That product is ranked ahead of a product with 4.5 stars across 580 ratings. If you were evaluating the two products, and that was all the information you had, which would you suspect is better?

As an alternative, Miller suggests using a statistical technique that factors in the number of ratings as well as their magnitude. I’d prefer that technique, or something like it. However, it has a cost: explaining it to users is a lot harder than explaining a basic average.
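
For reference, the technique Miller proposes is to rank items by the lower bound of the Wilson score confidence interval for the fraction of positive ratings. Here is a minimal sketch in Python, assuming each rating has already been reduced to positive or negative:

    import math

    def wilson_lower_bound(positive, total, z=1.96):
        # Lower bound of the Wilson score interval for the true positive fraction;
        # z = 1.96 corresponds to 95% confidence.
        if total == 0:
            return 0.0
        phat = positive / total
        center = phat + z * z / (2 * total)
        margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
        return (center - margin) / (1 + z * z / total)

    # Miller's example: the heavily reviewed item now outranks the barely reviewed one.
    print(wilson_lower_bound(2, 2))      # ~0.34
    print(wilson_lower_bound(100, 101))  # ~0.95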

In Amazon.com’s case, a user can see each product’s individual user reviews and ratings, so making it obvious how those individual ratings roll up into the overall rating has value. Indeed, Amazon.com’s solution is to show the overall rating in stars, with the number of ratings in parentheses next to it. Thus, the user can weigh both the number and the magnitude of each product’s ratings. This approach puts more responsibility on the user, but it keeps the situation easily understandable.

At the end of the day, Amazon.com’s solution may be best for its users, because displaying the two numbers together reveals the key weakness of the system when it occurs, inviting users to compensate as they see fit. In contrast, there are numerous statistical methods, of which Miller proposed one, that could improve the rankings if only a single aggregate rating is desired. The problem is that different methods will lead to different rankings under some conditions, and only a small number of specialists would understand why.
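
To illustrate that point with hypothetical rating counts, here is a sketch in Python of two reasonable aggregation methods, a Bayesian-style average that adds one phantom positive and one phantom negative rating, and the Wilson lower bound shown above, disagreeing about which of two items should rank first:

    import math

    def bayesian_average(positive, total):
        # Shrink toward 50% by adding one phantom positive and one phantom negative rating.
        return (positive + 1) / (total + 2)

    def wilson_lower_bound(positive, total, z=1.96):
        phat = positive / total
        center = phat + z * z / (2 * total)
        margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
        return (center - margin) / (1 + z * z / total)

    # Hypothetical items: A has 90 positive of 100 ratings, B has 9 positive of 9.
    print(bayesian_average(90, 100), bayesian_average(9, 9))      # ~0.89 vs ~0.91 -> B first
    print(wilson_lower_bound(90, 100), wilson_lower_bound(9, 9))  # ~0.83 vs ~0.70 -> A first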

The larger point is, aggregated ratings tend to imply objectivity that is not fully there. While aggregating many people’s ratings will lead to a more objective assessment than a single person’s rating, the process of aggregation has its own subjectivity. In other words, we see once again that the voice of the people is subject to which amplifier you use.

2 comments:

  1. Steve,
    I find it funny that after many years we are both still interested in the same retail web issues.
    At our product review portal (Buzzillions.com) we have chosen to publish a discrete ranking system, but we also show the actual star average and review counts if the user wants to digest them.
    The piece we have decided to add addresses the fact that typical counts and averages vary by product category. There are many more people reviewing digital cameras than patio furniture. I won't go into all the details in a comments section, but we push products with a very low number of ratings toward the middle of a ranking list (neither highly nor lowly ranked) so that we give recommendations we are more confident in. This may not serve the early adopter as well, but if a consumer is more worried about making a reasonable choice than the perfect choice, it works very well.
    Thanks,
    Joshua

  2. Sounds like a reasonable approach, Joshua. For the average consumer, the cost of a bad choice is much greater than the incremental benefit of a perfect choice over a solidly good choice.
