Bayes's theorem, estimators and the theory of Amazon ratings

If you've ever bought something on Amazon, or looked at player ratings in an online game, you've often had to weigh the average rating against the number of ratings. A higher average rating is surely a good thing, but would you really prefer one 5.0 rating to a hundred 4.5 ratings?

The way to think about a question like this is to ask: what are we actually trying to maximise here? What are we trying to achieve?

We're trying to maximise the rating we would give the item ourselves, or rather, the expected value of that rating.

So we should think of the rating as a random variable with some distribution, whose true mean (and not the sample mean) is what we seek to estimate. It was not immediately obvious that the problem would be a statistical one, but it is now.

So, why don't we just use the sample mean to estimate the true mean, and trust the "unreliable" 5.0 rating over the "reliable" 4.5 rating? Three reasons:
  • Items usually don't have a true 5.0 rating, and just one rating is unlikely to change our opinion of this.
  • We're not really trying to maximise the expected value, e.g. we may be concerned about risk (i.e. a greater variance in the distribution of the true value of the rating is itself a negative, because a true 1.0 rating may mean the application is dangerous or contains viruses)
  • Having fewer ratings may itself be an indicator of lower quality, because it's less popular.
In any case, the key phrase is "belief distribution". The first point says that we start with a prior belief distribution, and each rating updates it. The second point says that we need to know what the entire posterior distribution looks like, rather than just a point estimate.

Well, obviously, the solution is Bayesian statistics (link to the main Bayes's theorem article). If you have a prior distribution on the true mean, and a likelihood function for the observations, you can compute the posterior distribution via Bayes's theorem.
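To make this concrete, here is a minimal sketch in Python. The model and the function name are assumptions for illustration, not something the post specifies: each rating is taken to be an integer from 1 to 5 drawn from unknown star probabilities p, with a uniform Dirichlet(1, 1, 1, 1, 1) prior on p. The posterior after observing the star counts is then Dirichlet(prior + counts), and the posterior mean and variance of the true mean rating have closed forms. The uniform prior is only a placeholder; a realistic one would encode the belief that items rarely have a true 5.0 rating.

```python
import math
from fractions import Fraction

STARS = (1, 2, 3, 4, 5)


def posterior_mean_and_var(counts, prior=(1, 1, 1, 1, 1)):
    """Posterior mean and variance of the true mean rating (hypothetical helper).

    Assumes a categorical model over 1..5 stars with a Dirichlet(prior)
    prior on the star probabilities, so the posterior is
    Dirichlet(prior + counts).
    """
    alpha = [a + c for a, c in zip(prior, counts)]
    total = sum(alpha)
    probs = [Fraction(a, total) for a in alpha]              # E[p_k]
    mean = sum(k * p for k, p in zip(STARS, probs))          # E[sum_k k * p_k]
    second = sum(k * k * p for k, p in zip(STARS, probs))
    # For a Dirichlet, Var(sum_k k * p_k) = (sum_k k^2 mu_k - mean^2) / (total + 1)
    var = (second - mean ** 2) / (total + 1)
    return mean, var


if __name__ == "__main__":
    # One 5-star rating vs. a hypothetical 50 four-star + 50 five-star ratings (average 4.5).
    cases = [("one 5.0 rating", (0, 0, 0, 0, 1)),
             ("hundred ratings averaging 4.5", (0, 0, 0, 50, 50))]
    for label, counts in cases:
        mean, var = posterior_mean_and_var(counts)
        print(f"{label}: posterior mean {float(mean):.3f}, sd {math.sqrt(var):.3f}")
```

Under these assumptions, the single 5-star rating gives a posterior mean of 10/3 ≈ 3.33 with a standard deviation of about 0.56, while the hundred ratings give 31/7 ≈ 4.43 with a standard deviation of about 0.06, so the hundred 4.5 ratings come out ahead on both expected value and risk.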

Exercise: In the case of just two possible ratings (upvote/downvote, or like/dislike), compute the "correct" rating (say, the mean of the posterior) given a uniform prior, and derive the so-called Laplace's rule of succession.
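If you want to check your answer without spoiling the derivation, here is a small numerical sketch in Python (the function name is hypothetical). It assumes a uniform prior on the upvote probability θ in [0, 1], multiplies it by the likelihood θ^s (1 − θ)^f for s upvotes and f downvotes, and approximates the posterior on a grid of midpoints. The posterior mean should agree with Laplace's rule, (s + 1)/(n + 2) with n = s + f, while the posterior mode lands on the plain sample proportion s/n, which is why the mean rather than the mode is the right summary here.

```python
def posterior_mean_and_mode(successes, failures, grid_size=100_000):
    """Grid approximation of the posterior of theta under a uniform prior."""
    # Midpoints of grid_size equal-width cells covering [0, 1]
    thetas = [(i + 0.5) / grid_size for i in range(grid_size)]
    # Unnormalised posterior: constant (uniform) prior times the likelihood
    weights = [t ** successes * (1 - t) ** failures for t in thetas]
    total = sum(weights)
    mean = sum(t * w for t, w in zip(thetas, weights)) / total
    mode = max(zip(weights, thetas))[1]  # theta with the largest posterior weight
    return mean, mode


if __name__ == "__main__":
    s, f = 3, 1
    mean, mode = posterior_mean_and_mode(s, f)
    print(f"posterior mean {mean:.4f} vs Laplace's rule {(s + 1) / (s + f + 2):.4f}")
    print(f"posterior mode {mode:.4f} vs sample proportion {s / (s + f):.4f}")
```

The grid approach is only meant for modest counts; for large s or f the powers underflow, and you would work with log-weights or simply use the closed-form Beta posterior instead.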
