Bayesian Average

The Bayesian Average is a formula for calculating an average that stays meaningful even when some items in a data set have only a handful of data points. Typically you’ll see the Bayesian Average used on sites like Yelp.

Let’s assume for a moment that there are a number of restaurants with various ratings across the board. Each of them is shown with a rating from 1 to 5 stars. A new restaurant joins the site and receives a single 5-star rating. If you ranked restaurants purely by their average rating, this new restaurant, with its single rating, would now be considered the highest-rated restaurant in the entire city, which may not be entirely accurate.
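To make the problem concrete, here is a minimal sketch of ranking by plain average. The two restaurant names and their ratings are hypothetical, made up purely for illustration.

```python
# Hypothetical ratings, for illustration only
ratings = {
    "Established": [5, 4, 5, 4, 5, 4, 5, 5],  # eight ratings
    "Newcomer": [5],                          # a single 5-star rating
}

# Plain average: sum of ratings divided by number of ratings
plain_average = {name: sum(r) / len(r) for name, r in ratings.items()}

# Rank restaurants from highest to lowest plain average
ranked = sorted(plain_average, key=plain_average.get, reverse=True)
print(ranked)
```

With these numbers, the restaurant with a single rating lands at the top, ahead of one with eight consistently good ratings.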

This is where the Bayesian Average comes in. In essence, the number of ratings (or votes) a restaurant has received influences its score: the fewer votes it has, the more its score is pulled toward the average across all restaurants. It may be better to illustrate this with an example. Here is our data set. Every line contains the result of an individual vote: the restaurant and the rating it was given. Right at the end, you’ll see Luigi’s, which has a single rating of 5.

Vote      Rating
Mario     4
Brando    4
Rocco     3
Franco    2
Mario     3
Brando    2
Rocco     5
Franco    5
Mario     2
Brando    2
Rocco     3
Mario     3
Brando    3
Luigi     5

Step 1 – We need to tally up the totals. Average the rating for each restaurant (this_rating), and count how many ratings (this_num_votes) each restaurant received.

Vote      this_rating   this_num_votes
Mario     3.00          4
Brando    2.75          4
Rocco     3.67          3
Franco    3.50          2
Luigi     5.00          1

Step 2 – Calculate the average rating (avg_rating) and the average number of votes (avg_num_votes) by averaging the per-restaurant values from step 1 across all five restaurants. In this example, avg_rating = 3.58 and avg_num_votes = 2.80.
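The step 2 arithmetic can be checked with a short sketch. The `step1` dictionary is just a convenient way to hold the step 1 table and is not part of the original example; the numbers in it come from the data set above.

```python
from statistics import mean

# Per-restaurant results from step 1: name -> (this_rating, this_num_votes)
step1 = {
    "Mario": (12 / 4, 4),
    "Brando": (11 / 4, 4),
    "Rocco": (11 / 3, 3),
    "Franco": (7 / 2, 2),
    "Luigi": (5 / 1, 1),
}

# Average of the per-restaurant ratings, and of the per-restaurant vote counts
avg_rating = mean(rating for rating, _ in step1.values())
avg_num_votes = mean(num_votes for _, num_votes in step1.values())

print(round(avg_rating, 2), round(avg_num_votes, 2))  # 3.58 2.8
```

Note that avg_rating here is the average of the five per-restaurant ratings, not the average of all 14 individual votes (which would be 46 / 14, roughly 3.29).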

Step 3 – With all of this information, we can now calculate the Bayesian Average (br) for each restaurant. The formula for the Bayesian Average is:

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

This leaves us with the following result. Because Luigi’s has only one vote, its Bayesian Average comes in at 3.96 rather than 5.00. While that is still higher than all of its competitors, it is more realistic considering how few votes it has received.

Vote      this_rating   this_num_votes   Bayesian
Mario     3.00          4                3.24
Brando    2.75          4                3.09
Rocco     3.67          3                3.63
Franco    3.50          2                3.55
Luigi     5.00          1                3.96
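As a quick check on the table, the formula from step 3 can be wrapped in a small function. This is only a sketch: the function name `bayesian_average` and its argument order are choices made here for illustration, while the inputs are the values from steps 1 and 2.

```python
def bayesian_average(this_rating, this_num_votes, avg_rating, avg_num_votes):
    """Blend one restaurant's rating with the average across all restaurants."""
    return ((avg_num_votes * avg_rating) + (this_num_votes * this_rating)) / (
        avg_num_votes + this_num_votes
    )

# Averages from step 2, computed from the unrounded step 1 values
avg_rating = (3.0 + 2.75 + 11 / 3 + 3.5 + 5.0) / 5
avg_num_votes = (4 + 4 + 3 + 2 + 1) / 5

luigi = bayesian_average(5.0, 1, avg_rating, avg_num_votes)
mario = bayesian_average(3.0, 4, avg_rating, avg_num_votes)
print(round(luigi, 2), round(mario, 2))  # 3.96 3.24
```

Luigi’s single vote carries a weight of 1 against a prior weight of 2.80, so its score moves most of the way toward 3.58. Mario’s four votes outweigh the prior, so its score stays closer to its own 3.00.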

Python example

This example will demonstrate how you can calculate the Bayesian Average using Python.

# Bayesian Average example in Python
# Written for Python 3: the averages rely on "/" being true division
import json

# Step 0 - Feed the data set with the data we are interested in
data_set = [
	{'vote' : 'Mario'	, 'rating' : 4},
	{'vote' : 'Brando'	, 'rating' : 4},
	{'vote' : 'Rocco'	, 'rating' : 3},
	{'vote' : 'Franco'	, 'rating' : 2},
	{'vote' : 'Mario'	, 'rating' : 3},
	{'vote' : 'Brando'	, 'rating' : 2},
	{'vote' : 'Rocco'	, 'rating' : 5},
	{'vote' : 'Franco'	, 'rating' : 5},
	{'vote' : 'Mario'	, 'rating' : 2},
	{'vote' : 'Brando'	, 'rating' : 2},
	{'vote' : 'Rocco'	, 'rating' : 3},
	{'vote' : 'Mario'	, 'rating' : 3},
	{'vote' : 'Brando'	, 'rating' : 3},
	{'vote' : 'Luigi'	, 'rating' : 5}
]

# Step 1 - Tally up the totals
totals = {}
for d in data_set:
    # -- setup the dictionary
    if d['vote'] not in totals:
        totals[d['vote']] = { '_total' : 0, 'this_num_votes' : 0, 'this_rating' : 0.0 , 'bayesian_average' : 0 }

    # -- start counting the individual results 
    totals[d['vote']]['this_num_votes'] += 1
    totals[d['vote']]['_total'] += d['rating']
    totals[d['vote']]['this_rating'] = totals[d['vote']]['_total'] / totals[d['vote']]['this_num_votes']

# Step 2 - Calculate the averages across all restaurants
count = len(totals)

# == avg_rating is the mean of the per-restaurant ratings
avg_rating = sum(t['this_rating'] for t in totals.values()) / count

# == avg_num_votes is the mean number of votes per restaurant
avg_num_votes = sum(t['this_num_votes'] for t in totals.values()) / count

# Step 3 - Calculate the Bayesian Average
for d in totals:
    totals[d]['bayesian_average'] = ( (avg_num_votes * avg_rating) + (totals[d]['this_num_votes'] * totals[d]['this_rating']) ) / (avg_num_votes + totals[d]['this_num_votes'])
    print('{vote} = {br}'.format(vote = d, br = totals[d]['bayesian_average']))

# Step 4 - Show the data we have collected, including the Bayesian Averages
print(json.dumps(totals,indent=4))
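If you want to reuse the calculation, the same steps can be folded into one function that returns the restaurants ranked by Bayesian Average. This is a compact sketch rather than a replacement for the script above: the function name `rank_by_bayesian_average` and the (name, rating) tuple format are assumptions made here, and it expects at least one vote.

```python
from collections import defaultdict

def rank_by_bayesian_average(votes):
    """Return (name, bayesian_average) pairs, highest first.

    `votes` is a non-empty iterable of (name, rating) pairs.
    """
    # Step 1 - group the ratings per restaurant
    ratings = defaultdict(list)
    for name, rating in votes:
        ratings[name].append(rating)
    this_rating = {name: sum(r) / len(r) for name, r in ratings.items()}

    # Step 2 - averages across all restaurants
    avg_rating = sum(this_rating.values()) / len(ratings)
    avg_num_votes = sum(len(r) for r in ratings.values()) / len(ratings)

    # Step 3 - Bayesian Average per restaurant
    scores = {
        name: (avg_num_votes * avg_rating + len(r) * this_rating[name])
        / (avg_num_votes + len(r))
        for name, r in ratings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

votes = [
    ("Mario", 4), ("Brando", 4), ("Rocco", 3), ("Franco", 2),
    ("Mario", 3), ("Brando", 2), ("Rocco", 5), ("Franco", 5),
    ("Mario", 2), ("Brando", 2), ("Rocco", 3), ("Mario", 3),
    ("Brando", 3), ("Luigi", 5),
]

ranking = rank_by_bayesian_average(votes)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

For the data set in this post, the ranking comes out as Luigi, Rocco, Franco, Mario, Brando, matching the Bayesian column in the table above.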
