Cultivate Labs | Collective intelligence solutions using crowdsourced forecasting

What is the Ordinal Scoring System and How is it Calculated?

The Ultimate Guide to Crowdsourced Forecasting

If you’re not already familiar with Brier scoring, you should first read the first two parts of our forecast scoring series.

What is the purpose of ordinal scoring?

In our previous articles, we discussed a basic forecasting question: Will the Cubs win the World Series in 2017? Once the season is over, there’s an unequivocal answer to the question (either they won or they did not), and we score each allotment of probability in a forecast as correct or incorrect. But how should we score questions when some answers are "closer" to being correct than others?

One example would be the question "How many games will the Cubs win in the 2017 regular season?" with the answer options:

Less than 50
50-75
76-100
More than 100

Say I make the following forecast:

Answer	Forecast
Less than 50	15%
50-75	45%
76-100	25%
More than 100	15%

Now say they win 80 games, making the "76-100" bucket correct. The 45% I allocated to the 50-75 bucket was not technically correct, but it was closer to being correct than the "Less than 50" bucket. Ideally, our scoring system should penalize forecasters less for allocating probabilities closer to the correct outcome. This is exactly what the ordinal scoring system does.

How are scores calculated in ordinal questions?

Similar to a normal Brier score, we calculate a daily score for each day that the forecast was active and then average the daily scores to calculate an overall score for the question. The difference from a normal Brier score is in the method used for calculating the daily score.
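A minimal Python sketch of that averaging step is below. It assumes one probability list per active day, so an unchanged forecast simply repeats. The names `brier_daily`, `overall_score`, and the day-3 update are hypothetical illustrations, not Cultivate Labs code. Any daily scoring function, standard or ordinal, can be passed in as `daily_score`.

```python
from statistics import mean
from typing import Callable, Sequence

Forecast = Sequence[float]


def brier_daily(forecast: Forecast, correct_index: int) -> float:
    """Standard Brier daily score: squared error summed over all answer options."""
    return sum((p - (1.0 if i == correct_index else 0.0)) ** 2
               for i, p in enumerate(forecast))


def overall_score(daily_forecasts: Sequence[Forecast], correct_index: int,
                  daily_score: Callable[[Forecast, int], float] = brier_daily) -> float:
    """Average the daily score over every active day (assumed: one entry per day)."""
    return mean(daily_score(f, correct_index) for f in daily_forecasts)


# Hypothetical history: the article's forecast held for two days, then a
# made-up update on day 3. Index 2 is the correct "76-100" bucket.
history = [
    [0.15, 0.45, 0.25, 0.15],
    [0.15, 0.45, 0.25, 0.15],
    [0.05, 0.25, 0.60, 0.10],
]
print(round(overall_score(history, correct_index=2), 4))  # 0.6183
```

Passing the scoring function as a parameter keeps the averaging step identical for standard and ordinal questions, which matches the point that only the daily calculation differs.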

Using a standard Brier score for this question (i.e., not an ordinal score), the daily score for my forecast would be:

Answer	Forecast	Score
Less than 50	0.15	(0.15 - 0)² = 0.0225
50-75	0.45	(0.45 - 0)² = 0.2025
76-100	0.25	(0.25 - 1)² = 0.5625
More than 100	0.15	(0.15 - 0)² = 0.0225
Daily score (sum of answer scores)		0.81
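The table's arithmetic can be reproduced with a short Python sketch. The function name `brier_daily_score` is our own illustrative choice. Answer options are listed in their natural order, and the correct option is identified by its index.

```python
ANSWERS = ["Less than 50", "50-75", "76-100", "More than 100"]
forecast = [0.15, 0.45, 0.25, 0.15]
correct_index = ANSWERS.index("76-100")  # 80 wins falls in this bucket


def brier_daily_score(probs, correct_index):
    """Sum of (forecast - outcome)^2 over all answers; outcome is 1 for the correct answer, else 0."""
    return sum((p - (1.0 if i == correct_index else 0.0)) ** 2
               for i, p in enumerate(probs))


# One line per answer, mirroring the table rows above.
for i, (answer, p) in enumerate(zip(ANSWERS, forecast)):
    outcome = 1.0 if i == correct_index else 0.0
    print(f"{answer}: ({p} - {outcome:g})^2 = {(p - outcome) ** 2:.4f}")

print(f"Daily score: {brier_daily_score(forecast, correct_index):.2f}")  # 0.81
```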

In ordinal questions, we calculate the daily error differently. To start, we split the ordered answer options at each boundary between adjacent answers, creating these successive groupings:

Grouping 1	Grouping 2
Less than 50	50-75, 76-100, More than 100
Less than 50, 50-75	76-100, More than 100
Less than 50, 50-75, 76-100	More than 100
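The groupings are every split point between adjacent answers: everything below the split goes in Grouping 1 and everything above it goes in Grouping 2, so four answers produce three rows. A sketch, using a hypothetical helper named `successive_groupings`:

```python
ANSWERS = ["Less than 50", "50-75", "76-100", "More than 100"]


def successive_groupings(answers):
    """Return (grouping_1, grouping_2) for each of the len(answers) - 1 split points."""
    return [(answers[:k], answers[k:]) for k in range(1, len(answers))]


for grouping_1, grouping_2 in successive_groupings(ANSWERS):
    print(", ".join(grouping_1), "|", ", ".join(grouping_2))
```

Because the answers must already be in their natural order, this only makes sense for questions whose options have a meaningful ranking.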

For each grouping in each row, sum the probabilities of the answers it contains and calculate the squared error of that sum:

(forecast_probability_sum - final_outcome)²

The final_outcome is 1 if the grouping contains the correct answer (76-100) and 0 if it does not.
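Written out in general form (the notation here is our own, not from the original), with K ordered answers, forecast probabilities p_1, ..., p_K, and the correct answer at position c, each split k = 1, ..., K-1 contributes the squared errors of both groupings, and the daily score averages those totals:

```latex
S_{\text{daily}} = \frac{1}{K-1} \sum_{k=1}^{K-1}
  \left[ \left( \sum_{i=1}^{k} p_i - O_k \right)^2
       + \left( \sum_{i=k+1}^{K} p_i - \left(1 - O_k\right) \right)^2 \right],
\qquad
O_k = \begin{cases} 1 & \text{if } c \le k \\ 0 & \text{otherwise} \end{cases}
```

For the Cubs question, K = 4 and c = 3 (the "76-100" bucket), so k = 1, 2, 3 correspond to the three rows of the groupings, and the average works out to 0.27.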

Grouping 1	Grouping 1 Score	Grouping 2	Grouping 2 Score	Total Score
Less than 50	(0.15 - 0)² = 0.0225	50-75, 76-100, More than 100	(0.45 + 0.25 + 0.15 - 1)² = 0.0225	0.0225 + 0.0225 = 0.045
Less than 50, 50-75	(0.15 + 0.45 - 0)² = 0.36	76-100, More than 100	(0.25 + 0.15 - 1)² = 0.36	0.36 + 0.36 = 0.72
Less than 50, 50-75, 76-100	(0.15 + 0.45 + 0.25 - 1)² = 0.0225	More than 100	(0.15 - 0)² = 0.0225	0.0225 + 0.0225 = 0.045
		Daily Score (average of the 3 total score values):		0.27
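A Python sketch of the same ordinal calculation follows, again with illustrative function names of our own. `correct_index` is zero-based, so the "76-100" bucket is index 2.

```python
ANSWERS = ["Less than 50", "50-75", "76-100", "More than 100"]
forecast = [0.15, 0.45, 0.25, 0.15]
correct_index = ANSWERS.index("76-100")


def grouping_totals(probs, correct_index):
    """Total score (Grouping 1 + Grouping 2 squared errors) for each split point."""
    totals = []
    for k in range(1, len(probs)):
        # Grouping 1 holds answers [0, k); its outcome is 1 only if it contains the correct answer.
        outcome_1 = 1.0 if correct_index < k else 0.0
        outcome_2 = 1.0 - outcome_1
        score_1 = (sum(probs[:k]) - outcome_1) ** 2
        score_2 = (sum(probs[k:]) - outcome_2) ** 2
        totals.append(score_1 + score_2)
    return totals


def ordinal_daily_score(probs, correct_index):
    """Daily ordinal score: the average of the per-split totals."""
    totals = grouping_totals(probs, correct_index)
    return sum(totals) / len(totals)


print([round(t, 4) for t in grouping_totals(forecast, correct_index)])  # [0.045, 0.72, 0.045]
print(round(ordinal_daily_score(forecast, correct_index), 4))  # 0.27
```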

As you can see, the ordinal score (0.27) penalizes my forecast much less than the standard Brier score (0.81); remember, lower score = less error = better score. This better reflects the fact that most of my probability (70%) went either to the correct bucket or to an adjacent bucket that was "close" but not quite correct.
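To see why the ordering matters, compare this forecast with a hypothetical second forecast that moves the 45% from "50-75" to the more distant "Less than 50" bucket. The sketch below repeats both scoring functions so it runs on its own; the function names and the second forecast are our own illustrations.

```python
def brier_daily_score(probs, correct_index):
    """Standard Brier: squared error summed over every answer."""
    return sum((p - (1.0 if i == correct_index else 0.0)) ** 2
               for i, p in enumerate(probs))


def ordinal_daily_score(probs, correct_index):
    """Ordinal: average over split points of both groupings' squared errors."""
    totals = []
    for k in range(1, len(probs)):
        outcome_1 = 1.0 if correct_index < k else 0.0  # does Grouping 1 contain the answer?
        totals.append((sum(probs[:k]) - outcome_1) ** 2
                      + (sum(probs[k:]) - (1.0 - outcome_1)) ** 2)
    return sum(totals) / len(totals)


correct_index = 2  # "76-100"
forecasts = {
    "article forecast": [0.15, 0.45, 0.25, 0.15],
    "hypothetical, 45% on Less than 50": [0.45, 0.15, 0.25, 0.15],
}
for name, probs in forecasts.items():
    print(f"{name}: Brier {brier_daily_score(probs, correct_index):.2f}, "
          f"ordinal {ordinal_daily_score(probs, correct_index):.2f}")
```

Both forecasts receive the same Brier score of 0.81, because the standard Brier score ignores which wrong bucket received the probability. The ordinal score rises from 0.27 to 0.39 for the hypothetical forecast, since its misplaced 45% sits further from the correct answer.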