Cultivate Labs | Collective intelligence solutions using crowdsourced forecasting

What are Relative Brier Scores and How are they Calculated?

The Ultimate Guide to Crowdsourced Forecasting

Relative Brier Scores build on basic Brier scores, so it is essential to understand how a Brier score is calculated first. If you are not familiar with Brier scores, start by reading our article "What is a Brier Score and How is it Calculated?"

Note: in the past, we referred to Relative Brier Scores as Net Brier Points (NBP). The two names refer to the same calculation.

What are Relative Brier Scores and what problem do they solve?

In forecasting tournaments, fair scoring is paramount. If forecasters can gain an unfair scoring advantage, it undermines the goal of identifying the best forecasters. A traditional Brier score can be effective for comparing forecasters if they all participate equally, that is, if they all forecast on every question every day.

But what happens in a forecasting tournament where participants choose the questions and days on which they forecast? In that scenario, a forecaster may choose to wait until late in a question before submitting a forecast, when picking the correct outcome is much easier.

To illustrate the Relative Brier Score concept, we can continue the example from "What is a Brier Score and How is it Calculated?" In that post, we showed a hypothetical forecast on the question "Will the Cubs win the World Series?" and calculated the associated error/score for the scenario where the Cubs win.

Note: the values in the 'Forecast' row are 'yes, no' forecasts. So '0.9, 0.1' corresponds to the forecaster saying there is a 90% chance the Cubs will win the World Series and a 10% chance they won't.

Forecaster #1	Day 1	Day 2	Day 3	Day 4	Day 5	Day 6	Day 7	Overall Brier Score
Forecast	0.9, 0.1	0.9, 0.1	0.9, 0.1	0.95, 0.05	0.95, 0.05	0.95, 0.05	0.95, 0.05
Score	0.02	0.02	0.02	0.005	0.005	0.005	0.005	0.0114
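
The scores in the table above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the two-outcome Brier score is the sum of squared differences between each forecast probability and the actual outcome (1 for what happened, 0 for what didn't), and that the overall score is the plain average of the daily scores. The function name `brier_score` is ours, chosen for illustration.

```python
def brier_score(forecast, outcome):
    """Sum of squared errors between forecast probabilities and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(forecast, outcome))

# Outcome in 'yes, no' order: the Cubs win, so yes = 1 and no = 0.
cubs_win = (1, 0)

# Forecaster #1's seven daily forecasts from the table above.
forecasts = [(0.9, 0.1)] * 3 + [(0.95, 0.05)] * 4

daily_scores = [brier_score(f, cubs_win) for f in forecasts]
overall = sum(daily_scores) / len(daily_scores)

print([round(s, 4) for s in daily_scores])  # [0.02, 0.02, 0.02, 0.005, 0.005, 0.005, 0.005]
print(round(overall, 4))                    # 0.0114
```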

Now, let's consider two other forecasters on the same question:

Forecaster #2	Day 1	Day 2	Day 3	Day 4	Day 5	Day 6	Day 7	Overall Brier Score
Forecast	0.25, 0.75	0.25, 0.75	0.2, 0.8	0.2, 0.8	0.2, 0.8	0.2, 0.8	0.2, 0.8
Score	1.125	1.125	1.28	1.28	1.28	1.28	1.28	1.2357

And Forecaster #3 (a blank cell indicates no forecast, and thus no score, for that forecaster on that day):

Forecaster #3	Day 1	Day 2	Day 3	Day 4	Day 5	Day 6	Day 7	Overall Brier Score
Forecast						0.99, 0.01	0.99, 0.01
Score						0.0002	0.0002	0.0002

So at 0.0002, Forecaster #3 ends up with the best Brier score (remember, 0.0 is the best possible Brier score and 2.0 is the worst). This seems a little unfair, though: Forecaster #3 waited until Day 6, when the answer might already have been obvious. Shouldn't Forecaster #1 be rewarded for making good forecasts early in the question?

Calculating Relative Brier Scores

Relative Brier Scores (also known as Net Brier Points) were created to correct this inequity with a fairer scoring system that rewards early, accurate forecasts. As with traditional Brier scores, lower is better (you can think of it like a golf score, where below par is better).

To calculate a Relative Brier Score:

1. For each day, calculate the median daily Brier score across all forecasters who have a score that day.
2. For each day, subtract that day's median from the forecaster's daily Brier score to get their daily Relative Brier Score.
3. For each forecaster, sum the daily Relative Brier Scores calculated in step #2.
4. Divide that sum by the total number of days for the question (not just the days the forecaster participated).
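
The four steps above can be sketched as a single Python function. This is an illustrative sketch, not our production scoring code: the function name `relative_brier_scores` and the input shape (a dict mapping each forecaster to a list of daily Brier scores, with `None` for days without a forecast) are assumptions made for the example. It also assumes that a day without a forecast simply adds nothing to the sum, which is what dividing by the total number of days implies.

```python
from statistics import median

def relative_brier_scores(daily_scores):
    """Map each forecaster to an overall Relative Brier Score.

    daily_scores: dict of forecaster -> list of daily Brier scores,
    one entry per day of the question, None where no forecast was made.
    """
    total_days = len(next(iter(daily_scores.values())))

    # Step 1: median per day, over forecasters who have a score that day.
    medians = []
    for day in range(total_days):
        scores_today = [s[day] for s in daily_scores.values() if s[day] is not None]
        medians.append(median(scores_today) if scores_today else None)

    results = {}
    for name, scores in daily_scores.items():
        # Steps 2 and 3: subtract the daily median and sum; skipped days add nothing.
        total = sum(s - m for s, m in zip(scores, medians) if s is not None)
        # Step 4: divide by all days of the question, not just the days forecast.
        results[name] = total / total_days
    return results
```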

To complete our example, let's calculate the median of the daily scores (step #1 above):

Daily Brier Score	Day 1	Day 2	Day 3	Day 4	Day 5	Day 6	Day 7
Forecaster #1	0.02	0.02	0.02	0.005	0.005	0.005	0.005
Forecaster #2	1.125	1.125	1.28	1.28	1.28	1.28	1.28
Forecaster #3						0.0002	0.0002
Median	0.5725	0.5725	0.65	0.6425	0.6425	0.005	0.005
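
Note how the median behaves when the number of forecasters changes. On Days 1 through 5 only two forecasters have scores, so the median is the average of those two values; on Days 6 and 7 there are three, so the median is the middle value. The short check below uses Python's standard `statistics.median`, which follows this conventional definition.

```python
from statistics import median

# Day 1: only Forecasters #1 and #2 have scores.
day_1 = [0.02, 1.125]
# Day 6: all three forecasters have scores.
day_6 = [0.005, 1.28, 0.0002]

print(round(median(day_1), 4))  # 0.5725 (average of the two values)
print(median(day_6))            # 0.005  (the middle value)
```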

Now we can subtract each day's median from each forecaster's daily score, sum the forecaster's daily Relative Brier Scores, and divide by the total number of days for the question (7 days, in our example):

Daily Relative Brier Score	Day 1	Day 2	Day 3	Day 4	Day 5	Day 6	Day 7	Overall Relative Brier Score
Forecaster #1	-0.5525	-0.5525	-0.63	-0.6375	-0.6375	0.0	0.0	-0.43
Forecaster #2	0.5525	0.5525	0.63	0.6375	0.6375	1.275	1.275	0.7943
Forecaster #3						-0.0048	-0.0048	-0.00137
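
The choice of divisor in the last step is what keeps a late entrant from jumping to the top. The sketch below takes Forecaster #3's two daily values from the table above and compares dividing by all 7 days with dividing by only the 2 days forecast. The second divisor is a hypothetical alternative shown purely for comparison; it is not part of the Relative Brier Score calculation.

```python
# Forecaster #3's daily Relative Brier Scores; None marks a day with no forecast.
daily_relative = [None, None, None, None, None, -0.0048, -0.0048]

total = sum(s for s in daily_relative if s is not None)
days_forecast = sum(1 for s in daily_relative if s is not None)

by_total_days = total / len(daily_relative)  # the rule described above
by_days_forecast = total / days_forecast     # hypothetical alternative, for comparison only

print(round(by_total_days, 5))     # -0.00137
print(round(by_days_forecast, 4))  # -0.0048
```

Dividing by the total number of days treats each skipped day as if the forecaster had matched the median, so two good days out of seven earn only a small improvement over the median.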

Using Relative Brier Scores, Forecaster #1 is rewarded for forecasting early and accurately, resulting in the best score (-0.43 -- remember, lower is better). Forecaster #3 still receives a score that is better than the median (-0.00137), but no longer beats Forecaster #1, due to entering the question late. The Cubs hater, Forecaster #2, made poor forecasts throughout and received the worst score (0.7943).

If you're interested in learning more about Brier scoring, Relative Brier Scores, or running a forecasting tournament, feel free to contact us.
