## David and Goliath: UI Computer Science Students make the Grade

##### (Posted on: October 6, 2016. Updated on: November 10, 2016.)

2016 was a difficult year for forecasters who used polling data to call the United States Presidential and Senate races. This does not mean that poll-based forecasting is wrong, outdated, or passé. It means that such approaches have limitations, and that the laws of chance occasionally produce unlikely outcomes. This is why there are upsets in the NCAA Men’s Basketball Tournament every March, and why the weather forecast is sometimes wrong on a given day. The key point is that data-driven methods are better suited to forecasting (i.e., considering various scenarios and assessing their likelihoods) than to predicting (asserting with certainty that some event will occur in the future). Yogi Berra had it right when he said, “It's tough to make predictions, especially about the future.”

Fivethirtyeight.com has become the gold standard for election forecasting. Since relaunching under ESPN in 2014, they have assembled a superb staff of writers, data analysts, and graphic artists who provide data analysis across a wide range of topics. Their stream of blog commentaries and insights, typically grounded in data analysis, attracts an enormous following across a broad swath of the population.

Fivethirtyeight.com launched in 2008 with forecasts of the United States Presidential election, which is also the source of their name: the Electoral College has 538 votes. Since then, they have expanded to cover the United States Senate, the Presidential primaries, and numerous other events for which forecasts can be made.

Election Analytics, in contrast, is run by a group of students (one Engineering graduate student and four Computer Science undergraduates at the University of Illinois at Urbana-Champaign). Launched in 2008, Election Analytics serves as a STEM learning laboratory for these students, who experiment with new ways to present data, participate in interface design, and analyze the data before it is posted. The methodologies employed by Election Analytics have been published in peer-reviewed journals, allowing anyone to replicate how they use polling data to make their forecasts.

Two measures can be used to assess the accuracy of forecasts. The Brier Score measures the squared distance between each forecast (the probability assigned to the correct selection, between zero and one) and the outcome (given as one), averaged across all the forecasts. The Entropy Score measures the negative of the natural logarithm of each forecast (the probability assigned to the correct selection), averaged across all the forecasts. A perfect forecast, which assigns probability one to every eventual winner, would result in both a Brier Score and an Entropy Score of zero. Formally, if $p_1, p_2, \ldots, p_{51}$ represent the probabilities assigned to the winner in each state (plus DC), then the Brier score is

$$\frac{1}{51}\sum_{j=1}^{51}(p_j - 1)^2$$

and the Entropy score is

$$-\frac{1}{51}\sum_{j=1}^{51}\ln p_j.$$

Note that the Electoral College votes allocated to individual congressional districts in Nebraska and Maine are excluded from these scores.
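To make the arithmetic concrete, consider a hypothetical example (the numbers are illustrative and not taken from either site): a forecaster assigns probabilities 0.9, 0.8, and 0.5 to the eventual winners of three races, so the denominator 51 is replaced by 3.

$$\text{Brier} = \frac{(0.9-1)^2 + (0.8-1)^2 + (0.5-1)^2}{3} = \frac{0.01 + 0.04 + 0.25}{3} = 0.1$$

$$\text{Entropy} = -\frac{\ln 0.9 + \ln 0.8 + \ln 0.5}{3} \approx \frac{0.10536 + 0.22314 + 0.69315}{3} \approx 0.3406$$

In both scores the least confident of the three correct forecasts (0.5) contributes the largest term. As a probability approaches zero, the Entropy term grows without bound, while the Brier term never exceeds one.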

If there are $N$ Senate seats up for election, and $q_1, q_2, \ldots, q_N$ represent the probabilities assigned to the winner of each Senate seat, then the Brier score is

$$\frac{1}{N}\sum_{j=1}^{N}(q_j - 1)^2$$

and the Entropy score is

$$-\frac{1}{N}\sum_{j=1}^{N}\ln q_j.$$

In general, smaller scores indicate better forecasts.
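A minimal Python sketch of the two formulas, assuming the input is simply a list of the probabilities a site assigned to the eventual winners. The function names and sample values are hypothetical and are not taken from Election Analytics' own code.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float]) -> float:
    """Average squared distance between each forecast probability for the
    eventual winner and the realized outcome (1)."""
    return sum((p - 1.0) ** 2 for p in probs) / len(probs)


def entropy_score(probs: Sequence[float]) -> float:
    """Average negative natural log of the probability given to the eventual winner."""
    return -sum(math.log(p) for p in probs) / len(probs)


# Hypothetical forecasts: probability assigned to each race's actual winner.
forecasts = [0.9, 0.8, 0.5]
print(round(brier_score(forecasts), 4))    # 0.1
print(round(entropy_score(forecasts), 4))  # 0.3406
```

The same two functions serve both the presidential case (51 probabilities) and the Senate case ($N$ probabilities), since only the length of the list changes.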

The following table reports the Brier and Entropy Scores for the 2008, 2012, and 2016 Presidential elections and the 2012, 2014, and 2016 Senate elections. These are the six elections for which both websites (Fivethirtyeight.com and Election Analytics) have provided forecasts. Because the natural logarithm of zero is undefined, a value of 0.0001 was used in the score calculations whenever a probability of zero had been assigned to an event that then occurred.
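The zero-probability substitution described above can be sketched the same way. This is an illustration under stated assumptions: it replaces only forecasts of exactly zero with 0.0001, and the 51-forecast scenario is hypothetical. It shows why the substitution is needed (`math.log(0.0)` raises `ValueError`) and how much a single confident miss moves each score.

```python
import math
from typing import List, Sequence

# Value substituted for a forecast probability of exactly zero, as in the post.
ZERO_SUBSTITUTE = 0.0001


def substitute_zeros(probs: Sequence[float]) -> List[float]:
    """Replace exact zeros so that ln(p) is finite (math.log(0.0) raises ValueError)."""
    return [ZERO_SUBSTITUTE if p == 0 else p for p in probs]


def brier_score(probs: Sequence[float]) -> float:
    """Average of (p - 1)^2 over all forecasts."""
    return sum((p - 1.0) ** 2 for p in probs) / len(probs)


def entropy_score(probs: Sequence[float]) -> float:
    """Average of -ln(p) over all forecasts."""
    return -sum(math.log(p) for p in probs) / len(probs)


# Hypothetical: 50 states called with certainty (p = 1.0) and one state
# whose eventual winner was given probability 0.
forecasts = substitute_zeros([1.0] * 50 + [0.0])
print(round(brier_score(forecasts), 4))    # 0.0196
print(round(entropy_score(forecasts), 4))  # 0.1806
```

Under these assumptions, the single miss adds roughly 0.02 to the Brier score but roughly 0.18 to the Entropy score, which illustrates how much more sensitive the Entropy Score is to confident misses.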

| Election | Brier Score (FiveThirtyEight) | Brier Score (Election Analytics) | Entropy Score (FiveThirtyEight) | Entropy Score (Election Analytics) |
|---|---|---|---|---|
| President 2008 | 0.0205 | 0.0161 | 0.0690 | 0.0525 |
| President 2012 | 0.0091 | 0.0149 | 0.0404 | 0.0445 |
| President 2016 | 0.0689 | 0.0851 | 0.2207 | 0.6207 |
| Senate 2012 | 0.0448 | 0.0314 | 0.1488 | 0.0899 |
| Senate 2014 | 0.0335 | 0.0325 | 0.1102 | 0.1200 |
| Senate 2016 | 0.0536 | 0.0778 | 0.1650 | 0.4197 |

Each site had the lower Brier Score in three of the six elections. For the Entropy Score, Fivethirtyeight.com had the lower score in four of the six elections.
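The win counts stated above can be re-derived directly from the table values. A small Python check, with the scores transcribed by hand from the table (the dictionary layout and the `count_lower` function name are illustrative):

```python
from typing import Dict, Tuple

# (FiveThirtyEight, Election Analytics) scores, transcribed from the table.
BRIER: Dict[str, Tuple[float, float]] = {
    "President 2008": (0.0205, 0.0161),
    "President 2012": (0.0091, 0.0149),
    "President 2016": (0.0689, 0.0851),
    "Senate 2012": (0.0448, 0.0314),
    "Senate 2014": (0.0335, 0.0325),
    "Senate 2016": (0.0536, 0.0778),
}
ENTROPY: Dict[str, Tuple[float, float]] = {
    "President 2008": (0.0690, 0.0525),
    "President 2012": (0.0404, 0.0445),
    "President 2016": (0.2207, 0.6207),
    "Senate 2012": (0.1488, 0.0899),
    "Senate 2014": (0.1102, 0.1200),
    "Senate 2016": (0.1650, 0.4197),
}


def count_lower(scores: Dict[str, Tuple[float, float]]) -> Tuple[int, int]:
    """Return how many elections each site had the strictly lower (better) score."""
    fte = sum(1 for a, b in scores.values() if a < b)
    ea = sum(1 for a, b in scores.values() if b < a)
    return fte, ea


print(count_lower(BRIER))    # (3, 3)
print(count_lower(ENTROPY))  # (4, 2)
```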

Based purely on the accuracy of their final forecasts, Election Analytics and Fivethirtyeight.com have reported comparable results for the six recent elections, with Fivethirtyeight.com holding a slight edge. Election Analytics has also done this at a fraction of the cost: total direct expenditures by the Election Analytics group in 2016 were $25, not including computer support provided by the University of Illinois.

Fivethirtyeight.com is an entertaining site where the general public can gain insight into the election through informative discussion and commentary. They have provided a valuable service in popularizing the use of data to inform people on a wide range of topics. No matter which site is more accurate in forecasting election outcomes, the students who design and maintain Election Analytics are the real winners. Election Analytics is an activity that will launch their STEM careers in ways that make a difference, far beyond this year’s election.