Computational Methods for Nonlinear Systems

Physics 7682 - Fall 2014


Instructor: Chris Myers

Mondays & Fridays 1:30-3:30, Rockefeller B3 (directions)

http://www.physics.cornell.edu/~myers/teaching/ComputationalMethods/

US Presidential Election

Background

The 2012 US Election season is upon us, and it will culminate in the election of a President. The President of the US is elected in a somewhat complicated fashion: rather than being directly elected by the people, the President is selected by an Electoral College composed of electors from each of the 50 states and the District of Columbia. A number of websites collect and track polling data, and make projections of the expected Electoral Vote outcome on election day: November 6, 2012. These websites include Pollster.com, electoral-vote.com, fivethirtyeight.com, The Princeton Election Consortium, and many others.

This exercise uses polling data to support the analysis and exploration of the electoral college map. At the moment, this exercise is targeted toward data synthesized by The Princeton Election Consortium, but similar sorts of analyses could be done with other data streams. (The Princeton data are usefully organized, the underlying code is available for those who are interested in the details, and the Princeton site offers a different computational approach than some of the other sites.) This exercise is not intended to be an introduction to polling methodologies, which is the focus of much serious work in the field (appropriately so). Nor is it intended to be partisan. Rather, it aims to introduce a few of the issues that arise in the analysis of US presidential elections, and perhaps feed interest in more rigorous and systematic endeavors.

Learning Goals

Science: You will learn some basics of the US Electoral College system (including what happens in the event of a 269-269 tie), perhaps a little bit about polling methodologies, and connections to some of the other course modules (e.g., NumberPartitioning and Random Text generation).

Computation: You will learn how to download data from the web and parse it for your purposes, how to sample from probability distributions to generate synthetic election data, and how to use convolutions to compute exact combinatorial distributions.

Procedure

If, at any point, you're interested in getting more detail on the polling data, the underlying methodology, or the more detailed approach taken by The Princeton Election Consortium, see the FAQ or the information For Fellow Geeks. (Yes, we're geeks, and we vote.)
  1. Download the file Vote2008Hints.py from the course website, and rename it to Vote2008.py. [The file name is historic -- this module was originally designed for the 2008 election.]
  2. In Vote2008.py, first notice that information regarding states and their respective number of electoral votes is provided. Examine the use of the dict() and zip() functions to produce a dictionary mapping state names (abbreviations) to numbers of electoral votes.
    • NOTE: Nebraska and Maine do not award their electoral votes on a winner-take-all basis as every other state does. For this exercise, we will not worry about that subtlety, and instead treat the votes in each state as a bloc.
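As a minimal sketch of the dict() and zip() idiom mentioned in step 2: the three-state lists below are a shortened illustration, not the full lists from the hints file, and the names states, votes, and evotes are assumptions made for this example.

```python
# Shortened, illustrative lists; the hints file provides all 51 entries.
states = ["CA", "NY", "TX"]
votes = [55, 29, 38]  # electoral votes under the 2012 apportionment

# zip() pairs each abbreviation with its vote count;
# dict() turns those pairs into a state -> votes mapping.
evotes = dict(zip(states, votes))

print(evotes["TX"])          # electoral votes for one state
print(sum(evotes.values()))  # total over the states listed here
```

Because each state is treated as a bloc (see the note on Nebraska and Maine), a single integer per state is all the mapping needs to hold.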
  3. Write a function GetCurrentDate() that returns today's date as an integer indicating the day's position in the year. The polling data we will download is indexed using those integers.
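One possible way to get the day-of-year integer for step 3 is sketched below using the standard datetime module. The helper day_of_year is a hypothetical name introduced here, and the sketch assumes the polling file counts January 1 as day 1; check that against the downloaded data.

```python
import datetime

def day_of_year(d):
    """Return the 1-based position of the date d within its year."""
    return d.timetuple().tm_yday

def GetCurrentDate():
    """Return today's date as its day-of-year integer."""
    return day_of_year(datetime.date.today())

# Election day 2012 (a leap year) falls on day 311 under this convention.
print(day_of_year(datetime.date(2012, 11, 6)))
print(GetCurrentDate())
```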
  4. Write a function GetPrincetonPollingData() to download and process the current polling data from http://election.princeton.edu/code/matlab/polls.median.txt
    • Use the urlretrieve function in the Python urllib module to download the file.
    • The format of the file is pretty funny. For each date there are 51 lines, corresponding to the 50 States and Washington DC, in alphabetical order. The hints file already has the list of states, and how many electoral votes each state has. There are 5 columns: the number of polls used, the median date of the oldest poll used, the median margin, the standard error in the median, and the date. The file uses the last 3 polls, or 1 week of polls, whichever is greater. Most polls take several days, so the "median date of the oldest poll" is the middle date in the polling. They use the median instead of the mean to reduce the influence of outliers. More details are at For Fellow Geeks.
    • Parse the file to build up a dictionary named polls. At the highest level, polls is keyed by a date (an integer, e.g., that returned by GetCurrentDate()) that holds another dictionary; polls[date] is keyed by state names (abbreviations in the list of states) and holds a tuple for each state; the tuple contains the polling margin (Democrat-Republican) and the SEM (standard error of the mean) of those polling data.
    • NOTE: some of the polling data in polls.median.txt quote a standard error of 0. This causes problems when one tries to compute a probability of victory for each candidate (division by zero error). Digging into the relevant part of the Princeton code, we find the following MATLAB tidbit: polls.SEM=max(polls.SEM,zeros(1,51)+2);, i.e., any reported SEM less than 2 is bumped up to a floor of 2. I suggest we do the same here. [Who knows where the 2 comes from.]
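The download-and-parse step could be sketched roughly as follows. This is not the reference solution: the function names download_polls and parse_polls, the sem_floor parameter, and the two-state sample data are all hypothetical. The sketch assumes the columns are whitespace-separated in the order described above, that rows cycle through the states in list order, and that the date column holds an integer-valued number. In Python 3, urlretrieve lives in urllib.request; the URL may no longer be live, so the download function is defined but not called.

```python
from urllib.request import urlretrieve  # Python 3 location of urlretrieve

POLLS_URL = "http://election.princeton.edu/code/matlab/polls.median.txt"

def download_polls(filename="polls.median.txt", url=POLLS_URL):
    """Fetch the polling file to disk (defined here but not called)."""
    urlretrieve(url, filename)
    return filename

def parse_polls(lines, states, sem_floor=2.0):
    """Build polls[date][state] = (margin, sem) from the rows of the file."""
    polls = {}
    rows = [line.split() for line in lines if line.strip()]
    for i, fields in enumerate(rows):
        state = states[i % len(states)]         # rows repeat in state order
        margin = float(fields[2])               # column 3: median margin
        sem = max(float(fields[3]), sem_floor)  # column 4: error, floored at 2
        date = int(float(fields[4]))            # column 5: day-of-year index
        polls.setdefault(date, {})[state] = (margin, sem)
    return polls

# Made-up rows for two fictitious states, purely to exercise the parser.
sample = ["3 300 5.0 0.0 311", "4 301 -2.5 3.1 311",
          "3 299 4.0 1.0 310", "5 300 -3.0 2.5 310"]
polls = parse_polls(sample, ["AA", "BB"])
print(polls[311]["AA"])  # the reported error of 0 is raised to the floor
```

With the real file, something like parse_polls(open(filename), states) would play the same role, given the full 51-entry state list from the hints file.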
  5. Write a function GetDemWinProbabilitiesFromPolls(polls, date) that will return from the polling dictionary, for a specified date, a dictionary that maps state names (abbreviations) to the probability of a Democratic win in that state, assuming a normal probability distribution. Hint: the erf function is the integral of a gaussian, and the scipy.special module is a useful place to look for special functions. The appropriate scaling of the erf function can be uncovered here. For those of you who might be worried that this exercise has a Democratic skew, feel free to write instead a function called GetRepWinProbabilitiesFromPolls that computes the probability of a Republican win in each state.
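For the normal-distribution assumption in step 5, one common scaling is worth writing out. If the true margin is modeled as normal with mean m (the polled margin) and standard deviation s (the SEM), the probability that the margin is positive is 0.5 * (1 + erf(m / (s * sqrt(2)))). The sketch below uses math.erf from the standard library so that it runs without SciPy; scipy.special.erf would serve equally well. The helper name dem_win_probability and the sample polls dictionary are hypothetical.

```python
import math

def dem_win_probability(margin, sem):
    """P(true margin > 0), with the margin modeled as Normal(margin, sem**2)."""
    return 0.5 * (1.0 + math.erf(margin / (sem * math.sqrt(2.0))))

def GetDemWinProbabilitiesFromPolls(polls, date):
    """Map each state to its Democratic win probability on the given date."""
    return {state: dem_win_probability(margin, sem)
            for state, (margin, sem) in polls[date].items()}

# Made-up polling entries: (margin, SEM) for two fictitious states.
polls = {311: {"AA": (0.0, 2.0), "BB": (2.0, 2.0)}}
probs = GetDemWinProbabilitiesFromPolls(polls, 311)
print(probs["AA"])  # a tied state is a coin flip
print(probs["BB"])  # a lead of one standard error gives roughly 0.84
```

The Republican win probability under the same model is simply one minus this value.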
  6. With a probability of a Democratic or Republican victory in each state, one can sample those distributions to simulate a synthetic election. This is the approach, for example, taken by fivethirtyeight.com. Write a function SimulateElectionFromPolls(evotes, polls, date) that uses the polling data from a specified date, and the dictionary of electoral vote counts (evotes), to return a tuple of (Democratic_wins, Republican_wins, totals) where wins are lists of states won by each candidate, and totals is a tuple of the total number of (Democratic, Republican) votes.
  7. Create an ensemble of randomly sampled elections (say, 10000), and plot a histogram of all the Democratic EV totals, or, if you prefer, all the Republican EV totals.
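A rough sketch of the sampling approach in steps 6 and 7 follows. To keep it short, the functions take a dictionary of win probabilities directly rather than (polls, date), which is a simplification of the signature requested above. The names simulate_election and simulate_ensemble, the seed parameter, and the three-state data are hypothetical.

```python
import random

def simulate_election(evotes, win_probs, rng=random):
    """Draw one synthetic election: each state goes Democratic with its win probability."""
    dem_wins, rep_wins = [], []
    for state in evotes:
        if rng.random() < win_probs[state]:
            dem_wins.append(state)
        else:
            rep_wins.append(state)
    totals = (sum(evotes[s] for s in dem_wins),
              sum(evotes[s] for s in rep_wins))
    return dem_wins, rep_wins, totals

def simulate_ensemble(evotes, win_probs, n=10000, seed=0):
    """Return the Democratic EV total from each of n independent simulated elections."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [simulate_election(evotes, win_probs, rng)[2][0] for _ in range(n)]

# Made-up data: one safe state for each side and one toss-up.
evotes = {"AA": 3, "BB": 4, "CC": 5}
win_probs = {"AA": 1.0, "BB": 0.0, "CC": 0.5}
dem_totals = simulate_ensemble(evotes, win_probs, n=200)
print(sorted(set(dem_totals)))  # the distinct Democratic totals that occurred
```

The list of totals can be passed to a histogram routine such as matplotlib's hist to produce the plot asked for in step 7.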
  8. The Princeton site argues that one does not actually need to simulate many random elections. Instead, one can calculate the exact probability distribution of electoral college outcomes directly from the individual state win probabilities and the number of electoral votes in each state. Write a function ComputeExactEVDistribution(evotes, polls, date) to compute the exact electoral vote probability distribution from polling data on a specified date, using the meta-analysis convolution method described in the FAQ and in the geeky MATLAB code. Computationally, this relies on the fact that one can multiply two polynomials by doing a convolution of their coefficients, suitably organized and padded.
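The polynomial picture behind step 8 can be made concrete. Assign each state the polynomial (1 - p) + p * x**v, where p is the Democratic win probability and v is the state's electoral votes; the coefficient of x**k in the product over all states is then the probability of exactly k Democratic electoral votes, assuming the states are independent. The sketch below multiplies the polynomials with a hand-written convolution so that it has no dependencies (numpy.convolve performs the same operation). The function names and the two-state data are hypothetical, and this is not a transcription of the Princeton MATLAB code.

```python
def convolve(a, b):
    """Coefficients of the product of two polynomials given by coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def exact_ev_distribution(evotes, win_probs):
    """Return dist, where dist[k] is the probability of exactly k Democratic EVs."""
    dist = [1.0]  # with no states counted, zero votes has probability 1
    for state, votes in evotes.items():
        p = win_probs[state]
        # Polynomial (1 - p) + p * x**votes, padded with zeros in between.
        factor = [1.0 - p] + [0.0] * (votes - 1) + [p]
        dist = convolve(dist, factor)
    return dist

# Made-up example with two fictitious states.
evotes = {"AA": 3, "BB": 4}
win_probs = {"AA": 0.5, "BB": 0.25}
dist = exact_ev_distribution(evotes, win_probs)
print(dist)  # entries for 0, 3, 4, and 7 votes are nonzero; the rest are 0
```

For the full map the resulting list has 539 entries (0 through 538 votes), and the work grows with the number of states times the total vote count rather than with the 2**51 possible outcomes.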
  9. Compute the exact EV probability distribution and plot that, comparing it with the simulated, sampled probability distribution computed previously.
  10. The exact probability distribution computed above assumes that the win probabilities in each state are known exactly, but of course they are only approximately known to within some margin of error. Analytically, one could derive the propagation of errors (uncertainties) through the polynomial equation, or computationally, one could explore the variation in the computed probability distribution by sampling win probabilities from a normal distribution. Using the downloaded probability and margin of error data, generate ensembles of win probabilities (drawn from a normal distribution), and compute the "exact" probability distribution for each member of the ensemble. How much variation is apparent in the resulting EV distributions?
  11. There is a finite set of possible electoral college outcomes. Some of those outcomes involve a tie: 269-269.
    • Civics 101 Quiz: What happens in the event of an electoral college tie? Look here, here or here for more information. (And when is the only time in US history that such a tie has taken place?)
    • A tie, of course, is only possible if there are an even number of total electoral votes. Prior to the 23rd Amendment to the Constitution, which granted electoral votes to the District of Columbia, there were 535 total electoral votes. The 23rd Amendment grants DC no more electoral votes than the least populous state, which means DC currently gets 3 votes (equal to two senators plus one representative, even though DC doesn't actually have those). If DC were granted statehood, its number of electoral votes would increase (based on its population), and it would therefore be possible that it would then possess an even number of electoral votes, thereby making the total once again odd. But I digress...
    The possibility of an electoral college tie is essentially the problem posed in the Number Partitioning course module: namely, if you have a set of integers, can you find a partitioning of those integers into two subsets such that the sum of each subset is equal? We are interested in enumerating all scenarios of electoral college outcomes, but there are 2**51=2251799813685248 total possibilities. We can narrow down that prohibitively large number by enumerating over all those states that are plausibly close in the polls, i.e., the "swing states".
  12. Write a function GetBaseStatesAndSwingStates that, given the polling data and a specified threshold (percentage difference in poll numbers), returns a tuple composed of: safe states in the Democratic base, safe states in the Republican base, and swing states (polling margin less than the threshold).
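One way the classification in step 12 might look is sketched below. The step does not fix the argument list, so this version assumes it receives the polling dictionary for a single date (state mapped to a (margin, SEM) tuple, with margin meaning Democrat minus Republican) and a threshold in percentage points. The sample data are made up.

```python
def GetBaseStatesAndSwingStates(poll_day, threshold):
    """Split states into (Democratic base, Republican base, swing) by polling margin."""
    dem_base, rep_base, swing = [], [], []
    for state, (margin, sem) in poll_day.items():
        if abs(margin) < threshold:
            swing.append(state)      # too close to call at this threshold
        elif margin > 0:
            dem_base.append(state)   # Democratic lead at or beyond the threshold
        else:
            rep_base.append(state)   # Republican lead at or beyond the threshold
    return dem_base, rep_base, swing

# Made-up margins for four fictitious states.
poll_day = {"AA": (12.0, 2.0), "BB": (-9.0, 2.5), "CC": (1.5, 2.0), "DD": (-3.0, 2.0)}
print(GetBaseStatesAndSwingStates(poll_day, 5.0))
```

Lowering the threshold shrinks the swing list, which matters because the enumeration that follows doubles in size with every swing state added.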
  13. Use the EnumerateAllScenario(n) function to create an array listing all possible arrays of length n containing +1 and -1, i.e., the n-dimensional hypercube with 2**n vertices.
  14. Write a function to enumerate all outcomes, based on the swing state list generated above. Enumerate over all possible swing state scenarios and sum up the electoral votes for each scenario. What fraction result in electoral college ties? How does this fraction compare to that in the exactly computed distribution? What state(s) show the greatest amount of variability among the scenarios involving ties? Does that variability correlate at all with how close the states are in the polls? (presumably not)
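The enumeration in steps 13 and 14 can be sketched with itertools.product. The enumerate_scenarios function here is a hypothetical stand-in for the EnumerateAllScenario(n) function referred to above and returns tuples rather than an array. The tie_fraction function and the sample data are also assumptions. Note that every scenario is counted equally here, with no weighting by win probability.

```python
from itertools import product

def enumerate_scenarios(n):
    """All 2**n tuples of +1 (Democratic win) and -1 (Republican win)."""
    return list(product((1, -1), repeat=n))

def tie_fraction(evotes, dem_base, rep_base, swing):
    """Fraction of swing-state scenarios in which both sides end with equal votes."""
    dem_start = sum(evotes[s] for s in dem_base)
    rep_start = sum(evotes[s] for s in rep_base)
    scenarios = enumerate_scenarios(len(swing))
    ties = 0
    for scenario in scenarios:
        dem = dem_start + sum(evotes[s] for s, o in zip(swing, scenario) if o == 1)
        rep = rep_start + sum(evotes[s] for s, o in zip(swing, scenario) if o == -1)
        if dem == rep:
            ties += 1
    return ties / len(scenarios)

# Made-up map with 8 total votes: a tie means 4-4.
evotes = {"AA": 4, "BB": 2, "CC": 1, "DD": 1}
fraction = tie_fraction(evotes, dem_base=["BB"], rep_base=[], swing=["AA", "CC", "DD"])
print(fraction)  # only one of the 8 scenarios (CC and DD Democratic, AA Republican) ties
```

Recording which swing states land on each side in the tied scenarios, rather than only counting them, would give the per-state variability asked about in step 14.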
  15. Play with the data, or dig around in the Princeton code if you're interested.
    • You need not confine yourself to the current day's polling data, but can travel back in time and see how the expected distribution of electoral college votes has shifted over the course of the campaign. Use the course wiki to post snapshots of this evolving distribution, or figure out how to make a movie showing the dynamics of the race.
    • There are analyses undertaken on the Princeton site (e.g., the Popular Meta-Margin) and other polling/election web sites that you could implement with these data.
    • There is some Java code for displaying color-coded maps that you might be able to use to make maps from your own data.