# NNN Blog

## Standard Errors or Typical Errors?

Nate Silver has made election prediction sexy. This election cycle I've seen many estimates of the probability of a Republican takeover of the Senate. And when I say "many estimates" I don't mean different sources; I mean vastly different probabilities. Today I see the *New York Times* gives Republicans a 70% chance while the *Washington Post* (http://www.washingtonpost.com/wp-dre/politics/election-lab-2014) puts the figure at 95%. (Silver sets the probability at 68.9%. We can take up the topic of over-articulated precision in another post!) In this context, those numbers mean very different things.

So, what's the source of variation in these estimates? Earlier in this election cycle one explanation was that people were estimating different probabilities. You could find estimates for "the probability that Republicans would take control if the election were held today" and for "the probability that Republicans will take control on election night." In the middle of summer, these are two vastly different concepts because the latter allows for the wide range of events that might shift elections over the course of three or four months.

But surely that can't be the explanation for the divergence in today's *Times* and *Post* estimates. Even if they are asking different questions regarding timing, we are fewer than four days from election day and many ballots have already been cast. It seems clear that the differences here are due to model specification. When students learn statistics, we teach them how to construct standard errors to account for random sampling error. There's nothing wrong with that, but as these election forecasts make clear, the far more typical specification error often swamps sampling error.

Fortunately, the idea of omitted variables bias or other specification error can be intuitively understood by undergraduates regardless of their mathematical prowess. Happy Election Weekend!
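For anyone who wants to see the point in numbers, here is a minimal simulation sketch of omitted variables bias. Everything in it is made up for illustration: the variables `x`, `z`, and `y`, the coefficients, and the sample size are assumptions, not election data. The outcome depends on both `x` and `z`, and `x` is correlated with `z`. Regressing `y` on `x` alone gives a slope with a small standard error that is nonetheless far from the true effect. Controlling for `z` (done here by residualizing both variables on `z` first, which reproduces the multiple-regression slope) recovers it.

```python
import math
import random


def simple_ols(x, y):
    """Slope and conventional standard error from regressing y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def residualize(v, z):
    """Residuals of v after regressing it on z (with intercept)."""
    slope, _ = simple_ols(z, v)
    n = len(v)
    mean_v, mean_z = sum(v) / n, sum(z) / n
    return [vi - mean_v - slope * (zi - mean_z) for vi, zi in zip(v, z)]


# Hypothetical data-generating process (all numbers are assumptions).
rng = random.Random(0)
n = 2000
TRUE_EFFECT = 1.0  # true effect of x on y
z = [rng.gauss(0, 1) for _ in range(n)]        # the variable we will omit
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]   # x is correlated with z
y = [TRUE_EFFECT * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Misspecified model: y on x only, z omitted.
slope_short, se_short = simple_ols(x, y)

# Model that controls for z. The slope matches the multiple-regression slope;
# the standard error ignores one degree of freedom, which is negligible here.
slope_long, se_long = simple_ols(residualize(x, z), residualize(y, z))

print(f"true effect:    {TRUE_EFFECT:.3f}")
print(f"omitting z:     {slope_short:.3f} (SE {se_short:.3f})")
print(f"controlling z:  {slope_long:.3f} (SE {se_long:.3f})")
```

The thing to notice is that the standard error in the misspecified model looks reassuringly small. It measures sampling error only and says nothing about the variable that was left out.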

## Compared to What: Infectious Disease Edition

The *Washington Post* has a great online infographic comparing attributes of the spread of Ebola to those of more common diseases such as chicken pox or influenza. The site really drives home how important it is to provide context when presenting data to people who are not intimately familiar with the topic. (While we have some understanding of influenza transmission from personal experience, that experience doesn't translate well into the statistics on transmission, for example.) I could imagine giving students the Ebola data only and asking them to draw some conclusions. Then I could give them the comparison data and ask how that added information alters their understanding of what is happening and how we might want to respond.

## Spurious Correlation

Tyler Vigen has some great examples of correlations that are surely not evidence of causation. One downside of the list of examples, however, is that they are all time series. The underlying problem is that the two time series considered are not stationary; they are both trending, which explains the high correlation. (The one exception is the example of the numbers of Nicolas Cage movies and people drowning after falling into swimming pools. I may be wrong, but to my eye those two series look stationary.)
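The nonstationarity point can be illustrated with a short simulation. This is only a sketch under assumed settings: two series generated independently as random walks with upward drift, with drift sizes, series length, and seed all chosen arbitrarily. Because both series trend upward, their levels are highly correlated even though neither has anything to do with the other. Taking first differences removes the trend, and the correlation should fall to roughly zero.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def random_walk_with_drift(rng, n, drift):
    """Series whose value each period is the previous value plus drift plus noise."""
    series, level = [], 0.0
    for _ in range(n):
        level += drift + rng.gauss(0, 1)
        series.append(level)
    return series


def first_differences(series):
    """Period-to-period changes, which removes the trend from a drifting series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Two independent, trending series (all settings are assumptions).
rng = random.Random(1)
n = 200
series_a = random_walk_with_drift(rng, n, drift=0.5)
series_b = random_walk_with_drift(rng, n, drift=0.4)

corr_levels = pearson(series_a, series_b)
corr_diffs = pearson(first_differences(series_a), first_differences(series_b))

print(f"correlation of levels:            {corr_levels:.2f}")
print(f"correlation of first differences: {corr_diffs:.2f}")
```

Students could be asked to change the drift to zero or reverse its sign in one series and predict what happens to the correlation of the levels before running it.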

One other interesting fact I learned from the site: The number of people who die by becoming tangled in bedsheets has more than doubled in the last decade to almost 800. What can explain this steadily growing national epidemic?!

## All Depends on How You Count

- How do we define unemployment? Do we want to adjust that measure for the underemployed?
- Is employment always employment? Does it matter what kind of job people hold?
- How does the current labor market compare with that at other times (in particular, the time before the financial crisis)?

## Models of Ebola

While many students will not have had differential equations, the intuition of the math should be accessible to most undergraduates.
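To give a sense of that intuition without the calculus, here is a rough sketch using the classic SIR (susceptible, infected, recovered) compartmental model, stepped forward with simple Euler updates. This is an assumed stand-in: I am not claiming it is the model used in any particular Ebola study, and the parameter values (`beta`, `gamma`, the population size) are hypothetical numbers picked for illustration, not Ebola estimates. The whole model is bookkeeping: in each small time step some susceptible people become infected, at a rate that depends on how many infected people they can meet, and some infected people recover.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Euler approximation of the SIR model.

    beta  : transmission rate per day (hypothetical value supplied by caller)
    gamma : recovery rate per day, so 1/gamma is the average infectious period
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s, i, r = float(s0), float(i0), 0.0
    population = s + i
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Illustrative parameters only: each infected person infects about
# beta / gamma = 3 others early in the outbreak.
history = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, days=200)

peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
final_time, final_s, final_i, final_r = history[-1]

print(f"peak infected: {peak_infected:.0f} on day {peak_time:.0f}")
print(f"ever infected by day {final_time:.0f}: {final_r + final_i:.0f}")
```

A useful classroom exercise is to lower `beta` until `beta / gamma` drops below 1 and watch the outbreak fizzle instead of grow, which is the same threshold idea that shows up in discussions of how contagious a disease is.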