# NNN Blog

## Compared to What: Infectious Disease Edition

The Washington Post has a great online infographic comparing attributes of the spread of Ebola to those of more common diseases such as chicken pox or influenza. The site really drives home how important it is to provide context when presenting data to people who are not intimately familiar with the topic. (While I have some understanding of influenza transmission from personal experience, that experience doesn't translate well into the statistics on transmission, for example.) I could imagine giving students the Ebola data only and asking them to draw some conclusions. Then I could give them the comparison data and ask how that added information alters their understanding of what is happening and how we might want to respond.

## Spurious Correlation

Tyler Vigen has some great examples of correlations that are surely not evidence of causation. One downside of the examples, however, is that they are all time series. The underlying problem is that the two series in each pair are not stationary; both are trending, which explains the high correlation. (The one exception is the example pairing the number of Nicolas Cage movies with the number of people who drowned after falling into swimming pools. I may be wrong, but to my eye those two series look stationary.)
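The mechanics are easy to show in class. Here is a minimal simulation (my own illustration, not Vigen's data): two series that are built to be completely unrelated, but both trend upward, so their levels are highly correlated. First-differencing removes the trends and the "relationship" disappears.

```python
import random

random.seed(0)

def pearson(x, y):
    """Pearson correlation coefficient, computed by hand."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

n = 50
# Two series with no causal link: each is just a time trend plus noise.
series_a = [t + random.gauss(0, 3) for t in range(n)]
series_b = [2 * t + random.gauss(0, 5) for t in range(n)]

# First differences strip out the common trend.
diff_a = [series_a[t + 1] - series_a[t] for t in range(n - 1)]
diff_b = [series_b[t + 1] - series_b[t] for t in range(n - 1)]

print(round(pearson(series_a, series_b), 2))  # high: spurious, trend-driven
print(round(pearson(diff_a, diff_b), 2))      # near zero after differencing
```

This is exactly why the Nicolas Cage example stands out: if both series really are stationary, the correlation can't be blamed on a shared trend.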

One other interesting fact I learned from the site: The number of people who die by becoming tangled in bedsheets has more than doubled in the last decade to almost 800. What can explain this steadily growing national epidemic?!

## All Depends on How You Count

- How do we define unemployment? Do we want to adjust that measure for the underemployed?
- Is employment always employment? Does it matter what kind of job people hold?
- How does the current labor market compare with that at other times (in particular, the time before the financial crisis)?

## Models of Ebola

While many students will not have had differential equations, the intuition of the math should be accessible to most undergraduates.
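One way to make the math concrete without requiring a differential equations course is to discretize a standard SIR compartmental model (the usual starting point for epidemic models; the parameter values below are purely illustrative, not calibrated to Ebola) and step it forward in a loop:

```python
# A minimal SIR model, integrated with small Euler steps.
# S, I, R are fractions of the population: susceptible, infected, recovered.
# beta and gamma are illustrative, not estimates for any real disease.
S, I, R = 0.999, 0.001, 0.0
beta, gamma = 0.3, 0.1      # transmission rate, recovery rate (R0 = 3)
dt, days = 0.1, 300

for _ in range(int(days / dt)):
    new_infections = beta * S * I * dt
    new_recoveries = gamma * I * dt
    S -= new_infections
    I += new_infections - new_recoveries
    R += new_recoveries

print(round(R, 2))  # share of the population ever infected
```

Students can change `beta` or `gamma` and watch the epidemic curve respond, which delivers the intuition of the differential equations without the formal machinery.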

## Does the Internet Matter to GDP?

This study is a great tool for teaching about multivariate regression in general and fixed effects in particular. Students will quickly recognize the difficulty in finding good data to answer this question. If you examine the correlation between internet speed and income you will undoubtedly find a positive correlation, but how can you possibly interpret this as a causal effect? Shouldn't we expect to find that high-income people opt for nicer things of all types...including internet service? That logic suggests a good part of this correlation is due to reverse causality with higher income leading to better internet speed.

The Analysis Group folks aren't foolish and so they don't simply look at the cross-sectional correlation. Instead, they create a panel of data which records internet service and GDP values for 2011 and 2012. They then run a regression with city and time fixed effects. For those of you not familiar with the terminology of "fixed effects models," this is just shorthand for saying that they included categorical dummy variables to capture city-specific and time-specific effects. The city-specific dummy variables effectively control for all of the factors about a city which are constant across time. What this means is that the fixed effects model looks only at the correlation between the *change in* high speed access and the *change in* GDP per capita. In other words, the analysis focuses entirely on cities which changed internet service status between 2011 and 2012 to see whether income growth in these cities differs from the economic experience of all other cities. The authors get it exactly right when they summarize their work: "We found that in [metro areas] where gigabit broadband service was introduced between 2011 and 2012, GDP per capita levels were significantly higher." The effect size was 1.1 percent with a standard error of 0.7 percent.
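With only two periods, the fixed effects logic reduces to something students can compute by hand: compare the GDP change in cities that switched status against the GDP change everywhere else. Here is a sketch with an invented toy panel (all numbers are mine, not the study's data):

```python
# Hypothetical two-period panel: city -> {year: (has_gigabit, gdp_per_capita)}.
# Every value below is invented purely to illustrate the mechanics.
panel = {
    "A": {2011: (0, 50.0), 2012: (1, 53.0)},  # the lone switcher
    "B": {2011: (0, 48.0), 2012: (0, 49.0)},
    "C": {2011: (1, 60.0), 2012: (1, 61.5)},
    "D": {2011: (0, 45.0), 2012: (0, 46.2)},
}

# With two periods, city dummies wipe out everything constant within a
# city, so the treatment coefficient is just the mean GDP change among
# status-changers minus the mean change among everyone else.
switch_changes, other_changes = [], []
for years in panel.values():
    (g11, gdp11), (g12, gdp12) = years[2011], years[2012]
    change = gdp12 - gdp11
    (switch_changes if g12 != g11 else other_changes).append(change)

effect = (sum(switch_changes) / len(switch_changes)
          - sum(other_changes) / len(other_changes))
print(round(effect, 2))  # estimated "gigabit effect" in this toy panel
```

The sketch also previews the practical worry below: the whole estimate hinges on the `switch_changes` list, which here contains a single city.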

From a conceptual view, this sounds great. If economic growth is particularly strong in cities in the year(s) following an advance in internet service, that would be stronger evidence than a mere correlation between the two variables. (Of course, we could still critique this approach. If you anticipated increased economic activity, wouldn't that increase your willingness to invest in internet service? Isn't this forecasting exactly what we hope and expect firms and municipalities to be doing? If that is what is going on, then the fact that an advance in internet service precedes exceptional GDP growth should be of little surprise and hardly evidence for the benefits of computing technology.)

But from a practical view, the fixed effects approach is often problematic because it ends up resting the entire argument on a few very odd cases. In this particular instance, Analysis Group reports that only 14 of the 55 municipalities in their sample were in the high-speed group. They don't say whether this was in 2011 or 2012, so I will assume it is the latter. But the really important question is: how many of those 14 were not high-speed in 2011? How many changed their internet service status? Given that they don't report how many areas switched from slow to fast service, we can only guess. It seems reasonable to guess the number is less than 5. That means the entire argument rests on the experience of a small number of (by construction) highly unusual cities.

Is that really better than working from the admittedly biased results based on a cross-sectional analysis? I'd love to engage advanced students in debating this question as a means to a deeper appreciation of the power and limitations of fixed effects models.