Using computation to understand statistics through climatology

Janel Hanrahan, Atmospheric Sciences, Lyndon State College

Put simply, climate is the statistics of weather. To gain a true understanding of climate science, students must therefore have a sound understanding of the underlying statistics. For the Atmospheric Sciences degree program at Lyndon State College, students are required to take a two-semester Climate and Statistics course. Our approach is to teach these concepts simultaneously to allow students to directly apply statistical theory to climate as they are learning about it.

Seemingly straightforward statistical concepts are quickly complicated when applied in a 3-d setting. For example, when computing a mean in a basic statistics class, students are often presented with a list of numbers. They simply add them up and divide by the number of values. The same can be done with raw climate data, but the students must also decide what type of mean to compute. If students are given hourly averaged temperatures, for example, they may be asked to compute daily, monthly, annual, or climatological averages. Furthermore, such data exist for individual locations in space, both horizontal (geographical locations) and vertical (altitudes/pressure levels). Spatial means can thus also be computed in a variety of ways, including global, regional, and zonal. The same temporal and spatial options arise when computing anomalies, averaged departures, correlations between two or more datasets, etc. Such concepts are often too complicated for introductory statistics students to grasp, but they are fundamental to the science and are regularly presented without explanation in scientific journal articles.
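To make the distinction between these means concrete, here is a minimal sketch in Python (standard library only, rather than the MATLAB used in the course). The tiny temperature field, its latitudes, and all variable names are invented for illustration, and the cosine-of-latitude weighting is one common way to approximate an area-weighted global mean, not necessarily the method used in the assignment.

```python
import math

# Hypothetical toy field: temps[t][i][j] holds 2 time steps,
# 3 latitudes, and 4 longitudes (values in degrees C, made up).
lats = [-60.0, 0.0, 60.0]
temps = [
    [[-5.0, -4.0, -6.0, -5.0], [26.0, 27.0, 25.0, 26.0], [0.0, 1.0, -1.0, 0.0]],
    [[-3.0, -2.0, -4.0, -3.0], [28.0, 29.0, 27.0, 28.0], [2.0, 3.0, 1.0, 2.0]],
]

def mean(xs):
    """Plain arithmetic mean: add the values up and divide by their count."""
    xs = list(xs)
    return sum(xs) / len(xs)

n_t, n_lat, n_lon = len(temps), len(lats), len(temps[0][0])

# Temporal mean: average over the time dimension at every grid point.
time_mean = [
    [mean(temps[t][i][j] for t in range(n_t)) for j in range(n_lon)]
    for i in range(n_lat)
]

# Zonal mean: average the time-mean field along longitude,
# leaving one value per latitude.
zonal_mean = [mean(row) for row in time_mean]

# Global mean: weight each latitude by cos(latitude), because grid
# cells cover less area toward the poles.
weights = [math.cos(math.radians(lat)) for lat in lats]
global_mean = sum(w * z for w, z in zip(weights, zonal_mean)) / sum(weights)

# Unweighted mean of the same values, for comparison.
unweighted_mean = mean(zonal_mean)

print(zonal_mean, global_mean, unweighted_mean)
```

Even in this toy case the weighted and unweighted global means differ, which is the kind of choice a simple list of numbers never forces students to confront.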

I have found that by allowing them to practice such computations firsthand, students are better able to disentangle various spatial and temporal statistical methods. Following in-class discussion about theoretical statistical concepts, I ask students to compute values (e.g., mean and standard deviation) using small sets of random numbers, similar to what might be done in a pure statistics course. Next, I have them compute the same values in MATLAB so they can see how the functions work. I then have the students jump right into much more complex applications using real climate data. The added complication of dealing with large 3-dimensional datasets slows students down and forces them to think carefully about what it means to compute statistics along latitudinal, longitudinal, vertical, and/or temporal dimensions. While mapping may seem overwhelming for a student who has only been using MATLAB for a few weeks, I've found that the final product (a visually appealing figure) serves as a small reward for their effort. They are empowered knowing that they independently obtained data online and created a meaningful product.
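The progression from hand computation to built-in functions might look like the following sketch. It uses Python's standard library as a stand-in for MATLAB, and the sample values are made up. The hand-written standard deviation divides by n - 1 (the sample form), which is also the default normalization of MATLAB's std function.

```python
import math
import statistics

# Small made-up sample standing in for the random numbers used in class.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def mean_by_hand(xs):
    """Add the values up and divide by how many there are."""
    return sum(xs) / len(xs)

def sample_std_by_hand(xs):
    """Square root of the summed squared departures divided by n - 1."""
    m = mean_by_hand(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

hand_mean = mean_by_hand(values)
hand_std = sample_std_by_hand(values)

# The library functions should reproduce the hand calculations.
lib_mean = statistics.mean(values)
lib_std = statistics.stdev(values)

print(hand_mean, lib_mean)
print(hand_std, lib_std)
```

Seeing the two sets of results agree helps confirm what the built-in functions are doing before they are applied to much larger arrays.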

For one such assignment (see attached figure), students are instructed to obtain global climate data from an online source. In this example we use NCEP/NCAR Reanalysis 1000-mb temperature data. Before creating global maps, students must plot a time series of actual temperatures in a specified location (here we examine Minneapolis, MN) with the climatological mean over a specified period (upper left panel). This graphical representation is relatively straightforward for most students and thus allows them to start the assignment with something familiar. Next, they must subtract the climatological mean from all data points to obtain temperature anomalies, and then divide all points by the standard deviation to obtain standardized anomalies (upper middle and upper right panels, respectively). Students are then asked to discuss similarities and differences between the three figures. While students almost always start by saying that the figures are identical, they ultimately realize that the values do in fact differ. To promote this line of thinking, I often ask them to focus on one year that can be compared between the figures. Students are then instructed to compute the same values, but for just one year over the entire globe. In this example, we examine temperatures during 2015 (bottom three panels).
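The two transformations applied to the time series can be sketched as follows, again in Python's standard library rather than MATLAB. The annual temperatures are invented placeholders, not NCEP/NCAR Reanalysis values, and using the whole series as the climatological base period is an assumption made for brevity.

```python
import statistics

# Hypothetical annual mean temperatures (degrees C) at one location.
years = list(range(2006, 2016))
temps = [7.0, 8.0, 6.0, 7.5, 8.5, 7.0, 9.0, 6.5, 7.5, 8.0]

# Climatological mean and standard deviation over the base period
# (assumed here to be the full series).
clim_mean = statistics.mean(temps)
clim_std = statistics.stdev(temps)

# Anomaly: departure from climatology, still in degrees C.
anomalies = [t - clim_mean for t in temps]

# Standardized anomaly: departure in units of standard deviations.
standardized = [a / clim_std for a in anomalies]

# Compare one year across the three representations.
i = years.index(2012)
print(temps[i], anomalies[i], standardized[i])
```

All three series rise and fall in the same years, which is why the plots look identical at first glance. Only the values on the vertical axis change, from degrees, to degrees relative to the mean, to a unitless number of standard deviations.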

This exercise allows students to think about a seemingly simple atmospheric variable, temperature, in a variety of ways. While differences between the first three graphs may seem trivial, the spatial differences between the latter three are striking, even though all of the figures were generated using the same dataset. Ideally, students will conclude the following about different ways of visualizing temperature: 1) annual average temperature differences are dominated by changes in the north-south direction, 2) local temperature anomalies allow us to examine temporal departures from climatology (i.e., is a location warmer or colder than average), and 3) local standardized anomalies also allow us to account for average variability so we can determine whether a value is particularly unusual. For example, note the particularly warm water in the Equatorial Pacific due to a strong El Niño and unusually warm values around the globe during this record year. These observations could not be made through examination of raw temperature data alone.
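The third conclusion can be illustrated with a small, purely hypothetical comparison in Python. The two locations and all of their numbers are invented for illustration; the point is only that the same anomaly in degrees can be ordinary in one place and unusual in another.

```python
def standardized_anomaly(value, clim_mean, clim_std):
    """Departure from the climatological mean, in standard deviations."""
    return (value - clim_mean) / clim_std

# Hypothetical tropical ocean point: small year-to-year variability.
tropical = standardized_anomaly(value=28.0, clim_mean=27.0, clim_std=0.4)

# Hypothetical mid-latitude land point: large year-to-year variability.
midlatitude = standardized_anomaly(value=8.5, clim_mean=7.5, clim_std=2.0)

# Both points are 1.0 degree C above their means, yet the tropical
# point is many more standard deviations from its climatology.
print(tropical, midlatitude)
```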

Only once students have had the opportunity to grapple with this assignment for hours, or sometimes days, do they learn to appreciate the differences in statistical representations of these data. By engaging in such activities, students have had the full experience of computation on which they can reflect when presented with other statistical interpretations of climate data in the future.
