Project EDDIE: Environmental Data-Driven Inquiry & Exploration


Scientists are increasingly using sensor-collected, high-frequency, and long-term datasets to study geological and environmental processes. Our interdisciplinary team of faculty and research scientists has developed flexible classroom modules that give undergraduate students hands-on experience with these kinds of real-world data. The modules use large, long-term, high-frequency, and sensor-based datasets, and they can be adapted to introductory, mid-level, and advanced courses. They are designed to meet a series of pedagogical goals, allowing students to: (i) manipulate large datasets to conduct real-world, inquiry-based investigations; (ii) develop reasoning about statistical variation; and (iii) become excited about first-hand experiences with the scientific process. Each module requires students to obtain data from online sources, such as discharge and water quality data from the US Geological Survey, ecosystem carbon dioxide flux data from FLUXNET, lake temperature data from the Global Lake Ecological Observatory Network, and seismic data from the Incorporated Research Institutions for Seismology.

Project Goals

Our objective is to develop stand-alone, modular classroom activities for undergraduate students that use large, long-term, and high-frequency datasets and are framed by the following pedagogical goals:

  1. Develop skills required to manipulate large datasets.
  2. Conduct inquiry-based investigations.
  3. Develop students' reasoning about statistical variation.
  4. Engage students in authentic scientific discourse.
  5. Foster conceptions about the nature of environmental science.

What are Large Data Sets?

Across science and engineering fields, the analysis and synthesis of large datasets is increasingly common. In many ways, the environmental sciences, including earth science and ecology, are undergoing an "informatics" revolution, with networks of sensors and people generating unprecedented amounts of data at a range of spatial and temporal scales.

Both long-term and high-frequency datasets are typically large and complex, containing many variables, multiple sites, missing data points, and incorrect sensor readings. Some large datasets are long-term records collected manually over many years. Others are generated by automated, sensor-based systems (Schimel & Keller, 2015; Benson et al., 2009), which are increasingly used to record multiple parameters at high frequencies (readings every 15 minutes or more often) over long time spans (years). These sensor records document environmental change and are essential research and monitoring tools. Sensor technologies now collect high-frequency data on ecologically relevant variables ranging from soil moisture to stream conditions, and are even used to correlate animal movements with environmental conditions.

From a practical perspective, a large dataset is one that contains more data than can easily be viewed on a single computer screen, so initial exploration requires software commands and graphing rather than visual inspection of the raw values. Young scientists should therefore have opportunities to learn how to manage, analyze, and interpret large datasets.
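To make the scale concrete: at one reading every 15 minutes, a single sensor records 96 values per day, or 35,040 values per parameter in a non-leap year. The Python sketch below illustrates that arithmetic and two routine first steps in cleaning such a record: finding gaps where expected readings are missing, and flagging readings outside a plausible range. It is an illustrative sketch, not code from the EDDIE modules; the function names, the sample temperature values, and the 0 to 40 °C bounds are hypothetical choices.

```python
from datetime import datetime, timedelta

# Sampling interval taken from the text: one reading every 15 minutes.
INTERVAL = timedelta(minutes=15)


def readings_per_year(interval=INTERVAL, days=365):
    """Count the readings one sensor records for one parameter over `days` days."""
    return int(timedelta(days=days) / interval)


def find_gaps(timestamps, interval=INTERVAL):
    """Return (before, after) timestamp pairs that are more than one interval apart.

    Each pair brackets one or more missing readings. Assumes `timestamps` is sorted.
    """
    return [
        (earlier, later)
        for earlier, later in zip(timestamps, timestamps[1:])
        if later - earlier > interval
    ]


def flag_out_of_range(values, low, high):
    """Return the indices of readings outside the plausible range [low, high].

    The bounds are hypothetical and must be chosen for each sensor and site.
    """
    return [i for i, v in enumerate(values) if not low <= v <= high]


# Hypothetical water-temperature record (deg C): the 00:45 reading is missing,
# and -99.9 stands in for an obviously incorrect sensor value.
start = datetime(2015, 6, 1, 0, 0)
times = [start + i * INTERVAL for i in (0, 1, 2, 4, 5)]
temps = [18.2, 18.1, 18.0, -99.9, 17.8]

print(readings_per_year())                  # 35040
print(find_gaps(times))                     # one gap, between 00:30 and 01:00
print(flag_out_of_range(temps, 0.0, 40.0))  # [3]
```

In practice, the plausible range would be chosen from knowledge of the sensor and site; the fixed bounds here are only for illustration.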

Where Do Our Data Sets Come From?

The table below lists current module topics and data sources.

Theme | High-frequency/long-term datasets and online sources
Climate Change: Atmospheric CO2 | NOAA Earth System Research Laboratory: Mauna Loa CO2
Lake Ice Phenology | National Snow and Ice Data Center: Global Lake and River Ice Phenology
Climate Change: Air Temperatures | NOAA Earth System Research Laboratory: NCEP/NCAR Reanalysis Air Temperatures
Flood Frequency | USGS: River Discharge
Food Webs | Global Lake Ecological Observatory Network: Chlorophyll and Cyanobacteria
Climate Change: Physical Limnology | Global Lake Ecological Observatory Network: Temperature Profiles
Soil Respiration | FLUXNET: Atmospheric CO2; Global Lake Ecological Observatory Network: Dissolved Oxygen


To assess whether the modules met these pedagogical goals during the 2014-15 and 2015-16 academic years, we administered pre- and post-module student questionnaires. The responses allowed us to determine whether our modules were effective at engaging students and increasing their quantitative skills, and to revise the modules before widespread online dissemination in 2016. Our initial results suggest that students who completed an EDDIE module had significantly improved spreadsheet skills, an increased understanding of how to use large datasets, and a greater appreciation for the value of high-resolution and long-term data. Thus, in addition to developing critical data management skills, working with large datasets reinforces the real-world application of basic geological and environmental concepts.

Project Partners

Cary Institute
Fairfield University
Whitman College

University of Arizona
University of Colorado Boulder
Queens College
SUNY New Paltz

Virginia Tech

Project Support

Project EDDIE is supported by funding from NSF DEB 1245707 and by the National Association of Geoscience Teachers.
