About DataSheets

What are DataSheets? | Why are DataSheets Important? | Who Can Use DataSheets?

What are DataSheets?

A DataSheet is a single-page educator-friendly description of a dataset. It provides the key pieces of information educators need to understand how a dataset relates to their curriculum and to start working with the data. DataSheets are also ideal locations from which to provide links to related research and educational materials such as existing classroom activities that use the data.

The DataSheet format provides a consistent framework for organizing this information. The particular fields and their presentation reflect insights from a number of discussions within the geoscience education community about the educational use of data.

The DataSheet Format


This section should include the title for the datasheet in one of the following formats:

Exploring 'x' in the classroom using 'y' data (where x is a topic and y is the source or type of data).
Example: Exploring Population Dynamics in the Classroom using National Marine Mammal Laboratory Data.

Exploring 'x' data in the classroom (where x is the data source and/or type).
Example: Exploring USGS streamflow data in the classroom.


This section should indicate who prepared the datasheet and acknowledge experts consulted or interviewed in the process of preparing the datasheet. For example:

This webpage was created for SERC by Heather Rissler in consultation with Bryan Dias of the Reef Environmental Education Foundation.

This section should contain links directly to the data as well as the homepage for the site providing the data (when available). A link to the data should be listed first, followed by a link to the homepage. For example:

Access Transect station data from the Phytoplankton Ecology Program (more info) at the Mote Marine Laboratory

This section should contain a brief summary of the data set described in the datasheet. The summary should include a brief description of the type of data, how it is presented, and its geospatial extent. There should be enough information for users to decide whether they are interested in exploring the data set. For example:

The site provides processed data in graphical form illustrating salinity, temperature, fluorescence, and density for a transect station in the Gulf of Mexico near Sarasota Springs, FL.

Use and Relevance

This section should discuss the importance of the data. It should concisely describe how scientists use this data: what questions it helps answer and how it helps answer them. It should describe why those questions are important to science as well as their relationship to issues affecting society more broadly. For example:

The Mote Marine Laboratory Phytoplankton Ecology Program focuses on microscopic plants in the oceans, many of which produce harmful toxins. The program has a particular focus on the marine dinoflagellate Karenia brevis responsible for the Florida red tide. Eating red tide infected shellfish can be fatal to humans. Red tides are controlled by a variety of factors including nutrient availability and viral infections (see Review). Scientists use data generated from the Phytoplankton Ecology Program to better understand conditions under which red tide blooms develop.

Use in Teaching

This section serves as a heading for the Teaching Topics and Teaching Skills sections below. This section should include a photo representing what the data looks like. It should include an introductory sentence of the form:
This data can be used to teach the following topics and skills in 'x' (where 'x' is one or more disciplinary areas). For example:

This data can be used to teach the following topics and skills in physical or environmental oceanography:

Teaching Topics

This section contains an unordered list of specific science topics that can be addressed with the data set. Topics are issues or questions that are typically addressed within one or two lecture periods or less. Links to any classroom activities that use this data set should be provided beneath the corresponding topic. These activities should also be listed again in the 'Education Resources' section. For example:

  • Harmful algal bloom dynamics and prediction methods
  • Temperature-depth relationships
  • Relationships between temperature, salinity, and density
  • Understanding the use of CTD casts in making oceanographic measurements

Teaching Skills

This section contains an unordered list of specific skills that students may exercise in working with the data set (such as interpreting vertical transect data and their representation on maps). Activities that can be used to teach these skills in the context of this data set should be linked beneath the relevant skill. These activities should also be listed again in the 'Education Resources' section. For example:

  • Using data to make hypotheses about factors that may induce algal blooms
  • Using hypotheses to make predictions about factors leading to algal blooms and testing these predictions
  • Using the data to make visualizations of temporal changes
  • Interpreting transect and vertical profile data and their representation on maps

Exploring the Data

Data Type and Presentation

This section should explain the nature of the data (e.g. raw, processed, or modeled) and how the data is presented (e.g. graphically or as a tab-delimited text file). For example:

Raw data is processed and represented as images in GIF format. Images (separate for each measured parameter) are archived for the years 1998 to 2004.

Accessing Data

This section should explain how to obtain the data. This should include specific guidance on how to find the data within the site and what exactly will be available when they reach the data. For example:

Data is provided as links to dates for CTD measurements. By choosing a specific date, users gain access to GIF files containing processed data in the form of maps that illustrate transect and vertical profile data.

Manipulating data and creating visualizations

This section should suggest ways in which students can manipulate the data to generate visualizations. Unless the nature of the data is such that only one process will work, it should explicitly state that these suggestions are only 'one way' that students might visualize the data. For example:

One way that students can process this data is to create graphs from raw data (provided in HTML tabular format and as tab-delimited text files) using a spreadsheet application such as Excel. Graphs could be used to visualize streamflow temporally and spatially and to display the relationship between gage height and streamflow. This data set could be combined with precipitation data sets to create graphical representations of streamflow-precipitation relationships.
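For students who prefer a script to a spreadsheet, the same kind of processing can be sketched in Python. This is only one possible approach, not a method prescribed by the data site: the column names (`date`, `gage_height_ft`, `discharge_cfs`) and every value in the sample are invented for illustration and are not real USGS measurements.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited sample. The column names and the values are
# invented for illustration; they are NOT real USGS measurements.
SAMPLE = (
    "date\tgage_height_ft\tdischarge_cfs\n"
    "2004-03-01\t2.0\t100\n"
    "2004-03-02\t2.5\t140\n"
    "2004-04-01\t3.0\t200\n"
    "2004-04-02\t3.5\t260\n"
)


def read_rows(text):
    """Parse tab-delimited text into a list of dicts with numeric fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "date": row["date"],
            "gage_height_ft": float(row["gage_height_ft"]),
            "discharge_cfs": float(row["discharge_cfs"]),
        }
        for row in reader
    ]


def monthly_mean_discharge(rows):
    """Average the discharge values for each YYYY-MM month in the rows."""
    by_month = defaultdict(list)
    for row in rows:
        by_month[row["date"][:7]].append(row["discharge_cfs"])
    return {month: sum(vals) / len(vals) for month, vals in by_month.items()}


rows = read_rows(SAMPLE)
means = monthly_mean_discharge(rows)
```

The resulting `means` dictionary (one average per month) could then be graphed over time, and the paired `gage_height_ft` and `discharge_cfs` values in `rows` could be plotted against each other to show the relationship between gage height and streamflow.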

Tools for Data Manipulation

This section should describe tools that can be used to work with the data. When possible provide information on obtaining the tools and links to relevant tutorials and tool documentation. This section should also indicate whether or not there are data manipulation tools integrated into the data site. For example:

The USGS site does not provide tools for data manipulation. Raw data can be downloaded and imported into a spreadsheet application such as Excel for further processing. The Starting Point site provides a tutorial for using Excel. Surf your Watershed is an example from Integrating Research and Education that guides users through the EPA's Surf your Watershed tool, which incorporates data from multiple sites, including USGS streamflow data.

Acronyms, Initials, and Jargon

List and define acronyms, initials, or discipline-specific jargon users will encounter.

RAMP = Radarsat Antarctic Mapping Project

About the data

Collection methods

This section should provide details on how the data is collected (including information on instrumentation, transmission of data, and post-processing of data). For example:

Collection methods have varied historically. The U.S. Geological Survey uses stream-gaging systems to measure water height, with data being transmitted to stations via telephone or satellite. Manual methods for directly measuring or inferring streamflow (discharge) data from gage height have been replaced by acoustic Doppler current profilers that use sound waves to measure velocity, depth, and path (which are used to calculate streamflow rates).

Limitations and sources of error

This section should describe limitations and sources of error related to data collection and processing, as well as limits inherent in any underlying model or representation (e.g. there may be factors relevant to the underlying scientific question that the data set does not explicitly address). It should indicate how these limits circumscribe the applicability of the data set and the conclusions drawn from it. When applicable, provide a link to a section of the data site or a reference to a paper discussing error in the particular data set. For example:

Limits to this data vary historically as current methods for directly measuring discharge offer an alternative to inference of this parameter. The article 'Stream Flow Measurement and Data Dissemination Improve' discusses issues related to streamflow data quality.

References and Resources

Scientific references that use this dataset

This section should contain an unordered list of scientific references (research articles) which use or are about the data set described in the datasheet. Up to 5 key research articles should be provided, or when applicable a link to a bibliography of the data set can be provided. For example:

A bibliography (link) is available highlighting publications from the Broadband Seismic Data Collection Center.

Education resources that use this dataset

This section should contain an unordered list of educational resources: references to papers or links to websites that describe using the data in the classroom, or that describe activities using the data. These resources are also included with the appropriate skills and topics in the "Use in Teaching" Section. For example:

Education and Outreach Based on Data from the Anza Seismic Network in Southern California is an article from Seismological Research Letters that describes collaborations amongst scientists and the community to provide earthquake education for the public and local school communities.

Other related scientific references

This section should contain an unordered list of scientific references: review articles or research articles that discuss topics and concepts related to the data set or similar data sets. These articles should be relevant to users who are working with the data set and need additional background on the related science. For example:

  • Earthquake prediction: A seismic shift in thinking is an article from Nature that discusses the debate regarding accuracy in predicting earthquakes.
  • Mantle Convection and Plate Tectonics: Toward an Integrated Physical and Chemical Theory is an article from Science that reviews the physics of plate tectonics.

Other related education resources

This section should contain an unordered list of broader pedagogical references: papers and links describing activities or pedagogical approaches that cover the same science topics addressed by the data set, or address pedagogical concerns relevant to working with the data of this type. For example:

  • The Broadband Seismic Data Collection Center maintains an education section with activities of relevance to students and teachers.
  • The Earth Exploration Toolbook has a chapter on Investigating Earthquakes: GIS Mapping and Analysis that uses USGS and IRIS data to conduct GIS analyses. Users interpret earthquake distribution and activity and analyze the potential for predicting future earthquakes.

Related Links

This section should contain an unordered list of any additional websites that may be helpful for users who are interested in the data set described in the datasheet. For example:

  • The Seismological Society of America website contains information on earthquakes and a collection of issues related to teaching about earthquakes.
  • The USGS Earthquakes Hazard Program provides earthquake data and educational activities.
  • An earthquake preparedness fact sheet is available from FEMA.
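
Because every DataSheet follows the same set of sections, the format can also be thought of as a fixed record. The Python sketch below is one hypothetical way to represent it, for example to check which sections of a draft are still empty. The class, field, and function names are assumptions made for this illustration and are not part of the DataSheet format itself.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DataSheet:
    """Hypothetical container mirroring the DataSheet sections described above."""
    title: str = ""
    author: str = ""
    data_links: List[str] = field(default_factory=list)
    summary: str = ""
    use_and_relevance: str = ""
    teaching_topics: List[str] = field(default_factory=list)
    teaching_skills: List[str] = field(default_factory=list)
    data_type_and_presentation: str = ""
    accessing_data: str = ""
    manipulating_data: str = ""
    tools: str = ""
    acronyms: List[str] = field(default_factory=list)
    collection_methods: str = ""
    limitations: str = ""
    scientific_references: List[str] = field(default_factory=list)
    education_resources: List[str] = field(default_factory=list)
    related_scientific_references: List[str] = field(default_factory=list)
    related_education_resources: List[str] = field(default_factory=list)
    related_links: List[str] = field(default_factory=list)


def missing_sections(sheet):
    """Return the names of sections that are still empty in a draft."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name)]


# A partially completed draft, using text from the examples above.
draft = DataSheet(
    title="Exploring USGS streamflow data in the classroom",
    tools="Raw data can be downloaded and imported into a spreadsheet "
          "application such as Excel for further processing.",
)
todo = missing_sections(draft)
```

Here `todo` lists every section the draft has not yet filled in, which is one simple way a consistent format makes a collection of DataSheets easier to review.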

Why are DataSheets Important?

"As the role of data in our world grows, it is increasingly important that students be empowered to use data and to overcome any sense of intimidation in the face of data. Students on their way to becoming informed voters, consumers, citizens, and scientists must develop a strong understanding and facility for using data." (From Manduca, C. and Mogk, D. (2002). Using Data in the Classroom: Workshop; Carleton College.)

Developing public understanding of complex scientific phenomena requires that scientific data be connected with pedagogy and materials that support learning about and with data. The wealth of data that is currently available gives faculty new opportunities to engage students in the practice of science in the classroom. However, there are major barriers to realizing these opportunities. Sites with rich primary scientific data are usually organized for use by an audience of professional scientists, often experts within a particular sub-discipline. These sites assume deep expert knowledge on the part of their users.

Educators (and scientists working with data outside their field of expertise) often lack the tacit contextual information and specific technical insights that allow expert users to work meaningfully with the data. The data is often presented without any explicit connection drawn to the larger scientific framework in which it is relevant, as this information is obvious subtext for expert users. Effective use of the data is often dependent on technical details and idiosyncrasies familiar to the expert community but not clearly documented in a manner transparent to other educators and scientists.

In recognition of these challenges, members of the geoscience education community working through the DLESE Data Access Working Group (DAWG) identified the following criteria for data sites that wish to be maximally accessible to educators:

  • Data site allows users to easily find and use appropriate data of interest.
  • Data site allows users to ascertain the quality of data and determine the impact of data quality on the certainty of their conclusions.
  • Data is provided in ways that facilitate manipulation through a variety of tools.
  • Data site supports, through these tools, data manipulation to answer questions by using data contained within the site or combined with data from other sites, generating appropriate visualizations, or comparing students' own data to that in the site.

More detailed information on each criterion can be found at the Data Site Criteria for Education section of the Using Data Portal.

Since reworking all data sites to align with these guidelines is impractical, and because most of the recommendations speak to the need for additional contextual information, the DataSheet format was created. It represents a formal structure for presenting the minimal set of additional contextual information that educators might need to understand a particular data site. By using a consistent format, a broad collection of DataSheets can be created that educators can easily mine for datasets that align with their curricular goals.

Who Can Use DataSheets?


Educators

Educators can use DataSheets to discover new data sets they might use with their students, gain guidance on effective use of data sites they may have discovered through other routes, and find existing activities and resources related to a particular type of data.


Students

DataSheets may be a useful tool for advanced students who are doing independent exploration of data or who want deeper background and access to the authentic science behind what they are learning in the classroom. The larger collection of DataSheets also provides a uniquely accessible window into the scope of current scientific work in some areas.


Scientists

DataSheets are a natural starting point for scientists beginning work in an area outside their field of specialization. And of course, for scientists involved in data-producing projects, the creation of a DataSheet is a relatively simple step toward making their work more accessible to a larger audience.

The Public

DataSheets provide any curious individual with a low-barrier starting point for understanding the relevance of ongoing science projects and an opportunity to start to explore the real numbers behind the science.

Creating DataSheets

If you are an educator with experience using a particular data set, or are involved in a project with a data-rich website, and would like to be involved in the creation of a DataSheet, please get in touch.