Types of Data

Initial Publication Date: October 22, 2013

Processed data: Data can be "processed" in many senses. They may have been aggregated from other results (e.g., global temperature measurements originate from thousands of measurements at individual weather stations). Or the original data may have been cleaned or transformed to make them ready for quantitative analysis. Or the data may be "made up," as in "canned experiments," to ensure the data clearly demonstrate a particular trend or phenomenon. Usually the consequence of processing is to free students from significant time spent getting data into usable shape. That simplification allows students to focus on methodological or analytical questions.

Published research data: In some fields, published research data are ready for students to analyze. In other fields, "raw" research data often require some work before they are ready for analysis. In some cases, published datasets are massive, requiring selection of predetermined variables or multiple queries of the dataset to collect the data needed. While this work takes time, it teaches many lessons about the nature of the research process. For example, students learn that what one researcher calls "cleaning" another may deem "cherry-picking." Requiring that students confront the challenges of raw data allows them to weigh the many small choices (and related ethical implications) that must be made as raw data become final research results.

Simulations: Model-generated data are commonly used by professional scholars in their own work. By working with simulation data, particularly data that they have generated themselves, students can come to appreciate the power and limitations of modeling. For example, they might realize that models are at times the only way to generate answers to questions when systems are very complex or when predictions are needed about states of the world that do not presently exist. They also might come to see the critical role that assumptions play in interpreting the results of any model.

Data provided by civic partners: Civic engagement (CE) has been recognized among the key outcomes of a liberal education. (See, for example, the Association of American Colleges & Universities' LEAP initiative.) Many CE projects involve analyzing data collected by the civic partner. Sometimes students can be involved in creating the data collection plan. Other times, the data already exist and students are asked to contribute via problem-directed discovery.

Student-generated data: Students can collect data in surveys, draw on direct observation (using research instruments as crude as a ruler or as advanced as a mass spectrometer), or generate simulations (see the PhET Interactive Simulations for one example). The process of collecting data underscores the fact that measurements are inherently imprecise, and that this imprecision must inform subsequent analysis and interpretation. Student-generated data may also be used by other students. The GLOBE project is a particularly large example in which student measurements of environmental variables from around the world have been collected and made available for other students to analyze. But the principle can be applied on a single campus by having students in one course collect data that are subsequently analyzed in another.
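
As one illustration of how measurement imprecision can carry into analysis, the sketch below summarizes repeated measurements of a single quantity with a mean and a standard error. It is a minimal example under stated assumptions, not a prescribed method: the readings are hypothetical, the function name summarize_measurements is invented for this illustration, and the standard error is only one of several reasonable ways to express uncertainty.

```python
import math
import statistics


def summarize_measurements(values):
    """Return (mean, standard error of the mean) for repeated measurements."""
    if len(values) < 2:
        raise ValueError("need at least two measurements to estimate spread")
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 in the denominator).
    spread = statistics.stdev(values)
    # The standard error shrinks as the number of measurements grows.
    standard_error = spread / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical readings (in cm) of one object's length,
# taken by five students using a ruler.
readings_cm = [12.1, 12.3, 11.9, 12.2, 12.0]
mean_cm, se_cm = summarize_measurements(readings_cm)
print(f"length = {mean_cm:.2f} cm +/- {se_cm:.2f} cm (standard error)")
```

Reporting the estimate together with its uncertainty, rather than a single number, gives students in a later course something concrete to weigh when they analyze data collected by others.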