Fostering Strategy #5: Learners create data-driven visualizations to answer a question, test a hypothesis, or tackle a problem
(most recent update 24 January 2018)
Contributors: Jung Lee, Caleb Trujillo
Simplest (least time-consuming) format:
- Instructor provides access to numerical data.
  - For young learners, data could be in the form of a table on paper.
  - For older learners, data would normally be provided digitally, or learners would access online data repositories.
- Instructor poses a problem, asks a question, or states a hypothesis to be tested.
- Learners figure out which types of data visualization would help answer the question, solve the problem, or test the hypothesis. Various levels of scaffolding are possible for this step.
- Learners make the visualizations. Depending on audience and setting, the technology could be:
  - graph paper and pencil,
  - specialized software (e.g., ArcGIS),
  - general-purpose software (e.g., Excel),
  - an online data visualization tool (e.g., one for US Geological Survey stream data), or
  - a data analysis programming environment (e.g., R).
- Learners use the visualizations to develop and articulate an answer or solution.
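For older learners working in a general-purpose programming environment, the "make the visualizations" step might look like the minimal Python sketch below, which uses only the standard library to turn a small instructor-provided table into a text bar chart. The function name `text_bar_chart`, the column names, and the numbers are hypothetical and for illustration only; in practice learners would more likely use R, Excel, or a plotting library, and the sketch assumes the value column holds non-negative numbers.

```python
import csv
import io


def text_bar_chart(csv_text: str, label_col: str, value_col: str, width: int = 40) -> list[str]:
    """Return one text bar per table row, scaled so the largest value is `width` characters long.

    Assumes the value column holds non-negative numbers.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    values = [float(row[value_col]) for row in rows]
    largest = max(values)
    label_width = max(len(row[label_col]) for row in rows)
    lines = []
    for row, value in zip(rows, values):
        # Bar length is proportional to the value; an all-zero column gives empty bars.
        bar_length = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{row[label_col]:<{label_width}} | {'#' * bar_length} {value:g}")
    return lines


if __name__ == "__main__":
    # Hypothetical instructor-provided table; the numbers are made up for illustration.
    table = "site,mean_flow\nUpstream,12\nMidstream,30\nDownstream,45\n"
    for line in text_bar_chart(table, "site", "mean_flow"):
        print(line)
```

The chart itself is only the middle step: learners still have to read the bars against the question they were posed and articulate an answer.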
More ambitious (more time-consuming) variants:
- Students collect the data themselves rather than receiving it from the instructor.
- Students develop their own question to research.
Distinctions: This strategy differs from Strategy 1 in that learners make data-driven visualizations built from quantitative empirical or model-based data, whereas in Strategy 1 they make visualizations based on concepts, hypotheses, or theories (such as diagrams, flow charts, or concept sketches).
Examples:
- Students collect large data sets on leaf color change and leaf loss from different trees around campus over the course of the semester and store them in a database. As a final project, students analyze the data to uncover how seasonal change varies and visualize this variation in a report.
- Students gather data from a genomic database on attributes of different bacterial genomes and attempt to visualize the variation in GC content of the DNA.
- Students visiting the Gulf of Maine Research Institute tackle the problem of estimating a fish's age from its size. They learn how to determine a fish's age from the rings in its otolith (an ear stone in the fish's inner ear). They are then given a data set of fish size and otolith age and make a data visualization (graph) that can be used to estimate the age of fish for which only the size is known.
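The otolith example turns a graph into an estimation tool. One way to make the "read the age off the graph" step explicit is an ordinary least-squares straight line, sketched below in Python. This is our illustrative choice, not necessarily what the Institute's activity uses: a straight line is a simplification (growth usually slows in older fish), and the function names `fit_line` and `estimate_age`, the use of length as the size measure, and the toy numbers are all hypothetical.

```python
from typing import Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the sizes around their mean; zero means every fish has the same size.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all sizes are identical, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_age(size: float, slope: float, intercept: float) -> float:
    """Read an age off the fitted line, like reading a value off the graph by eye."""
    return slope * size + intercept


if __name__ == "__main__":
    # Hypothetical toy values for illustration only, NOT real measurements:
    # sizes as lengths in cm, ages in years as determined from otolith rings.
    lengths_cm = [12.0, 18.0, 25.0, 31.0, 36.0]
    ages_years = [1.0, 2.0, 3.0, 4.0, 5.0]
    slope, intercept = fit_line(lengths_cm, ages_years)
    estimate = estimate_age(28.0, slope, intercept)
    print(f"Estimated age of a 28 cm fish: {estimate:.1f} years")
```

Estimates are most trustworthy for sizes inside the range of the measured fish; reading the line beyond that range is extrapolation, which is itself a useful discussion point with learners.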
Affordances of the strategy/what it is good for:
- Learners work directly with data and engage in interpretation, analysis, and hypothesis testing the way a researcher does, experiencing the important elements of the process of scientific inquiry.
- The strategy emphasizes connecting questions and hypotheses to data and looking to data for answers.
- Students can check anecdotal claims, including their own ideas, against the data.
- Learners engage with appropriate technology for data analysis and visualization, where "appropriate" varies with the context and audience.
- The strategy lends itself to group work and discussion as students work through messy data and potentially confusing graphics. The data visualizations, in all their variations, provide the raw material for "discourse over materials."
- Reporting, writing, and communication with data is an inherent part of the task. In more ambitious variants, reports go beyond canned projects with known outcomes.
- Students engage in novel pursuits, because working with data and databases is often new to them.
- Making data visualizations prepares students for project-based learning.
- This same strategy can work across many disciplines, as many fields have databases that can be used for data-driven visualizations and projects. Ideally, instructors in different disciplines could coordinate pedagogical approaches so students encounter some consistency across courses.
- Programming, group work, and working with messy data are skills useful for many careers.
- In more ambitious variants, students may have opportunities to disseminate findings that could have an impact beyond the classroom.
- This strategy helps prepare students for interdisciplinary and team science learning opportunities, in which each student brings expertise in his or her own data types to tackle an interdisciplinary problem.
Potential Pitfalls & Challenges:
- Workflows can become long if students have to move data between several software packages.
- The data available may be limited or biased.
- Finding appropriate data is time-consuming for the instructor, the student, or both.
- Comes with the frustrations of doing real research.
- May require scaffolding and mentoring to guide students.
- May result in unanswered questions.
- Interpreting, analyzing, and connecting with the data is very demanding and can be frustrating; the instructor may need to attend to students' emotions and motivation so they keep working on these demanding skills.
- Time-consuming: can stretch into a week-long to semester-long project. Even the shortest format, where the question and data are provided by the instructor, takes much more time than merely having the instructor or textbook show and explain the visualization.
Research needs and open questions:
- This strategy works well across all disciplines of the sciences and social sciences, that is, any field in which claims are based on reasoning from data. There is a need for better understanding of whether and how experience making and interpreting data visualizations in one field supports (or potentially interferes with) doing the same in another field.
- Instructors using this strategy tend to focus on content learning goals that concern understanding the referent (the system from which the data were taken). There is a need for simple, quick instructional moves that insert a subtle nudge toward visualization mastery into a lesson that is mainly about the content discipline. For example, in a graphing activity, how can the instructor raise the question of how to decide what goes on the vertical axis and what goes on the horizontal axis, in a way that the insight generalizes to other x-y plots rather than merely serving the day's lesson?
- Proficiency in designing, making, and interpreting data visualizations can be built up over many years of schooling, from elementary school children with colored pencils and graph paper to doctoral students computationally manipulating complex data sets. There is a need for a well-articulated learning progression that encompasses this full trajectory of learning and is not tied to a specific content domain. Building blocks for such a progression can be found in the work of Tufte (2001) and Kosslyn (2006).
- Although much of the discussion in this group focused on advanced students in ambitious projects with complex datasets that they may have collected themselves, that is far too late to begin having students make their own data visualizations. There is a need to permeate the K-12 and introductory college curriculum with numerous and varied opportunities for students to make data visualizations and use them to answer questions and solve problems.
- In some educational settings, the only time students make their own data visualizations is when they collect data in the context of an inquiry. Because it takes considerable time to plan and execute an inquiry that generates substantial data, students' opportunities to create data visualizations may be infrequent and come rather late in the educational trajectory. With the vast stores of professionally collected data now available via the internet, there are ample opportunities for students to make insight-yielding data visualizations from existing data. We need good curricular models of how to make this pedagogically valuable, beginning as early as elementary school. One way to scaffold this learning curve is for the instructional designer to pre-select datasets that are known to have a high insight:effort ratio.
- What is the added value of having students create their own data visualization and then interpret it, above and beyond having the same students interpret a visualization provided by the instructor?
- What degree of noise in the data is appropriate to give students? Some data sources contain inherent problems, such as missing information, that novices may find difficult to recognize, qualify, or correct. On the other hand, real data are, in fact, messy.
- What are appropriate ways to evaluate student project work when the questions, hypotheses, and data are unique to each student or student group?
- Project-based learning often has students creating data visualizations in situations where the data and its implications are consequential for the students and/or stakeholders. In theory, this has the potential to deepen students' engagement with data and their desire to master ways to use data to answer questions and solve problems. How can we assess whether and how project-based learning leads to learning gains on data visualization mastery that will be transferable outside the specific context of that project?
- Under what circumstances does creating data visualizations strengthen learners' propensity to place a strong weight on data-based lines of evidence? When theory and data disagree, some learners have a tendency to discount the data and use the theory alone to formulate their interpretation. Likewise, when data and prior opinions disagree, some people stick with their opinions and disregard the data. Does making data-driven visualizations help with either of these situations, and if so, how?
- Students these days arrive at college with widely differing levels of data visualization and data interpretation mastery. What instructional practices can a college STEM instructor or college STEM program put in place to bring the less well prepared students into the mainstream of college STEM instruction?
Tools:
- Tableau Desktop (https://www.tableau.com/products/desktop)
- Gephi, an open-source tool (for network diagrams) (https://gephi.org/)
- ParaView (https://www.paraview.org/) or VisIt (for Scientific Visualizations) (https://wci.llnl.gov/simulation/computer-codes/visit/)
- D3.js (interactive visualization - Data Driven Documents) (https://d3js.org/)
- Microsoft Excel (https://products.office.com/en-us/excel)
- NodeXL (for network diagrams) (https://nodexl.com/)
- GIS software: ArcMap (http://desktop.arcgis.com/en/arcmap/), QGIS (http://www.qgis.org/en/site/about/index.html)
- Fathom (https://fathom.concord.org)
- GapMinder (gapminder.org)
- Chimera, for chemistry and biochemistry (https://www.cgl.ucsf.edu/chimera/)
References:
- Bowen, G. M. (2014). The Basics of Data Literacy: Helping your Students (and You!) Make Sense of Data. Arlington, VA: National Science Teachers Association.
- Cobb, P., & McClain, K. (2014). Guiding inquiry-based math learning. In K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 171-185).
- Kosslyn, S. M. (2006). Graph Design for the Eye and Mind. Oxford: Oxford University Press.
- Lehrer, R., & Schauble, L. (2002). Investigating real data in the classroom: Expanding children's understanding of math and science. New York: Teachers College Press.
- Roth, W.-M. (1996). Where is the context in contextual word problems?: Mathematical practices and products in Grade 8 students' answers to story problems. Cognition & Instruction, 14, 487-527.
- Roth, W.-M., & McGinn, M. K. (1997). Graphing: Cognitive ability or practice? Science Education, 81, 91-106.
- Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press.