Fostering Strategy #8: Learners use a model to make a prediction, and then use a data-driven visualization to test their prediction.

(most recent update 24jan2018)

Contributors: All Fostering Strand participants worked on this strategy, in small groups by discipline. Physical Sciences/Engineering Group: Yuen-ying, Jung Lee, Alexey Leontvey, Sally Wu, Melissa Zrada; Life Sciences Group: Gayle Bowness, Pamela Marshall, Caleb Trujillo, Tiffany Herder; Earth & Space Science Group: Vetria Byrd, Elizabeth Joyner, Sarah Klain, Bob Kolvoord


  • Learners begin with some kind of model of a system of importance in the domain of study.
    • This can be a conceptual model that the learners have studied previously.
    • Or it can be a model that is provided to learners as part of this activity, in the form of a diagram, simulation, computational model, a physical model, or a mathematical model.
  • Based on the model, learners make a prediction about what data from the system will look like under a set of circumstances that have not yet been explicitly studied. Learners "sketch their hypothesis" to show what they think the data will look like.
  • Learners examine empirical data visualizations and evaluate whether the data align with the prediction. Data visualizations can be either provided by the instructor or created by the students.
  • If the data are found not to align with the model, learners (alone or with the instructor) try to diagnose why: Was the model wrong or incomplete? Was the model misunderstood? Was the logic by which the prediction was generated from the model flawed? Were the data somehow inadequate?

Examples:


  • Psychology: given serial and parallel models of mental search, predict the time needed to verify whether an item was in a to-be-remembered set of items, for a range of set sizes (GRC workshop model and data.pptx)
  • Earth Science: given a plate tectonic model, predict what kinds of earthquakes are likely to occur in different regions
  • Space Science: predict what will happen to the amplitude of light from a star as an exoplanet passes between Earth and the star (Gould et al., 2012).
  • Earth Science: given a topographic map (a spatial model of a terrain), predict where water will flow
  • Atmospheric Science: given a weather forecasting model, predict a severe weather event, and then use historical data to test the prediction
  • Environmental Science: predict air and water quality, given a model that incorporates natural and anthropogenic factors.
  • Biochemistry: Students pull out AT/GC base pairs from a database and use bond energies to make predictions about the proportion of pairs in bacteria that are harmful to humans. Students plot the data on graphs to help test their predictions.
  • Biology: Use a conceptual model of the cell and cell membrane to predict what will happen when cells are put into different solutions: hypertonic, hypotonic, and isotonic. Sketch your predictions instead of writing them out.
  • Biology: Use a model of photosynthesis to predict what will happen to the pH of water in 4 jars under different light conditions, with (a) water, (b) plant, (c) snail, and (d) plant & snail.
  • Biology: Use a model of photosynthesis and metabolism to predict the order, by weight, of petri dishes holding dry seeds, wet/dark seeds, and wet/light seeds. Students tend to think that the photosynthesizing dish (wet/light) will be the heaviest, but the germinating seeds are consuming mass from the seed.
  • Climate: Given a model of climate warming, predict how species distribution will change.
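
As a rough illustration of the predict-then-test cycle in the psychology example above, the sketch below compares two competing predictions against observations. Everything in it is an assumption made for illustration: the function names, the timing parameters (base_ms, per_item_ms), the simplification that a parallel search takes the same time regardless of set size, and the observation values, which are invented placeholders rather than results from the workshop file or from any experiment.

```python
import math

def serial_prediction(set_size, base_ms=400.0, per_item_ms=40.0):
    """Serial search: every item in the set adds one comparison step."""
    return base_ms + per_item_ms * set_size

def parallel_prediction(set_size, base_ms=400.0):
    """Parallel search, simplest form: time is the same for any set size."""
    return base_ms

def rmse(predicted, observed):
    """Root-mean-square difference between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Placeholder observations, invented for this sketch (NOT real data).
set_sizes = [1, 2, 3, 4, 5, 6]
observed_ms = [445.0, 478.0, 525.0, 556.0, 603.0, 641.0]

serial_error = rmse([serial_prediction(n) for n in set_sizes], observed_ms)
parallel_error = rmse([parallel_prediction(n) for n in set_sizes], observed_ms)
better_model = "serial" if serial_error < parallel_error else "parallel"

print(f"serial RMSE:   {serial_error:.1f} ms")
print(f"parallel RMSE: {parallel_error:.1f} ms")
print(f"closer to these placeholder observations: {better_model}")
```

In a classroom, a numeric comparison like this would normally follow the visual comparison of the sketched hypothesis with the plotted data, not replace it.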

Affordances of this strategy/what it is good for:

  • This strategy forces students to make a claim and back it up with data (critical thinking), and use visualizations at both the stage of making the claim (sketching the hypothesis) and at the stage of testing the claim (data-driven visualization).
  • The general benefit of using visualizations to help test predictions is that some of the cognitive burden can be shifted to the perceptual system. Assessing predictions can be easier if the match between model and data manifests as visually-available parallels.
  • Making a claim can motivate/engage the learner.
  • Students have an opportunity to grapple with the messiness of real data, and learn that real data do not exactly match the idealized simplicity of the model. This in turn gives an opportunity to learn about sources of departure from model, including actual variability in the phenomenon being measured and noise introduced during the measurement process.
  • This strategy has the learner begin with a model and go from there to data. Much of our instruction has the learner going from data to model. It is good to have learners practice the entire cycle of scientific modeling.
  • We hypothesize that having the learner make a prediction (and express it as a drawing) may prepare them in some important way for viewing the data visualization. Potentially they see something more in the data visualization than they would have if they had just been given the data visualization without the prediction step, because their attention is focused on specific places where a meaningful pattern may (or may not) be emerging.
  • This strategy allows students to take more ownership of the learning. Learners make a prediction, and they check their prediction against reality, not just against a right answer provided by a teacher or other authority.
  • It is possible to set up this strategy in such a way that the teacher doesn't know the answer in advance, and moreover the learners know that the teacher does not know. This adds an element of suspense, which helps with engagement, and makes the activity more like real science. The teacher should have a strong knowledge of the model and of the attributes of the data type being used for the comparison, so as to be able to guide instruction well when the not-previously-seen data emerges.
  • Learners can experience that a model that seems right and logical does not, in fact, align with reality as captured in empirical data. The appropriate grounds on which to judge a scientific model is not its elegance, but rather its ability to reproduce the behavior of the real world as captured in data.
  • This strategy can be used across many concepts and domains and many data types.
  • This can be a good strategy for teaching systems and inter-relationships: change one variable and predict how the system will be impacted. Then compare with data.
  • This strategy can be used before a lab to predict what will happen; the students' own data are then used for the test.

Potential pitfalls & challenges:

  • This strategy is difficult for the instructor or instructional designer to set up, as it requires finding an appropriate model and an appropriate data set, which are well matched. There are plenty of interesting phenomena for which either the data is not available or models are not adequate (e.g. collapse of the Greenland ice sheet).
  • Building up enough understanding of the model so that students can make a prediction can be time-consuming. In the absence of this understanding, predictions may be faulty.
  • Real-world data are messy. Students want "clean" data that look just like the model. They may have trouble recognizing a match between model output and empirical data, not because the model is wrong but because the data are messy.
  • Learners might not be convinced about the fit of the model to the data when using visualizations. There is no quantitative way to judge the adequacy of the fit between the sketched hypothesis and the visualized data.
  • Working with messy data can be time-consuming and frustrating.
  • If a concept-driven visualization is used to convey the model, learners may have false confidence that they understand the model when they really do not. In this case, their prediction may be flawed. The flaw may or may not emerge at the data/prediction comparison step.
  • Sometimes there are competing models (e.g. hurricane prediction, weather forecasting using US and European models), so it is not a simple case of testing whether a single model is right or wrong.
  • It's a puzzle to know how the authenticity and credibility of the model should be presented to the learners. Usually scientific models are presented to students as a good and reliable product of the scientific process, or they wouldn't be provided in a STEM class. But if we want students to authentically feel that they are testing a model, then the model has to be presented as provisional or unproven; how best to do that?
  • This strategy is difficult to assess. It is not enough to score whether learners said that the data do or do not confirm the prediction. For the most ambitious usage of this instructional strategy, you also need to assess whether their prediction was reasonable, logical and grounded in available evidence. And then you need to assess whether they set up a rigorous data-based test of their prediction, with appropriate types and quantities of data. And finally, you need to assess how well they explained and defended the outcome of their test.
  • The instructional designer has to decide how much scaffolding to provide, aiming for enough but not too much. With more scaffolding, there will be less variation between answers, making scoring easier. (For example, if you provide the axes for the graph, all correct answers tend to look the same, while incorrect answers are obvious outliers. But if students determine their own graphing parameters, correct answers could differ widely.)

Emergent insights:

  • This strategy is a subset of a broader instructional strategy called Predict-Observe-Explain (P-O-E) (Haysom & Bowen, 2010; Kastens & Krumhansl, 2017). The power of P-O-E is thought to arise, at least in part, because (a) the students are more invested in the outcome, (b) they have been primed to look for specific significant events or structures during the observation step, and (c) their work (prediction) is being tested by comparison against reality, rather than against the judgement of a teacher or other authority. Using visualizations at the "Predict" step ("Sketch your hypothesis") and the "Observe" step (with data visualizations) may be helpful in that it shifts some of the cognitive burden to the perceptual system. Assessing predictions can be easier if the match between model (prediction) and data (observation) manifests as visually available parallels (see illustration above).
  • The pivotal moment in this instructional strategy comes when the learners compare the prediction with the data. There are several ways for things to go astray here, and it takes skillful facilitation by the teacher to turn this into a teachable moment:
    • Astray mode 1: the data do not match the prediction, and yet the learners say that they do match. This seems to be a case of people seeing what they want to see. One way forward from this situation is to have groups of learners who had reached different conclusions talk over their answers and seek consensus.
    • Astray mode 2: the broad pattern of the data matches the prediction, but there is scatter superimposed; a domain expert would say this is a match, but the learners say it is not, because they were expecting an exact match. This can be a chance to talk about sources of variability in both nature and data.
    • Astray mode 3: the data do not match the prediction; learners recognize the non-match, but misdiagnose the cause. It seems that students are quick to blame the data, especially if it is student-collected data. A simplistic view of the scientific method would say that the model is at fault in such a case, and needs to be revised. In actual scientific practice, a prudent scientist would scrutinize all possible sources for the mismatch: the integrity of the data, the applicability of the model, and the reasoning that went into the prediction itself.
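
Astray mode 2 can be made concrete with a small simulation, loosely based on the exoplanet example above. This is only a sketch under stated assumptions: the box-shaped transit model (which ignores the gradual start and end of a real transit), the 1% dip, the noise level, and all function names are hypothetical, and the "observations" are synthetic, generated by adding random noise to the model itself. It suggests why demanding an exact match fails even when the model is correct by construction, whereas a comparison that allows for the known scatter succeeds.

```python
import random

def transit_model(t, t_start=4.0, t_end=6.0, depth=0.01):
    """Idealized relative brightness: dips by `depth` while the planet crosses."""
    return 1.0 - depth if t_start <= t <= t_end else 1.0

def simulate_observations(times, noise_sd, seed=0):
    """Synthetic 'data': the model plus Gaussian measurement noise (not real data)."""
    rng = random.Random(seed)
    return [transit_model(t) + rng.gauss(0.0, noise_sd) for t in times]

def exact_match(predicted, observed):
    """The naive expectation: every observed point equals the prediction."""
    return all(p == o for p, o in zip(predicted, observed))

def fraction_within(predicted, observed, noise_sd, k=2.0):
    """Share of points lying within k noise standard deviations of the prediction."""
    hits = sum(abs(o - p) <= k * noise_sd for p, o in zip(predicted, observed))
    return hits / len(observed)

# Assumed sampling and noise level, chosen only for illustration.
times = [i * 0.02 for i in range(500)]
noise_sd = 0.002

predicted = [transit_model(t) for t in times]
observed = simulate_observations(times, noise_sd)

print("exact match:", exact_match(predicted, observed))
print("fraction within 2 sd:", round(fraction_within(predicted, observed, noise_sd), 2))
```

The threshold of two standard deviations is a design choice for this sketch; the teaching point is that the criterion for a "match" has to take the expected scatter into account.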

Researchable questions:

We organized the questions here around supporting the three primary challenges in the process (running a model to make a prediction, comparing a prediction to data, and reasoning about patterns in messy data).
  • If accurately running a mental model of a disciplinary process is challenging, could one scaffold making a prediction by using recognition in place of generation, for example by offering multiple-choice options for the prediction as concept visualizations? (This draws on the general pedagogical strategy of offering hypothesis templates.)
  • Because building understanding of the model can be time-consuming, would having a set of known, readily graspable models be useful for developing this skill?
  • A common way to engage this strategy is to offer two competing hypotheses (as in the psychology example above) and consider which model the data support. It is possible this is more cognitively demanding than working with a single model. How does learning differ for single versus multiple models?

  • Instructional designers could use guidance on how much scaffolding to provide. We anticipate that the answer may depend on the topic and the students' skill level, leading to a series of questions such as: Which aspects of comparing a prediction to data are critical to support for novices?
  • There are different ways in which students might have difficulty comparing a prediction to data (see the three astray modes above). What instructional moves increase the chances of learning from failures? How does learning differ among the different types of errors?

  • Recognizing that real-world data are messy and learners may prefer "clean" data, is there a learning progression from making a prediction and comparing it to clean data to eventually becoming skilled at comparing predictions to "messy" data?
  • Because there are different ways in which data might be "messy," students may need scaffolding on different types of messes.
  • A curated set of data with varying degrees of messiness might be useful to scaffold students at different points in their learning. Would each discipline need to develop its own set of such data?

References & Credits:

  • Gould, P., Sunbury, S., & Krumhansl, R. (2012). Using online telescopes to explore exoplanets from the physics classroom. American Journal of Physics, 80(5), 445-451.
  • Haysom, J., & Bowen, M. (2010). Predict, Observe, Explain: Activities Enhancing Scientific Understanding. Arlington, VA: NSTA Press.
  • Kastens, K. A., & Krumhansl, R. (2017). Identifying curriculum design patterns as a strategy for focusing geoscience education research: A proof of concept based on teaching and learning with geoscience data. Journal of Geoscience Education, special issue on Geoscience Education Research, 65, 373-392.
  • Lehrer, R., & Schauble, L. (2003). Symbolic communication in mathematics and science: Co-constituting inscription and thought. In Language, Literacy, and Cognitive Development: The Development and Consequences of Symbolic Communication (pp. 167–192).