Data Analysis

Content on this page is derived from participant presentations, discussions, and breakout groups at the 2016 Teaching Computation in the Sciences Using MATLAB workshop as well as Teaching with Data from Pedagogy in Action.

Using data is an important part of any science course, especially courses that teach computational skills. Successful graduates need to be familiar with data collection, processing, analysis, visualization, and interpretation, and teaching with data helps students develop these proficiencies. It can also improve students' problem-solving capacities, quantitative reasoning skills, understanding of research ethics, and attitudes towards the sciences. When the types of data, research strategies, and student engagement styles used in the classroom are tied to specific learning goals, students can better develop and identify the data analysis skills they need: visualizing data, transforming data to aid in visualization, and conducting analyses. Employers are looking for graduates who can think critically and solve problems, organize and evaluate information, and understand and manipulate data. Instructors can increase student success and learning by giving students multiple, varied opportunities to analyze data in courses that deal with computation.


Approaches to Teaching Data Analysis

How instructors expose students to data and data analysis depends on the kind of experience they want students to have. Instructors can vary how students engage with data, the strategies students use to collect or analyze data, and the types of data used. These options, developed and described in Pedagogy in Action, are summarized below:

Student engagement with data

  • Watching - Instructors model effective data analysis and students learn by watching (least active)
  • Replication - Replication of existing data or analyses; prepared experiments
  • Guided Analysis - Data analysis with limited intervention but without a defined outcome
  • Problem-Directed Discovery - Students are given a problem or question to answer with limited to no guidance
  • Open-ended Discovery - Undirected, independent student engagement with data

Research strategies

  • Experimentation - Students engage in controlled experiments
  • Comparison - Students identify and quantify differences between variables
  • Modeling - Students develop computer-based, physical, or conceptual representations of systems
  • Description and Measurement - Students perform systematic observation and cataloging

Data types

  • Processed - Data that has been "cleaned" or manipulated in preparation for student use
  • Published - "Raw," often large datasets that students can analyze
  • Simulated - Model-generated data
  • Student-generated - Student-collected (or generated) datasets that students can analyze

Examples from workshop participants:

Processing data to understand climatology
Janel Hanrahan (Lyndon State College) presented on her activity on analyzing data to understand climatology. In the assignment, students replicate maps and plots of land and ocean temperature anomalies.

Download the presentation (PowerPoint 2007 (.pptx), 7.8MB, Oct 20, 2016)

A principled way to principal components analysis
Daniel Zysman (MIT) presented an activity on principal components analysis that he uses to help students visualize large data sets, transform the data to aid in this visualization, cluster data, implement basic linear algebra operations, and connect this operation to neuronal models and brain function. Learning outcomes are for students to: visualize and manipulate relatively large and complex data sets, perform principal component analysis (PCA) by building it step by step, gain intuition of the geometry involved in a change of basis and projections, start thinking about basic clustering algorithms, discuss dimensionality reduction and other PCA applications, discuss assumptions and limitations, and build a model related to neural circuits using PCA.

Download the presentation (PowerPoint, 520kB, Oct 20, 2016)
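
To make "building PCA step by step" concrete, the following is a minimal MATLAB sketch of the usual sequence: center the data, form the covariance matrix, take its eigendecomposition, sort the components, and project. It is not taken from the presentation; the synthetic data set and all variable names are assumptions chosen for illustration.

```matlab
% Step-by-step PCA on a small synthetic data set (values are illustrative only).
t = linspace(-1, 1, 50)';
X = [t, 2*t + 0.1*sin(7*t), 0.05*cos(5*t)];   % 50 observations x 3 variables

% Step 1: center each variable (column) on its mean.
Xc = X - repmat(mean(X, 1), size(X, 1), 1);

% Step 2: sample covariance matrix (3 x 3), symmetrized against round-off.
C = (Xc' * Xc) / (size(Xc, 1) - 1);
C = (C + C') / 2;

% Step 3: eigenvectors are the principal directions, eigenvalues the variances.
[V, D] = eig(C);

% Step 4: sort components from largest to smallest variance.
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);

% Step 5: change of basis, keeping the first two components.
scores = Xc * V(:, 1:2);

% Fraction of total variance carried by each component.
explained = lambda / sum(lambda);
fprintf('Variance explained: %s\n', mat2str(explained', 3));
```

Plotting `scores(:, 1)` against `scores(:, 2)` gives the reduced two-dimensional view of the data, and comparing `explained` across components is one way to open the discussion of dimensionality reduction and its limits.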

Data Analysis Challenges (and Solutions)

Teaching students proper coding and computational thinking skills

  • Start with fully functioning code and use it for illustration and discussion
  • Introduce well-commented "skeleton code" and have students fill in the blanks
  • Code interpretation exercise: provide code similar to code students have already seen and have them add detailed comments
  • Have students write a framework of comments to guide their code writing, have the instructor check it, and then have students write the code
  • Have students write complete code to answer a specific question
  • Example: Modeling an Neuron Action Potential in Matlab by Marjorie Hubbard (North Carolina School of Science and Math). In this activity students use "skeleton code" to complete a model and investigation of a neuron action potential.
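
As an illustration of what "skeleton code" might look like, here is a minimal sketch built around a hypothetical exercise (Euler integration of simple exponential decay); it is not drawn from the activity cited above. The two lines marked BLANK are the ones an instructor would remove before handing the file to students; they are filled in here so the example runs.

```matlab
% Skeleton: Euler integration of dy/dt = -y / tau (hypothetical exercise).
tau = 2.0;                 % time constant (s)
dt  = 0.01;                % time step (s)
T   = 5.0;                 % total simulated time (s)
n   = round(T / dt);       % number of Euler steps
t   = (0:n)' * dt;         % time vector, n + 1 samples
y   = zeros(n + 1, 1);     % preallocate the solution
y(1) = 1.0;                % initial condition

for k = 1:n
    % BLANK 1 (students write this line): slope at the current step
    dydt = -y(k) / tau;
    % BLANK 2 (students write this line): Euler update
    y(k + 1) = y(k) + dt * dydt;
end

% Check against the known analytic solution y = exp(-t / tau).
maxErr = max(abs(y - exp(-t / tau)));
fprintf('Largest deviation from the analytic solution: %.4g\n', maxErr);
```

Leaving the setup, preallocation, and final check in place lets students concentrate on the two lines that carry the model, and the built-in check gives them immediate feedback on whether their fill-ins are right.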

Assessing what makes an effective data visualization

  • Print out and critique plots for effectiveness in a gallery walk or a "rogues' gallery" walk (a series of plots with purposeful mistakes)
  • Make a range of plot types and styles and have students discuss which ones are relevant to the problem
  • Use real data appropriate to the problem, ideally with a societally relevant hook
  • Guide student interpretation with examples and interpretive questions
  • Example: Gravity prospecting by James Conder (Southern Illinois University - Carbondale). Students are given a set of gravity data with the aim of finding and visualizing high density anomalies in the subsurface.

Indexing and referencing large datasets

  • Teach referencing by manipulating/modifying large datasets (e.g. make a black stripe on an image in a given area)
  • Demonstrate the efficiency of different methods by measuring the run time of for-loop indexing vs. vectorized indexing (via tic and toc)
  • Example: Solution of an Equation by Using MATLAB by Mahmud Akelbek (Weber State University). Students construct computer code to find the solution of an equation and test their results on different problems.
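
The first two suggestions in this list can be combined in one short exercise. The sketch below draws a black stripe on a synthetic grayscale array, once with nested for-loops and once with a single indexing statement, and times both with tic and toc. The array size and stripe location are arbitrary assumptions, and no image file is required.

```matlab
% Synthetic grayscale "image" with values in [0, 1]; no image file is assumed.
img  = 0.5 * ones(400, 600);
rows = 150:200;            % hypothetical stripe location

% Version 1: element-by-element for-loop indexing.
imgLoop = img;
tic;
for r = rows
    for c = 1:size(imgLoop, 2)
        imgLoop(r, c) = 0;     % set one pixel to black
    end
end
tLoop = toc;

% Version 2: one vectorized indexing statement.
imgVec = img;
tic;
imgVec(rows, :) = 0;           % set the whole stripe to black at once
tVec = toc;

fprintf('for-loop: %.6f s, vectorized: %.6f s\n', tLoop, tVec);

% Both approaches must produce the same array.
assert(isequal(imgLoop, imgVec));
```

Single tic/toc measurements are noisy, especially for operations this small, so it helps to have students repeat the timing several times or use a larger array before drawing conclusions. MATLAB's `timeit` function is an alternative that averages over repeated calls.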

Data ingestion and management in MATLAB

  • Give students exercises to write low level text read functions
  • Teach students how to pre-condition data before use by MATLAB to make it readable by the built-in functions, or supply preconditioned data
  • Student exercises can address methods of ingesting different file formats
  • Example: Monitoring Algal Blooms with Landsat (OLI) by Andrew Fischer (University of Tasmania). Students use MATLAB to access, process, and extract data from Landsat 8 OLI remote sensing imagery to investigate the cause and management of algal blooms.
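
A low-level text read exercise of the kind suggested above might look like the following sketch, which uses `fopen`, `fgetl`, and `sscanf` rather than a high-level import function. The file layout (a `#` comment header followed by comma-separated station and temperature values) and its contents are invented for illustration; the file is written first so the example is self-contained.

```matlab
% Write a tiny example file so the sketch is self-contained (contents are made up).
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# station, temperature_C\n');
fprintf(fid, '1, 12.5\n');
fprintf(fid, '2, 13.1\n');
fprintf(fid, '3, 11.8\n');
fclose(fid);

% Low-level read: skip comment and blank lines, parse two numbers per data line.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
data = zeros(0, 2);                      % one row per data line: [station, temperature]
txt = fgetl(fid);                        % fgetl returns -1 (not char) at end of file
while ischar(txt)
    txt = strtrim(txt);
    if ~isempty(txt) && txt(1) ~= '#'
        vals = sscanf(txt, '%f, %f');    % column vector of the parsed numbers
        if numel(vals) == 2
            data(end + 1, :) = vals';    % append as a new row
        end
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);                           % remove the temporary file

disp(data);
```

Growing `data` one row at a time is fine for a small teaching file. For large files, a follow-up exercise could have students preallocate the array or compare this approach with a built-in reader.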

Resources for Developing Students' Data Analysis Skills

Community