Initial Publication Date: March 16, 2017

# Data Analysis

Content on this page is derived from participant presentations, discussions, and breakout groups at the Teaching Computation with MATLAB workshops as well as Teaching with Data from Pedagogy in Action.

Using data is an important part of any science course, especially in courses that teach computational skills. Successful graduates need to be familiar with data collection, processing, analysis, visualization, and interpretation, and teaching with data helps students develop these proficiencies. Teaching with data can also improve students' problem-solving capacities, quantitative reasoning skills, understanding of research ethics, and attitudes towards the sciences. When the types of data, research strategies, and student engagement styles used in the classroom are tied to specific learning goals, students can better develop and identify the data analysis skills they need. Students need to be able to visualize data, transform data to aid in visualization, and conduct analyses. Employers are looking for students who can think critically and solve problems, organize and evaluate information, and understand and manipulate data. Instructors can increase student success and learning by giving students multiple, varied opportunities to analyze data in courses that deal with computation.

## Approaches to Teaching Data Analysis

The way instructors expose students to data and data analysis depends on the kind of experience they want students to have. Instructors can vary how students engage with data, the strategies students use to collect or analyze data, and the types of data used. These strategies, developed and described in Pedagogy in Action, are summarized below:

• Student engagement with data
• Research strategies
• Data types

### Examples from workshop participants:

Janel Hanrahan (Lyndon State College) presented an activity in which students analyze data to understand climatology. In the assignment, students replicate maps and plots of land and ocean temperature anomalies.

Daniel Zysman (MIT) presented an activity on principal component analysis (PCA) that he uses to help students visualize large data sets, transform the data to aid in this visualization, cluster data, implement basic linear algebra operations, and connect this operation to neuronal models and brain function. Learning outcomes are for students to:

• Visualize and manipulate relatively large and complex data sets
• Perform PCA by building it step by step
• Gain intuition for the geometry involved in a change of basis and projections
• Start thinking about basic clustering algorithms
• Discuss dimensionality reduction and other PCA applications
• Discuss assumptions and limitations
• Build a model related to neural circuits using PCA
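
As a rough illustration of what building PCA "step by step" can look like, the sketch below runs the usual sequence (center, covariance, eigen-decomposition, projection) on synthetic data. It is not the activity's own code: the data, the `mixing` matrix, and all variable names are assumptions made for this example, and the centering line relies on implicit expansion, which assumes MATLAB R2016b or later.

```matlab
% Minimal PCA sketch on synthetic data (all names and values are illustrative).
rng(0);                          % fix the random seed for reproducibility
n = 500;                         % number of observations
latent = randn(n, 2);            % two hidden sources of variation
mixing = [3 1 0.5; 0.5 2 1];     % hypothetical map from 2 sources to 3 measured variables
X = latent * mixing + 0.1 * randn(n, 3);   % n-by-3 data matrix with small noise

% Step 1: center each variable (column) on zero.
Xc = X - mean(X, 1);

% Step 2: sample covariance matrix of the variables.
C = (Xc' * Xc) / (n - 1);

% Step 3: eigen-decomposition; columns of V are the principal directions.
[V, D] = eig(C);
[lambda, order] = sort(diag(D), 'descend');   % largest variance first
V = V(:, order);

% Step 4: change of basis, projecting the centered data onto the new axes.
scores = Xc * V;

% Step 5: fraction of total variance explained by each component.
explained = lambda / sum(lambda);

% Dimensionality reduction: keep only the first two components for plotting.
scatter(scores(:, 1), scores(:, 2), 10, 'filled');
xlabel('PC 1'); ylabel('PC 2');
title(sprintf('First two components explain %.1f%% of variance', ...
    100 * sum(explained(1:2))));
```

Because the synthetic data has only two underlying sources, most of the variance should fall on the first two components, which gives students a concrete case for discussing dimensionality reduction and its assumptions.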

## Data Analysis Challenges (and Solutions)

Teaching students proper coding and computational thinking skills

• Introduce well-commented "skeleton code" and have students fill in the blanks
• Code interpretation exercise: provide code similar to code students have already seen and have them add detailed comments
• Have students write a framework of comments to guide their code, have the instructor check it, and then have students write the code
• Have students write complete code to answer a specific question
• Example: Modeling a Neuron Action Potential in Matlab by Marjorie Hubbard (North Carolina School of Science and Math). In this activity, students use a "skeleton code" to complete a model and investigation of a neuron action potential.
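
One possible shape for a "skeleton code" handout is sketched below. It is a hypothetical example written for this page, not taken from any of the activities above: the problem (exponential decay solved with Euler's method), the parameter values, and the "STUDENT" markers are all assumptions. The lines that would be left blank for students are filled in here so that the script runs as written.

```matlab
% Skeleton: exponential decay dy/dt = -k*y solved with Euler's method.
% Lines under a "STUDENT" comment would be blank in the handout.

k  = 0.5;             % decay constant (1/s), hypothetical value
dt = 0.01;            % time step (s)
t  = 0:dt:10;         % time vector (s)
y  = zeros(size(t));  % preallocate the solution
y(1) = 1;             % initial condition

for i = 1:numel(t) - 1
    % STUDENT: compute the slope at the current step
    dydt = -k * y(i);
    % STUDENT: advance the solution by one Euler step
    y(i + 1) = y(i) + dt * dydt;
end

% STUDENT: compare with the exact solution y = exp(-k*t)
yExact = exp(-k * t);
maxErr = max(abs(y - yExact));

plot(t, y, t, yExact, '--');
xlabel('Time (s)'); ylabel('y');
legend('Euler', 'Exact');
fprintf('Maximum absolute error: %.4f\n', maxErr);
```

The same file can serve the comment-framework approach: strip the code lines, keep the comments, and ask students to supply the code beneath each one.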

Assessing what makes an effective data visualization

• Print out plots and critique their effectiveness in a gallery walk or a "rogues' gallery" walk (a series of plots with purposeful mistakes)
• Make a range of plot types and styles and have students discuss which ones are relevant to the problem
• Use real data appropriate to the problem, ideally with a societally relevant hook
• Guide student interpretation with examples and interpretive questions
• Example: Gravity prospecting by James Conder (Southern Illinois University - Carbondale). Students are given a set of gravity data with the aim of finding and visualizing high-density anomalies in the subsurface.
• Example: Data Analysis Activity Using MATLAB by Michael Ray (California State University-Sacramento). Students perform an experiment, collect and analyze the data, and produce a high-quality graph that shows the results of their experiment.
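
To produce a range of plot types for discussion, one option is to draw a single data set several ways and ask students which panel best answers a given question. The sketch below uses made-up noisy exponential data. The data, panel choices, and titles are assumptions for illustration only, and `histogram` assumes MATLAB R2014b or later.

```matlab
% One hypothetical data set drawn four ways for a plot critique.
rng(1);                                   % reproducible noise
x = linspace(0, 10, 40);
y = 2 * exp(0.3 * x) .* (1 + 0.1 * randn(size(x)));  % noisy exponential growth

figure;
subplot(2, 2, 1);
plot(x, y, 'o-');
xlabel('x'); ylabel('y'); title('Linear axes, line and markers');

subplot(2, 2, 2);
semilogy(x, y, 'o');                      % exponential growth appears as a straight line
xlabel('x'); ylabel('y (log scale)'); title('Logarithmic y-axis');

subplot(2, 2, 3);
bar(x, y);
xlabel('x'); ylabel('y'); title('Bar chart');

subplot(2, 2, 4);
histogram(y);                             % discards the x-dependence entirely
xlabel('y'); ylabel('Count'); title('Histogram of y');
```

For a "rogues' gallery" variant, the same script could be edited to introduce deliberate mistakes, such as missing axis labels or a misleading axis range, for students to find.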

Indexing and referencing large datasets

• Teach referencing by having students manipulate or modify large datasets (e.g., make a black stripe on an image in a given area)
• Demonstrate the efficiency of different methods by measuring the run time of for-loop indexing vs. vectorized indexing (via tic and toc)
• Example: Solution of an Equation by Using MATLAB by Mahmud Akelbek (Weber State University). Students construct computer code to find the solution of an equation and test their results on different problems.
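
The two indexing exercises above might look something like the following sketch. The image is a random stand-in rather than a real photograph, and the array sizes, stripe location, and threshold are arbitrary assumptions. Measured times will vary by machine and MATLAB version, so no expected timings are given.

```matlab
% Part 1: paint a black stripe on a synthetic grayscale image.
img = uint8(255 * rand(400, 600));   % stand-in for a real image
striped = img;
striped(150:200, :) = 0;             % rows 150 to 200, all columns, set to black

% Part 2: time a for-loop against vectorized (logical) indexing.
data = rand(2000, 2000);
threshold = 0.5;

tic;
loopResult = data;
for i = 1:numel(loopResult)          % visit every element by linear index
    if loopResult(i) < threshold
        loopResult(i) = 0;
    end
end
tLoop = toc;

tic;
vecResult = data;
vecResult(vecResult < threshold) = 0;   % logical mask selects all matches at once
tVec = toc;

assert(isequal(loopResult, vecResult)); % both methods must give the same answer
fprintf('for-loop: %.4f s, vectorized: %.4f s\n', tLoop, tVec);
```

Checking that both results agree before comparing times keeps the focus on efficiency rather than on debugging two different answers.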

Data ingestion and management in MATLAB

• Give students exercises in which they write low-level text-reading functions
• Teach students how to precondition data so that MATLAB's built-in functions can read it, or supply preconditioned data
• Student exercises can address methods of ingesting different file formats
• Example: Monitoring Algal Blooms with Landsat (OLI) by Andrew Fischer (University of Tasmania). Students use MATLAB to access, process, and extract data from Landsat 8 OLI remote sensing data to investigate the cause and management of algal blooms.
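
A low-level text-reading exercise could resemble the sketch below, which opens a file with `fopen`, reads it line by line with `fgetl`, and parses numeric rows with `sscanf`. The file layout (a `#` comment line, a header row, then comma-separated values) and the values themselves are made up for this example. The script writes its own temporary file so that it is self-contained.

```matlab
% Write a small hypothetical data file so the example is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '# station temperature log (made-up values)\n');
fprintf(fid, 'time_s,temp_C\n');
fprintf(fid, '0,20.1\n10,20.4\n20,21.0\n');
fclose(fid);

% Low-level read: open the file, skip comments and the header, parse rows.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
data = zeros(0, 2);                       % rows are appended as they are parsed
txt = fgetl(fid);
while ischar(txt)                         % fgetl returns -1 at end of file
    txt = strtrim(txt);
    isComment = ~isempty(txt) && txt(1) == '#';
    values = sscanf(txt, '%f,%f')';       % empty when the line is not numeric
    if ~isComment && numel(values) == 2
        data(end + 1, :) = values;        %#ok<AGROW> acceptable for small files
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);                            % remove the temporary file

disp(data);
```

Once students have written a reader like this by hand, comparing it with a built-in function such as `readmatrix` or `textscan` on the same file can motivate the preconditioning step: the built-ins work best when comments, headers, and delimiters follow a predictable layout.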