# Using Univariate Statistics to Understand Regional Drainage Patterns

## Summary

In this activity, students use MATLAB to compare two data sets of organic matter content and provide quantitative evidence that tests the null hypothesis that the sediment samples have the same fluvial source. Students must load the data, conduct the analyses, and plot the results by writing an efficient MATLAB script, and must "publish" their code as a well-organized, well-commented .pdf document or as an .html file posted online.

After completing this exercise, students will have a better understanding of population distributions and of the statistical test used to distinguish two samples. They will also gain experience producing visually clear diagrams to present the results of their analysis.


## Learning Goals

From this activity, students will learn how to: (1) plot histograms, (2) fit normal distributions to data, and (3) conduct and interpret the results of a t-test. MATLAB functions are used to plot the distributions, fit the data, and calculate significance levels and confidence intervals. Students practice higher-order thinking skills when they evaluate the results of the t-test in a geologic context. The MATLAB publishing tools are used to organize and present the results.
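As an illustration of goal (3), the following is a minimal sketch of how the outputs of a two-sample t-test might be read. It assumes the Statistics and Machine Learning Toolbox is installed. The data are synthetic placeholders generated with `randn`, not the activity's measurements, and the variable names and the 5% significance level are choices made for this example only. The manual calculation uses the pooled-variance form of the statistic, which is what `ttest2` assumes by default.

```matlab
% Synthetic stand-in data (NOT the activity's measurements).
rng(1);                               % fix the seed so the run is repeatable
a = 12 + 1.5*randn(60,1);             % sample A (made-up values)
b = 13 + 1.5*randn(60,1);             % sample B (made-up values)

% Two-sample t-test. h = 1 means the null hypothesis of equal means
% is rejected at significance level alpha; ci is the confidence
% interval on the difference in means, mean(a) - mean(b).
alpha = 0.05;                         % assumed significance level
[h, p, ci, stats] = ttest2(a, b, 'Alpha', alpha);

% Cross-check by hand using the pooled variance.
na  = numel(a);
nb  = numel(b);
df  = na + nb - 2;                                    % degrees of freedom
sp2 = ((na-1)*var(a) + (nb-1)*var(b)) / df;           % pooled variance
tManual = (mean(a) - mean(b)) / sqrt(sp2*(1/na + 1/nb));
pManual = 2*tcdf(-abs(tManual), df);                  % two-sided p-value

fprintf('h = %d, p = %.4g\n', h, p);
fprintf('t (ttest2) = %.4f, t (manual) = %.4f\n', stats.tstat, tManual);
fprintf('CI on difference in means: [%.3f, %.3f]\n', ci(1), ci(2));
```

A small p-value (below `alpha`) and a confidence interval that excludes zero both point to the same conclusion: the two sample means differ by more than chance alone would readily explain. Whether that difference is geologically meaningful is a separate question for the student to argue.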

## Context for Use

This activity is targeted at upper-level undergraduate and intro-level graduate students in Earth Sciences and is expected to be completed as an out-of-class activity, within one week after being assigned. Students will practice writing and commenting a MATLAB script, will gain familiarity with elementary univariate statistical methods (and the MATLAB commands that implement these methods), and will interpret their results in a geologic/geomorphic context. Students use functions contained in the statistics toolbox. This activity is assigned early in the course to build the students' confidence with programming, as the required MATLAB skills are limited to a handful of functions and plotting commands.

## Description and Teaching Materials

The detailed activity description is provided in the Student Handout for Univariate Activity (Acrobat (PDF) 55kB Oct15 15). In this two-part activity, students first plot histograms of, and fit Gaussian distribution curves to, measurements of organic matter content in sediment samples from freshwater deltas. Students then conduct a t-test intended to quantitatively distinguish the two populations as being from "unique" sedimentary source regions. Students must interpret their results in a geomorphic context and weigh in on the likelihood of tectonic rearrangement of drainage basins in the region. The second part of the activity asks the students to run a similar analysis (t-test) on data of their own choosing. For each part of the activity, students must: (1) import the data and plot them properly, (2) fit theoretical distributions to the data with the MATLAB normpdf.m function, and (3) conduct a t-test and interpret the results. The students must present the analysis as a well-documented MATLAB script that has been published to a .pdf document, or to an .html file uploaded to a website for public access.
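Steps (1) and (2) might be sketched as follows. This is one possible approach, not the handout's solution. The file names in the commented-out `load` lines are hypothetical, and the data used here are synthetic placeholders so that the script runs on its own. Because `normpdf` evaluates a normal density rather than estimating its parameters, the sketch assumes that the "fit" uses the sample mean and sample standard deviation.

```matlab
% (1) Import the data. The file names below are hypothetical;
%     substitute the files distributed with the activity.
% corg1 = load('sample_one.txt');
% corg2 = load('sample_two.txt');

% Synthetic stand-ins so the sketch is self-contained (made-up values).
rng(2);
corg1 = 25 + 2*randn(80,1);
corg2 = 26 + 2*randn(80,1);

% (2) Estimate the parameters of a normal distribution for each sample.
mu1 = mean(corg1);   sigma1 = std(corg1);
mu2 = mean(corg2);   sigma2 = std(corg2);

% Evaluate the fitted densities on a common grid.
x    = linspace(min([corg1; corg2]), max([corg1; corg2]), 200);
pdf1 = normpdf(x, mu1, sigma1);
pdf2 = normpdf(x, mu2, sigma2);

% Plot the two fitted curves with labeled axes and a legend.
figure;
plot(x, pdf1, 'b-', x, pdf2, 'r-', 'LineWidth', 1.5);
xlabel('Organic matter content');
ylabel('Probability density');
legend('Sample 1 fit', 'Sample 2 fit');
title('Fitted normal distributions');
```

If the script were saved as, say, `univariate_activity.m` (a hypothetical name), running `publish('univariate_activity.m', 'pdf')` or `publish('univariate_activity.m', 'html')` from the command window would produce the formatted document with code, comments, and figures.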

## Teaching Notes and Tips

In addition to the typical struggles that students encounter when learning to write code or work with new functions, there are several predictable issues that the instructor should be prepared to address:

(1) The students will find the simple hist.m function inadequate as a plotting tool for this exercise. The instructor should encourage the students to carry out the "counting" calculation with hist.m, save the results as an output, and then plot the population distributions with bar.m, setting 'FaceColor' to 'none' so that the two population distributions are both visible, in different colors, on the same figure. This also makes it easy to view the fitted Gaussian PDFs.

(2) The instructor will definitely want to review the concept of the t-test, including standardization, the central limit theorem, hypothesis testing, p-values, significance, and confidence limits. A good, geologically oriented resource for this is pp. 55-75 of John C. Davis's book "Statistics and Data Analysis in Geology, 3rd edn." (Wiley).

(3) I recommend that my students use the MATLAB function ttest2.m, packaged with the statistics toolbox, to conduct the t-test, but more ambitious instructors might have students write their own function to perform this test.

(4) I recommend that the instructor select a data set and carry out the same calculations for part B of the activity themselves. I chose a non-geology example, comparing the population distributions of Earned Run Average (E.R.A.) of Major League Baseball pitchers from the National and American Leagues, to test the hypothesis that the Designated Hitter Rule makes a notable difference in pitching statistics and may affect Hall of Fame candidacy in the long run.
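The plotting approach described in note (1) might look like the sketch below. The data are synthetic placeholders, and the number of bins, the colors, and the variable names are arbitrary choices for illustration. `hist` still works in current MATLAB releases, although MathWorks now recommends `histcounts` and `histogram` for new code. Each fitted density is multiplied by the sample size and the bin width so that the curves are on the same vertical scale as the counts.

```matlab
% Synthetic stand-in data (NOT the activity's measurements).
rng(3);
s1 = 25 + 2*randn(80,1);
s2 = 27 + 2*randn(80,1);

% Use the same bin centers for both samples so the bars line up.
nBins    = 15;                                  % arbitrary choice
centers  = linspace(min([s1; s2]), max([s1; s2]), nBins);
binWidth = centers(2) - centers(1);

% Called with an output argument, hist only counts; it draws nothing.
n1 = hist(s1, centers);
n2 = hist(s2, centers);

% Draw both distributions as unfilled bars in different edge colors.
figure;
hold on;
bar(centers, n1, 1, 'FaceColor', 'none', 'EdgeColor', 'b');
bar(centers, n2, 1, 'FaceColor', 'none', 'EdgeColor', 'r');

% Overlay the fitted Gaussian PDFs, scaled from density to counts:
% expected count per bin = density * number of samples * bin width.
x = linspace(centers(1), centers(end), 200);
plot(x, normpdf(x, mean(s1), std(s1)) * numel(s1) * binWidth, 'b-');
plot(x, normpdf(x, mean(s2), std(s2)) * numel(s2) * binWidth, 'r-');
hold off;

xlabel('Organic matter content');
ylabel('Number of samples');
legend('Sample 1', 'Sample 2', 'Sample 1 fit', 'Sample 2 fit');
```

Because the bars are unfilled, neither distribution hides the other where they overlap, which is the region that matters most when judging whether the two samples differ.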

## Assessment

If the student produces code that: (1) runs cleanly, (2) produces figures that allow interpretation, and (3) is well-documented (commented), the student receives a passing grade. As with all MATLAB-type activities, there is not "one and only one correct way" of solving this assignment. The goal is to develop programming skills, so I encourage students to meet with me so that I can show them the process of debugging their code to converge on something that runs.

## References and Resources

Data for this activity were derived from Trauth, Martin H., MATLAB Recipes for Earth Sciences (2nd Edn.), Springer.

Two data sets (Organic material data 2, Organic material data 3) are provided. These data are freely available from the publisher on the following site:

https://www.researchgate.net/publication/215977722_MATLAB_Recipes_for_Earth_Sciences_4th_Edition