Analyzing datasets in ecology and evolution to teach the nature and process of science

Rebecca M. Price, University of Washington Bothell
Susan M. Waters, Center for Natural Lands Management
Location: Washington

Abstract

This quarter-long project forms the basis of a third-year course for majors and nonmajors at the University of Washington Bothell called Science Methods and Practice. Students use databases to identify novel research questions and extract data to test their hypotheses. They frame the question with primary literature, address the question with inferential statistics, and discuss the results with more primary literature. The product is a scientific paper; each step of the process is scaffolded and evaluated. Given time limitations, we avoid devoting time to data collection; instead, we sharpen students' ability to make sense of a large body of quantitative data, a situation they may rarely have encountered.

We treat statistics with a strictly conceptual, pragmatic, and abbreviated approach; i.e., we ask students to know which basic test to choose to assess a linear relationship vs. a difference between two means. We stress that these tests require normally distributed data, and we teach students how to interpret the results; we leave the rest for statistics courses, and we do not teach the mathematics. This approach proves beneficial even to those who have already had a statistics course, because it is often the first time they make decisions about applying statistics to their own research questions.
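The choice between the two basic tests can be sketched in code. The following is a minimal illustration, not the course's actual materials: the function names (`choose_test`, `welch_t`, `least_squares`) and all data values are hypothetical, and it uses Welch's form of the t statistic as one common variant. It computes only the test statistics; a p-value, and a check that the data are roughly normal, would still come from whatever statistical software the class uses.

```python
from math import sqrt
from statistics import mean, variance


def choose_test(predictor_kind: str) -> str:
    """Pick a basic test from the kind of predictor variable (hypothetical helper).

    'categorical' (two groups) -> compare two means with a t-test
    'continuous'               -> assess a linear relationship with regression
    """
    if predictor_kind == "categorical":
        return "t-test"
    if predictor_kind == "continuous":
        return "linear regression"
    raise ValueError(f"unknown predictor kind: {predictor_kind!r}")


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def least_squares(x, y):
    """Slope and intercept of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data, invented for illustration only.
shell_length_site_a = [21.0, 23.5, 22.1, 24.0, 22.9]
shell_length_site_b = [18.2, 19.9, 20.4, 18.8, 19.5]
latitude = [10.0, 20.0, 30.0, 40.0, 50.0]
species_count = [50.0, 40.0, 30.0, 20.0, 10.0]

# Two groups (site A vs. site B): a difference between two means.
t_stat = welch_t(shell_length_site_a, shell_length_site_b)

# Two continuous variables: a linear relationship.
slope, intercept = least_squares(latitude, species_count)
```

A positive `t_stat` here means the first group's mean is larger; a negative `slope` means the response decreases as the predictor increases.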

We incorporate peer review and collaborative work throughout the quarter. We form collaborative groups around the research questions that students ask, enabling them to share the primary literature they find and preparing them well to review each other's writing. We encourage them to cite each other's work. They write formal peer reviews of each other's papers, and they submit their final paper with a letter to the editor highlighting how their research has addressed previous feedback.

A major advantage of this course is that an instructor can easily modify it to suit any area of expertise. Students have worked with data about how a snail's morphology changes in response to its environment (Price 2012), how students understand genetic drift (Price et al. 2014), maximum body size in the fossil record (Payne et al. 2008), range shifts (Ettinger et al. 2011), and urban crop pollination (Waters and Clifford 2014).

Student Goals

  1. Search for information in the scientific literature and critically read such information.
  2. Choose and perform statistical tests appropriate to continuous or categorical data (e.g., t-test, linear regression).
  3. Interpret results to reach conclusions supported by the statistical and graphical evidence, and communicate research clearly in a formal scientific paper that includes graphical representations of the results and that is revised over several drafts after peer review.

Research Goals

  1. Maintain current knowledge of relevant research by planning for students to share and discuss literature pertaining to their projects.
  2. Explore side questions related to main research area by allowing students to generate questions that relate to underexplored variables in a dataset: these act as analytical "pilot projects".

Context

Instructor: 1 instructor, no TAs
Students: 24 undergraduates, primarily but not exclusively environmental science, environmental studies, and biology majors; any major is welcome, and there are no prerequisites.
Level: 300-level (juniors)
Duration: 10-week quarter
Credits: 5 credits (standard course)
Requirements satisfied: required course for environmental science and environmental studies majors; can meet requirements for majors in science, technology and society and in global studies; satisfies writing requirement for all majors
Flexibility: occasionally taught as a hybrid course with one face-to-face meeting per week; otherwise taught with two face-to-face meetings per week.

Target Audience: Major, Non-major, Upper Division
CURE Duration: A full term

CURE Design

Different instructors can tailor this format to different research questions. The course offers instructors a way for students to explore large data sets that the instructors have collected or with which they work. When RM Price teaches the course, students work with data from the Paleobiology Database (https://paleobiodb.org/#/) to explore questions that they choose, but that emerge from shared readings in class about topics that include the latitudinal diversity gradient, patterns describing how body sizes change through time, and changes in climate across geologic scales. When SM Waters teaches the course, students work with data generated from a Public Participation in Scientific Research project on urban pollination, and choose elements of that dataset to explore.

The learning environment is structured to ensure that students are successful in achieving the goals of the CURE. The arc of the course is

(1) to be introduced to the broader research area through carefully structured exposure to the primary literature, 
(2) to practice statistical analyses with sample data selected from the database with which students are working, 
(3) to generate hypotheses, then discuss and revise them,
(4) to extract data from the database, conduct statistical analysis, and revise hypotheses as necessary, 
(5) to write the Results and subject them to peer review,
(6) to interpret the Results based on primary literature that the students find with support from the instructor and a research librarian, 
(7) to write the Discussion and Introduction, and subject them to peer review, and
(8) to write and revise the entire paper. 

Students keep online lab notebooks that the instructor reads and comments on each week; these notebooks are a place where students and the instructor track activities that they complete in and out of class. The instructor works with each student closely, meeting students in class and offering both written and video feedback through the learning management system.

Core Competencies: Analyzing and interpreting data, Asking questions (for science) and defining problems (for engineering), Constructing explanations (for science) and designing solutions (for engineering), Planning and carrying out investigations
Nature of Research: Basic Research

Tasks that Align Student and Research Goals

Research Goals →
Student Goals ↓
Research Goal 1: Maintain current knowledge of relevant research by planning for students to share and discuss literature pertaining to their projects.
Research Goal 2: Explore side questions related to main research area by allowing students to generate questions that relate to underexplored variables in a dataset: these act as analytical "pilot projects".


Student Goal 1: Search for information in the scientific literature and critically read such information.

Search for relevant papers and read the primary literature, making annotations in small groups collaboratively with software like Perusall.

Answer the following questions in a lab notebook:
1. What is the main point of your article? (remember to paraphrase!)
2. What is the research hypothesis of your article (remember to paraphrase!)?
3. How is it related to your research question? Try and be specific.

Discuss in class: Does the literature give you further ideas about other variables in the dataset that would be interesting to explore?



Student Goal 2: Choose and perform statistical tests appropriate to continuous or categorical data (e.g., t-test, linear regression).

Complete guided practice of statistical tests using data provided by instructor. After choosing a research question to pursue, decide which of the tests they have learned about is appropriate for the type of data they are analyzing. Carry out and report the results of the test.



Student Goal 3: Interpret results to reach conclusions supported by the statistical and graphical evidence, and communicate research clearly in a formal scientific paper that includes graphical representations of the results and that is revised over several drafts after peer review.

Write papers that analyze data, contextualizing results with the primary literature.

Write papers that highlight a variety of aspects of the original dataset, providing insights that may influence future directions of study.


Instructional Materials

Syllabus, RM Price (PDF, 556 kB, Aug 24, 2018)

Assessment

Instructional Staffing

A single instructor runs the course. Therefore, the exact details of the research goals depend on the data that the instructor would like to explore.

Rebecca M. Price, University of Washington Bothell

We would like all of our students to have the opportunity to participate in mentored research, even those who have complex schedules that prevent them from pursuing extra-curricular research experiences.



Advice for Implementation

No funds, equipment, or supplies are required. It is helpful, but not essential, to hold all meetings in a computer lab, especially one in which students can face each other to discuss their work. The project requires access to one or several datasets (published or unpublished).

Iteration

Students run into two primary challenges in the course. The first is cleaning the data so that it is in a format amenable to statistical analysis. In some cases, this means students need to recode data to be simpler than it was in the original database. It may also mean that students need to learn how to use pivot tables or how to look up values in Excel in order to merge databases. The instructor works one-on-one with students to resolve these challenges, sometimes meeting outside of class either in person or through video conference. The second major challenge is that students often find that they reject their hypothesis, an outcome that an expert may expect but one that is challenging to students who are used to getting high grades for being "right." The instructor must support students through this shift, helping them realize that rejecting hypotheses is powerful science.
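For readers who prefer a scripting view of the data-cleaning steps described above, here is a minimal sketch of recoding, a lookup-style merge, and a pivot-table-style summary. It is an illustration only: the course itself works in spreadsheets, and the records, site labels, and the `recode_visitor` helper are hypothetical rather than drawn from any course dataset.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration; not from any course dataset.
observations = [
    {"site": "S1", "visitor": "Bombus vosnesenskii", "visits": 4},
    {"site": "S1", "visitor": "Apis mellifera", "visits": 7},
    {"site": "S2", "visitor": "Bombus melanopygus", "visits": 2},
    {"site": "S2", "visitor": "Syrphidae sp.", "visits": 5},
]

# Hypothetical lookup table linking each site code to a site type.
site_info = {"S1": "community garden", "S2": "residential yard"}


def recode_visitor(name: str) -> str:
    """Collapse detailed taxon names into coarser groups."""
    if name.startswith("Bombus"):
        return "bumble bee"
    if name.startswith("Apis"):
        return "honey bee"
    return "other"


# Recode and look up in one pass (the dictionary lookup plays the role
# of a spreadsheet lookup formula when merging two tables).
cleaned = [
    {
        "site_type": site_info[row["site"]],
        "group": recode_visitor(row["visitor"]),
        "visits": row["visits"],
    }
    for row in observations
]

# Pivot-table-style summary: total visits per (site type, visitor group).
totals = defaultdict(int)
for row in cleaned:
    totals[(row["site_type"], row["group"])] += row["visits"]

for key, total in sorted(totals.items()):
    print(key, total)
```

Keeping the recoding rule in one small function makes it easy for a student to state, in the Methods section, exactly how categories were simplified.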

Using CURE Data

Students are not collecting data in this CURE, but instead analyzing data that they extract from pre-existing databases. Thus, we do not need to be concerned about aggregating data or ensuring its quality. Students' research progress has not been shared consistently. Instead, they own their research, and some choose to share it with potential employers or even submit it to the campus research journal. In this case, the instructor gains a deeper understanding of the data under exploration, as well as the literature that contextualizes the research. So far, none of the student projects have led to published papers.

Resources

The following resources are helpful for both students and faculty: 

WHEN WATERS TEACHES THE COURSE
Ettinger, AK, KR Ford, and J Hille Ris Lambers. 2011. Climate determines upper, but not lower, altitudinal range limits of Pacific Northwest conifers. Ecology 92(6): 1323-1331.

WHEN PRICE TEACHES THE COURSE
Payne JL, Boyer AG, Brown JH, Finnegan S, Kowalewski M, Krause RA Jr, Lyons SK, McClain CR, McShea DW, Novack-Gottshall PM, Smith FA, Stempien JA, Wang SC (2008) Two-phase increase in the maximum size of life over 3.5 billion years reflects biological innovation and environmental opportunity. Proceedings of the National Academy of Sciences of the United States of America 106(1): 24–27. doi:10.1073/pnas.0806314106

Plotnick RE, Smith FA, Lyons SK. 2016. The fossil record of the sixth extinction. Ecology Letters. 19: 546–553. doi: 10.1111/ele.12589.

Web Tutorials
Cohen, P. 2014. PBDB Navigator. https://www.youtube.com/watch?v=db2He3p-Jco
Alvarez, W. no date. Introduction to the Geologic Time Chart. Khan Academy. https://www.khanacademy.org/partner-content/big-history-project/solar-system-and-earth/knowing-solar-system-earth/v/bhp-intro-geologic-timechart
Understanding Evolution Team. Visualizing life on Earth: data interpretation in evolution. https://evolution.berkeley.edu/evolibrary/article/0_0_0/ldg_01