# Data Science for Biologists

Michael Wright

California State University-Sacramento

#### Summary

The primary goal of this course will be to teach data science techniques to upper-division Biology students. Students will learn how to work with data originating from different fields within Biology.

*Course URL*: http://www.mbl.edu/nsb/

*Course Size*:

15-30

*Course Format*:

Integrated lecture and lab

*Institution Type*:

Public four-year institution, primarily undergraduate

## Course Context:

The target audience for this course would be upper-division Biology majors. The prerequisites for the course would include completion of pre-major coursework (e.g., introductory Biology and Chemistry series) and some upper-division coursework (e.g., Systemic Physiology, Ecology), so that students have the background knowledge needed for some of the topics covered. The requisite math would be developed during the course, so students with different levels of quantitative background could still engage with the course content in a meaningful way.

## Course Content:

This course will provide undergraduate Biology majors with a suite of analytical tools drawn from various fields in Biology. Where possible, datasets will be collected from faculty in the Department to provide a real-world experience. For each topic, Matlab will be used extensively to perform simulations, analyze data, and/or solve quantitative problems.

## Course Goals:

At the conclusion of the course, students will be able to:

- Demonstrate the ability to import, export, and visualize data of different types and formats, including text files, Excel files, and binary files

- Summarize data collected from procured datasets and/or simulations

- Perform data analysis designed to classify and discover trends in the data

- Use Matlab to perform simulations of biological systems and use the simulations to describe quantitative and qualitative features of the system

## Course Features:

The goal of the course is to focus on practical examples from Biology; as such, each week will feature examples drawn from a different broad area of Biology:

- Bioinformatics, such as Phylogenetic Tree Analysis and Protein Structure Analysis

- Ecology, such as Leslie Matrices, Lotka-Volterra Models, and Harvest Models

- Physiology, such as Time-Series Analysis, Mathematical Modeling, and Dynamical Systems

## Course Philosophy:

The overriding philosophy of the course is to learn by doing. Because students will have varying backgrounds in data analysis and programming, each exercise will be designed to build students' confidence in working with Matlab. To facilitate their learning, students will produce a tangible result each week (a figure, a saved data file, etc.) that they can use to explain that week's biological phenomenon.

## Assessment:

The primary means of assessment will be in the form of problem sets. Some topics will build over the course of 2-3 labs, whereas others will be self-contained. For each problem set, students will learn to effectively communicate their results within the context of that week's topic.