A stair-stepped approach to programming and scripting

Austin Polebitski, Civil and Environmental Engineering, University of Wisconsin-Platteville

One of the members of my PhD committee said something about programming that has always stuck with me: "You know you really understand something if you can write it in code." I have taken this to heart, whether students are performing elementary or more advanced statistics in Excel or writing their own version of a statistical test as a function in MATLAB. In graduate school I remember writing my own Mann-Kendall function and checking it against the textbook and the MATLAB version of the function, just to see if I could code it up and if I really UNDERSTOOD what the function was doing. It was a great learning experience.
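To illustrate what "coding it up" involves for a test like this, here is a minimal sketch of the Mann-Kendall trend test in Python. It is not the MATLAB function described above. The function name `mann_kendall` and the sample series are hypothetical, and the sketch assumes no tied values (no tie correction in the variance) and a series long enough for the normal approximation to be reasonable.

```python
import math


def mann_kendall(values):
    """Return (S, Z, p) for a two-sided Mann-Kendall trend test.

    Assumption: no tie correction is applied to the variance of S.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S counts increases minus decreases over every pair (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis of no trend (no ties).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


if __name__ == "__main__":
    # Made-up series, for illustration only.
    print(mann_kendall([1, 2, 3, 4, 5]))
```

Writing the pairwise loop by hand makes it clear that the test depends only on the signs of the differences, not their sizes, which is the kind of understanding that comparing against a textbook or built-in version can confirm.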

I believe that students can develop a deeper understanding of a mathematical method or model through coding. I have used MATLAB and R as gentle introductions to programming and scripting, and as a way to help students learn how to analyze data and create simple models. I began using this method to train and mentor graduate students during my time as a post-doc, and I think learning how to translate a process into code is just as advantageous for undergraduate students. A statistical method, a physical process such as evapotranspiration, or even a complex decision-making model can be broken down into simple chunks that are easy to understand. When students realize this, complex subjects become much more digestible, and students gain confidence in their ability to look across disciplines or to tackle really large problems.

One example of how I have implemented this thinking is an independent study course that used R and focused on the basics of programming, manipulating large datasets, and conducting analysis. The course consisted of senior-level undergraduates in the Civil and Environmental Engineering program at Platteville, all of whom had some interest in learning basic coding and data analysis techniques. The course began with a small, real-world dataset and slowly built to larger, more complicated datasets as it introduced more complicated programming and logic topics. The first lesson introduced students to the general interface and language, and then showed how to bring data (peak streamflow data from the USGS) into the R framework and create a simple plot. Students then used the same method, with a few additional calls, to create four plots in a 2 x 2 frame. The last portion of the lesson focused on simple statistical function calls and creating analysis plots, such as a histogram.

This process of small stair-steps within a lesson gives students small leaps that are achievable. This helps build confidence, which in turn opens up new questions and avenues that the students want to explore. Within each lesson I usually include a 'bonus' question or assignment that is optional but encourages students to take a 'larger' leap. Students may work ahead to find an easier way to accomplish some programming task, or they may have to rely on their own ingenuity to come up with a method of analysis when there is no one right way to complete it.

The same 'stair-step' method applies between lessons. For instance, the next lesson builds on the first by creating simple loops and reading in multiple sets of data. Students learn how to use loops to aggregate data (daily to monthly, monthly to yearly, etc.). The idea is that by the end of the semester, the students have slowly built up skills in programming and analysis so that they can work on large dataset projects, either their own or ones provided to them. The 'final' project allows students to test their ability as analysts and data explorers. The confidence built during the lessons up to this point helps students work with large data or complicated phenomena without feeling overwhelmed. By the end of the independent study course, groups of students were analyzing large and complicated datasets, such as the WI DOT crash dataset, which contains hundreds of thousands of records with many fields. Students began by thinking of questions to investigate and then brainstormed a general approach for analysis before doing any coding. After performing data clean-up, the students completed their investigation of all deer-related crashes in the State of Wisconsin over the last two decades (tens of thousands of crashes!) and made both spatial and time series plots. In addition to teaching programming and building the confidence to ask and answer complicated questions, this method prepares students for career or graduate school pathways and gives them a leg up on where much of science and engineering is headed in terms of research and 'big' data.
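The loop-based aggregation from the second lesson (daily to monthly) might look like the following sketch. The course used R; Python is used here only for illustration. The function name `aggregate_monthly`, the `YYYY-MM-DD` date format, and the flow values are hypothetical assumptions, not course material or actual USGS data.

```python
from collections import defaultdict


def aggregate_monthly(daily_records):
    """Average daily values into monthly means.

    daily_records: iterable of (date_string, value) pairs, where
    date_string is assumed to be formatted as 'YYYY-MM-DD'.
    Returns a dict mapping 'YYYY-MM' to the mean of that month's values.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)

    # One pass over the daily data, accumulating a sum and a count per month.
    for date_string, value in daily_records:
        month_key = date_string[:7]  # 'YYYY-MM'
        totals[month_key] += value
        counts[month_key] += 1

    return {month: totals[month] / counts[month] for month in totals}


if __name__ == "__main__":
    # Made-up daily flows, for illustration only.
    daily = [
        ("2020-01-01", 10.0),
        ("2020-01-02", 20.0),
        ("2020-02-01", 30.0),
    ]
    print(aggregate_monthly(daily))
```

The same pattern steps up to yearly totals by changing the key to the first four characters of the date, which is the sort of small extension a 'bonus' question could ask for.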