From zero to Matlab in six weeks - with Freshmen

Frederik J Simons & Adam Maloof
Princeton University

Summary

How to bring Freshmen up to speed in Matlab programming. A series of targeted demonstrations and exercises accompanying a first one-semester course (a "Freshman Seminar") in the Geosciences as taught at Princeton University.


Course URL: http://geoweb.princeton.edu/people/simons/FRS-SESC.html
Course Size:
Fifteen or fewer (a limit set by the logistics of the accompanying field trip).

Course Format:
Small-group seminar.

Institution Type:
University with graduate programs, including doctoral programs.

Course Context:

The course is limited to 15 Freshmen, selected by application. A wide variety of skills is represented. Most of the students do not become Geosciences majors, though many of them stay in the natural sciences. There is one mandatory three-hour meeting per week, of which approximately one third to one half is devoted specifically to building Matlab skills. A one-hour weekly "clinic" with voluntary attendance is conducted over lunch in a residential college. Additional on-demand help comes from instructors and volunteers (course alums, graduate students, post-doctoral researchers).

Course Content:

Below is how we "sell" it to students over the summer before their first semester in college. They apply for one of the fifteen spots.

In this Freshman Seminar, you will combine field observations of the natural world with quantitative modelling and interpretation to answer questions like: "How have Earth and human histories been recorded in the geology of Princeton, the Catskills, and Spain, and what experiments can you do to query such archives of the past?" In the classroom, through problem sets, and around campus, you will gain practical experience collecting geological and geophysical data in geographic context, and analyzing these data using statistical techniques such as regression and time series analysis, with the programming language Matlab. During the required one-day field trip and over the week-long Fall break field trip, you will engage in research projects that focus on the cycles and shifts in Earth's shape, climate, and life that occur now on timescales of days, and have been recorded in rocks over timescales of millions of years. The classroom component of this Freshman Seminar will have graded (bi)weekly assignments built around on-campus data collection, data preparation or analysis, and scientific programming. A significant part of your assessment comes from writing assignments that teach you to communicate your scientific results, and culminate in an original research paper and an oral presentation for an audience of peers, Freshman Seminar alumni, and invited guests from the university community.

This is a science class: students should come prepared with an aptitude for, and a willingness to learn, the quantitative aspects of scientific inquiry. Scientific writing is an integral part of this seminar and its assessment. We teach and require the use of LaTeX and BibTeX!

Course Features:

From zero to Matlab over the course of a mere six weeks. The seventh week of the semester is spent travelling to a field location where students gather geoscience data (mostly using geophysical instruments, but also some perhaps unexpected ones, such as their smartphones). In the second half of the course, six more weeks are spent strengthening and cementing the students' Matlab skills, through and for the analysis of field data. After formal teaching ends, students spend an additional three weeks completing independent research projects of their choosing. Most of them make significant further progress in Matlab programming and turn in final papers that amply document their proficiency.

Course Philosophy:

College students need to be exposed to the natural (physical) sciences in their Freshman Year. We interpret this need at its most fundamental level: students need to learn how to make predictive models out of field-based (non-'experimental', non-'laboratory') observations. The Geosciences are particularly suited for such an introduction. Anyone armed with basic curiosity can formulate a research question related to a process or observation in the physical world around us. Why are there deserts, and why do they grow and decay? What are the human causes of desertification? If dunes are snapshots of slow-moving seas of sand, what does the morphology of dune fields tell us about winds, climate, the water table, and the bedrock over which they form and flow? On what time scale do such Earth systems need continued observation? How do we represent, statistically, two-dimensional dune patterns? How do we comb through and query, computationally, large databases of remote-sensing observations such as satellite imagery? As never before, the Earth sciences have reached the point where technical advances allow new observations that are, literally, flooding our data centers ('big data' having become a buzzword on Wall Street, on Capitol Hill, and in Nassau Hall), exceeding our human capacity to analyze and interpret them ('data' are not 'information'!). The latest low-cost technology is the Unmanned Aerial Vehicle, or 'drone', carrying a small payload of instrumentation for Earth observation. On the data-analysis side, the Earth sciences are blurring the lines with computer science. All Earth scientists coming of age in this century will write computer code, often collaboratively, and much of it will become part of public-domain archives (as per funding-agency rules and by common sense). Nationwide, introductory Computer Science courses are overtaking many other courses in terms of enrollment.
High-level jobs are for those who can program, and "Girls Who Code" (http://girlswhocode.com) has been lauded as an effectively targeted organization that might help erode gender differences in STEM enrollment, advancement, and employment. But, much as learning to mix paint won't turn us into Rembrandts, mastering the Java 'while-loop' hasn't taught anyone the scientific method. For that, what we need are (1) instruction in the scientific fundamentals of observation (using instruments or otherwise), (2) (computational) modeling, and (3) writing (about a 'real' subject, for a 'real' audience). Of these three, the one most sorely needed at our University, and most useful for our own Department, is a 'gentle' introduction to programming literacy. Such an introduction needs to be tailored to, but not too exclusively focused on, a 'domain science', even one as broadly described as 'the Geosciences'. The second of our course's three major objectives, and the most novel, is to introduce a small group (limited by the field-trip location and its budget) of novice students (traditionally, students interested in the Geosciences, especially Freshmen, fall into this category) to computer programming for science. This goal will be achieved in a setting that lets them see the motivation for programming the computer: a natural-science question about which they feel ownership because they collected their own data sets. It is also part of a "Freshman Seminar", where other skills are taught as well, which eliminates student anxiety about taking a 'coding-only' course. After taking our Fall course, students might still want to take "Introduction to Computer Science" courses, or they might be able to skip those for more advanced work in a variety of departments. No "programming" course of any kind exists in our Geosciences department. We aim to be the first, gentle step in luring a diverse audience to the life of the computer-programming natural (Geo)scientist.
(If trends from our previous courses persist, we should reach a female-skewed audience: over 75% of the applicants to previous editions of our Freshman Seminars have been female.)

Assessment:

What do we want students to know and be able to do at the end of the course?

Students will be exposed to natural science at its most fundamental level, which is making models from data, as it relates to the Geosciences (but without much explicit 'domain' knowledge, and certainly no rote learning of any kind). The core skills that we will teach, and expect the students to master, are:

  1. Making observations: measurement, instrumentation, interpretation. This involves equipment (smartphones, at the most basic level, have GPS receivers, magnetometers, and accelerometers!). Unlike those in other departments, with the possible exception of Astrophysics, our observations do not come from actively conducted experiments. The only experiments that we have are the ones that nature did for us (often millions of times, under varying circumstances), in the "natural laboratory". This also involves our own minds: what is important, what is interesting? What do our eyes see, how do we distill the essence of such information, and how do we record it without mechanical equipment (e.g. by sketching, in physical or digital field notebooks)?
    Assessment is by grading of their lab reports and field notes.
  2. Predictive modeling: data analysis, statistical inference, computational techniques. This involves computer code. Unlike Computer Science courses, we will teach students how to write computer code for a specific purpose. As such, it is a much gentler introduction to algorithms and data structures. We will 'build the hammer' as we are 'making the cabinet' with it; we won't just build 'an awesome hammer'. But like Computer Science courses, it is about proper computer programming, to which we introduce the students from scratch, bringing them up to a basic level of proficiency, enough for them to meaningfully build on Objective 1 above.
    Assessment is by grading of their code samples, or directly of the figures and data analysis conducted using the code that they wrote.
  3. Scientific writing: reporting, referencing, and communicating uncertainty. Few courses on campus teach scientific writing per se. As with the example of Computer Science above, scientific writing isn't just about logic, presentation, and proper sentence structure; it is about having something to say, wanting to say it, and saying it well. It is also about how to access and use the literature, how to cite sources in the proper manner (which varies greatly by discipline, but is more or less consistent across the natural sciences), and how to present uncertainty (without falling into the journalistic trap of simply presenting two opposing viewpoints) while exercising sound scientific judgment. We will teach the formal aspects of this (making it look good) but also the ethical aspects (attribution, plagiarism, reproducible research, data- and code-sharing practices, etc.).
    Assessment is by grading of their oral presentations and written papers.

What sorts of activities or assignments will help students meet these objectives?

Ad 1 above, the essential vehicles are the campus exercises (outdoors) and the field trip. On campus, students learn to use the equipment; off campus, they learn how to use it for an actual research purpose. On-campus exercises end up as short written laboratory reports, which are graded. Field-trip research activities are recorded in a laboratory (field) notebook, edited by the student, which is graded; they evolve into an end-of-year research project, which is proposed, fine-tuned, peer-reviewed, written up, and separately graded as a substantial piece of written work. The lab notebooks in question, a vital part of the practicing scientist's life, are both physical (for use in the field) and digital (for more digested and edited field notes, usually compiled and annotated at the end of the day while the memory is fresh).

Ad 2 above, we introduce the students to Matlab (as a computer programming language!) in a one-to-two-hour formal session every week (during "class time") and in a one-hour informal help session every week (during "lunch time", a very effective low-threshold venue). It's hard to teach someone how to code, but it is workable in the intimate setting of our course: since the field-trip location and budget limit our enrollment, we are in the fortunate position of being able to bring everyone up to speed relatively quickly. The classroom will be "flipped" to some degree. We are working on producing short video clips (inspired by MinutePhysics and MinuteEarth, hits on YouTube!) that introduce students to a feature of the computer language, both "low-level" (e.g. what are 'strings' and 'structures', how do 'for' and 'while' loops work, what is the difference between a 'script' and a 'function', how to control 'input-output' features?) and "high-level" (e.g. how to make and annotate a histogram, how to mesh and render a surface, how to perform linear regression, how to do hypothesis testing?). The students can watch these ahead of time (and repeatedly afterwards), try a small exercise at home, and come to class for a thorough review, troubleshooting, and further learning. On campus and in the field the students will collect so much (digital) data that the tools we teach will be immediately applied to the real-world setting from which they derive. This will strengthen students' learning by doing, and by working on a project (often in a small team) about which they care and feel ownership. We count on formal coding support from the instructors, but also on informal support from graduate students and post-docs working in our Department and elsewhere, using the "they join us for lunch and help students with their questions in the meantime" model that has worked so well over the past few years in our other courses.

Ad 3 above, there are, first, the 'mechanics': we require all written work in LaTeX and all bibliographic information in BibTeX; these are open-source packages used by most physical scientists and engineers. Second, the 'fundamentals': how to formulate and pursue a hypothesis, how to track down sources and published data, how to report on data collected by the students themselves, and how to represent data and illustrate inference using graphs and diagrams. Third, the 'ethics': what is an appropriate use of sources, and what isn't? Who owns an idea? What is an appropriate level of editorializing? In short, how to write well for a specific audience of professional scientists (though not necessarily ones active in the same field). We envisage a stepped approach to producing a final paper that comes very close to actual practice: (a) brainstorm the hypothesis among peers and with the instructors, (b) find the material, (c) produce a short proposal, (d) circulate a draft, (e) peer-review the final 'submitted' version, (f) incorporate suggested changes, (g) print and proof for final 'publication'. Our past experience shows us that specific guidance (and graded feedback) on each of these steps, individually, produces very impressive results at the end.

How do we evaluate student performance?

Least loved part of the process: by grading! But with detailed individual feedback, and with group feedback during class time! (Students do complain it's hard, even harsh, but they find the process fair, and they value the feedback and our accessibility.) Every week something is due, and during class there is a 5-minute quiz every week (about some assigned reading, on a piece of code that 'works', etc.). Insofar as the materials under 1 and 3 above are 'traditional', their evaluation requires no long philosophical considerations.

As far as 2 is concerned: how do we specifically evaluate proficiency in writing computer code? Programming assignments (both the 'educational' ones that ask for specific pieces of code that do something specific, and the 'research' ones where the objective is to 'get something done', as a manageably broken-down piece of a larger computational task of data analysis or statistical inference) will result in 'code'. Functions, scripts, subroutines, compiled sets: the students' efforts in any of these forms (but all in the programming language Matlab) will be submitted to us, so that we can try to run their code (and suggest modifications!). Code will be evaluated for usability, portability, speed of execution, and yes, 'elegance', which will be taught by example and grown by experience.

Syllabus:

Week 1

Page numbers refer to the PDF document posted below under 01. The cheat sheet we refer to is the PDF document below under 02, with its LaTeX source code under 03.

Two example Matlab functions are posted under 04 and 05, and the image that they load is found under 06. Note that you may need to rename these files after downloading (the intended file names are given in the list of Teaching Materials below).

We begin by showing some illustrations of what Matlab can do in the context of geological data analysis. Example: the analysis of images, e.g. a scanned picture of a layered rock (p02, p03 top), and how Matlab's Canny edge-detection routine is able to identify and define the boundaries between the layers (p03, bottom). Admittedly, the Canny method is rather sophisticated, but using mostly homegrown tools we can get some very good results for image segmentation ourselves (p04). At this point the students should be well motivated... how cool is that... but how? How do we achieve control over our digital environment? Through programming. The next item of business is to introduce a very small handful of the most basic Matlab commands. My personal list of top-fifteen commands (p05, p06) leads to a cheat sheet with function names, what they do, and examples (p07), which we hand out in hard copy (and also as LaTeX source code, since we teach LaTeX also... but that's another story); we ask the students to continue filling out this sheet (by hand or in their own digital version) throughout the semester. We continue with a short list of other vital commands (p08) that we introduce by type of functionality: addressing, and logical operators. All of these commands were shown "live" in Matlab, projected on-screen. Next up is to actually use the command line to execute basic functions, culminating in the writing of the first script. In our case, we show the students how to load one of the images that we prepared, cut a profile through the color values, and show those values as a function of the "column" number, running left to right through the image (p09). After spending time with the students until they manage this procedure on their own command line, we turn their script into their first function, which goes through the entire suite of manipulations and ends in the production of an annotated figure, saved in a publication-quality format (p10), ready for inclusion in their first lab report.
Then it's on to more sophisticated things, such as making histograms of the colors in the image (p11). This first session ends with a lab homework assignment that is extremely simple (p12), mostly designed for students to walk through the process of generating a simple figure (p13) from beginning to end (and also to teach them how to use our course management system for the submission of assignments). Whether throughout the session or explicitly at the end (p14), we spend time on the basics of code hygiene and the essential pieces of what makes a good Matlab function.
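The two core steps of this first session, a profile through the image and a histogram of its values, need no toolbox at all. The block below is a minimal Python illustration, not the course's varves2.m or varves3.m: it assumes a grayscale image already loaded as a list of rows of integer intensities from 0 to 255 (in Matlab, imread would return such a matrix, and a row profile is simply one row of it). The function names row_profile and intensity_histogram, the bin layout, and the synthetic test image are all hypothetical.

```python
def row_profile(image, row):
    """Intensity values along one image row, paired with 1-based column numbers."""
    values = list(image[row])
    columns = list(range(1, len(values) + 1))  # 1-based, as in Matlab indexing
    return columns, values


def intensity_histogram(image, nbins=8, vmin=0, vmax=256):
    """Count pixel intensities in nbins equal-width bins spanning [vmin, vmax)."""
    width = (vmax - vmin) / nbins
    counts = [0] * nbins
    for pixels in image:
        for v in pixels:
            k = int((v - vmin) // width)
            counts[min(max(k, 0), nbins - 1)] += 1  # clamp out-of-range values into end bins
    return counts


# A tiny synthetic "layered" image: dark and light bands alternating across columns
image = [[30, 30, 200, 200, 30, 30, 200, 200] for _ in range(4)]
columns, values = row_profile(image, 2)     # third row (0-based index 2)
print(values)                               # [30, 30, 200, 200, 30, 30, 200, 200]
print(intensity_histogram(image, nbins=4))  # [16, 0, 0, 16]
```

Plotting values against columns gives the kind of profile the students draw on p09; the alternating dark and light bands show up as steps in the profile and as two separate peaks in the histogram.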

Week 2

We send students around campus to collect GPS data at a series of control points and along a control line, which they revisit daily over the course of the week. We give the students a csv-file data template, provide them with a data set collected by a student in a previous year, and walk them through a simple script that plots the data in a georeferenced plot, with different symbols and with means and standard deviations added; we then iterate through the process so they can identify outliers in the data sets. We discuss various ways of doing data analysis and leave them with an assignment to integrate their own data with that of the class, which we discuss in great detail during the next class.

In the PDF document posted below under 07, you will find the plots generated during class. In the first plot (entitled "data collection") a massive outlier is easily identifiable. In the second plot (entitled "data curation") that outlier has been removed and the plot has been remade. The lab assignments themselves are posted below under 08 and 09, and a simple script with the command sequence to ingest the data and plot the results is posted under 10. A template, a slightly more complete data file from the instructor, and a student's result are given to the students for processing during this class; these are posted under 11, 12, and 13. Note that the csv files may need to be edited (e.g. by removing the header line), for which the simplest of text editors will do.
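The curation step (summarize repeated fixes at each control point, spot the outlier, remove it, summarize again) can be sketched in a few lines. This is a Python sketch under stated assumptions, not the course's simplescript.m: the CSV layout (header "point,lat,lon", decimal degrees), the 50 m cut-off, and all function names are hypothetical. It screens outliers by distance from the median position rather than by mean and standard deviation, because with only a handful of fixes a single large outlier inflates the standard deviation so much that a mean-plus-k-standard-deviations test may fail to flag it. Python's csv.DictReader reads the header row itself, so no hand editing is needed here.

```python
import csv
import io
import math
import statistics

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, adequate for campus-scale offsets

# Hypothetical file layout: one fix per line, header row "point,lat,lon" (decimal degrees)
SAMPLE = """point,lat,lon
A,40.3486,-74.6551
A,40.3487,-74.6552
A,40.3486,-74.6550
A,40.3520,-74.6551
B,40.3471,-74.6590
B,40.3472,-74.6589
"""


def read_fixes(text):
    """Group (lat, lon) fixes by control-point name; the header row supplies the keys."""
    fixes = {}
    for rec in csv.DictReader(io.StringIO(text)):
        fixes.setdefault(rec["point"], []).append((float(rec["lat"]), float(rec["lon"])))
    return fixes


def offsets_m(fixes):
    """East and north offsets (metres) of each fix from the median position."""
    lat0 = statistics.median([f[0] for f in fixes])
    lon0 = statistics.median([f[1] for f in fixes])
    coslat = math.cos(math.radians(lat0))
    return [(math.radians(lon - lon0) * EARTH_RADIUS_M * coslat,
             math.radians(lat - lat0) * EARTH_RADIUS_M) for lat, lon in fixes]


def flag_outliers(fixes, max_dist_m=50.0):
    """Indices of fixes lying more than max_dist_m from the median position."""
    return [i for i, (e, n) in enumerate(offsets_m(fixes)) if math.hypot(e, n) > max_dist_m]


def clean_summary(fixes, max_dist_m=50.0):
    """Drop flagged fixes (at least two must remain), then return
    mean lat, mean lon, and the east/north standard deviations in metres."""
    bad = set(flag_outliers(fixes, max_dist_m))
    kept = [f for i, f in enumerate(fixes) if i not in bad]
    east, north = zip(*offsets_m(kept))
    return (statistics.mean([f[0] for f in kept]), statistics.mean([f[1] for f in kept]),
            statistics.stdev(east), statistics.stdev(north))


fixes = read_fixes(SAMPLE)
print(flag_outliers(fixes["A"]))  # [3]: the fourth fix of point A is roughly 370 m off
print(clean_summary(fixes["A"]))
```

The equirectangular conversion from degrees to metres is only an approximation, but it is more than adequate over the few hundred metres of a campus survey.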

Week 3

Now we let the students be creative! Everyone comes to class with their personal GPS data file, and we teach them how to integrate everyone's data sets into a master plot showing the locations of all the control points collected during the past week. We walk through the various ideas for what to do next that are posted in the lab assignments (see Week 2).

The master plot is posted below under 14, and the script to make it under 15. Note that additional data files will of course be needed, and some edits will be required. We write these scripts "live" with the students in class, and then give them to the students with the task of modifying them for their own purposes. The students are learning how to troubleshoot, how to program, and how to deal with their own computational problems. Getting to the point of making one "simple" plot takes a full three weeks of sessions! But they manage.
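Before anything can be plotted, the class data have to be pooled: every student's file is read, and the fixes are grouped by control point. The Python sketch below shows that bookkeeping only. It is not the course's emgps.m, and the per-student CSV layout ("point,lat,lon") and function names are assumptions. A Matlab version might loop over the output of dir and read each file with readtable before plotting.

```python
import csv
import io
import statistics


def merge_class_data(files):
    """Pool every student's fixes: point name -> list of (student, lat, lon).

    files maps a student name to CSV text with the hypothetical header 'point,lat,lon'.
    """
    merged = {}
    for student, text in files.items():
        for rec in csv.DictReader(io.StringIO(text)):
            merged.setdefault(rec["point"], []).append(
                (student, float(rec["lat"]), float(rec["lon"])))
    return merged


def class_table(merged):
    """One master-table row per control point: (point, n_students, n_fixes, mean lat, mean lon)."""
    rows = []
    for point in sorted(merged):
        recs = merged[point]
        rows.append((point,
                     len({s for s, _, _ in recs}),
                     len(recs),
                     statistics.mean([r[1] for r in recs]),
                     statistics.mean([r[2] for r in recs])))
    return rows


# Two hypothetical students' files
files = {
    "student1": "point,lat,lon\nA,40.3486,-74.6551\nB,40.3471,-74.6590\n",
    "student2": "point,lat,lon\nA,40.3488,-74.6553\nA,40.3486,-74.6551\n",
}
for row in class_table(merge_class_data(files)):
    print(row)
```

Keeping the student name with each fix makes it easy to trace a suspicious point back to its author, which is useful when the class discusses whose data set contains an outlier.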

Week 4

Now it is time for some more serious fare. We spend the lecture and lab time on the creation of a series of increasingly sophisticated scripts that load a data file containing a piece of terrestrial topography (a netCDF file, as it happens), and we show how to plot it, annotate it, draw profiles through it (horizontal, vertical, hand-drawn), look at slopes, analyze regions inside hand-drawn polygons, make histograms, and analyze roughness. Each of the little scripts also writes a figure file (a PNG file) for inclusion in a lab report.
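Slopes and roughness along a profile reduce to finite differences and a straight-line fit. The Python sketch below is illustrative only and does not reproduce the course's Week 4 scripts. It assumes an already-loaded elevation grid (a list of rows, in metres) with uniform spacing dx, and it takes "roughness" to be the standard deviation of elevations about a least-squares line, which is one common choice among several. The slope scheme (central differences inside, one-sided at the ends) mirrors what Matlab's gradient function does. All names and the sample grid are hypothetical.

```python
import math
import statistics


def column_profile(grid, j):
    """Elevations down column j of a 2-D grid (a 'vertical' profile)."""
    return [row[j] for row in grid]


def profile_slopes(z, dx):
    """dz/dx along a profile: central differences inside, one-sided at both ends."""
    n = len(z)
    slopes = []
    for i in range(n):
        if i == 0:
            slopes.append((z[1] - z[0]) / dx)
        elif i == n - 1:
            slopes.append((z[-1] - z[-2]) / dx)
        else:
            slopes.append((z[i + 1] - z[i - 1]) / (2 * dx))
    return slopes


def slope_degrees(slopes):
    """Convert gradients (rise over run) to slope angles in degrees."""
    return [math.degrees(math.atan(abs(s))) for s in slopes]


def detrended_roughness(z, dx):
    """Std of elevations about their least-squares straight line (one roughness measure)."""
    x = [i * dx for i in range(len(z))]
    xm, zm = statistics.mean(x), statistics.mean(z)
    b = sum((xi - xm) * (zi - zm) for xi, zi in zip(x, z)) / sum((xi - xm) ** 2 for xi in x)
    residuals = [zi - (zm + b * (xi - xm)) for xi, zi in zip(x, z)]
    return statistics.pstdev(residuals)


# Hypothetical 4 x 3 elevation grid (metres) with 10 m spacing
grid = [[100, 102, 101],
        [110, 111, 115],
        [120, 119, 123],
        [130, 131, 128]]
z = column_profile(grid, 0)             # [100, 110, 120, 130]
print(profile_slopes(z, 10.0))          # [1.0, 1.0, 1.0, 1.0]
print(slope_degrees(profile_slopes(z, 10.0)))
print(detrended_roughness(column_profile(grid, 2), 10.0))
```

A perfectly planar profile has zero roughness under this definition, however steep it is. That is why the trend is removed first: otherwise a steep but smooth hillside would score as "rough".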

Under 16 below, we list the lecture materials, and under 17, we make the entire script sequence available, including the data files, the figures generated, and the LaTeX source. Under 18 below, we list an example homework that we assigned to build on these competences. Some customization will be needed for your own purposes, of course!

Week 5

In this lecture we play with cycles! We have the students generate some time series that are superpositions of sines and cosines, and then we try to find their amplitudes, periods, and phase angles. We cover two methods. The first is simple correlation analysis: generate a signal of a certain period, amplitude, and phase, and then generate a trial signal with which to correlate the first. The idea is that when the correlation between the signals is high, you must have found the right period! (Note the special role of the phase in the correlation analysis, and note that the correlation coefficient by itself is insensitive to amplitude.) But that's not all. We also teach the students how to formally invert for the best-fitting amplitude given a trial period, or a set of trial periods. For this we use Matlab's "pinv" function, and with it, we teach the basics of regression analysis.
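The correlation method can be sketched as a brute-force scan over trial periods and trial phases. The Python code below is a hypothetical illustration, not the course's Week 5 functions. It uses the Pearson correlation coefficient and a 36-step phase grid (both assumptions). The phase scan matters: a trial signal at the right period but a quarter cycle out of phase correlates at essentially zero. Because the correlation coefficient does not change when a signal is rescaled, the scan recovers period and phase but not amplitude.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def best_period_by_correlation(t, y, trial_periods, nphase=36):
    """Scan trial periods and phases; return (r, period, phase) with the highest correlation."""
    best = (-2.0, None, None)
    for period in trial_periods:
        for k in range(nphase):
            phase = 2 * math.pi * k / nphase
            trial = [math.cos(2 * math.pi * ti / period - phase) for ti in t]
            r = pearson(y, trial)
            if r > best[0]:
                best = (r, period, phase)
    return best


# Synthetic hourly series: 10 days of a 24-hour cycle, amplitude 3, phase 1 radian, offset 5
t = list(range(240))
y = [5 + 3 * math.cos(2 * math.pi * ti / 24 - 1.0) for ti in t]
r, period, phase = best_period_by_correlation(t, y, [12, 18, 24, 30, 36])
print(period, round(r, 3))
```

The best phase can only be as good as the phase grid (here 10 degrees), which is one reason to follow up with a formal least-squares fit once the period is known.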

We play with knocking out values, adding noise, interpolating and cleaning up data, and so on. We do all of this on the command line, but then turn our code into a script, and subsequently into a function. The students leave the lecture with two functions with which they can replicate the analysis that we did in class. They also leave with two more sets of commands and a data set (a magnetic time series we collected on campus), which we ask them to analyze as part of a lab assignment, e.g. for diurnal variations, i.e. at the 24-hour period.
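The regression step fits, for a trial period P, the model y(t) = c + a cos(2πt/P) + b sin(2πt/P), from which the amplitude is sqrt(a² + b²) and the phase is atan2(b, a). For a design matrix G with full column rank, Matlab's pinv(G)*y gives the same least-squares solution as the normal equations (GᵀG)x = Gᵀy. The Python sketch below solves those normal equations directly, because the standard library has no pseudo-inverse. It also shows why regression copes well with knocked-out values: missing samples (marked None, an assumption of this sketch) are simply left out of the fit. Function names, noise level, and the gap pattern are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve a small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_sinusoid(t, y, period):
    """Least-squares fit of y = c + a*cos(w t) + b*sin(w t) for one trial period.

    Samples whose value is None ("knocked out") are simply left out of the fit.
    Returns (offset c, amplitude A, phase phi) with y = c + A*cos(w t - phi).
    """
    w = 2 * math.pi / period
    rows = [(1.0, math.cos(w * ti), math.sin(w * ti), yi)
            for ti, yi in zip(t, y) if yi is not None]
    # Normal equations (G^T G) x = G^T y for the design matrix G = [1, cos, sin]
    GtG = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Gty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    c, a, b = solve(GtG, Gty)
    return c, math.hypot(a, b), math.atan2(b, a)


rng = random.Random(0)
t = list(range(240))  # hourly samples, 10 days
y = [5 + 3 * math.cos(2 * math.pi * ti / 24 - 1.0) + rng.gauss(0, 0.5) for ti in t]
y = [None if i % 7 == 0 else v for i, v in enumerate(y)]  # knock out every 7th value
print(fit_sinusoid(t, y, 24))
```

Repeating the fit over a set of trial periods and comparing the recovered amplitudes is the regression counterpart of the correlation scan.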

Under 19 below, we list a set of figures produced on the command line, by scripts, and ultimately by functions. Under 20 below, we give the Matlab source files and the sample data set.

Week 6

We are professionals! Forget correlation and regression analysis: we are now going to do spectral analysis via the windowed periodogram. Move over, Fourier! We spend a great deal of time on massaging the data (column csv format, as in previous weeks) and on proper date formatting. We breeze past the correlation-coefficient approach from the previous week (writing a first script with the students) and then move on to the Fourier analysis approach (writing a second script with the students).
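Both chores, converting date strings to a numeric time axis and computing a windowed periodogram, can be sketched with the Python standard library alone. This is an illustration, not the course's Week 6 scripts. The timestamp format, the symmetric Hann taper, and the function names are assumptions. The power is left unnormalized, which is enough to locate peaks but is not a properly scaled power spectral density. The DFT is written out explicitly (O(n²)) so that every step is visible; in practice an FFT would be used.

```python
import cmath
import math
from datetime import datetime


def hours_since_start(stamps, fmt="%Y-%m-%d %H:%M"):
    """Convert timestamp strings (hypothetical format) to hours elapsed since the first one."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    return [(tm - times[0]).total_seconds() / 3600.0 for tm in times]


def hann_periodogram(y, dt):
    """One-sided periodogram of evenly sampled y after mean removal and a Hann taper.

    Returns (frequencies, power). Power is left unnormalized: enough to locate peaks,
    but not scaled as a proper power spectral density.
    """
    n = len(y)
    mean = sum(y) / n
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    yw = [(v - mean) * wk for v, wk in zip(y, window)]
    freqs, power = [], []
    for m in range(n // 2 + 1):  # plain DFT, fine for a few hundred samples
        s = sum(yw[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        freqs.append(m / (n * dt))
        power.append(abs(s) ** 2)
    return freqs, power


# Hypothetical hourly temperature record: 10 days with a 24-hour cycle of 4 degrees
y = [15 + 4 * math.cos(2 * math.pi * h / 24) for h in range(240)]
freqs, power = hann_periodogram(y, dt=1.0)
peak = max(range(1, len(power)), key=power.__getitem__)
print(1 / freqs[peak])  # period, in hours, of the strongest peak
```

The taper reduces leakage from the record's abrupt start and end into neighbouring frequencies, at the cost of slightly broadening each peak.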

We perform a crude significance analysis on the power spectral density. It's incredibly gratifying that we should see the diurnal cycle (in temperature records!), but even more so that we can teach Freshmen to conduct sophisticated time-series analysis with state-of-the-art statistical methods and, ultimately, just a few lines of code.
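The text does not say how the crude significance analysis is done. One common crude approach, offered here only as an assumption, compares each periodogram value with a white-noise background. For Gaussian white noise, interior periodogram ordinates are approximately exponentially distributed about the noise level, so a value exceeds x times that level with probability about exp(-x). The Python sketch estimates the noise level from the median (the median of an exponential distribution is its mean times ln 2), which keeps a few strong peaks from inflating it, and applies a Bonferroni-style correction over all tested frequencies. Tapering correlates neighbouring ordinates, so the threshold is indicative only. The function name and the example spectrum are hypothetical.

```python
import math
import statistics


def crude_peak_test(power, alpha=0.05):
    """Flag interior periodogram ordinates that exceed a white-noise threshold.

    Assumes a white-noise background, so ordinates are roughly exponentially
    distributed about the noise level; the threshold is chosen so that, over all
    tested frequencies, a false alarm has probability of about alpha (Bonferroni).
    """
    interior = power[1:-1]  # skip the zero and Nyquist frequencies
    noise_level = statistics.median(interior) / math.log(2)
    threshold = noise_level * math.log(len(interior) / alpha)
    flagged = [i for i in range(1, len(power) - 1) if power[i] > threshold]
    return threshold, flagged


# Illustrative spectrum: flat background of 1.0 with one strong peak at index 10
spectrum = [1.0] * 121
spectrum[10] = 100.0
threshold, flagged = crude_peak_test(spectrum)
print(round(threshold, 2), flagged)
```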

Under 21 below, we give two figures output by the two main pieces of code that we write with the students in class, and under 22, we give all the source code.

Teaching Materials:

01 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 1.2MB Jul14 16) Week 01 Lecture materials

02 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 304kB Jul14 16) Week 01 Cheat sheet

03 From zero to Matlab in six weeks - with Freshmen (3kB Jul14 16) Week 01 Cheat sheet (LaTeX source)

04 From zero to Matlab in six weeks - with Freshmen (Matlab File 2kB Jul16 16) Week 01 Matlab function 1 (rename to varves2.m)

05 From zero to Matlab in six weeks - with Freshmen (Matlab File 1015bytes Jul16 16) Week 01 Matlab function 2 (rename to varves3.m)

06 Week 01 Image file (rename to H1W-18_35-test2-small.jpg)

07 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 47kB Jul16 16) Week 02 Lecture materials

08 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 2MB Jul16 16) Week 02 Lab assignment I

09 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 394kB Jul16 16) Week 02 Lab assignment II

10 From zero to Matlab in six weeks - with Freshmen (Matlab File 2kB Jul16 16) Week 02 Matlab script (rename to simplescript.m)

11 From zero to Matlab in six weeks - with Freshmen (Comma Separated Values 134bytes Jul16 16) Week 02 Data template (rename to template.csv)

12 From zero to Matlab in six weeks - with Freshmen (Comma Separated Values 327bytes Jul16 16) Week 02 Template GPS data set (rename to fjsimonsl02a.csv)

13 From zero to Matlab in six weeks - with Freshmen (Comma Separated Values 16kB Jul16 16) Week 02 Sample GPS data set (rename to adriantl01a_H.csv so it can be read by simplescript.m above)

14 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 9kB Jul16 16) Week 03 Lecture materials

15 From zero to Matlab in six weeks - with Freshmen (Matlab File 1kB Sep12 16) Week 03 Matlab function (rename to emgps.m)

16 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 995kB Oct16 16) Week 04 Lecture materials

17 From zero to Matlab in six weeks - with Freshmen (Zip Archive 4.2MB Oct16 16) Week 04 Matlab scripts for topographic analysis, generated figures, sample topography data set, and lecture LaTeX source

18 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 562kB Oct16 16) Week 04 Lab assignment

19 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 1.2MB Oct16 16) Week 05 Lecture materials

20 From zero to Matlab in six weeks - with Freshmen (Zip Archive 1MB Oct16 16) Week 05 Matlab scripts for time-series analysis, and sample geomagnetic data set

21 From zero to Matlab in six weeks - with Freshmen (Acrobat (PDF) 64kB Oct16 16) Week 06 Lecture materials

22 From zero to Matlab in six weeks - with Freshmen (Zip Archive 371kB Oct16 16) Week 06 Matlab scripts for power-spectral density (Fourier) analysis, and sample temperature data set

References and Notes:

The Elements of MATLAB Style by Richard K. Johnson

Essential MATLAB for Engineers and Scientists by Brian H. Hahn and Daniel T. Valentine