30 years on: Is the current generation of students learning programming too late?

Glenn Thompson, Geosciences, University of South Florida

I was lucky. In 1984 I was part of a wave of British schoolkids who learned to program in BBC BASIC. I was 13. The BBC had made an ambitious decision to invest in a computer (the Acorn "Proton", rebranded as the BBC microcomputer) to accompany a series of television programmes to get Brits programming. Initially my parents had signed up for a night class to learn BBC BASIC programming. But my mum quickly lost interest and I took her place. At 14 I moved to a different school and for the next 4 years I had weekly programming classes on a network of BBC micros.

Later, as I went through undergraduate, master's and PhD programs at university, I learned a variety of other languages along the way (Algol, Fortran and C), and I was still helping my peers with their programs. And in physics labs throughout my university education in the 1990s, I often found those versatile old BBC Model Bs still connected to laboratory equipment and logging data.

In 1998 I first came across MATLAB. For the first time I had a language that could replace every other language I had learned before. Since then I've learned Perl, PHP, Python, C++, Visual Basic and the basics of Java and JavaScript too, and I've dabbled with R and IDL, but for all my scientific work, MATLAB is unsurpassed.

From the very beginning I was helping my peers with their programs. This has continued throughout my career as a postdoc, a staff seismologist involved in real-time seismic monitoring operations at various seismological and volcanological observatories, and now as an assistant professor at a university.

It became clear to me many years ago that the most useful language for students to learn is MATLAB. It is the simplest language to learn, and also the most effective. Compiled languages such as Fortran and C, while still very important for supercomputing, have too steep a learning curve for most scientists who are just looking for something more effective than Excel. Dealing with compiling and linking, finding the right libraries, segmentation faults and unhelpful error messages, as I did throughout my PhD, is a recipe for driving most scientists away from programming forever. But once they have mastered MATLAB, transitioning to C, should the need ever arise, becomes so much easier.

Throughout my career I've been helping graduate students with their MATLAB programming, and in 2009 I co-developed a MATLAB programming course at the University of Alaska Fairbanks, but I only ever taught a few classes from this course. Since moving to the University of South Florida 2 years ago, I have been a faculty member with teaching responsibilities for the first time. For the first year I was setting up the computational infrastructure for our Seismology group, but in year 2 I taught my first formal class. This was a combination of MATLAB programming and using the GISMO toolbox in conjunction with the Antelope toolbox to do some basic seismic data analysis. I am now trying to expand the GISMO toolbox in collaboration with the original developer, Celso Reyes, while simultaneously developing a set of teaching materials that will really help students (and their professors) exploit GISMO effectively in their research.

Perhaps the biggest surprise I have found is that even students with undergraduate backgrounds in STEM fields such as physics, mathematics, geophysics and engineering often enter graduate programs with no programming experience, and no experience of using a command line. So while computers today might have 1000 times more processing power, memory and storage than computers of 20 years ago, students entering graduate programs, even though they have grown up with computers everywhere, are less familiar with programming than students of a generation ago. And without programming skills, students cannot effectively conduct research on the large datasets required to earn PhDs these days and compete in an age of academia where funding is becoming more and more difficult to obtain.

I think part of the challenge in tackling this problem is that as professors advance in their careers (and in years), it becomes increasingly difficult for them to keep pace with advances in computational science. Teaching their students effectively then becomes not only difficult but overwhelming and intimidating. Since I do not generally work with students until they are in the first year of graduate school, I feel the best I can do is encourage them to learn programming in their first semester of graduate school. But while it is definitely a case of better late than never, I wonder whether learning an interpreted programming language, such as MATLAB or Python, should be a core requirement for undergraduate scientists. And how broadly is programming taught in high schools? Do we need a nationwide drive for high school kids to learn programming, indeed to encourage everyone to program, like the one the BBC instigated in Britain in the 1980s? The modern equivalent of the BBC microcomputer is the Raspberry Pi. Like the BBC, the Pi can be interfaced with a variety of laboratory equipment. But how widely is it being used to teach programming at schools?

As a new professor with very little prior teaching experience, I am a novice when it comes to assessing students' quantitative skills. But I do see students create highly convoluted and laborious workflows, which might cost them weeks or months, where an ability to write simple functions in MATLAB could get the same task done in minutes. So when I do teach MATLAB, I focus on writing reusable functions from the outset, starting with a simple but complete template and gradually progressing to increasingly complex tasks. I emphasize a top-down approach of writing function usage statements and comments before writing the code, and using proper indenting. I write specs for the functions and test plans at the outset of each homework, so when students send me their code for assessment, I can quickly test whether it gives the correct output and whether it has been written and formatted in a way that will enhance long-term development and maintenance.
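
A minimal sketch of what such a template and test plan could look like follows. It is illustrative only and is not the actual course material: the function name `rms_amplitude`, its inputs and the test cases are all hypothetical, and it is written in Python (the other interpreted language mentioned above) rather than MATLAB. The same layout, with the usage statement first, then commented steps, then a test plan, should carry over to a MATLAB function file.

```python
import math


def rms_amplitude(samples):
    """Return the root-mean-square amplitude of a sequence of samples.

    Usage:
        rms = rms_amplitude(samples)

    Input:
        samples -- non-empty sequence of numbers (hypothetical example:
                   amplitude values from a seismogram)

    Output:
        rms -- float, sqrt(mean(sample ** 2))

    Raises:
        ValueError if samples is empty.
    """
    # Step 1: validate the input so a bad call fails with a clear message.
    if len(samples) == 0:
        raise ValueError("samples must contain at least one value")

    # Step 2: compute the mean of the squared samples.
    mean_square = sum(x * x for x in samples) / len(samples)

    # Step 3: the square root of that mean is the RMS amplitude.
    return math.sqrt(mean_square)


def run_test_plan():
    """Test plan written alongside the spec: known inputs, known outputs."""
    assert rms_amplitude([3, 3, 3]) == 3.0
    assert math.isclose(rms_amplitude([1, -1, 1, -1]), 1.0)
    assert math.isclose(rms_amplitude([3, 4]), math.sqrt(12.5))

    # An empty input must be rejected rather than silently mishandled.
    try:
        rms_amplitude([])
    except ValueError:
        pass
    else:
        raise AssertionError("empty input should raise ValueError")
    return True


if __name__ == "__main__":
    run_test_plan()
    print("all tests passed")
```

Writing the usage statement and the test plan before the body means a submission can be checked by running the test plan against it, and the comments record the intended steps for whoever maintains the function later.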

To summarize, having spent so much of my career as a geophysicist developing software in a variety of languages, and having had a regular stream of graduate students come to my door with a wide range of scientific programming questions, I think MATLAB is the easiest and most useful language for geoscience students to learn. So I am planning to develop a suite of courses here at USF that cover introductory MATLAB programming, seismic data analysis with GISMO, and time series analysis with MATLAB. This workshop is going to be a great opportunity for me to learn from more experienced colleagues nationwide (and beyond?) what they have found effective for teaching MATLAB to geoscience students, and to get their insights for making GISMO more accessible and more useful to a broader audience.
