# Constructing a data-driven rainfall-runoff model

## Introduction

In this step, students run the provided sample MATLAB code ddm.m and then learn to evaluate the performance of the resulting rainfall-runoff model.

## Conceptual Outcomes

Learn the concept of model performance evaluation

Learn data-driven modeling

## Practical Outcomes

Students will be able to use existing software packages (e.g., MATLAB) to apply machine learning algorithms

## Time Required

2 hours

## Computing/Data Inputs

Sample codes to perform the analysis (Matlab File 4kB Jul21 16)

Cross validation analysis (Matlab File 5kB Jul21 16)

## Computing/Data Outputs

Shown below is a sample plot of the performance of the trained data-driven model. For both the training and testing periods, the model prediction generally compares well with the observed streamflow (see Instructions).

## Hardware/Software Required

MATLAB/R

## Instructions

Many conceptual and physically based rainfall-runoff models exist. These models are constructed via a top-down approach, starting from a conceptualization based on a physical understanding of hydrologic systems. Here, we explore a bottom-up, inductive way to construct rainfall-runoff models. The idea is to discover the functional relationship between rainfall (and possibly other climate data) and stream discharge by learning it inductively from data.

Machine learning algorithms are powerful tools for inductive learning. Given a set of training data, they can learn complex, nonlinear relations between input (i.e., rainfall and other climate data) and output (i.e., discharge). In this step, we will use the support vector machine (SVM) regression algorithm, also called support vector regression (SVR), to build our rainfall-runoff model. SVMs (Wikipedia) are a relatively new class of learning algorithms. One attractive feature of SVMs is their robust generalization power. In addition, because SVM training solves a convex optimization problem, its solution is globally optimal, while many other machine learning tools (e.g., ANNs) are subject to local minima.
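For reference, the global optimality mentioned above follows from the form of the training problem. A textbook statement of the ε-insensitive SVR primal problem is given below; the symbols are introduced here for illustration and do not come from ddm.m: $x_i$ is the input vector (rainfall and other climate data) for day $i$, $y_i$ is the observed discharge, $\phi$ is the kernel feature map, $C$ is the penalty parameter, and $\varepsilon$ is the width of the insensitive tube.

$$
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*} \quad & \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right) \\
\text{subject to} \quad & y_i - w^\top \phi(x_i) - b \le \varepsilon + \xi_i, \\
& w^\top \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
& \xi_i \ge 0, \quad \xi_i^* \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
$$

The objective is convex and the constraints are linear, so any local minimum is also a global minimum. Training an ANN, by contrast, generally involves a non-convex objective.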

We will use the LIBSVM library to train and test the SVM model. LIBSVM provides a simple interface that users can link to their own programs on a variety of platforms, including MATLAB, Java, and Python. The LIBSVM package can be downloaded at https://www.csie.ntu.edu.tw/~cjlin/libsvm/.

In this step, students run the provided sample MATLAB code ddm.m. We use the first six years of data (2005-2010) to train the SVM regression model and test it on the remaining four years of data (2011-2014). The ddm.m file generates plots such as the one shown above under Computing/Data Outputs. The grey line is the observed daily mean discharge from HydroClient, and the blue and red lines show the discharge simulated by the SVM model during the training and testing periods, respectively.
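The train/test split described above can be sketched in a few lines. The example below is an illustrative Python sketch, not the contents of ddm.m (which is MATLAB and is not reproduced here). The record layout `(date, rainfall, discharge)` and the function name `split_by_year` are assumptions made for this example, and the data values are dummies.

```python
from datetime import date, timedelta

TRAIN_YEARS = range(2005, 2011)  # 2005-2010 inclusive
TEST_YEARS = range(2011, 2015)   # 2011-2014 inclusive


def split_by_year(records):
    """Split (date, rainfall, discharge) tuples into training and testing lists."""
    train = [r for r in records if r[0].year in TRAIN_YEARS]
    test = [r for r in records if r[0].year in TEST_YEARS]
    return train, test


# Hypothetical placeholder series: one record per day, with dummy values.
start = date(2005, 1, 1)
n_days = (date(2014, 12, 31) - start).days + 1
records = [(start + timedelta(days=i), 0.0, 0.0) for i in range(n_days)]

train, test = split_by_year(records)
print(len(train), len(test))
```

Splitting by calendar year keeps the testing period strictly after the training period, so the test statistics reflect how the model behaves on data it has not seen.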

Lastly, students will learn to evaluate the performance of the constructed rainfall-runoff model. The sample code ddm.m calculates two statistics: the mean error and the coefficient of determination.
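As a rough illustration of these two statistics, the Python sketch below computes them for a short made-up series. Two choices here are assumptions, since the text does not say how ddm.m defines them: the mean error is taken as simulated minus observed, and the coefficient of determination is taken as 1 − SS_res/SS_tot (some codes instead report the squared correlation coefficient). The function names are hypothetical.

```python
def mean_error(observed, simulated):
    """Average of (simulated - observed); positive values mean over-prediction."""
    return sum(s - o for o, s in zip(observed, simulated)) / len(observed)


def coefficient_of_determination(observed, simulated):
    """R^2 = 1 - SS_res / SS_tot, measured against the mean of the observations."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up discharge values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 2.5, 4.5]

me = mean_error(observed, simulated)
r2 = coefficient_of_determination(observed, simulated)
print(me, r2)
```

A mean error near zero indicates little systematic bias, while a coefficient of determination near one indicates that the simulated series explains most of the variance in the observations. Computing both for the training and testing periods separately shows how well the model generalizes.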

## Additional Activities and Variants

Interested students are encouraged to try other machine learning algorithms, such as artificial neural networks and Gaussian processes, using the sample code ddm.m as a template.

Interested students can further explore how to tune the hyperparameters of machine learning algorithms by trying out the CV.m code. Additional resources can be found at https://wiki.cites.illinois.edu/wiki/display/mlgwm/Tutorial.
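The general idea behind cross-validated hyperparameter tuning can be sketched as a k-fold grid search. The Python example below is a generic illustration under stated assumptions and does not reproduce CV.m: the function names are hypothetical, the folds are contiguous blocks, and `toy_score` is a stand-in for "train an SVM on the training indices and return the validation error", so that the example runs without LIBSVM. The parameter names `C` and `gamma` correspond to the penalty and RBF kernel width that are commonly tuned for SVMs.

```python
from itertools import product


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs using k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def grid_search(n_samples, k, grid, score_fn):
    """Return the parameter combination with the lowest mean validation score."""
    best_params, best_score = None, float("inf")
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, valid_idx):
    """Hypothetical stand-in for a real train-and-validate step (ignores the indices)."""
    return (params["C"] - 10.0) ** 2 + (params["gamma"] - 0.1) ** 2


grid = {"C": [1.0, 10.0, 100.0], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(20, 5, grid, toy_score)
print(best_params, best_score)
```

In a real experiment, `toy_score` would be replaced by a function that fits the model on the training fold and returns an error statistic on the validation fold. The search should use only the training period, so that the testing period remains an independent check.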