Plotting the Flood Frequency Curve using Gumbel Distribution

Siddharth Saksena, Purdue University-Main Campus

Initial Publication Date: December 14, 2016

Introduction

The objective of this step is to make the students familiar with the concept of flood frequency analysis. This step assumes that the user already has information on the peak streamflow data for a USGS gauge station. This section presents a brief overview on the terminology used by hydrologists during flood frequency analysis. Along with the theoretical definitions, this step also describes some potential applications of learning this approach.

Conceptual Outcomes

Student demonstrates the understanding of return periods and flood frequency analysis
Student demonstrates the statistical interpretations of return periods and parameters used in flood frequency analysis

Practical Outcomes

Statistical Analysis using MS Excel
Use of Gumbel distribution in creating a flood frequency curve

Time Required

1 hour 15 min

Computing/Data Outputs

Numerical Values: Theoretical and Estimated Return Periods
Graphical: Flood Frequency Curve

Hardware/Software Required

MS Excel

Instructions

The input data required in this step can be obtained using this link: Annual Peak Streamflow Data (Excel 2007 (.xlsx) 9kB Dec7 16).

After downloading the excel sheet, select the streamflow values in column B, and click on 'Sort and Filter' tool in the Home tab in excel and select 'Sort Smallest to Largest'. The excel toolbar will popup which will ask to expand the selection as shown below. Click on 'Expand the Selection' and save the excel file.

After saving the file, name a new column as 'Rank (i)' and start ranking the data in decreasing order (From 35 to 1). You can use the excel pull down menu to do this. Now create another column called q_i in the excel sheet. In order to calculate the estimates of exceedance probabilities associated with historic observations in the excel sheet, Gringorten's plotting position formula is used as shown below:

Where,

q_i= Exceedance probability associated with a specific observation;

N= Number of annual maxima observations (35 in this case);

i= Rank of specific observation with i=1 being the largest to i=N being the smallest Column (C);

a= constant for estimation=0.44 using Gringorten's method

Using this formula, calculate the exceedance probabilities for all the observations in the excel sheet. The updated sheet should look as shown below:

Now make another column labeled p_i and make it equal to 1-q_i. This value refers to the non-exceedance probability of the distribution. In order to proceed with this tutorial further, the concept of statistical definition of return periods is described below:

Statistical Definition of Return Period

Assuming that X is a random variable which has a cumulative distribution function F_x(x). The probability that X is less than equal to a given event x_p is given as:

F_x(x) = P (X≤x_p) = p

The probability that this event will be exceeded is then equal to 1-p and the percent exceedance is denoted as 100(1-p). For such an event x_p, the return period corresponding to this exceedance probability is denoted by T.

Here, T = 1/(1-p)

Using this definition, the 100-year return period can be understood as an event with a probability of exceedance 1-p = 0.01 or a non-exceedance probability p=0.99. In other words, there is a 99% chance that this event will not be exceeded within a given year.

Using this concept of T, create another column labeled 'T_p estimated' and evaluate the values using the values in p_i using the equation described above (T = 1/(1-p)). Your excel sheet should have six columns as shown below:

'Tp estimated' represents the estimated distribution of the 35 years of data. Now we will assume that the data follows a specific distribution and estimate the parameters of the distribution. In this step, the data is assumed to follow the 'Gumbel' or Extreme Value Type 1' distribution. The CDF of the Extreme Value Type I or Gumbel distribution is given as follows:

Where x is the observed discharge data, and u and α are the calculated parameters of the distribution. We will use this distribution to calculate the theoretical estimate of 'p'. Create two columns labeled '(x-u)/α' and 'p theoretical'. In order to calculate 'p theoretical', we will need to calculate the value of (x-u)/α first. Using the equations given below, calculate the values of x̄, s_x,u and α.

The values of statistical parameters are shown below:

After calculating these values of u and α, use the peak streamflow values (x) and populate the column (x-u)/α as shown below:

After calculating (x-u)/α, calculate the value of 'p theoretical' using the CDF of the Gumbel Distribution described above 'p theoretical = EXP[-EXP{-1*((x-u)/α)}]'. After calculating 'p theoretical', use the same equation used to calculate 'T_p estimated' and calculate 'T_p theoretical'. After calculating 'T_p theoretical', the flood frequency analysis part is complete. We need to create a flood frequency curve to present the results but before that, check if the final excel sheet resembles as shown below:

Go to the 'Insert' tab on excel, select charts and plot T_p Estimated vs Annual Streamflow. Similarly, on the same graph, plot T_p theoretical vs Annual Streamflow. Add chart title, axis title and legend. The graph should look as shown below:

Now, right click on the curve and select 'change chart type'. Change the chart type to 'Scatter with smooth lines' for theoretical and 'Scatter' for estimated. Right click again and select 'Format Chart Area'. Use the axis options command for the X-axis and select Logarithmic scale. The final chart will contain return periods displayed from 1 to 100 in log scale as shown below:

Here, the orange line represents the theoretical distribution and the blue dots represent the fit of the annual peak streamflow data with respect to a Gumbel distribution. Using this curve, you can predict streamflow values corresponding to any return period from 1 to 100. It should be noted that the curve follows the distribution very well for low flows but drifts away from the theoretical distribution at higher flows. This is why it is wise to use multiple distributions such as log-normal and log pearson type III and check which distribution works best for a specific site.

Okay you are done creating a flood frequency curve using annual peak streamflow data!

Downloading Annual Peak Streamflow Data from USGS

« Flood Frequency Analysis using RWater Downloading Annual Peak Streamflow Data from USGS »