Plotting the Flood Frequency Curve using Gumbel Distribution

Siddharth Saksena, Purdue University-Main Campus
Author Profile
Initial Publication Date: December 14, 2016


The objective of this step is to make the students familiar with the concept of flood frequency analysis. This step assumes that the user already has information on the peak streamflow data for a USGS gauge station. This section presents a brief overview on the terminology used by hydrologists during flood frequency analysis. Along with the theoretical definitions, this step also describes some potential applications of learning this approach.

Conceptual Outcomes

  • Student demonstrates the understanding of return periods and flood frequency analysis
  • Student demonstrates the statistical interpretations of return periods and parameters used in flood frequency analysis

Practical Outcomes

  • Statistical Analysis using MS Excel
  • Use of Gumbel distribution in creating a flood frequency curve

Time Required

1 hour 15 min

Computing/Data Outputs

  • Numerical Values: Theoretical and Estimated Return Periods
  • Graphical: Flood Frequency Curve

Hardware/Software Required

MS Excel


The input data required in this step can be obtained using this link: Annual Peak Streamflow Data (Excel 2007 (.xlsx) 9kB Dec7 16).

After downloading the excel sheet, select the streamflow values in column B, and click on 'Sort and Filter' tool in the Home tab in excel and select 'Sort Smallest to Largest'. The excel toolbar will popup which will ask to expand the selection as shown below. Click on 'Expand the Selection' and save the excel file.

After saving the file, name a new column as 'Rank (i)' and start ranking the data in decreasing order (From 35 to 1). You can use the excel pull down menu to do this. Now create another column called qi in the excel sheet. In order to calculate the estimates of exceedance probabilities associated with historic observations in the excel sheet, Gringorten's plotting position formula is used as shown below:


qi= Exceedance probability associated with a specific observation;

N= Number of annual maxima observations (35 in this case);

i= Rank of specific observation with i=1 being the largest to i=N being the smallest Column (C);

a= constant for estimation=0.44 using Gringorten's method

Using this formula, calculate the exceedance probabilities for all the observations in the excel sheet. The updated sheet should look as shown below:

Now make another column labeled pi and make it equal to 1-qi. This value refers to the non-exceedance probability of the distribution. In order to proceed with this tutorial further, the concept of statistical definition of return periods is described below:

Statistical Definition of Return Period

Assuming that X is a random variable which has a cumulative distribution function Fx(x). The probability that X is less than equal to a given event xp is given as:

Fx(x) = P (X≤xp) = p

The probability that this event will be exceeded is then equal to 1-p and the percent exceedance is denoted as 100(1-p). For such an event xp, the return period corresponding to this exceedance probability is denoted by T.

Here, T = 1/(1-p)

Using this definition, the 100-year return period can be understood as an event with a probability of exceedance 1-p = 0.01 or a non-exceedance probability p=0.99. In other words, there is a 99% chance that this event will not be exceeded within a given year.

Using this concept of T, create another column labeled 'Tp estimated' and evaluate the values using the values in pi using the equation described above (T = 1/(1-p)). Your excel sheet should have six columns as shown below:

'Tp estimated' represents the estimated distribution of the 35 years of data. Now we will assume that the data follows a specific distribution and estimate the parameters of the distribution. In this step, the data is assumed to follow the 'Gumbel' or Extreme Value Type 1' distribution. The CDF of the Extreme Value Type I or Gumbel distribution is given as follows:

Where x is the observed discharge data, and u and α are the calculated parameters of the distribution. We will use this distribution to calculate the theoretical estimate of 'p'. Create two columns labeled '(x-u)/α' and 'p theoretical'. In order to calculate 'p theoretical', we will need to calculate the value of (x-u)/α first. Using the equations given below, calculate the values of x̄, sx,u and α.

The values of statistical parameters are shown below:

After calculating these values of u and α, use the peak streamflow values (x) and populate the column (x-u)/α as shown below:

After calculating (x-u)/α, calculate the value of 'p theoretical' using the CDF of the Gumbel Distribution described above 'p theoretical = EXP[-EXP{-1*((x-u)/α)}]'. After calculating 'p theoretical', use the same equation used to calculate 'Tp estimated' and calculate 'Tp theoretical'. After calculating 'Tp theoretical', the flood frequency analysis part is complete. We need to create a flood frequency curve to present the results but before that, check if the final excel sheet resembles as shown below:

Go to the 'Insert' tab on excel, select charts and plot Tp Estimated vs Annual Streamflow. Similarly, on the same graph, plot Tp theoretical vs Annual Streamflow. Add chart title, axis title and legend. The graph should look as shown below:

Now, right click on the curve and select 'change chart type'. Change the chart type to 'Scatter with smooth lines' for theoretical and 'Scatter' for estimated. Right click again and select 'Format Chart Area'. Use the axis options command for the X-axis and select Logarithmic scale. The final chart will contain return periods displayed from 1 to 100 in log scale as shown below:

Here, the orange line represents the theoretical distribution and the blue dots represent the fit of the annual peak streamflow data with respect to a Gumbel distribution. Using this curve, you can predict streamflow values corresponding to any return period from 1 to 100. It should be noted that the curve follows the distribution very well for low flows but drifts away from the theoretical distribution at higher flows. This is why it is wise to use multiple distributions such as log-normal and log pearson type III and check which distribution works best for a specific site.

Okay you are done creating a flood frequency curve using annual peak streamflow data!

Related Steps