Numerical Differentiation for Chemical Engineers

Roman Voronov, NJIT, Chemical and Materials Engineering

Author Profile


In this activity students learn to apply appropriate numerical differentation formulas to non-uniformly spaced data. The concept is applied to a chemical engineering problem where the goal is to measure how the rate with which ethanol is leaking out of a spherical reactor changes with time, in order to determine if there is a danger of an explosion

Learning Goals

The learning outcomes are two-fold: 1) Numerical Differentiation - the students should learn when to apply the correct finite difference formula (forward, backward, centered, low order or high order) to achieve the maximum accuracy; 2) Coding - the students should learn how to use a combination of a loop and if/then structures to cycle over nonuniformly-spaced data and detect subregions of constant step-size. Then have the code select the best formula for each specific region. Finally, they learn how to deal with overlapping data points and validating their answers visually by plotting against the results from the Matlab's "gradient" command.

There are definitely higher-order thinking skills involved, as the students have to visualize in their head the different subsets and how the differentiation results are stored in a two dimensional matrix (overwhich they then have to average out the overlapping derivative values). Visual aids are included with the problem to assist them with these nontrivial tasks. They also have to be very careful defining many finite difference formulas and applying them in a correct manner.

Context for Use

This activity is intended to be used as a hands-on in-class or virtual lab during a single class time (3-hour lab). We use it in an upper-level undergraduate course in chemical engineering computing, but chemical engineering context is easy enough that it can be used in any math, physics or chemistry course as well. Typically a group of 4-5 students works together, with one of them designated as the "leader" (casts own matlab screen to the others).

Students should come to class having solved the problem by hand on paper. The goal is that they understand the math, before they implement the numerical methods using programming.

Students also need to be familiar with Matlab basics: for loops, anonymous functions, using other people's subroutines (from file exchange), indexing into matricies, if/elseif/else structures, plotting and formatting plots, using matlab commands like disp/gradient/mean, creating matricies filled with nans, and clearing the workspace/command window/closing figures at the start of the code.

The activity could be easily adapted to other settings, such as a take-home exam or assignment.

Description and Teaching Materials

A spherical reactor filled with ethanol.

At t = 0, a leak in the bottom of the reactor causes ethanol to flow out of the reactor. The following data is collected for the flow rate out of the tank as a function of time. We need to measure the rate of change for the flow rate (dQ/dt) at every time point given below, in order to determine if there is a danger of an explosion.

Create a script that takes in the above values and determines the best differentiation method for each time. For some of the data points, you will need to take the average of two different differentiation methods. Please note that you should use the method that provides the most accuracy while requiring the least numerical input. In other words, if you have the choice between 3-point forward/backward formulas or 2-point center formulas, both of which have errors of O(), you should choose the 2-point center formula (because it gives you teh same level of accuracy as the 3-point forward/back formulas, but requires less information and less work to calculate).

In addition, for any datapoints that are at the verge between two intervals (e.g. at t=1500s, the interval is 500s on one side and 700s on the other side), you should find both possible derivatives for that point and average them.

Lastly, please take careful note of the units for your given values. Solve for dQ/dt in terms of !

The graphic below should help you to determine how you should set up your conditional statements. Note that you should be setting up your code to determine the appropriate methods using conditional statements, not by hard-wiring the below methods into your code for each individual data point.

First derivative formulas from your Chapra textbook are shown below.

Forward Derivatives:

Backward Derivatives:

Center Derivatives:



1. Input the values from the above table into two arrays, converting the time from seconds to hours. Then use the diff, runlength, and numel functions to determine the intervals between each datapoint, and the number of repetitions for each of those intervals. Note that this portion of the lab is almost entirely the same as your code from Lab 8: Numerical Integration. You can copy-paste the same code for this section, with small adjustments (see Guidance steps

2. Define anonymous functions for the six different First Derivative formulas from above. The functions should accept an array of Q values, the interval spacing value (h), and however many points are necessary for that differentiation method.

3. Create a new If/Elseif/Else structure within your existing For loop that will determine the optimal method (i.e. lowest error with least numerical input) for differentiation at that point, and use the anonymous functions to find the derivative, taking the average for values that have two potential methods

4. Use Matlab's gradient function to find Matlab's differentiation for each point

5. Plot your results vs Matlab's results on the same graph.



1. Open your code with your name, clc, clear all, and close all.

2. Create two arrays, t and Q, which will contain the data from the above table. Keep the values in terms of seconds until after you complete Step 4, where you find the run length. Otherwise, the values will be incorrect

2A) Make 6 anonymous functions that can calculate the six different types of first derivatives. It should accept an array named f, and it should accept a value h which is the interval. Please use the following names: dF_fwd_2p,dF_fwd_3p,dF_bck_2p,dF_bck_3p,dF_cntr_2p,dF_cntr_4p. Also, please define the x inputs from left to right, for example: imagine an X axis ordered like this: xi_minus2, xi_minus1, xi, xi_plus1, xi_plus2

so when you define your anonymous functions, the inputs should be ordered like this:

dF_fwd_2p = @(f,h,x_i,x_i_plus1) ...


instead of like this:

dF_fwd_2p = @(f,h,x_i_plus1,x_i) ...


here is another example:

dF_bck_2p = @(f,h,x_i_minus1,x_i) ...


Note how I started with x_i_minus1 and then went to x_i, as opposed to putting x_i before x_i_minus1. This is what I mean by "from left to right".

3. Use the diff function to find the intervals between all of the data points and save them to a variable spacing. Remember that the number of intervals will be one less than the number of data points.

4. Use the embedded function runlength to determine the number of repetitions for each interval length. You will need to use the function numel, which determines the number of elements in an array, as the "maximum run length." The value from the function that we are interested in is the "l" output value, which will give an array of the run lengths (i.e. how many times each particular interval occurs in a row). Save this value into a variable named repetitions. The second output of the runlength function will contain the values of each unique interval. Save this value into a variable called intervals

  • The structure of the runlength function is: [l,s] = runlength(x,R), where x is the array of data in question, R is the maximum run length, l is the length of the runs (i.e. the number of repetitions for each unique interval), and s is the value of the run (i.e. the unique intervals)
  • After calculating these values, convert both the array 't' and the intervals from seconds into hours.
  • If you're still a bit confused about the intervals and run lengths, the graphic below should be helpful- it shows the intervals, and how the runlength function determines the repetitions values. Also, here is a little video trying to explain what runlength does and how to use it.

5. The basics of your For loop are essentially unchanged from Lab 8. You should still set up a variable named num_reps, which should be the number of times each interval repeats in a subset (e.g. repetitions(1), repetitions(2), etc.). You also need to have a variable named num_points, which will be the number of data points in each subset. This should be easy, since the number of points is always going to be one more than the number of repetitions of the interval.

  • You should also set a variable that is named subset_start, which will be defined as 1 for the first loop (since we're starting at the beginning), and on each subsequent loop will be changed to the final subset value (subset_end) from your previous loop. subset_end should be the subset_start value plus the number of repetitions (num_reps). The above graphic shows a visual example of the subset start and ends.
  • Prior to your loop, you should also set up a matrix whose number of rows (i.e. the first dimension) is equal to the length of your repetitions array (i.e.,the number of colors in the image above), and the number of columns (i.e., the second dimension) is equal to the number of t values in your data. Prior to the start of your loop, it should be entirely filled with 'nan'. The reason for this will become clear in Step 9. Name this matrix 'dfdx_ALL'

7. To match with the variables from the numerical differentiation methods' equations, set up the following variables:

  • f should be the Q values for your subset (i.e. Q from your subset_start to subset_end for each iteration of the loop.)
  • h should be the interval value for each iteration of the loop. COMMON MISTAKE: students forget to use the index i when defining the h and end up with the h equaling many interval sizes instead of just one.
  • You can also create variables x1, x2, x3, x4, and x5, which will be equal to 1 through 5 respectively. These are useful for providing inputs to your anonymous functions, when you are calculating the derivatives. For example, dfdx1 = dF_fwd_2p(f,h,x1,x2) instead of dfdx1 = dF_fwd_2p(f,h,1,2); However it is not required. It is just meant to help you keep track of things when looking at the numerical differentiation equations. NOTE: this is different from the last lab, where we defined x0 through x4.

8. Create a conditional statement that will take the number of points for that subset (num_points) and will determine the best method for differentiation, and calculate the derivative estimate for each point in the subset. Save these variables into the i row (i.e. on iteration 1, it will save into row 1, iteration 2 into row 2, etc.), and into the columns from subset_start to subset_end within your dfdx_ALL matrix. For example, dfdx_ALL(i,subset_start:subset_end) = [dfdx1 dfdx2];

  • Visually, it should look something like the array below. You will see that for the points that are not overlapping between two subsets, there will only be one derivative in the column. However, for values that ARE overlapping between two different subsets (see columns that belong to two different subset numbers), there will be two derivative values in each of those columns (marked with a circle).
  • Since we have a maximum of 5 points, as per the graphics above showing the subsets, your conditional should only accept values of num_points = 2,3,4, and 5. If anything else occurs, your code should return an error (either using the disp function or the error function).

9. After your loop finishes running, you should use the mean command on dfdx_ALL. This will take the average of all of the columns in your matrix (i.e., each column will be avareged in the direction of the arrows in the image above), omitting any values of nan (use the 'omitnan' flag in the mean command). The function will then return the averages of the columns as a vector, which you should save as dfdx_calc

10. Use Matlab's gradient command on the Q and t arrays to find Matlab's derivatives for these points. Save this to a variable dfdx_matlab. COMMON MISTAKE: students calculate the derivative of t with respect to Q, instead of Q with respect to t by providing the inputs to the gradient command in the wrong order. First it expects what you are taking the derivative of, and second what you are taking the derivative with respect to.

11. On the same graph, create a scatter plot of your t values with your calculated derivative values (dfdx_calc), with a line plot of your t values with Matlab's derivative values (dfdx_matlab). Add a legend and a grid, as well as labels to the x and y axes, and a title.

Problem Description (Microsoft Word 2007 (.docx) 353kB Sep17 21) 

Teaching Notes and Tips

An embedded function "runlength" (from the mathworks file exchange) is required to detect which data subsets have a constant step size. It is nontrivial to understand, so I include a video explanation with the instructions:


Matlab Grader checks if the students have set up their arrays correctly; used loops, if/then statements, plots, and various Matlab commands (e.g., 'gradient' to check their answers), etc. It also checks the correctness of their numerical answers at different points in the code.

References and Resources

The activity is based on P19.23 from "Applied Numerical Methods W / MATLAB: for Engineers & Scientists" (4th edition) by Steven Chapra.