To Loop or Not to Loop: Speeding up your code in Matlab
Summary
In this activity, we will perform a single computational experiment multiple times. Each time, we will use different coding techniques, and we will compare the time that it takes Matlab to solve. The experiment that we will be solving is a Monte Carlo Simulation. Keywords: Vectors, Loops, Built-in functions, Tic, Toc, Speed, Monte Carlo
Learning Goals
The students will learn about how, in Matlab , it can be advantageous to use linear-algebra formulations, memory pre-allocation, and built-in functions to speed up their code. They will be coding a Monte-Carlo simulation to solve a fun, and non-intuitive, probability riddle and therefore will be introduced to the idea of Monte-Carlo simulations.
Context for Use
This would be a fun activity for an introductory programming course using Matlab. Keep in mind that, due to the small size problem that we are dealing with in this activity, it may seem, to the students, that the time grain by these techniques is not worth the time to optimize. It is important to remind the students that 1.) They could be dealing with much larger, higher-dimensional, problems in the future where these techniques will save them a lot of time, and 2.) They may be solving problems, such as real-time control algorithm for automation, where they only have 100 microseconds to solve the problem and therefore optimization of the algorithm is also crucial or your machine/robot will crash. So, keep in mind, the goal of this activity is to introduce them to concepts that will speed up their code, but the gain margin may not be as obviously great as it could be in the future when solving more advanced problems.
Description and Teaching Materials
The Story (what are we trying to solve):
Setup: Imagine there is a Mom with two kids. Given: You are told that one of the children is a girl. Question: What is the probablity that the other child is also a girl? Note: We will assume that there is a 50-50 probability of having a girl or a boy. Without using computers or googling anything, what do students think is the correct answer for this riddle?
Student Answer and Correct Answer:
The majority of students will say that the answer is 50% probability that the second child is a girl. But it turns out that the correct answer is that the probability is actually 33%. At this point, students are confused about this answer.
Theoretical Explanation:
It can be explained using basic probability concepts. Think about all the possibilities that could happen if you had two kids: Boy-Boy (BB), BG, GB, and finally GG. Note that there are only 4 possibilities. Out of those 4, one is thrown out (BB), since you have already been told that one of the children is a girl. So, the only options that are left are BG, GB, and GG. Finally, the probability of having a GG is 1/3 using basic probability concepts GG/(GG+BG+GB)
Still Skeptical:
You will still have many students who are skeptical and argue that BG and GB are the same thing, so therefore it GG/(GG+BG) = 1/2.
Experimental Simulation Idea:
Now the programming begins! How would you 1.) Simulate a birth 2.) Simulate two births 3.) Simulate two births for two moms 4.) Extend to N moms 5.) Finally, how do you keep track of the numbers (counting BB, BG, GB, and GG) to estimate the probability numerically. You, as the teacher, should guide them into writing something like the following code, which will be our baseline speed comparison when varying the speed:
BASIC CODE FILE HERE
%%% Initialize %%%
Moms = 100000000;
GG = 0;
BB = 0;
GBorBG = 0;
%%% Loop %%%%%
for ii = 1:Moms
Kid1 = rand<.5; % First Birth is a Boy if 0 and Girl if 1
Kid2 = rand<.5; % Second Birth is a Boy if 0 and Girl if 1
Total = Kid1 + Kid2; % 0 is two Boys and 2 is two Girls
if Total == 2; %Count how many families have two Girls
GG = GG + 1;
elseif Total == 0; %Count how many families have two BOYS
BB = BB + 1;
else
GBorBG = GBorBG + 1; %Count how many families have one GIRL
end
end
toc
GG/(GG+GBorBG) % Calculate Probability
Ludicrous Speed!:
Now we will play with different variations of the above basic algorithm to explore possible speeding up of the code. The following is the order of variations:
- Instead of counting while looping, we will store all the births in a vector and then use the built-in "sum()" function to do the adding at the end. In this code, we will still be using a For-Loop. The benefit of this is a much shorter length of code. It turns out that bit of code is actually slower than the Basic code:
tic for ii = 1:Moms Births(ii) = (rand<.5) + (rand<.5); end toc sum(Births==2)/(Moms - sum(Births==0)) - Note that in Variation 1, Matlab needs to find a new contiguous unbroken block of memory to store the vector at every iteration. This becomes quite cumbersome for Matlab, especially when data becomes large. In this variation, we will add an initial pre-allocation of the memory location, so that Matlab doesn't need to find a location at every loop. This code Variation 2 is now fast and shorter than the Basic code:
Births = zeros(Moms,1); tic for ii = 1:Moms Births(ii) = (rand<.5) + (rand<.5); end toc sum(Births==2)/(Moms - sum(Births==0)) Finally, we will eliminate the For-Loop entirely and take advantage of Matlab's built-in vectorizable functions to run this code without loops. This happens to be the fastest of all the codes and the shortest length of them all also:
tic Births = sum(rand(Moms,2)<.5,2); toc sum(Births==2)/(Moms - sum(Births==0))- One final idea, if time permits, is to realize that this code is parallelizable and students could explore the GPU functionality in Matlab.
Teaching Notes and Tips
- Speeds will vary due to students using different computers.
- No GPU access on Mac Silicon.