Regression Methods on real time and synthetic datasets

Rizwan Qureshi, City University of Hong Kong, Electrical Engineering

Summary

In this activity, we will learn five popular regression algorithms. We also test their behavior for different data sizes, with and without outliers, and for different model orders.
- Least Squares (LS)
- Regularized Least Squares (RLS)
- Least Absolute Shrinkage and Selection Operator (LASSO)
- Robust Regression (RR)
- Bayesian Regression (BR)
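
For orientation, the five estimators can be written in one common notation. The block below is a summary sketch using standard textbook forms, not a transcription of the problem set. The symbols are assumptions introduced here: \Phi is the n-by-D design matrix with one row of features per sample, y is the vector of n targets, \lambda is a regularization weight, \alpha is the prior variance and \sigma^2 is the noise variance. Robust regression is shown in its least-absolute-deviations form, which is one common choice; the assignment may use a different robust loss.

```latex
\begin{align*}
\text{LS:}    \quad & \hat\theta = \arg\min_\theta \|y - \Phi\theta\|_2^2
                      = (\Phi^T\Phi)^{-1}\Phi^T y \\
\text{RLS:}   \quad & \hat\theta = \arg\min_\theta \|y - \Phi\theta\|_2^2 + \lambda\|\theta\|_2^2
                      = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T y \\
\text{LASSO:} \quad & \hat\theta = \arg\min_\theta \|y - \Phi\theta\|_2^2 + \lambda\|\theta\|_1 \\
\text{RR:}    \quad & \hat\theta = \arg\min_\theta \|y - \Phi\theta\|_1 \\
\text{BR:}    \quad & \theta \sim \mathcal{N}(0, \alpha I), \quad
                      y \mid \theta \sim \mathcal{N}(\Phi\theta, \sigma^2 I) \\
              & \Rightarrow\ \theta \mid y \sim \mathcal{N}(\hat\mu, \hat\Sigma), \quad
                      \hat\Sigma = \Big(\tfrac{1}{\alpha} I + \tfrac{1}{\sigma^2}\Phi^T\Phi\Big)^{-1}, \quad
                      \hat\mu = \tfrac{1}{\sigma^2}\hat\Sigma\,\Phi^T y
\end{align*}
```

LS and RLS have closed-form solutions, while LASSO and RR have no closed form and are usually solved with a numerical optimizer. With \lambda = \sigma^2 / \alpha, the BR posterior mean coincides with the RLS estimate.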

Learning Goals

The main objective of this activity is to apply the knowledge of regression models and probability distributions learned in the machine learning course to a real-life problem.
MATLAB is used as the primary programming language to solve the problem. The problem develops not only a good understanding of the regression models but also critical thinking, data analysis and data visualization skills. The problems are designed flexibly, so that students can vary a number of parameters and observe their effect on the results, which helps them understand the concepts more clearly.

Context for Use

This activity is useful for Machine Learning CS5487. By this point the course has covered linear algebra, probability distribution functions, Gaussian mixture models and the least squares algorithm. The activity can also be used in Probability and Stochastic Processes, or to teach a graduate-level machine learning course. The prerequisites are basic programming, linear algebra, calculus and probability distributions.
Students should have good programming skills in MATLAB, including writing their own functions, using loops and performing matrix multiplications.
Good knowledge of regression models and critical thinking skills are also desirable.

Description and Teaching Materials

This activity can be used as a programming assignment for a Machine Learning class.
Materials included:

Problem set for activity in PDF
Data set as a .mat file
Solution Set PDF

Solution Set Zip file

Problem Statement (Acrobat (PDF) 219kB Aug19 19)

ProgAssignmentPart1 (Matlab File 2kB Aug19 19)

Count Data (Matlab .MAT File 68kB Aug19 19)

poly_data.mat (Matlab .MAT File 3kB Oct24 19)

Part 2 (Matlab File 2kB Aug19 19)

Teaching Notes and Tips

This assignment includes a main file and separate files for the regression models. The code is straightforward to follow. The user can change the order of the function and the size of the data to see the effect on the performance of the model.
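
The kind of experiment described above can be sketched in a few lines of MATLAB. This is a minimal illustration under stated assumptions, not the code from the assignment files. The names polyFeatures, fitRLS, trueFn and lambda, the quadratic ground-truth function, the noise level and the chosen orders and data sizes are all hypothetical. It fits a polynomial by least squares and regularized least squares on synthetic data and prints the test error for each combination of data size and order.

```matlab
% Illustrative sketch only: all names and values are assumptions,
% not taken from the assignment files.
rng(0);                                   % fixed seed so runs are repeatable

% n-by-(K+1) design matrix with columns x.^0, x.^1, ..., x.^K
polyFeatures = @(x, K) bsxfun(@power, x(:), 0:K);

% Regularized least squares; lambda = 0 gives ordinary least squares.
% Note: this simple form also penalizes the constant (bias) term.
fitRLS = @(Phi, y, lambda) ...
    (Phi' * Phi + lambda * eye(size(Phi, 2))) \ (Phi' * y);

trueFn = @(x) 1 + 2*x - 0.5*x.^2;         % assumed ground-truth function
sigma  = 0.3;                             % assumed noise standard deviation
lambda = 1;                               % assumed regularization weight

xTest = linspace(-2, 2, 200)';            % noise-free grid for evaluation
yTest = trueFn(xTest);

for n = [10 50 200]                       % training-set sizes to compare
    x = 4*rand(n, 1) - 2;                 % inputs drawn uniformly on [-2, 2]
    y = trueFn(x) + sigma*randn(n, 1);    % noisy observations
    for K = [1 2 5]                       % polynomial orders to compare
        Phi      = polyFeatures(x, K);
        PhiTest  = polyFeatures(xTest, K);
        thetaLS  = fitRLS(Phi, y, 0);
        thetaRLS = fitRLS(Phi, y, lambda);
        mseLS    = mean((PhiTest*thetaLS  - yTest).^2);
        mseRLS   = mean((PhiTest*thetaRLS - yTest).^2);
        fprintf('n=%3d  K=%d  MSE(LS)=%.4f  MSE(RLS)=%.4f\n', ...
                n, K, mseLS, mseRLS);
    end
end
```

Changing the lists of n and K, or replacing some targets with large outliers before fitting, gives the comparisons the assignment asks students to discuss.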

Assessment

1. Examination of the students' code, results and data visualization.
2. Students' comments on the model and on the effect of changing the parameters.
3. Critical analysis of the data, including which regression model is suitable for the data pattern.

References and Resources

1. Chan, Antoni B., and Nuno Vasconcelos. "Counting people with low-level features and Bayesian regression." IEEE Transactions on Image Processing 21.4 (2011): 2160-2177.