Data classification using Support Vector Machines and geographical data plotting

Ruchita Sharma, Morgan State University
Initial Publication Date: October 12, 2023

Summary

This activity is designed for undergraduate students to use MATLAB to plot data points representing various locations or countries on a geographical map. The idea stemmed from presenting Coronavirus data for 176 countries, but geographical plots can be created for any data set.
The advanced coding portion is designed for higher-level students, who use Support Vector Machines (SVMs) to classify data with a quadratic SVM and the kernel trick.

Learning Goals

Students are expected to have basic MATLAB programming skills, including importing Excel files into MATLAB. MATLAB is used to plot and classify the different data sets.
The modules can be extended to many other data sets.

Context for Use

This activity is mainly intended for undergraduate courses that present data sets geographically on maps. Business mathematics courses can use the same approach to represent sales and purchase data for various locations.
Undergraduate courses in other fields can likewise use their own data sets and present them on a world map using MATLAB.

The advanced part of the activity uses Support Vector Machines to classify data, applying the kernel trick to change the dimension of the data set.
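As a minimal illustration of what "changing the dimension" means, the sketch below checks numerically that the homogeneous quadratic kernel K(x, y) = (x · y)² gives the same value as an ordinary dot product after an explicit map φ from two dimensions to three, φ(x1, x2) = (x1², √2·x1·x2, x2²). The function names are illustrative choices, not part of the original activity. The point of the kernel trick is that an SVM only ever needs K, so it can work in the three-dimensional feature space without computing φ.

```python
import math

def phi(x):
    """Explicit feature map R^2 -> R^3 for the homogeneous quadratic kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary Euclidean dot product."""
    return sum(a * b for a, b in zip(u, v))

def quad_kernel(x, y):
    """K(x, y) = (x . y)^2, computed entirely in the original 2-D space."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
print(quad_kernel(x, y))        # kernel value computed in 2-D
print(dot(phi(x), phi(y)))      # same value (up to rounding) via explicit 3-D features
```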

Description and Teaching Materials

In this activity, data from 176 countries affected by Coronavirus were used, and linear and nonlinear SVMs were applied to classify the countries based on the number of cases and deaths in each.
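Both the linear and the nonlinear SVM label a point by the sign of a kernel-weighted sum over support vectors, f(x) = Σ αᵢ yᵢ K(sᵢ, x) + b. The sketch below evaluates that decision function for a linear kernel and for the order-2 polynomial kernel (1 + u · v)², the form MATLAB documents for a polynomial kernel with order 2. The support vectors, α values, and bias are hypothetical placeholders. In practice they come from training (for example with `fitcsvm` or the Classification Learner app), and features such as case and death counts would normally be standardized first. With three classes, several binary SVMs are combined (for example one-vs-one); that step is not shown.

```python
def linear_kernel(u, v):
    """Linear kernel: plain dot product."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_kernel(u, v):
    """Inhomogeneous polynomial kernel of order 2: (1 + u.v)^2."""
    return (1.0 + linear_kernel(u, v)) ** 2

def decision(x, support_vectors, alphas, labels, bias, kernel):
    """SVM decision value f(x) = sum_i alpha_i * y_i * K(s_i, x) + b."""
    return sum(a * y * kernel(s, x)
               for s, a, y in zip(support_vectors, alphas, labels)) + bias

def predict(x, support_vectors, alphas, labels, bias, kernel):
    """Binary label (+1 or -1) from the sign of the decision value."""
    return 1 if decision(x, support_vectors, alphas, labels, bias, kernel) >= 0 else -1

# Hypothetical trained model: two support vectors with labels +1 and -1.
svs = [(1.0, 1.0), (-1.0, -1.0)]
alphas = [0.5, 0.5]
labels = [1, -1]
print(predict((2.0, 0.0), svs, alphas, labels, 0.0, linear_kernel))
print(predict((2.0, 0.0), svs, alphas, labels, 0.0, quadratic_kernel))
```

Swapping the kernel function is the only change needed to move from the linear to the quadratic model, which is why the same decision rule covers both cases.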
Geographical plots were generated in MATLAB to show all the countries affected by Coronavirus, with marker size varying to indicate the number of deaths in each country.
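When marker size encodes a count such as deaths, one common choice is to make marker area, rather than radius, proportional to the count, so the radius grows with the square root. The helper below is a hypothetical sketch of that scaling. It is not MATLAB's own bubble-sizing rule, and `r_max` is an assumed parameter for the largest marker.

```python
import math

def bubble_radii(deaths, r_max=10.0):
    """Hypothetical helper: radius = r_max * sqrt(d / d_max), so area is proportional to d."""
    d_max = max(deaths, default=0)
    if d_max <= 0:
        # No positive counts: every marker collapses to zero size.
        return [0.0 for _ in deaths]
    return [r_max * math.sqrt(d / d_max) for d in deaths]

# Placeholder counts for three hypothetical countries.
print(bubble_radii([0, 25, 100]))
```

With area scaling, a country with four times as many deaths gets a marker twice as wide, which keeps large values from visually swamping the map.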
Another graph was generated using the MATLAB Classification Learner app, in which blue dots represent a 'Large' ratio, red dots a 'Medium' ratio, and yellow dots a 'Small' ratio of cases to deaths.
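The activity does not state how the ratio or the class boundaries were defined. The sketch below assumes the ratio is deaths divided by cases and splits the countries into equal-sized thirds (tertiles) by that ratio to produce 'Small', 'Medium', and 'Large' labels that a classifier could then learn. Both the direction of the ratio and the tertile rule are assumptions for illustration. The color mapping follows the plot described above.

```python
# Colors used in the plot described in the activity.
COLORS = {"Large": "blue", "Medium": "red", "Small": "yellow"}

def label_by_ratio(cases, deaths):
    """Assign Small/Medium/Large by tertile of the (assumed) ratio deaths / cases."""
    ratios = [d / c if c > 0 else 0.0 for c, d in zip(cases, deaths)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    n = len(ratios)
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            labels[i] = "Small"       # lowest third of ratios
        elif rank < 2 * n / 3:
            labels[i] = "Medium"      # middle third
        else:
            labels[i] = "Large"       # highest third
    return labels

# Placeholder counts, not the activity's data.
labels = label_by_ratio([100, 100, 100], [5, 1, 3])
print([(lab, COLORS[lab]) for lab in labels])
```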

To extend the study of data classification, the kernel trick is used to map the data from two dimensions to three in order to classify Iris flowers by their petal measurements, using Fisher's well-known iris data set.
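The geometric idea behind the 2-D to 3-D step can be sketched on synthetic data: two concentric rings of points cannot be separated by any straight line in the plane, but after the hypothetical lift (x, y) → (x, y, x² + y²) a flat plane separates them. Here the plane z = 1 is chosen by hand for illustration; an SVM trained on the lifted points would learn such a plane itself. The ring radii are arbitrary assumptions. In MATLAB, Fisher's data set ships as `fisheriris` (petal length and width are columns 3 and 4 of `meas`), and the same lifting idea applies to those two features.

```python
import math

def lift(point):
    """Map (x, y) in 2-D to (x, y, x^2 + y^2) in 3-D."""
    x, y = point
    return (x, y, x * x + y * y)

def ring(radius, n=8):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def plane_classifier(point, threshold=1.0):
    """+1 above the horizontal plane z = threshold in the lifted space, else -1."""
    return 1 if lift(point)[2] > threshold else -1

inner = ring(0.5)   # class -1: enclosed by the outer ring, so no line separates them in 2-D
outer = ring(2.0)   # class +1

print([plane_classifier(p) for p in inner])
print([plane_classifier(p) for p in outer])
```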

Data of 176 countries affected by Coronavirus (Excel 2007 (.xlsx), 22 kB, Oct 12, 2023)

Teaching Notes and Tips

For the advanced classification, students need to be introduced to SVM and kernel theory.
Data classification techniques can be explained theoretically before the applications are presented.

Assessment

At the end of the course, students can be given any data set to plot geographically and asked to represent various aspects of the data, depending on the type of data set.

References and Resources

Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

This teaching activity was created as a part of the Teaching Computation with MATLAB Workshop held in 2023 at Carleton College.