Data classification using Support Vector Machine and geographical data plotting
Ruchita Sharma, Morgan State University
Initial Publication Date: October 12, 2023
Summary
This activity is designed for undergraduate students to plot data points geographically in MATLAB, where each point represents a location or country. The idea stemmed from presenting coronavirus data for 176 countries, but geographical plots can be created for any data set.
In addition, more advanced code is provided for higher-level students, who use Support Vector Machines (SVMs) for data classification with a quadratic SVM and the kernel trick.
Topics
Mathematics
Grade Level
College Lower (13-14), College Upper (15-16), Graduate/Professional
Readiness for Online Use
Online Adaptable
Learning Goals
Students are expected to have basic MATLAB programming skills, including importing Excel files into MATLAB. In this activity, MATLAB is used to plot and classify different data sets.
The modules can be extended to other data sets.
Context for Use
This activity is mainly aimed at undergraduate courses in which a data set is presented geographically on a map. Math courses for business majors can use the same approach to plot sales and purchase data by location.
Other undergraduate courses can use data sets from their own fields and present them on a world map using MATLAB.
The advanced part of the activity focuses on data classification with Support Vector Machines, using the kernel trick to change the dimension of a data set.
Description and Teaching Materials
This activity uses data from 176 countries affected by coronavirus and applies linear and nonlinear SVMs to classify the countries based on the number of cases and deaths in each country.
Geographical plots were generated in MATLAB to show all the countries affected by coronavirus and to show the number of deaths in each country through the size of the marker placed on that country.
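The two maps described above can be sketched with MATLAB's geographic plotting functions. This is a minimal sketch, not the code used to produce the activity's figures: the column names (Country, Latitude, Longitude, Deaths), the commented-out file name, and all numbers in the table are placeholders assumed for illustration, since the layout of the spreadsheet is not described here.

```matlab
% Placeholder table standing in for the activity's spreadsheet.
% Column names and all values are illustrative assumptions, not real data.
% With the real file one would instead call something like:
%   T = readtable('covid_data.xlsx');   % hypothetical file name
Country   = ["A"; "B"; "C"; "D"];
Latitude  = [ 40.0; -15.0; 20.0; 55.0];
Longitude = [-100.0; -55.0; 78.0; 10.0];
Deaths    = [ 500;  200;   50;  120];
T = table(Country, Latitude, Longitude, Deaths);

% Map 1: every location drawn with the same marker size.
figure
geoscatter(T.Latitude, T.Longitude, 36, 'filled')
title('Affected locations')

% Map 2: marker area scaled by the number of deaths.
% The offset of 20 keeps locations with few deaths visible.
markerArea = 20 + 300 * T.Deaths / max(T.Deaths);
figure
geoscatter(T.Latitude, T.Longitude, markerArea, 'filled')
title('Marker size scaled by deaths')
```

An alternative for the second map is `geobubble(T, 'Latitude', 'Longitude', 'SizeVariable', 'Deaths')`, which handles the size scaling and adds a size legend automatically.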
Another graph was generated with the Classification Learner app in MATLAB, in which blue dots represent a 'Large' ratio, red dots a 'Medium' ratio, and yellow dots a 'Small' ratio of cases vs. deaths.
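A programmatic counterpart to the Classification Learner workflow might look like the sketch below, which requires the Statistics and Machine Learning Toolbox. Several things here are assumptions rather than part of the activity: the data are synthetic, the ratio is taken as deaths divided by cases, and the three classes are formed by splitting that ratio into equal-sized thirds, because the actual labeling rule is not stated. A quadratic SVM is expressed as a polynomial kernel of order 2 inside a multiclass `fitcecoc` model.

```matlab
rng(1);                                    % reproducible synthetic data
n      = 150;
Cases  = round(10 .^ (2 + 3*rand(n, 1)));  % synthetic case counts
Deaths = round(Cases .* (0.005 + 0.095*rand(n, 1)));  % synthetic deaths
ratio  = Deaths ./ Cases;                  % assumed definition of the ratio

% Assumed labeling rule: three equal-sized groups of the ratio.
edges = quantile(ratio, [1/3 2/3]);
Label = repmat("Medium", n, 1);
Label(ratio <  edges(1)) = "Small";
Label(ratio >= edges(2)) = "Large";
Label = categorical(Label);                % categories: Large, Medium, Small

% Predictors on a log scale so large countries do not dominate.
X = [log10(Cases), log10(Deaths + 1)];

% Quadratic SVM learner inside a multiclass (one-vs-one) model.
t   = templateSVM('KernelFunction', 'polynomial', 'PolynomialOrder', 2, ...
                  'KernelScale', 'auto', 'Standardize', true);
mdl = fitcecoc(X, Label, 'Learners', t);

% Estimate the error with 5-fold cross-validation.
cv = crossval(mdl, 'KFold', 5);
fprintf('5-fold misclassification rate: %.3f\n', kfoldLoss(cv));

% Blue = Large, red = Medium, yellow = Small (categories sort alphabetically).
figure
gscatter(X(:, 1), X(:, 2), Label, 'bry')
xlabel('log_{10}(cases)')
ylabel('log_{10}(deaths + 1)')
```

Replacing the synthetic `Cases` and `Deaths` vectors with columns read from the spreadsheet leaves the rest of the sketch unchanged.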
To take the study of data classification further, the kernel trick is used to change the dimensionality of the data from two dimensions to three in order to classify the petals of iris flowers using the well-known Fisher iris data set.
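The change from two to three dimensions can be illustrated with an explicit feature map. The sketch below assumes the two petal measurements as the 2-D input and uses the standard map for the homogeneous quadratic kernel, phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2]. The activity does not state which map or which pair of features was used, so both choices are assumptions made for illustration. It requires the Statistics and Machine Learning Toolbox.

```matlab
load fisheriris                  % provides meas (150x4) and species
X2 = meas(:, 3:4);               % petal length and petal width (2-D)

% Assumed explicit feature map for the kernel k(x, y) = (x*y')^2.
phi = @(X) [X(:, 1).^2, sqrt(2) * X(:, 1) .* X(:, 2), X(:, 2).^2];
X3  = phi(X2);                   % the same 150 points lifted to 3-D

% Kernel trick check: the 2-D kernel equals the 3-D inner product,
% so the 3-D coordinates never have to be formed explicitly.
K2 = (X2 * X2').^2;
K3 = X3 * X3';
fprintf('Largest kernel mismatch: %.2e\n', max(abs(K2(:) - K3(:))));

% Plot the lifted data, colored by species index.
figure
scatter3(X3(:, 1), X3(:, 2), X3(:, 3), 25, grp2idx(species), 'filled')
xlabel('x_1^2'); ylabel('sqrt(2) x_1 x_2'); zlabel('x_2^2')

% A linear multiclass SVM in the lifted 3-D space.
mdl = fitcecoc(X3, species, ...
               'Learners', templateSVM('KernelFunction', 'linear'));
fprintf('Resubstitution error in 3-D: %.3f\n', resubLoss(mdl));
```

The mismatch printed by the check should be at the level of floating-point rounding, which is the point of the kernel trick: a linear separator in the lifted space corresponds to a quadratic decision boundary in the original two dimensions.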
This plot shows the world map of countries affected by coronavirus.
Provenance: Ruchita Sharma, Morgan State University
Reuse: This item is offered under a Creative Commons Attribution-NonCommercial-ShareAlike license http://creativecommons.org/licenses/by-nc-sa/3.0/ You may reuse this item for non-commercial purposes as long as you provide attribution and offer any derivative works under a similar license.
In this plot, larger dots represent more deaths and smaller dots represent fewer deaths in the countries affected by coronavirus.
Data of 176 countries affected by coronavirus (Excel 2007 (.xlsx) 22kB Oct12 23)
The dimension of the data set was changed from 2D to 3D to make classification with an SVM easier.
Teaching Notes and Tips
For the advanced classification part, students need to be introduced to SVMs and kernel theory first.
Data classification techniques can be explained in theory before they are applied.
Assessment
At the end of the course, students can be given any data set to plot geographically and asked to represent those aspects of the data that suit the type of data set.
References and Resources
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179–188.
Cortes, C., & Vapnik, V. N. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
This teaching activity was created as a part of the Teaching Computation with MATLAB Workshop held in 2023 at Carleton College.