Introductory Regression Fits of Nuclear Binding Energies
Summary
This project-based assignment is a practical demonstration of MATLAB's Regression Learner App. The project attachments include two files containing training and test datasets of nuclear binding energies from the Atomic Mass Evaluation (AME) 2020.
Learning Goals
This assignment is intended to give students insight into different machine learning approaches, specifically how to compare the accuracy of the models, how fast they compute predictions, and ultimately how well they work.
Context for Use
This assignment is intended for first-year students (predominantly science majors) in their first semester of college. It requires no programming knowledge or proficiency.
Description and Teaching Materials
The following step-by-step instructions guide you through loading training and test datasets of nuclear binding energies, and through training and testing a variety of machine learning approaches using MATLAB Online. Please note that many additional options are available within MATLAB. These instructions intentionally overlook them and use the default settings.
In these first steps you will download and upload the data files, start MATLAB, start the Regression Learner App, load the training and test files, test many machine learning models, and generate a results table.
Step 1: Download the training data file, Binding Energy training data (Text File 33kB Oct7 25), and the test data file, Binding energy test data (Text File 8kB Oct7 25).
Step 2: Go to MATLAB's website and open MATLAB Online.
Step 3: Upload the two downloaded files of binding energy data into your MATLAB Drive.
Step 4: In the Apps tab, click on the Regression Learner button in the Machine Learning and Deep Learning group to start the Regression Learner App.
Step 5: Click on the New Session button, select From File, and then open the BETrain.txt file.
Step 6: Click Import Selection in the next window that opens and then Start Session in the next window.
Step 7: Click on the Test tab, then the Test Data button, and then open the BETest.txt file.
Step 8: Click on the Learn tab and then click on the All button (which selects multiple ML models for training).
Step 9: Click the Train All button. (This trains the selected models, which may take a few minutes to complete.)
Step 10: Click on the Test tab, then the Test All button.
Step 11: Click on the Results Table button. (The table will need to be rearranged and sorted to gain insights.)
Step 12: Use the buttons above the results table to add columns as needed, and use the scroll bar at the bottom of the table to navigate while answering the questions below.
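The rearranging and sorting in Steps 11 and 12 is done with the app's own buttons, but the idea can also be sketched outside MATLAB in a few lines of Python. This is only an illustration: the model names and numbers are made-up placeholders, not results from the binding energy data, and the field names (`model`, `rmse_validation`, `prediction_speed`) are hypothetical labels rather than the app's exact column headings.

```python
# Hypothetical rows standing in for a results table.
# All names and values are placeholders, not real results.
results = [
    {"model": "Model A", "rmse_validation": 3.2, "prediction_speed": 12000.0},
    {"model": "Model B", "rmse_validation": 1.1, "prediction_speed": 800.0},
    {"model": "Model C", "rmse_validation": 2.4, "prediction_speed": 45000.0},
]


def sort_by(rows, column, descending=False):
    """Return a new list of rows ordered by the given column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# Lowest validation RMSE first, so the most accurate model is at the top.
by_accuracy = sort_by(results, "rmse_validation")

# Largest prediction speed first, so the fastest model is at the top.
by_speed = sort_by(results, "prediction_speed", descending=True)

print([row["model"] for row in by_accuracy])
print([row["model"] for row in by_speed])
```

Sorting the same table by different columns is what lets one table answer separate questions about accuracy, speed, and size.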
Submit your answers to the following questions for this assignment:
Question 1: What is the RMSE value of the model that is best at reproducing the validation data (based on the RMSE, or root-mean-square error, metric)?
Question 2: What is the RMSE value of the model that is best at reproducing the training data?
Question 3: Which model(s) generated the best results?
Question 4: Which model produced the fastest prediction speed?
Question 5: How do the fastest and slowest prediction speeds compare? (In other words, how many multiples of the fastest is the slowest?)
Question 6: Which model resulted in the smallest compact model size?
Question 7: How many multiples of the smallest compact model size is the largest?
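Two small calculations sit behind the questions above: the root-mean-square error (RMSE) used to rank the models, and the "how many multiples" comparisons in Questions 5 and 7. The Python sketch below shows both under stated assumptions. The function names `rmse` and `multiples` are illustrative, and every number is a placeholder rather than a value from the AME 2020 files. It also assumes prediction speed is reported as a rate (observations per second), so that the fastest model has the largest value.

```python
import math


def rmse(true_values, predicted_values):
    """Root-mean-square error between two equal-length sequences."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_values, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


def multiples(values):
    """How many times larger the largest value is than the smallest."""
    smallest, largest = min(values), max(values)
    if smallest <= 0:
        raise ValueError("values must be positive")
    return largest / smallest


# Placeholder numbers only, not taken from the assignment data.
example_true = [10.0, 20.0, 30.0]
example_pred = [11.0, 18.0, 33.0]
print(rmse(example_true, example_pred))

example_speeds = [500.0, 2000.0, 10000.0]
print(multiples(example_speeds))
```

With the placeholder speeds, the largest value is 20 times the smallest. The same `multiples` calculation applies to compact model sizes in Question 7.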
Step 13: Lastly, produce a global Shapley Summary plot (which can be found in the Explain tab).
Question 8: Describe what information is shown in the Shapley Summary plot.
Now, let's discuss the best machine learning model and how to make it better.
Question 9: How well does the best model reproduce the binding energies? (In other words, what percentage of the average binding energy is the best model's RMSE?)
Question 10: What else could be done to the data and/or the training process to produce a more accurate model?
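The percentage asked for in Question 9 is a relative error: the RMSE divided by the mean binding energy, multiplied by 100. A minimal Python sketch follows, assuming the RMSE and the binding energies are in the same units. The function name `percent_of_mean` is illustrative and the numbers are placeholders, not values from the data files.

```python
def percent_of_mean(rmse_value, values):
    """Express an RMSE as a percentage of the mean of the given values."""
    if not values:
        raise ValueError("need at least one value")
    mean_value = sum(values) / len(values)
    return 100.0 * rmse_value / mean_value


# Placeholder numbers only; substitute the RMSE from the results table
# and the binding energies from the data file.
example_binding_energies = [800.0, 1000.0, 1200.0]
example_rmse = 5.0
print(percent_of_mean(example_rmse, example_binding_energies))
```

A small percentage means the typical prediction error is small compared with the size of the quantity being predicted, which is often easier to interpret than the raw RMSE.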
Teaching Notes and Tips
All students will likely get similar results for each question.
Assessment
Rubric for grading:
- 70% for completion of all the tasks, and
- 3% per answer to Questions 1-10.
References and Resources
This project uses AME 2020 data.
This dataset is used in an example in Chapter 3 of Neural Networks for Physical Sciences.
The same grouping of test and training data has been used in "High precision binding energies from physical-feature-instructed machine learning" and "Further exploration of binding energy residuals using machine learning and the development of a composite ensemble model."