Publications
- February 2021
Assessing Prediction Accuracy of Machine Learning Models
By: Michael Toffel and Natalie Epstein
Abstract
This video describes how to assess the accuracy of machine learning prediction models, focusing on models that predict binary outcomes, such as logistic regression, random forest, or nearest neighbor models. After introducing and differentiating training and testing data, the video presents the confusion matrix and uses it to describe a series of accuracy metrics: true and false positives and negatives, true positive rate (sensitivity or recall), false negative rate (Type II error rate), precision, true negative rate (specificity), and false positive rate (Type I error rate). It also addresses the impact of the threshold used to convert continuous predictions to binary classifications, and describes the receiver operating characteristic (ROC) curve and the area under the curve (AUC). Several examples are provided, including one in which an instructor, midway through a course, reviews students’ interim grades, attendance, and participation to decide which students should seek tutoring to avoid a poor final grade. This video can be assigned in conjunction with the “Assessing Prediction Accuracy of Machine Learning Models” technical note (HBS No. 621045).
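The metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the video or the technical note: the function names, the eight-student dataset, and the 0.5 threshold are all hypothetical, with a label of 1 assumed to mean "ended with a poor final grade" and each score assumed to be a model's predicted probability of that outcome.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Threshold continuous scores, then count TP, FP, TN, FN."""
    tp = fp = tn = fn = 0
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy_metrics(tp, fp, tn, fn):
    """Compute the rates listed in the abstract from the four counts."""
    return {
        "true_positive_rate": ratio(tp, tp + fn),   # sensitivity, recall
        "false_negative_rate": ratio(fn, tp + fn),  # Type II error rate
        "true_negative_rate": ratio(tn, tn + fp),   # specificity
        "false_positive_rate": ratio(fp, tn + fp),  # Type I error rate
        "precision": ratio(tp, tp + fp),
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win."""
    pos = [s for a, s in zip(y_true, scores) if a == 1]
    neg = [s for a, s in zip(y_true, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return ratio(wins, len(pos) * len(neg))


# Hypothetical data: 1 = student ended with a poor final grade.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]

counts = confusion_counts(y_true, scores, threshold=0.5)
metrics = accuracy_metrics(*counts)
print(counts)
print(metrics)
print(auc(y_true, scores))
```

Lowering the threshold in this sketch (for example to 0.35) flags more students for tutoring, which raises the true positive rate while leaving AUC unchanged, since AUC depends only on how the scores rank the cases.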
Citation
Toffel, Michael, and Natalie Epstein. Assessing Prediction Accuracy of Machine Learning Models. Harvard Business School Tutorial 621-706, February 2021.