Student Voice

Achieving algorithmic fairness in machine learning models of student performance

By Eve Bracken-Ingram

At Student Voice, we aim to help students succeed in higher education. A key part of this goal is identifying the students most at risk of underperformance, so that preventative measures can be put in place. Machine learning can be used to identify patterns within data and estimate the probability of certain outcomes, and may therefore provide the crucial ability to identify at-risk students early in their higher education journey.

A paper by Karimi-Haghighi et al. [Source] developed a machine learning method which could predict the risk of university dropout and underperformance based on factors known at enrolment. Several factors identified in the literature as indicators of potential academic struggle were considered, including student demographics, high school type and location, and average admission grade. Although these indicators have been linked to higher education performance, it is important to note that underperformance may be due to a variety of personal or institutional reasons which are difficult to quantify. In addition, several subgroups are often over-represented in university dropout rates, a fact which must be carefully considered when assessing potential underperformance.

Algorithmic fairness in machine learning is extremely important. It has several, sometimes conflicting, definitions, as explored in a further Student Voice blog article [1]. In its simplest form, however, algorithmic fairness means ensuring that models do not display a discriminatory bias towards certain groups. Because machine learning models are trained on input data, bias can be inadvertently built into a model. In Karimi-Haghighi et al.'s exploration of student underperformance, fairness is measured through two error rate metrics: the generalised false positive rate (GFPR) and the generalised false negative rate (GFNR). In addition to these error metrics, calibration was also monitored. A model is calibrated when its predicted probabilities match the observed frequency of the outcome, and maintaining calibration allows results to be interpreted consistently across groups. Accuracy was also considered during model development, measured as the Area Under the ROC Curve (AUC). Models were passed through a bias mitigation procedure which aimed to equalise error rates whilst maintaining calibration [2].
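
To make these metrics more concrete, the sketch below shows one way the generalised error rates could be computed for each student subgroup. It is not taken from the paper; the function names and toy values are purely illustrative, following the definitions given by Pleiss et al. [2].

```python
import numpy as np

def generalised_error_rates(y_true, y_prob):
    """Generalised error rates in the sense of Pleiss et al. [2]:
    GFPR is the mean predicted risk among true negatives, and
    GFNR is the mean of (1 - predicted risk) among true positives."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    gfpr = y_prob[y_true == 0].mean()
    gfnr = (1.0 - y_prob[y_true == 1]).mean()
    return gfpr, gfnr

def rates_by_group(y_true, y_prob, group):
    """GFPR/GFNR per subgroup (e.g. gender or nationality), so that the
    disparity between groups can be compared."""
    y_true, y_prob, group = map(np.asarray, (y_true, y_prob, group))
    return {g: generalised_error_rates(y_true[group == g], y_prob[group == g])
            for g in np.unique(group)}

# Toy example: 1 = predicted-risk outcome (dropout), 0 = no dropout.
print(rates_by_group([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.1], ["A", "A", "B", "B"]))
```

A fair model, in this error-rate sense, would produce similar GFPR and GFNR values for every group in the resulting dictionary.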

The model was trained and tested using a dataset of 881 computer science students. The dataset was analysed per group, considering age, gender, nationality, academic performance, and high school type. In this set, foreign students were significantly more likely to underperform than their national counterparts. Additionally, students who failed a course or were required to resit an exam in first year showed a greater risk of dropout. The dataset also had a very high gender imbalance, so the SMOTE algorithm [3] was used to balance the distribution by generating synthetic minority examples through interpolation between existing ones.
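
As a rough illustration of how such oversampling is typically done in practice, the snippet below uses the SMOTE implementation from the imbalanced-learn library on a placeholder dataset; the data and parameters are stand-ins, not the study's own.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Placeholder imbalanced dataset standing in for the 881 student records.
X, y = make_classification(n_samples=881, weights=[0.85, 0.15], random_state=0)

# SMOTE synthesises new minority examples by interpolating between existing ones.
smote = SMOTE(random_state=0)
X_res, y_res = smote.fit_resample(X, y)

print("before:", Counter(y), "after:", Counter(y_res))
```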

A Multi-Layer Perceptron with a hidden layer of 100 neurons was found to give the best results. It showed good AUC compared to existing studies, although accuracy was higher for male students and for students with lower admission grades than for their counterparts. Across groups, the models showed good equity in GFNR, meaning they were equally likely to falsely predict a negative outcome regardless of gender, age, nationality, high school or academic performance. The GFPR showed greater disparity, particularly against students with low admission grades, but fairness improved following bias mitigation. Through this process, equity in GFNR and AUC was also increased across most groups.
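
For readers curious what such a model looks like in code, here is a minimal sketch of a Multi-Layer Perceptron with a single hidden layer of 100 neurons, evaluated by AUC. It uses scikit-learn and synthetic placeholder data; it is an assumption-laden illustration, not the authors' actual pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

# Placeholder features and dropout labels standing in for the enrolment data.
X, y = make_classification(n_samples=881, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 100 neurons, as described above.
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

risk = clf.predict_proba(X_test)[:, 1]   # predicted risk of dropout/underperformance
print("AUC:", roc_auc_score(y_test, risk))
```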

The creation of a model which can fairly predict dropout and underperformance of higher education students from information known at enrolment opens up opportunities for educational improvement. As well as providing additional support for at-risk students, there is also the possibility of offering further opportunities to potentially successful students. Accurate prediction allows resources to be allocated carefully and meaningfully, and fairness ensures that certain student groups are not subject to unfair bias.

References

[Source] Karimi-Haghighi, M., Castillo, C., Hernández-Leo, D. and Moreno Oliver, V. (2021). Predicting Early Dropout: Calibration and Algorithmic Fairness Considerations. Companion Proceedings of the 11th International Conference on Learning Analytics & Knowledge.
DOI: 10.48550/arXiv.2103.09068

[1] Griffin, D. Definitions of Fairness in Machine Learning Explained Through Examples. Student Voice.

[2] Pleiss, G., Raghavan, M., Wu, F., Kleinberg, J. and Weinberger, K. Q. (2017). On Fairness and Calibration. Advances in Neural Information Processing Systems.
DOI: 10.48550/arXiv.1709.02012

[3] Chawla, N. V., Bowyer, K. W., Hall, L. O. and Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321-357.
DOI: 10.48550/arXiv.1106.1813
