Enhancing Student Success Prediction Using XGBoost
There is growing concern about academic performance in higher education institutions. This project aims to predict student dropout and success using XGBoost, focusing on early identification of at-risk students so that they can receive personalized support. Using data from the Polytechnic Institute of Portalegre, Portugal, it applies feature selection, data augmentation, and hyperparameter tuning to improve predictive accuracy.
Presentation Transcript
Predict students' dropout and academic success
Sreeja Cheekireddy, Mariah S Banu, Randall Krabbe, Laya Karimi, Shubhechchha Niraula
Background
There is rising concern about increasing rates of student underperformance and dropout in higher education institutions. Past research has often relied on mid-term or end-term results to predict student performance. Higher education institutions possess substantial amounts of data from the point of student enrollment.
Motivation
Identifying at-risk students at the start of their academic journey enables timely interventions. Providing personalized support helps students reach their academic potential. Expanding from "fail/success" to include "relative success" allows for a nuanced understanding. A model must be built that is not limited to a specific field of study but can be generalized across courses in the institution.
Past research
A dataset was created from students enrolled in undergraduate courses of the Polytechnic Institute of Portalegre, Portugal. The data refers to records of students enrolled between academic years 2008/09 and 2018/19 and from different undergraduate degrees [2]. Each record was classified as Success, Relative Success, or Failure, depending on the time that the student took to obtain her degree [2].
Classifiers used: Logistic Regression (LR), Support Vector Machines (SVM), Decision Trees (DT), and an ensemble method, Random Forests (RF).
Boosting classifiers: Gradient Boosting, Extreme Gradient Boosting, LogitBoost, CatBoost.
Dataset
Source: Polytechnic Institute of Portalegre, Portugal.
Academic years: 2008-09 through 2018-19.
Size: 4424 instances, 36 features.
The features consist of demographic data, application info, course info, student info, results from the first two semesters, and home country info.
The response is whether the student dropped out, graduated, or is still enrolled.
Methods
XGBoost is an enhanced implementation of gradient boosting, an ensemble learning technique that builds models iteratively: each new model concentrates on the errors of the current ensemble, and the loss is minimized by gradient descent. We will use XGBoost classification on our dataset. Although the existing literature has already applied it, our focus will be on improving accuracy with techniques such as:
1. Feature selection
2. Data augmentation to balance the dataset
3. Hyperparameter tuning
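The boosting idea described above can be illustrated with a minimal sketch. This is plain gradient boosting with one-split regression stumps and logistic loss, not XGBoost itself (XGBoost adds regularization, second-order gradients, and full trees). The single feature, the toy labels, and all function names are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one value per side."""
    overall = sum(residuals) / len(residuals)
    best = (float("inf"), max(xs), overall, overall)  # fallback: constant stump
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:
            continue  # splitting at the maximum leaves nothing on the right
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted_stumps(xs, ys, n_rounds, learning_rate=0.5):
    """Each round fits a stump to the negative gradient of the log loss."""
    stumps = []
    scores = [0.0] * len(xs)
    for _ in range(n_rounds):
        # For log loss, the negative gradient is (label - predicted probability).
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, x)
                  for s, x in zip(scores, xs)]
    return stumps, learning_rate

def predict_proba(model, x):
    stumps, learning_rate = model
    return sigmoid(sum(learning_rate * stump_predict(s, x) for s in stumps))

def log_loss(model, xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict_proba(model, x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(xs)

# Toy data (made up): x = units passed in the first semester, y = 1 if graduated.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

model_1 = fit_boosted_stumps(xs, ys, n_rounds=1)
model_50 = fit_boosted_stumps(xs, ys, n_rounds=50)
print(log_loss(model_1, xs, ys), log_loss(model_50, xs, ys))
```

The loss after 50 rounds is lower than after one round, which is the iterative improvement the slide refers to.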
Methods (Cont.)
K-means clustering is a popular algorithm for partitioning data into clusters by maximizing similarity inside a group and minimizing similarity outside a group. Existing literature does not use clustering methods. We will use clustering both to better understand our dataset (which can in turn help with feature selection) and to derive inferences about relations within the data.
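A minimal sketch of K-means (Lloyd's algorithm) follows, assuming squared Euclidean distance as the similarity measure. The two-feature toy points and the caller-supplied initial centroids are assumptions made to keep the example deterministic; in practice the initial centroids are usually chosen at random or with a scheme such as k-means++.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return assignments, centroids

# Toy points (made up): two well-separated groups in a 2-feature space.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels, centroids)
```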
Intended experiments
Feature selection: some unnecessary features can be omitted to see whether this improves accuracy.
Clustering: clustering the features into different groups.
Data augmentation: balancing the dataset, which might improve performance.
Hyperparameter tuning: 80-20 split and 75-25 split; different methods to resolve overfitting, such as K-fold cross-validation, early stopping, and regularization.
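The splitting and balancing steps listed above could look roughly like the sketch below. It works on row indices only and assumes random oversampling as the balancing method, which the slides do not specify. The function names and the 20-row toy label list are hypothetical.

```python
import random
from collections import Counter

def holdout_split(n_rows, test_fraction, seed=0):
    """Shuffle row indices and hold out a fraction for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]

def k_fold_indices(train_indices, k):
    """Yield (training, validation) index lists for K-fold cross-validation."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield training, folds[i]

def oversample(indices, labels, seed=0):
    """Randomly repeat rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    by_class = {}
    for idx in indices:
        by_class.setdefault(labels[idx], []).append(idx)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced

# Toy labels (made up), imbalanced across the three outcome classes.
labels = ["Graduate"] * 10 + ["Dropout"] * 6 + ["Enrolled"] * 4

train_idx, test_idx = holdout_split(len(labels), test_fraction=0.20)
folds = list(k_fold_indices(train_idx, k=4))
balanced_idx = oversample(train_idx, labels)
class_counts = Counter(labels[i] for i in balanced_idx)
print(len(train_idx), len(test_idx), class_counts)
```

Oversampling is applied to the training indices only, so that repeated rows never leak into the held-out test set.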
Metrics
Accuracy: correct predictions divided by the total number of predictions across all classes, i.e. the share of correct predictions over the entire dataset.
F1 score: the harmonic mean of precision and recall, 2PR/(P+R), where precision is P = TP/(TP+FP) and R is recall.
Recall: TP/(TP+FN), the ratio of true positives to the total number of positive samples.
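As a worked example of these definitions, the sketch below computes accuracy and per-class precision, recall, and F1 by treating one class at a time as the positive class. The six true and predicted labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_scores(y_true, y_pred, positive):
    """Precision, recall, and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Toy labels (made up).
y_true = ["Dropout", "Dropout", "Graduate", "Graduate", "Graduate", "Enrolled"]
y_pred = ["Dropout", "Graduate", "Graduate", "Graduate", "Enrolled", "Enrolled"]

overall_accuracy = accuracy(y_true, y_pred)  # 4 correct out of 6
dropout = per_class_scores(y_true, y_pred, "Dropout")
enrolled = per_class_scores(y_true, y_pred, "Enrolled")
print(overall_accuracy, dropout, enrolled)
```

For "Dropout", one of the two true dropouts is found (recall 0.5) and every predicted dropout is correct (precision 1.0), giving an F1 of about 0.67.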
References
Realinho, Valentim, Mónica Vieira Martins, Jorge Machado, and Luís Baptista. 2021. "Predict Students' Dropout and Academic Success." UCI Machine Learning Repository. https://doi.org/10.24432/C5MC89
Realinho, Valentim, Jorge Machado, Luís Baptista, and Mónica V. Martins. 2022. "Predicting Student Dropout and Academic Success." Data 7, no. 11: 146. https://doi.org/10.3390/data7110146
https://www.siue.edu/inrs/factbook/
"Fig. 3: A General Architecture of XGBoost." ResearchGate. www.researchgate.net/figure/A-general-architecture-of-XGBoost_fig3_335483097
Teixeira-Pinto, Armando, and Jaroslaw Harezlak. "2 K-Means Clustering." Machine Learning for Biostatistics. Bookdown.org. bookdown.org/tpinto_home/Unsupervised-learning/k-means-clustering.html