Enhancing Student Success Prediction Using XGBoost

Predict Students' Dropout and Academic Success
Sreeja Cheekireddy
Mariah S Banu
Randall Krabbe
Laya Karimi
Shubhechchha Niraula
Background

There is a rising concern regarding increasing rates of student underperformance and dropout in higher education institutions.
Past research has often relied on mid-term or end-term results to predict student performance.
Higher education institutions possess substantial amounts of data from the point of student enrollment.

SIUE Facts
Motivation

Identification of at-risk students at the start of their academic journey enables timely interventions.
Providing personalized support helps students reach their academic potential.
Expanding the outcome labels from "fail/success" to include "relative success" gives a more nuanced understanding.
A model must be built that is not limited to a specific field of study but can be generalized across courses in the institution.
Past research

A dataset was created from students enrolled in undergraduate courses at the Polytechnic Institute of Portalegre, Portugal. The data covers records of students enrolled between the academic years 2008/09 and 2018/19, across different undergraduate degrees [2].
Each record was classified as Success, Relative Success, or Failure, depending on the time the student took to obtain their degree [2].
Models compared: Logistic Regression (LR), Support Vector Machines (SVM), Decision Trees (DT), and an ensemble method, Random Forests (RF).
Boosting classifiers: Gradient Boosting, Extreme Gradient Boosting, LogitBoost, CatBoost.

Dataset
Polytechnic Institute of Portalegre, Portugal
Academic years 2008-09 through 2018-19
4424 instances
36 features
The features consist of:
Demographic data
Application info
Course info
Student info
Results from the first two semesters
Home country info
The response is whether the student dropped out,
graduated, or is still enrolled.
 
Data Categorization
Methods

XGBoost is an enhanced implementation of gradient boosting, an ensemble learning technique that focuses on misclassified examples, iteratively improving performance by minimizing a loss function via gradient descent.
We will use the XGBoost classifier on our dataset. Although existing literature has used it, our focus will be on improving accuracy by employing techniques such as:
1. Feature Selection
2. Data Augmentation to balance the dataset
3. Hyperparameter tuning
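The boosting idea above can be sketched in miniature: each stage fits a simple learner to the negative gradient of the loss, which for squared error is just the residuals of the current ensemble. The sketch below is an illustration of that principle only, not the actual XGBoost implementation (which adds regularization, second-order gradients, and optimized tree construction); all function names here are our own.

```python
# Minimal gradient boosting sketch: each stage fits a decision stump
# to the residuals (the negative gradient of squared-error loss).

def fit_stump(x, r):
    """Find the 1-D threshold split that best predicts residuals r."""
    best = None
    for t in sorted(set(x)):
        left = [r[i] for i in range(len(x)) if x[i] <= t]
        right = [r[i] for i in range(len(x)) if x[i] > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def gradient_boost(x, y, n_stages=20, lr=0.5):
    base = sum(y) / len(y)                 # start from the mean prediction
    stumps = []
    for _ in range(n_stages):
        pred = [base + lr * sum(s(v) for s in stumps) for v in x]
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stumps.append(fit_stump(x, resid))
    return lambda v: base + lr * sum(s(v) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]                     # simple step function to learn
model = gradient_boost(x, y)
print(round(model(1), 2), round(model(6), 2))  # → 0.0 1.0
```

Each added stump corrects what the ensemble so far gets wrong, which is the behavior the slide describes as "paying attention to misclassified results."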
Methods (Cont.)

K-means clustering is a popular algorithm for partitioning data into clusters by maximizing similarity within each cluster and minimizing similarity between clusters.
Existing literature does not try clustering methods. We will not only use clustering to better understand our dataset (which can in turn help with feature selection) but also derive inferences about relations within the data.
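The k-means iteration described above alternates two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster. A minimal 1-D sketch of that loop (in practice we would use scikit-learn's KMeans on the full feature set; the data and starting centroids here are made up for illustration):

```python
# Minimal 1-D k-means sketch: alternate assignment and centroid update.

def kmeans(points, centroids, n_iter=10):
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # assignment step: nearest centroid by absolute distance
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # update step: each centroid becomes its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]   # two obvious groups
centers = kmeans(points, [0.0, 10.0])
print(centers)                            # ≈ [1.0, 8.0]
```

The loop converges when assignments stop changing; on real, multi-dimensional student data the distance would be Euclidean over (scaled) feature vectors.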
Intended experiments
Feature Selection
Some unnecessary features can be omitted, to see if this improves accuracy.
Clustering
Clustering the features into different groups.
Hyperparameter tuning
80-20 split
75-25 split
Different methods to resolve overfitting, such as K-fold cross-validation, early stopping, and regularization.
Data Augmentation
This is to balance the dataset and might improve performance.
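The split and cross-validation experiments listed above can be sketched as index bookkeeping; this hand-rolled version (in practice scikit-learn's train_test_split and KFold would do this) shows the 80-20 split and 5-fold partitioning on placeholder data:

```python
import random

def train_test_split(data, test_frac=0.2, seed=42):
    """Shuffle and split data into train/test parts (e.g. 80-20)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(data) * (1 - test_frac))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def kfold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

data = list(range(100))                   # placeholder for 4424 records
train, test = train_test_split(data)      # 80-20 split
print(len(train), len(test))              # → 80 20
folds = list(kfold_indices(len(train), k=5))
print(len(folds), len(folds[0][1]))       # → 5 16
```

Hyperparameter candidates would then be scored by their mean validation metric across the K folds, with early stopping and regularization applied inside each fold's fit.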
Metrics

Accuracy: correct predictions divided by the total number of predictions across all classes.
F1 Score: the harmonic mean of precision and recall, balancing the two in a single number.
Recall: TP/(TP+FN), the ratio of true positives to the total number of positive samples.
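These metrics follow directly from the confusion counts. A minimal binary sketch, treating label 1 ("dropout") as the positive class (an illustrative choice; for the three-class response the per-class scores would be averaged):

```python
# Compute accuracy, recall, and F1 from true and predicted labels.
# Positive class is 1 ("dropout") - an illustrative labeling choice.

def metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn)                  # TP / (TP + FN)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
acc, rec, f1 = metrics(y_true, y_pred)
print(acc, round(rec, 3), round(f1, 3))   # → 0.75 0.667 0.667
```

On the real model, scikit-learn's accuracy_score, recall_score, and f1_score would compute the same quantities, with an averaging mode chosen for the multi-class case.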
References

Realinho, Valentim, Mónica Vieira Martins, Jorge Machado, and Luís Baptista. 2021. Predict Students' Dropout and Academic Success. UCI Machine Learning Repository. https://doi.org/10.24432/C5MC89
Realinho, Valentim, Jorge Machado, Luís Baptista, and Mónica V. Martins. 2022. "Predicting Student Dropout and Academic Success." Data 7, no. 11: 146. https://doi.org/10.3390/data7110146
SIUE Factbook. https://www.siue.edu/inrs/factbook/
"Fig. 3: A General Architecture of XGBoost." ResearchGate, www.researchgate.net/figure/A-general-architecture-of-XGBoost_fig3_335483097.
Teixeira-Pinto, Armando, and Jaroslaw Harezlak. "2 K-Means Clustering." Machine Learning for Biostatistics. Bookdown.org, bookdown.org/tpinto_home/Unsupervised-learning/k-means-clustering.html.
Thank you