Enhancing Student Success Prediction Using XGBoost

Predict Students' Dropout and Academic Success
Sreeja Cheekireddy
Mariah S Banu
Randall Krabbe
Laya Karimi
Shubhechchha Niraula
Background

There is a rising concern regarding increasing rates of student underperformance and dropout in higher education institutions.
Past research has often relied on mid-term or end-term results to predict student performance.
Higher education institutions possess substantial amounts of data from the point of student enrollment.

SIUE Facts
Motivation

Identification of at-risk students at the start of their academic journey enables timely interventions.
Providing personalized support helps students reach their academic potential.
Expanding the outcome labels from "fail/success" to include "relative success" gives a more nuanced understanding.
A model must be built that is not limited to a specific field of study but can be generalized across courses in the institution.
Past research

A dataset was created from students enrolled in undergraduate courses at the Polytechnic Institute of Portalegre, Portugal. The data covers records of students enrolled between the academic years 2008/09 and 2018/19, across different undergraduate degrees [2].
Each record was classified as Success, Relative Success, or Failure, depending on the time the student took to obtain their degree [2].
Models compared: Logistic Regression (LR), Support Vector Machines (SVM), Decision Trees (DT), and an ensemble method, Random Forests (RF).
Boosting classifiers: Gradient Boosting, Extreme Gradient Boosting, LogitBoost, CatBoost.

Dataset
Polytechnic Institute of Portalegre, Portugal
Academic years 2008-09 through 2018-19
4424 instances
36 features
The features consist of:
Demographic data
Application info
Course info
Student info
Results from the first two semesters
Home country info
The response is whether the student dropped out,
graduated, or is still enrolled.
 
Data Categorization
Methods

XGBoost is an enhanced implementation of gradient boosting, an ensemble learning technique that focuses on misclassified examples, iteratively improving performance by minimizing a loss function via gradient descent.
We will use the XGBoost classifier on our dataset. Although existing literature has used it, our focus will be on improving accuracy by employing techniques such as:
1. Feature Selection
2. Data Augmentation to balance the dataset
3. Hyperparameter tuning
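The boosting idea above can be sketched in miniature: each stage fits a simple learner to the negative gradient of the loss, which for squared error is just the residuals of the current ensemble. The sketch below is an illustration of that principle only, not the actual XGBoost implementation (which adds regularization, second-order gradients, and optimized tree construction); all function names here are our own.

```python
# Minimal gradient boosting sketch: each stage fits a decision stump
# to the residuals (the negative gradient of squared-error loss).

def fit_stump(x, r):
    """Find the 1-D threshold split that best predicts residuals r."""
    best = None
    for t in sorted(set(x)):
        left = [r[i] for i in range(len(x)) if x[i] <= t]
        right = [r[i] for i in range(len(x)) if x[i] > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def gradient_boost(x, y, n_stages=20, lr=0.5):
    base = sum(y) / len(y)                 # start from the mean prediction
    stumps = []
    for _ in range(n_stages):
        pred = [base + lr * sum(s(v) for s in stumps) for v in x]
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stumps.append(fit_stump(x, resid))
    return lambda v: base + lr * sum(s(v) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 1, 1, 1]                     # simple step function to learn
model = gradient_boost(x, y)
print(round(model(1), 2), round(model(6), 2))  # → 0.0 1.0
```

Each added stump corrects what the ensemble so far gets wrong, which is the behavior the slide describes as "paying attention to misclassified results."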
Methods (Cont.)

K-means clustering is a popular algorithm for partitioning data into clusters by maximizing similarity within each cluster and minimizing similarity between clusters.
Existing literature does not try clustering methods. We will not only use clustering to better understand our dataset (which can in turn help with feature selection) but also derive inferences about relations within the data.
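The k-means iteration described above alternates two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster. A minimal 1-D sketch of that loop (in practice we would use scikit-learn's KMeans on the full feature set; the data and starting centroids here are made up for illustration):

```python
# Minimal 1-D k-means sketch: alternate assignment and centroid update.

def kmeans(points, centroids, n_iter=10):
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # assignment step: nearest centroid by absolute distance
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # update step: each centroid becomes its cluster's mean
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]   # two obvious groups
centers = kmeans(points, [0.0, 10.0])
print(centers)                            # ≈ [1.0, 8.0]
```

The loop converges when assignments stop changing; on real, multi-dimensional student data the distance would be Euclidean over (scaled) feature vectors.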
Intended experiments
Feature Selection
Some unnecessary features can be omitted, to see if this improves accuracy.
Clustering
Clustering the features into different groups.
Hyperparameter tuning
80-20 split
75-25 split
Different methods to resolve overfitting, such as K-fold cross-validation, early stopping, and regularization.
Data Augmentation
This is to balance the dataset and might improve performance.
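The split and cross-validation experiments listed above can be sketched as index bookkeeping; this hand-rolled version (in practice scikit-learn's train_test_split and KFold would do this) shows the 80-20 split and 5-fold partitioning on placeholder data:

```python
import random

def train_test_split(data, test_frac=0.2, seed=42):
    """Shuffle and split data into train/test parts (e.g. 80-20)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(len(data) * (1 - test_frac))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]

def kfold_indices(n, k=5):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val

data = list(range(100))                   # placeholder for 4424 records
train, test = train_test_split(data)      # 80-20 split
print(len(train), len(test))              # → 80 20
folds = list(kfold_indices(len(train), k=5))
print(len(folds), len(folds[0][1]))       # → 5 16
```

Hyperparameter candidates would then be scored by their mean validation metric across the K folds, with early stopping and regularization applied inside each fold's fit.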
Metrics

Accuracy: correct predictions divided by the total number of predictions across all classes.
F1 Score: the harmonic mean of precision and recall, balancing the two in a single number.
Recall: TP/(TP+FN), the ratio of true positives to the total number of positive samples.
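These metrics follow directly from the confusion counts. A minimal binary sketch, treating label 1 ("dropout") as the positive class (an illustrative choice; for the three-class response the per-class scores would be averaged):

```python
# Compute accuracy, recall, and F1 from true and predicted labels.
# Positive class is 1 ("dropout") - an illustrative labeling choice.

def metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    recall = tp / (tp + fn)                  # TP / (TP + FN)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
acc, rec, f1 = metrics(y_true, y_pred)
print(acc, round(rec, 3), round(f1, 3))   # → 0.75 0.667 0.667
```

On the real model, scikit-learn's accuracy_score, recall_score, and f1_score would compute the same quantities, with an averaging mode chosen for the multi-class case.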
References

Realinho, Valentim, Mónica Vieira Martins, Jorge Machado, and Luís Baptista. 2021. Predict Students' Dropout and Academic Success. UCI Machine Learning Repository. https://doi.org/10.24432/C5MC89
Realinho, Valentim, Jorge Machado, Luís Baptista, and Mónica V. Martins. 2022. "Predicting Student Dropout and Academic Success." Data 7, no. 11: 146. https://doi.org/10.3390/data7110146
SIUE Factbook. https://www.siue.edu/inrs/factbook/
"Fig. 3: A General Architecture of XGBoost." ResearchGate, www.researchgate.net/figure/A-general-architecture-of-XGBoost_fig3_335483097.
Teixeira-Pinto, Armando, and Jaroslaw Harezlak. "2 K-Means Clustering." Machine Learning for Biostatistics. Bookdown.org, bookdown.org/tpinto_home/Unsupervised-learning/k-means-clustering.html.
Thank you