Core Methods in Educational Data Mining - EDUC691 Spring 2019

 
Core Methods in
Educational Data Mining
 
EDUC691
Spring 2019
 
The Homework
 
Let’s go over basic homework 1
 
The Homework
 
Let’s go over basic homework 1
 
Who did the assignment in Python?
Who did the assignment in RapidMiner?
 
RapidMiner folks
 
How well did you succeed in making the tool
work?
 
What were some of the biggest challenges?
 
Python folks
 
How well did you succeed in making the tool
work?
 
What were some of the biggest challenges?
 
Did it make a difference?
 
When you ran Decision Tree/W-J48 with and
without student as a variable in the data set,
what was the difference?
 
 
Did it make a difference?
 
When you ran Decision Tree/W-J48 with and
without student as a variable in the data set,
what was the difference?
 
Why might RapidMiner and Python produce
different results for this?
 
Removing student from the model
 
How did you remove student from the model?
 
There were multiple ways to accomplish this
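
One common way in the Python workflow is to drop the column before fitting. A minimal sketch, assuming a pandas DataFrame; the file name "homework1.csv" and the column names "student" and "label" are hypothetical:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical file and column names, for illustration only
    data = pd.read_csv("homework1.csv")
    X = data.drop(columns=["student", "label"])  # leave the student ID out of the features
    y = data["label"]

    model = DecisionTreeClassifier().fit(X, y)

In RapidMiner, the analogous moves are excluding the column with the Select Attributes operator, or giving it an id role with Set Role so learners ignore it.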
 
How would you know…
 
If you were over-fitting to student?
 
Or any variable, for that matter?
 
What are some variables…
 
That could cause your model not to apply to
new data sets you might be interested in?
 
Student is one example… what else?
 
Did it make a difference?
 
What happens when you turn on cross-
validation?
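
For concreteness, a sketch of what turning it on looks like in scikit-learn, reusing the assumed data, X, and y names from the earlier sketch:

    from sklearn.model_selection import GroupKFold, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Plain 10-fold cross-validation: each fold is held out once for testing
    scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
    print(scores.mean())

    # If over-fitting to student is the worry, cross-validate at the student level,
    # so no student appears in both a training fold and its paired test fold
    grouped = cross_val_score(DecisionTreeClassifier(), X, y,
                              groups=data["student"], cv=GroupKFold(n_splits=10))
    print(grouped.mean())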
 
Questions? Comments? Concerns?
 
 
How are you liking
RapidMiner and Python?
 
 
Other RapidMiner or Python
questions?
 
 
Note…
 
Python and RapidMiner have a different set of
algorithms available
 
Python’s set tends to be more recent
 
But it’s not totally clear they are *better*
 
We’ll come back to this when we discuss Hand
 
What is the difference between a
classifier and a regressor?
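
One way to ground the distinction, sketched with scikit-learn's paired decision-tree estimators; X, y_correct, and y_seconds are hypothetical:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    # A classifier predicts a category, e.g. whether the student gets the next item right
    clf = DecisionTreeClassifier().fit(X, y_correct)   # y_correct holds 0/1 labels

    # A regressor predicts a number, e.g. how many seconds the student will take
    reg = DecisionTreeRegressor().fit(X, y_seconds)    # y_seconds holds continuous values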
 
 
What are some things
you might use a classifier for,
in education?
 
 
Bonus points for examples other than those in
the BDE videos
 
Any questions about any
classification algorithms?
 
 
Do folks feel like they understood
logistic regression?
 
Any questions?
 
Logistic Regression

m = 0.5A - B + C

A      B      C      M      P(M)
0      0      0      ?      ?

Logistic Regression

m = 0.5A - B + C

A      B      C      M      P(M)
0      0      1      ?      ?

Logistic Regression

m = 0.5A - B + C

A      B      C      M      P(M)
0      1      1      ?      ?

Logistic Regression

m = 0.5A - B + C

A      B      C      M      P(M)
4      1      1      ?      ?

Logistic Regression

m = 0.5A - B + C

A      B      C      M      P(M)
100    -100   100    ?      ?
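
To work the table through afterward, a minimal Python sketch, assuming the standard logistic link P(M) = 1 / (1 + e^(-m)):

    import math

    def logistic(a, b, c):
        """Compute m = 0.5A - B + C, then squash m through the logistic function."""
        m = 0.5 * a - b + c
        return m, 1.0 / (1.0 + math.exp(-m))

    # The rows from the slides above
    for a, b, c in [(0, 0, 0), (0, 0, 1), (0, 1, 1), (4, 1, 1), (100, -100, 100)]:
        m, p = logistic(a, b, c)
        print(f"A={a}, B={b}, C={c} -> m={m}, P(M)={p:.3f}")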
 
Why would someone
 
Use a decision tree rather than, say, logistic
regression?
 
Has anyone
 
Used any classification algorithms outside the
set discussed/recommended in the videos?
 
Say more?
 
Other questions, comments, concerns
about lectures?
 
 
Did anyone read the Hand article?
 
Thoughts?
 
What is Hand’s main thesis?
 
 
What is Hand’s main thesis?
 
Who thinks it makes sense?
 
Who thinks he’s probably wrong?
 
What is Hand’s main thesis?
 
Who thinks it makes sense?
 
Who thinks he’s probably wrong?
 
Please present arguments in favor of each
perspective
 
If he is wrong
 
Why do simple algorithms work well for many
problems?
 
If he is right
 
Why have some algorithms like recurrent
neural networks become so popular?
 
If he is right
 
Why have some algorithms like recurrent
neural networks become so popular?
 
Note that many of the key successes have
been in very large scale data sets like voice
recognition
 
One of Hand’s key arguments
 
The data points a classifier is trained on are not
usually drawn from the same distribution as the
data points where the classifier will be applied
 
One of Hand’s key arguments
 
The data points a classifier is trained on are not
usually drawn from the same distribution as the
data points where the classifier will be applied
 
Is this a plausible argument for educational
data mining?
 
One of Hand’s key arguments
 
The data points a classifier is trained on are not
usually drawn from the same distribution as the
data points where the classifier will be applied
 
Is this a plausible argument for large-scale
voice recognition technology?
 
Another of Hand’s key arguments
 
Data points trained on are often treated as
certainly true and objective
But they are often arbitrary and uncertain
 
Another of Hand’s key arguments
 
Data points trained on are often treated as
certainly true and objective
But they are often arbitrary and uncertain
 
Is this a plausible argument for educational
data mining?
 
Another of Hand’s key arguments
 
Data points trained on are often treated as
certainly true and objective
But they are often arbitrary and uncertain
 
Is this a plausible argument for large-scale
speech recognition?
 
Note
 
Hand refers to these issues as over-fitting
 
But they are a specific type of over-fitting that
is relevant to some problems and not to
others
 
And is different from the common idea that
over-fitting comes from limited data
 
Another of Hand’s key arguments
 
Researchers and practitioners usually do best
when working with an algorithm they know
very well
 
And therefore more recent algorithms win
competitions
Because those are the algorithms the researcher
knows best and wants to prove are better
 
Momentary digression
 
Who here is familiar with data competitions
like the KDD Cup, Kaggle competitions, and
ASSISTments Longitudinal Challenge?
 
Some counter-evidence to Hand
 
Recent algorithms win a lot of data mining
competitions these days (where lots of people
are trying their best)
 
Some counter-evidence to Hand
 
Recent algorithms win a lot of data mining
competitions these days (where lots of people
are trying their best)
 
Those of you who like Hand, how would you
respond to this?
 
Some counter-evidence to Hand
 
One possible rejoinder: These are usually well-
defined problems where the training set and
eventual test set resemble each other a lot
 
Another practical question
 
 
Should you
 
Pick one algorithm that seems really
appropriate?
Run every algorithm that will actually run for
your data?
 
Something in between?
 
My typical lab practice
 
Pick a small number of algorithms that
Have worked on past similar problems
Fit different kinds of patterns from each other
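
A minimal sketch of that practice, reusing the assumed X and y; the candidate set is illustrative, chosen so each learner fits a different kind of pattern (linear boundaries, axis-aligned splits, independent-feature probabilities):

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    candidates = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(),
        "naive Bayes": GaussianNB(),
    }
    for name, model in candidates.items():
        print(name, cross_val_score(model, X, y, cv=10).mean())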
 
Is it really the algorithm?
 
Or is it the data you put into it?
 
We’ll come back to this in the Feature
Engineering lecture in a month
 
Questions? Comments?
 
 
Creative HW 1
 
 
Questions about Creative HW 1?
 
 
Questions? Concerns?
 
 
Other questions or comments?
 
 
Next Class
 
February 13
 
Behavior Detection
 
Baker, R.S. (2015) Big Data and Education. Ch.1, V5. Ch. 3, V1, V2.
 
Sao Pedro, M.A., Baker, R.S.J.d., Gobert, J., Montalvo, O., Nakama, A. (2013)
Leveraging Machine-Learned Detectors of Systematic Inquiry Behavior to
Estimate and Predict Transfer of Inquiry Skill. User Modeling and
User-Adapted Interaction, 23(1), 1-39.

Kai, S., Paquette, L., Baker, R.S., Bosch, N., D'Mello, S., Ocumpaugh, J.,
Shute, V., Ventura, M. (2015) A Comparison of Face-based and
Interaction-based Affect Detectors in Physics Playground. Proceedings of
the 8th International Conference on Educational Data Mining, 77-84.
 
Creative HW 1 due
 
The End
 