Teaching Generalized Linear Models (GLMs) to Undergraduates and Graduates: Challenges and Successes
Teaching GLMs at the University of Auckland is a collaborative, team-taught effort that uses reproducible research techniques and builds on foundational linear modeling concepts. The courses cover trends, factor variables, mixing factor and continuous variables, and handling exceptions such as curves and exponential relationships.
Presentation Transcript
Teaching GLMs to Undergraduates and Graduates: Challenges and Successes
Andrew Balemi, University of Auckland
Some background: The University of Auckland (UoA) teaches two undergraduate courses with GLMs, 20x and 330 (stage 2 and stage 3 respectively), which lead on to postgraduate (PG) study. Team teaching is undertaken with consensus from all lecturers: if you don't like the notes, then change them after discussion and consensus! Efficiencies are gained and the notes are refreshed. There were 300 students in 20x in S1 2023 and 130 in 330 in S2 2023. Roles: course administrator in 20x; lecturer in charge in 330. PG students assist with students' questions and marking.
20x: getting started. Reproducible research is encouraged, so RStudio is required (download R and RStudio). We added an extra assignment to help students get over their fear of programming and of a new environment (name, simple plot, simple analysis). Foundational idea: start with the linear model. In the beginning there was the straight line + random stuff.
Trend + scatter and the model statement: Y continuous, X continuous. If it looks like a straight line, then fit a straight line. R² is the percentage of the variation in Y explained by using only an intercept and a slope (two numbers).
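A minimal sketch of the straight-line fit and R² described above, in plain Python (in the course itself this is R's lm()). The data and the function name fit_slr are hypothetical, chosen only for illustration.

```python
def fit_slr(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x + error, plus R^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, 1 - sse / sst

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r2 = fit_slr(x, y)
print(f"intercept={b0:.3f}, slope={b1:.3f}, R^2={100 * r2:.1f}%")
```

Two numbers (b0 and b1) summarise the trend, and R² says how much of the scatter in y they account for.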
Factor variables (two levels): a two-sample t-test is SLR with a dummy variable (Attend = Yes/No).
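A small numerical check of the claim above, assuming the classic equal-variance (pooled) two-sample t-test: coding Attend as a 0/1 dummy and fitting SLR gives a slope equal to the difference in group means and a slope t statistic equal to the pooled t statistic. The marks are hypothetical and the helper names are illustrative.

```python
import math

def pooled_t(group0, group1):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))

def slr_slope_t(x, y):
    """Slope and its t statistic for y = b0 + b1*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    return b1, b1 / se_b1

# Hypothetical marks: Attend = No (dummy 0) and Attend = Yes (dummy 1)
no = [52, 61, 48, 55, 58]
yes = [66, 71, 59, 74, 68, 63]
x = [0] * len(no) + [1] * len(yes)
y = no + yes

b1, t_slr = slr_slope_t(x, y)
t_two_sample = pooled_t(no, yes)
mean_diff = sum(yes) / len(yes) - sum(no) / len(no)
print(f"slope = {b1:.3f}, difference in means = {mean_diff:.3f}")
print(f"SLR t = {t_slr:.4f}, two-sample t = {t_two_sample:.4f}")
```

The agreement is exact because the fitted values of the dummy regression are just the two group means, so its residual sum of squares is the pooled within-group sum of squares.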
Mixing it up: factors with a continuous X. We get two straight lines, one for each level of the dummy variable; they are not parallel in this case. This extends to factors with more than two levels.
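One way to write the model statement behind "one line per level" is with an interaction between the dummy variable and x. The symbols D_i (0/1 dummy), beta_2 and beta_3 are our notation, not necessarily the slide's; a nonzero beta_3 is what makes the two lines non-parallel.

```latex
\[
E(Y_i \mid x_i, D_i) = \beta_0 + \beta_1 x_i + \beta_2 D_i + \beta_3 x_i D_i
\]
\[
D_i = 0:\quad E(Y_i \mid x_i) = \beta_0 + \beta_1 x_i,
\qquad
D_i = 1:\quad E(Y_i \mid x_i) = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x_i
\]
```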
Exceptions: curves. If it doesn't look like a straight line, then don't fit a straight line; do something else. Does it have constant scatter or not?
Exceptions: exponential relationships. If it doesn't look like a straight line, then don't fit a straight line. Depreciation is about percentage decrease. Does it have constant scatter or not?
Exceptions: exponential relationships. Torture it until it submits to a straight line. Does it have constant scatter or not? Interpret in terms of median prices.
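A minimal sketch of the depreciation example, assuming the "torture" is a log transformation of price: fit a straight line to log(price) against age, then back-transform. Because exp() is monotone it maps the median of log(price) to the median of price, and with symmetric (e.g. Normal) errors the fitted log-scale mean is also the log-scale median, so exp(fitted value) estimates a median price. The car data are hypothetical.

```python
import math

def fit_slr(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x + error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical used-car data: age (years) and price ($)
age = [1, 2, 3, 4, 5, 6]
price = [20000, 17000, 14500, 12300, 10400, 8900]

# Straight line on the log scale: log(price) = b0 + b1*age + error
b0, b1 = fit_slr(age, [math.log(p) for p in price])

# Back-transform to an estimated MEDIAN price, and express the slope as % per year
median_at_4 = math.exp(b0 + b1 * 4)
pct_drop_per_year = 100 * (1 - math.exp(b1))
print(f"estimated median price at age 4: ${median_at_4:,.0f}")
print(f"estimated depreciation: {pct_drop_per_year:.1f}% per year")
```

On the log scale a constant slope is a constant percentage change on the original scale, which is exactly the "depreciation is about % decrease" idea.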
Big ideas (linear model): One-way and two-way ANOVA are applications of the ideas above. Log-log models are power-law models: another way to torture the data until it submits to a straight line. MLR is really a mash-up of the above ideas. Don't use names like ANCOVA, as they make it seem more daunting. Collinearity is really about the redundancy of a variable once we have adjusted for the other variables: the price we pay for (mostly) encountering observational data. The model statement: $Y_i \mid X_i \sim \text{Normal}(\mu_i, \sigma^2)$, where $E(Y_i \mid X_i) = \mu_i = \beta_0 + \beta_1 x_i + \cdots$.
Transition to GLMs (Poisson): submissions of packages to CRAN. Count data are not normally distributed, since they are discrete: $Y_i \mid X_i \sim \text{Poisson}(\mu_i)$, where $\log(\mu_i) = \beta_0 + \beta_1 x_i + \cdots$.
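To show what R's glm(..., family = poisson) is doing behind the scenes, here is a minimal Newton-Raphson fit of log(mu_i) = b0 + b1*x_i in plain Python; for the canonical log link this coincides with iteratively reweighted least squares. The counts are hypothetical (not real CRAN submission data), and fit_poisson is an illustrative name.

```python
import math

def fit_poisson(x, y, iters=25):
    """Fit log(mu_i) = b0 + b1*x_i by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X^T W X with weights W = diag(mu)
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve [h00 h01; h01 h11] * step = [g0; g1] and add it
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts per period (illustration only, not real CRAN data)
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [3, 4, 6, 7, 10, 13, 17, 22]
b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
print(f"log(mean) = {b0:.3f} + {b1:.3f} * x")
print(f"each extra period multiplies the mean count by {math.exp(b1):.3f}")
```

The model statement is the same "straight line" as before; only the scale (log of the mean) and the distribution (Poisson instead of Normal) have changed.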
Transition to GLMs (Binomial): haddock retention by fork length of fish in a codend/cover experiment. Proportion data are not normally distributed, since each fish is a Bernoulli trial: $Y_i \mid X_i \sim \text{Binomial}(n_i, \pi_i)$, where $\log(\text{odds}_i) = \text{logit}(\pi_i) = \beta_0 + \beta_1 x_i + \cdots$.
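One way to check that a straight line on the logit scale is plausible before fitting is to plot empirical logits against length. The sketch below uses the common 0.5 correction so that length classes where no fish (or every fish) is retained still give a finite value. The counts are hypothetical: y is the number of fish in the codend and n the total in codend plus cover for each length class.

```python
import math

def empirical_logit(y, n):
    """log((y + 0.5) / (n - y + 0.5)): a logit that stays finite when y = 0 or y = n."""
    return math.log((y + 0.5) / (n - y + 0.5))

# Hypothetical codend/cover counts by length class
length_cm = [25, 30, 35, 40, 45, 50]
retained = [0, 3, 12, 30, 44, 50]
caught = [40, 42, 45, 46, 47, 50]

logits = [empirical_logit(y, n) for y, n in zip(retained, caught)]
for L, y, n, lg in zip(length_cm, retained, caught, logits):
    print(f"{L} cm: proportion = {y / n:.2f}, empirical logit = {lg:.2f}")
```

If these values line up roughly straight against length, logit(pi_i) = beta_0 + beta_1 x_i is a reasonable starting model statement.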
Issues moving from LMs to GLMs: The insight from linear models (the model statement) transfers to this setting. If the goodness-of-fit (GOF) test indicates an inadequate fit, then we suggest they "go quasi". Students want to check EOV (equality of variance) with a predicted vs raw residual plot, as for lm. Interpretation is on the log scale (of the mean or odds), and this tends to throw students. Back-transformation to a predicted mean/proportion seems challenging because algebra is involved.
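The back-transformation step is short once written out. This minimal sketch assumes hypothetical fitted coefficients (not from any analysis in the talk) and shows how to undo the log link (Poisson) and the logit link (binomial), and how exp(b1) becomes a multiplicative effect on the mean or the odds.

```python
import math

def inv_logit(eta):
    """Back-transform a log-odds value to a proportion."""
    return 1 / (1 + math.exp(-eta))

# Hypothetical fitted coefficients (not from any analysis in the talk)
b0_pois, b1_pois = 0.5, 0.05    # Poisson:  log(mean) = b0 + b1*x
b0_logit, b1_logit = -6.0, 0.2  # Binomial: logit(proportion) = b0 + b1*x
x = 30

mean_hat = math.exp(b0_pois + b1_pois * x)    # undo the log: predicted mean count
prop_hat = inv_logit(b0_logit + b1_logit * x)  # undo the logit: predicted proportion

print(f"predicted mean at x = {x}: {mean_hat:.2f}")
print(f"predicted proportion at x = {x}: {prop_hat:.3f}")
# A one-unit increase in x multiplies the mean (or the odds) by exp(b1)
print(f"mean multiplied by {math.exp(b1_pois):.3f} per unit of x")
print(f"odds multiplied by {math.exp(b1_logit):.3f} per unit of x")
```

Multiplying the odds by a constant does not change the proportion by a constant amount, which is one reason predictions at specific x values are often easier for students to interpret than the coefficients themselves.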
Issues we encounter in 20x: We have lots of case studies with Rmd code (which we think are awesome). However, assignments become about copying topic-adjacent code and hoping to get a good mark. Getting students to think about what they are seeing, and therefore how the analysis will pan out, seems to be a stretch. With Discord and online copying and pasting, you can hack your way to a good grade and really have no idea what's going on. Algebra is the biggest source of terror globally!
Transitioning to 330: We generalise in the first few lectures and tell them: this is what we have done and what we will do.
Transitioning to 330: If the data are not too sparse*, then the predicted vs deviance residual plot comes back to help us. The factory defaults of Normal/Poisson/Binomial don't have to be adhered to (e.g., Negative Binomial instead of Poisson). It's all about the model statement, and offsets/subsetting add dynamic modelling. If your data are "sparsish", then simulate or bootstrap a reference distribution: you don't have to accept approximations! You can tell them that most things work, as CLT pixie dust makes most things better.
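A minimal sketch of "simulate a reference distribution" for a goodness-of-fit check, assuming an intercept-only Poisson model so that refitting to each simulated data set is just taking its mean (a real analysis would refit the full GLM each time). The counts and helper names are hypothetical; the Poisson draws use Knuth's simple method, which is adequate for small means.

```python
import math
import random

def poisson_deviance(y, mu):
    """Residual deviance 2 * sum[y*log(y/mu) - (y - mu)], taking 0*log(0) as 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

def rpois(lam, rng):
    """One Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def bootstrap_gof(y, n_sim=2000, seed=1):
    """Parametric-bootstrap GOF p-value for an intercept-only Poisson model."""
    rng = random.Random(seed)
    n = len(y)
    mu_hat = sum(y) / n                            # fitted mean (the MLE)
    d_obs = poisson_deviance(y, [mu_hat] * n)
    exceed = 0
    for _ in range(n_sim):
        y_sim = [rpois(mu_hat, rng) for _ in range(n)]   # simulate from the fitted model
        m_sim = sum(y_sim) / n                           # refit to the simulated data
        if poisson_deviance(y_sim, [m_sim] * n) >= d_obs:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero
    return d_obs, (exceed + 1) / (n_sim + 1)

# Hypothetical sparse counts: one roughly Poisson-looking, one overdispersed
y_ok = [0, 1, 0, 2, 0, 0, 1, 3, 0, 1]
y_over = [0, 0, 0, 7, 0, 1, 0, 9, 0, 0]
d_ok, p_ok = bootstrap_gof(y_ok)
d_over, p_over = bootstrap_gof(y_over)
print(f"deviance {d_ok:.2f}, bootstrap p = {p_ok:.3f}")
print(f"deviance {d_over:.2f}, bootstrap p = {p_over:.3f}")
```

With counts this small the usual chi-square approximation to the deviance is doubtful, and the simulated reference distribution sidesteps it.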
The example that keeps giving: mass killing events in the USA.
Issues we encounter in 330 and PG: We have fewer case studies with Rmd code. However, assignments become about copying topic-adjacent code and hoping to get a good mark. Getting students to think about what they are seeing, and therefore how the analysis will pan out, seems to be a stretch. With Discord and online copying and pasting, you can hack your way to a good grade and really have no idea what's going on. Algebra is the biggest source of terror globally!
Concerns: Attendance is optional at UoA, as we have recorded lectures (a moronic idea IMHO). Students have a talent for forgetting. The student population is massively distracted. Failure is not a catastrophe; it's a signal, that's all. Silo thinking: "theory and practice are unrelated"! Economy of thought is not prized (be smart about being lazy).