Insights on Standardized Test Criticisms and Testing Effects


This presentation discusses both overused and underused criticisms of standardized tests, highlighting issues such as time lost from learning, narrowing of the curriculum, and the testing effect on student learning. It delves into the benefits of testing and presents findings on the impact of testing with stakes and feedback. Meta-analysis methods are explored for summarizing research literature.





Uploaded on Apr 17, 2024


Presentation Transcript


  1. Over- and Under-used Criticisms of Standardized Tests. Richard P. Phelps, University of Bucharest, 29 May 2023

  2. "If a thing exists, it exists in some amount. If it exists in some amount, then it is capable of being measured." René Descartes, Principles of Philosophy, 1644

  3. Some Overused Criticisms: Time Lost from Learning; Teaching to the Test; Narrowing the Curriculum; Distorts Instruction

  4. Some Overused Criticisms: Time Lost from Learning; Teaching to the Test; Narrowing the Curriculum; Distorts Instruction

  5. The Testing Effect: Generally, one learns more by testing than by restudy. Aristotle, ~350 BCE. Edward Thorndike, ~1900: "The active recall of a fact from within us is, as a rule, better than its impressions without." C. C. Ross, 1941: "The act of testing alone, irrespective of other factors, tends to improve achievement." Experiments: 1909, 1917, 1923, 1924, 1939

  6. The effect of testing on student learning: > 3,000 documents; 700 separate studies, > 1,600 separate effects; 2,000 other studies were reviewed and found incomplete or inappropriate; a thousand other studies remain to be reviewed

  7. The effect of testing on student learning: 245 qualitative studies; 813 survey or poll questions; 640 quantitative effects (experiments: school- and classroom-level; multivariate studies: large-scale testing programs)

  8. Meta-analysis: A method for summarizing a large research literature with a single, comparable measure (0.5 effect size ≈ 1 grade level of learning)
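
The "single, comparable measure" on the slide above is an effect size. The sketch below shows one common way such a measure can be computed and averaged across studies: a standardized mean difference (Cohen's d) and a fixed-effect, inverse-variance weighted mean. The slides do not say which estimator or weighting the cited meta-analyses used, so the function names, the variance formula chosen, and the sample scores are illustrative assumptions, not the author's method.

```python
import math
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n_t, n_c = len(treated), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treated) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


def d_variance(d, n_t, n_c):
    """Approximate large-sample sampling variance of d."""
    return (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))


def pooled_effect(studies):
    """Fixed-effect mean of (d, n_treated, n_control) tuples.

    Each study is weighted by the inverse of its sampling variance,
    so larger studies count for more.
    """
    weights = [1 / d_variance(d, n_t, n_c) for d, n_t, n_c in studies]
    weighted_sum = sum(w * s[0] for w, s in zip(weights, studies))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Hypothetical scores: a tested group versus a restudy group.
    tested = [2, 4, 6]
    restudy = [1, 3, 5]
    print(cohens_d(tested, restudy))

    # Hypothetical (d, n_treated, n_control) results from three studies.
    print(pooled_effect([(0.4, 20, 20), (0.8, 20, 20), (0.6, 50, 50)]))
```

Because every study is expressed in standard-deviation units, results from different tests and grade levels can be placed on one scale before averaging.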

  9. Findings from Phelps (2012): Survey study effect sizes average > 1.0. Over 90% of qualitative studies positive. For quantitative studies, effect sizes vary between 0.55 and 0.88: + testing or testing more; + testing with stakes; + testing with feedback

  10. Findings from Phelps Meta-Regression (2019): To raise achievement: add a test; add feedback; add consequences. Consequences + feedback is the strongest treatment

  11. Some Overused Criticisms: Time Lost from Learning; Teaching to the Test; Narrowing the Curriculum; Distorts Instruction

  12. John J. Cannell, M.D.: Residency in rural, poor Appalachia, 1980s. Surprised by claims that state and school district scored above average on national tests. Investigated and found that all US states claimed to be above average

  13. "Welcome to Lake Wobegon, where all the women are strong, all the men are good-looking, and all the children are above average." Garrison Keillor, A Prairie Home Companion

  14. Dr. Cannell's suspects: Lax security; Outdated or invalid norms; Deliberate educator manipulation (i.e., cheating)

  15. CRESST's Lake Wobegon suspects: Outdated or invalid norms; High stakes that induce teaching to the test (i.e., test coaching) under pressure. Problem: only one of Cannell's many tests had any stakes

  16. Harms of misinformation: 1. Unfairly discredits useful evaluation tool; 2. Test security (in U.S.) remains poor; 3. Teachers given mixed messages; 4. Now spreading worldwide via the OECD

  17. Some Overused Criticisms: Time Lost from Learning; Teaching to the Test; Narrowing the Curriculum; Distorts Instruction

  18. Some Overused Criticisms: Time Lost from Learning; Teaching to the Test; Narrowing the Curriculum; Distorts Instruction

  19. Studies of the reliability of teacher grading, 1890s to 1920s, e.g., Starch & Elliott, 1912: Two actual English examination papers sent to 142 teachers to grade. Grades ranged from 50 to 98%. One paper: 14 grades < 80% & 14 > 94%

  20. Studies of the reliability of teacher grading, 1890s to 1920s, Starch & Elliott, 1912: Two actual Geometry examination papers sent to 116 teachers to grade. Grades ranged from 28 to 92%. One paper: 20 grades < 60% & 9 > 85%

  21. Why Standardized tests? In some places, the only objective measure available to the public (i.e., not under the control of insiders).

  22. How can those outside a school or classroom judge the quality of a school, its instruction, or its students? Schools vary in quality; courses vary in quality; grade comparisons are not reliable. Standardized tests' most important feature is standardization.

  23. Some Underused Criticisms: Choosing the Wrong Test Type; Lax Security; Threats to Privacy; Not Testing When Benefits are Clear

  24. Some Underused Criticisms: Choosing the Wrong Test Type; Lax Security; Threats to Privacy; Not Testing When Benefits are Clear

  25. 1. Three types of large-scale tests: Achievement; Aptitude; Non-cognitive

  26. PSU: a test at war with itself (an exit exam for scientific-humanistic education, presented as a vehicle for assessing curricular coverage, that is now used as an admission test for all students, including those in technical-professional (TP) secondary education). It is expected to do too many things; it does none of them well, and it makes some important ones worse.

  27. Multiplicity of purposes in the PSU: 1. Measure the implementation of a new curriculum; 2. Measure well the mastery of two very different curricula; 3. Give secondary schools an incentive to implement the new curriculum; 4. Give students an incentive to study more; 5. Predict success at university; 6. Predict success in very different university programs; 7. Provide cut points for university admission, scholarships, and financial aid.

  28. Comparing Achievement & Aptitude tests
                    Achievement        Aptitude
      Measure       past learning      potential
      Development   content analysis   job/skills analysis
      Validation    retrospective      predictive
      Content       dependent          independent
      Coachable?    very much          not much

  29. Non-cognitive tests: More recently developed. Measure values, attitudes, preferences. Types: integrity tests; career exploration; matchmaking; employment fit

  30. Comparing Achievement, Aptitude, & Non-Cognitive Tests
                    Achievement        Aptitude              Non-Cognitive
      Measure       past learning      potential             attitudes, values, preferences
      Development   content analysis   job/skills analysis   surveys
      Validation    retrospective      predictive            predictive
      Content       dependent          independent           independent
      Coachable?    very much          very little           can be faked

  31. Some Underused Criticisms: Choosing the Wrong Test Type; Lax Security; Threats to Privacy; Not Testing When Benefits are Clear

  32. Large-scale test, tight security

  33. Large-scale test, lax security

  34. Some Underused Criticisms: Choosing the Wrong Test Type; Lax Security; Threats to Privacy; Not Testing When Benefits are Clear

  35. Threats to Privacy Increase in the Internet Age: Lavishly funded, government-sponsored hacking; Ransomware; Commercial incentives to collect personal data; Schools often store psychological and medical data on students; Socio-Emotional Learning (SEL) programs will increase the amount

  36. Some Underused Criticisms: Choosing the Wrong Test Type; Lax Security; Threats to Privacy; Not Testing When Benefits are Clear

  37. Cognitive Scientists' 6 Strategies for Effective Learning: Retrieval Practice; Interleaving; Spaced Practice; Concrete Examples; Dual Coding; Elaboration

  38. 10 benefits of testing and their applications to education (Roediger, Putnam and Smith). Benefit 1: The Testing Effect: Retrieval Aids Later Retention. Benefit 2: Testing Identifies Gaps in Knowledge. Benefit 3: Testing Causes Students to Learn More from the Next Study Episode. Benefit 4: Testing Produces Better Organization of Knowledge. Benefit 5: Testing Improves Transfer of Knowledge to New Contexts. Benefit 6: Testing can Facilitate Retrieval of Material That was not Tested. Benefit 7: Testing Improves Metacognitive Monitoring. Benefit 8: Testing Prevents Interference from Prior Material when Learning New Material. Benefit 9: Testing Provides Feedback to Instructors. Benefit 10: Frequent Testing Encourages Students to Study. SOURCE: Roediger, Putnam, & Smith, Ten benefits of testing and their applications to educational practice, Psychology of Learning and Motivation, 55, 2011.

  39. Why consequential tests? Most people respond to both intrinsic and extrinsic motivators, and the proportion varies from individual to individual. Consequential tests provide both forms of inducement. Consequential tests tend to be taken more seriously and administered with tighter security.

  40. Large-scale tests are needed for other purposes, such as: monitoring and system diagnosis; selection to programs; workforce planning; accountability; credentialing

  41. Some large-scale test advantages: On a per-student basis, inexpensive. Cognitive laboratory pre-testing possible. Standardization offers comparisons across schools and regions. May produce high-quality test items that schools and teachers can use.

  42. Figure 1: Average TIMSS Score and Number of Quality Control Measures Used, by Country. [Chart: vertical axis, Average Percent Correct (grades 7 & 8); horizontal axis, Number of Quality Control Measures Used; series, Top-Performing Countries and Bottom-Performing Countries.] SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001

  43. Figure 2: Average TIMSS Score and Number of Quality Control Measures Used (each adjusted for GDP/capita), by Country. [Chart: vertical axis, Average Percent Correct (grades 7 & 8) (per GDP/capita); horizontal axis, Number of Quality Control Measures Used (per GDP/capita).] SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001

  44. US State of California University Admission Test Decision: The faculty conducts a large study showing clear information benefits of admission testing and votes in favor of continuing to use the tests. The Board of Directors overrules the faculty and will ban the use of admission tests, citing the usual, many-times-disproven equity arguments.

  48. Over- and Under-used Criticisms of Standardized Tests. https://nonpartisaneducation.org richard {at} nonpartisaneducation {dot} org
