Unlocking Potential: The Value of Standardized Testing in Education

26 May 2023

Richard P. Phelps

“If a thing exists, it exists in some amount. If it exists in some amount, then it is capable of being measured.”

-- René Descartes, Principles of Philosophy, 1644

Learning Curve

Forgetting Curve (1870s)

Ebbinghaus: “Learning usually requires rehearsal or repetition”

Cognitive Load Theory

John Sweller, 1980s

Working Memory Capacity

George Miller, 1950s

Knowledge is Unlimited?

• It may be, but there are limits to the amount that we can store and use.

• So, we filter it.

Working Memory: Ability to temporarily hold and manipulate information for cognitive tasks

Working Memory is challenged by: new, unfamiliar information and the quantity of discrete bits of information

Two centuries of research on learning concludes …

“… repeated retrieval during learning is the key to long-term retention.”

-- Henry L. “Roddy” Roediger

10 benefits of testing and their applications to education

Roediger, Putnam and Smith

SOURCE: Roediger, Putnam, & Smith, Ten benefits of testing and their applications to educational practice, Psychology of Learning and Motivation, 55, 2011.

Benefit 1: The Testing Effect: Retrieval Aids Later Retention

Benefit 2: Testing Identifies Gaps in Knowledge

Benefit 3: Testing Causes Students to Learn More from the Next Study Episode

Benefit 4: Testing Produces Better Organization of Knowledge

Benefit 5: Testing Improves Transfer of Knowledge to New Contexts

Benefit 6: Testing can Facilitate Retrieval of Material That was not Tested

Benefit 7: Testing Improves Metacognitive Monitoring

Benefit 8: Testing Prevents Interference from Prior Material when Learning New Material

Benefit 9: Testing Provides Feedback to Instructors

Benefit 10: Frequent Testing Encourages Students to Study

10 benefits of testing and their applications to education

Roediger, Putnam and Smith

Indirect effects of testing

SOURCE: Roediger, Putnam, & Smith, Ten benefits of testing and their applications to educational practice, Psychology of Learning and Motivation, 55, 2011.

Students tested frequently study more and with more regularity.

Tests permit students to discover gaps in their knowledge and adjust their study efforts to focus on difficult material.

Students who study after taking a test learn more than if they had not taken a test.

Students who self-test or are tested more frequently in class learn more.

John Hattie’s meta-analyses of meta-analyses

John Hattie’s list of education interventions, in order of effectiveness (those with testing)

1. Student self-assessment/self-grading
2. Response to intervention
3. Teacher credibility
4. Providing formative assessments
5. Classroom discussion
6. Teacher clarity
7. Feedback
8. Reciprocal teaching
9. Teacher-student relationships fostered
10. Spaced vs. mass practice
11. Concept mapping
12. Cooperative vs individualistic learning
13. Direct instruction
14. Tactile stimulation programs
15. Mastery learning
16. Worked examples
17. Visual-perception programs
18. Peer tutoring
19. Cooperative vs competitive learning
20. Phonics instruction
21. Acceleration
22. Classroom behavioral techniques
23. Vocabulary programs
24. Repeated reading programs
25. Creativity programs
26. Student prior achievement
27. Self-questioning by students
28. Study skills
29. Problem-solving teaching
30. Not labeling students
31. Student-centered teaching
32. Classroom cohesion
33. Pre-term birth weight
34. Peer influences
35. Classroom management techniques
36. Outdoor-adventure programs
37. Home environment
38. Socio-economic status

The effect of testing on student learning

• > 3,000 documents

• 700 separate studies, > 1,600 separate effects

• 2,000 other studies were reviewed and found incomplete or inappropriate

• A thousand other studies remain to be reviewed

245 Qualitative studies

813 Survey or Poll questions

640 Quantitative Effects:

Experiments: School- and classroom-level

Multivariate studies: Large-scale testing programs

The effect of testing on student learning

Meta-analysis: A method for summarizing a large research literature, with a single, comparable measure.

(0.5 effect size ≈ 1 grade level of learning)
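As a rough illustration of the "single, comparable measure" idea, the sketch below computes one common effect-size statistic, Cohen's d (the difference between two group means divided by their pooled standard deviation), and converts it to grade levels using the slide's rule of thumb. The slides do not say which effect-size formula was used, so treat Cohen's d as an assumption. The function names and the score lists are hypothetical and for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


def grade_levels(effect_size, d_per_grade=0.5):
    """Convert an effect size to grade levels.

    The default of 0.5 per grade level is the slide's rule of thumb,
    not a universal constant.
    """
    return effect_size / d_per_grade


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration (not data from any study).
    tested_group = [78, 85, 90, 72, 88, 81]
    untested_group = [74, 80, 85, 70, 83, 76]
    d = cohens_d(tested_group, untested_group)
    print(f"effect size d = {d:.2f}, about {grade_levels(d):.1f} grade levels")
```

Because the statistic is expressed in standard-deviation units, results from studies that used different tests and score scales can be placed on the same footing, which is what allows a meta-analysis to summarize them together.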

Findings from Phelps (2012):

• Survey study effect sizes average > 1.0

• Over 90% of qualitative studies positive

• For quantitative studies, effect sizes vary between 0.55 and 0.88:

+ testing or testing more

+ testing with stakes

+ testing with feedback

Cognitive Scientists’ 6 Strategies for Effective Learning

Interleaving

Concrete Examples

Elaboration

Retrieval Practice

Spaced Practice

Dual Coding

10 benefits of testing and their applications to education

Roediger, Putnam and Smith

Most teachers should be testing much more frequently, with smaller, shorter tests.

Students learn more when they are tested, but they learn best when the tests are “spaced”.

What is the optimal lapse of time between tests?

Implications for Teachers 1

Most teachers should test more frequently, with smaller, shorter, low-stakes tests.

Understand that useful assessment can be short and simple.

Implications for Teachers 2

Does the test format matter?

• multiple-choice?
• essay?
• short answer?
• oral?
• demonstration?
• …etc.?

Not much.

Implications for Teachers 3

Tests provide feedback to teachers about what works and what does not.

Just as students can learn by testing each other, teachers can help each other by reviewing each other’s tests.

Why Standardized tests?

In some places, the only objective measure available to the public (i.e., not under the control of insiders).

Studies of the reliability of teacher grading, 1890s to 1920s

e.g., Starch & Elliott, 1912

• Two actual English examination papers
• Sent to 142 teachers to grade
• Grades ranged from 50 to 98%
• One paper: 14 grades < 80% & 14 > 94%

Studies of the reliability of teacher grading, 1890s to 1920s

Starch & Elliott, 1912

• Two actual Geometry examination papers
• Sent to 116 teachers to grade
• Grades ranged from 28 to 92%
• One paper: 20 grades < 60% & 9 > 85%

How can those outside a school or classroom judge the quality of a school, its instruction, or its students?

Schools vary in quality

Courses vary in quality

Grade comparisons are not reliable

Standardized tests’ most important feature is standardization.

Why consequential tests?

Most people respond to both intrinsic and extrinsic motivators, and the proportion varies from individual to individual.

Consequential tests provide both forms of inducement.

Consequential tests tend to be taken more seriously and administered with tighter security.

Findings from Phelps Meta-Regression (2019)

To raise achievement:

– Add a test
– Add feedback
– Add consequences

Consequences + feedback is the strongest treatment

Cognitive Psychology experiments were conducted with “formative” tests in schools and classrooms

Large-scale tests are needed for other purposes, such as …

… monitoring and system diagnosis
… selection to programs
… workforce planning
… accountability
… credentialing

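[Figure 1: Average TIMSS Score and Number of Quality Control Measures Used, by Country]

SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001

[Figure 2: Average TIMSS Score and Number of Quality Control Measures Used (each adjusted for GDP/capita), by Country]

SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001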

Some large-scale test advantages

On per-student basis, inexpensive

Cognitive laboratory pre-testing possible

Standardization offers comparisons across schools and regions.

May produce high-quality test items that schools and teachers can use.

Large-scale test, tight security

Large-scale test, lax security

“Teaching to the Test”

Teachers will teach only material that will appear on a standardized test.

Counter-argument

In the absence of common standards and tests, the curriculum becomes arbitrary and of uncertain origin. Why is that better than teaching to a required curriculum?

“Narrowing the Curriculum”

A common curriculum prescribed by standards has less content than a teacher-made curriculum.

Counter-argument

What teachers and schools do in the classroom without common standards is not necessarily “broader.” In fact, it can often be “narrower,” governed in the absence of other criteria by personal preferences.

Preferred Instructional Methods

Classrooms governed by standards are barren, dreary places where only factoids are learned.

Rebuttal

A curriculum will always rely on some sort of standards or criteria for inclusion. The question is: do we want formal, open standards, openly arrived at, or should their origins be more obscure or idiosyncratic to each teacher?

Opposition to Norm-Referenced Tests

Norm-referenced standardized tests are unfair. (I.e., it is unfair to simply rank kids, rather than measure them against standards.)

Rebuttal

The alternative, the grade-point average, is itself a norm-referenced measure, normed at the school level.

Preference for Teacher-Made Classroom Testing

Standardized tests are imposed from outside by persons and committees unfamiliar with and perhaps insensitive to the local students and community.

Rebuttal

Standardized tests are developed by testing and measurement Ph.D.s. The most capable measurement experts in the world work in developing standardized tests.

https://nonpartisaneducation.org

richard {at} nonpartisaneducation {dot} org

Slide Note

I can make these slides available, so there should be no reason to take notes.

