Standardized Testing: The Interplay of Achievement & Aptitude Testing

Richard P. Phelps
University of Bucharest
25 May 2023

Two types of standardized tests
• Achievement
• Aptitude

Standardized tests: History
At least 2,000 years old: the Chinese civil service exam
• tested achievement & aptitude
• philosophy of Confucius
• poetry and literature

Achievement tests: Modern history
Historically, they were larger versions of classroom tests.
~1900: "scientific" achievement tests developed (Germany (Prussia) & USA)
• J.M. Rice: systematically analyzed test structures & effects
• E.L. Thorndike: developed scoring scales

Achievement tests: General use
Objective: To measure mastery of content or skill
Development: Covers the curriculum taught in school
Key fairness issue: Alignment of the content of the test with the content of the curriculum ("opportunity to learn" principle).

More on achievement tests
• Requires mastery of the content prior to the test.
• Coachable: the specific content is known in advance (so it is easy to cheat if test security is lax).
• How validated: RETROSPECTIVE validity (correlation with past measures, such as high school grades)

Aptitude tests: Modern history
A. Binet & T. Simon, 1890s (France)
• Pre-school children with severe mental disabilities
• Impossible to evaluate them with an achievement test
• Developed a content-free test of mental abilities (association, attention, memory, motor skills, reasoning)

Aptitude tests: Modern history, 1917
Like Romania, the USA entered World War I late. Paris was threatened, and the USA needed to build a large army quickly.
The US Army wanted to assign new recruits efficiently to the various jobs: "allocative efficiency."
• Army "Alpha" & "Beta" tests

Aptitude tests: General use
Objective: Predict how well a person can perform in the future
Development:
• First, analyze in detail the knowledge and skills needed.
• Second, keep improving the test by correlating test items with future performance, and select the items that "discriminate."

Aptitude tests: General use
• Widely used in industry
• Sometimes called reasoning or readiness tests
• Intelligence tests are a type of aptitude test; some still use the scale originally developed by Binet & Simon.
Key fairness issue: Alignment of the content of the test with the knowledge and skills needed to perform ("opportunity to perform" principle).

More on aptitude tests
Some content independence. Measures:
• what the student does with the content provided
• how the student applies skills & abilities developed over a lifetime
Not easily coachable; the content is either:
• not known in advance, or
• basic, broad, commonly known by all, curriculum-free
How validated: PREDICTIVE validity, i.e., correlation with future activity (e.g., university or job evaluations)

Comparing Achievement & Aptitude tests

              Achievement        Aptitude
Measure       past learning      potential
Development   content analysis   job/skills analysis
Validation    retrospective      predictive
Content       dependent          independent
Coachable?    very much          not much

Aptitude tests for university admission: History
1930s-1940s: President of Harvard, J. Conant
• Wanted a test to identify students from low socio-economic classes with the potential to succeed at Harvard: "diamonds in the rough"
• Adopted the Scholastic Aptitude Test (SAT), first administered in 1926, for this purpose

Aptitude tests for university admission
Well-constructed aptitude tests can identify:
• students bored by secondary school courses, who study what interests them on their own
• students poorly adapted to secondary school culture, but well adapted to university culture
• high-ability students poorly served by low-quality secondary schools

More information is better, usually
If university were just like secondary school, perhaps high school grades and a retrospective content test would suffice for admission purposes.
But university is not just more challenging academically; it is also very different from secondary school in other respects.

Predictive validity (values from -1.0 to +1.0)
Measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion).
A test with low predictive validity provides little information.

[Figure: a positive correlation between two measures]
Source: NIST, Engineering Statistics Handbook
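Predictive validity is usually reported as a correlation coefficient. Below is a minimal Python sketch of the Pearson correlation. It shows why the values always fall between -1.0 and +1.0: the covariance is divided by the product of the two standard deviations. The function name and the scores and grades are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired measures; the result lies in [-1.0, +1.0]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, all left unscaled (the 1/n factors cancel out).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a measure is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical admission scores and first-year university grades for 5 students.
admission_scores = [480, 520, 560, 610, 650]
first_year_grades = [2.8, 3.1, 2.9, 3.6, 3.8]
print(round(pearson_r(admission_scores, first_year_grades), 2))  # 0.91
```

The same calculation gives retrospective validity when the second measure is an earlier one, such as high school grades, instead of a later outcome.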

PSU (Prueba de Selección Universitaria): a chronology of errors
• 2000, initial proposal: SIES Project (PSU Project)
  - More efficient and modern tests (IRT)
  - Improve the articulation between secondary and university education
• 2001 (World Bank loan to MINEDUC to finance the education reform)
  - Align the test with the secondary school curriculum
  - Eventually turn it into a secondary school exit exam
• 2005 (World Bank)
  - Linking financial aid to PSU scores will improve university access for lower-income groups

Multiple purposes of the PSU:
1. Measure the implementation of a new curriculum;
2. Accurately measure mastery of two very different curricula;
3. Encourage secondary schools to implement the new curriculum;
4. Encourage students to study more;
5. Predict success at university;
6. Predict success in very different university programs;
7. Provide cut-off points for university admission, scholarships, and financial aid.

PSU: a test at war with itself
An exit exam for the academic (científico-humanista) track, presented as a vehicle for evaluating curriculum coverage, is now used as the admission test for all students, including those from vocational (técnico-profesional, TP) secondary schools.
It is expected to do too many things; it does none of them well, and it makes some important ones worse.

International standards for educational tests:
• The purpose(s) of a test must be clear to all stakeholders.
• A test with consequences must be empirically validated for each purpose.

An example: equity arguments
• Proponents claimed that the PSU would reduce the advantage of students who attend test-prep courses (preuniversitarios). But content tests are more coachable than aptitude tests, because their content base is specific and known. The content base of aptitude tests is broader.
• Proponents claimed that, because the PSU is based on the curriculum, all students would have access to its content. But TP (vocational) students do not, and many academic-track schools do not manage to cover the entire curriculum.
• Proponents claimed that schools would be forced to cover the entire curriculum, or else be exposed by their students' poor performance. This has not happened, because some students start with gaps in their knowledge that their teachers cannot close.

Predictive validity
Measures whether scores on an admission test correlate with outcomes.
Incremental predictive validity
Measures the amount of unique information an admission test provides, beyond the information available from other measures.

Incremental predictive validity: SAT & PSU
[Figure: incremental predictive validity of the SAT and the PSU (2010) for Language, Mathematics, SAT Writing, and PSU Social Science; vertical axis 0 to 0.6]
SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013; SAT data from College Board.

Percentage of Chilean schools reporting complete curriculum coverage of mathematics and language arts, 2012 (primary and secondary levels)
[Figure: Mathematics and Language, split into schools reporting 100% coverage and schools reporting less than 100% coverage; vertical axis 0 to 100]
SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Enseñanza Media Lenguaje y Comunicación – Matemática, Septiembre 2012

Drop in the numbers of students from municipal high schools accepted at 4 major universities
[Figure: students from municipal high schools accepted at PUC, U. Chile, U. Conce, and PUCV, for the last PAA (Prueba de Aptitud Académica) admission year, 2003, and the PSU years 2004 to 2008; vertical axis 0 to 60]
SOURCE: Simonsen (2008), Zuñiga (2005), & El Mercurio (April 16th, 2006)

Incremental Predictive Validity of the PSU (Math), 2012
[Figure: incremental predictive validity by university program group (e.g., chemistry & pharmacy, medicine, engineering, law, education, journalism, arts); axis 0 to 10]
SOURCE: N. Lacourly, M. Silva, & K. Diaz, 2016.

Incremental Predictive Validity of the PSU (Language), 2012
[Figure: incremental predictive validity by university program group (e.g., chemistry & pharmacy, nursing, law, engineering, education, social sciences, arts); axis 0 to 10]
SOURCE: N. Lacourly, M. Silva, & K. Diaz, 2016.

Achievement and Aptitude in International Tests
• TIMSS and PIRLS are ACHIEVEMENT tests.
• What is PISA?

Thank you!

Non-cognitive tests
More recently developed; they measure values, attitudes, and preferences.
Types:
• integrity tests
• career exploration
• matchmaking
• employment "fit"

Non-cognitive tests
Purpose: to identify "fit" with others or a situation
Developed using: surveys, personal interviews
How validated? Success rate in future activities
Content is personal, not learned
"Faking" can be an issue (e.g., "honesty" tests)

Comparing Achievement, Aptitude, & Non-Cognitive Tests

              Achievement        Aptitude              Non-Cognitive
Measure       past learning      potential             attitudes, values, preferences
Development   content analysis   job/skills analysis   surveys
Validation    retrospective      predictive            predictive
Content       dependent          independent           independent
Coachable?    very much          very little           can be faked

If more information is better, we should maximize the information available about a student at university entry, in order to make the best match between the student and the institution.
Three measures are important:
1. Predictive validity
2. Sub-group differences
3. Content coverage
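The slide does not say how sub-group differences should be measured. One common choice, assumed here, is the standardized mean difference (Cohen's d). It is the gap between two groups' average scores divided by their pooled standard deviation, so 0 means no gap and larger magnitudes mean larger gaps. The function name, group labels, and scores below are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Cohen's d: gap between group means, in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical admission-test scores for two sub-groups of applicants.
group_1 = [520, 560, 600, 640]
group_2 = [480, 520, 560, 600]
print(round(standardized_mean_difference(group_1, group_2), 2))  # 0.77
```

A positive value means the first group scores higher on average. Comparing this figure across candidate admission tests is one way to weigh sub-group differences against predictive validity.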

Achievement tests in the context of university entry: How are they validated?
• Through alignment with the content taught in secondary school.
• Achievement tests are retrospective, aligned with content learned in the past.
• Test scores are typically highly correlated with other available student measures, such as grades and class rank.
• Achievement tests are typically summative, or exit, exams administered at the end of a program.
• Fairness assumes that all students have had the same opportunity to learn the content tested.

Aptitude tests for university admission: How are they validated?
• Predictive validity: the correlation between test scores and measures of future outcomes, such as university grades
• Less emphasis on content:
  - Aptitude test content is basic, common knowledge.
  - The test measures how well students reason and solve problems: what they do with the information provided.
• Aptitude tests are not easily coachable:
  - The content domain is too broad for focused study.

Predictive validity
Measures how well scores on an admission test correlate with desirable outcomes.
Incremental predictive validity
Measures how much unique information an admission test provides, beyond what is available from other measures.

Incremental predictive validity (engineering), controlling for secondary school grades
[Figure: PAA vs. PSU at U. Chile and PUC, for Language & Math alone and for Language & Math + subject test; vertical axis 0 to 35]
SOURCE: S.A. Prado, Estudio de Validez Predictiva de la PSU y Comparacion con el Sistema PAA, Universidad de Chile
