Standardized Testing: The Interplay of Achievement & Aptitude Testing
Explore the evolution of standardized testing, from the ancient Chinese civil service exam to modern achievement and aptitude tests. Discover the purposes, development, and validation methods of achievement and aptitude tests, shedding light on their role in assessing mastery of content or skill and evaluating mental abilities.
Presentation Transcript
Standardized Testing: The Interplay of Achievement & Aptitude Testing. Richard P. Phelps. University of Bucharest, 25 May 2023
Two types of standardized tests: achievement and aptitude
Standardized tests - History
- At least 2,000 years old: the Chinese civil service exam tested achievement & aptitude
- Content: philosophy of Confucius, poetry and literature
Achievement tests - Modern History
- Historically, they were larger versions of classroom tests
- ~1900: scientific achievement tests developed (Germany (Prussia) & USA)
- J.M. Rice: systematically analyzed test structures & effects
- E.L. Thorndike: developed scoring scales
Achievement tests - General use
- Objective: to measure mastery of content or skill
- Development: covers the curriculum taught in school
- Key fairness issue: alignment of the content of the test with the content of the curriculum (the "opportunity to learn" principle)
More on achievement tests
- Requires mastery of content prior to the test
- Coachable: the specific content is known in advance (so it is easy to cheat if test security is lax)
- How validated: RETROSPECTIVE validity (correlation with past measures, such as high school grades)
Aptitude tests - Modern History
A. Binet & T. Simon, 1890s (France)
- Worked with pre-school children with severe mental disabilities
- Impossible to evaluate them with an achievement test
- Developed a content-free test of mental abilities (association, attention, memory, motor skills, reasoning)
Aptitude tests - Modern History
- 1917: Like Romania, the USA entered World War I late. Paris was threatened, and the USA needed to build a large army quickly.
- The US Army wanted to assign new recruits efficiently to the various jobs (allocative efficiency).
- Result: the Army Alpha & Beta tests
Aptitude tests - General use
- Objective: predict how well a person can perform in the future
- Development: first, analyze in detail the knowledge and skills needed; second, keep improving the test by correlating test items with future performance and selecting the items that discriminate
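The second development step, correlating items with later performance and keeping those that discriminate, can be sketched as follows. This is a minimal illustration, not the procedure of any particular testing program. The item names, the response data, the later-performance values, and the 0.3 cutoff are all hypothetical assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either list has no variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_items(responses, later_performance, threshold=0.3):
    """Keep items whose right/wrong pattern correlates with later performance.

    responses: dict mapping item id -> list of 0/1 scores, one per examinee.
    threshold: hypothetical cutoff, chosen only for illustration.
    """
    kept = {}
    for item, scores in responses.items():
        r = pearson(scores, later_performance)
        if r >= threshold:
            kept[item] = r
    return kept


# Hypothetical pilot data: six examinees, three candidate items.
responses = {
    "item_a": [1, 1, 0, 1, 0, 0],  # answered correctly by later high performers
    "item_b": [1, 0, 1, 0, 1, 0],  # unrelated to later performance
    "item_c": [1, 1, 1, 1, 1, 1],  # everyone answers correctly: no discrimination
}
later_performance = [3.8, 3.5, 2.1, 3.2, 2.4, 1.9]

selected = select_items(responses, later_performance)
print(sorted(selected))
```

In this toy data only item_a survives: item_b is unrelated to the outcome, and item_c cannot discriminate because every examinee answers it correctly.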
Aptitude tests - General use
- Widely used in industry
- Sometimes called reasoning or readiness tests
- Intelligence tests are a type of aptitude test; some still use the scale originally developed by Binet & Simon
- Key fairness issue: alignment of the content of the test with the knowledge and skills needed to perform (the "opportunity to perform" principle)
More on aptitude tests
- Some content independence. Measures what the student does with the content provided, and how the student applies skills & abilities developed over a lifetime
- Not easily coachable: the content is either not known in advance or is basic, broad, commonly known by all, and curriculum-free
- How validated: PREDICTIVE validity, correlation with future activity (e.g., university or job evaluations)
Comparing Achievement & Aptitude tests
             Achievement        Aptitude
Measure      past learning      potential
Development  content analysis   job/skills analysis
Validation   retrospective      predictive
Content      dependent          independent
Coachable?   very much          not much
Aptitude tests for university admission - History
1930s-1940s: President of Harvard, J. Conant
- Wanted a test to identify students from low socioeconomic classes with the potential to succeed at Harvard: "diamonds in the rough"
- Developed first Scholastic Aptitude Test (SAT)
Aptitude tests for university admission
Aptitude tests, well constructed, can identify:
- students bored by secondary school courses, who study what interests them on their own
- students poorly adapted to secondary school culture, but well adapted to university culture
- high-ability students poorly served by poor-quality secondary schools
More information is better, usually. If university were just like secondary school, perhaps high-school grades and a retrospective content test would suffice for admission purposes. But university is not just more challenging academically; it is very different from secondary school in other respects, too.
Predictive validity (values from -1.0 to +1.0) measures how well higher scores on an admission test match better outcomes at university (e.g., grades, completion). A test with low predictive validity provides little information.
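As a rough illustration of the -1.0 to +1.0 scale, predictive validity can be computed as a Pearson correlation between admission scores and a later outcome. The sketch below assumes that operationalization. The function name and the data are hypothetical and do not come from any real admission test.

```python
from math import sqrt


def predictive_validity(test_scores, later_outcomes):
    """Pearson correlation between admission scores and a later outcome."""
    n = len(test_scores)
    if n != len(later_outcomes) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 students")
    mean_t = sum(test_scores) / n
    mean_o = sum(later_outcomes) / n
    cov = sum((t - mean_t) * (o - mean_o)
              for t, o in zip(test_scores, later_outcomes))
    var_t = sum((t - mean_t) ** 2 for t in test_scores)
    var_o = sum((o - mean_o) ** 2 for o in later_outcomes)
    if var_t == 0 or var_o == 0:
        raise ValueError("correlation is undefined when one measure never varies")
    return cov / sqrt(var_t * var_o)


# Hypothetical data: admission scores and first-year university grades.
admission_scores = [480, 520, 560, 610, 650, 700]
first_year_grades = [4.1, 4.6, 4.4, 5.2, 5.0, 5.8]

r = predictive_validity(admission_scores, first_year_grades)
print(f"predictive validity r = {r:.2f}")
```

A value near +1.0 means higher scorers tend to do better later, a value near 0 means the score says little about the outcome, and a negative value means higher scorers tend to do worse.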
[Figure: a positive correlation between two measures.] Source: NIST, Engineering Statistics Handbook
PSU: a chronology of errors
2000, initial proposal: SIES Project / PSU Project
- More efficient and modern tests (IRT)
- Improve the articulation between secondary and university education
2001 (World Bank loan to MINEDUC to finance the education reform)
- Align the test with the secondary school curriculum
- Eventually convert it into a secondary school exit exam
2005 (World Bank)
- Linking financial aid to PSU scores will favor university access for lower-income groups
Multiplicity of purposes in the PSU:
1. Measure the implementation of a new curriculum
2. Measure well the mastery of two very different curricula
3. Motivate secondary schools to implement the new curriculum
4. Motivate students to study more
5. Predict success at university
6. Predict success in very different university programs
7. Provide cut scores for university admission, scholarships, and financial aid
PSU: a test at war with itself. It is an exit test for scientific-humanistic secondary education, presented as a vehicle for evaluating curriculum coverage, that today is used as an admission test for all students (including those in technical-professional (TP) secondary education). It is expected to do too many things; it does none of them well, & it makes some important ones worse.
International Standards for Educational Testing: The purpose or purposes of a test must be clear to all stakeholders. A test with consequences must be empirically validated for each purpose.
An example: equity arguments
- Promoters claimed that the PSU would reduce the advantage of the preuniversitarios (test-preparation academies). But content tests are more coachable than aptitude tests, because the content base is specific and known. The content base of aptitude tests is broader.
- Promoters claimed that, because the PSU is based on the curriculum, all students would have access to it. But TP students do not, and many scientific-track schools do not manage to cover the entire curriculum.
- Promoters claimed that schools would be obliged to cover the entire curriculum or be exposed, if they did not, by the poor performance of their students. That does not hold, because some students start with gaps in their knowledge that their teachers cannot close.
Predictive validity: measures whether scores on an admission test correlate with outcomes. Incremental predictive validity: measures the amount of unique information an admission test provides, beyond the information available from other measures.
[Figure: Incremental predictive validity, SAT & PSU 2010. Categories: Language, Mathematics, SAT Writing, PSU Social Science; vertical axis from 0 to 0.6.] SOURCE: Pearson, Final Report Evaluation of the Chile PSU, January 2013; SAT data from College Board.
[Figure: Percentage of Chilean schools reporting complete curriculum coverage of mathematics and language arts, 2012, primary and secondary levels. Series: coverage < 100% and coverage 100%; categories: Mathematics, Language; vertical axis from 0 to 100.] SOURCE: Centro de Estudios Mineduc, Cobertura Curricular en Enseñanza Media Lenguaje y Comunicación Matemática, Septiembre 2012
[Figure: Drop in the numbers of students from municipal high schools accepted at 4 major universities. Series: PUC, U. CHILE, U. CONCE, PUCV; horizontal axis: PAA 2003, then PSU 2004 through 2008; vertical axis from 0 to 60.] SOURCE: Simonsen (2008), Zúñiga (2005), & El Mercurio (April 16th, 2006)
[Figure: Incremental Predictive Validity of the PSU (Math), 2012, by university field of study (for example medicine, law, civil engineering, architecture, journalism, art); axis from 0 to 10.] SOURCE: N. Lacourly, M. Silva, & K. Diaz, 2016.
[Figure: Incremental Predictive Validity of the PSU (Language), 2012, by university field of study (same fields as the Math figure); axis from 0 to 10.] SOURCE: N. Lacourly, M. Silva, & K. Diaz, 2016.
Achievement and Aptitude in International Tests: TIMSS and PIRLS are ACHIEVEMENT tests. What is PISA?
Non-cognitive tests
- More recently developed
- Measure values, attitudes, preferences
- Types: integrity tests, career exploration, matchmaking, employment fit
Non-cognitive tests
- Purpose: to identify fit with others or a situation
- Developed using: surveys, personal interviews
- How validated? Success rate in future activities
- Content is personal, not learned
- Faking can be an issue (e.g., honesty tests)
Comparing Achievement, Aptitude, & Non-Cognitive Tests
             Achievement        Aptitude             Non-Cognitive
Measure      past learning      potential            attitudes, values, preferences
Development  content analysis   job/skills analysis  surveys
Validation   retrospective      predictive           predictive
Content      dependent          independent          independent
Coachable?   very much          very little          can be faked
If more information is better, we should maximize the information available about a student at university entry, in order to make the best match between the student and the institution. 3 measures are important:
1. Predictive validity
2. Sub-group differences
3. Content coverage
Achievement tests in the context of university entry
- How are they validated? Through alignment with the content taught in secondary school.
- Achievement tests are retrospective, aligned with content learned in the past.
- Test scores are typically highly correlated with other available student measures, such as grades and class rank.
- Achievement tests are typically summative, or exit exams, administered at the end of a program.
- Fairness assumes that all students have had the same opportunity to learn the content tested.
Aptitude tests for university admission
- How are they validated? Predictive validity: the correlation between test scores and measures of future outcomes, such as university grades.
- Less emphasis on content: aptitude test content is basic, common knowledge. The test measures how well students reason and solve problems, that is, what they do with the information provided.
- Aptitude tests are not easily coachable: the content domain is too broad for focused study.
Predictive validity: measures how well scores on an admission test correlate with desirable outcomes. Incremental predictive validity: measures how much unique information an admission test provides, beyond what is available from other measures.
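One common way to put a number on "unique information" is the gain in explained variance (R squared) when the test score is added to a prediction that already uses secondary school grades. The sketch below assumes that operationalization. It is not necessarily the method used in the studies cited in these slides, and all names and data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one measure never varies")
    return sxy / sqrt(sxx * syy)


def incremental_validity(grades, test, outcome):
    """Return (r2_grades_only, r2_grades_plus_test, gain).

    The gain is the squared semipartial correlation of the test with the
    outcome, which equals the increase in R squared when the test is added
    to a linear prediction that already uses grades.
    """
    r_og = pearson(outcome, grades)
    r_ot = pearson(outcome, test)
    r_gt = pearson(grades, test)
    r2_base = r_og ** 2
    unexplained_overlap = 1 - r_gt ** 2
    if unexplained_overlap < 1e-12:
        # The test duplicates the grades exactly, so it adds nothing.
        return r2_base, r2_base, 0.0
    gain = (r_ot - r_og * r_gt) ** 2 / unexplained_overlap
    return r2_base, r2_base + gain, gain


# Hypothetical data for six students.
school_grades = [5.0, 5.5, 6.0, 6.2, 6.8, 5.8]
admission_test = [480, 610, 520, 700, 640, 560]
university_grades = [4.2, 5.1, 4.6, 5.9, 5.6, 4.8]

r2_base, r2_full, gain = incremental_validity(
    school_grades, admission_test, university_grades
)
print(f"R2 grades only: {r2_base:.3f}")
print(f"R2 grades + test: {r2_full:.3f}")
print(f"incremental gain: {gain:.3f}")
```

A test that merely repeats what grades already show yields a gain near zero, even if its simple correlation with university outcomes is high.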
[Figure: Incremental predictive validity (engineering), controlling for secondary school grades. Series: PAA and PSU; groups: U. Chile and PUC, each for Language & Math and for Language & Math + subject test; vertical axis from 0 to 35.] SOURCE: S.A. Prado, Estudio de Validez Predictiva de la PSU y Comparación con el Sistema PAA, Universidad de Chile