Effective Evaluation of Teaching: Understanding IDEA Scores and Student Diversity

Interpreting IDEA

Heather McGovern

Director of the Institute for Faculty Development

November 2009

Add info on criterion vs. normed scores

Add info on standard deviation: Standard deviations of about 0.7 are typical. When

these values exceed 1.2, the class exhibits unusual

diversity. Especially in such cases, it is suggested that

the distribution of responses be examined closely,

primarily to detect tendencies toward a bimodal

distribution (one in which class members are about

equally divided between the “high” and “low” end of the

scale, with few “in-between”). Bimodal distributions

suggest that the class contains two types of students

who are so distinctive that what “works” for one group

will not for the other. For example, one group may have

an appropriate background for the course while the other

may be under-prepared; or one group may learn most

easily through “reading/writing” exercises while another

may learn more through activities requiring motor

performance. In any event, detailed examination of

individual items can suggest possible changes in prerequisites,

sectioning, or versatility in instructional

methods.
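The standard-deviation guidance above can be sketched in a few lines. This is a minimal illustration, not IDEA's own procedure: the function name, the use of the population standard deviation, and the 30% cut-offs used to call a distribution "bimodal" are all assumptions made for the example. Only the 1.2 threshold comes from the text.

```python
from math import sqrt


def describe_ratings(counts):
    """Summarize one item's responses on a 1-5 scale.

    counts[i] is the number of students who chose rating i + 1.
    """
    n = sum(counts)
    mean = sum((i + 1) * c for i, c in enumerate(counts)) / n
    # Population standard deviation (an assumption; a report may use n - 1).
    variance = sum(c * ((i + 1) - mean) ** 2 for i, c in enumerate(counts)) / n
    sd = sqrt(variance)

    low = (counts[0] + counts[1]) / n    # share choosing 1 or 2
    middle = counts[2] / n               # share choosing 3
    high = (counts[3] + counts[4]) / n   # share choosing 4 or 5

    # Hypothetical heuristic: wide spread, both ends well populated,
    # and few responses in between.
    looks_bimodal = (
        sd > 1.2 and low >= 0.3 and high >= 0.3 and middle < min(low, high)
    )
    return {"n": n, "mean": mean, "sd": sd, "looks_bimodal": looks_bimodal}


split_class = describe_ratings([8, 4, 1, 4, 8])     # two camps
typical_class = describe_ratings([0, 1, 4, 12, 8])  # clustered near 4
```

A flag from a check like this is only a prompt to look at the response distribution itself, as the text recommends.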

Overview

I. The appropriate role of IDEA in the context of evaluation of teaching at Stockton

A. Validity

B. Reliability and representativeness

II. Interpreting data

A. Mean

B. Halo effect

C. Error of central tendency

D. Faculty selection of objectives

III. Making comparisons

A. IDEA, Stockton, and disciplinary comparisons

B. Converted scores

C. Adjusted scores

D. Norming

E. Comparisons to SET data

IV. Other things to consider

V. References

The appropriate role of IDEA in

overall evaluation of teaching

The IDEA Center “strongly recommends that

additional sources of evidence be used when

teaching is evaluated and that student ratings

constitute only 30% to 50% of the overall

evaluation of teaching.” Primary reasons:

“some components of effective teaching are

best judged by peers and not students”

“it is always useful to triangulate information...”

no instrument is fully valid

no instrument is fully reliable

Validity

What Stockton candidates provide

for evaluation of teaching

Stockton policy states that “evidence of

teaching performance should be

demonstrated by a teaching portfolio, as

outlined below, which should contain the

following:



A self-evaluation of teaching



Student evaluations of teaching and

preceptorial teaching



Peer evaluations of teaching



Other evidence of effectiveness in teaching”

Correlations of multiple measures

of teaching excellence

Student ratings correlate with faculty,

alumni, and administrator ratings:

Administrator: .39 to .62

Colleagues: .48 to .69

Alumni: .40 to .70

Trained observers: .50 to .76

Student comments: .75 to .93

Student ratings are valid

The validity of student ratings has been checked in studies that correlate instructor ratings in multi-section courses with scores on external tests. These have indicated validity coefficients that are “practically useful” (between .30 and .49), as follows:

Achievement of learning: .47

Overall course: .47

Overall instructor: .44

Students cannot validly rate all

important qualities of teaching

 Of 26 factors Cashin (1989) identifies as

relevant to teaching effectiveness, there are

eleven which students cannot assess.

Keig and Waggoner (1994) grouped these into

three categories:

“(1) the goals, content, and organization of

course design, (2) methods and materials

used in delivery, and (3) evaluation of

student work, including grading practices.”

How Stockton defines “excellence

in teaching” and what students rate



“A thorough and current command of the subject matter, teaching techniques and methodologies of the discipline one teaches

Sound course design and delivery in all teaching assignments…as evident in clear learning goals and expectations, content reflecting the best available scholarship or artistic practices, and teaching techniques aimed at student learning

The ability to organize course material and to communicate this information effectively

The development of a comprehensive syllabus for each course taught, including expectations, grading and attendance policies, and the timely provision of copies to students

…respect for students as members of the Stockton academic community, the effective response to student questions, and the timely evaluation of and feedback to students.”

“Where appropriate, additional measures of teaching excellence are

Ability to use technology in teaching

The capacity to relate the subject matter to other fields of knowledge

Seeking opportunities outside the classroom to enhance student learning of the subject matter”

False assumptions about IDEA



Effective teaching = students make progress on all 12 learning objectives

Effective teachers = teachers who employ all 20 teaching methods

Reliability and

representativeness

A number of classes are needed to

draw accurate conclusions

File reviewers should keep in mind that the

IDEA Center “recommends using six to

eight classes, not necessarily all from the

same academic year, that are representative

of all of an instructor’s teaching

responsibilities.”

The number of student responders

affects reliability

The number of student respondents affects reliability. In this context, reliability refers to consistency among raters (interrater reliability). Ratings from fewer than ten students are unreliable; evaluators should pay scant attention to numerical data from classes with fewer than ten respondents. IDEA reports the following median reliabilities:

10 raters: .69 reliability

15 raters: .83 reliability

20 raters: .83 reliability

30 raters: .88 reliability

40 raters: .91 reliability

Reliability ratings below .70 are highly suspect.
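As a rough aid, the table above can be turned into a lookup. The sketch below is an illustration only: the function names are made up, and treating a class as having the reliability of the largest tabulated size it reaches (rather than interpolating) is a conservative assumption, not something IDEA specifies.

```python
# Median reliabilities by number of raters, as listed above.
IDEA_MEDIAN_RELIABILITY = [(10, 0.69), (15, 0.83), (20, 0.83), (30, 0.88), (40, 0.91)]


def reliability_floor(n_respondents):
    """Return the tabulated reliability for the largest listed class size
    that n_respondents reaches, or None for fewer than ten respondents."""
    best = None
    for raters, reliability in IDEA_MEDIAN_RELIABILITY:
        if n_respondents >= raters:
            best = reliability
    return best


def is_suspect(n_respondents, threshold=0.70):
    """True when the numbers deserve scant attention under the .70 rule."""
    reliability = reliability_floor(n_respondents)
    return reliability is None or reliability < threshold
```

Under these assumptions a class with exactly ten respondents (.69) is still flagged, while one with fifteen is not.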

The number of student responders

affects representativeness

Higher response rates provide more representative

data. Lower response rates provide less

representative data.

This is especially an area of concern for classes using the online IDEA, which has a lower response rate. Dennis Fotia informs me that in Fall 2008, the response rate was 62.9%. Spring 2009 data:

◦ Number of Surveys Processed: 50

◦ Number of Respondents: 1045

◦ Number of Responses: 714

◦ Average Response Rate: 71.5%
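Dividing the Spring 2009 totals gives 714 / 1045, or about 68.3%, not 71.5%. One possible explanation, offered here as an assumption and not confirmed by the source, is that the reported figure is the average of per-class response rates and not the pooled rate. The sketch below shows how the two can differ; the two-class example data are hypothetical.

```python
def pooled_rate(responses, enrolled):
    """Total responses divided by total students surveyed."""
    return sum(responses) / sum(enrolled)


def mean_class_rate(responses, enrolled):
    """Average of each class's own response rate."""
    rates = [r / e for r, e in zip(responses, enrolled)]
    return sum(rates) / len(rates)


# Pooled rate from the Spring 2009 totals listed above.
spring_2009_pooled = 714 / 1045

# Hypothetical data: a small class with high participation and a
# large class with low participation.
example_responses = [9, 30]
example_enrolled = [10, 60]
example_pooled = pooled_rate(example_responses, example_enrolled)
example_mean = mean_class_rate(example_responses, example_enrolled)
```

Small classes with high participation pull the per-class average above the pooled rate, which is the direction of the gap seen here.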

Interpreting data

Mean scores can be affected by

outliers

Note that average scores as provided on

the IDEA form are mean scores. As such,

they can be affected by outliers. Careful

evaluators will check the statistical detail on

page 4 to note the presence of outliers.
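A small illustration of the point about outliers, using hypothetical ratings from a class of eleven students. Comparing the mean with the median is one simple way to see whether a few extreme responses are moving the average; it is a suggestion here, not part of the IDEA report.

```python
from statistics import mean, median

# Hypothetical responses to one item: most students rate 4 or 5,
# two students rate 1.
ratings = [5, 5, 5, 4, 5, 4, 5, 5, 4, 1, 1]

with_outliers = {"mean": mean(ratings), "median": median(ratings)}

# The same class with the two lowest ratings set aside.
trimmed = [r for r in ratings if r > 1]
without_outliers = {"mean": mean(trimmed), "median": median(trimmed)}
```

In this example the two low ratings pull the mean from about 4.67 down to 4.0, while the median stays at 5.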

Scores can be affected by the halo

effect

Ranters and ravers, or the halo effect:

“the tendency of raters to form a general

opinion of the person being rated and then let

that opinion color all specific ratings. If the

general impression is favorable, the "halo effect" is

positive and the individual receives higher ratings

on many items than a more objective evaluation

would justify. The "halo effect" can also be

negative; an unfavorable general impression will

lead to low marks "across the board", even in

areas where performance is strong.”

How can you know?

Look at the student forms themselves. If a form gives someone a 5 all the way down, that suggests a halo effect. In most cases, the same is true of a 1 or any other single number all the way down.
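Where individual forms are available as lists of item responses, the check described above can be sketched as follows. The function names and the sample forms are hypothetical, and a uniform form is only a hint of a halo effect, not proof.

```python
def is_straight_lined(form):
    """True when every item on a form received the same rating."""
    return len(form) > 1 and len(set(form)) == 1


def straight_line_share(forms):
    """Fraction of forms in a class that give one number all the way down."""
    flagged = [form for form in forms if is_straight_lined(form)]
    return len(flagged) / len(forms)


# Hypothetical class of four forms, five items each.
forms = [
    [5, 5, 5, 5, 5],  # all 5s: possible positive halo
    [4, 5, 3, 4, 5],
    [1, 1, 1, 1, 1],  # all 1s: possible negative halo
    [3, 4, 4, 2, 5],
]
share = straight_line_share(forms)
```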

The Error of Central Tendency

can affect scores

“Most people have a tendency to avoid the

extremes (very high and very low) in making

ratings.  As a result, ratings tend to pile up

more toward the middle of the rating scale

than might be justified. In many cases, ratings

which are "somewhat below average" or

"somewhat above average" may represent

subdued estimates of an individual's status

because of the "Error of Central Tendency."”

How faculty select objectives can

affect scores

The “Summary Evaluation” provided on

page one of the IDEA report weights

Progress toward Relevant Objectives at

50% and Excellent Teacher and Excellent

Course at 25% each.

Therefore, evaluators should pay attention

to the objectives a faculty member

selected.
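The weighting can be written out as arithmetic. This sketch reads the weights as 50% for Progress toward Relevant Objectives and 25% each for Excellent Teacher and Excellent Course; the function name and the sample scores are hypothetical.

```python
def summary_evaluation(progress, excellent_teacher, excellent_course):
    """Weighted summary: 50% progress, 25% teacher, 25% course."""
    return 0.5 * progress + 0.25 * excellent_teacher + 0.25 * excellent_course


# Hypothetical scores for one class.
baseline = summary_evaluation(50, 60, 56)

# The same class if the progress score were 4 points lower, for example
# because an ill-fitting objective was marked as important.
lower_progress = summary_evaluation(46, 60, 56)
```

Because progress carries half the weight, a 4-point drop there moves the summary by 2 points, which is why the choice of objectives matters so much.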

Things evaluators should check



 The teacher selected objectives. If not, by

default, all will be considered “important.”

This makes most information on the first

summary page of the report worthless.



 The objectives the teacher chose seem

reasonable for the course.



The teacher discusses problematic

objective choices or irregularities in the

class.

Consider external factors in

objective selection

Most of the time at Stockton, the ultimate decisions about objectives are the teacher’s choice. However, a growing number of programs encourage (or, for all practical purposes, especially for untenured faculty, require) some objectives to be held in common. This is particularly the case with courses offered in multiple sections.

Faculty can help evaluators…

Logically, faculty members creating their files

should note if they



forgot to select objectives, which seriously

impacts the results,



later saw that they chose objectives poorly,



were using objectives in common with a

larger group of courses, but those were

problematic for their class, or



need to report an unusual situation that

likely affected student progress towards

objectives or student perception of the class.

Making comparisons

IDEA compares class results to

three groups

1) Three years of IDEA student ratings at 122 institutions in 73,722 classes (excluding classes with fewer than 10 students, limiting to no more than 5% of database from any one institution, excluding first time institutions)

2) Classes at your institution in the most recent five years (excluding classes with no objectives selected, including classes of all sizes)

3) Classes in the same discipline in the most recent five years where at least 400 classes with the same disciplinary code were rated (with the same exclusions as in 1, plus courses with no selected objectives)

The validity of comparisons varies

The validity of comparisons depends on a number of

factors, including how “typical” a class is, compared to

classes at Stockton or all classes in the IDEA

database or how well the class aligns with other

classes with the same IDEA disciplinary code.

Some classes at Stockton align poorly with “typical” classes: say, a fieldwork class or a class with a cutting-edge format.

External factors can affect

comparisons and ratings



Students in required courses tend to give lower ratings.

Students in lower level classes tend to give lower ratings.

Ratings tend to run arts and humanities > social science > math (this may be because of differences in teaching quality, the quantitative nature of the courses, both, or other factors).



Gender/age/race/culture/height/physical attractiveness

and more may be factors, as they are in many other

areas of life.



If the students are told the evaluation will be used in personnel decisions, the scores are higher.

If the instructor is present during the evaluation, the scores are higher.

Some external factors don’t usually

affect ratings



Time of day of the course



Time in the term in which evaluations are

given (after midterm)



Age of student



Level of student



Student GPA

Some disciplinary comparisons are

suspect

Many classes align poorly with disciplinary

codes: CRIM stats here, which is

compared either with Criminal Justice or

with Mathematics.  Or developmental

writing here, which is higher level than

many but also for credit. Or most of our

G courses, perhaps particularly our GIS

courses.

We should use converted scores

IDEA states that “Institutions that want to

make judgments about teaching

effectiveness on a comparative basis

should use

converted scores.”

Why we should use converted

scores



The 5-point averages of progress ratings on “Essential” or “Important” objectives vary across objectives. For instance, the average for “gaining factual knowledge” is 4.00, while that for “gaining a broader understanding and appreciation for intellectual/cultural activity” is 3.69.



Unconverted averages disadvantage “broad liberal

education” objectives.



Using converted averages “ensures that instructors

choosing objectives where average progress ratings

are relatively low will not be penalized for choosing

objectives that are particularly challenging or that

address complex cognitive skills.”
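To see why conversion removes the penalty, consider a standardized score with an average of 50 and a standard deviation of 10. The sketch below assumes that form. The two averages (4.00 and 3.69) come from the text above, but the spread of 0.6 used for both objectives is a made-up value for illustration, and the function name is hypothetical.

```python
def to_converted(raw_average, norm_mean, norm_sd):
    """Standardize a raw 5-point average against an objective's norm,
    on a scale with mean 50 and standard deviation 10."""
    return 50 + 10 * (raw_average - norm_mean) / norm_sd


ASSUMED_SD = 0.6  # illustrative only; not taken from IDEA

# The same raw average of 3.9 on two different objectives.
factual_knowledge = to_converted(3.9, norm_mean=4.00, norm_sd=ASSUMED_SD)
broad_liberal_ed = to_converted(3.9, norm_mean=3.69, norm_sd=ASSUMED_SD)
```

Under these assumptions the raw 3.9 falls slightly below 50 on the factual-knowledge objective and above 50 on the broad liberal education objective, because each is judged against its own average.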

Why we should use adjusted

averages in most cases

Adjusted scores adjust for “student

motivation, student work habits, class size,

course difficulty, and student effort.

Therefore, in most circumstances, the IDEA

Center recommends using adjusted scores.”

How are they adjusted?

“Work Habits (mean of Item 43, As a rule, I put forth more effort than other students on academic work) is generally the most potent predictor…Unless ratings are adjusted, the instructors of such classes would have an unfair advantage over colleagues with less dedicated students.”

How are they adjusted, part II

“Course Motivation (mean of Item 39, I really wanted to take this course regardless of who taught it) is the second most potent predictor. …unless ratings are adjusted, the instructors of such classes would have an unfair advantage over colleagues with less motivated students.”

How are they adjusted, part III

“Size of Class…is not always statistically

significant; but when it was, it was always

negative – the larger the class, the lower

the expected rating.”

How are they adjusted, part IV

“Course Difficulty, as indicated by student ratings of item 35, Difficulty of subject matter” is complicated because the instructor influences students’ perception of difficulty. Therefore, “A statistical technique was used to remove the instructor’s influence on “Difficulty” ratings in order to achieve a measure of a class’s (and often a discipline’s) inherent difficulty. Generally, if the class is perceived as difficult (after taking into account the impact of the instructor on perceived difficulty), an attenuated outcome can be expected.”

Notable examples:  in “Creative capacities” and

“Communication skills” “high difficulty is strongly associated

with low progress ratings.”

In two cases, high difficulty leads to high ratings on progress

toward objectives: “Factual knowledge” and “Principles and

theories.”

How are they adjusted, part V

“Student Effort is measured with responses to item 37, I worked harder on this course than on most courses I have taken.” Here, because responses reflect both the students’ general habits and how well the teacher motivated students, the latter is statistically removed from the ratings, leaving the fifth extraneous factor, “student effort not attributable to the instructor.” Usually, student effort is negatively related to ratings.

A special case is “Classes containing an unusually large number of students who worked harder than the instructor’s approach required,” which get low progress ratings, perhaps because students were unprepared for the class, or lacked self-confidence and so underachieved “or under-estimate their progress in a self-abasing manner.”

A critical exception to using

adjusted scores

“We recommend using the unadjusted score if the

average progress rating is high

(for example, 4.2

or higher).”

In these cases, students are so motivated and

hard-working that the teacher has little

opportunity to influence their progress, but

“instructors should not be penalized for having

success with a class of highly motivated

students with good work habits.”

Another exception to using

adjusted scores:  Assessment of

learning

“In deciding which ratings to use, it is

important to consider whether the focus

is on student outcomes or on instructor

contributions to those outcomes. For the

former, “Unadjusted” ratings are most

relevant; for the latter, “Adjusted” ratings

are generally more appropriate.”
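The two exceptions can be combined into one small decision rule. This is a sketch of how an evaluator might apply them, not an IDEA procedure: the function name, the `focus` parameter, and the sample scores are hypothetical, while the 4.2 threshold is the example figure quoted above.

```python
def score_to_use(unadjusted, adjusted, raw_progress_average,
                 focus="instructor", high_progress=4.2):
    """Pick which score to read for a class.

    focus: "instructor" when judging the instructor's contribution,
           "outcomes" when assessing student learning itself.
    """
    if focus not in ("instructor", "outcomes"):
        raise ValueError("focus must be 'instructor' or 'outcomes'")
    if focus == "outcomes":
        # Assessment of learning: unadjusted ratings are most relevant.
        return unadjusted
    if raw_progress_average >= high_progress:
        # High raw progress: do not penalize success with motivated students.
        return unadjusted
    return adjusted


high_progress_class = score_to_use(58, 49, raw_progress_average=4.4)
ordinary_class = score_to_use(52, 55, raw_progress_average=3.9)
learning_assessment = score_to_use(52, 55, raw_progress_average=3.9,
                                   focus="outcomes")
```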

Do not try to cut the scores more

precisely than IDEA does…

Because the instrument is not perfectly

valid or reliable, trying to compare scores

within the five major categories IDEA

provides is not recommended.

Norming sorts people into broad

categories

Scores are normed. Therefore, it is

unrealistic to expect most people to

score above the similar range. Statistically,

40% of people ALWAYS score in the

similar range and 30% above and 30%

below that range.

More thoughts on norming…



Many teachers teach well. Therefore, the

comparative standard is relatively high. Being

“similar” is not bad. It is fine.



If we made a list of 10 teachers at random at

Stockton, we’d expect that one would fall

into the “much lower” range, two into

“lower,” four into “similar,” two into “higher,”

and one into “much higher” if we think

Stockton teachers are basically comparable

to the teachers in the IDEA database.
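The category shares described above (10% much lower, 20% lower, 40% similar, 20% higher, 10% much higher) can be expressed as percentile bands. The sketch below is illustrative: the function names are hypothetical, and placing a class by its percentile rank within the comparison group is an assumption about how the bands would be applied.

```python
# Share of the comparison group in each category, in percent.
CATEGORY_SHARES = [
    ("much lower", 10),
    ("lower", 20),
    ("similar", 40),
    ("higher", 20),
    ("much higher", 10),
]


def category_for_percentile(percentile):
    """Map a percentile rank (0 up to 100) to its comparison category."""
    upper = 0
    for name, share in CATEGORY_SHARES:
        upper += share
        if percentile < upper:
            return name
    return CATEGORY_SHARES[-1][0]


def expected_counts(n_teachers):
    """Expected number of teachers per category in a random group."""
    return {name: share * n_teachers / 100 for name, share in CATEGORY_SHARES}


ten_teachers = expected_counts(10)
```

With ten teachers drawn at random this gives one, two, four, two, and one, matching the list above.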

Thoughts about comparing to SET

data

We can’t perform the most accurate

comparisons of IDEA data to SET data

because we don’t know the standard

error for the SET. The SET also did not

convert or adjust or norm scores. Questions

on it were not tested for validity and

reliability.  Most questions don’t compare to

the IDEA form (and those that do could be

differently influenced by the other questions

on the forms).

Mathematical conversion…

All that said, I’ve found what are supposed to be fairly valid equations for converting from a 5-point to a 7-point scale and back.

However, research indicates that scales with fewer points (the IDEA compared to the SET) allow for less precise measurement. Apparently this mainly means that because people can’t go as much higher, scores tend to be lower even after being converted.
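The equations themselves are not given here. One common approach, shown below as an assumption and not necessarily the one referred to above, is a linear rescaling that maps the endpoints of one scale onto the endpoints of the other. The function names are hypothetical.

```python
def five_to_seven(score):
    """Linearly rescale a 1-5 score onto a 1-7 scale (1 -> 1, 5 -> 7)."""
    return 1 + (score - 1) * 6 / 4


def seven_to_five(score):
    """Linearly rescale a 1-7 score onto a 1-5 scale (1 -> 1, 7 -> 5)."""
    return 1 + (score - 1) * 4 / 6


idea_average = 4.2
as_seven_point = five_to_seven(idea_average)
round_trip = seven_to_five(as_seven_point)
```

A rescaling like this lines up the numbers but cannot correct for the differences in how students use the two scales, which is the caution raised above.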

Other things to consider

Other items that relate to our

definition of excellent teaching

Primarily, pages one and two should be

used for summative evaluation of teaching.

Page 4, which provides raw data, should

be at least skimmed to note distribution

of scores and for responses to additional

questions.

Due to our definition of excellence in teaching, we should also attend to item 17 on page 3 (“Provided timely and frequent feedback…”) for summative evaluation.

Items to consider for formative

evaluation

In cases where evaluators can or choose to

provide formative evaluation, page 3 is

essential.  Here, evaluators should note that

effective teaching methods and styles depend

upon the learning objectives for the class,

and IDEA notes these. IDEA also provides

suggestions for areas of strength, areas of

weakness, and areas that are ok but have

room for improvement. Evaluators can point

to these to see what behaviors to

recommend or applaud.

Teachers can use page 3

Teachers can look to the information on

page three to see what steps they might

take to improve student progress on

various objectives. See sample report.

References



Cashin, William. “Student Ratings of Teaching, the Research Revisited.” 1995. Idea paper 32. http://www.theideacenter.org/sites/default/files/Idea_Paper_32.pdf

Cashin, William. “Student Ratings of Teaching: A Summary of the Research.” 1988. Idea paper 20. http://www.theideacenter.org/sites/default/files/Idea_Paper_20.pdf

Colman, Andrew, Norris, Claire, and Preston, Carolyn. “Comparing Rating Scales of Different Lengths: Equivalence of Scores from 5-Point and 7-Point Scales.” 1997. Psychological Reports 80: 355-362.

Hoyt, Donald and Pallett, William. “Appraising Teaching Effectiveness: Beyond Student Ratings.” Idea paper 36. http://www.theideacenter.org/sites/default/files/Idea_Paper_36.pdf

“Interpreting Adjusted Ratings of Outcomes.” 2002, updated 2008. http://www.theideacenter.org/sites/default/files/InterpretingAdjustedScores.pdf

Pallet, Bill. “IDEA Student Ratings of Instruction.” Stockton College, May 2006.

“Using IDEA Results for Administrative Decision-making.” 2005. http://www.theideacenter.org/sites/default/files/Administrative%20DecisionMaking.pdf

Slide Note

Embed Share

Download Presentation

Heather McGovern, Director of the Institute for Faculty Development, discusses interpreting IDEA scores and standard deviation values to understand student diversity and possible instructional improvements. The IDEA Center recommends using multiple evaluation sources, with student ratings comprising only a portion of overall teaching assessment. Stockton candidates must provide a teaching portfolio for evaluation. Correlations between student ratings and other measures of teaching excellence are highlighted.

fhess Follow

Uploaded on Sep 07, 2024 | 2 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript

Interpreting IDEA Heather McGovern Director of the Institute for Faculty Development November 2009 add info on criterion vs. normed scores Add info on standard deviation:Standard deviations of about 0.7 are typical. When these values exceed 1.2, the class exhibits unusual diversity. Especially in such cases, it is suggested that the distribution of responses be examined closely, primarily to detect tendencies toward a bimodal distribution (one in which class members are about equally divided between the high and low end of the scale, with few in-between. Bimodal distributions suggest that the class contains two types of students who are so distinctive that what works for one group will not for the other. For example, one group may have an appropriate background for the course while the other may be under-prepared; or one group may learn most easily through reading/writing exercises while another may learn more through activities requiring motor performance. In any event, detailed examination of individual items can suggest possible changes in prerequisites, sectioning, or versatility in instructional methods.

Overview The appropriate role of IDEA in the context of evaluation of teaching at Stockton A. Validity B. Reliability and representativeness Interpreting data A. Mean B. Halo effect C. Error of central tendency D. Faculty selection of objectives I. II. Making comparisons A. IDEA, Stockton, and disciplinary comparisons B. Converted scores C. Adjusted scores D. Norming E. Comparisons to SET data Other things to consider References III. IV. V.

The appropriate role of IDEA in overall evaluation of teaching The IDEA Center strongly recommends that additional sources of evidence be used when teaching is evaluated and that student ratings constitute only 30% to 50% of the overall evaluation of teaching. Primary reasons: o some components of effective teaching are best judged by peers and not students o it is always useful to triangulate information... o no instrument is fully valid o no instrument is fully reliable

Validity

What Stockton candidates provide for evaluation of teaching Stockton policy states that evidence of teaching performance should be demonstrated by a teaching portfolio, as outlined below, which should contain the following: A self-evaluation of teaching Student evaluations of teaching and preceptorial teaching Peer evaluations of teaching Other evidence of effectiveness in teaching

Correlations of multiple measures of teaching excellence Student ratings correlate with faculty, alumni, and administrator ratings: Administrator Colleagues Alumni Trained observers Student comments .75 to .93 .39 to .62 .48 to .69 .40 to .70 .50 to .76

Student ratings are valid The validity of student ratings has been checked with correlation studies of multi- section course instructor ratings to external tests. These have indicated validity ratings that are practically useful (between .30 and .49) as follow: Achievement of learning Overall course Overall instructor .47 .47 .44

Students cannot validly rate all important qualities of teaching Of 26 factors Cashin (1989) identifies as relevant to teaching effectiveness, there are eleven which students cannot assess. Keig and Waggoner (1994) grouped these into three categories: (1) the goals, content, and organization of course design, (2) methods and materials used in delivery, and (3) evaluation of student work, including grading practices.

How Stockton defines excellence in teaching and what students rate A thorough and current command of the subject matter, teaching techniques and methodologies of the discipline one teaches Sound course design and delivery in all teaching assignments as evident in clear learning goals and expectations, content reflecting the best available scholarship or artistic practices, and teaching techniques aimed at student learning The ability to organize course material and to communicate this information effectively. The development of a comprehensive syllabus for each course taught, including expectations, grading and attendance policies, and the timely provision of copies to students. respect for students as members of the Stockton academic community, the effective response to student questions, and the timely evaluation of and feedback to students. Where appropriate, additional measures of teaching excellence are Ability to use technology in teaching The capacity to relate the subject matter to other fields of knowledge Seeking opportunities outside the classroom to enhance student learning of the subject matter

False assumptions about IDEA Effective teaching=students make progress on all 12 learning objectives Effective teachers= teachers who employ all 20 teaching methods

Reliability and representativeness

A number of classes are needed to draw accurate conclusions File reviewers should keep in mind that the IDEA Center recommends using six to eight classes, not necessarily all from the same academic year, that are representative of all of an instructor s teaching responsibilities.

The number of student responders affects reliability The number of student respondents affects reliability. In this context, reliability refers to consistency, interrater reliability. Fewer than ten students are unreliable evaluators should pay scant attention to numerical data from classes with fewer than ten respondents. IDEA reports the following median rates: 10 raters .69 reliability 15 raters .83 reliability 20 raters .83 reliability 30 raters .88 reliability 40 raters .91 reliability Reliability ratings below .70 are highly suspect.

The number of student responders affects representativeness Higher response rates provide more representative data. Lower response rates provide less representative data. This is especially an area of concern for classes using the online IDEA which has a lower response rate. Dennis Fotia informs me that in Fall 2008, the response rate was 62.9%. Spring 2009 data: Number of Surveys Processed: 50 Number of Respondents: 1045 Number of Responses: 714 Average Response Rate: 71.5%

Interpreting data

Mean scores can be affected by outliers Note that average scores as provided on the IDEA form are mean scores. As such, they can be affected by outliers. Careful evaluators will check the statistical detail on page 4 to note the presence of outliers.

Scores can be affected by the halo effect Ranters and Ravers, or the halo effect the tendency of raters to form a general opinion of the person being rated and then let that opinion color all specific ratings. If the general impression is favorable, the "halo effect" is positive and the individual receives higher ratings on many items than a more objective evaluation would justify. The "halo effect" can also be negative; an unfavorable general impression will lead to low marks "across the board", even in areas where performance is strong.

How can you know? Look at the student forms themselves. If a form gives someone a 5 all the way down halo effect! In most cases, also true with a 1 or any other number all the way down

The Error of Central Tendency can affect scores Most people have a tendency to avoid the extremes (very high and very low) in making ratings. As a result, ratings tend to pile up more toward the middle of the rating scale than might be justified. In many cases, ratings which are "somewhat below average" or "somewhat above average" may represent subdued estimates of an individual's status because of the "Error of Central Tendency.

How faculty select objectives can affect scores The Summary Evaluation provided on page one of the IDEA report weights Progress toward Relevant Objectives at 50% and Excellent Teacher and Excellent Course at 25%. Therefore, evaluators should pay attention to the objectives a faculty member selected.

Things evaluators should check The teacher selected objectives. If not, by default, all will be considered important. This makes most information on the first summary page of the report worthless. The objectives the teacher chose seem reasonable for the course. The teacher discusses problematic objective choices or irregularities in the class.

Consider external factors in objective selection Most of the time at Stockton, the ultimate decisions about objectives are the teacher s choice. However, some (and a growing number of) programs encourage (or for all practical purposes, especially for untenured faculty, require) some objectives to be held in common. This is particularly the case with courses of which there are multiple sections.

Faculty can help evaluators Logically, faculty members creating their files should note if they forgot to select objectives, which seriously impacts the results, later see that they chose objectives poorly., were using objectives in common with a larger group of courses, but those were problematic for their class, or need to report an unusual situation that likely affected student progress towards objectives or student perception of the class.

Making comparisons

IDEA compares class results to three groups 1) Three years of IDEA student ratings at 122 institutions in 73, 722 classes (excluding classes with fewer than 10 students, limiting to no more than 5% of database from any one institution, excluding first time institutions) 2) Classes at your institution in the most recent five years (excluding classes with no objectives selected, including classes of all sizes) 3) Classes in the same discipline in the most recent five years where at least 400 classes with the same disciplinary code were rated (excluding as in 1) plus courses with no selected objectives)

The validity of comparisons varies The validity of comparisons depends on a number of factors, including how typical a class is, compared to classes at Stockton or all classes in the IDEA database or how well the class aligns with other classes with the same IDEA disciplinary code. Some classes at Stockton align poorly with typical classes say, a fieldwork class or a class with an cutting-edge format.

External factors can affect comparisons and ratings Students in required courses tend to report lower. Students in lower level classes tend to report lower. Arts and humanities >social science > math (this may be because of differences in teaching quality or due to quantitative nature of courses, both, or other factors). Gender/age/race/culture/height/physical attractiveness and more may be factors, as they are in many other areas of life. If the students are told the evaluation will be used in personnel decisions the scores are higher. If the instructor is present during the evaluation the scores are higher.

Some external factors dont usually affect ratings Time of day of the course Time in the term in which evaluations are given (after midterm) Age of student Level of student Student GPA

Some disciplinary comparisons are suspect

Many classes align poorly with disciplinary codes. Examples include CRIM statistics here, which is compared either with Criminal Justice or with Mathematics; developmental writing, which here is higher level than many but is also for credit; and most of our G courses, perhaps particularly our GIS courses.

We should use converted scores

IDEA states that "Institutions that want to make judgments about teaching effectiveness on a comparative basis should use converted scores."

Why we should use converted scores

The 5-point averages of progress ratings on "Essential" or "Important" objectives vary across objectives. For instance, the average for "gaining factual knowledge" is 4.00, while that for "gaining a broader understanding and appreciation for intellectual/cultural activity" is 3.69. Unconverted averages disadvantage broad liberal education objectives. Using converted averages ensures that instructors who choose objectives where average progress ratings are relatively low will not be penalized for choosing objectives that are particularly challenging or that address complex cognitive skills.
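A small sketch may make the point concrete. It assumes converted scores behave like standardized scores with a mean of 50 and a standard deviation of 10, computed against each objective's own average. The two averages (4.00 and 3.69) come from the slide above; the spread of 0.5 and the function name are hypothetical, chosen only for illustration, and this is not IDEA's actual conversion procedure.

```python
def converted_score(raw_mean, objective_mean, objective_sd):
    """Standardize a raw 5-point mean against its own objective's norm.

    Assumes a mean-50, SD-10 standardized scale (an assumption).
    """
    return 50 + 10 * (raw_mean - objective_mean) / objective_sd


# Averages quoted on the slide; the spread is a made-up placeholder.
FACTUAL_MEAN = 4.00   # "gaining factual knowledge"
BROAD_MEAN = 3.69     # "broader understanding and appreciation"
ASSUMED_SD = 0.5      # hypothetical, not an IDEA figure

raw = 3.90  # the same raw average earned on either objective
factual = converted_score(raw, FACTUAL_MEAN, ASSUMED_SD)
broad = converted_score(raw, BROAD_MEAN, ASSUMED_SD)
print(round(factual, 1), round(broad, 1))
```

Under these assumptions, the same raw average of 3.90 lands below the midpoint of 50 for the factual-knowledge objective and above it for the broader-understanding objective, which is the disadvantage that conversion is meant to remove.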

Why we should use adjusted averages in most cases

Adjusted scores adjust for student motivation, student work habits, class size, course difficulty, and student effort. Therefore, in most circumstances, the IDEA Center recommends using adjusted scores.

How are they adjusted?

Work Habits (mean of Item 43, "As a rule, I put forth more effort than other students on academic work") is generally the most potent predictor. Unless ratings are adjusted, the instructors of classes with unusually dedicated students would have an unfair advantage over colleagues with less dedicated students.

How are they adjusted, part II

Course Motivation (mean of Item 39, "I really wanted to take this course regardless of who taught it") is the second most potent predictor. Unless ratings are adjusted, the instructors of classes with unusually motivated students would have an unfair advantage over colleagues with less motivated students.

How are they adjusted, part III

Size of Class is not always statistically significant, but when it is, the effect is always negative: the larger the class, the lower the expected rating.

How are they adjusted, part IV

Course Difficulty, as indicated by student ratings of item 35, "Difficulty of subject matter," is complicated because the instructor influences students' perception of difficulty. Therefore, a statistical technique was used to remove the instructor's influence on Difficulty ratings in order to achieve a measure of a class's (and often a discipline's) inherent difficulty. Generally, if the class is perceived as difficult (after taking into account the impact of the instructor on perceived difficulty), an attenuated outcome can be expected. Notable examples: in "Creative capacities" and "Communication skills," high difficulty is strongly associated with low progress ratings. In two cases, high difficulty leads to high ratings on progress toward objectives: "Factual knowledge" and "Principles and theories."

How are they adjusted, part V

Student Effort is measured with responses to item 37, "I worked harder on this course than on most courses I have taken." Because responses reflect both the students' general habits and how well the teacher motivated them, the latter is statistically removed from the ratings, leaving the fifth extraneous factor: student effort not attributable to the instructor. Usually, student effort is negatively related to ratings. A special case is classes containing an unusually large number of students who worked harder than the instructor's approach required. These classes get low progress ratings, perhaps because students were unprepared for the class, or because they lack self-confidence and so underachieve or underestimate their progress in a self-abasing manner.

A critical exception to using adjusted scores

"We recommend using the unadjusted score if the average progress rating is high (for example, 4.2 or higher). In these cases, students are so motivated and hard-working that the teacher has little opportunity to influence their progress, but instructors should not be penalized for having success with a class of highly motivated students with good work habits."

Another exception to using adjusted scores: assessment of learning

In deciding which ratings to use, it is important to consider whether the focus is on student outcomes or on instructor contributions to those outcomes. For the former, unadjusted ratings are most relevant; for the latter, adjusted ratings are generally more appropriate.
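The guidance on adjusted versus unadjusted scores can be summarized as a short decision rule. The sketch below is only an illustration: the function name, the `focus` labels, and the idea of treating 4.2 as a hard cutoff are assumptions (the slides give 4.2 as an example, not a fixed threshold), and evaluators would still apply judgment.

```python
# "For example, 4.2 or higher" on the slide; treated here as a cutoff (assumption).
HIGH_PROGRESS_THRESHOLD = 4.2


def score_to_report(unadjusted, adjusted, focus="instructor"):
    """Pick which 5-point progress average to use.

    focus="outcomes"   -> the question is how much students learned
    focus="instructor" -> the question is the instructor's contribution
    """
    if focus == "outcomes":
        # Assessment of learning: unadjusted ratings are most relevant.
        return unadjusted
    if unadjusted >= HIGH_PROGRESS_THRESHOLD:
        # Do not penalize success with a highly motivated class.
        return unadjusted
    # Default case: adjust for the five extraneous factors.
    return adjusted


print(score_to_report(4.3, 4.0))                    # high progress: unadjusted
print(score_to_report(3.8, 4.0))                    # typical case: adjusted
print(score_to_report(3.8, 4.0, focus="outcomes"))  # learning focus: unadjusted
```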

Do not try to cut the scores more precisely than IDEA does

Because the instrument is not perfectly valid or reliable, trying to compare scores within the five major categories IDEA provides is not recommended.

Norming sorts people into broad categories

Scores are normed. Therefore, it is unrealistic to expect most people to score above the "similar" range. Statistically, 40% of people ALWAYS score in the "similar" range, 30% above it, and 30% below it.

More thoughts on norming

Many teachers teach well. Therefore, the comparative standard is relatively high. Being "similar" is not bad. It is fine. If we made a list of 10 teachers at random at Stockton, we'd expect that one would fall into the "much lower" range, two into "lower," four into "similar," two into "higher," and one into "much higher," if we think Stockton teachers are basically comparable to the teachers in the IDEA database.
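The arithmetic behind the ten-teacher example can be sketched as follows. The percentages are the ones implied by the slide (one, two, four, two, and one out of ten); the dictionary and function names are illustrative only.

```python
# Share of a normed population expected in each category, in percent.
CATEGORY_SHARES = {
    "much lower": 10,
    "lower": 20,
    "similar": 40,
    "higher": 20,
    "much higher": 10,
}


def expected_counts(n_teachers):
    """Expected number of teachers per category for a comparable group."""
    return {cat: n_teachers * pct / 100 for cat, pct in CATEGORY_SHARES.items()}


counts = expected_counts(10)
for category, count in counts.items():
    print(f"{category}: {count:g}")

# By construction, only 30% can sit above the "similar" range.
above_similar = CATEGORY_SHARES["higher"] + CATEGORY_SHARES["much higher"]
print(f"above similar: {above_similar}%")
```

These are expectations for a group that matches the IDEA database, not a quota: any particular set of ten teachers may be distributed differently.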

Thoughts about comparing to SET data

We can't perform the most accurate comparisons of IDEA data to SET data because we don't know the standard error for the SET. The SET also did not convert, adjust, or norm scores. Its questions were not tested for validity and reliability. Most of its questions don't compare to those on the IDEA form (and those that do could be influenced differently by the other questions on the forms).

Mathematical conversion

All that said, I've found what are supposed to be fairly valid equations for converting from a 5-point to a 7-point scale and back. However, research indicates that scales with fewer points (the 5-point IDEA compared to the 7-point SET) allow for less precise measurement. Apparently this mainly means that, because people can't go as much higher, scores tend to be lower even after being converted.
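For readers who want to see what such a conversion looks like, here is a minimal sketch of a plain linear rescaling that maps the endpoints of one scale onto the endpoints of the other. This is one common approach and is offered only as an assumption; it is not necessarily the set of equations referred to above, and the function names are hypothetical.

```python
def rescale(score, old_min, old_max, new_min, new_max):
    """Linearly map a score from one rating scale onto another."""
    return new_min + (score - old_min) * (new_max - new_min) / (old_max - old_min)


def five_to_seven(score):
    """Map a 1-5 rating onto a 1-7 scale."""
    return rescale(score, 1, 5, 1, 7)


def seven_to_five(score):
    """Map a 1-7 rating onto a 1-5 scale."""
    return rescale(score, 1, 7, 1, 5)


print(five_to_seven(4.0))   # 5.5
print(seven_to_five(5.5))   # 4.0
```

A rescaling like this keeps the endpoints and midpoint aligned but cannot correct for the tendency, noted above, of shorter scales to yield lower scores even after conversion.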

Other things to consider

Other items that relate to our definition of excellent teaching

Primarily, pages 1 and 2 should be used for summative evaluation of teaching. Page 4, which provides raw data, should be at least skimmed to note the distribution of scores and the responses to additional questions. Due to our definition of excellence in teaching, we should also attend to item 17 on page 3 ("Provided timely and frequent feedback") for summative evaluation.

Items to consider for formative evaluation

In cases where evaluators can or choose to provide formative evaluation, page 3 is essential. Here, evaluators should note that effective teaching methods and styles depend upon the learning objectives for the class, and IDEA notes these. IDEA also identifies areas of strength, areas of weakness, and areas that are OK but have room for improvement. Evaluators can point to these to decide which behaviors to recommend or applaud.

Teachers can use page 3

Teachers can look to the information on page 3 to see what steps they might take to improve student progress on various objectives. See sample report.

References

Cashin, William. Student Ratings of Teaching: The Research Revisited. 1995. IDEA Paper 32. http://www.theideacenter.org/sites/default/files/Idea_Paper_32.pdf

Cashin, William. Student Ratings of Teaching: A Summary of the Research. 1988. IDEA Paper 20. http://www.theideacenter.org/sites/default/files/Idea_Paper_20.pdf

Colman, Andrew, Norris, Claire, and Preston, Carolyn. Comparing Rating Scales of Different Lengths: Equivalence of Scores from 5-Point and 7-Point Scales. 1997. Psychological Reports 80: 355-362.

Hoyt, Donald and Pallett, William. Appraising Teaching Effectiveness: Beyond Student Ratings. IDEA Paper 36. http://www.theideacenter.org/sites/default/files/Idea_Paper_36.pdf

Interpreting Adjusted Ratings of Outcomes. 2002, updated 2008. http://www.theideacenter.org/sites/default/files/InterpretingAdjustedScores.pdf

Pallett, Bill. IDEA Student Ratings of Instruction. Stockton College, May 2006.

Using IDEA Results for Administrative Decision-making. 2005. http://www.theideacenter.org/sites/default/files/Administrative%20DecisionMaking.pdf
