Why teaching evaluations might not be a good way to evaluate teaching
TVO talks to Emma Phillips about a recent arbitration win
Arbitrator William Kaplan recently held that Ryerson University can no longer use student evaluations of teaching (SETs) to make decisions about hiring and promotions. He concluded that, while student evaluations are easy to administer and “have the air of objectivity”, their ability to measure the quality of an instructor’s teaching is “imperfect at best and biased and unreliable at worst.”
Emma Phillips, who represented the Ryerson Faculty Association in that case, was interviewed by TVO for an article that examines the evidence on which the arbitrator based his decision.
Emma Phillips, one of the lawyers for the faculty association, says that although the ruling is binding only on Ryerson, the declaration that SETs may not be used in hiring or promotion at that institution could have wide-ranging implications for post-secondary instructors everywhere.
“What Kaplan found is that students are really not in a position to assess whether a professor is effective,” Phillips said.
Part of the testimony that Kaplan considered was a report by Philip Stark, a statistician at the University of California, Berkeley. Stark states that if an evaluation uses a 1 to signify “strongly disagree” and a 5 to signify “strongly agree,” attempting to come up with a score out of 5 is “statistically meaningless.” If a teacher gets a 2.5 rating, that could mean she was considered mediocre by the entire class, or it might mean she challenged her students and ended up loved by the hard workers and hated by the slackers. The average provides no way of telling who was a better teacher.
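Stark's point about averages can be illustrated with a small sketch. The two sets of ratings below are invented for illustration and do not come from the report or the arbitration; the names `uniform_class`, `polarized_class`, and `summarize` are likewise hypothetical. Both classes average 2.5, yet the spread of responses is entirely different.

```python
from collections import Counter
from statistics import mean

# Hypothetical ratings on a 1-5 scale
# (1 = "strongly disagree", 5 = "strongly agree"). Invented data.
uniform_class = [2, 2, 2, 2, 3, 3, 3, 3]    # everyone lukewarm
polarized_class = [1, 1, 1, 1, 1, 5, 5, 5]  # split between extremes


def summarize(ratings):
    """Return the mean rating and the number of responses at each scale point."""
    counts = Counter(ratings)
    distribution = {point: counts.get(point, 0) for point in range(1, 6)}
    return mean(ratings), distribution


for label, ratings in [("uniform", uniform_class), ("polarized", polarized_class)]:
    average, distribution = summarize(ratings)
    print(f"{label}: mean={average}, distribution={distribution}")
```

Reporting the count of responses at each scale point, rather than a single average, is one way to keep this difference visible.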
But regardless of how the results are presented, SETs are useful only if the questions actually measure how well a teacher is teaching. Stark isn’t convinced that they do.
First, he points to the possible biases. Studies have shown that students’ opinions of their teachers appear to be swayed by the grades they expect, by whether the material is heavy in math, by the instructor’s gender, age, attractiveness, and race, and even by the physical condition of the classroom.
On top of that, Stark reviewed several recent studies and concluded that they “generally find weak or negative association between SET and instructor effectiveness.”