Four years ago, I wrote a series of pieces about how student evaluations did not yield meaningful data. Worse, the data they did yield were used to disproportionately attack women and minorities. This should have come as no shock because all the evidence suggests that student evaluations are biased against women and minorities. It’s the circle of … well, something.
The literature suggests that teaching evaluations are deeply, deeply troubled measures of professors’ teaching skills. They ought to be done away with. But I’m not optimistic. Universities employ experts in survey design, yet they continue to use flawed instruments. And as universities seek to measure more “stuff” regardless of the precision of those measures, student evaluations are here to stay. So, it’s worth a moment envisioning how to make them less troubling.
Student evaluations should not be fully anonymous. To the extent that students are repeat players in the evaluation game, we ought to collect more meaningful data to observe how patterns emerge. That requires that students be tracked across subjects. I’m not suggesting that professors have access to de-anonymized data. I’m suggesting that someone ought to have access to them.
If a student seems to repeatedly mark women or minorities lower than white male professors, that’s useful information. So, too, is information that the comments from that student are more hostile toward women and minorities. The fact that the comments are not perfectly anonymous ought to temper some of the more useless comments, such as “worst course ever” or “the professor sucks!”
Student evaluations also need to be linked to the final course grade to see what patterns emerge there. RateMyProfessors now asks what grade the student received, but it has no way of verifying that the student even took the course, let alone earned the grade claimed. Universities, however, can and should verify both. A recent study determined that “student grade satisfaction — regardless of the underlying cause of the grades — appears to be an important driver of course evaluations.” I think it would be interesting to compare an expected grade (which ought to be on the evaluation) to the actual grade received.
Demographic data for both the student and the professor ought to be included as well. For example, does a particular white professor consistently perform poorly when rated by Black and Brown students? That might be a signal of concern.
Evaluations should happen more often. It has always confused me why evaluations occur only at the end of the course. That timing gives the professor no opportunity to correct any issues, and it encourages students to sit and suffer in silence even when something could better facilitate their learning. As issues go uncorrected, the evaluations may end up being worse. Many professors recognize this by holding midcourse evaluations, but it seems to me this ought to be common practice. Note that the goal is to improve teaching, not to weaponize the evaluations to target certain professors out of jobs.
Evaluations should be supplemented with other interventions. Evaluations are a tool, not a rule. Using course evaluations for purposes of promotion and tenure is deeply troubling, given the other barriers already in place for women and minorities. Instead, teaching evaluations, if properly constructed, COULD serve as a useful tool. But only one tool among several.
If evaluations indicate that a professor over time has trouble in the classroom, the faculty need to step up and see if that is, in fact, true. That means sitting in on the class and observing. Perhaps more than once.
Even then, data should be kept. Are women professors the only ones whose classrooms get visited? How a department approaches its commitment to teaching might speak volumes about larger institutional biases that could hamstring the teacher in the classroom.
Evaluations need to be redesigned (you know, once we figure out what we’re evaluating). It seems to me that the largest problem with student evaluations is we have NO CLUE WHATSOEVER what we’re measuring. A fairly recent study ASSUMED the best-case scenario for student evaluations. Even under that best-case scenario, student evaluations were deeply flawed.
The literature has shown that student evaluations as currently constructed are strewn with gender and racial biases. Instructor attire and weight have effects on student evaluations, too. In short, there is a lot of noise in student evaluations that has nothing to do with teaching and everything to do with student biases.
We ought to rethink how we construct and measure the survey. I offer some guiding principles:
Stop comparing evaluations across courses. You aren’t going to get any information comparing a mandatory course like Professional Responsibility with a popular optional course.
Stop asking open-ended questions that allow students to vent or gush. Such opportunities rarely yield useful information, and they are often the very place where racist or sexist comments rear their heads. Such comments typically have nothing to do with pedagogy and everything to do with personality.
Stop measuring nonsense. The literature points out that an integer rating system of 1 to 5 cannot be properly averaged: Students may regard the difference between a 4 and a 5 as trivial while regarding the difference between a 1 and a 2 as significant. The scale is ordinal, not interval. Hey, come to think of it, that’s a good criticism of how we do grades, too!
Stop asking questions students can’t answer. Did you ever wonder why Mr. Miyagi never asked Daniel whether painting the fence or waxing the car would be useful in self-defense? It’s because Daniel, at the time he was learning those skills, would have said they were a waste of time. Only afterward did Daniel learn the value of those techniques. So, asking “Will this course help you in your future career?” and other such questions isn’t useful. It is merely an attempt to sneak the “customer service” model of education into the evaluations.
Stop taking evaluation answers as gospel. If your school has a question about how many classes the professor has missed — and every school and department in which I have taught has that question — you know that students do not necessarily report accurately. In one semester, students’ answers claimed I had canceled anywhere from zero to eight classes (in reality, I canceled one). And you could see how those answers correlated with the other portions of the evaluation.
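The averaging problem above can be made concrete with a quick sketch. The ratings below are invented for illustration, not drawn from any real evaluations; the point is only that the mean treats a 1-to-5 scale as if the gaps between steps were equal in students’ minds, while the median does not:

```python
# Hypothetical ratings for two professors; the numbers are invented.
from statistics import mean, median

prof_a = [4, 4, 4, 4, 4]  # uniformly "good"
prof_b = [5, 5, 5, 5, 1]  # mostly "excellent," one very unhappy student

# The mean ranks B above A (4.2 vs. 4), but only because it assumes
# the jump from 1 to 5 is exactly four times the jump from 4 to 5.
print(mean(prof_a), median(prof_a))  # 4 4
print(mean(prof_b), median(prof_b))  # 4.2 5
```

Whether Professor B really taught “better” than Professor A depends entirely on an assumption the survey never tests: that students use the integers as evenly spaced units rather than as ordered labels.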
It’s not one thing. It’s not just how administrations weaponize student evaluations that is flawed. It’s not just how the data from the evaluations are analyzed that is flawed. And it’s not just that the evaluations themselves are flawed. It is the combination of ills that produces the results. We ought to be more deliberate in using evaluations as tools to facilitate learning and stop using them as weapons.
LawProfBlawg is an anonymous professor at a top 100 law school. You can see more of his musings here. He is way funnier on social media, he claims. Please follow him on Twitter (@lawprofblawg). Email him at firstname.lastname@example.org.