Teaching Evaluation: Pursuing a Mirage

When I first arrived at Carleton University in Ottawa, the faculties were just beginning to get into teacher evaluation. I was appalled at the methods proposed, and I wrote and circulated (to the whole university) a memo on the subject. The methods of measuring teaching skills that emerged then prefigured some of the worst notions we have today and were naive, mechanistic, and simplistic.

The politicians and public school bureaucrats in the United States (see the recent “triumphant” reform in Florida) apparently believe that teacher evaluation is a way of improving education. That might be so in a never-never land where the evaluators were omniscient and infallible and had designed a system that actually measured something, but in our frail world, and given what the art of teaching involves, it’s pure self-delusion to imagine that teaching can be evaluated by the means at hand today. The percentage of a teacher’s students who pass standard tests proves very little. In fact, my tendency would be to fire any teacher whose students scored exceptionally well in standard tests. The object of teaching is to develop the potential of the individual student, not to teach students how to pass tests. Passing tests is quite irrelevant.


What follows is what I wrote in 1970 on this subject. I wouldn’t change a word of it in the light of what I know today.

On the Proposal to Evaluate Teachers

I find the questionnaire circulated by your committee discouragingly inadequate as an initial inquiry. There may be good reasons why you begin with such an approach, though I can see none. The following remarks seek to probe some of the problems of evaluation in a fairly casual way. Even so, I find myself at the end face to face with the apparent absurdity of the whole drive to “rate” teaching ability.

Is the phenomenon of the teaching act really comprehensible under the terms you imply in your questionnaire? Is it comprehensible under any terms we could conveniently render into ratable values? I doubt it.

I should like to make a few points to expose only some of the difficulties; I have no doubt that there are many others.

1) Your distinction between “course content” and “teaching ability” as areas for evaluation seems dubious. “Course content”, the phrase itself, is ambiguous. It might mean either the actual specific material of the course, e.g., “the paintings of all the major masters of the Italian Renaissance”; or it might mean the qualitative level of the course, as when one might brag: “My course is no Mickey Mouse production; the content is very high”. If it means the former, I fail to perceive how “course content” in that sense can be evaluated. Clearly, a course in “the paintings of all the major masters of the Italian Renaissance” is not intrinsically better than a course in “all the Northern painters of the Renaissance”. Choices here must be made as part of programmatic wholes, by departments and faculties, and whether they are “right” or “wrong” depends upon program objectives.

Yet surely “course content” in the sense of the qualitative level of the given material cannot be separated from the evaluation of “teaching ability”. In that sense, given two teachers covering roughly the same material, the better teacher will produce the better level of treatment and will cover the material better, barring freakish cases where a good teacher throws out the “course description” for some reason or other.

The evaluation of course content (assuming you are talking about the specific relevance of the material of the course) should be left to departments and faculties. If you are talking about the skill with which the teacher handles the material, then that should clearly be evaluated as part of his “teaching skill.”

2) There is a good case for assuming that evaluating teaching ability is impossible. In the terms suggested by your questionnaire, I have no doubt whatever that it is in fact impossible. Let us assume that teaching shares some qualities with art. This is not an unusual assumption: while the symbolic content of art probably differs in most cases from the informational content of teaching material, teaching is at least like art in that it is tinged with personality factors. I think here particularly of the performing arts, which move from a “role” or “score” through a human interpreter to an audience. Was Toscanini a better conductor than Furtwängler? Or even: was Churchill a better speaker than de Gaulle? This is an analogy only, and may arouse howls of dismay, but I think the point it suggests is unassailable. Teaching by human teachers is always interlocked with personality factors, and when the relative skills of the “performers” are fairly close, there is no “objective” way to rate one over another. Thus, among great singers, one can never really decide among the very greatest; the best one can hope for is to separate the great from the mediocre, and the mediocre from the absolutely incompetent. Among three good teachers, one with a brilliantly logical mind, another with a deep spiritual vision, and a third with a warm human relevance, how is one to compare and “evaluate”? The idea is absurd, and only invites absurdities of evaluation. Even if one were to assume that an “ideal” course content could be fixed, and that “feedback” of certain factual information were the highest test of teaching skill, how could “feedback” possibly be measured, when it is here a matter of intonation, gesture, and glance, in short, of the whole “impact” of human exchange? Teaching machines using an abstract symbolism can perhaps be so evaluated, but surely not the performing animals who stand up before their fellows in order to nudge them into some kind of awareness?

We are usually sensible enough to settle for “rough and ready” evaluation of teachers, as we are indeed for “performers” of all kinds. After a while it becomes generally accepted that so-and-so is a dud, and that someone else is an “outstanding teacher.” Speaking generally, I think this is the most accurate kind of evaluation one is likely to get. Naturally, there are occasional injustices: the plausible windbag outshines the terse genius (or the artful dullard the fertile windbag). If you can somehow prevent this, you will be accomplishing miracles. Yet I feel that though the mills of reputation grind slowly, they grind exceedingly small; in the long run, the incompetents become known, and the exceptional teachers too, though the latter are not necessarily rewarded. Perhaps if you want to find out who’s good and who’s not, you should simply “ask around”. If you ask widely enough, and in the most unstructured manner possible, you will get (nine times out of ten) a roughly correct answer.

3) Let us assume, however, that you disagree with such arguments, that you insist on setting up machinery for evaluation. Surely the first thing you must do is to find some generally acceptable statement of what you mean by a good teacher. This, I believe, will prove impossible. I suggest my own criteria, in order to illustrate the difficulty.

a) The good teacher is one who “wakes up” the minds of his students and encourages them to develop in themselves their own maximum abilities. He frees them from the tyranny of the collective and the conventional, and encourages them to become centers of original intellectual, moral, and spiritual energy. The good teacher, having fused his own knowledge and passion, directs his students toward their own self-synthesis. The bad teacher lets the sleepers sleep, and puts those half-awake back to sleep (usually both literally and figuratively). There is, however, no personality type of the good teacher, and no favoured technique. He may be an ironist or an evangelist, quick or slow; he may speak with rhetorical flair or rather dryly. However he performs, the knowledge and the passion get through and transform his listeners into participants in a dialogue. Out of this, human beings grow. (And the subject doesn’t matter: the best lectures I have ever heard were by Aldous Huxley on mysticism, on which he spoke dryly; the second best were on insurance, passionately delivered.)

b) The good teacher sets forth what is known, creates the ambiance of a specific discipline, suggests perspectives, activates traditions. He enables his students to understand the specific content of his course, relates it to his discipline as a whole (even if only unconsciously and by suggestion), and in so doing “wakes up” his students in the manner suggested in (a) above. The good teacher moves from the specific to the universal, in the sense that though he must necessarily limit his terms of reference, almost any given point carries the possibility of sudden acceleration into the “shocking perspectives” of the awakened mind. Thus, though in most cases he and his students begin with the “irreducible stubborn facts” of the discipline, they will go on to the more culturally and biologically relevant level of the “awakened” mind — which is why I have listed it first.

Now let us suppose that after two years of sitting in committee, and having heard the testimony of several hundred passionately engaged witnesses, a group produces some such definition of “good teacher”. Does it even then have the right to ask every “performer” in the university to accept its definition of what he is supposed to be doing? If one man in a university of several hundred teachers declares his aims to be quite other than those “agreed upon”, would it be wise to force him to be judged by them? I think not. That would be as absurd as trying to formulate criteria for good singing or good public speaking that would impose a quite bureaucratically arbitrary centrality on the already existing “loose” but flexible and useful criteria. Why such an insane desire to impose abstract limits on processes which custom and general opinion in time evaluate? The teacher who refused to accept a committee’s definitions would surely, in fairness, have to be asked to state his own aims, and one part of the rating process would then have to include the rating of his own aims against his achievements.

The first result, therefore, of setting up any specifically agreed-upon terms of evaluation would be either: 1) the infringement of the individual teacher’s right to be himself, to work toward his own goals in the specific dynamic situation of his own microcosm; or 2) the introduction of a dual-evaluation system, one part of the process being directed to measuring a “performer” against his own criteria, the other toward measuring him against the agreed-upon arbitrary centrality.

We are already in administrative trouble, already face to face with the absurdity of “evaluation”, but this is only the beginning. Let us assume that no one dissents from the general criteria, and that my argument against having them is to be rejected. Even so, one will not in any case come up with criteria which can be administratively evaluated by some simplistic process such as letting the dean, or the students, or the teacher, or anyone else “do it.” Take the case above, where I reduce the criteria of the good teacher to only two. By what administrative machinery might these two qualities, the ability to stimulate and awaken the mind of the student and the ability to instruct in a specific discipline, be rated?

In the first case, one would surely have to turn first to the students whom he is trying to “awake”, to find out if the teacher is succeeding. But how? By asking them? Or perhaps one should ask the teacher if he is succeeding? Then too, ought there not to be outside observers from his own department or faculty, who might see things from a wider perspective? Or perhaps distinguished local examples of the “awakened” mind?

Even with a specifically agreed-upon criterion, the process is not easy. And surely one could not use the same “observers” to measure the other defined criterion of the good teacher, his ability to instruct in his specific discipline. For after all, here his colleagues in his own department and outsiders in the same discipline should know best. But the students, who know relatively little about the discipline as a whole, ought to have a voice too, surely? And what about the instructor himself: doesn’t he have some ability to measure himself in his field?

We have at this point, in order to be quite fair, three rating processes: one based on the criteria of the instructor himself, one based on one aspect of his teaching ability, and a third based on another. All three of these processes, again to be fair, require several separate rating groups. The process would probably take up the better part of an academic year. Several dozen people would be involved with each instructor, some of them perhaps of considerable eminence. After this, the harassed “performer” ought not merely to be given a raise; he ought to be allowed some kind of title, or decoration. Perhaps he might even be allowed to wear his fingernails in long splendid curves. The result of all this activity would be to decide that Professor X is excellent, Professor Y mediocre, and Professor Z unmentionable. (Incidentally, Professor Z had better not be, or even feel, “threatened” by all this, because he might just be your only talent in the field of photo-fission, even though as a “performer” he leaves something to be desired; and he might prefer to remain unrated in some other university.) Which takes us to the threshold of another absurdity: who is interested in setting up machinery for rating teaching performance, and for whose benefit is it being done? Are we going to reverse the present market situation (which, as everyone knows, rates publication over classroom performance) by setting up machinery that pretends to measure the imponderable? If we really believed that classroom teaching were the most important ratable factor in a college teacher, wouldn’t the machinery to enable us to make ratings have evolved long ago?

4) I conclude my discussion as follows, summing up my points. First, your separation of course content and teaching ability seems dubious. Second, it is very doubtful that teaching can be rated at all. Third, to rate it in the manner implied in your initial questionnaire seems to reduce complex human abilities to quantities measurable by filing clerks and those who press buttons to add up dots on multiple-choice questions. Fourth, even if a more subtle rating process were set up, it would involve more administrative machinery than could possibly be directed into the project. Fifth, the final result of any rating will probably merely confirm what is known already by rumour and general opinion. Sixth, aren’t you putting the idealistic cart before the market-place horse, that is, is your committee functioning in terms of existing social and economic realities?

Rather than answer all your specific questions (though I have answered some) I prefer to direct my own questions to your committee “more in sorrow than in anger.”

Is it just possible that we ought to consider the whole question of teacher evaluation as a nightmare from which we should be trying to awake? Is the real nightmare, perhaps, beginning when we are asked to reply to simplistically laid-out queries from a “Committee on Course and Teacher Evaluation”?

Shouldn’t we invite this “Committee on Course and Teacher Evaluation” not to waste too much time in reaching the conclusion that there is no way teachers can be evaluated that does not do violence to the reality it ostensibly seeks to make measurable?


Many years later my second wife became a teacher and worked in special education, teaching students with problems and also the exceptionally gifted (in a Grade Four public school class). I had occasion to witness some of her work, and I received much feedback from parents whose children were her students, and some from the students themselves. It was clear to me that she was an outstanding teacher, and so considered by her students and by almost all parents whose children she taught. Her “secret” was as follows. She was unfailingly imaginative, highly informed, and either flexible or firm in discipline, as the occasion demanded. She understood the psychological undercurrents that were revealed in her students’ behaviours. She taught her students to be self-reliant, to respect others as persons, to love learning, to be excited about intellectual discoveries, and to share them, while also learning new things from others. She challenged her students, but it was never a competition. She made her goals clear and explained to her students what she expected from them.

They were no fools, those Grade Four kids; they knew they were in a special place, and they could hardly wait to get to school every day. Many adult visitors to the class, including this one, wished they had been trained by “Mrs. H.” Now tell me how a standardized test would have connected with or expressed the quality of this classroom experience, or how preparing students for tests would have enriched them for a lifetime, as this remarkable teacher did.

