I support developing more effective ways to evaluate teachers — using multiple measures.

What I don’t support, however, is the Gates Foundation’s present effort, which is spending millions of dollars on using student scores on standardized tests as THE MEASURE for evaluating teachers.

I have no objection to scores from existing standardized tests being a part — a small part — of those multiple measures. If present efforts to create a “new generation” of state assessments actually invite teachers to work with them and develop more accurate performance-based assessments, I would have no objection to their proportional weight being increased — a little.

Accomplished California Teachers (of which I am a member) published a report earlier this year that I think accurately reflects my thinking on teacher evaluation:

To support collaboration and the sharing of expertise, teachers should be evaluated both on their success in their own classroom and their contributions to the success of their peers and the school as a whole. They should be evaluated with tools that assess professional standards of practice in the classroom, augmented with evidence of student outcomes. Beyond standardized test scores, those outcomes should include performance on authentic tasks that demonstrate learning of content; presentation of evidence from formative classroom assessments that show patterns of student improvement; the development of habits that lead to improved academic success (personal responsibility, homework completion, willingness and ability to revise work to meet standards), along with contributing indicators like attendance, enrollment and success in advanced courses, graduation rates, pursuit of higher education, and work place success.

I’ve written at the Washington Post about what these ideas look like on the ground at our school (see The best kind of teacher evaluation).

I’m not going to spend a lot of time here reviewing the reams of research showing that evaluating teachers using student test results is unstable and inaccurate. You can find more than enough evidence for that at The Best Resources For Learning About The “Value-Added” Approach Towards Teacher Evaluation.

But right now my big concerns about the Gates Foundation efforts are how I fear they might be minimizing two key tools that can have a huge impact on improving teacher effectiveness — videotape and student surveys.

As I’ve previously written (There Are Some Right Ways & Some Wrong Ways To Videotape Teachers — And This Is A Wrong Way), Gates is funding a massive effort to videotape teacher lessons and then have them evaluated by people who have never visited the school and have no relationship with the teacher. Those evaluators rate the lessons using checklists, and the ratings are then correlated with value-added scores.

Contrast that with how videotape is being used, to universal acclaim, at our school (led by principal Ted Appel). A talented consultant (Kelly Young at Pebble Creek Labs), who has been working with us for years, meets with us to review an edited version of a taped lesson. We first give our own critique and reflections, and then he adds his comments. This process is entirely outside of the official evaluation process, and is focused on helping teachers improve their craft. It has been one of the most significant professional development experiences I’ve had. At my request, Kelly and I subsequently showed the video and shared our critique with my class, which was a transforming experience for all involved. Teacher Magazine will be publishing my account of that class period in early January.

As part of their massive project, Gates is also having thousands of students complete anonymous surveys evaluating their teachers and, you guessed it, correlating the answers to student test scores.

I’m a huge fan of getting student feedback. In fact, I’ve posted My Best Posts On Students Evaluating Classes (And Teachers). To help students see that I take their responses seriously, I always reprint the results in this blog (you can see them and the questions at that “The Best…” list) and email the results to teachers and administrators at my school.

But I want to know more from students than what Gates is asking. I want to know if they think I’m patient and if they believe I care about their lives outside of school. Yes, I certainly want to know what they think I could do better, and I also want to know what they think they could do better. I want to learn if they think their reading habits have changed and, for example, when I’m teaching a history class, whether they are more interested in learning about history than they were prior to taking the class. I want to find out what they believe are the most important things they learned in the class. For many, it might be life skills, like learning that their brain actually grows when they learn new things, or discovering that they had in them the capacity to finish reading a book or writing an essay for the first time in their lives. And, in the discussion that follows (one thing I learned as an organizer is that a survey’s true use is as a spark for a conversation), we discuss all these things and many more, including the difference between what we like to do best and what we learn the most from.

By trying to connect videotaping teachers to anonymous checklist evaluators and test scores, and doing the same to student surveys, I fear the Gates Foundation may succeed in framing the public conversation about these tools as just a means to one end — better scores on assessments that don’t accurately measure learning.

This minimizes these potentially powerful tools, contributes toward seeing both teachers and students as replaceable widgets, and unfortunately reinforces a school reform debate where many worship at the altar of multiple-choice test results.

Using videotaped teacher lessons and student surveys for the primary purpose of connecting them to teacher evaluation by test scores is like using a Stradivarius and a Grand Piano to play “Mary Had A Little Lamb” to evaluate the musician. In both instances, the tools have far more value to everyone if used in more expansive ways.

No, we all deserve better…

(Here’s a link to the article I wrote about my evaluation)