Guest Post by John Thompson

When I first followed Larry’s link to Tom Kane’s op-ed in the Wall Street Journal (see Disappointing Op Ed On Using Tests To Evaluate Teachers By Head Of Gates’ Project), I was also disappointed.  Perhaps I’m naive, but upon reflection I was struck by Kane’s conclusion.  The Gates Foundation’s scholar concluded that, “as imperfect as the current measures of effective teaching are—and they must be improved—using multiple measures provides better information about a teacher’s effectiveness than seniority or graduate credentials.”  In other words, after investing tens of millions of dollars in research, the best thing he can say about using test score growth in evaluations is that it is better than two of the weakest indicators available?

In fact, I wonder why Kane compared his attempts to quantify instructional effectiveness to two factors that have little or nothing to do with it.  Seniority is the teacher’s First Amendment in that it protects educators from the whims of their bosses, not to mention politicized fads.  There are many simpler and safer ways to reform seniority without encouraging test-driven evaluations. And whether you agree or disagree with the policy of providing incentives for graduate coursework, that issue has nothing to do with whether an algorithm can be made accurate enough to justify firing teachers.

Too many economists trying to improve the validity of these value-added models (VAMs) seem to believe that the purpose of these experimental algorithms is simply to make the calculations more reliable, while remaining oblivious to the actual circumstances in schools.  For instance, the study cited by Kane, “The Long-Term Impacts of Teachers” by Raj Chetty, John Friedman, and Jonah Rockoff, made a big deal out of the consistency of the teacher effects they found when teachers in the 95th percentile changed schools – as if that neat experiment said anything about real-world policy issues.  (If it could be shown that elite teachers in elite schools were transferring in significant numbers to the inner city and producing test score gains, THAT would be relevant.)

In contrast, if the issue is whether value-added is good for students, Catherine Durso’s “An Analysis of the Use and Validity of Test-Based Accountability” asks the right questions.  The National Education Policy Center study looked at about 800 Los Angeles teachers who changed schools to see whether the different environments affected their value-added scores.  Only 30% of those teachers stayed in the same value-added evaluation category after changing schools.

The Los Angeles VAMs were most reliable in predicting future performance when they used six years of data (2004 to 2009).  But more than a quarter of the teachers subject to data-driven evaluations had only one year of data, and the majority had three years or less.  So Durso devised an ingenious thought experiment.  She took the six years of data from the same teachers and divided it into two three-year periods.  Same teachers, same numbers, yet the value-added model built on the first half of their data was only 40% accurate in predicting those same teachers’ performance category in the second half.

Durso then took the six-year results and predicted ELA teachers’ value-added in the seventh year (2010), comparing the VAM predictions to what actually happened that year.  The VAM was only 27% accurate in predicting the teachers’ effectiveness category.
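
To make concrete what a check like this measures, here is a minimal sketch in Python. It is entirely my own illustration, not Durso’s code or data: it simulates noisy value-added estimates for the same set of teachers in two periods, assigns quintiles in each period, and reports how often a teacher lands in the same quintile both times. The teacher count, noise level, and quintile scheme are assumptions chosen only for illustration.

```
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration only; not Durso's model or data.
# Simulate noisy value-added estimates for the same teachers in two periods,
# then ask how often a teacher lands in the same quintile both times.
n_teachers = 800                                  # assumed, for illustration
true_effect = rng.normal(0, 1, n_teachers)        # unobserved "true" effectiveness
noise_sd = 1.0                                    # assumed estimation noise per period

vam_first_half = true_effect + rng.normal(0, noise_sd, n_teachers)
vam_second_half = true_effect + rng.normal(0, noise_sd, n_teachers)

def to_quintile(scores):
    """Rank teachers and assign quintiles 1 (lowest) through 5 (highest)."""
    ranks = scores.argsort().argsort()            # 0..n-1 rank for each teacher
    return ranks * 5 // len(scores) + 1

same_quintile = (to_quintile(vam_first_half) == to_quintile(vam_second_half)).mean()
print(f"Teachers placed in the same quintile in both periods: {same_quintile:.0%}")
```

The point of the sketch is simply that when the noise in each period is comparable to the real differences between teachers, agreement rates in the neighborhood of what Durso reports are exactly what you would expect.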

The Los Angeles study placed teachers into quintiles, while other evaluation rubrics place teachers in categories under different names.  But we must remember what that categorization means in the real world.  What does it mean, in the real world, when 85% of ELA teachers have scores with a margin of error so great that they could be evaluated as “less effective,” “average,” or “more effective”?
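
To see how a single score can be consistent with three different ratings, consider a toy calculation, again with made-up numbers rather than anything from the Los Angeles data: a point estimate whose rough 95% interval (about two standard errors either side) straddles the assumed cut points between the bands.

```
# Hypothetical numbers, chosen only to illustrate the margin-of-error point.
estimate = 0.05          # a teacher's value-added point estimate (in test SD units)
standard_error = 0.20    # assumed standard error of that estimate

low = estimate - 2 * standard_error     # rough 95% interval
high = estimate + 2 * standard_error

# Assumed cut points separating three broad bands of teachers.
less_effective_below = -0.10
more_effective_above = 0.10

possible_ratings = []
if low < less_effective_below:
    possible_ratings.append("less effective")
if low < more_effective_above and high > less_effective_below:
    possible_ratings.append("average")
if high > more_effective_above:
    possible_ratings.append("more effective")

print(f"95% interval: [{low:.2f}, {high:.2f}] -> consistent with: {possible_ratings}")
```

With numbers like these, the same teacher could plausibly be labeled anything from “less effective” to “more effective,” which is the margin-of-error problem in a nutshell.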

Worse, the NEPC study shows that it is harder for a teacher to raise his or her value-added after being moved to a school with lower value-added. Once value-added is incorporated into evaluations, what type of teacher would commit to the toughest schools? In addition to creating incentives for teaching narrowly to the bubble-in test, value-added evaluations are bound to produce an exodus of the best teachers from the toughest schools and/or the profession. Before long, only incompetents who couldn’t find work elsewhere, saints, adrenalin junkies, and mathematical illiterates would remain in the schools where it is harder to raise test scores.

I would have hoped that economists manipulating education statistics would at least consider the concept of “rational expectations” and the laws of supply and demand.  Who would commit to a career where there is a 10% or 15% or whatever other percent chance, PER YEAR, of that career being damaged or destroyed by circumstances beyond your control? Also missing from the work of value-added advocates are footnotes showing that they have considered qualitative research such as that of Aaron Pallas and Jennifer Jennings, or Linda Darling-Hammond, Audrey Amrein-Beardsley, Edward Haertel, and Jesse Rothstein.  But they will be reading Catherine Durso’s analysis.  Had they had the benefit of her findings before they committed to test-based accountability, I find it hard to believe that they would have even started down the high-stakes value-added road.

John Thompson taught for 18 years in the inner city.  He blogs regularly at This Week in Education, Anthony Cody’s Living in Dialogue, the Huffington Post and Schools Matter.  He is completing a book, Getting Schooled, on his experiences in the Oklahoma City Public School System. 

Editor’s Note: I’m adding this post to The Best Resources For Learning About The “Value-Added” Approach Towards Teacher Evaluation.