Sunday, September 16, 2012

Problems with the use of student test scores to evaluate teachers (the Economic Policy Institute)

By Richard Rothstein, Helen F. Ladd, Diane Ravitch, Eva L. Baker, Paul E. Barton, Linda Darling-Hammond, Edward Haertel, Robert L. Linn, Richard J. Shavelson, and Lorrie A. Shepard | August 27, 2010

Conclusions and Recommendations

“...We began by noting that some advocates of using student test scores for teacher evaluation believe that doing so will make it easier to dismiss ineffective teachers. However, because of the broad agreement by technical experts that student test scores alone are not a sufficiently reliable or valid indicator of teacher effectiveness, any school district that bases a teacher’s dismissal on her students’ test scores is likely to face the prospect of drawn-out and expensive arbitration and/or litigation in which experts will be called to testify, making the district unlikely to prevail. The problem that advocates had hoped to solve will remain, and could perhaps be exacerbated.

“There is simply no shortcut to the identification and removal of ineffective teachers. It must surely be done, but such actions will unlikely be successful if they are based on over-reliance on student test scores whose flaws can so easily provide the basis for successful challenges to any personnel action. Districts seeking to remove ineffective teachers must invest the time and resources in a comprehensive approach to evaluation that incorporates concrete steps for the improvement of teacher performance based on professional standards of instructional practice, and unambiguous evidence for dismissal, if improvements do not occur.

“Some policy makers, acknowledging the inability fairly to identify effective or ineffective teachers by their students’ test scores, have suggested that low test scores (or value-added estimates) should be a “trigger” that invites further investigation. Although this approach seems to allow for multiple means of evaluation, in reality 100% of the weight in the trigger is test scores. Thus, all the incentives to distort instruction will be preserved to avoid identification by the trigger, and other means of evaluation will enter the system only after it is too late to avoid these distortions.

“While those who evaluate teachers could take student test scores over time into account, they should be fully aware of their limitations, and such scores should be only one element among many considered in teacher profiles. Some states are now considering plans that would give as much as 50% of the weight in teacher evaluation and compensation decisions to scores on existing poor-quality tests of basic skills in math and reading. Based on the evidence we have reviewed above, we consider this unwise. If the quality, coverage, and design of standardized tests were to improve, some concerns would be addressed, but the serious problems of attribution and nonrandom assignment of students, as well as the practical problems described above, would still argue for serious limits on the use of test scores for teacher evaluation.

“Although some advocates argue that admittedly flawed value-added measures are preferred to existing cumbersome measures for identifying, remediating, or dismissing ineffective teachers, this argument creates a false dichotomy. It implies there are only two options for evaluating teachers—the ineffectual current system or the deeply flawed test-based system.

“Yet there are many alternatives that should be the subject of experiments. The Department of Education should actively encourage states to experiment with a range of approaches that differ in the ways in which they evaluate teacher practice and examine teachers’ contributions to student learning. These experiments should all be fully evaluated.

“There is no perfect way to evaluate teachers. However, progress has been made over the last two decades in developing standards-based evaluations of teaching practice, and research has found that the use of such evaluations by some districts has not only provided more useful evidence about teaching practice, but has also been associated with student achievement gains and has helped teachers improve their practice and effectiveness.[61] Structured performance assessments of teachers like those offered by the National Board for Professional Teaching Standards and the beginning teacher assessment systems in Connecticut and California have also been found to predict teachers’ effectiveness on value-added measures and to support teacher learning.[62]

“These systems for observing teachers’ classroom practice are based on professional teaching standards grounded in research on teaching and learning. They use systematic observation protocols with well-developed, research-based criteria to examine teaching, including observations or videotapes of classroom practice, teacher interviews, and artifacts such as lesson plans, assignments, and samples of student work. Quite often, these approaches incorporate several ways of looking at student learning over time in relation to the teacher’s instruction.

“Evaluation by competent supervisors and peers, employing such approaches, should form the foundation of teacher evaluation systems, with a supplemental role played by multiple measures of student learning gains that, where appropriate, should include test scores. Given the importance of teachers’ collective efforts to improve overall student achievement in a school, an additional component of documenting practice and outcomes should focus on the effectiveness of teacher participation in teams and the contributions they make to school-wide improvement, through work in curriculum development, sharing practices and materials, peer coaching and reciprocal observation, and collegial work with students.

“In some districts, peer assistance and review programs—using standards-based evaluations that incorporate evidence of student learning, supported by expert teachers who can offer intensive assistance, and panels of administrators and teachers that oversee personnel decisions—have been successful in coaching teachers, identifying teachers for intervention, providing them assistance, and efficiently counseling out those who do not improve.[63] In others, comprehensive systems have been developed for examining teacher performance in concert with evidence about outcomes for purposes of personnel decision making and compensation.[64]

“Given the range of measures currently available for teacher evaluation, and the need for research about their effective implementation and consequences, legislatures should avoid imposing mandated solutions to the complex problem of identifying more and less effective teachers. School districts should be given freedom to experiment, and professional organizations should assume greater responsibility for developing standards of evaluation that districts can use. Such work, which must be performed by professional experts, should not be pre-empted by political institutions acting without evidence. The rule followed by any reformer of public schools should be: “First, do no harm.”

“As is the case in every profession that requires complex practice and judgments, precision and perfection in the evaluation of teachers will never be possible. Evaluators may find it useful to take student test score information into account in their evaluations of teachers, provided such information is embedded in a more comprehensive approach. What is now necessary is a comprehensive system that gives teachers the guidance and feedback, supportive leadership, and working conditions to improve their performance, and that permits schools to remove persistently ineffective teachers without distorting the entire instructional program by imposing a flawed system of standardized quantification of teacher quality.”

61. Milanowski, Kimball, and White 2004.

62. See, for example, Bond et al. 2000; Cavaluzzo 2004; Goldhaber and Anthony 2004; Smith et al. 2005; Vandevoort, Amrein-Beardsley, and Berliner 2004; Wilson and Hallam 2006.

63. Darling-Hammond 2009; Van Lier 2008.

64. Denver’s Pro-comp system, Arizona’s Career Ladder, and the Teacher Advancement Program are illustrative. See, for example, Solomon et al. 2007; Packard and Dereshiwsky 1991.

Source: the Economic Policy Institute.
