Thursday, September 10, 2009

They're doing it again, part 2

More on the NYC teacher "value-added" reports.  In the previous post, I pondered how they managed to come up with a way of creating a percentile rank out of fuzzy data points.  So taken was I with the NYC Dept of Ed's statistical gimmickry that I forgot to note just how fuzzy those data points are. 

Changes in students' standardized test scores from year to year are mostly random.  The phenomenon is well known in the psychometric literature. Every test score is a fuzzy measure of a student's true ability, so when you subtract one fuzzy measure from another, you mostly end up with fuzz.  Because the measuring stick is so imprecise, it's almost impossible to say with any confidence whether one student gained more than another over the course of a year.  It's like measuring grains of sand with a household ruler.
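Here's a quick back-of-the-envelope simulation of that point.  The numbers are mine, invented purely for illustration (they're not the state test's actual reliability figures): each observed score is true ability plus measurement error, so the observed gain is the difference of two noisy numbers and barely tracks true growth at all.

    # Illustrative only: invented numbers, not the DOE's model or real test data.
    import numpy as np

    rng = np.random.default_rng(0)
    n_students = 100_000

    true_growth = rng.normal(0.0, 0.1, n_students)   # assumed small true gains
    ability     = rng.normal(0.0, 1.0, n_students)   # true ability going into year 1
    noise_y1    = rng.normal(0.0, 0.5, n_students)   # assumed measurement error, year 1
    noise_y2    = rng.normal(0.0, 0.5, n_students)   # assumed measurement error, year 2

    score_y1 = ability + noise_y1
    score_y2 = ability + true_growth + noise_y2

    observed_gain = score_y2 - score_y1              # what a growth report starts from
    print(np.corrcoef(observed_gain, true_growth)[0, 1])   # roughly 0.14: mostly fuzz

With error on that scale, a student's observed gain tells you almost nothing about how much the student actually learned that year.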

Now, if there were a few particularly large chunks of sand, you could confidently say they were bigger than the rest, but for the most part, the grains would be indistinguishable as far as your household ruler could tell.

Quantity would help.  If you had two groups containing lots of sand (say, millions of grains each), and you knew how many grains were in each group, you could confidently measure the average size of the grains in each group by putting them in a big beaker of water, using your household ruler to measure how far the water rose, and calculating the average displaced volume per grain.  (Well, you'd have to go look up some formulas in your kid's 8th grade math book first, but it could be done, in theory.)
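The statistics hiding behind the beaker trick is just that the standard error of an average shrinks like one over the square root of the number of things averaged.  A little arithmetic, using the same invented numbers as the sketch above, shows how quickly a big pile of measurements pins down a mean that no single measurement can:

    # Back-of-the-envelope only; sigma_gain is an assumption carried over from above.
    import math

    sigma_gain = 0.71   # assumed s.d. of a single observed gain (mostly measurement noise)

    for n in (100, 10_000, 1_000_000):
        se = sigma_gain / math.sqrt(n)
        print(f"n = {n:>9,}: standard error of the average gain ~ {se:.4f}")

Average over a million grains (or students) and the mean is nailed down to the third decimal; average over a hundred and it still wobbles noticeably.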

So, the big question when it comes to gains in NYC students' test scores is: how many are needed to get a good read?  Evidently, as shown here and here, a whole schoolful of kids is not enough.  These posts show that there is almost no correlation between school-level average test score gains from one year to the next.  Now, we know Klein loves shaking things up, but this suggests a degree of chaos in schools that is hard to believe.  The more plausible explanation for the lack of correlation between years is that average gains, at the school level, are mostly fuzz.  It appears that most schools do not have enough kids to provide an accurate measure.
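To see why a near-zero year-to-year correlation points to noise rather than chaos, here's a toy simulation.  Every number in it is an assumption of mine: a small persistent school effect, a lot of student-level measurement noise, and a modest count of kids per school with two usable scores.  That combination alone is enough to reproduce the pattern the linked posts describe.

    # Toy model, not the DOE's: all parameters are assumptions for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    n_schools  = 1_000
    n_students = 100        # assumed kids per school with scores in both years

    true_school_effect = rng.normal(0.0, 0.02, n_schools)   # assumed persistent part
    sigma_gain = 0.71                                        # assumed s.d. of one student's gain

    def school_avg_gains():
        # school average gain = persistent effect + sampling noise that shrinks with n
        sampling_noise = rng.normal(0.0, sigma_gain / np.sqrt(n_students), n_schools)
        return true_school_effect + sampling_noise

    year1, year2 = school_avg_gains(), school_avg_gains()
    print(np.corrcoef(year1, year2)[0, 1])   # close to zero under these assumptions

Nothing chaotic is happening to these imaginary schools; the persistent part of the signal is simply too small to survive the noise in the averaging.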

I suspect, though I'm not sure, that most NYC teachers teach far fewer than a whole schoolful of kids.  So if "progress" cannot be accurately measured for entire schools, what does this tell us about the accuracy of progress measured for teachers?

My guess is that the weird confidence intervals the NYC DOE gives for teachers' percentile ranks (and I am truly curious how they get those) would be much, much larger if they accounted for the imprecision in the measurement of teachers' average test score gains.
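For a rough sense of scale, under the same invented assumptions as above, the sampling error alone in one teacher's average gain is already large:

    # Rough illustration only; both numbers are assumptions, not DOE figures.
    import math

    sigma_gain = 0.71   # assumed s.d. of a single student's observed gain
    n_students = 25     # assumed number of tested students attributed to one teacher

    se = sigma_gain / math.sqrt(n_students)
    print(f"95% interval half-width ~ +/-{1.96 * se:.2f} score-scale units")

An interval on the order of plus-or-minus a quarter of a standard deviation on the score scale covers much of the plausible spread in true teacher effects, which is why I'd expect honest intervals on the percentile ranks to be far wider than the published ones.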
