March 13, 2007

Schemo Responds

Courtesy of gadfly Rory, Diane Schemo responds (via email) to my criticism of her reporting on the Madison, Wisconsin school district's decision to turn down $2 million in Reading First grant money.

A. Wisconsin's average NAEP score remained the same, but that fact used alone is misleading.

Enrollment changes during that period were severe: from 1998 to 2005, Wisconsin's low-income population rose from 1 in 4 students to 1 in 3 students, and blacks were 13 percent of all students, up from 10 percent in 1998. Latinos rose from 4 to 6 percent. White student enrollment by 2005 had declined to 77 percent, from 82 percent in 1998. According to the state Department of Public Instruction, Wisconsin has the fastest growing poverty rate of any state in the nation.

If you look at the scores for black 4th graders, they rose to 194 on the 2005 NAEP, up from 187 in 1998. Scores for Latinos rose to 208 from 201 during the same period.

Though white scores remained steady at 227-228, the shrinking pool of whites meant that the increasing numbers of black and Latino students brought down the overall average, even though these two groups made significant gains.

To put the case in its most extreme terms, a severe enough exodus of white students would inevitably mean that NAEP scores would nosedive as the population shifts, until Wisconsin had succeeded in completely erasing any achievement gap, and black and Latino scores were on par with white scores, something no state, anywhere, has done, using phonics, whole language, steroids, tea leaves . . . you name it.

I can sum up this argument in one word: irrelevant.

This argument is irrelevant to the question of whether performance in Madison, and in particular the Reading First eligible schools, actually increased.

Even if black and Hispanic performance rose relative to white performance, it did so on a statewide basis. This tells us nothing about black and Hispanic performance in Madison. As far as we know, performance in Madison could have plummeted during this time period while black and Hispanic performance outside of Madison skyrocketed. The data is silent.

Now, let's take a closer look at this stunning increase in black and Hispanic performance in Wisconsin that's got Schemo giddy as a schoolgirl.

NAEP 4th Grade Reading Test: Wisconsin Average Scale Scores

Year   Black   Hispanic   White
1992   198     209        227
1994   196     203        227
1998   187     201        228
2003   200     209        225
2005   194     208        227

These are not the miraculous gains Schemo would have us believe. A more accurate characterization is a partial recovery of ground lost since 1994. In any event, according to NAEP's statistical significance calculator, the change in black performance between 1998 and 2005 was not statistically significant. Oops. Moreover, the "gain" in black performance from the 1998 nadir to 2005 was only about 0.20 s.d., far less than the alleged gain of 0.77 s.d. on the Wisconsin test. Even with that small "gain," black students' performance in 2005 remains below its 1994 level.
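The arithmetic behind the 0.20 s.d. figure can be checked straight from the table. A minimal sketch, assuming a student-level standard deviation of about 35 scale points for NAEP 4th grade reading; that value is an assumption chosen for illustration (it reproduces the 0.20 figure), not something stated in the post:

```python
# NAEP 4th grade reading scale scores for black students in Wisconsin (from the table above)
black = {1992: 198, 1994: 196, 1998: 187, 2003: 200, 2005: 194}

# Assumed student-level standard deviation of NAEP 4th grade reading scores.
# This is an illustrative assumption, not a figure taken from the post.
ASSUMED_SD = 35.0


def effect_size(before: float, after: float, sd: float = ASSUMED_SD) -> float:
    """Express a change in mean score in standard-deviation units."""
    return (after - before) / sd


gain = black[2005] - black[1998]            # raw change in scale points
es = effect_size(black[1998], black[2005])  # same change in s.d. units
still_below_1994 = black[2005] < black[1994]

print(f"gain: {gain} points, about {es:.2f} s.d.; below 1994 level: {still_below_1994}")
```

A different assumed standard deviation would move the effect size somewhat, but any plausible value leaves it far below the 0.77 s.d. claimed on the Wisconsin test.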

My initial allegation remains unrebutted by Schemo. NAEP 4th grade reading data for all students in the state of Wisconsin shows flat performance between 1998 and 2005. This NAEP data casts serious doubt on the miraculous state-wide gains reported on Wisconsin's own 3rd grade reading test.

Schemo seems to forget that a little thing called NCLB was passed during this period, and that many states responded by making their assessments easier so that compliance with NCLB would be easier. There's even a name for it: the race to the bottom.

Schemo's demographic shift argument doesn't fly either. According to NAEP, the change in scores for whites, blacks and Hispanics was not statistically significant, as I indicated above. The only realistic inference we can draw is that scores remained about the same for all groups, as did the overall performance level.
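For readers who want to see how much work the demographic shift could do on its own, here is a rough mix-adjustment sketch using only the enrollment shares and subgroup scores quoted in Schemo's email. It assumes the three listed groups can stand in for the whole state (they cover 96 percent of students in both years, so the shares are renormalized) and treats the published figures as exact. Both are simplifying assumptions, and none of this substitutes for NAEP's own significance tests.

```python
# Enrollment shares and NAEP 4th grade reading scores quoted in Schemo's email.
# Only these three groups are modeled; the remaining ~4% of students are dropped
# and the shares renormalized (a simplifying assumption).
shares = {
    1998: {"white": 0.82, "black": 0.10, "hispanic": 0.04},
    2005: {"white": 0.77, "black": 0.13, "hispanic": 0.06},
}
scores = {
    1998: {"white": 228, "black": 187, "hispanic": 201},
    2005: {"white": 227, "black": 194, "hispanic": 208},
}


def weighted_mean(share: dict, score: dict) -> float:
    """Enrollment-weighted average score, renormalized over the listed groups."""
    total = sum(share.values())
    return sum(share[g] * score[g] for g in share) / total


actual_1998 = weighted_mean(shares[1998], scores[1998])
actual_2005 = weighted_mean(shares[2005], scores[2005])
# 2005 subgroup scores combined with the 1998 enrollment mix
fixed_mix_2005 = weighted_mean(shares[1998], scores[2005])

score_effect = fixed_mix_2005 - actual_1998  # change due to subgroup scores
mix_effect = actual_2005 - fixed_mix_2005    # change due to the enrollment shift

print(f"1998: {actual_1998:.1f}  2005: {actual_2005:.1f}")
print(f"subgroup-score effect: {score_effect:+.1f}  enrollment-mix effect: {mix_effect:+.1f}")
```

The split holds the 1998 enrollment mix fixed to isolate changes in subgroup scores and attributes the rest of the change to the enrollment shift. Choosing 2005 as the base year instead would give a slightly different split, a standard limitation of this kind of back-of-the-envelope decomposition.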

B. To say that a city, with its higher concentration of poverty and minority students, performed on a rough par with the state average is not an indictment. Most cities would be quite proud to keep pace with state averages. Indeed, the reason you have a special NAEP urban assessment is because the profiles of cities are so different from states.

Madison's performance level in 2005 was close to the average performance of the rest of the state. But then again, this is almost exactly where it was back in 1998. It certainly didn't gain any ground. Perhaps Madison doesn't have the Dickens-like poverty levels that Schemo is trying to portray.

Then again, compared to the rest of the state, whites in Madison seem to slightly overperform whites state-wide and blacks and Hispanics seem to slightly underperform blacks and Hispanics state-wide. Maybe Madison has a bunch of affluent whites making up for the lagging poor blacks and Hispanics. It's no secret that balanced literacy reading programs favor the affluent who have the superior background knowledge needed to succeed in these programs.

But again, this argument by Schemo is also irrelevant because we are concerned with the performance of the Reading First eligible schools, not all of Madison's schools. And these schools, as I pointed out, significantly underperformed other schools in Madison and the rest of Wisconsin.

Let me try to explain it a different way. Something happened in Wisconsin between 1998 and 2005 such that 3rd grade students now need to score about 0.77 s.d. higher on Wisconsin's WRCT just to score exactly the same on the NAEP as 4th graders did back in 1998. The 1998 cohort scored 0.77 s.d. lower on the 3rd grade test and still managed to score the same on the NAEP as the 2005 cohort with its significantly better 3rd grade scores. Apparently, something magical is going on in Wisconsin: reading comprehension rises dramatically in 3rd grade only to wash away completely by 4th grade. As it turns out, the kids in Madison's eligible Reading First schools didn't get as much magic as the rest of the kids in Madison or Wisconsin, because their 3rd grade scores didn't rise by as much as those of kids in other schools.

So if the goal of Madison's fancy home-brewed reading program was to raise the scores of the kids in the eligible Reading First schools relative to the better-performing schools, we know that didn't happen: these kids now perform worse relative to the other kids on the 3rd grade test, and the 4th graders in the better-performing schools still perform exactly the same as they did in 1998.

No matter how you slice it, the performance of Madison's eligible Reading First schools did not improve relative to other schools in Wisconsin.

C. Also, Madison's efforts were part of a statewide drive to improve reading scores, so keeping pace with the rest of the state is, again, not an indictment of balanced literacy or any other single approach. To conclude that Madison's approach was unsuccessful, you'd have to compare districts across the state by their method of instruction, enrollment features and test score gains.

Another losing argument.

According to NAEP, this state-wide initiative failed miserably. Performance for all groups is not statistically different from what it was back in 1998. And based on the 3rd grade test, the relative performance of Madison's eligible Reading First schools declined. So, yes, it is clear: Madison's approach has been unsuccessful because it has failed to increase student performance.

And just so we're clear on exactly how miserable reading scores are in Wisconsin: 2/3 of black kids, 1/2 of Hispanic kids, and 1/4 of white kids performed below the basic level on the NAEP in 2005, just as they did back in 1998. These are the kinds of scores that have the public calling for reform. And here we have Schemo trying to spin these scores as some kind of success story.

D. Finally, the blogger uses statistical sleight of hand when he wants to discuss the achievement gap, switching the time frame back to 1992. But the ground zero we were counting from was 1998, when Wisconsin apparently reacted to its eroding performance and started a statewide drive to improve early reading. And if you look at those figures (cited above), the gap between African-Americans, Latinos and whites shrank.

Let's measure the black-white achievement gap from the table above:

Year   Black-White Gap (scale points)
1992   29
1994   31
1998   41
2003   25
2005   33

The mean gap from 1992 to 2005 is 31.8 points, and the 2005 gap is still above the mean. About the best we can say is that Wisconsin has almost gotten back to its 1994 level.
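The gap figures are simply the white score minus the black score in each year of the NAEP table. A short sketch reproduces them and the 31.8-point mean from the published scale scores:

```python
# NAEP 4th grade reading scale scores for Wisconsin (from the table above)
scores = {
    1992: {"black": 198, "hispanic": 209, "white": 227},
    1994: {"black": 196, "hispanic": 203, "white": 227},
    1998: {"black": 187, "hispanic": 201, "white": 228},
    2003: {"black": 200, "hispanic": 209, "white": 225},
    2005: {"black": 194, "hispanic": 208, "white": 227},
}

# Black-white gap in scale points for each year
gaps = {year: s["white"] - s["black"] for year, s in scores.items()}
mean_gap = sum(gaps.values()) / len(gaps)

for year, gap in gaps.items():
    print(year, gap)
print(f"mean gap: {mean_gap:.1f}; 2005 above mean: {gaps[2005] > mean_gap}")
```

Swapping "black" for "hispanic" in the dictionary comprehension gives the corresponding Hispanic-white gaps from the same table.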

But here's the important thing: reading scores during this period have remained flat. If the goal is to improve reading scores for all kids, and especially minorities, Wisconsin has failed at this task for 13 years.

My arguments remain unrebutted.

Schemo has committed journalistic malpractice. The article was supposed to be honest reporting of facts. It is one thing to report that Madison's reading program for its eligible Reading First schools has been successful based on Wisconsin's 3rd grade test. But it is malpractice not to also have reported that:
  1. Wisconsin's scores on the same test actually improved by a greater amount state-wide than did Madison's eligible Reading First schools, and
  2. Wisconsin's scores have remained flat based on NAEP.
Both of these facts cast serious doubt on the Madison school district's contention that their reading program has been successful. A serious reporter would have asked for better data or would have challenged the district to test a random sample of students on a generally accepted measure of reading ability. Schemo is not this reporter.

Update: For those of you with weak statistics skills, the fact that there is no NAEP data for Madison is not dispositive. We have state-level data for both assessments. The state-level data shows that the mean shifted 0.77 standard deviations on Wisconsin's test and didn't shift at all on the NAEP during the 1998-2005 period in question. Madison's shift in mean scores on the Wisconsin test parallels the state-wide shift. As such, Madison's expected performance gain on the NAEP should be close to zero, the same as the state-wide gain.
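The inference in this update can be written as a simple proportional projection. A hypothetical sketch: it assumes the statewide ratio of NAEP movement to WRCT movement also holds for Madison, and because the post says only that Madison's WRCT shift parallels the state's, it uses the state figure as a stand-in. The function name `expected_naep_shift` is illustrative, not from any source.

```python
# State-level shifts in mean scores, 1998-2005, in s.d. units (from the post)
STATE_WRCT_SHIFT = 0.77  # Wisconsin 3rd grade reading test (WRCT)
STATE_NAEP_SHIFT = 0.0   # NAEP 4th grade reading: no shift


def expected_naep_shift(district_wrct_shift: float) -> float:
    """Project a district's NAEP shift from its WRCT shift.

    Hypothetically assumes the statewide ratio of NAEP movement to WRCT
    movement also holds for the district.
    """
    naep_per_wrct_sd = STATE_NAEP_SHIFT / STATE_WRCT_SHIFT
    return naep_per_wrct_sd * district_wrct_shift


# Stand-in for Madison's WRCT shift: the post says only that it parallels the state's.
madison_wrct_shift = STATE_WRCT_SHIFT
print(f"projected Madison NAEP shift: {expected_naep_shift(madison_wrct_shift):.2f} s.d.")
```

Because the statewide NAEP shift is zero, the projection is zero for any WRCT shift Madison might have; the conclusion holds only as far as the linear, same-ratio assumption does.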

The other suspicious factor is the magnitude of the shift on the Wisconsin assessment. A 0.77 s.d. shift is considered a large effect size. Such large shifts are all but unheard of in education. We have found Lake Wobegon, kids: it is the entire state of Wisconsin.
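One way to put a 0.77 s.d. shift in perspective: if scores are roughly normally distributed (an assumption), raising the mean by 0.77 s.d. means the typical student now scores where a student at about the 78th percentile used to. A short sketch of that conversion, using only Python's standard library, with the 0.20 s.d. NAEP change for black students included for comparison:

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_after_shift(shift_sd: float) -> float:
    """Percentile (0-100) of the old distribution reached by a student who
    started at the median and moved up by shift_sd standard deviations,
    assuming normally distributed scores."""
    return 100.0 * normal_cdf(shift_sd)


for shift in (0.20, 0.77):
    print(f"{shift:.2f} s.d. -> about the {percentile_after_shift(shift):.0f}th percentile")
```

The normality assumption matters mostly in the tails; near the middle of the distribution, where these shifts land, it is a reasonable approximation for large-scale test scores.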

We'd expect some statistical noise due to differences between cohorts and the like, but we don't expect to see systematic error between tests like these. As I said at the outset, this back-of-the-envelope analysis raises serious enough doubts about the accuracy of Madison's claimed performance gains that the onus is on Madison to back up its contentions with data from an assessment that is known not to have shifted during this period, such as the CTBS, ITBS, SAT-9, and the like. Why isn't that data forthcoming?


TurbineGuy said...

I am rather proud of myself. I came pretty close to the same arguments myself in the follow up comment to the last post.

Dan Willingham said...

Thanks for a thoughtful analysis of these data. I was also frustrated by the author's treatment of the science behind phonics vs. whole language. She made it seem as though the question of effectiveness were really up for grabs, which played nicely into the article's theme: federal officials force locals on the ground to follow their inferior program (likely to line federal officials' pockets). If you admit that the science overwhelmingly supports the feds, the story doesn't have quite the same zing.

CrypticLife said...

I see the whole TAWL discussion arose incident to a planned debate on edspresso, and the TAWLers all seem eager to support their champion. I think the debate is DI vs. whole language (sort of a partial comparison, given that DI also has mathematical components. Of course, if they went to trying to defend constructivist math too. . . ).

I'm not sure how much we can help -- you have the research, at this point you have the data and are pretty good at wielding rhetoric and argument.

Personally, my psychological training (college/some grad school) was mostly under the tutelage of a fairly strict Skinnerian behaviorist. It heartened me that Wes Becker was so tied in to DI's founding. If you need any help on the citations for psychological data, I still have a lot of my old texts. I don't know how much it will help given that TAWLers seem to give no credence to data anyway. I'm looking forward to the debate.

Schemo has something of a history of supporting whole language. After doing a search on some articles last night, I seriously doubt she was an entirely innocent dupe.

Anonymous said...

Perhaps the University of Wisconsin, Madison's hometown university, should test these schemes on rats before imposing them on children from vulnerable groups.
Education experimentalists should (at least) have to obtain informed consent from parents of those children, just as would be required were those kids enrolled in trials of an experimental cognition-enhancing drug.

CrypticLife said...

Regrettably, they're not "experimenting" with the children in the schools. Though this is often thrown as a criticism, the truth is far worse. Experimentation would imply that they're keeping careful track of the data, identifying and controlling for independent variables, and using a control group. It would imply they were being thoughtful about their interpretation of data.

No, the truth of the matter is that they're flailing about, implementing measures without any thought or plan, without knowing which worked and which did not, and with preconceived interpretations guiding the measurement of data.

If they were experimenting, we could hope that it would gradually become better.