March 30, 2007

There must be a macro

I don't think a week goes by that I don't read some version of this story in today's Daily Mail:

Schools should teach children the key skills they need for life - like walking and thinking - not set subjects such as history or French, teachers' leaders have said.

The Association of Teachers and Lecturers called for the National Curriculum to be torn up and the testing system abolished.

The union said teachers in local schools should be able to adapt lessons to fit a new framework focusing on important skills for life, rather than academic subjects.

Notice how the two--no testing and teaching only thinking skills--go hand in hand.

It actually would be a pretty cool idea. Instead of teaching subject matter content, schools would only have to teach magical thinking skills. And, with those magical thinking skills, students would be able to derive the subject matter on their own. Imagine how much less teaching and school that would involve. Too bad it's impossible.

We don't learn thinking skills in isolation. People's knowledge is tied to specific domains. If you don't believe me, just go down and ask your local chess master to take out your appendix and you'll see what I mean. Or better yet, have him fix your toilet, argue your case in court, or build you a particle accelerator.

Of course, the nutters who always propose these wacky schemes know this, because they always tie the teaching of thinking skills to abolishing the tests that would demonstrate that students have acquired the thinking skills.

March 29, 2007

And then came Dick and Jane

Lileks has a good post quoting some factoids from “A Manual of Radio News Writing,” written in 1947.

The book describes the Average Listener, and it’s an interesting snapshot of American culture in 1947:

“His formal education stopped somewhere between the end of grammar school and the second year of high school.”

The average person – or, more accurately, the average radio news consumer – did not finish high school. Interesting.

“In general, he reads slowly, leisurely, and not too widely or deeply.”

But he reads.

“Newspaper reading is an ingrained habit.”

Wow. To repeat: the average radio news consumer is a high-school dropout who’s also a habitual newspaper reader.


What the hell happened to today's high school grads?

As I predicted ...

Dick Allington is out whoring the WWC's report on Reading Recovery. In this Edweek article, he tries to counter Jack Fletcher's argument that much of the "research" for Reading Recovery is based on the bogus Clay Observation Survey, a measurement instrument that was developed by Reading Recovery, is biased in favor of Reading Recovery, is prone to reporting bias and confirmation bias, and doesn't measure skills associated with reading natural text:

"I don’t think [the Clay Observation Scale is] any more of a concern than using DIBELS,” he said, referring to the Dynamic Indicators of Basic Early Literacy Skills, a test that is widely perceived to be the Bush administration’s favored measure for gauging students’ reading progress under Reading First. That test was devised by Reading First consultants and is being used to tout the federal initiative’s success.

Notice how Edweek carries Allington's water here by writing the lurid innuendo so Allington doesn't have to. Also notice how Allington never actually denies that the Clay Survey is an inaccurate measure of reading. A sort of non-denial denial.

Of course, DIBELS has more research demonstrating its validity as a testing measure than does the Clay Survey. In particular, the Oral Reading Fluency test being used as one measure in Reading First does correlate highly with real reading ability. To the extent it is biased, DIBELS favors readers who are good decoders. In contrast, the Clay Survey is biased in favor of readers who score well reading predictive text, in particular the predictive text that students have repeatedly reviewed in the Reading Recovery program, i.e., readers who can't necessarily read non-predictive text that hasn't been used in the Reading Recovery program.

Then Allington makes this nonsensical comment:

The question now is are we going to take all the interventions off the Reading First Web sites that don’t meet the What Works criteria. I don’t have a lot of confidence that anyone in Washington actually cares about the evidence.

First of all, Reading First only requires reading programs that are "based on" SBRR and is not limited to validated programs. Second, Reading Recovery still has the sticky problem of not providing systematic and explicit instruction in phonics, which is a real statutory requirement.

Then the Edweek article morphs into a poorly written Reading First hit piece which I won't waste time debunking. Then we get more water carrying:

Critics have also noted that most of the studies were conducted by researchers affiliated with Reading Recovery, which is not unusual among the studies the clearinghouse reviews.

Well, yeah, but the problem in Reading Recovery's case is that, with the exception of one independent study, all the positive research has been conducted by Reading Recovery researchers and, more importantly, that research contained serious methodological flaws. Namely, the Reading Recovery researchers systematically excluded students who failed to make progress in the program--about a third of the students, a significant portion. This isn't scientific research; it's junk science.

March 27, 2007

Curriculum-Development Group Urges Focus Shift to Whole Child

What else would you expect from a group calling itself the Association for Supervision and Curriculum Development’s Commission on the Whole Child?

The definition of a successful student has to change from one whose achievement is measured solely on the basis of test scores to one who is healthy, emotionally and physically inspired, engaged in the arts, and prepared for employment in a global economy, a report says.

Is there another valid way to measure success besides test scores? I'm sure it's something squishy and subjective. And useless.

[T]he report, released this month, says educational practice and policy today are concentrated overwhelmingly on testing gains.

The reason education practice and policy are "concentrated overwhelmingly on testing gains" is that when we started looking at such things, we noticed that schools weren't doing a very good job at their primary charge.

But academic achievement cannot happen without significant emphasis on other factors, including student engagement, personalized learning, and skilled and caring teachers, it adds.

And they know this how? I would think the best way to achieve academic achievement is to focus on academic achievement. One good way to do that is to get feedback from tests of academic achievement.

And, er, you're not going to be too successful with the student engagement thingy unless you get the academic achievement thing right in the first place.

“The current focus on accountability has shifted focus away from whole-child education,” said Judy Seltz, the deputy executive director of the Alexandria, Va.-based ASCD, which works to identify and share sound policy and best practices in education.

“We need to rethink what education of the whole child means and make sure every student has access to a rich and challenging curriculum that pays attention to other aspects,” she added, pointing out that research shows students who feel connected to their community tend to do better academically.


That's the good thing about our focus on accountability; it flushes out the chumps, like Judy Seltz, who are reduced to releasing reports that no one takes seriously.

March 26, 2007

Round up the anti-testing nutters

WaPo rounds up all the anti-testing nutters to give their opinions in Putting assessments to the test. Conspicuously absent is anyone defending the opposing viewpoint. Not exactly fair and balanced.

Then we get a related story on the DIBELS test in which we get more slanted views. In this case, we get the comically slanted criticism of Ken Goodman, the inventor of whole language:

It is an absurd set of silly little one-minute tests that never get close to measuring what reading is really about -- making sense of print

Not quite as silly as Goodman's whole language program, which encourages kids to guess at words instead of decoding them. Naturally, many kids taught via Goodman's method turn out to be poor decoders and poor readers who perform poorly on such tests. Goodman's kids don't make much sense out of print either. And, of course, DIBELS does contain a reading comprehension test and an oral fluency test that correlates highly with students' ability to make sense of print.

WaPo is really giving NYT a run for its money for the goofiest education articles as of late.

March 25, 2007

More NYT on Reading First

Diana Jean Schemo doesn't know how to read a statute. There's no other explanation for this paragraph in this NYT article:

Reading First has come under heavy fire in Congress and elsewhere. Previous audits of the program, and some local school officials, said the department had used the law to promote reading programs with a heavy reliance on phonics, which focuses on the mechanics of sounding out syllables, rather than methods emphasizing additional strategies for making sense of texts. The House and the Senate are planning hearings.

There's a very good reason why ED forced schools to adopt reading programs "with a heavy reliance on phonics, which focuses on the mechanics of sounding out syllables." That's what the law required. Follow along closely.

Section 1202(c)(7)(A) of the Reading First statute states:

an eligible local educational agency that receives a subgrant under this subsection shall use the funds provided under the subgrant to carry out the following activities:

...

(ii) Selecting and implementing a learning system or program of reading instruction based on scientifically based reading research that— (I) includes the essential components of reading instruction

The "essential components of reading instruction" is defined in section 1208(3) as:

The term ‘essential components of reading instruction’ means explicit and systematic instruction in—
(A) phonemic awareness;
(B) phonics;
(C) vocabulary development;
(D) reading fluency, including oral reading skills; and
(E) reading comprehension strategies.

Note the word "phonics" in item (B) so you don't miss it.

Let me translate the legalese into English for you. In order to get a Reading First grant, a school must select and implement a reading program that includes explicit and systematic instruction in phonics, among other things. Notice how the statute says "phonics" and not "methods emphasizing additional strategies for making sense of texts," as Schemo suggests.

It was ED's job to make sure that schools followed this law. But, according to Schemo, when ED enforced the law it "used the law to promote reading programs with a heavy reliance on phonics."

Using Schemo's twisted logic, ED would have lost no matter what it did. Had ED allowed reading programs "emphasizing additional strategies for making sense of texts," it would have clearly violated the law. And, because ED only permitted phonics based programs, as it did, it stands accused of illegally promoting phonics based reading programs.

Pathetic.

March 23, 2007

I've Updated the Blogroll

I've added a bunch of edu-blogs to the blogroll on the side.

Go check them out.

They've all withstood the test of time, i.e., I am tired of googling them every time I want to read them.

I stubbornly refuse to adopt modern technology and use a real feed reader.

March 22, 2007

All smoke, no fire

Title I Monitor has a good article on the Reading First scandal that you should read.

All you need to know about this so-called scandal can be summed up by looking at the contortions OIG went through to establish a "conflict of interest."

The legal issue, however, is complicated. The RMC Research Corporation of Portsmouth, New Hampshire operated three contracts — totaling nearly $40 million — to provide technical assistance to states and districts on Reading First. Its contract with ED contained boilerplate federal conflict-of-interest language designed to prevent “the existence of conflicting roles that might bias a contractor’s judgment” and stave off an “unfair competitive advantage.”

Got that? RMC's contract contained standard conflict-of-interest (COI) boilerplate.

But when RMC later subcontracted the actual operations to three regional centers — at the University of Texas, the University of Oregon, and Florida State University — the contracts did not contain the conflict-of-interest clause. The clause also was absent in consulting agreements between RMC and its technical assistance providers. As a result, the OIG said, “they may not have disclosed any actual or potential” conflicts of interest.

RMC's failing to put the COI boilerplate in its subcontracts is a technical infraction at best. The question OIG was supposed to be investigating was whether the technical assistance providers had actual conflicts. Apparently, the answer to that question was "no they didn't" or else we would have gotten a detailed accounting of all the actual conflicts from OIG. Instead, we got the statement that the technical assistance providers "may not have disclosed any actual or potential" conflicts of interest. That's something a senior auditor tells a junior auditor before telling the junior auditor to go out and find conflicts. Go get 'em, son.

Apparently, the junior auditor failed to find anything useful using this standard, so the OIG switched gears to salvage something.

The conflict of interest standard is much more clear-cut, and at the same time, more limited, than the OIG’s suggested standard of “bias or impaired objectivity.”

What's up with that? Since when do auditors get to suggest their own standards instead of following the standard set down by law? Love 'em or hate 'em, we elect politicians to make laws and standards like this, not unelected OIG auditors, who are supposed to be looking for real live legal violations instead of fabricating their own based on made-up standards. Title I Monitor explains the problem with OIG's made-up standard:

A conflict-of-interest standard would, at the very least, suggest that someone providing technical assistance for Reading First not have a connection to reading programs for students in kindergarten through the third grade, the program’s constituency. But a technical assistance provider who has designed a McGraw-Hill math product, to use a hypothetical example, while perhaps not having a direct conflict of interest in recommending against a Harcourt reading program, might have “an appearance of bias or impaired objectivity” in connection to any McGraw-Hill product.

See? Under the real COI standard, OIG couldn't find any real or potential conflicts. So they made up their own COI standard to fit the "violation" to the facts. This is how the OIG found violations: by finding a technical assistance provider who had written a college-level textbook for a publisher that also publishes textbooks eligible for Reading First funds. And, even under this contrived standard, OIG was only able to find “an appearance of bias or impaired objectivity” as opposed to actual bias or actual impaired objectivity. Even OIG admits this:

The OIG acknowledged there “is no federal requirement that contractors, subcontractors or consultants be vetted for bias or impaired objectivity” but said that not having one damaged the “integrity and reputation” of RMC and the department.

So what we're left with is no actual legal violations, but OIG's opinion that ED should have gone beyond the requirements of the law and adopted OIG's made-up COI standard, because not doing so damaged the “integrity and reputation” of ED.

I suppose there exists a forum somewhere for OIG to give its opinions, but one place I'm certain those opinions do not belong is in an audit. There is no room for judgments or opinions in an audit; only clear violations of law, of which in this case there were none, are supposed to go into OIG audits.

Are all OIGs as incompetent as ED's?

March 20, 2007

Reading Recovery gets the WWC Treatment

The What Works Clearinghouse issued a report yesterday on the uber-expensive whole-language tutoring program Reading Recovery.

Let me quickly summarize the good news for Reading Recovery: if you allow the developers of Reading Recovery to research their own program enough times, allow them to collect their own data, and allow them to use their own non-standard measures to gauge efficacy, then you might be able to show positive results, provided you're willing to live with serious methodological flaws in the research.

But, the bad news is pretty devastating: The only study conducted by independent researchers found that if you add an explicit systematic phonics component to Reading Recovery, you get results that are 37% better.

As usual, the WWC report shows that the state of education research is execrable. The WWC reviewed 78 studies. Only four met the WWC's evidence standards, and one more met them with reservations. That means 73 didn't meet the WWC's standards at all--a 6.4% success rate (5 of 78). This doesn't necessarily reflect badly on Reading Recovery, but a fair amount of that bad research was conducted by Reading Recovery affiliated researchers.

Many of the positive findings were the result of using non-standard assessments, such as the Reading Recovery-created Observation Survey of Early Literacy Achievement, which are biased in favor of Reading Recovery and use "predictable text, rather than text that uses authentic, natural language patterns. Children who have learned the prediction strategies of Reading Recovery will score better reading predictable text than they will reading authentic text."

Stanovich and Stanovich (1995) report that many studies have found that authentic text is not very predictable:

It is often incorrectly assumed that predicting upcoming words in sentences is a relatively easy and highly accurate activity. Actually, many different empirical studies have indicated that naturalistic text is not that predictable. Alford (1980) found that for a set of moderately long expository passages of text, subjects needed an average of more than four guesses to correctly anticipate upcoming words in the passage (the method of scoring actually makes this a considerable underestimate). Across a variety of subject populations and texts, a reader's probability of predicting the next word in a passage is usually between .20 and .35 (Aborn, Rubenstein, & Sterling, 1959; Gough, 1983; Miller & Coleman, 1967; Perfetti, Goldman, & Hogaboam, 1979; Rubenstein & Aborn, 1958). Indeed, as Gough (1983) has shown, the figure is highest for function words, and is often quite low for the very words in the passage that carry the most information content." (p. 90)

If authentic text is not very predictable, then children who read well with predictable text may not necessarily read well with authentic text. The strategies they have learned may not generalize to real reading. So many of the positive findings for Reading Recovery do not pertain to what is considered real reading.


Then we have the inconvenient problem that three of the studies meeting WWC standards were conducted by researchers affiliated with Reading Recovery. Note the researcher names in the following studies:

  • Pinnell, G. S., Lyons, C. A., DeFord, D. E., Bryk, A. S., & Seltzer, M. (1994). Comparing instructional models for the literacy education of high-risk first graders. Reading Research Quarterly, 29(1), 9-38.
  • Pinnell, G. S., Lyons, C. A., & DeFord, D. E. (1988). Reading Recovery: Early intervention for at-risk first graders. Arlington, VA: Educational Research Service.
  • Schwartz, R. M. (2005). Literacy learning of at-risk first-grade students in the Reading Recovery early intervention. Journal of Educational Psychology, 97(2), 257–267.

Gay Su Pinnell, Diane DeFord, and Carol A. Lyons are directors of the National Reading Recovery Center at Ohio State in the U.S. Bob Schwartz is a Reading Recovery trainer at Oakland University.

Having your own researchers conduct experiments isn't necessarily fatal, but it is cause for concern due to apparent bias, and it is grounds for scrutiny. That scrutiny was forthcoming, and the research was found wanting.

Shanahan and Barr reviewed the pre-1995 Reading Recovery studies and noted that all the studies contained serious methodological problems: "We found no studies of Reading Recovery that did not suffer from serious methodological or reporting flaws--published or not." (1995, p. 961) Shanahan and Barr identified three types of problems in the Reading Recovery pre-post design, which would lead to exaggerated success rates:

[The reported learning gain] most certainly is an overestimate of typical amounts of learning from Reading Recovery for several reasons: (a) test score improvements not linked to learning are likely to occur when students with extreme scores are selected for participation; (b) normal development and learning gains typical of young children can be due to other sources of growth and education; and (c) there is systematic omission of children who are not having success in Reading Recovery. (p. 969)

Worse still is the systematic omission of data in the Reading Recovery affiliated research, because among those omitted are children the Reading Recovery teachers identify as not progressing well. Children who are not successful are intentionally dropped before completing the entire program. The reports then do not reflect how well Reading Recovery serves the entire population it claims to serve, nor do they provide information regarding overall class effects or school effects. Consequently, the success rates cannot be used to evaluate the effectiveness of Reading Recovery.

Probably the most serious flaw in Reading Recovery research has to do with who is included in the experimental sample. In some analyses, only discontinued students were examined, making the program appear more effective than it really is. In most of the studies, students were omitted from analysis because of serious learning problems, poor school attendance, or other similar difficulties. These omissions were often made without mention. It is impossible to provide a valid estimate of the effects of Reading Recovery unless all children who start the program are included in the eventual analysis….Unfortunately, even two of the more sophisticated studies (Center, Wheldall, Freeman, Outhred, & McNaught, 1995; Pinnell, Lyons, DeFord, Bryk, & Seltzer, 1994) that we analysed have lost as much as half of their data, without any empirical estimate or control of the effects of these missing data. (p. 991-2)

The Ohio State programs have routinely collected information on those who are dropped for various reasons, but these data have not been taken account of in their studies or technical reports, nor have they been available to the public. Depending on the proportion of participants omitted in this fashion, this creates a substantial bias in favor of Reading Recovery gains, and there is no sound way to adjust the scores that are reported on this basis. (Shanahan & Barr, 1995, p. 966)


See READING RECOVERY: AN EVALUATION OF BENEFITS AND COSTS for a much more thorough discussion of these methodological flaws in the Reading Recovery research conducted by Reading Recovery affiliated researchers.

So what are we left with? Three of the five studies were conducted by Reading Recovery affiliated researchers, and at least two of those studies contained serious methodological flaws. One of the independent studies showed no effects for the Reading Recovery intervention. And the final study (Iversen, S., & Tunmer, W. (1993). Phonological processing skills and the Reading Recovery program. Journal of Educational Psychology, 85(1), 112-126) showed that Reading Recovery was 37% more successful when a systematic, explicit phonics component was added. And let's not forget that most of the measurement devices were non-standard instruments created by the Reading Recovery people that don't generalize to the reading of non-predictable text.

Doesn't seem like there's much left at all. Keep that in mind when you read the inevitable whoring of this report by tricky Dick Allington, the Dick Van Patten of bad education research.

March 19, 2007

More on Wisconsin's Reading First Scandal

The Prof at Right Wing Nation has my back on Wisconsin's Reading First Scandal.

From these data, we cannot say that the Reading First districts did or did not increase their proficiencies. There just aren't enough data. But — and this is crucial — neither can the Reading First districts claim that they raised their proficiencies using these data. The only way we can determine whether these school systems did or did not raise their proficiencies is by analyzing the raw data, and not the aggregates by school district. In other words, Ken was right, and the journalist was wrong.

I like that last sentence so much I'm going to repeat it.

In other words, Ken was right, and the journalist was wrong.

I did a very quick initial analysis by assuming both distributions were essentially normal so I wouldn't have to fix the improperly formatted data set. RWP, an expert in data analysis, did the hard work of reformatting the data set and did a complete analysis. I re-analyzed the data and came up with the same numbers as RWP. Time permitting, I'll update my figures with the re-analyzed numbers. The effect sizes will change slightly, but my conclusions will still hold.
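
For the statistically inclined, here's a minimal sketch of that quick analysis (my own reconstruction in Python, not RWP's actual code). It treats "percent proficient" as the area above a fixed cut score on a normal curve, backs out the implied cut in z-units for each year, and reads the change as a shift of the mean in standard deviation units:

    # My reconstruction, not RWP's actual code. Assumes normal score
    # distributions, as described above.
    from scipy.stats import norm

    def implied_shift(pct_before, pct_after):
        """Mean shift (in s.d.) implied by a change in percent proficient."""
        return norm.ppf(pct_after) - norm.ppf(pct_before)

    # Statewide WRCT figures (64.9% proficient in 1998, 87.4% in 2005):
    print(round(implied_shift(0.649, 0.874), 2))
    # ~ +0.76 s.d., in line with the +0.77 figure used in these posts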

The important thing to remember is that RWP's analysis includes the following assumption.

We are assuming here either that the proficiency exam standards did not change between the two years or that the proficiency reports for the two years are comparable (if they are not, then Wisconsin cannot make any statement about their proficiency levels over time).

As the NAEP data clearly shows, Wisconsin's proficiency exam standards did change between 1998 and 2005. NAEP scores declined slightly, while the Wisconsin scores magically skyrocketed. Suspiciously so.

So, when RWP says that Wisconsin and Madison can validly claim that scores have increased, this conclusion only holds if Wisconsin's proficiency exam standards didn't change between 1998 and 2005. And, as we know from NAEP scores, they did.

RWP's analysis reveals an important factoid.

So far, everything looks positive, until we look at the skewness. Go back to that bell curve water balloon. If you grab the tail on the right and pull it further to the right, more of the water will spill into that tail, right? That's what we call a right-skewed curve, and it has a positive skewness factor. If you pull the left tail out, more water spills into that left tail, and we have a left-skewed curve, with a negative skewness factor. When we look at the skewness for the two years, both are left-skewed — that is, in both years, there are more data in the left tails (less than proficient) — but the 04-05 curve is more left-skewed than 98-99 (-1.86 and -0.92, respectively). So even though it does look like Wisconsin may have improved the reading proficiency between the two years, they also slightly increased those who were less than proficient (if this seems like a paradox to you, think of the water balloon again, and all will be made clear).

Even though Wisconsin's scores magically skyrocketed, the number of students who were less than proficient increased slightly. Kids at the top learn more, but kids at the bottom continue to struggle. These would be the kids that Reading First aimed to serve.
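
If the water balloon analogy isn't doing it for you, here's a toy demonstration of skewness using synthetic data (emphatically not the actual Wisconsin scores):

    # Toy demonstration of skewness -- synthetic data, not Wisconsin's.
    # A left-skewed sample has a negative skewness factor because more
    # of its mass sits in the left (low-score) tail.
    import numpy as np
    from scipy.stats import skew

    rng = np.random.default_rng(0)
    symmetric = rng.normal(loc=80, scale=10, size=10_000)
    left_skewed = 100 - rng.gamma(shape=2.0, scale=10.0, size=10_000)

    print(round(skew(symmetric), 2))    # ~ 0.0
    print(round(skew(left_skewed), 2))  # ~ -1.4: a fatter low-score tail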

I think we can finally put this one to bed, unless Madison officials decide to make more outlandish claims.

March 15, 2007

medieval barbers

Teacher TMAO turns in another great post on teacher autonomy:

As a teacher I should get to teach whatever I want because I know best and no one is allowed to determine if what I taught or how I taught it was important, valid, or successful because I earned tenure in the name of academic freedom.

That's the argument our profession makes. It's weak sauce.
Check it out.

The Big Guns Pound Schemo

The story so far ...

Diana Jean Schemo writes a highly biased, misinformed, and irresponsible article insinuating that the Madison (Wisconsin) Metropolitan School District was bullied into dropping its "balanced literacy" program and adopting a phonics-based reading program, that the district refused, and that it has allegedly experienced great success with its own program.

Mark Liberman, of Language Log, discussed the article here and included a letter from Mark Seidenberg, professor of psychology and cognitive neuroscience at the University of Wisconsin-Madison, that was also critical of the Schemo article.

I pointed out some of the flaws in this argument here. Basically, Wisconsin made its third grade reading test easier during the period in question, and this, rather than any real improvement in reading ability, accounts for all of Madison's gains.

Schemo responded to my criticisms in an email and I deconstructed those arguments here.

Now the big guns in early reading instruction are responding to Schemo's article.

  • Reid Lyon, former Chief of the Child Development and Behavior Branch within the National Institute of Child Health and Human Development (NICHD) at the National Institutes of Health (NIH),
  • Robert Sweet, former Professional Staff Member, Committee on Education and the Workforce, U.S. House of Representatives,
  • Louisa Moats, Ed.D., formerly co-investigator of the NICHD Early Interventions Project,
  • Linnea Ehri, Ph.D., Distinguished Professor, Program in Educational Psychology, CUNY Graduate Center, and Joanna Williams, Ph.D., Professor, Program in Cognitive Studies in Education, Teachers College, Columbia University, and
  • Timothy Shanahan, Professor of Urban Education, University of Illinois at Chicago, and President, International Reading Association

A common theme emerges:

The at-risk students that Reading First aims to serve are underperforming compared to more advantaged students by Wisconsin's own measures

"However, the poor, disadvantaged children for whom RF was specifically targeted lost out. The State of Wisconsin’s own statistics tell the story. In 2005, forty five percent of African Americans in Madison schools are in the lowest two categories of reading ability. In another State assessment corroborating this fact forty six percent of third grade African American students scored below grade level compared to nine percent of white students."

Richard Allington, Schemo's "expert," is a buffoon

"Richard Allington, who was quoted in opposition to Reading First, has no credentials as a researcher or scientist. He and the "reading community" to which he refers have perpetuated myths and ineffective practices associated with Whole Language for decades – and look at what those have brought us."

There is no scientific basis for Madison's reading program


"Like the issue of global warming, there is no scientific debate about whether children benefit from direct instruction in how the alphabetic code of English represents speech. There is, in contrast, plenty of evidence that teaching children to guess at words through context and pictures is, indeed, malpractice, and that most poor readers fall by the wayside early because no one is teaching them how to read."

The National Reading Panel's meta-study has not been rejected by the reading community, only by the nutters

"The Panel's report became the basis of Reading First, and has been cited hundreds of times in scholarly research journals, endorsed by the International Reading Association, and identified as one of the most influential educational reports by the Editorial Projects in Education Research Center."

March 13, 2007

Schemo Responds

Courtesy of gadfly Rory, Diana Jean Schemo responds (via email) to my criticism of her reporting on the Madison, Wisconsin, school district's turning down $2 million in Reading First grant money.

A. Wisconsin's average NAEP score remained the same, but that fact used alone is misleading.

Enrollment changes during that period were severe: from 1998 to 2005, Wisconsin's low-income population rose from 1 in 4 students to 1 in 3 students, and blacks were 13 percent of all students, up from 10 percent in 1998. Latinos rose from 4 to 6 percent. White student enrollment by 2005 had declined to 77 percent, from 82 percent in 1998. According to the state Department of Public Instruction, Wisconsin has the fastest growing poverty rate of any state in the nation.

If you look at the scores for black 4th graders, they rose to 194 on the 2005 NAEP, up from 187 in 1998. Scores for Latinos rose to 208 from 201 during the same period.

Though White scores remained steady at 227-228, the shrinking pool of whites meant that the increasing numbers of black and Latino students brought down the overall average, even though these two groups made significant gains.

To put the case in its most extreme terms, a severe enough exodus of white students would inevitably mean that NAEP scores would nose dive as the population shifts, until Wisconsin had succeeded in completely erasing any achievement gap, and black and Latino scores were on par with white scores--something no state, anywhere, has done, using phonics, whole language, steroids, tea leaves . . . you name it.

I can sum up this argument in one word: irrelevant.

This argument is irrelevant to the question of whether performance in Madison, and in particular the Reading First eligible schools, actually increased.

Even if black and Hispanic performance rose with respect to white performance it occurred on a state-wide basis. This tells us nothing about black and Hispanic performance in Madison. As far as we know, performance in Madison could have plummeted during this time period and black and Hispanic performance outside of Madison could have skyrocketed. The data is silent.

Now, let's take a closer look at this stunning increase in black and Hispanic performance in Wisconsin that's got Schemo giddy as a schoolgirl.

NAEP 4th Grade Reading Test -- Scale Scores

Year Black Hispanic White
1992 198 209 227
1994 196 203 227
1998 187 201 228
2003 200 209 225
2005 194 208 227


These are not the miraculous gains Schemo would have us believe. A more accurate characterization is partial recovery of ground lost since 1994. In any event, according to NAEP's statistical significance calculator, the difference in black performance between 1998 and 2005 was not statistically significant. Oops. Moreover, the "gain" in black performance from the 1998 nadir to 2005 was only about 0.20 s.d., which is far less than the alleged gain of 0.77 s.d. on the Wisconsin test. Even with that small "gain," the performance level for black students still remains lower than it was in 1994.
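
The 0.20 s.d. figure is simple back-of-the-envelope arithmetic, assuming the NAEP 4th grade reading scale has a standard deviation of roughly 35 points (my assumption; the exact value varies a bit by year):

    # Back-of-the-envelope effect size for the black NAEP "gain."
    naep_sd = 35        # assumed s.d. of the NAEP 4th grade reading scale
    gain = 194 - 187    # 2005 score minus the 1998 nadir
    print(round(gain / naep_sd, 2))  # ~ 0.20 s.d.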

My initial allegation remains unrebutted by Schemo. NAEP 4th grade reading data for all students in the state of Wisconsin shows flat performance between 1998 and 2005. This NAEP data casts serious doubt on the miraculous state-wide student gains made in Wisconsin's own 3rd grade reading test.

Schemo seems to forget that a little thing called NCLB was passed during this period, and that many states responded by making their assessments easier in order to make it easier to comply with NCLB. There's even a name for it: the race to the bottom.

Schemo's demographic shift argument doesn't fly either. According to NAEP, the change in scores for whites, blacks and Hispanics was not statistically significant, as I indicated above. The only realistic inference we can draw is that scores remained about the same for all groups, as did the overall performance level.

B. To say that a city, with its higher concentration of poverty and minority students, performed on a rough par with the state average is not an indictment. Most cities would be quite proud to keep pace with state averages. Indeed, the reason you have a special NAEP urban assessment is because the profiles of cities are so different from states.

Madison's performance level in 2005 was close to the average performance of the rest of the state. But then again, this is almost exactly where it was back in 1998. It certainly didn't gain any ground. Perhaps Madison doesn't have the Dickens-like poverty levels that Schemo is trying to portray.

Then again, compared to the rest of the state, whites in Madison seem to slightly overperform whites state-wide and blacks and Hispanics seem to slightly underperform blacks and Hispanics state-wide. Maybe Madison has a bunch of affluent whites making up for the lagging poor blacks and Hispanics. It's no secret that balanced literacy reading programs favor the affluent who have the superior background knowledge needed to succeed in these programs.

But again, this argument by Schemo is also irrelevant because we are concerned with the performance of the Reading First eligible schools, not all of Madison's schools. And these schools, as I pointed out, significantly underperformed other schools in Madison and the rest of Wisconsin.

Let me try to explain it a different way. Something happened in the state of Wisconsin between 1998 and 2005 such that 3rd grade students needed to perform about 0.77 s.d. better on Wisconsin's WRCT in order to score exactly the same on the NAEP as 4th graders did back in 1998. The 1998 cohort scored 0.77 s.d. lower on the 3rd grade test and still managed to score the same on the NAEP as the 2005 cohort with its significantly better 3rd grade scores. Apparently, something magical is going on in Wisconsin such that reading comprehension rose dramatically in 3rd grade only to completely wash away by 4th grade. As it turns out, the kids in Madison's eligible Reading First schools didn't get as much magic as the rest of the kids in Madison or Wisconsin, because their third grade scores didn't magically rise by as much.

So if the object of Madison's fancy home-brewed reading program was to raise the scores of the kids in the eligible Reading First schools relative to the better performing schools, we know that didn't happen: these kids now perform worse relative to the other kids on the third grade tests, and the 4th graders in the better performing schools still perform exactly the same as they did in 1998.

No matter how you slice it, the performance of Madison's eligible Reading First schools did not improve relative to other schools in Wisconsin.

C. Also, Madison's efforts were part of a statewide drive to improve reading scores, so keeping pace with the rest of the state is, again, not an indictment of balanced literacy or any other single approach. To conclude that Madison's approach was unsuccessful, you'd have to compare districts across the state by their method of instruction, enrollment features and test score gains.

Another losing argument.

According to NAEP, this state-wide initiative failed miserably. Performance for all groups is not statistically different from what it was back in 1998. And based on the 3rd grade test, the relative performance of Madison's eligible Reading First schools declined. So, yes, it is clear: Madison's approach has been unsuccessful because it has failed to increase student performance.

And just so we're clear exactly how miserable reading scores are in Wisconsin: 2/3 of black kids, 1/2 of Hispanic kids, and 1/4 of white kids performed below the basic level on the NAEP in 2005, just like they did back in 1998. These are the kind of scores that have the public calling for reform. And here we have Schemo trying to spin these scores as some kind of success story.

D. Finally, the blogger uses statistical sleight of hand when he wants to discuss the achievement gap, switching the time frame back to 1992. But the ground zero we were counting from was 1998, when Wisconsin apparently reacted to its eroding performance and started a statewide drive to improve early reading. And if you look at those figures (cited above), the gap between African-Americans, Latinos and whites shrank.

Let's measure the achievement gap from the table above (white score minus black score, in NAEP scale points):

Year Achievement Gap
1992 29
1994 31
1998 41
2003 25
2005 33


The mean achievement gap from 1992-2005 is 31.8 points, and the 2005 level is still above the mean. About the best we can say is that Wisconsin has almost gotten back to 1994 levels.
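
For the record, here's the gap column recomputed from the scale-score table above:

    # Gap column recomputed from the scale-score table above:
    # white score minus black score, in NAEP points.
    scores = {1992: (198, 227), 1994: (196, 227), 1998: (187, 228),
              2003: (200, 225), 2005: (194, 227)}  # year: (black, white)
    gaps = {year: w - b for year, (b, w) in scores.items()}
    print(gaps)                            # {1992: 29, 1994: 31, 1998: 41, ...}
    print(sum(gaps.values()) / len(gaps))  # 31.8-point mean for 1992-2005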

But here's the important thing: reading scores during this period have remained flat. If the object is to improve reading scores for all kids, and especially minorities, Wisconsin has failed at this task for 13 years.

My arguments remain unrebutted.

Schemo has committed journalistic malpractice. The article was supposed to be honest reporting of facts. It is one thing to report that Madison's reading program for its eligible Reading First schools has been successful based on Wisconsin's 3rd grade test. But it is malpractice not to also report that:
  1. Wisconsin's scores on the same test actually improved by a greater amount state-wide than did Madison's eligible Reading First schools, and
  2. Wisconsin's scores have remained flat based on NAEP.
Both of these facts cast serious doubt on the Madison school district's contention that its reading program has been successful. A serious reporter would have asked for better data or would have challenged the district to test a random sample of students on a generally accepted measure of reading ability. Schemo is not this reporter.

Update: For those of you with weak statistics skills, the fact that there is no NAEP data for Madison is not dispositive. We have state level data for both assessments. The state level data shows that the mean shifted 0.77 standard deviations on Wisconsin's test and didn't shift at all on the NAEP for the 1998-2005 period in question. Madison's shift in mean scores on the Wisconsin test parallels the state-wide shift. As such, Madison's expected performance gain on the NAEP will be close to zero, the same as the state-wide performance gain.

The other suspicious factor is the magnitude of the shift in the Wisconsin assessment. A 0.77 s.d. shift is considered a large effect size. Such large shifts are all but unheard of in education. We have found Lake Wobegon, kids; it is the entire state of Wisconsin.

We'd expect some statistical noise due to differences in cohorts and the like, but we don't expect to see systematic error between tests like these. Like I said at the outset, this back of the envelope analysis raises serious doubts as to the accuracy of Madison's claims of performance gains, sufficient that the onus is on Madison to back up its contentions with data from an assessment that is known not to have shifted during this period, such as the CTBS, ITBS, SAT-9, and the like. Why isn't that data forthcoming?

March 12, 2007

When TAWLers Attack

Looks like my recent post mocking the balanced literacy shenanigans being perpetrated in Madison has attracted the attention and ire of the Teachers Applying Whole Language listserv.

Go read my initial post first to get some context, and I'll try to deal with the TAWLers' arguments, such as they are, as they come up.

I don't understand how people cannot see that WL is offering MORE strategies, not fewer.

More strategies: yes. More productive strategies: no. Not only have they inserted a bunch of unproductive strategies to confuse naive readers, they've demoted the most productive strategy, letter-sound correspondence, to the strategy of last resort.

It isn't like taking MOST OF THE LETTERS OUT of the words, it is not requiring every single letter to be agonized over every single time.

But, unfortunately, that is how skilled readers process words: they attend to every letter, just at a very rapid pace. That's what eye-movement research has shown us. I would think any teacher of reading would be familiar with this research.

I watched Reading Mastery in Roseville one time in CA, and they were making the kids sound out SAID! Kids who could read better already than that lesson was calling for were penalized for being able to say the words automatically because they weren't slowing down and sounding out the parts. They wanted to get to the meaning, and were told to stick with the single letters.

"Said" is one of the first irregular words taught in the Reading Mastery sequence. We're talking like the second month of kindergarten when kids are just learning how to read. First the word is taught in isolation and then read in connected text. It is a difficult word for children because it is one of the first words they are taught that breaks the phonetic rules.

The kids are taught to sound it out /s/ /a/ /i/ /d/ phonetically and then taught that it is pronounced /s/ /e/ /d/. In this way, the child has a mental hook that the letter combination s-a-i-d equals /s/ /e/ /d/. They wanted to get to the meaning, and were told to stick with the single letters.

A child who can read the passages with no errors is most likely a child who is misplaced in the sequence. This would be a teacher placement problem, not a curriculum problem.

And, despite the effort to make it impossible to read that Call of the Wild passage, it wasn't so impossible, either.

It wasn't meant to be impossible; it was meant to be comprehensible at the meaning level but at the frustration level (80%) for word identification. It also demonstrates how difficult it is to guess at the omitted words even when they are read in context and with some phonetic clues and word structure clues provided. It demonstrates that skilled readers aren't able to read a passage with fluency once the phonetic markers are removed. Skilled readers do not rely on context clues to identify unknown words; they do rely on context clues to ascertain meaning, as this TAWLer demonstrated. But let's not kid ourselves, that's not reading. And any kid whose decoding skills are so bad that he can only identify 80% of the words is a kid who will have no love for reading, even if he can get the gist of the passage through context clues. This is the main fallacy underlying whole language pedagogy.

The example in the Times article, showed the boy "guessing" "pumpkin" and being told to "look at the word." How is that telling him to randomly guess? One clue was how long the word was, but if that did not cause him to actually look more closely at the letters in that word, another clue was surely to follow.

So, here are a few guesses that might have worked just as well as "pea": pop, pup, puppy, petunia, and the like.

The boy is guessing because he clearly hasn't been taught that the p stands for /p/ and ea stands for /ee/ or /aa/. All the kid needed to know to read the word was those two phonics rules. Some kids will figure this out. But others won't unless it is explicitly shown to them and ample practice is provided.

It is impossible to get what a real supportive reading session would be given one small example like that. The author of the article in the other publication made unfounded assumptions that the boy did not look at the word but at the picture. Well, if the picture was being used to give him the words, he wouldn't have said "pumpkin" then, would he.

Or maybe there was a picture of a pumpkin and a pea and the kid just happened to pick the wrong one. That wouldn't happen if he'd been taught to read correctly, now would it? Guessing at pictures isn't reading. And guessing at pictures or predictable text does not translate into reading regular text without pictures.

They didn't point out that he was using background knowledge of what would fit in a sentence that must have been about food, vegetables, or farming...and that he didn't just throw out random words beginning with "p".

Another problem is that many at-risk kids lack the background knowledge that they'll be asked to call on in these whole language guessing games. Again, this argument clearly shows that this TAWLer doesn't know the difference between using background knowledge for word identification and using it to ascertain word meaning. Most readers have all the vocabulary and background knowledge needed to fluently read the Call of the Wild passage, but once the letter identification clues are removed, skilled readers are unable to activate that vocabulary and background knowledge to identify the missing words from the context provided.


This makes me so mad. How can they gloss over the fact that the district is successful by the very measures that are being used to push other programs that they don't want to use?

Except that it's not successful, as I showed here. Madison's Reading First eligible schools are underperforming under the balanced literacy program.

As for the Call of the Wild passage... If a student had that many gaps in their reading, then I would say it is too difficult and they need to read at a lower level. They will spend their whole time trying to figure out the words and make no sense of the story.

Welcome to what reading is like for a kid who has weak decoding skills--skills they won't be taught explicitly or in a systematic manner in balanced literacy.

And the Jack London example is beyond nonsense, because as someone ... has already said, a text this full of holes is far above the reading level of reader. Guess book choice and teacher guidance count for nothing!

It's an easy test: if the child struggles to identify at least 2 out of every 10 words read, then the book is at the same level as the modified Jack London passage I provided. Most books my first grade son comes home with in his balanced literacy class fall into this frustration level category because the books are not carefully vetted for decodability. These teachers really have no clue what they are asking of their students. Apparently, when it comes to whole language instruction, teacher guidance doesn't count for anything.

This guy makes it up as he goes along. Nothing is referenced to any citation. Obviously, he is not planning to debate ... using any recognized, peer reviewed research. He is only out to smash whole language. Makes me wonder why he is so passionate about this when you consider that there are probably fewer than 5% of teachers in the world who even know what whole language is, let along claim to be whole language proponents!

This is a pot, kettle, black moment. This is the crowd that cites its own opinions as research and wouldn't know how to design a scientific study if it bit them. Please.

This blog isn't a research paper. I try to cite when I can, but I don't cite every well-known point. As always, if someone doubts any point I make, provide a reason why you doubt it, and request a cite. One will be provided.

Plus, I have nothing against Whole Language per se. I just dislike bad instruction. And, whole language is bad instruction based on bad science practiced by ideologues on unsuspecting children.

There's also a No True Scotsman fallacy thrown in there. See if you can spot it.

Keep those arguments coming TAWLers. It's like shooting fish in a barrel.

Madison Cooks Books

(This post summarizes, in a hopefully more coherent manner, the analysis I did in this post.)

The NYT reports that the Madison, Wisconsin, school district turned down $2 million in Reading First grant money. The reason: they didn't want to use a phonics-based reading program as required by Reading First. Instead, Madison wanted to continue using the district-created "balanced literacy" reading program. Nothing wrong with that per se; Reading First is a voluntary program entailing federal oversight.

But with NCLB requirements for increased student proficiency looming on the horizon, the pressure is on for low-performing school districts, like Madison, to increase student performance. It would be quite an embarrassment if Madison's reading program failed to produce results, considering their very visible refusal to adopt a phonics-based reading program--especially since there was grant money attached.

So what do you suppose Madison did when the agenda-driven NYT came looking for a poster child of a school district bullied by the Feds in the (phony) scandal-plagued Reading First program? They cooked the books.

Under their system, the share of third graders reading at the top two levels, proficient and advanced, had risen to 82 percent by 2004, from 59 percent six years earlier, even as an influx of students in poverty, to 42 percent from 31 percent of Madison’s enrollment, could have driven down test scores. The share of Madison’s black students reading at the top levels had doubled to 64 percent in 2004 from 31 percent six years earlier.

And while 17 percent of African-Americans lacked basic reading skills when Madison started its reading effort in 1998, that number had plunged to 5 percent by 2004. The exams changed after 2004, making it impossible to compare recent results with those of 1998.


Madison's scores did rise from 58.9% in 1998 to 82.7% in 2004, an apparent rise of +0.72 standard deviations (s.d.), which is a large effect size. But the question remains: did student performance actually improve?

Let's find out.

Madison's scores rose less than Wisconsin's

According to the Wisconsin Reading Comprehension Test (3rd grade), the number of proficient students in Wisconsin rose 22.5 points, from 64.9% in 1998 to 87.4% in 2005, an increase of +0.77 s.d. So it wasn't just Madison's scores that rose; scores rose across the board in Wisconsin. If anything, Madison's scores rose slightly less than the average score in Wisconsin.

But, is such an across-the-board gain in achievement realistic in the first place?


Wisconsin's NAEP scores remained flat

NAEP scores for Wisconsin show that the number of proficient (and above) students was 34% in 1998, dropped slightly to 33% in 2003, and stayed there in 2005, the last time fourth graders were tested in reading. Students scoring at the basic level (and above) dropped from 69% to 67% during this same period. From 1992-2005, the achievement gap between black and white students rose from 28 points to 33 points, and the gap between poor and non-poor students dropped slightly from 28 points to 25 points.

So, NAEP shows us that the reading performance of Wisconsin fourth graders has basically remained flat since about 2000. (Go here and select Wisconsin as the jurisdiction.)

The NAEP scores tell us that Wisconsin's miraculous gains in reading achievement from 1998-2005 are non-existent. Wisconsin did what most states did in response to NCLB: it goosed the tests to artificially inflate test scores and give the appearance of increased student achievement.

Let's recalibrate Wisconsin's gain of +0.77 s.d. to 0.0 s.d. to account for the non-gains made on NAEP. This means that Madison's real performance during 1998-2005 actually declined by 0.05 s.d.
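
For anyone who wants to check the math, here's a minimal sketch of the recalibration (my own reconstruction, assuming normal score distributions, as above):

    # My reconstruction of the recalibration, assuming normal score
    # distributions. Convert each percent-proficient pair to an implied
    # mean shift, then subtract the statewide shift, since NAEP says the
    # real statewide change was about zero.
    from scipy.stats import norm

    def implied_shift(pct_before, pct_after):
        return norm.ppf(pct_after) - norm.ppf(pct_before)

    wisconsin = implied_shift(0.649, 0.874)  # ~ +0.76 s.d. (test inflation)
    madison = implied_shift(0.589, 0.827)    # ~ +0.72 s.d. (apparent gain)
    print(round(madison - wisconsin, 2))     # ~ -0.05 s.d. real change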

Madison's schools eligible for Reading First funding performed significantly below Madison's other schools

According to this source, there were four Madison schools eligible for Reading First funding: Glendale, Hawthorne, Lincoln, and Orchard Ridge. These schools never received this funding because Madison chose to stick with its Balanced Literacy reading program. Let's see how those schools performed.

The average gain made by these four schools was only 21.6%, or +0.56 s.d. This is significant under-performance compared to the statewide gain of +0.77 s.d. Using our NAEP recalibration, the performance of these schools actually declined by 0.19 s.d. That's big.

In contrast, the average gain made by the remaining schools in Madison was 25.4%, or +0.80 s.d., which is about the same gain made statewide.

The Disaggregated Data

In the NYT article, Madison officials made some wild suggestions about the relative performance of black students compared to other students (mostly white) in Madison. I showed in my previous post how these percentile gains are misleading due to how scores are distributed (a normal distribution), so I won't repeat it here.

In fact, I believe the scores that Madison is using are inaccurate. Madison's numbers have black performance rising by +0.86 s.d., which is high. I can't find disaggregated data for the WRCT, but the disaggregated data from the 3rd grade performance on the new WSAS test shows that black performance is significantly less than this and less than the average gain made by white students. It's not an apples to apples comparison, but it is consistent with the rest of the data.

Conclusion

Madison is cooking the books.

Madison's schools slightly underperformed Wisconsin's schools overall, and its Reading First-eligible schools underperformed Madison's other schools.

In fact, NAEP data shows that the gains made by Wisconsin are illusory. It's doubtful that scores rose at all in Wisconsin.

If we look at only the schools in Madison that were eligible for Reading First funding, we see that these schools performed significantly worse than other schools in Wisconsin.

So it appears that Madison's Balanced Literacy reading program, which cost the district $2 million, failed to increase student performance in Madison and actually caused a relative decline in the schools that were supposed to get Reading First funding.

This is exactly what we expect to see in your typical balanced literacy program: at-risk children failing to achieve. The children most damaged by "balanced literacy" programs are kids with low language skills and little background knowledge. These were the kids Reading First was intended to serve.

March 9, 2007

Schemo gets pwned

See my summary post in which I try to roll all the updates into a coherent post.

**Update** The scandal grows. It turns out that the Reading First schools in Madison underperformed the non-Reading First schools by almost a quarter of a standard deviation. This is an educationally significant difference. (Clarification: I'm using "Reading First schools" to designate the schools in Madison that were eligible for Reading First funding but didn't get it because Madison decided to use its balanced literacy program instead of a scientifically based reading program.)

In part one of this post, I showed you the embarrassing horror show taking place in the Madison school district, which has hapless first graders guessing at the word "pea" and thinking the word is "pumpkin." As if it weren't embarrassing enough to have that displayed in the opening paragraphs of this NYT article, the Madison school district has the audacity to claim that its reading program is actually boosting student performance.

Call it the $2 million reading lesson.

By sticking to its teaching approach, that is the amount Madison passed up under Reading First, the Bush administration’s ambitious effort to turn the nation’s poor children into skilled readers by the third grade.

...

Madison officials say that a year after Wisconsin joined Reading First, in 2004, contractors pressured them to drop their approach, which blends some phonics with whole language in a program called Balanced Literacy. Instead, they gave up the money — about $2 million, according to officials here, who say their program raised reading scores.
...

Under their system, the share of third graders reading at the top two levels, proficient and advanced, had risen to 82 percent by 2004, from 59 percent six years earlier, even as an influx of students in poverty, to 42 percent from 31 percent of Madison’s enrollment, could have driven down test scores. The share of Madison’s black students reading at the top levels had doubled to 64 percent in 2004 from 31 percent six years earlier.

And while 17 percent of African-Americans lacked basic reading skills when Madison started its reading effort in 1998, that number had plunged to 5 percent by 2004. The exams changed after 2004, making it impossible to compare recent results with those of 1998.


No, it didn't. It should have taken Diana Jean Schemo, this article's author, about half an hour on the internet to figure out that the "Madison officials" were spinning the "reading scores."

NAEP scores for Wisconsin show that the percentage of proficient students was 34% in 1998, dropped slightly to 33% in 2003, and stayed there in 2005, the last time fourth graders were tested for reading. Students scoring at the basic level dropped from 69% to 67% during this same period. Over the period 1992-2005, the achievement gap between black and white students rose from 28 points to 33 points, while the gap between poor and non-poor students narrowed slightly from 28 points to 25 points. So, NAEP shows us that the reading proficiency of Wisconsin fourth graders has basically remained flat since about 2000. (Go here and select Wisconsin as the jurisdiction.)

The Wisconsin Reading Comprehension Test (3rd grade) tells a different story. The percentage of proficient students in Wisconsin rose 22.5 points, from 64.9% in 1998 to 87.4% in 2005. Based on our knowledge of NAEP scores for the same population, we know that this gain was imaginary. What Wisconsin did was a combination of making the test easier and lowering the cut score so that 22.5% more students would be able to pass. The result is an apparent shift in the mean of about +0.77 of a standard deviation.
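
(Equivalently, you can hold the score distribution fixed and ask how far the cut score itself must have dropped to produce that pass-rate jump. A quick sketch, under the same normality assumption as before:)

    from scipy.stats import norm

    # If students didn't actually improve, the pass-rate jump implies the
    # cut score moved down. A pass rate p puts the cut at the (1 - p)
    # quantile of the score distribution.
    old_cut = norm.ppf(1 - 0.649)  # ~ -0.38: already below the mean in 1998
    new_cut = norm.ppf(1 - 0.874)  # ~ -1.15: far below the mean by 2005
    print(old_cut - new_cut)       # ~ 0.76 s.d. of easing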

Gather 'round, kids, and learn how Wisconsin made a fool out of the statistically illiterate Diana Jean Schemo.

Here's what I get from comparing the 1998-99 scores to the 2004-2005 scores:

In 1998 the percent of the state passing the exam was 64.9%. In 2005, it was 87.4%. A shift of 0.77 standard deviations (s.d.). Remember, during this same period NAEP scores for the state remained virtually unchanged.

In 1998 the percent of the Madison school district passing the exam was 58.9%. In 2005, it was 82.7%. A shift of +0.72 s.d. If anything, Madison underperformed the state during this period.

Update Three: From this source, I found out that Madison's Reading First schools (the schools eligible for RF funds, but which never got them because Madison made them use a balanced literacy program) are Glendale, Hawthorne, Lincoln, and Orchard Ridge. As it turns out, the average gain made by these schools was only 21.6 percentage points, or +0.56 s.d. In contrast, the non-Reading First schools gained 25.4 percentage points, or +0.80 s.d. (Again, the percentage-point gains are deceptive because the Reading First schools caught the fat part of the curve, whereas the non-RF schools did not.) To put this in perspective, the 0.24 s.d. differential is about the same effect size found in Project STAR (class size reduction), which educators rave about. So the non-Reading First schools in Madison slightly outperformed the state average while the Reading First-eligible schools significantly underperformed the state average.

I didn't find disaggregated data, so I can't determine whether the stats for black students given to Schemo are accurate. But assuming they are, the rise in scores from 31% to 64% represents a shift of +0.86 s.d. That seems highly improbable considering 1) Madison's overall rise was only +0.72 s.d. (which would mean scores for non-black students in Madison rose only about +0.58 s.d., a significant underperformance compared to the rest of Wisconsin) and 2) the black-white achievement gap on NAEP actually increased during this period, meaning that black performance declined relative to white performance.

Update: Rory finds the disaggregated data. In 2005, black proficiency in Madison was 57%, which represents a shift of +0.68 s.d., an underperformance relative to both Madison as a whole (+0.72 s.d.) and Wisconsin (+0.77 s.d.).
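
(For the record, here is the same conversion applied to the black subgroup numbers; it agrees with the figures above to within a hundredth of rounding.)

    from scipy.stats import norm

    # Madison's claimed black gain, 31% -> 64% proficient:
    print(norm.ppf(0.64) - norm.ppf(0.31))  # ~ +0.85 s.d. (the +0.86 claimed)
    # Using the 57% figure Rory actually found:
    print(norm.ppf(0.57) - norm.ppf(0.31))  # ~ +0.67 s.d. (the +0.68 above)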

What actually happened was something like this:

Wisconsin raised the mean passing level from level 1 to level 2. The top curve shows black performance. This shift captured the fat part of the curve for black students (31% to 64%); picture level 1 being even further to the left. The blue shaded portion represents the increase in black performance.

Update two: Just in case it isn't clear, Madison's claim that black student performance "doubled" and exceeded the gains made by other students, based on percentage-point changes, is highly misleading and a misuse of statistics. When you measure gains by the more statistically accurate standard deviation, it is readily apparent that black student gains were no greater than the gains made by other groups. In fact, they appear to be considerably less. The achievement gap remains unchanged or may even have grown. This is what you would expect to happen under a whole language reading program, because the students at the bottom of the curve are the ones most damaged by the non-explicit instruction.

The bottom curve shows white performance. The shaded portion between level one and level two shows the gain. I'd estimate that the white performance rose from about 69% to about 90%. (If anyone finds the disaggregated data let me know.) As you can see, white performance was already past the fat part of the curve--the average white student was passing the exam back in 1998.
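
(To see the fat-part-of-the-curve effect in numbers: apply the very same statewide shift of +0.77 s.d. to groups starting at different pass rates, and the percentage-point "gains" come out wildly different. A sketch, under the same normality assumption; the function name is mine.)

    from scipy.stats import norm

    def new_pass_rate(p_before, shift):
        # Pass rate after the mean moves by `shift` s.d., cut score fixed.
        return norm.cdf(norm.ppf(p_before) + shift)

    for p in (0.31, 0.50, 0.69):
        print(f"{p:.0%} -> {new_pass_rate(p, 0.77):.0%}")
    # 31% -> 61%, 50% -> 78%, 69% -> 90%

Identical underlying shifts: a group starting near the fat middle of the curve appears to nearly "double," while a group starting at 69% picks up only about 21 points.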

Update: White performance in Madison was actually 93%. Ka-ching!

One thing is for certain, Madison's balanced literacy program did not cause the imaginary gains made by Madison school children.

Update: Mark Liberman of Language Log chimes in:

DeRosa's argument seems pretty persuasive to me. If he's right, then I'd consider a different evaluation -- was Schemo bamboozled by the Madison school authorities, or did she pick Madison, and spin the story as she did, in order to make an essentially dishonest point, suggesting that the mean old federal bureaucrats are trying to stop the dedicated local educators from continuing to use the methods that are helping their children so much?

I certainly don't want to discount the possibility that Schemo was a willing dupe. It could very well be that she went looking for a story and Madison told her what she wanted to hear.

Madison school district passes up free money

Instructivist tipped me off to this NYT article about the Madison, Wisconsin school district, which passed up $2 million in Reading First grant money so it could continue to misteach children to read using its ideologically preferred, but research-discredited, reading program.

The article starts out with a real horror show:

Surrounded by five first graders learning to read at Hawthorne Elementary here, Stacey Hodiewicz listened as one boy struggled over a word.

“Pumpkin,” ventured the boy, Parker Kuehni. “Look at the word,” the teacher suggested.

Using a method known as whole language, she prompted him to consider the word’s size. “Is it long enough to be pumpkin?” Parker looked again. “Pea,” he said, correctly.

Young Parker isn't reading, he's guessing. I bet there was a picture of a pea in the illustration that no doubt accompanied the passage Parker was attempting to read. This is whole language--kids aren't taught that the p stands for the sound /p/ and that ea stands for the sound /eee/. The student is supposed to identify words based on context clues, such as the shape of the word and the meaning of what he is reading, i.e., he is supposed to use his understanding of the surrounding words, sentences, or even paragraphs to help him read an unfamiliar word.

In balanced literacy, when the student isn't able to guess the word correctly from the context clues, the teacher throws him a bone, tells him that p stands for /p/, and asks him to guess again with this new tidbit of information.

In a real phonics class, the teacher first instructs the student that p stands for the sound /p/ and that ea stands for the sound /eee/, and that when the student reads the word "pea" he should blend the sounds /p/ /eee/ to identify the word "pea." If the student knows the meaning of the word pea, i.e., it is in his oral vocabulary, he will comprehend what he has just read, assuming his fluency rate is sufficiently high.

What the whole language people get wrong is using context clues to identify words; that is not a productive reading strategy. Using context clues to derive the meaning of already-identified (decoded) words is perfectly fine and is what skilled readers do.

It is difficult for a skilled reader to appreciate just how wacky reading the whole language way really is. Skilled readers identify written words very rapidly and use context clues so seamlessly to ascertain the meaning of unknown decoded words that it is difficult to separate the identification (decoding) part of reading from the meaning-deriving part. Skilled readers don't remember how difficult reading was when they were just learning, when their decoding skills were not as fast or accurate as they are today.

So let's simulate a reading passage in which you can only identify (decode) about 80% of the words. This passage takes away your ability to use phonics to identify words, leaving you stuck using whole-word strategies and context clues to identify words and comprehend the passage. See how well you do with the following:

He had never seen dogs fight as these w__ish c___ f____t, and his firs ex__________ t____t him an unf________able l_____n. It is true, it was a vi_______ ex_________, else he would not have lived to pr_____it by it. Curly was the v_________. They were camped near the log store, where she, in her friend__ way, made ad_________ to a husky dog the size of a full-______ wolf, th____ not half so large as _he. __ere was no w_____ing, only a leap in like a flash, a met_____ clip of teeth, a leap out equal__ swift, and Curly's face was ripped open from eye to jaw.

It was the wolf manner of fight__, to st___ and leap away; but there was more to it than this. Th__ or forty huskies ran _o the spot and not com______d that s_____t circle. Buck did not com______d that s_____t in_______, not the e___ way with which they were licking their chops. Curly rushed her ant________, who struck again and leaped aside. He met her next rush with his chest, in a p________ fash___ that tum___ed her off her feet. She never re_____ed them. This was __at the on______ing huskies had w______ for.

Weeeeeeeee! Wasn't that fun? I'm sure you enjoyed your reading experience immensely. Imagine reading an entire book that way. And by the way, the passage was from Jack London's The Call of the Wild, which is generally recognized as great literature. According to the whole language people, that should have instilled a love of learning in you.
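
(If you want to manufacture this experience from any text, it's easy to do mechanically. Here's a rough sketch that blanks out the middles of a random 20% of the words; the passage above degrades letters more haphazardly, but the effect on the reader is much the same. The function name is mine.)

    import random
    import re

    def degrade(text, decodable=0.8, seed=1):
        # Simulate partial decoding ability: leave roughly `decodable` of
        # the words intact and blank out the middles of the rest.
        rng = random.Random(seed)
        def mask(match):
            word = match.group(0)
            if len(word) > 3 and rng.random() > decodable:
                return word[0] + "_" * (len(word) - 2) + word[-1]
            return word
        return re.sub(r"[A-Za-z]+", mask, text)

    print(degrade("He had never seen dogs fight as these wolfish creatures fought."))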

Skilled readers eventually figure out the code and learn how to identify words with a great deal of accuracy. However, for many kids, text continues to look like this because they fail to learn the code in the absence of explicit phonics instruction.

But this isn't the worst part of the article. I'm saving that for part two.

March 7, 2007

How not to teach comprehension

It's been a while since I deconstructed an inane education article, but I think this Seattle Times article, Revving up reading in Marysville schools, is worthy of that treatment:

On a recent Monday morning at Cascade Elementary School in the Marysville School District, veteran teacher Lauri Hagglund engaged in a timeless activity — reading aloud to her second-grade students.

The children sit cross-legged or up on their knees on brightly colored carpet squares. The teacher stops at the end of each page to display the book's illustrations.

But pull the lens back a few feet and the classroom becomes a laboratory for practicing the latest approaches to literacy. A reading coach sits a few feet away from Hagglund, charting plot details about cause and effect from a story involving a very tidy cat and a very messy one.

The problem with read-alouds is pretty clear. During the read-aloud the student isn't actually reading! And what second grade students need to be doing more than anything else is practicing reading.

A lot.

But, let's get beyond that.

What is most striking about this "laboratory for practicing the latest approaches to literacy," as you will soon see, is how primitive the laboratory actually is. I would have thought that the level of "experimentation" would be a tad more sophisticated in 2007.

This isn't even the beta version of reading comprehension. In this school, reading comprehension is still in the alpha stage, and this is the first trial run to catch bugs.


At the edge of the reading carpet, several observers, including the district's superintendent, assistant superintendent and school principal, take notes.

When Larry Nyland took over the district in 2004, the new superintendent launched an initiative focused on strengthening students' reading skills. Standardized test scores in the district were among the county's lowest, and classroom assessments showed that even students who read stories at grade level sometimes struggled to understand their science and history books.

It's always worse than you think. They've been in alpha testing for three years now.

Let's go to the videotape and see what the reading scores are:

Fourth grade: percent meeting standard (state average)

2003-04: 75.6% (74.4%)
2004-05: 82.8% (79.5%)
2005-06: 84.9% (81.2%)

Slightly better than average. But look at the third grade scores (only tested in 05-06):

2005-06: 60.9% (68.3%)

Below average.

And finally, let's use the reality check to see how inflated those scores really are: fourth grade NAEP reading scores for Washington were 33% proficient or better in 2003 and 36% in 2005. That's a nice little 45-point discrepancy between the state test (81.2%) and NAEP (36%).

Perhaps, they'd be getting better results if they wore white lab jackets, thick horn-rimmed glasses, and had shiny silver clipboards because that's what I hear all the real scientists wear when they conduct science-like experiments.

The district developed focused training for teachers on ways they could help students more readily grasp meaning and deepen their thinking about books. Principals learned how to observe and give feedback to their teachers. Reading coaches at every elementary school worked with struggling students.

Seems like they forgot the most important step. Step One: determine if the intervention works. Step Two: disseminate and train.

In January, they began presenting lessons in classrooms while teachers observed both the techniques and their effectiveness with the children.

The elementary classrooms were also re-imagined so that each had a communal-reading area where the children could sit comfortably and closely to the spoken words.

Maybe instead of re-imagining the classroom they should have re-imagined their concept of reading. Last I checked, reading involved getting close to the text of books, not getting close to the "spoken word" of a teacher reading the book for you. No wonder the kids have a hard time reading.

District administrators began making regular visits to classrooms to gauge how the training might be furthered or refined. Last year, Nyland and his staff made 500 such visits.

For many teachers, it was the first time they'd been regularly observed and given feedback since they were students, Nyland said. And some were wary of the approach.

"Up until now, administrators were only in the classroom to evaluate teachers," Nyland said. "We had to show them our focus was on improving student learning. It's like athletes watching game film to see what they're doing and what they might want to change."

What is it with teachers being so reluctant to be observed by management? They need to get over that real quick.

Back in the classroom, the school's reading coach, Leanne Rivas, asks the children how the two cats in "The Tale of Two Kitties" feel about each other. In the illustration, they sit at opposite ends of a long fence, their tails toward each other.

Ostensibly, this is a lesson on reading comprehension. And here we are discussing an illustration in the book. I think there's some cognitive dissonance in play here.

Let me suggest a better exercise to teach drawing inferences from text. The students read the following passage: "Linda and Kathy were alone on an island. Linda said, 'Stop crying, Kathy. We are both very smart, and if we use our heads, we will get out of here.'" Afterwards, the teacher asks: "Why was Kathy crying?"

That micro-exercise comes from lesson 68 of Reading Mastery III, a second grade curriculum. The instructional content of that lesson is light years ahead of what's going on in this classroom. Go take a look.

"Turn and talk," Rivas instructs, and on cue, the students turn to a partner to discuss the cats' mutual disdain. The approach seems more typical of a grown-up reading group than an elementary-school classroom, but Assistant Superintendent Gail Miller said that by recalling details, summarizing their thoughts and finding evidence for their views, the students go more deeply into the text and practice skills they'll use all of their reading lives.

No, they're talking about a picture of two cats. One kid is talking to another kid about a picture. Then the other kid takes a turn. Missing is the feedback and assessment by a teacher to determine who understands and who doesn't.

Some shortcomings of the approach are quickly apparent. Confident and chatty children give their opinions first. When Rivas counts backward — three, two, one, time for talk is done — several quieter students haven't said a word.

Another challenge emerges. Rivas is drawing boxes around the plot elements to help illustrate cause-and-effect. Writing the students' own words down also helps them summarize their ideas and gives kids just learning English a chance to see their spoken words in print, but her examples of cause-and-effect get a little lost in all the plot elements.

In short, this is a poorly thought out and even more poorly executed lesson. It is confused. The point of the instruction is unclear. Assessment of individual students is impossible. It is a waste of time from beginning to end. So why was it being presented to the students in the first place? This one should have been strangled in the cradle.

Still, the pedagogy doesn't interrupt the students' enjoyment of the book. Most laugh and point as the cats, Fluffy and Scruffy, team up to rout some impudent mice.

I think they call that being off-task.

I wonder how many of these students know what the words "rout" and "impudent" mean? Not knowing the meaning of words like this is what causes comprehension difficulty in the later grades. Yet, the teaching of vocabulary will be downplayed in classrooms like this in favor of these hokey reading strategies.

About a half-hour after it began, what the district calls a reading "walk-through" ends.


You don't want to know what I call it.

The school administrators regroup with the reading coach in the office of Cascade's principal, Chris Sampley. Miller, who previously worked with student teachers at Seattle Pacific University, asks Rivas to recall the purpose of the lesson and how it went.

Rivas says she realizes she strayed from the initial goal and isn't sure what the students learned.

That pretty much sums up what's wrong with education today. No one knows what the students actually learned.

And here's problem number two:

The observers are less hard on her than she is on herself. Miller reminds her how at different points she guided the students back to the text, the same strategy a more experienced reader would use to search for meaning or relationships.

Everything is all right, sweetie. You just keep working at it. You'll get better. Someday. Hopefully soon. Or not.

Rivas, in hindsight, thinks "The Tale of Two Kitties" was better suited to demonstrate compare-and-contrast than cause-and-effect. The observers agree that one challenge of their literacy effort is that each book has to be dissected for its best lessons in advance.

It took three years to come to that realization? This is why we have NCLB. This should have been done thirty years ago.

At lunch time, she'll sit down with the classroom teacher and review any thoughts both of them have.

A week later, Rivas and Hagglund are teamed up again with another book. Some adjustments are immediately evident. The students are instructed to take turns talking about the book so no one is left out. Rivas is also more deliberate about what she chooses to chart, emphasizing cause-and-effect, which she notes is both a life lesson and a question likely to appear on the Washington Assessment of Student Learning.

See, I was wrong. I had initially assumed that the students were instructed to take turns discussing. They weren't. And it took these two trained professionals a week to figure that out.

The main problems with the lesson are still present: the teacher is reading the book, not the students; the teacher is making the chart, not the students; and the teacher still has no idea whether each student learned anything.

After class, Hagglund talks about the literacy initiative and how, after 27 years in the classroom, it's changing her teaching. More than anything, she says, the literacy initiative has reminded her what it's like to be a student again, to be asked to do something new and hard.

"It puts us all in the place of being learners," she said.

I thought that's what ed school was supposed to be for?