April 25, 2007

Accelerated Reader gets the WWC treatment

The WWC recently issued a report on Accelerated Reader.

This report follows an alarming pattern in which the WWC allows a testing instrument developed by the authors of the education program itself to count as a valid measure of success. Out of the 35 studies reviewed, only one (Ross, Nunnery, & Goldfeder, 2004) met WWC standards. And guess what:

The STAR Early Literacy test and STAR reading test are the only outcomes reported in the study. The STAR tests are developed and distributed by Renaissance Learning, which also distributes Accelerated Reader/Reading Renaissance.

The test is a computer-adaptive, norm-referenced test that measures student reading comprehension. It is designed for students who have at least a 100-word reading vocabulary and can be used with all students in grades 1–12. Students read passages of text and fill in key missing words from a set of options (modified cloze procedure).
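The modified cloze format described above is simple enough to sketch in a few lines of code. This is only an illustration of the item format — the passage, blank position, and distractor words below are invented, and this is not Renaissance Learning's actual item-generation algorithm:

```python
import random

def make_cloze_item(sentence, target_word, distractors):
    """Build a modified cloze item: blank out one key word and
    offer the correct word among a set of distractor options."""
    blanked = sentence.replace(target_word, "_____", 1)
    options = [target_word] + list(distractors)
    random.shuffle(options)  # present options in random order
    return blanked, options, target_word

blanked, options, answer = make_cloze_item(
    "They rode on top of a camel through the desert.",
    "camel",
    ["castle", "candle", "cattle"],
)
print(blanked)          # They rode on top of a _____ through the desert.
print(sorted(options))  # ['camel', 'candle', 'castle', 'cattle']
```

A computer-adaptive test would wrap this in a loop that picks the next item's difficulty based on the student's running score, but the scoring unit is just this: one blank, one correct choice.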

What's surprising is that even using their own testing instrument, Accelerated Reader didn't perform all that well.

Comprehension. Ross, Nunnery, & Goldfeder (2004) reported a positive and statistically significant effect of Accelerated Reader/Reading Renaissance on third grade student performance on the reading comprehension measure (Star Reading test). In WWC computations, this positive effect was not statistically significant, but considered substantively important according to WWC criteria (an effect size greater than 0.25).

General reading achievement. Ross, Nunnery, & Goldfeder (2004) showed that Accelerated Reader/Reading Renaissance had positive and statistically significant effects on the general reading measure (Star Early Literacy test) for kindergarten, first, and second grade students. According to WWC analysis, the average effect size across grade levels was statistically significant.

With respect to general reading achievement, to get statistically significant results, the WWC had to average the performance of grades K-3. None of the grades individually had sufficient students to achieve a statistically significant result.
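The statistical move here is worth making concrete. Below is a minimal sketch of how a standardized effect size (Cohen's d) and its approximate standard error are computed per grade, and how an inverse-variance-weighted average across grades shrinks the standard error enough that a pooled effect can reach significance even when no single grade does. The per-grade numbers are invented for illustration — they are not the study's actual data:

```python
import math

def se_of_d(d, n_t, n_c):
    """Approximate standard error of Cohen's d for two groups."""
    return math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))

# Hypothetical per-grade results: small samples, similar effect sizes.
grades = [  # (Cohen's d, n_treatment, n_control)
    (0.35, 25, 25),
    (0.32, 30, 30),
    (0.38, 28, 28),
]

# No single grade is significant at the 5% level (|d| < 1.96 * SE)...
for d, nt, nc in grades:
    se = se_of_d(d, nt, nc)
    print(f"d={d:.2f}, se={se:.3f}, significant={abs(d) > 1.96 * se}")

# ...but the inverse-variance-weighted average across grades can be,
# because pooling the samples cuts the standard error roughly in half.
weights = [1 / se_of_d(d, nt, nc) ** 2 for d, nt, nc in grades]
d_avg = sum(w * d for w, (d, _, _) in zip(weights, grades)) / sum(weights)
se_avg = math.sqrt(1 / sum(weights))
print(f"pooled d={d_avg:.2f}, se={se_avg:.3f}, significant={abs(d_avg) > 1.96 * se_avg}")
```

This is a legitimate meta-analytic technique, but it means the "statistically significant" headline rests on the combined sample, not on any grade level standing on its own.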

Yet on the basis of this one small study using a questionable testing instrument, the WWC concluded that Accelerated Reader has "potentially positive effects" for reading comprehension and general reading, earning the coveted green box and +?.

This seems like extremely flimsy evidence to me and it seems to send a message to publishers on how to cook the books to get the thumbs up from the WWC.

April 23, 2007

Reading First Hearings

I watched all four hours of the Reading First hearings on Friday night and only managed to doze off once. (That's because I was assembling a new computer for the kids as I listened, so I was somewhat occupied.)

I'm not in the habit of watching too many of these congressional hearings, but I have witnessed a fair number of courtroom proceedings. There is no comparison. The congressional hearing is more like a kangaroo court than a legal proceeding. Most of the Congressmen and Congresswomen had no idea how to ask questions that elicit information from a witness. Many seemed content to mug for the camera and read speeches. It was clear that most of the Congressmen on the education subcommittee had no idea of the issues surrounding Reading First.

Let me single out Congressman Miller as being particularly inept. This is especially surprising because, according to Miller's bio, he's a lawyer. A lawyer who doesn't know how to take a deposition. Miller's idea of taking a deposition consists of reading a portion of the Inspector General's report in as dull and monotonous a voice as he can muster, asking a "so what do you think about that?" kind of question, giving the deponent an opportunity to speak (sometimes, at least), and then giving his own opinion of the "facts" in an angry voice. We got more testimony from Miller than we did from any of the deponents.

Sadly, we are no closer to the truth now than we were before the hearing. We did, offhandedly, learn a few things, however.

  • The Inspector General, Higgins, admitted that he had never looked into the question of Scientifically Based Reading Research (SBRR) or the Essential Components of Reading Instruction (ECRI). This is a critical omission. Reading First is a statute that was designed to exclude reading programs that were not based on SBRR and did not contain the ECRI. Reading programs were, in fact, excluded. That was the whole point of Reading First: to fund the right programs and exclude the wrong ones. So, were any programs improperly excluded? We don't know, because no one looked at the critical cut-off point, i.e., whether the excluded programs were based on SBRR and had the ECRI.

  • Starr Lewis, Kentucky’s associate commissioner of education, was trotted out to give her tearful story about how she couldn't scam funding from Reading First for the controversial Reading Recovery reading program (and other whole language programs) and the fraudulent DRA testing instrument, which was designed to show that Reading Recovery "works" even though the kids going through the program often can't read. This goes to the SBRR and ECRI issue. No one denies that Kentucky was denied Reading First funding. The issue is whether Kentucky was improperly denied. We don't know, because no one has looked into the issue yet. Like I said, kangaroo court.

  • Then Lewis recounted how Kentucky was denied funding for its use of the bogus DRA testing instrument. Kentucky responded by adding DIBELS to the list of permitted testing instruments. DoE said that wasn't good enough: we're not funding DRA. Kentucky finally withdrew DRA and was funded. Does this represent a violation of the law? We don't know, because no one has looked into the SBRR and ECRI issue yet. Do you see a pattern emerging? Nor does the sequence of events necessarily represent an instance of DIBELS being forced on Kentucky. Adding DIBELS didn't remedy the problem caused by including DRA, and we don't know whether the inclusion of other testing instruments would have satisfied DoE, because Kentucky never presented other instruments for consideration. If Miller was looking for a poster child for the Reading First scandal, Kentucky wasn't it.

  • Then we have the conflict issue. Right off the bat this is a non-issue, because Congress failed to include a conflict-of-interest provision in the Reading First statute. But DoE did, in fact, screen for financial conflicts. No financial conflicts were found. Higgins affirmed this. But in his report Higgins substituted his own judgment for DoE's by suggesting that DoE should also have screened for "professional associations" with reading programs. This is a ridiculously broad standard, one that would have eliminated all the reading experts from judging which programs should be excluded from Reading First for failing to be based on SBRR and contain the ECRI. All the reading researchers have some professional tie to either phonics-based or whole-language reading programs, and under Higgins's contrived, non-statutory conflicts standard, all of them would have had to be excluded. Yeah, that's the conclusion we want to reach: exclude all the reading experts from determining which reading programs qualified for funding.

  • To avoid this problem, DoE made certain that none of the reading researchers would review programs with which they had some tangential affiliation. According to the former Reading First director, they modeled this system on a previous reading-program funding scheme that had passed Congressional muster. And bear in mind that under prior federal programs that doled out money, the old standard was to specifically exclude the collection of data. That didn't work out so well in hindsight, so the rules were tightened up a bit in Reading First. Apparently, Congress didn't realize what it was getting itself into. Fair enough; Congress has the right to change its mind to avoid the "appearance of impropriety" it believes resulted from DoE's scheme for complying with the Reading First statute. But such an ex post facto change doesn't exactly show that DoE was guilty of any wrongdoing.

  • Making the conflict issue an even bigger non-issue, it came out at the hearing that less than 10% of the states even specified the actual names of reading programs in their Reading First applications. So even if the reviewers had a conflict, they had no way of knowing which programs the states were actually selecting. How could the "biased" reviewers have pushed specific programs on the states when practically none of the states specified reading programs in the first place? There's a name for this in the law. It's called "harmless error."

  • Finally, there is the related problem that many states chose not to use Reading First funds to purchase reading programs at all. Even if a reading program has SBRR and ECRI out the wazoo (hello, SfA), no state was obligated to select it for funding. Even if DoE did improperly force states to adopt specific reading programs (and there is no real evidence of that actually happening), less than 10% of states listed specific programs in the first place, and many of them went and chose programs on their own after the application process. So we have a large disconnect between what DoE could influence and what states could actually do. Supposedly, there are a few instances in which states tried to pull a fast one by submitting an application that appeared to show they would abide by the SBRR and ECRI requirements, but then funded whole language programs. You can't resolve this dispute without looking into the SBRR and ECRI provisions, and no one has done that yet.

Those are the big issues, and no light was shed on any of them at the hearing. There are some remaining minor issues that indicate sloppiness on the part of DoE, but it's not clear that they represent violations of any law that amount to anything. In all the OIG reports you don't see any real violations anywhere, just "potential" violations and "appearances" of violations. This is probably why this scandal hasn't gained much traction outside of the sour-grapes whole language community, which has been largely excluded under Reading First. It is this exclusion which has caused scores to rise under Reading First.

Of course, the media still doesn't understand the story.

WaPo tries to spin it as a financial scandal:

The Justice Department is conducting a probe of a $6 billion reading initiative at the center of President Bush's No Child Left Behind law, another blow to a program besieged by allegations of financial conflicts of interest and cronyism, people familiar with the matter said yesterday.

The disclosure came as a congressional hearing revealed how people implementing the $1 billion-a-year Reading First program made at least $1 million off textbooks and tests toward which the federal government steered states.

Too bad even the OIG failed to find any real financial conflicts, and that there was no evidence of actual "steering" of state selections beyond the proper exclusion of programs required by the statute.

When WaPo does give us an example of a purported financial conflict, it turns out not to be a valid one:

One official, Roland H. Good III, said his company made $1.3 million off a reading test, known as DIBELS, that was endorsed by a Reading First evaluation panel he sat on. Good, who owns half the company, Dynamic Measurement Group, told the committee that he donated royalties from the product to the University of Oregon, where he is an associate professor.

Two former University of Oregon researchers on the panel, Edward J. Kame'enui and Deborah C. Simmons, said they received about $150,000 in royalties last year for a program that is now packaged with DIBELS. They testified that they received smaller royalties in previous years for the program, Scott Foresman Early Reading Intervention, and did not know it was being sold with DIBELS.

I haven't been following the testing-instrument part of the story as closely as the reading-program side, but my understanding is that Good, Kame'enui, and Simmons were panelists judging testing instruments. Apparently, the Early Reading Intervention program with which two of them are affiliated didn't include a packaged DIBELS component until recently. So clearly, there was no violation at the time of review, absent some knowledge that DIBELS was to be included in the future. That evidence has not yet been adduced.

Then we have the lurid innuendo concerning Good, who created DIBELS. Apparently, he's made some money off of his invention. When you enact a law designed to give out a billion dollars of funding, lots of people who own products that meet the law's requirements will benefit from the law. There is no evidence of record showing that Good forced any testing instrument on any state. The best the OIG found was a summary of testing instruments that listed DIBELS as one of 24 possible instruments believed to be valid.

And, the NYT points out that:

All three Oregon panelists said they had not ranked their own materials’ fitness for use under Reading First, and so had avoided any conflict of interest. And they said it was the quality of their programs, not direct or indirect pressure from Education Department officials, that explained their popularity.

All we are left with is the testimony of Miller, who does not appear to understand that his quips don't count for much besides headlines.

In an interview after the hearing, Mr. Miller said: “This hearing made it pretty clear that there was a very incestuous relationship among a small group of people in the Education Department and among contractors. They were very clearly using this program … for profit.”

Even if all this were true, it still doesn't mean the law was violated. The law was designed to profit a small group of reading programs for the benefit of children who are struggling to read. The fact that some profited as intended does not mean there was a violation. Potentials and appearances don't necessarily rise to the level of violations in the absence of proof. Proof that Miller has so far failed to obtain.

Today's big news is that the Justice Department may get involved. We'll have to wait and see if some real lawyers can cobble together a coherent scandal from the shoddy facts we've been given so far. I'm predicting that, in the absence of any new findings, the existing "facts" are insufficient to show any real violations of law.

April 18, 2007

Decode vs. Guess

I'm going to try and tie up a few loose ends from the whole language vs. phonics debate. This'll be the first in a series of posts.

Whole language presents a bit of a moving target. At first Goodman defined it as a psycholinguistic guessing game and it started out as being very anti-phonics. Under this original theory, decoding was not viewed as something that readers were necessarily doing when they read. Thus, one could honestly set forth the position that tests which measure decoding ability did not accurately measure reading ability.

But then a funny thing happened.

It turned out that many children taught to read as a psycholinguistic guessing game failed to become skilled readers; many turned out to be non-readers. Instead of folding up the tent and admitting defeat, the whole language movement changed its worldview to "balanced literacy," which "of course" includes instruction in phonics.

This new worldview required a shift in how reading itself was defined. Instead of a psycholinguistic guessing game, reading is now viewed as entailing the ability to decode text. The new view is that the whole process of learning how to read should be taught holistically in a content- and literature-rich environment. But underlying this shift in verbiage was a profound change: reading now includes the ability to decode words by attending to all the letters of each word, all the time, in conjunction with the use of meaning-based context cues.

Fair enough. There's some reason to believe that skilled readers sometimes use context clues to determine the meaning of unknown words (words not in their oral vocabulary) or to disambiguate ambiguous words. For example, let's say the child comes to the word "camel" and isn't sure if the word is cAmel, CamEl, or camel. The child could determine the correct word if the sentence read "They rode on top of a camel through the desert" and the word camel was in the child's receptive vocabulary.

But this new whole language world view presents a paradox. We know that the ability to fluently decode correlates highly with the ability to read well. And, we know that the ability to decode can be objectively measured using tests such as DIBELS. We also know that children taught phonics systematically and explicitly turn out to be better decoders.
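The correlation claim in that paragraph is a precise, checkable one. Here is a sketch of the computation it implies — a Pearson correlation between a decoding-fluency measure and a comprehension measure. The student scores below are invented for illustration; they are not real DIBELS data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical students: decoding fluency (words correct per minute)
# paired with a reading-comprehension score.
decoding      = [20, 35, 40, 55, 60, 75, 80, 95]
comprehension = [12, 20, 25, 30, 34, 40, 44, 50]

r = pearson_r(decoding, comprehension)
print(f"r = {r:.2f}")  # strongly positive for these invented scores
```

If whole language were right that decoding tests measure nothing relevant, correlations like this should hover near zero in real data; the research consistently finds them strongly positive.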

So, if the ability to decode is so important, as the whole language people think it now is, how does one reconcile the whole language belief system with the objective evidence that whole language taught children do not decode as well as students taught via a more explicit and systematic version of phonics?

Moreover, the whole language people continue to insist, with no evidence, that decoding tests are invalid, yet they also insist that it is important to teach phonics, which is supposed to teach children how to decode.

They're trying to have their cake and eat it too. Problem is that their view doesn't make much sense and is internally contradictory. What am I missing?

I don't think that word means what you think it means

According to this USA Today article, this second grade teacher seems to be unclear on what a "constitution" really is:

Each fall, Diana Schmiesing has her second-graders at Providence Elementary School develop their own constitution. Year after year, under Schmiesing's subtle guidance, the pupils discover that all their suggestions boil down to: respect yourself, respect others, respect our classroom. Students sign them into law in a Constitution Day ceremony.

By presenting rules as a problem-solving activity that'll help them all, Schmiesing finds that respect is something 7-year-olds understand. "It's a wonderful way of having kids vested in the class," says Schmiesing, 55.

Black's Law Dictionary, Sixth Edition, defines constitution, in relevant part, as "The organic and fundamental law of a nation or state ... prescribing the extent and manner of the exercise of sovereign powers."

A constitution is supposed to limit the power of government over the citizens. In the case of a classroom, the government is the school and the citizens are the students. Schools are run like dictatorships or monarchies, not democracies or republics.

A proper classroom constitution, therefore, would define and limit the school's power over the students.

In contrast, Schmiesing's constitution appears to limit the students' rights and abilities by defining how students are supposed to act. That gets it exactly backwards.

And, Schmiesing is a member of the 2006 All-USA Teacher Team just in case you were wondering.

(I'm going to avoid commenting on the banality of Schmiesing's Rosa Parks social studies project, but feel free to leave your two cents.)

April 13, 2007

Debate Afterthoughts


I'm back from a much needed break.

First, let me point out that I relied on three main references for my posts.

1. How Psychological Science Informs the Teaching of Reading

2. Successfully Decoding Unknown Words: What’s the Teacher’s Role?

3. Direct Instruction Reading, Fourth Edition

When I write "relied on," I mean "stole heavily from." I sacrificed lengthy quotations, block quotes, and a thousand ids. for readability. The people who wrote these articles are the experts, not me.

The How Psychological Science Informs the Teaching of Reading article is a very even-handed piece that everyone who wants to learn how humans read should take the time to go through. Reading is much more than phonological recoding and it's easy to see how teachers, like the whole language teachers, who aren't up to date on all the latest research might misunderstand what is going on when a person reads.

There are many more good articles on reading that I didn't reference due to time constraints. Some were referenced in the comments section. If anyone else has any good ones, leave a reference in the comments.

My tactic in the debate was to try to explain what is really going on when we read, rather than to get into a research pissing contest. That would not have been productive, in my opinion. Half of the debate would then have consisted of my having to point out that while much of the whole language "research" has all the trappings of research, it really isn't research at all, but someone's opinion.

I am surprised that more whole language advocates didn't chime in during the debate. They still haven't chimed in over at the tawl list. It's like they don't want to confront a viewpoint that is in opposition to their belief system. I suppose if someone told me that the laws of thermodynamics were all wrong, I'd be a little upset too.

I am not surprised how much the commenters who did chime in relied on anecdote to support their beliefs. There seems to be some mythical poor reader who is absent from the research: a reader with poor decoding skills who somehow can read if he attends to the other whole language cues. I think a good experiment would be to round up all these readers identified by whole language teachers and thoroughly test their reading ability. I am willing to bet that all of them are poor readers with underdeveloped decoding skills.

Another popular point brought up is the "what's wrong with teaching more than phonics" argument. These commenters clearly didn't read my posts. The problem with teaching these alternate cues is that they are not used by skilled readers for word identification and they confuse beginning readers. More on that later.

I think that wraps up my post debate points.

Hopefully Edspresso will fix all the links in my posts that they broke. Here they are just in case they don't.

There are still a few loose ends I want to tie up. I'll do that in a new post.


That went rather swimmingly.

I'm too beat to write any more just yet, but I still have a few loose ends to clean up. A few unanswered questions. Maybe later today, maybe tomorrow.

Looks like, with Edspresso under new management, a wrench may be thrown into the moderated comments over there, so feel free to leave your comments here.

I'll be back.

April 11, 2007

When fanboys attack

An Alfie Kohn Fanboy stopped by and left the following comment on a nearly eight month old post:

I will go ahead and admit that I am just beginning my graduate studies in education so I am not the grand master instructivist believes him/herself to be, but let me see if I can express my self more professionally than he/she did. I realize that this is a very old post, but one statement irks me to the point that I must make this probably futile reply:

"Alfie has has given us a false premise. Homework is not responsible for stress etc. It is the student's inability to do homework, no doubt because they weren't properly taught the underlying subject matter, which leads to frustration and stress."

This is so totally false that words fail me. Homework doesn't have to be impossible to create stress. If the assignments are ridiculously easy, but ridiculously long (try unending and unnecessary) as they always were in my personal experience with primary school, then I can only view them as Mr. Kohn believes them to be: a conspiracy to desensitize students to the drudgery of spending the rest of their lives in a cubicle. I'm not certain as to the validity of Kohn's data or arguments, but I am sure about yours. And as far as your statement that there is no good data pertaining to this matter, try googling Dr. Harris Cooper. Or if you like, e-mail me and I'll send you a 60 page PDF.

You sir/madam are the jackass.

Sounds like he's been sniffing too much glue making all those ed school collages.

April 10, 2007

High school failing to teach right subjects

This isn't surprising. At all.

What students learn in high school doesn't match with what they need to know as college freshmen, according to a national study released yesterday.

Professors believe high school teachers should cover fewer topics with more depth to prepare students for college. That is one of the findings of the survey by ACT, a nonprofit educational and testing organization.

But aren't college professors looking for creative students? Apparently, not.

“A really common complaint from (college) faculty is students not being able to put together a complete sentence properly,” said Erin Goldin, director of the Writing Center, which provides tutoring at Cal State San Marcos.

“When students come in here, . . . I try to explain the rules, but they don't seem to have learned the structure of a sentence.”


In writing, college instructors place more emphasis on the fundamentals – basic grammar, sentence structure and punctuation – than their high school counterparts.


Both groups agree on the critical reading skills needed to enter college. However, the survey found a general lack of reading instruction in high school. More attention to reading complex texts is needed, according to the study, not just in English and social studies, but also in math and science.

No, they just want them to be able to write a coherent sentence--something they can't do.

What about math and science? Don't college professors want students who can think outside the box and who have higher order thinking skills? Er, no.

High school teachers valued exposure to advanced math content to a greater degree than college faculty, who placed more emphasis on understanding the fundamental underlying math skills and processes.

High school teachers rated knowledge of science content as more important than understanding the science process and inquiry skills. College faculty valued the reverse.

Well, at least they agreed on a few things.

The ACT survey, which was completed by 6,568 middle and high school teachers and college faculty nationwide, showed disagreements in virtually every college-preparatory subject.

Or, not.

April 9, 2007

Phonics vs. Whole Language Debate

The first day's posts are up at Edspresso. Go check them out.

To Close Gaps, Schools Focus on Black Boys

The NYT laps up another story about a non-instructional remedy to what is fundamentally an instructional problem: the academic performance of black boys.

Instead of fixing what and how they're teaching, the Ossining Union Free school district is trying something different:

[T]he black boys at Brookside, are set apart, in a way, by a special mentoring program that pairs them with black teachers for one-on-one guidance outside class, extra homework help, and cultural activities during the school day. “All the black boys used to end up in the office, so we had to do something,” said Lorraine Richardson, a second-grade teacher and mentor. “We wanted to teach them to help each other” instead of fight each other.

The message is that the problem is "black boys." There's something about black boys that is inherently defective and causing underperformance. It can't be that the school isn't teaching properly.

Let's see what this program entails:

The special efforts for Ossining’s black male students began in 2005 with a college-preparatory program for high schoolers and, starting last month, now stretch all the way to kindergarten, with 5-year-olds going on field trips to the American Museum of Natural History and Knicks and Mets games to practice counting.

Well, at least they got the practice counting part right.

And here's some irony for you:

Ossining’s unusual programs for black boys have drawn the attention of educators across the country as school districts in diversifying suburbs are coming under new pressure to address what many see as a seemingly intractable racial divide with no obvious solution.

First of all, there is an obvious solution: teach better. But that's a solution no one is looking for. The irony is that this "unusual program" has "drawn the attention of educators across the country." That's because they're all looking for the easy solution, as opposed to something with a proven track record. And all the programs with proven track records involve instructional changes; those programs don't draw "the attention of educators across the country."

The federal No Child Left Behind law’s requirement that test scores be analyzed for each racial group has over the past decade spotlighted the achievement gap even in predominantly white suburban districts.

See? NCLB has been good for something: collecting data. Something schools are loath to do, since it shows they are failing.

Some groups have attacked the program for the wrong reasons:

“I think this is a form of racial profiling in the public school system,” said the coalition’s executive director, Michael Meyers. “What they’re doing here, under the guise of helping more boys, is they’re singling them out and making them feel inferior or different simply because of their race and gender.”

Actually, having a low performer in a mainstream class sends a steady stream of information to the student that he isn't as smart as the rest of the class. So, this isn't the problem.

Then we have this non sequitur:

At a time of wider debate over the socioeconomic barriers facing black boys, the focus on boosting educational support has gained traction with policymakers.

But elsewhere in the article we're told that the black girls aren't performing as badly as the boys. So, it's not an SES issue now, is it?

Finally we come to some words that should strike horror in anyone who follows education:

A New York Times analysis of state education data showed

That's where I had to stop.

April 8, 2007

When 100% Isn't 100%

When it's NCLB proficiency rates.

It is often claimed that 100% of students must be proficient under NCLB. Frequently, this claim is quickly followed by the assertion of what an impossible task this is.

Of course, this is and has been a giant lie from the get-go, as this Ed Week article makes clear.

The tests may also allow some schools to make adequate yearly progress under the No Child Left Behind Act when they had not before. Up to 2 percent of students’ proficient and advanced scores on these particular tests, which the department calls “alternate assessments based on modified achievement standards,” may be counted when measuring AYP. Two percent of all students is equivalent to about 20 percent of students with disabilities.

The Education Department also allows up to 1 percent of all students in a state—equivalent to 10 percent of students with disabilities—to take a different type of alternate assessment and be counted as proficient for purposes of AYP. Those tests, which are the ones used with students with significant cognitive impairments, are less complex and comprehensive.

Get that? 2% can take a modified assessment (blind kids can take the test in braille, etc.) and 1% can take an alternate assessment. Not to mention that 5% of students can be absent on test day. And there's even talk of raising the 1% alternate-assessment rate to 3%.

So, the reality is that under NCLB 100% can be as low as 92% (and perhaps soon 90%).
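The arithmetic behind that 92% figure is worth laying out explicitly. A minimal sketch, using the caps and allowances described above:

```python
# Percentages of all students that NCLB rules can carve out of the
# "100% proficient" requirement, per the figures discussed above.
modified_assessment = 2   # cap for modified achievement standards
alternate_assessment = 1  # cap for significant cognitive impairments
absent_on_test_day = 5    # allowable absences on test day

effective_floor = 100 - (modified_assessment
                         + alternate_assessment
                         + absent_on_test_day)
print(effective_floor)  # 92

# If the alternate-assessment cap were raised from 1% to 3%:
print(100 - (modified_assessment + 3 + absent_on_test_day))  # 90
```

So the "100%" mandate, applied to the students who actually take the standard test and show up, bottoms out at 92% — and possibly 90%.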

Keep this in mind when you hear educrats mention how onerous NCLB is.

April 6, 2007

The phonics vs whole language debate

I'll be partaking in the great phonics vs. whole language debate of aught seven starting Monday at Edspresso.

Ostensibly, I'll be defending "phonics" while Nancy Creech will be defending "whole language." At least that's how it's being billed.

However, in actuality, I'll be defending comprehensive reading instruction with a code-based emphasis, and I suspect Nancy will be defending comprehensive reading instruction with a meaning-based emphasis.

No one seriously argues for a no-phonics approach anymore (that view has been thoroughly discredited). Nowadays, everyone claims to be teaching phonics. The real question comes down to where the proper emphasis should be placed in beginning reading instruction: code-emphasized instruction or meaning-emphasized instruction.

Likewise, no one seriously argues for a phonics only approach:

Consider phonics. It is treated as something of a panacea, but how far could phonics take the learner? The idea is that one letter (or sound combination) makes one sound. This works for “Nan had a bad cat,” but it starts falling apart if “Nan had a bad day,” or if “Nan and the other children saw a show,” and still farther if “Nan and a friend were walking to school.” The letter o makes seven different sounds in common words. How does the program solve the problem of carefully teaching words with all these sound variations and teaching all the common irregular words? Solving this problem is not only far more challenging than introducing “a is for apple” but it must be addressed thoroughly and early in the first level of the program.

From Science Versus Basic Educational Research.

Complicating the problem is the fact that a fair portion of high performing students will generally learn to read regardless of where the emphasis is placed. In fact, a tiny fraction of children learn with no formal reading instruction at all and an equally tiny fraction of kids can learn to read visually, without learning the code. However, the majority of children need formal reading instruction involving a phonics component. The question is how best to teach it.

Hopefully, that'll be the real subject of next week's debate.

Update: I see that the debate has attracted the attention of the Reading Reform Foundation across the pond. Dick Shutz doesn't think I know what the alphabetic principle is. You'll have to wait until next week to see if I do or not. But, I don't think Dick appreciates the point Engelmann was making in the quote above.

The English writing system represents a trade-off between phonological explicitness and morphological transparency. For example, in a fully explicit system the letter a would be associated with a single vowel phoneme, such as the short a in fat and would use a different symbol for the vowel in fate. There is a cost for this explicitness, however, in that this would obscure the morphological relationships between words. So, the use of the symbol a to represent two different phonemes in "nature" and "natural" may be confusing as a guide to pronunciation, but it serves to remind the reader that the two words are morphologically related. This trade-off occurs repeatedly in English and serves to confuse naive readers to no end. Engelmann understands the difficulties this trade-off presents in designing an instructional sequence; Dick apparently does not.

See this video (quicktime; the most relevant portion begins at 10:20 in the video) for more on the need for carefully designed instructional sequences for teaching beginning reading to low performers.

Update II: I feel like Caesar at Alesia. Another RRFer, KenM, claims that I'm egregiously quoting Engelmann out of context. Engelmann introduced the quote with "There is a lot more to reading instruction than the categories that are currently popular— phonemic awareness, phonics, text decoding, and comprehension" which is exactly the proposition I am using the quote for: that there is more to effective reading instruction than just phonics.

Dick now claims that I don't recognize "that there is more than one legitimate instructional architectural orientation than the one that Zig has elaborated." Of course there are. There are good explicit systematic phonics instructional sequences, just as there are bad ones. A typical explicit systematic instructional sequence, on average, performs better than a typical whole language or balanced literacy sequence, but still leaves a fair number of poor readers. Phonics is not a panacea.

Apparently, I can also add phonemes and morphemes to the list of things that I don't know, according to Dick. Odd, though, that he's failed to back up any of these bold claims. I suspect that Rayner, Foorman, Perfetti, Pesetsky, and Seidenberg may disagree with Dick's assessment.

N.B.: I've corrected the spelling of Dick's name.

That's a feature, not a bug

This BBC News article worries that:

With lessons geared towards assessment, children are bored from the moment they begin formal schooling, the Association of Teachers and Lecturers warned.

Traditional play with sand and water was being replaced with work, it added.

Er, yeah. That's why we call it school and not play. And at-risk kids really need to spend as much time as possible catching up to their middle-class peers, as soon as possible.

The sad reality is that there is sufficient time to teach all the academics at-risk kids need to learn and still have plenty of time left over for music, art, recess, and fun. The problem is that many schools don't know how to effectively teach these kids in the first place, so the push is to spend increased amounts of time in poorly taught academic classes or "test prep" classes.

She said: "Pressure is now put on Year 1 teachers to prepare children for tests by removing sand, water, role-play etc and replacing with work space."

This was a "good model for how to switch children off and create failure," she said.

Actually, the object is to educate children. Educated children should be able to pass simple tests which measure what they've learned. And, the best model "for how to switch children off and create failure" is to fail to educate them well.

April 5, 2007

post hoc research

Slavin's Effective Programs in Elementary Mathematics: A Best-Evidence Synthesis has a good discussion of the perils of post hoc research, a design that plagues education research:

Garden-variety selection bias is bad enough in experimental design, but many of the studies suffer from design features that add to concerns about selection bias. In particular, many of the curriculum evaluations use a post-hoc design, in which a group of schools using a given program, perhaps for many years, is compared after the fact to schools that matched the experimental program at pretest or that matched on other variables, such as poverty or reading measures. The problem is that only the “survivors” are included in the study. Schools that bought the materials, received the training, but abandoned the program before the study took place are not in the final sample, which is therefore limited to more capable schools. As one example of this, Waite (2000), in an evaluation of Everyday Mathematics, described how 17 schools in a Texas city originally received materials and training. Only 7 were still implementing it at the end of the year, and 6 of these agreed to be in the evaluation. We are not told why the other schools dropped out, but it is possible that the staffs of the remaining 6 schools may have been more capable or motivated than those that dropped the program. The comparison group within the same city was likely composed of the full range of more and less capable school staffs, and they presumably had the same opportunity to implement Everyday Mathematics but chose not to do so. Other post-hoc studies, especially those with multi-year implementations, must have also had some number of dropouts, but typically do not report how many schools there were at first and how many dropped out. There are many reasons schools may have dropped out, but it seems likely that any school staff able to implement any innovative program for several years is a more capable, more reform-oriented, or better-led staff than those unable to do so, or (even worse) than those that abandoned the program because it was not working. 
As an analog, imagine an evaluation of a diet regimen that only studied people who kept up the diet for a year. There are many reasons a person might abandon a diet, but chief among them is that it is not working, so looking only at the non-dropouts would bias such a study.

Worst of all, post-hoc studies usually report outcome data selected from many potential experimental and comparison groups, and may therefore report on especially successful schools using the program or matched schools that happen to have made particularly small gains, making an experimental group look better by comparison. The fact that researchers in post-hoc studies often have pre- and posttest data readily available on hundreds of potential matches, and may deliberately or inadvertently select the schools that show the program to best effect, means that readers must take results from after-the-fact comparisons with a grain of salt.

Finally, because post-hoc studies can be very easy and inexpensive to do, and are usually contracted for by publishers rather than supported by research grants or done as dissertations, such studies are likely to be particularly subject to the “file drawer” problem. That is, post-hoc studies that fail to find expected positive effects are likely to be quietly abandoned, whereas studies supported by grants or produced as dissertations will almost always result in a report of some kind. The file drawer problem has been extensively described in research on meta-analyses and other quantitative syntheses (see, for example, Cooper, 1998), and it is a problem in all research reviews, but it is much more of a problem with post-hoc studies.

Out of the 87 studies on elementary math programs that Slavin found to be sufficiently scientific, 38% were post hoc designs. Notably, all the "positive" research for the NSF-funded constructivist math programs came from post hoc designs. Moreover, in almost all cases, these post hoc studies yielded educationally insignificant effect sizes, i.e., less than 0.25 sd. The same is true for most of the major math textbook curricula.

April 1, 2007

Effective Mathematics Instruction: The Importance of Curriculum

I found a nice little study comparing a fourth grade Direct Instruction math program with a well-regarded fourth grade constructivist program. The results were surprising, to say the least.

The study, Effective Mathematics Instruction: The Importance of Curriculum (Crawford and Snider, 2000, Education & Treatment of Children), compared the Direct Instruction math curriculum Connecting Math Concepts (CMC, Level D) to the constructivist fourth grade math curriculum Invitation to Mathematics (SF), published by Scott Foresman.

Invitation to Mathematics (SF)

SF has a spiral design (but of course) and relies on discovery learning and problem-solving strategies to "teach" concepts. The SF text included chapters on addition and subtraction facts, numbers and place value, addition and subtraction, measurement, multiplication facts, multiplication, geometry, division facts, division, decimals, fractions, and graphing. Each chapter in the SF text interspersed a few activities on using problem-solving strategies. Teacher B taught the 4th grade control class. He was an experienced 4th grade math teacher and had taught using the SF text for 11 years.

Teacher B's math period was divided into three 15-minute parts. First, students checked their homework as B gave the answers. Then students told B their scores, which he recorded. Second, B lectured or demonstrated a concept, and some students volunteered to answer questions from time-to-time. The teacher presentation was extemporaneous and included explanations, demonstrations, and references to text objectives. Third, students were assigned textbook problems and given time for independent work.

The SF group completed 10 out of 12 chapters during the experiment.

Connecting Math Concepts (CMC)

CMC is a typical Direct Instruction program with a stranded design in which multiple skills/concepts are taught in each lesson; each skill/concept is taught for about 5-10 minutes per lesson and is revisited day after day until it has been mastered. Explicit instruction is used to teach each skill/concept. CMC included strands on multiplication and division facts, calculator skills, whole number operations, mental arithmetic, column multiplication, column subtraction, division, equations and relationships, place value, fractions, ratios and proportions, number families, word problems, geometry, functions, and probability. Teacher A had 14 years of experience teaching math. She had no previous experience with CMC or any other Direct Instruction program. She received 4 hours of training at a workshop in August and about three hours of additional training from the experimenters.

Teacher A used the scripted presentation in the CMC teacher presentation book for her 45 minute class. She frequently asked questions to which the whole class responded, but she did not use a signal to elicit unison responding. If she got a weak response she would ask the question again to part of the class (e.g., to one row or to all the girls) or ask individuals to raise their hands if they knew the answer. There were high levels of teacher-pupil interaction, but not every student was academically engaged. Generally, one lesson was covered per day and the first 10 minutes were set aside to correct the previous day's homework. Then a structured, teacher-guided presentation followed, during which the students responded orally or by writing answers to the teacher's questions. Student answers received immediate feedback and errors were corrected immediately. If there was time, students began their homework during the remaining minutes.

The CMC group completed 90 out of 120 lessons during the experiment.

The Experiment

Despite the differences in content and organization, both programs covered math concepts generally considered to be important in 4th grade--addition and subtraction of multi-digit numbers, multiplication and division facts and procedures, fractions, and problem solving with whole numbers.

Students were randomly assigned to each 4th grade classroom. The classes were heterogeneous and included the full range of abilities including learning disabled and gifted students. There were no significant pretest differences between students in the two curriculum groups on the computation, concepts and problem solving subtests of the NAT nor on the total test scores. Nor did any significant pretest differences show up on any of the curriculum-based measures.

The Results

Students did not use calculators on any of the tests.

The CMC Curriculum Test

For the CMC measure the experimenters designed a test that consisted of 55 production items for which students computed answers to problems, including both computational and word problems. The CMC test was comprehensive as well as cumulative; problems were examples of the entire range of problems found in the last quarter of the CMC program. Problems were chosen from the last quarter of the program because the various preskills taught in the early part of the program are integrated in problem types seen in the last quarter of the program.

The results here were not surprising, although the magnitude of the difference between the two groups may be.

The SF class averaged 15 out of 55 (27%) correct answers on the posttest, up from 7 out of 55 correct on the pretest. The CMC class averaged 41 out of 55 (75%) correct on the posttest, up from 6 out of 55 correct on the pretest. I calculated the effect size to be 3.25 standard deviations, which is enormous, though the test was biased in favor of the CMC students.
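For readers who want to check the arithmetic: effect sizes of this kind (Cohen's d) are the difference between group means divided by a pooled standard deviation. The excerpt above doesn't report standard deviations or exact group sizes, so the numbers below are purely hypothetical, chosen only to illustrate how a figure like 3.25 arises:

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d: difference in group means over the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)

# Hypothetical inputs (the study excerpt gives means of 41 vs. 15 but no SDs;
# the SDs and ns here are invented for illustration):
d = cohens_d(mean_t=41, sd_t=8, n_t=25, mean_c=15, sd_c=8, n_c=25)
print(round(d, 2))  # → 3.25 (a 26-point gap over a pooled SD of 8)
```

Anything above roughly 0.25 sd is treated as educationally significant in the Slavin synthesis discussed above, which puts 3.25 far off the usual scale.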

The SF Curriculum Test

The SF test was published by Scott, Foresman to go along with the Invitation to Mathematics text and was the complete Cumulative Test for Chapters 1-12. It was intended to be comprehensive as well as cumulative. The SF test consisted of 22 multiple-choice items (four choices) which assessed the range of concepts presented in the 4th grade SF textbook.

The SF class averaged 16 out of 22 (72%) correct answers on the posttest, up from 4 out of 22 correct on the pretest. However, surprisingly, the CMC class averaged 19 out of 22 (86%) correct on the posttest, up from 3 out of 22 correct on the pretest. I calculated the effect size to be 0.75 standard deviations, which is large, even though the test was biased in favor of the SF students.

The Math Facts Test and the NAT Exam

The CMC group also scored significantly higher on rapid recall of multiplication facts. Of 72 items, the mean correctly answered in 3 minutes for the CMC group was 66 compared to 48 for the SF group for the multiplication facts posttest. I calculated the effect size to be 1.5 sd.

Posttest comparisons on the computation subtest of the NAT indicated a significant difference in favor of the CMC group. Effect size = 0.86. On the other hand, neither the scores for the concepts and problem-solving portion of the NAT nor the total NAT showed any significant group differences. The total NAT scores put the CMC group at the 51st percentile and the SF group at the 46th percentile, but this difference was not statistically significant.


The CMC implementation was less than optimal, yet it still achieved significantly better performance gains compared to the constructivist curriculum. The experimenters noted:

We believe this implementation of CMC was less than optimal because (a) students began the program in fourth grade rather than in first grade and (b) students could not be placed in homogeneous instructional groups. A unique feature of the CMC program is that it's designed around integrated strands rather than in a spiraling fashion. Each concept is introduced, developed, extended, and systematically reviewed beginning in Level A and culminating in Level F (6th grade). This design sequence means that students who enter the program at the later levels may lack the necessary preskills developed in previous levels of CMC. This study with fourth graders indicated that even when students enter Level D, without the benefit of instruction at previous levels, they could reach higher levels of achievement in certain domains. However, more students could have reached mastery if instruction were begun in the primary grades.

Another drawback in this implementation had to do with heterogeneous ability levels of the groups. Heterogeneity was an issue for both curricula. However, the emphasis on mastery in CMC created a special challenge for teachers using CMC. To monitor progress CMC tests are given every ten lessons and mastery criteria for each skill tested are provided. Because of the integrated nature of the strands, students who do not master an early skill will have trouble later on. Unlike traditional basals, concepts do not "go away," forcing teachers to continue to reteach until all students master the skills. This emphasis on mastery created a challenge for teachers that was exacerbated in this case by the fact that students had not gone through the previous three levels of CMC.

Why didn't the CMC gains show up on the NAT problem solving subtest and total math measure? The experimenters opine:

Our guess is that a more optimal implementation of CMC would have increased achievement in the CMC group, which may have shown up on the NAT. In general, the tighter focus of curriculum-based measures such as those used in this study makes them more sensitive to the effects of instruction than any published, norm-referenced test. Standardized tests have limited usefulness for program evaluation when the sample is small, as it was in this study (Carver, 1974; Marston, Fuchs, & Deno, 1985). Nevertheless, we included the NAT as a dependent measure because it is curriculum-neutral. The differences all favored the CMC program.

That no significant differences occurred either between teachers or across years on the NAT should be interpreted in the light of several other factors. One, the results do not indicate that the SF curriculum outperformed CMC, only that the NAT did not detect a difference between the groups, despite the differences found in the curriculum-based measures. Two, performance on published norm-referenced tests such as the NAT are more highly correlated to reading comprehension scores than with computation scores (Carver, 1974; Tindal & Marston, 1990). Three, the NAT concepts and problem solving items were not well-aligned with either curriculum. The types of problems on the NAT were complex, unique, non-algorithmic problems for which neither program could provide instruction. Performance on such problems has less to do with instruction than with raw ability. Four, significant differences on the calculation subtest of the NAT favored the CMC program during year 1 (see Snider and Crawford, 1996 for a detailed discussion of those results). Because less instructional time is devoted to computation skills after 4th grade, the strong calculation skills displayed by the CMC group would seem to be a worthy outcome. Five, although the NAT showed no differences in problem solving skills between curriculum groups or between program years, another source of data suggests otherwise. During year 1, on the eight word problems on the curriculum-based test, the CMC group outscored the SF group with an overall mean of 56% correct compared to 32%. An analysis of variance found this difference to be significant...

And here's the kicker: the high-performing kids liked the highly-structured Direct Instruction program better than the loosey-goosey constructivist curriculum:

Both teachers reported anecdotally that the high-performing students seemed to respond most positively to the CMC curricula. One of Teacher A's highest performing students, when asked about the program, wrote, "I wish we'd have math books like this every year.... it's easier to learn in this book because they have that part of a page that explains and that's easier than just having to pick up on whatever."

It may be somewhat counter-intuitive that an explicit, structured program would be well received by more able students. We often assume that more capable students benefit most from a less structured approach that gives them the freedom to discover and explore, whereas more didactic approaches ought to be reserved for low-performing students. It could be that high-performing students do well and respond well to highly-structured approaches when they are sufficiently challenging. These reports are interesting enough to bear further investigation after collection of objective data.