May 6, 2008

Interim Reading First Study

Methodological deficiencies notwithstanding, I'm not sure why anyone is surprised that the interim Reading First study seems to be showing null results.

Reading First was a product of political compromise. Instead of limiting grants to research-validated reading curricula, Reading First was watered down to permit reading programs "based on" scientifically based reading research. In reality, all this meant was that publishers needed to provide curricula appearing to have "explicit and systematic instruction in phonemic awareness, phonics, vocabulary development, reading fluency, and reading comprehension strategies."

Reading Recovery is the antithesis of this approach to teaching reading. Yet look at what those clowns managed to show (pdf).

Designing effective reading instruction targeted to at-risk kids requires an orchestration of minute details and variables.

Throwing a few disjointed phonics exercises into your previous whole language program is not effective instruction, though you'd likely be able to make a case that it falls within the statutory language of "explicit systematic instruction in ... phonics" since that undefined term is all but meaningless. Oh sure, such instruction is going to be successful with many "advantaged" kids, but a broken clock is correct twice a day too, and we don't say that such a clock keeps good time.

Let's be honest, many "real" phonics programs only perform marginally better than phony phonics programs. Phonics is not a magic wand that can be waved over a reading program and make it effective instruction. Phonics is a tool. A tool that can be wielded many ways, only some of which are effective with "at-risk" children. And phonics is only one smallish part of an effective reading program.

Prior to Reading First the major education publishers were not exactly cranking out high quality instructional programs--nor were they known for their ability to design effective instruction. Then an opportunity came along to grab a larger share of the reading curricula market by putting out a product that would be selected by all those schools with Reading First grants. All that needed to be done was to redesign your program so that it appeared to comply with the undefined statutory language of Reading First. Not exactly a Herculean task.

There are an infinite number of ways to design a reading program that complies with the Reading First statute. The probability of any of the major publishers stumbling upon one of the few effective combinations is pretty slim, especially considering their previous track record. And the probability of a school selecting one of these newly-cranked-out reading programs from one of the major publishers and seeing improved results is similarly slim.

Then there are other problems.

Even if a publisher did stumble upon an effective program, the chances that a school would actually implement it with fidelity are slim as well. There's a reason why the few reading programs that have been validated by research tend to be scripted: without the scripts, schools would screw them up.

And a generic reading comprehension test, selected as your measure of achievement, is mostly going to test students' IQ/SES and the amount of background knowledge they've acquired, which arguably has little to do with reading ability.

Add all this up and the only result you'd expect from evaluating Reading First schools as a whole is a null result. That's what the interim study appears to have found. Lest you forget, that's what Project Follow Through found as well. Most of the Follow Through schools failed to achieve positive results; in fact, almost all of them showed negative results. Reading First is only somewhat less of a failure than Project Follow Through. But that's only if you look at the programs as a whole.

It's a fair bet that when you look at individual reading programs, some of the Reading First programs will show significant positive results. One of the Project Follow Through programs showed positive results.

In education we expect a preponderance of losers. Education is not yet a mature profession. It's not even a profession. We're not going to see improvement until we make a concerted effort to separate the winners from the losers, scrap the losers, fund the winners, and find effective means of identifying and developing new winners. The Federal Government has already attempted to go down this route twice and failed both times. The Democrats tried it with Project Follow Through and the Republicans tried it with Reading First. In both cases political forces overwhelmed and weakened the attempts, returning us to the status quo.

I expect the same outcome with Reading First. Some winner might be identified in the final report, but that outcome will be overwhelmed by the overall failure of the program as a whole. History will no doubt repeat itself the next time we spend lots of money on a fancy grant program. You can count on that.

You're kidding yourself if you think there will be a governmental/political solution to our education woes. That's not the way the world works. It works the other way.

7 comments:

Anonymous said...

Is there a list of curricula approved for Reading First?

Reid Lyon has an interesting interview about the Reading First study.

KDeRosa said...

No, and most schools didn't even specify programs in their applications.

I linked to Reid's article in the post.

TurbineGuy said...

I searched the report data, or what was posted of it, to see if there was a breakout by program, but no dice.

There was mention of some significant variability between programs, though; it would be nice to see which ones performed well.

Anonymous said...

The report would more accurately be titled the "Experimental Design First Study." Neither the statutory specs for the RF evaluation nor the moneys appropriated by Congress can be faulted.

The report states that the evaluation was initially to be based on "real reading" (individual kids reading continuous texts), and each of the "5 essentials" was to be used as an indicator, rather than a measure of g/SES labeled "comprehension."

But when the WiseOnes found that the ideological "gold standard" of a randomized control design wasn't feasible (it's never feasible in any consequential instructional matter), they settled for the "next best" regression discontinuity design.
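
For readers who haven't met the design: a regression discontinuity study assigns treatment by a cutoff on some rating (schools on one side of the cutoff get funded, the rest don't) and estimates impact as the jump in outcomes right at the cutoff. Here is a minimal sketch in Python; the cutoff, bandwidth, and every number are invented for illustration and are not taken from the RFIS:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: each school has a "need" rating, and schools at or
    # above a cutoff receive funding. The outcome varies smoothly with the
    # rating; the true funding effect is set to 0 to mimic a null result.
    n, cutoff, effect = 2000, 50.0, 0.0
    rating = rng.uniform(0, 100, n)
    funded = rating >= cutoff
    score = 20 + 0.3 * rating + effect * funded + rng.normal(0, 5, n)

    # Local linear fit on each side, restricted to schools near the cutoff;
    # the impact estimate is the gap between the two fitted lines at the
    # cutoff itself.
    near = np.abs(rating - cutoff) <= 10.0

    def fitted_score_at_cutoff(side):
        mask = near & side
        slope, intercept = np.polyfit(rating[mask] - cutoff, score[mask], 1)
        return intercept  # value of the fitted line exactly at the cutoff

    impact = fitted_score_at_cutoff(funded) - fitted_score_at_cutoff(~funded)
    print(f"estimated impact at the cutoff: {impact:+.2f}")

Note what the design estimates: the jump attributable to crossing the funding threshold, nothing more. If "Reading First" isn't a homogeneous treatment, that jump averages over whatever mix of programs the funded schools happened to buy, which is exactly the objection below.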

The report does a decent job of describing the workings of the design in non-tech terms. What the WiseOnes overlooked is that "Reading First" does not constitute a homogeneous treatment/independent variable, and the Stanford comprehension test does not constitute an adequate dependent variable/measure of reading expertise.

Expectations that the "Final Report" will pull rabbits out of the hat are futile. The data were "in" in Spring 2007. The only "new" data not reported will relate to the Test of Silent Word Reading Fluency (TOSWRF) administered in Grade 1 in Spring 2007.

What all the commentators have missed is that if you cut through all the arcane ceremony, kids in both the experimental and control groups were, on average, below the mean on the comprehension test at all points. That stat has to be taken with a good deal of windage, but it indicates that there are a substantial number of kids who can't read/are being left behind.

As Ken has made clear, programs DO differ in terms of effectiveness. It's not fair, however, to blame politicians for the failure to recognize this fact. In both Follow Through and RF the responsibility lies with the WiseOnes who carried out the evaluations. The scary thing is that this capability hasn't improved in several decades. One could even argue that it has declined. The FT evaluation was shoddy, but compared to the RF evaluation it looks pretty good.

Stuart Buck said...

If you check out page 44 of the actual study, the Reading First schools spent only 3.9 extra minutes per day on phonics, and 2.3 to 5.3 extra minutes on comprehension. These extra minutes, interestingly, were more than offset by a REDUCTION in time spent on vocabulary (7.8 to 11.6 minutes), fluency (about 4 minutes), and phonemic awareness.

I'm not aware of any phonics advocate who has said, "Hey, phonics is of such magical effect that if you just spend 3.9 minutes a day on phonics, and subtract 15 minutes of vocabulary and fluency instruction, you'll be much better off."
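
A quick back-of-the-envelope check of those figures, sketched in Python; taking the midpoints of the reported ranges is my own simplification, not something the study does:

    # Differences in daily instructional minutes (page 44 of the interim
    # study, as quoted above); midpoints of the ranges are assumed.
    gained = 3.9 + (2.3 + 5.3) / 2     # phonics + comprehension
    lost = (7.8 + 11.6) / 2 + 4.0      # vocabulary + fluency
    print(gained - lost)               # about -6.0 minutes per day, net

So even before counting the unquantified reduction in phonemic awareness time, the net change in daily reading instruction comes out around six minutes in the wrong direction.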

Anonymous said...

Same old, same old story is told. Administer a test to determine student ranking, and then an intervention is hailed as a way to make everyone equal. Lots of money is spent and the results remain the same: some people are smarter than others.

Reid Lyon said...

I am sure this is more than anyone wants to know or cares about, but some details about the RFIS are in order. The RFIS was designed to measure the extent to which a specific funding stream, in the form of Reading First funding, impacted reading comprehension. The impact of the Reading First funding was addressed by comparing eligible RF schools that received Reading First money with eligible schools that did not receive Reading First money. The RFIS IS NOT an experiment to test the efficacy of the intervention package defined by RF. It's an impact evaluation of a treatment (THE GIVING OF MONEY) in the setting of an effectiveness trial. In an effectiveness study, the "control" is not controlled, nor is the treatment. The study team was not able to prescribe any behaviors on the part of the comparison schools other than compliance with testing of students and observation of instruction. For this type of question (funding versus no funding), the regression discontinuity design the evaluators used was entirely appropriate.

But it is possible, if not probable, that the funding of Reading First eligible schools caused changes in non-Reading First schools (the comparison group) that were not anticipated. For example, we know from state Reading First evaluation reports that some eligible RF schools not receiving funding implemented similar professional development and instruction programs as the funded schools did. They may have received, and many did receive, additional state/district funding to do so (more on this later). So the assumption that the eligible non-funded RF schools would continue doing what they were always doing is not valid in many cases.

It is critical to understand that the RFIS did not examine the specific effects of programs, materials, or the impact of professional development, etc., on reading outcomes. Answers to these questions would have been more informative in an impact study that was designed to look at variance in treatment effects. The RFIS was supposed to do this among many other analyses, but it did not. It is possible that some data on program-specific effectiveness with better comparisons will be produced in the final report, but the current design and scope of the study make this doubtful.
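
A toy illustration of that contamination point (every number here is invented): if most of the eligible-but-unfunded comparison schools adopt the same practices anyway, a funded-versus-unfunded comparison will understate the effect of the practices themselves, even when those practices work.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000

    # Invented numbers: the instructional changes are worth +3 points, but
    # 80% of unfunded comparison schools adopt the same changes anyway,
    # paid for out of state/district money.
    funded = rng.random(n) < 0.5
    adopted = funded | (rng.random(n) < 0.8)
    score = 20 + 3.0 * adopted + rng.normal(0, 5, n)

    # The funding contrast picks up only the extra adoption that funding
    # caused: roughly 0.2 * 3 = 0.6 points, nowhere near the true +3.
    print(score[funded].mean() - score[~funded].mean())

A null result on the funding contrast is therefore fully compatible with the adopted practices having a real effect.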

But apparently the education press reporting on the study did not understand this. Toppo from USA Today led out with "Study: Bush's Reading First program ineffective," without explaining that the study only examined the impact of a funding stream and not the specific programs being purchased with it. He goes on to write, "Advocates of Reading First, an integral part of the 2002 No Child Left Behind law, have long maintained that its emphasis on phonics, scripted instruction by teachers and regular, detailed analyses of children's skills would raise reading achievement, especially among the low-income kids it targets. But the new study by the U.S. Education Department's Institute of Education Sciences (IES) shows that children in schools receiving Reading First funding had virtually no better reading skills than those in schools that didn't get the funding." Unfortunately, the RFIS, as designed, is not capable of examining whether "scripted" phonics instruction had a differential impact on reading comprehension. Somehow, he forgets this critical feature while at the same time reverting back to his obsession with phonics as synonymous with Reading First (Greg, please read the darn legislation). Sam Dillon, reporting in the N.Y. Times, leads out with "An Initiative on Reading Is Rated Ineffective" without explaining that the RFIS examined the impact of funding rather than the impact of instructional programs, assessments, professional development programs and the like. Why is this a problem? Because he goes on to associate the null findings reported in the Interim Report with statements from Higgins, Kennedy, and Miller that allude to publishers and programs. But specific programs, no matter who published them, were not evaluated for effectiveness.
So why did the RFIS not evaluate the impact of what was transpiring in schools and classrooms that received Reading First funding (other than the amount of time spent in instruction by reading component)? I can only guess at this point. First, it does not appear that IES or the contractors actually examined the legislative language that required the evaluation of the Reading First program. Had they done so, this is what they would have seen:
The evaluation shall (meaning must) conduct:
(1) An analysis of the relationship between each of the essential components of reading instruction and overall reading proficiency.
(2) An analysis of whether assessment tools used by State educational agencies and local educational agencies measure the essential components of reading.
(3) An analysis of how State reading standards correlate with the essential components of reading instruction.
(4) An analysis of whether the receipt of a targeted assistance grant under section 1204 results in an increase in the number of children who read proficiently.
(5) A measurement of the extent to which specific instructional materials improve reading proficiency.
(6) A measurement of the extent to which specific screening, diagnostic, and classroom-based instructional reading assessments assist teachers in identifying specific reading deficiencies.
(7) A measurement of the extent to which professional development programs implemented by State educational agencies using funds received under this subpart improve reading instruction.
(8) A measurement of how well students preparing to enter the teaching profession are prepared to teach the essential components of reading instruction.
(9) An analysis of changes in students' interest in reading and time spent reading outside of school.
(10) Any other analysis or measurement pertinent to this subpart that is determined to be appropriate by the Secretary.

Second, given that the recruitment of contractors and their planning of the evaluation was delayed for unknown reasons, there was probably not enough time to carry out the tasks required in the evaluation (above). Apparently, then, the narrow question addressing the impact of Reading First funding, while an important part of the evaluation, was addressed in isolation. Note that the delay in starting the evaluation was a concern expressed early on by staff from the House Education and the Workforce Committee, a concern expressed in documents sent to the Secretary of Education (Paige) and in face-to-face meetings with IES and the contractors.

Third, in discussing the RFIS with people working on the evaluation, some were under the impression that the current study was the best that could be done given the resources at hand. One advisor stated that he was literally shocked to learn (1) that the congressional intent was to address tasks 1 through 10 above, and (2) that the Department had been allocated $150 million ($25 million per year) to address the evaluation tasks in detail. Apparently neither the advisors to the study nor the contractors were provided the legislative language articulating the scope of the required evaluation, nor were they apprised of the resources available, which, by the way, were sufficient to carry out the mother of all evaluations. I am not sure, but I believe approximately $30 million was expended for the current RFIS. It seems that resources may have been thought to be an issue, given the sampling strategies employed and the absence of analyses addressing the majority of tasks specified in the legislation.

Not only did the education press not report these issues, they were remarkably silent on what appears to be a significant problem with the study, at least as reported in the Interim Report. While there are several confounds that limit interpretation of the data presented in the RFIS Interim Report, a hefty one is the lack of control over what was taking place in eligible funded Reading First schools and eligible, but not funded, Reading First schools. A major problem is that the funded schools and the non-funded schools were doing the same thing in many cases. Tim Shanahan, an advisor to the RFIS and one deeply familiar with not only the current study but previous implementation studies, has explained this clearly in a Q and A with Eduflack. Rather than summarize, it is important to look at the details, so here is the interview:


EDUFLACK: What does the IES study really say? How strong are the findings?
SHANAHAN: The implementation studies indicate that the differences between RF and non-RF schools were pretty modest (about 50 minutes of difference per week in amount of instruction), meaning that RF kids probably received fewer than 30 hours of additional reading instruction each year due to the intervention. Clearly a modest intervention, especially given the similarities in curriculum, instructional materials, professional development, and assessments.
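
(Making the arithmetic behind the 30-hour figure explicit, and assuming a roughly 36-week school year, which is my assumption rather than Shanahan's: 50 minutes/week x 36 weeks = 1,800 minutes, or about 30 hours per year.)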

Q: How valid are the findings, knowing there may be contamination across groups (that both the RF and non-RF groups may have been doing the same things in the classroom)?

A: Most schools employ some kind of commercial core program. When Reading First emphasized the adoption of programs with certain designs, all major publishers changed their designs to match the requirements.

Reading First schools all bought new programs in year 1; almost all other Title I schools adopt new core programs every four or five years. That means in year 1, 100% of the RF schools got a new program, and 25% of the other schools did. In year 2, that number went to 50%; in year 3, 75%. All RF schools hired coaches in year 1; so did more than 80% of the other schools. Etc.

This isn't a case of spot contamination; it was intentional and pervasive. (In fact, it was part of the RF law itself: 20% of the state money, meaning $1 billion total, was devoted to getting non-Reading First schools to adopt these reforms.)
Q: Given that contamination, are there contamination rates that can be tolerated in the design? For example, let’s say 15 percent of the RF and comparison groups received identical programs/PD. Is this level of contamination tolerable? What if there is a 30 percent overlap – is this level tolerable? Are there ways to estimate the degree to which percent contamination will indicate a need to increase sample size?
A: The percentages of overlap were 75-100% depending on the variable. The only one where we have any kind of idea about what is tolerable is with time.
From past research, one suspects that 100 hours of additional instruction would have a high likelihood of generating a learning difference, and a 50-60 hour difference would still have a reasonable chance of resulting in a difference. At 25-30 hours a small difference in learning might be obtained, but it is much less likely (especially if the curricula were the same).

Q: Did the evaluation design include procedures/strategies to avoid contamination between RF and the comparison group?

A: It [the IES study] not only did not try to avoid contamination, it couldn't possibly do it since the sources of the contamination were so pervasive. First, the federal policy explicitly called for such contamination to be pushed. Second, states and local districts made their own choices (and they felt enticed or pressured to match RF).

For example, Syracuse, NY received Reading First money for some schools, but mandated that all of its schools adopt the same policies and programs. There should have been no differences between RF and non-RF schools in Syracuse; the only difference would be in funding stream (how the changes were paid for), as the non-RF schools attended the same meetings and trainings, adopted the same books and assessments, received the same coaching, put in place the same policies, etc.
Q: Did the evaluation design describe practices in the comparison groups?
A: Yes, the implementation studies show the similarities in practices and how, over time, the practices that were similar at the beginning became increasingly similar each year. That will be clearer in the next study out.
Q: Did the evaluation design account in any way for contamination, crossover, compensatory rivalry, etc.?
A: No. The federal law called for the evaluation of Reading First in terms of the effectiveness of the instructional model, but did not call for a study of the impact of Reading First upon the entire educational system.

Even though I had personally made a big deal out of the problem from the very first study design meeting, the methodologists thought they could handle my problem simply by accounting for the RF rollout each year. Their assumption was that RF would implement some changes in year 1, others in year 2, and still others in year 3, and that this pattern of implementation would allow them to examine a continuing lag between the RF and non-RF schools.

I didn't understand that they were thinking that, and they never asked directly about that. Last year, I figured out what they were thinking and I had to explain several times that RF put all of its reforms in place during year 1, with nothing new in years 2 and 3, so it would be impossible to test the effects of different parts of the implementation, etc., using their approach. I might have been able to get this fixed if I had understood that they were assuming that kind of design (or if they had asked me about that specifically).
Q: Can we assume that the RF group is just like the comparison group except for exposure to RF funding? Is the counterfactual valid?
A: Read the implementation part of the report (and there is another study coming later that will make this clearer) and you'll see the degree of similarity in the key factors between the two sets of schools. I raised this as a theoretical problem originally, but the implementation study clearly shows that contamination was a big problem. (It cannot tell us whether the contamination came from the $1 billion federal expenditure on this, because the states and local districts often simply adopted the same ideas.)

As one Illinois district told me, "If this is the right stuff to do, then we are going to do it with everyone."


All this makes you want to scratch your head. Somehow, an opportunity to design and conduct one of the most comprehensive evaluations of an educational program was squandered, even though the resources to carry it out were in hand. It boggles the mind.