April 14, 2008

Some Results Out of Gering

This post follows up on my previous post on Gering Public Schools, which implemented the whole-school Direct Instruction (DI) reform with the help of the National Institute for Direct Instruction (NIFDI) beginning in the 2004-2005 school year. The intervention has been in place for three years so far and is beginning to generate some useful data.

The above graph shows DIBELS Nonsense Word Fluency (NWF) and Oral Reading Fluency (ORF) test data for grades K-6 from the spring of 2004 (before DI) and the spring of 2007 (after three years of DI). These DIBELS tests are good predictors of the risk of student reading failure in subsequent grades. The graph shows the percentage of students meeting the benchmark goals. Meeting the benchmark indicates a low risk of reading failure.

The students in K-2 have been in DI since they began school. The students in grades 3-5 have received three years of DI; for example, the fifth grade students received DI in grades 3-5 but did not receive DI in grades K-2. The students in grade 6 received only a single year of DI, in sixth grade.

As you can see from the graph, all grades, despite many students not having received DI in every grade, have made substantial gains, and their risk of reading failure has been significantly reduced. The kindergarten class of 2007, for example, is performing above the top 1 percentile based on these scores.

Here is a graph of grades 1-5 showing the performance of economically disadvantaged students on the same tests.

Not surprisingly, in 2007, the low-SES students in grades 1-5 outperformed the low-SES students in 2004. What is surprising is that the low-SES students in 2007 also outperformed all 2004 students in each grade, not just the low-SES students. That's impressive. So much for socio-economic status being a determining factor of academic success. With effective instruction, the predictive value of socio-economic status is diminished.

In case you were wondering how this performance translates to reading performance, below is a graph of Terra Nova Reading scores for the fifth grade class of 2007 (which received only three years of DI instruction) compared with the performance of three cohorts of seventh graders who did not receive any DI instruction. Terra Nova is a nationally normed standardized test.

As you can see from the graph, the fifth graders who received three years of DI outperformed the seventh graders who did not.

Continued in third post.


Kathy said...

What does the DI curriculum do specifically for kids who are not performing at benchmark on the DIBELS?

Is there any one to one instruction or is it all small group?

Also when I watched the video I heard one teacher say that her kids now had more confidence and were raising their hands more. I was wondering if the teacher connected the increased confidence with the child simply being taught the correct skills to read. Do the teachers get training from the DI folks in the Alphabetic code? Do you think the teachers now realize that the balanced literacy curriculum I am assuming they were using before DI caused the lack of confidence in their students?

I ask this question as this is the hardest thing I have to deal with each day- teachers in my school not understanding the Alphabetic code and not understanding why strategies taught in balanced literacy are the main problem for kids not reading and not some inherited disability.

If DI can change teachers' reading belief system then that is a big step forward.

KDeRosa said...

I am trying to find that out. Most likely they provide additional time and assign the best teachers to the lower performers. They also make sure that the students are passing their mastery tests every ten lessons, and students aren't permitted to proceed in the program until they can.

I believe it's always small group instruction. I don't think they've found a benefit to one-on-one instruction over small group instruction. The number of students permitted in the small groups is reduced for the lower performing groups.

It sounded to me as if she did and they did realize that the previous programs were the cause of the previous instructional failure. I don't think teachers get specific training in the alphabetic code, but the program is taught using the code. Letter sounds are taught first, then those sounds are blended into words. All words are virtually 100% decodable for the first two levels. Even irregular words are handled in such a way that students sound the words out and are taught that that sequence of sounds is actually pronounced a different way.

CrypticLife said...

I'd be interested in knowing what the Terra Nova SD is. Though the graph is quite impressive, part of the reason it's impressive is because it's restricted to a ten-point range.

I really think NIFDI and Gering ought to put their videos on youtube. They need to get a lot of exposure for this. Distribution only through their own websites isn't effective enough.

KDeRosa said...

I had the same question.

This graph came from a conference presentation to educators. We need more info to really analyze the data.

For the time being, one interpretation of the graph is that Gering fifth graders are now performing slightly better than their seventh graders performed before the intervention. That's about two years' worth of gains, which isn't too shabby.

Kathy said...

I am not so sure the DIBELS test is the best way to measure the success of DI based on my experiences with the test.

The DIBELS data from the first grade classroom at my school that receives tutoring shows that out of 22 students 19 have benchmarked the DIBELS test as of Feb 08, 86% passing, higher than the DI school first grade. I teach in a very small urban school with 64% poverty.

My school uses BL and I tutor students with explicit instruction but they must also use BL in their classrooms. Not the best situation.

I am NOT advocating BL with tutoring, just comparing my data to the Nebraska data.

I also see from my school's data that a student can benchmark DIBELS while receiving a low risk or some risk on the oral reading fluency test. We also do the phoneme segmentation test so that may affect our final score as compared to Nebraska.

Not sure whether anyone is aware of this, but on the nonsense word reading test a student can simply say the beginning and ending sounds in a nonsense word and score two points. The child has not read the word. He can get the vowel sound incorrect. It is not counted against him. The teacher simply adds up all the points at the end of the test. A child could actually read very few words correctly but benchmark the test.
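To make that scoring point concrete, here is a rough sketch in Python. It assumes, as described above, that each correctly produced sound earns one point and that errors cost nothing; the real DIBELS scoring rules have more detail than this. The function name and the sample nonsense word are made up for illustration.

```python
def score_nonsense_word(target_sounds, spoken_sounds):
    """Count the sounds the student produced correctly, position by position.

    Assumption for illustration: one point per correct sound, and no
    penalty for a wrong sound.
    """
    return sum(1 for t, s in zip(target_sounds, spoken_sounds) if t == s)


# Hypothetical three-sound nonsense word.
target = ["s", "i", "m"]

read_correctly = ["s", "i", "m"]  # child reads the whole word
wrong_vowel = ["s", "a", "m"]     # child gets only the first and last sounds

print(score_nonsense_word(target, read_correctly))
print(score_nonsense_word(target, wrong_vowel))
```

Under these assumptions the child who misses every vowel still collects two of the three available points on each word, which is how points can pile up without any word being read correctly.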

I also see on my school DIBELS data that two children are failing the grade by the DRA school district standards but are benchmarking the DIBELS.

Out of 22 students, 7 are at grade level on the DRA, meaning only 32% are on grade level, which is much lower than the DIBELS results.
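For anyone who wants to check the arithmetic, the two percentages quoted in this comment can be reproduced with a few lines of Python. This is only a sketch; the helper name `percent_of` is made up for illustration, and the counts are the ones given above for one first grade classroom.

```python
def percent_of(count: int, total: int) -> int:
    """Return count/total as a whole-number percentage (rounded)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


class_size = 22  # first graders in the classroom described above

dibels_benchmark = percent_of(19, class_size)  # met the DIBELS benchmark
dra_grade_level = percent_of(7, class_size)    # at grade level on the DRA

print(dibels_benchmark, dra_grade_level)
```

The same 22 children come out very differently depending on which test is used as the yardstick.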

I am not sure the DIBELS test always gives a clear picture of who can read and who cannot read. Same with the DRA test. It is based on a student's ability to retell a story and I have seen kids who can decode well but struggle with the standards of the DRA retell.

We have found at my school that the Terra Nova test is much easier than the state PSSA test. Our students always perform higher on the Terra Nova test, so passing the TN test does not guarantee a proficient score, at least not on our state test.

I am not sure we actually have a national reading test to simply measure what a child can or cannot read.

Anonymous said...

Very impressive results.

I wonder if anyone has done a DI vs. Kumon study? Both group students by skill level and do not allow students to move on until mastery is demonstrated. Both have tested elements that are universal across the system (at least across one language, Kumon is obviously available in several languages).

I certainly know that if I were a grade school principal (ha! like any school district is that dumb!) I would be doing more research into DI and what it could do for my school.

KDeRosa said...

Kathy, I believe they're reporting DIBELS data because it is the Nebraska mandated test, so they most likely have lots of DIBELS data that is reportable and comparable.

DIBELS benchmarks merely tell what the relative risk factors are for students. That's why some students passing DIBELS still fail reading tests and others who fail DIBELS benchmarks do pass reading tests.

Gering does use other tests and I'll be posting on those also. I don't have much reading comprehension data as of yet. This is usually the last factor to improve. The intervention must have stabilized and the district must become efficient at moving students through the program for comprehension scores to increase significantly.

Your PSSA scores are odd. The PSSA is an easy test with a low cut score. The percentage of students passing PSSA should be significantly higher, especially in grades 4, 6, and 7, than the percentage of students scoring above the 50th percentile on TN.

Anonymous said...

PA skills are not taught prior to reading instruction in DI.

The very first lessons focus on a few basic skills before about lesson 10, when simple word reading is introduced. The prereading skills are letter sound identification, pronunciation (oral activity only -- teacher says a sound, students repeat it correctly), sequencing (games done with actions, and the terms "first" and "next"), oral blending (there are visual referents, but reading in the first couple of lessons -- the children are learning correspondences in another part of the lesson), and rhyming -- students are taught to make rhyming words with a rime (not shown, such as ip or ade) utilizing a first sound that they READ, e.g., r, s, m. They have to use their blending skill to do this. The purpose is to enable students to see, early on, that parts of words that look the same often sound the same.

No segmenting skills are taught. Later, around lesson 35 of Fast Cycle (around Lesson 70 in RM I, or halfway through the K year), supplementary spelling tasks (optional) add about 5-10 minutes to a lesson. This requires the children to spell a word they have used in reading by saying its sounds, one at a time; saying the word, then writing it. About 4-8 words are presented for this task each lesson.

The teacher models saying the word the regular way, then again with a distinct pause between each sound. The teacher signals the children to say the sounds in the word (two-second pause in between), then to write the word.

Work is checked immediately, not collected and marked later. Sentences are also dictated. At this point, capital letters have not yet been taught but children must put a space between words. Some sample sentences are: He has a farm, We are in the sand, He has mud on him, She will win a car, His dad was not sad.

These exercises go with the first half of Fast Cycle, or the second half of RM I. A language lesson is taught separately from the reading lesson in the newer version (Reading Mastery Plus, 2002).

The students get much more practice in blending than in segmenting at this stage, but the way it is taught does seem to promote both skills. Children track the sounds, then words, manually with a finger (they follow large dots and arrow prompts at the first stages), and see how the parts are distinct yet go together. They also see that digraphs go together to make one sound and are taught to write them correctly.

When they take up a spelling program later on that requires them to segment words into phonemes, or syllables, they usually can do this with little assistance (in general, blending seems to be harder for more students than phoneme segmentation).

A classroom environment with a lot of onset-rime word family work will, however, tend to produce students who routinely segment short words into onset-rime, and this shows up in their PS DIBELS performance. Once the student has begun, the tester cannot remind him or her to segment all the sounds and not just the onset and rime. Gering students most likely did not experience this confusion and responded to the PS tasks as they would do to the oral part of a Reading Mastery spelling exercise.

Other PA skills, such as phoneme deletion, are not taught in DI at all. The fluency/rate targets are rather low so are attainable by nearly everyone. I usually go for requiring a higher criterion myself; it's worth the additional effort.

Dick Schutz said...

One look at the graph Ken presents initially should tell you that the clock is striking 13. The biggest gains are made in K and Grade 1. The “performance” then steadily goes downhill. That’s just not how learning works; one doesn’t regress.

The graph is a great “gee whiz” communication, but it’s as fake as a three dollar bill. It results from the fact that DIBELS is based on reified abstractions and Item Response Theory that inherently yield a rubber ruler. No matter that Nebraska and other states mandate DIBELS or that it has been subsidized and promoted by the Federal government. The test is fatally flawed as a measure of reading expertise.

Dick Schutz

KDeRosa said...

Dick, you have to remember that these are all different cohorts with different amounts of instruction in an intervention which has not yet fully stabilized. So, while your points are well taken, other possibilities exist for the apparent decline.

Moreover, the decline doesn't necessarily represent a regression. It could also mean, all other factors being equal, that more students have failed to reach the next, increasingly difficult benchmark.

Dick Schutz said...

Three years is a long time for a "program" to "stabilize."

If Gering is hitching their testing horse to DIBELS, there's nothing preventing them from obtaining longitudinal data on kids. They'd find that the norms present much the same picture as the cross-cohort data. It's a rubber ruler, and doesn't reflect any actual "regression."

Using DIBELS raw scores, rather than scaled scores, would be more informative. But the point is, DIBELS measures reified abstractions referenced to the "five essentials." It does not provide a sound means of tracking the acquisition of reading expertise. Neither does Terra Nova, which has the same fundamental flaws.

One can celebrate "gains" and other relative comparisons. But if you looked at the standard deviations (Does anyone even remember what a standard deviation is? It has fallen into almost complete disuse in presenting "gee whiz" information) it would be evident that there are still Gering kids who can't read. How many is unclear, and the DI grouping practice adds to the uncertainty.

One would think that NIFDI would be providing better testing advice. How long do you anticipate it will take for the program to "stabilize?" The more likely scenario seems to be that the program will fall victim to the Reading First budget cut. Initiatives that rely on temporary grants have zippo track record of sustainability. It would be sad to see that happen to Gering. We can hope, pray, and keep our fingers crossed for a better future, but I don't see it in the tea leaves.

Dick Schutz

KDeRosa said...

Three years is not a long time to stabilize any education program that encompasses seven grade levels. For example, the first cohort to receive the intervention in kindergarten is only in third grade this year. Next year will be the first year that the fourth grade teachers will be given the opportunity to teach a fully treated cohort. You would expect to see this cohort further along in the instructional sequence than previous cohorts. The idea is to get students through the instructional sequence as far as possible, so these fourth grade teachers will, for the first time, be receiving students who are further along in the sequence and will be expected to push them further than they've been able or expected to push any previous cohort. Perhaps after two more cohorts we can say that fourth grade has stabilized and knows how to teach students who've been in the program from grades K-3.

My understanding is that better data will be forthcoming at some point, but this is all that's been reported so far.

Dick Schutz said...

The thing is, there is no alignment between "goals/standards", instruction/DI, and tests/DIBELS-Terra Nova. Not in Gering and not anywhere else. The Gering website has several lists of important teachable intentions. These make good sense, although they speak only to reading and math. The intents are likely closely related to the Nebraska "standards" and are in the same ballpark as intentions in other districts and states around the country.

It doesn't take any "tests" to provide information about the accomplishment of these intentions. They are uneven by grade; it's doubtful that they line up well with DI; and the DI "groups" introduce further noise.

But give the intentions and the instruction a pass. The whole enterprise falls apart when it comes to the test information. Neither DIBELS nor Terra Nova information aligns either with the intentions or the instruction. The focus of the information is on "gains" and on other relative comparisons that are completely ungrounded.

"Better data" is pie in the sky. It's altogether feasible to provide information on the accomplishment of the intentions from the get-go. Ken rightfully calls attention to "cohorts." What has Gering, or any other District, learned from last year's cohort at each grade to do things any differently to improve accomplishments the next year? Nada, zippo, zilch. The same script and testing ceremony is repeated and the same good intentions are listed.

Doing the same thing over and over again and expecting different results is Benjamin Franklin's definition of insanity. Where's Ben when we really need him?

KDeRosa said...

Dick, I agree with some of your points. The nonalignment of the standards, the curricula, and the testing instrument is problematic: it makes it difficult to demonstrate student achievement gains, and it also favors higher IQ students. However, the more a student learns, the greater the probability that his knowledge will generalize to what the testing instrument measures.

I'm not sure I understand what you mean by the DI groups introducing noise. How is that? I do know that transfer students (a 12% mobility rate) are adding noise and are most likely depressing scores.

The crux of your argument relies on your discrediting of DIBELS and TN, and I don't think you've carried your water here to prove your point. You should either provide a link to a critique of these tests based on your theory or better explain your theory and why it is valid.

Dick Schutz said...

It doesn't take a "theory" to demonstrate that neither DIBELS nor TN lines up with either the Gering statements of intended instructional accomplishments or the Mastery Tests in DI. Allyagottado is look at the tests. That's difficult for TN, since it's "confidential," but there is enough descriptive info to render a judgment.

I don't mean to pick on Gering. The district happens to be a poster example, and deservedly so. One could certainly easily find worse districts. Neither do I mean to pick on TN. I've sketched the fatal flaws of IRT as a basis for measuring student achievement in another thread on this site. If the sketch is "too sketchy" I can certainly expand. DIBELS has its uses, but not as a measure of student reading expertise. But I'm talking about these particular tests only because Gering happens to be using them.

The only reasonable "gains" to consider are either transparent individual accomplishment (or mastery, if you prefer) on the instructional intentions, or the increases in the figure aggregated by teacher, school, and district across cohorts from year to year. "Gains" on ungrounded scales are strictly ceremonial.

Mobility is indeed a consideration. Rather than "lumping" the 12% into the pool of students, if DI is working, it's likely the newbies have fewer accomplishments than the DI cohort. But better or worse, their entry points would be of interest.

The "noise" I was referring to was the DI practice of ability grouping. I'm not knocking this as a pedagogical practice. But unless there is an identified physical obstacle, the function of instruction is to enhance abilities, not to maintain differences. "Lumping" the groups in reporting achievement information adds noise.

The non-alignment of instructional intentions, instructional product/protocols, and instructional accomplishment information isn't "problematic." It's as close to insane as one would care to get. Yet it goes on in district after district, year after year.

Dick Schutz

KDeRosa said...

The operative paragraph you gave is:

"The thing is,instruction isn’t a matter of 'latent traits.' it’s a matter of effecting specified performance capabilities. Prior to instruction, the student 'can’t do it;' 'scores' pile up at the bottom. Effective instruction enables specified capability; 'scores' pile up at the top. IRT precludes either distribution, because items that students flunk and items that students ace are both thrown out. The 'normal curve' inherent to IRT is operationally a function of what students have had an opportunity to learn. That gets you up to the mean of the normal distribution. The downward slope of the bell-shaped curve is a function of what only some students have had an opportunity to learn. The test is sensitive to socioeconomic status, but not to effective instruction."

I don't see how the last sentence follows from the rest of the paragraph.

If through effective instruction students are given an opportunity to learn more of what is tested, then why isn't the test sensitive to effective instruction?

I suppose these tests are sensitive to SES in the same way they are sensitive to IQ and the amount of general information possessed by students, since all three are correlated. Which is to say that the tests are overly sensitive to out-of-school effects, especially for tests of reading comprehension.

I'd prefer criterion tests aligned with the instructional objectives as well, but those kinds of tests are not without their problems. See, for example, the Clay Survey and Reading Recovery and how RR hoodwinked the WWC.

I also agree that student pre-tests and comparisons across cohorts year to year are needed to accurately assess achievement gains rather than the way the data is being reported now. That's partially what I mean by "better data."

You can't just blame school districts for the non-alignment of objectives and tests. These come from the state and are uniformly awful and, I'd say, unusable by schools. I'd suspect that school districts are in actuality relying mostly on their chosen curricula to set the objectives, and teachers construct the tests. This is also problematic, though probably less insane.

Anonymous said...

Lots of baloney for sale here today. Nebraska is about the only state (except perhaps Iowa) that does not require standardized tests for NCLB. Nebraska submitted its own plan which the Feds approved. Teacher-developed assessments are a major portion of this. Probably no worse than other things.

The DIBELS comments are wide of the mark. DIBELS ORF passages are close to being an “authentic” assessment, since they are simply short passages of age-appropriate text. The fifth grade passages are harder than the fourth grade passages, the fourth grade harder than third. The differences are incremental and reflect the reading growth that good instruction would produce. They are as “grounded” as anything else out there. Want to see if kid can read? Hand him a story and tell him to start reading. It’s a reading task. Rate and accuracy criteria are reasonably demanding, but as it is a criterion-based measure, it is possible for everyone to pass it, no bell curve need apply.

It’s strictly a quick check type of test, like the blood pressure monitors in drug stores. If you’re not within a certain range, more investigation is needed. Any good reader in fifth grade will do fine on the DIBELS ORF. Good readers can read. Period. Because it doesn’t tap background knowledge and the like it is not so SES-dependent as a norm-referenced test would be.

However, kids who can’t decode fall on their faces with this kind of check-up. So once you start teaching kids to sound out words, you get a lot more kids benchmarking DIBELS. Doh.

DI grouped kids are not leveled by “ability” but by instructional level, which is how we teach swimming, music, Tae Kwon Do and other skill-based pursuits. That’s one reason DI gets results quickly. Nearly all the students’ instructional time is in their learning zone; they aren’t just keeping chairs warm. They can progress quickly when there are a variety of grouping options. This results in accelerated achievement.

I detect the acrid aroma of sour grapes.

Dick Schutz said...

If the State of Nebraska doesn't mandate TN or DIBELS, why in the world is Gering using them? I did follow Ken in inferring the tests came as a state mandate. My error, but it doesn't impinge on the general point. I wonder whose idea it was to use the tests in Gering.

The DIBELS Oral Reading Fluency and Retell Fluency are not very well-structured in terms of the Alphabetic Code. But they're tolerable. The DIBELS folks credit Stan Deno for the passages, and that work goes back 20ish years. The 1 minute speed limit and the count of the words as the "score" for retell diminish the information value of the Indicator, as do the "Benchmark" cut scores.

These indicators are in the same ballpark as the "last Fast Cycle" passage. It's not an age or grade question of when a kid can read and understand the passage. The earlier the better, but it's the accomplishment per se that should be the focus.

Whether the DI groups are based on "ability" or DI progress is a semantic quibble. If the kids reach the point that they can read independently, that's the celebration time. The difficulty of the upper Deno passages is largely in terms of technical lexicon, but also increased syntactic complexity that will be beyond some kids. If kids can read and understand the text as well as they would were the communication read to them, they can read. It's not reasonable to attribute spoken language lacunae to "reading."

Dick Schutz

KDeRosa said...

I think I mentioned in the first post that NE picks its own test for NCLB. However, I believe that for RF, NE has mandated that DIBELS be used for assessments.

CrypticLife said...

DI slowly gets around, bit by bit.