April 7, 2008

Your Pet Reform is Suckier Than You Think

We are not happy with student performance in the U.S. Which is to say, we are not happy with our schools' ability to educate.

So back in 1965 we, as a nation, passed the Elementary and Secondary Education Act (ESEA) to improve the state of education. The ESEA distributed federal funding to schools and school districts with a high percentage of students from low-income families. These became known as Title I schools. The ESEA lacked a real accountability provision, so schools were not accountable for achieving any results with the federal funds. Not surprisingly, those results were not forthcoming.

In 2001 we decided to ameliorate that deficiency by reauthorizing ESEA to include an accountability provision. We renamed ESEA to No Child Left Behind (NCLB) and increased funding to cover the new accountability provisions (i.e., standards setting and yearly testing in grades 3-8 and 11 in math and reading (and now science)).

Let's put the problem in perspective.



This graph represents our baseline student performance back in 2001. Let's set our goal such that half the students met the goal back in 2001. (For those of you familiar with NAEP, this goal falls between the proficient and basic levels of performance.) The blue shaded area under the curve represents the percentage of students who met the standard.

(Another way of looking at this is to pretend that all students took a standardized test back in 2001, we normed the test to get a normal distribution of performance with 50% of the students passing and 50% failing, then we froze that test. Subsequent improvement would result in more than 50% of students passing the test. If you think this is a Lake Wobegon effect, you don't understand the effect.)

The goal of NCLB was to have virtually all students meet this standard. This would have required an improvement of over 2 standard deviations (σ). And, thus, the chase began trying to find an intervention that would improve student performance, hopefully by at least 2σ. Seven years later this remains the national focus, at least among those who haven't given up yet.
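To make the 2σ figure concrete, here is a minimal Python sketch, assuming test scores are normally distributed and that an intervention shifts the whole distribution up without changing its spread. The function name `pass_rate_after_shift` is just an illustrative label, not something from NCLB or any of the studies.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pass_rate_after_shift(baseline_pass_rate: float, effect_size: float) -> float:
    """Share of students above a frozen cut score after the whole
    distribution moves up by `effect_size` standard deviations.

    Assumes normally distributed scores and an unchanged spread.
    """
    # Cut score in z-units: the point that `baseline_pass_rate` of students exceed.
    cut = STANDARD_NORMAL.inv_cdf(1.0 - baseline_pass_rate)
    # Shifting everyone up by d is the same as lowering the cut by d.
    return 1.0 - STANDARD_NORMAL.cdf(cut - effect_size)


if __name__ == "__main__":
    for d in (0.0, 1.0, 2.0):
        rate = pass_rate_after_shift(0.50, d)
        print(f"{d:.1f} sigma shift -> {rate:.1%} of students pass")
```

Starting from the 50% baseline, a 1σ shift gets you to roughly 84% passing, and it takes the full 2σ to get just under 98%, which is about what "virtually all students" means here.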

But, here's the problem. Most people don't seem to understand how much improvement is actually needed to comply with NCLB. How much is a 2σ improvement? A lot more than most people think.

In fact, some, perhaps many, think that a real 2σ improvement is impossible. Let's accept that premise for the time being and set our goal a bit lower. What would it take for a typical Title I classroom to perform as well as an average classroom? Here is the performance of a typical Title I classroom.



In the typical Title I classroom, only 20% (blue shaded area) of the students meet the standard. In order to have this Title I classroom perform as well as the typical classroom another 30% (yellow shaded area) of students would have to meet the standards. This represents an improvement of about 0.84σ, otherwise known as a large effect size.
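The 0.84σ figure can be checked with a short Python sketch, again assuming normally distributed scores. The helper name `required_effect_size` is made up for illustration.

```python
from statistics import NormalDist


def required_effect_size(current_pass_rate: float, target_pass_rate: float) -> float:
    """Shift (in standard deviations) needed to move a normally distributed
    group from `current_pass_rate` to `target_pass_rate` on a fixed cut score."""
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    # A group with pass rate p sits z(p) standard deviations relative to the
    # cut score, so the required shift is the difference of the two positions.
    return z(target_pass_rate) - z(current_pass_rate)


if __name__ == "__main__":
    gap = required_effect_size(0.20, 0.50)
    print(f"Title I classroom needs about {gap:.2f} sigma to reach a 50% pass rate")
```

Going from a 20% pass rate to a 50% pass rate works out to about 0.84 standard deviations.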

What does this mean? It means that if we improved all schools across the board by about 0.84σ, then only about 20% of students nationwide wouldn't meet the standards. In other words, 80% of students would meet the standards. Now an 80% pass rate doesn't comply with NCLB; but, let me let you in on a little secret. States have found ways to cheat by lowering their standards and their cut scores, so that if we loosened the NCLB requirements to, say, a 90% to 93% pass rate, we'd be within spitting distance of meeting them. But the problem remains of how to squeeze out about 0.84σ of real school improvement in the first place.

There is, of course, no shortage of opinions as to how to improve schools. It seems that everyone has their pet reform that they think is going to be some sort of magic educational bullet. The problem is that most of these educational bullets are being shot out of pop guns.

For example, let's take the favored reform of most edu-commentators: class-size reduction. The theory is that by reducing class sizes down to ridiculously small (13 to 17 students per class) and ridiculously expensive levels, student achievement will improve. In fact, student achievement will tend to improve, just not very much. Certainly less than these commentators think. In most reasonably rigorous experiments, such as Project Star, gains from class size reduction were found to be almost 0.25σ. Not much. Here's a graph to show you how little of an improvement that really is.



See that red sliver? That's the amount of improvement you can expect to see from class-size reduction. Not much. By reducing our typical Title I classroom down to Project Star levels we can expect to raise the share of students meeting the standard by a whopping 8 percentage points, from 20% to 28%. Break out the champagne, kids, it's time to celebrate!
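Here is a rough Python sketch of the arithmetic behind that red sliver, assuming normally distributed scores and a uniform 0.25σ shift. The function name `title_one_pass_rate` and its default starting point of 20% are illustrative choices taken from the example classroom above.

```python
from statistics import NormalDist


def title_one_pass_rate(effect_size: float, start_pass_rate: float = 0.20) -> float:
    """Pass rate of a classroom that starts at `start_pass_rate` and then
    improves by `effect_size` standard deviations (normal scores assumed)."""
    normal = NormalDist()
    # Position of the classroom relative to the cut score, in z-units.
    # A 20% pass rate puts it about 0.84 sigma below the cut.
    start_z = normal.inv_cdf(start_pass_rate)
    # Move the classroom up by the effect size and read off the new pass rate.
    return normal.cdf(start_z + effect_size)


if __name__ == "__main__":
    before = title_one_pass_rate(0.0)
    after = title_one_pass_rate(0.25)
    print(f"before: {before:.0%}, after a 0.25 sigma gain: {after:.0%}")
```

Under these assumptions the pass rate moves from 20% to roughly 28%, leaving about 22 of the 30 missing percentage points untouched.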

We have a name for interventions that achieve effect sizes of less than 0.25σ -- not educationally significant. This reflects the realization that in the real world, such interventions will likely have little or no effect on student achievement. (For example, Project Star was plagued with many methodological flaws that would serve to inflate the already small effect size it achieved under experimental conditions.)

But let's not pick on class-size reduction reforms too much. The sad fact is that about 95% of all education reforms fail to achieve even the small effect sizes achieved in Project Star. This means that most education reforms fail to achieve educationally significant effects. Now go back and look at the graph again. See the red sliver which represents the smallest educationally significant effect size (0.25σ)? The red sliver for almost all educational reforms is even smaller than the red sliver shown on the graph. Wrap your head around that. And make sure you keep that in mind the next time you tout your pet education reform. It sucks. Now you know it; stop pretending that you don't.

Now let's briefly leave the world of reality and enter the realm of fantasy. A fantasy world where statistical correlation is the same as causation. This is the land of Kozol. This is where everyone lives who thinks that raising student socio-economic status (SES) will lead to higher student achievement. It's also the land of those who think that improving teacher effectiveness is the be-all-and-end-all of education reform. It isn't. Statistical correlations aren't reality, no matter how much you want them to be.

Let's pretend for the sake of argument there's some magic potion that could increase teacher effectiveness by 2σ. To put this in perspective, this would raise the effectiveness rating of an average teacher (50%) to that of a super teacher (90% effectiveness rating) and would raise a 25% teacher to a 75% teacher. Using data from this study, you can see what kind of improvement we might expect from these new magical super teachers.



See the slightly larger red sliver? That sliver represents a 0.35σ effect size. An effect size that is educationally significant. But an effect size that still misses the goal by 19 percentage points, i.e., 41% of students failed to improve sufficiently in response to the super teachers. Achieving a 2σ increase in teacher effectiveness is a pipe dream. Even achieving a 1σ improvement is probably a pipe dream, especially when you consider that the study that looked into this question failed to find a correlation between any of the typical things (credentials, experience, etc.) thought to be associated with teacher effectiveness and increased student performance. With only a 1σ improvement, however, the effect size (about 0.26σ) becomes educationally insignificant.
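The 19-point shortfall can be reproduced with a small Python sketch. It assumes normal scores, a Title I classroom starting at a 20% pass rate, and a 50% target. The name `shortfall_points` and the list of effect sizes compared are illustrative, using only figures quoted in this post.

```python
from statistics import NormalDist


def shortfall_points(effect_size: float,
                     start_pass_rate: float = 0.20,
                     target_pass_rate: float = 0.50) -> float:
    """Percentage points by which a classroom still misses the target pass
    rate after improving by `effect_size` standard deviations."""
    normal = NormalDist()
    start_z = normal.inv_cdf(start_pass_rate)        # classroom position vs. the cut
    new_pass_rate = normal.cdf(start_z + effect_size)
    # Never report a negative shortfall if the target is met or exceeded.
    return max(0.0, target_pass_rate - new_pass_rate) * 100


if __name__ == "__main__":
    # Effect sizes quoted in this post: class-size reduction, super teachers,
    # and the full Title I gap.
    for d in (0.25, 0.35, 0.84):
        print(f"{d:.2f} sigma -> misses the goal by {shortfall_points(d):.1f} points")
```

With a 0.35σ gain the classroom lands at roughly a 31% pass rate, about 19 points short of the 50% goal, while a gain of about 0.84σ closes the gap almost entirely.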

Bear in mind that many pet reforms relate to increasing teacher effectiveness. Paying teachers more is an attempt to increase teacher effectiveness. Raising teacher prestige is an attempt to increase teacher effectiveness. Requiring greater credentials is an attempt to increase teacher effectiveness.

Which finally brings us to the reason why improving student achievement by 0.84σ (a large effect size) is within the realm of possibility. That would be the little-heard-of Project Follow Through, the largest education experiment in U.S. history, in which one intervention, Direct Instruction (DI), actually achieved gains of at least 0.84σ, often more.



Notice the large red slice and the lack of a yellow slice, which indicates no shortfall. If only your pet education reform worked as well as this one.

Update: Teach Effectively has a related post and a link to an analysis of some of the few interventions, including effect sizes, that work. Go check them out.

Update II: Brett from the DeHavilland Blog has outed himself as a closet Vanilla Ice fan. I'm sure this was a difficult and painful decision for Brett and his family. The world needs more true heroes like Brett who aren't afraid to speak truth to power.

14 comments:

Anonymous said...

Ken, I am very confused. None of the studies you cite estimates the effects of interventions on changes in proficiency. And you are comparing classroom level standard deviations with individual-level effect sizes and putting them on the same distribution? What am I missing?

KDeRosa said...

This is more of a visual aid for readers which is trying to explain difficult statistical concepts related to the effects of educational reforms. I used the classroom/students as the most readily understandable unit and used the studies as examples to demonstrate the underlying concepts even though the studies aren't at the classroom/individual level.

I've defined a proficiency level and have shown what happens in an average classroom with various interventions of specific effect size.

Not sure what you mean by "individual-level effect sizes." These effect sizes are presumed to be for the distribution given in standard units.

J.D. Fisher said...

Nice post.

I've always been disappointed with the paucity of ideas in the public realm as to how DI could be generalized. There are axioms and everything. Yet what most people hear about DI is that it's a whole package--Mr. Consumer or Mr. Researcher can't understand how it works, and he or she never will.

KDeRosa said...

That's a good question, JD.

It doesn't take a genius to look at the DI curricula, see why it is successful, and then try to replicate it. Or maybe they have looked at it, figured out what made it successful, and then decided that it was against their ideology to do things that way.

Anonymous said...

Kderosa said, "It doesn't take a genius to look at the DI curricula, see why it is successful, and then try to replicate it. Or maybe they have looked at it, figured out what made it successful, and then decided that it was against their ideology to do things that way."

That is exactly right. Ed schools don't like things that work. They like ideology. It's easier. You don't have to be smart or work very hard to spout ideology. You have to be smart and work hard to actually DO something and understand why it works.

J.D.Fisher said...

*Sigh.*
Never mind.

Spedvet said...

I think you're making the goals of NCLB more complicated than they need to be. One looks at all the statistical wizardry (which I don't dispute) and one can become very sour on the goals of NCLB. Instead, I think it makes more sense to throw out the bell curves and the standard deviations, and instead think about getting all students to a basic level of proficiency. Less (complicated) is more.

First we have to define proficient. It has to lie somewhere between ridiculously easy and completely out of reach for all students.

Certainly if the goal of NCLB was to get "all students" to know their times tables by the end of 10th grade, I don't think we'd be standing around talking about how "impossible" that goal was. Similarly, I don't think making the reading goal third-grade decoding, or making the science goal being able to differentiate between animal, plant and mineral, would be deemed out of reach for tenth graders.

Scores themselves will always follow a curve if there is enough of a sample. But good old Pass/Fail is what we should really be thinking of in terms of achievement for all children.

Certainly, we can't and shouldn't aim for every high school graduate being college ready. But neither should we settle for less than all children exiting high school with a functional level of reading and math.

Except for maybe the less than 1% of children with disabilities (which is a far smaller number than 1% of all children), this goal doesn't seem out of reach with proper instruction, such as DI.

DI has taught kids with IQs of 30 (the very low IQ range) how to read/decode. If it is possible with the lowest of the low, certainly it is more than doable with kids within the low IQ range (60-84) and above.

Anything less should be considered a crime.

Robert said...

There is one other effect which I think is on the order of DI...Home Schooling.

I have not seen any formal studies, but I bet Zig's "TEACH YOUR CHILD TO READ IN 100 EASY LESSONS" has had far greater penetration into the home school market than DI has had in a classroom setting, public or private. I just wish they had more books in that series for math and writing and reading past the second grade.

Brett said...

Are you implying that Vanilla Ice sucks? Sir, I may have to ask you to step outside. Nobody rocks harder than the ice man.

KDeRosa said...

He's not half the man MC Hammer is.

Anonymous said...

Vanilla Ice: Schools need to stop, collaborate, and listen.

Allison said...

J.D. Fisher,

Thanks for the link, even if it does make one sigh.

The part I found so fascinating was it demonstrated that it isn't even lofty ideology or deep political/philosophical convictions that lead teachers and ed schools to deny DI. It's simply self-congratulations.

They want to feel good about themselves. It must be their character that affects the outcome! It must be that they are GOOD PEOPLE! Anything that restricts them from feeling good about themselves in their performance art is bad.

Of course, people who need that level of feeling good about themselves do tend to find certain philosophical and political beliefs more in line with their own personal emptiness, but that's a result, not a cause, it appears.
The True Believer comes to mind.

J.D. Fisher said...

Thanks, Allison. What actually makes me sigh is this:

Ed schools don't like things that work. They like ideology. It's easier. You don't have to be smart or work very hard to spout ideology. You have to be smart and work hard to actually DO something and understand why it works.

The par-for-the-course stupid that I now expect every time I flip open my damn browser.

Oh, wait, and, similarly, this:

I hate the phony rigor associated with getting something published in a "refereed" journal. I see most of the articles (possibly 85%) as people trying to write about something they have never done or never even closely observed but quoting long lists of others who are as naive as they are.

Oh boo hoo. So what? Bring up DI in education circles today and just listen to the hissing. Isn't that an ad guy losing at his own game?

Johnny Pulleo and the Harmonicats said...

Long live Anonymous. Unpretentious comment even if not meeting the standards of the precious.