We are not happy with student performance in the U.S. Which is to say, we are not happy with our schools' ability to educate.
So back in 1965 we, as a nation, passed the Elementary and Secondary Education Act (ESEA) to improve the state of education. The ESEA distributed federal funding to schools and school districts with a high percentage of students from low-income families. These became known as Title I schools. The ESEA lacked a real accountability provision, so schools were not accountable for achieving any results with the federal funds. Not surprisingly, those results were not forthcoming.
In 2001 we decided to ameliorate that deficiency by reauthorizing ESEA to include an accountability provision. We renamed ESEA to No Child Left Behind (NCLB) and increased funding to cover the new accountability provisions (i.e., standards setting and yearly testing in grades 3-8 and 11 in math and reading (and now science)).
Let's put the problem in perspective.
This graph represents our baseline student performance back in 2001. Let's set our goal such that half the students met the goal back in 2001. (For those of you familiar with NAEP, this goal falls between the proficient and basic levels of performance.) The blue shaded area under the curve represents the percentage of students who met the standard.
(Another way of looking at this is to pretend that all students took a standardized test back in 2001, that we normed the test to get a normal distribution of performance with 50% of the students passing and 50% failing, and that we then froze that test. Subsequent improvement would result in more than 50% of students passing the test. If you think this is a Lake Wobegon effect, you don't understand the effect.)
The goal of NCLB was to have virtually all students meet this standard. This would have required an improvement of over 2 standard deviations (σ). And, thus, the chase began trying to find an intervention that would improve student performance, hopefully by at least 2σ. Seven years later this remains the national focus, at least among those who haven't given up yet.
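For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes what the graphs assume: scores are normally distributed, the cut score is frozen, and an intervention slides the whole curve to the right without changing its shape. The function name is mine, made up for illustration.

```python
from statistics import NormalDist

def pass_rate_after_shift(baseline_pass_rate: float, shift_sigma: float) -> float:
    """Share of students above a frozen cut score after the whole
    distribution moves right by shift_sigma standard deviations.
    Assumes normally distributed scores (an idealization)."""
    std_normal = NormalDist()
    # Where the cut score sits, in sigma units, relative to the baseline mean.
    cut_z = std_normal.inv_cdf(1.0 - baseline_pass_rate)
    # After the shift, the same cut sits shift_sigma lower relative to the new mean.
    return 1.0 - std_normal.cdf(cut_z - shift_sigma)

# Baseline: half the students pass. Shift everyone up by 2 sigma.
rate_2sigma = pass_rate_after_shift(0.50, 2.0)
print(f"{rate_2sigma:.1%}")
```

Under those assumptions a 2σ shift gets you to roughly a 97.7% pass rate, which is about what "virtually all students" has to mean.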
But, here's the problem. Most people don't seem to understand how much improvement is actually needed to comply with NCLB. How much is a 2σ improvement? A lot more than most people think.
In fact, some, perhaps many, think that a real 2σ improvement is impossible. Let's accept that premise for the time being and set our goal a bit lower. What would it take for a typical Title I classroom to perform as well as an average classroom? Here is the performance of a typical Title I classroom.
In the typical Title I classroom, only 20% (blue shaded area) of the students meet the standard. In order to have this Title I classroom perform as well as the typical classroom another 30% (yellow shaded area) of students would have to meet the standards. This represents an improvement of about 0.84σ, otherwise known as a large effect size.
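Where does 0.84σ come from? Here is a rough sketch of the calculation, assuming normally distributed scores and an intervention that shifts the whole curve without changing its spread. The function name is hypothetical.

```python
from statistics import NormalDist

def required_shift(current_pass_rate: float, target_pass_rate: float) -> float:
    """Shift (in standard deviations) needed to move a group from
    current_pass_rate to target_pass_rate against a fixed cut score.
    Assumes normally distributed scores with unchanged spread."""
    std_normal = NormalDist()
    return std_normal.inv_cdf(target_pass_rate) - std_normal.inv_cdf(current_pass_rate)

# Typical Title I classroom: 20% pass now, 50% is the target.
title_one_gap = required_shift(0.20, 0.50)
print(f"{title_one_gap:.2f} sigma")
```

The 20th percentile of a normal curve sits about 0.84σ below the mean, so that is how far the Title I curve has to move.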
What does this mean? It means that if we improved all schools across the board by about 0.84σ, then only about 20% of students nationwide wouldn't meet the standards. In other words, 80% of students would meet the standards. Now an 80% pass rate doesn't comply with NCLB, but let me let you in on a little secret. States have found ways to cheat by lowering their standards and their cut scores, such that if we loosened the NCLB requirements to, say, a 90% to 93% pass rate, we'd be within spitting distance of meeting NCLB requirements. But the problem remains of how to squeeze out about 0.84σ of real school improvement in the first place.
There is, of course, no shortage of opinions as to how to improve schools. It seems that everyone has their pet reform that they think is going to be some sort of magic educational bullet. The problem is that most of these educational bullets are being shot out of pop guns.
For example, let's take the favored reform of most edu-commentators: class-size reduction. The theory is that by reducing class sizes down to ridiculously small (13 to 17 students per class) and ridiculously expensive levels, student achievement will improve. In fact, student achievement will tend to improve, just not very much. Certainly less than these commentators think. In most reasonably rigorous experiments, such as Project Star, gains from class-size reduction were found to be almost 0.25σ. Not much. Here's a graph to show you how little of an improvement that really is.
See that red sliver? That's the amount of improvement you can expect to see from class-size reduction. Not much. By reducing our typical Title I classroom down to Project Star levels we can expect to raise student achievement by a whopping 8 percentage points, from 20% to 28%. Break out the champagne, kids, it's time to celebrate!
We have a name for interventions that achieve effect sizes of less than 0.25σ -- not educationally significant. This reflects the realization that in the real world, such interventions will likely have little or no effect on student achievement. (For example, Project Star was plagued with many methodological flaws that would serve to inflate the already small effect size it achieved under experimental conditions.)
But let's not pick on class-size reduction reforms too much. The sad fact is that about 95% of all education reforms fail to achieve even the small effect sizes achieved in Project Star. This means that most education reforms fail to achieve educationally significant effects. Now go back and look at the graph again. See the red sliver which represents the smallest educationally significant effect size (0.25σ)? The red sliver for almost all educational reforms is even smaller than the red sliver shown on the graph. Wrap your head around that. And make sure you keep that in mind the next time you tout your pet education reform. It sucks. Now you know it; stop pretending that you don't.
Now let's briefly leave the world of reality and enter the realm of fantasy. A fantasy world where statistical correlation is the same as causation. This is the land of Kozol. This is the home of everyone who thinks that raising student socio-economic status (SES) will lead to student achievement. It's also the land of those who think that improving teacher effectiveness is the be-all-and-end-all of education reform. It isn't. Statistical correlations aren't reality, no matter how much you want them to be.
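If you'd rather not take the red sliver on faith, here is a back-of-the-envelope sketch of its size. It assumes normally distributed scores and a uniform 0.25σ shift; the function name is made up for illustration.

```python
from statistics import NormalDist

def percentage_point_gain(baseline_pass_rate: float, effect_size_sigma: float) -> float:
    """Extra share of students clearing a fixed cut score after the
    distribution shifts right by effect_size_sigma.
    Assumes normally distributed scores (an idealization)."""
    std_normal = NormalDist()
    # Baseline position of the group, expressed as a z-score.
    baseline_z = std_normal.inv_cdf(baseline_pass_rate)
    new_pass_rate = std_normal.cdf(baseline_z + effect_size_sigma)
    return (new_pass_rate - baseline_pass_rate) * 100.0

# Title I classroom (20% passing) with a class-size-reduction effect of 0.25 sigma.
class_size_gain = percentage_point_gain(0.20, 0.25)
print(f"{class_size_gain:.1f} percentage points")
```

That works out to a little under 8 percentage points, which leaves the classroom at roughly 28%.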
Let's pretend for the sake of argument there's some magic potion that could increase teacher effectiveness by 2σ. To put this in perspective, this would raise the effectiveness rating of an average teacher (50%) to a super teacher (90% effectiveness rating) and would raise a 25% teacher to a 75% teacher. Using data from this study, you can see what kind of improvement we might expect from these new magical super teachers.
See the slightly larger red sliver? That sliver represents a 0.35σ effect size. An effect size that is educationally significant. But an effect size that still misses the goal by 19 percentage points, i.e., 41% of students failed to improve sufficiently in response to the super teachers. Achieving a 2σ increase in teacher effectiveness is a pipe dream. Even achieving a 1σ improvement is probably a pipe dream, especially when you consider that the study that looked into this question failed to find a correlation between any of the typical things (credentials, experience, etc.) thought to be associated with teacher effectiveness and increased student performance. With only a 1σ improvement, however, the effect size (about 0.26σ) becomes educationally insignificant.
Bear in mind that many pet reforms relate to increasing teacher effectiveness. Paying teachers more is an attempt to increase teacher effectiveness. Raising teacher prestige is an attempt to increase teacher effectiveness. Requiring greater credentials is an attempt to increase teacher effectiveness.
Which finally brings us to the reason why improving student achievement by 0.84σ (a large effect size) is within the realm of possibility. That would be the little-heard-of Project Follow Through, the largest education experiment in U.S. history, in which one intervention, the Direct Instruction (DI) intervention, actually achieved gains of at least 0.84σ, often more.
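To line the effect sizes mentioned in this post up side by side, here is a rough sketch of the shortfall each one leaves for a Title I classroom. It assumes normally distributed scores, a 20% starting pass rate, and a 50% target; the function and variable names are mine.

```python
from statistics import NormalDist

def shortfall_points(baseline_pass_rate: float, target_pass_rate: float,
                     effect_size_sigma: float) -> float:
    """Percentage points by which a group still misses target_pass_rate
    after a uniform shift of effect_size_sigma (0 if the target is met).
    Assumes normally distributed scores (an idealization)."""
    std_normal = NormalDist()
    new_pass_rate = std_normal.cdf(
        std_normal.inv_cdf(baseline_pass_rate) + effect_size_sigma
    )
    return max(0.0, (target_pass_rate - new_pass_rate) * 100.0)

# Effect sizes quoted in the post (labels are just for the printout).
effect_sizes = {
    "class-size reduction": 0.25,
    "teachers +1 sigma": 0.26,
    "teachers +2 sigma": 0.35,
}
shortfalls = {name: shortfall_points(0.20, 0.50, es) for name, es in effect_sizes.items()}
for name, gap in shortfalls.items():
    print(f"{name:22s} misses by {gap:.0f} points")
```

With a 0.35σ effect the classroom lands near 31% passing, which is where the 19-point miss comes from.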
Notice the large red slice and the lack of a yellow slice indicating no shortfall. If only your pet education reform worked as well as this one.
Update: Teach Effectively has a related post and a link to an analysis of some of the few interventions, including effect sizes, that work. Go check them out.
Update II: Brett from the DeHavilland Blog has outed himself as a closet Vanilla Ice fan. I'm sure this was a difficult and painful decision for Brett and his family. The world needs more true heroes like Brett who aren't afraid to speak truth to power.