June 28, 2006

Light Blogging

Sorry for the light blogging, kids. I've been on vacation in sunny Nevis for the past week. Should be back on my game shortly.

June 7, 2006

Not So Stinky Research Part II

(This is part II of this post. Part I can be found here.)

In the last post, we discussed how we'd need to increase student performance by at least a full standard deviation in order to comply with NCLB. Bear in mind that this is considered a large effect size, and improvements that large are very rare in education research.

So are there any current educational interventions that are capable of improving performance by such a large amount?

Let's limit the discussion to research that has at least a moderate effect size (> 0.5 SD), statistically significant results, and at least three well-designed studies.

Believe it or not, there are a few interventions that meet these criteria.

Fortunately, the American Institutes for Research has already done much of the hard work for us by evaluating the research for many popular elementary school interventions. AIR published the results in its November 2005 paper CSRQ Center Report on Elementary School Comprehensive School Reform Models. I went through the report and found three interventions that met my criteria.

Accelerated Schools

From the CSRQ Center Report:
The CSRQ Center reviewed 37 quantitative studies for effects of Accelerated Schools on student achievement. Three studies met CSRQ Center standards for rigor of research design. The Center considers the findings of these three studies conclusive, which means that the Center has confidence in the results of these studies. About one third of the results reported in these studies demonstrated a positive impact of Accelerated Schools on student achievement, and the average effect size for these significant results is +0.76.
The AS studies can be found in Appendix A of the report.

Success For All

From the CSRQ Center Report:
The CSRQ Center reviewed 115 quantitative studies for effects of SFA on student achievement. Thirty-three of these studies met CSRQ Center standards for rigor of research design. Upon review, the Center considers the findings of 31 of these studies conclusive, meaning the Center has confidence in the results reported. The findings of the other 2 studies are considered suggestive, which means the Center has limited confidence in them. Overall, the 33 studies report a mix of results showing positive effects and no effect of SFA; of the 91 separate achievement test findings across the 33 studies, just over half (52%) demonstrate a statistically significant positive impact. The average effect size of these positive effects is +0.63.
The SFA studies can be found in Appendix U of the report.

Direct Instruction

From the CSRQ Center Report:
The CSRQ Center reviewed 56 quantitative studies for effects of DI on student achievement. Twelve studies met CSRQ Center standards for rigor of research design. The Center considers the findings of 10 of these studies conclusive, which means that the Center has confidence in the results reported. The findings of two studies are considered suggestive, which means the Center has limited confidence in the results. The findings in the conclusive and suggestive studies showed mixed results: some studies demonstrated a positive impact of DI on student achievement and other studies showed no significant effects. About 58% of the findings reported in the studies that met standards demonstrated positive effects; the average effect size of those significant findings was +0.69.
The DI studies can be found in Appendix K of the report.

I also know that Gary Adams performed a meta-analysis on the DI research and examined 34 rigorous studies. Here's what he has written about the results.
On pages 48 and 51, the meta-analysis shows that 17 studies lasted less than a year and 17 lasted over a year. The effect size can be calculated per comparison and per study but all of the results show large effect sizes: .95 for studies less than a year and .78 for studies more than a year... On page 44, the age of the publications was analyzed (1972–1980: 6 studies, 1981–1990: 22 studies, 1991–1996: 6 studies) and all of the effect sizes were large (.73, .87, 1.00, respectively).
Fifteen of the studies were conducted by researchers who have been somehow connected with Direct Instruction. In contrast, the majority of the studies (18 studies) were conducted by non-DI-connected researchers. The effect size for studies by DI-connected researchers was .99—a large effect size. The effect size for studies by non-DI-connected researchers was .76—also a large effect size.

So, there are at least three educational interventions that are capable of achieving moderate to large effect size improvements on student performance, at least in the elementary grades.
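
For anyone who wants to check the screen, here is a minimal Python sketch that applies the criteria from this post (average effect size above 0.5 SD and at least three rigorous studies) to the figures quoted above from the CSRQ Center Report. The class, function, and variable names are illustrative only and do not come from the report. Statistical significance is not tested separately, because the quoted averages already cover only the statistically significant findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervention:
    name: str
    rigorous_studies: int    # studies that met CSRQ Center standards for rigor
    avg_effect_size: float   # average effect size of the significant positive findings


# Figures copied from the CSRQ Center Report passages quoted in this post.
REPORTED = [
    Intervention("Accelerated Schools", 3, 0.76),
    Intervention("Success For All", 33, 0.63),
    Intervention("Direct Instruction", 12, 0.69),
]


def meets_criteria(item, min_effect=0.5, min_studies=3):
    """Apply this post's screen: effect size above 0.5 SD and at least three rigorous studies."""
    return item.avg_effect_size > min_effect and item.rigorous_studies >= min_studies


passing = [item.name for item in REPORTED if meets_criteria(item)]
print(passing)
```

Raising min_effect to 0.8 (the usual cutoff for a large effect) would screen out all three CSRQ averages, which is why the discussion here is limited to moderate-or-better effects.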

Of course, all three of these interventions completely overhaul how schools are run, changing almost every aspect of the school. Maybe such radical change is necessary.

Does anyone else know of any other valid educational research with similar moderate to large effect sizes? The comments are open.

Research that Isn't So Stinky

Is there any decent Ed research? asks John from AFT. The simple answer is yes, but not much of it.

John also helpfully sets out a good definition for effect sizes for social science research:
[I]f the difference between sample averages is no more than two-tenths of a standard deviation, the difference should be regarded as small; a difference of half a standard deviation should be regarded as moderate; and a difference of eight-tenths of a standard deviation or larger should be regarded as a large difference.
These are the generally accepted standards for evaluating social science research.

Your typical Title I school is performing at the 20th percentile. Only 20% of its students perform above the 50th percentile. This low performance places the typical Title I school about 0.84 standard deviations below the median school (50th percentile). So in order to improve the performance of this school so that it performs like a mainstream school, it would need to raise its performance by 0.84 standard deviations. This is a large effect size.

So now let's pretend that there were some easily implementable educational intervention that could raise the performance of all schools by 0.84 SDs. Even with such a large improvement, about 20% of students will perform below the present 50th percentile. In Title I schools, half the students would still not make the cut! In order to raise performance so that only about 5% of students perform below the 50th percentile, we'd need some combination of lowering standards and raising performance by another 0.80 SD. Most states have already dutifully complied by lowering standards quite a bit under NCLB.
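
The arithmetic above can be reproduced with a short Python sketch. It assumes test scores are normally distributed and that a school's students are spread around the school's average with the same standard deviation as the overall population; both are simplifying assumptions behind the numbers in this post, not established facts. The function names are made up for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1; the present median sits at 0


def gap_in_sd(percentile, reference_percentile=50.0):
    """Distance in SDs from a percentile up to a reference percentile."""
    return (STANDARD_NORMAL.inv_cdf(reference_percentile / 100)
            - STANDARD_NORMAL.inv_cdf(percentile / 100))


def share_below_present_median(school_mean_sd):
    """Fraction of a school's students below the present median (0),
    when the school's average sits school_mean_sd SDs above it."""
    return STANDARD_NORMAL.cdf(-school_mean_sd)


def mean_needed(target_share_below):
    """How far above the cut score the average must sit so that only
    target_share_below of students fall under the cut."""
    return -STANDARD_NORMAL.inv_cdf(target_share_below)


title_one_gap = gap_in_sd(20)                               # roughly 0.84
avg_school_after = share_below_present_median(0.84)         # roughly 0.20
title_one_after = share_below_present_median(-0.84 + 0.84)  # 0.50
extra_gain_for_5_percent = mean_needed(0.05) - 0.84         # roughly 0.80

print(round(title_one_gap, 2), round(avg_school_after, 2),
      round(title_one_after, 2), round(extra_gain_for_5_percent, 2))
```

Under these assumptions the sketch gives the same figures as the text: a 0.84 SD gap, about 20% of students in an average school still below the present median after the boost, half of Title I students below it, and roughly another 0.80 SD needed to get down to 5%.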

Realistically, schools are going to have to raise performance by at least a full standard deviation (and count on some help from the states) in order to comply with NCLB. And even then, the Title I schools will still be failing since they will have about 20% of students not making the cut. This is the most unfair aspect of NCLB, but until we're ready to admit that student IQ affects student performance, such policy discussions will be off limits.

We'll take a look at some of the existing Ed research in part II of this post to see if we know of any educational interventions that are even capable of increasing student performance in the neighborhood that we need to comply with NCLB.

June 1, 2006

Education Research Stinks

I almost fell over as I read this editorial in the Atlanta Journal-Constitution. The editorial discusses Ed research and how lousy it is:
The public risks whiplash keeping up with the latest twists and turns in the education research. One day, block scheduling is a fresh approach that all high schools ought to adopt. The next, it's derided as stale and moldy.


These conflicts reflect a seldom acknowledged truth in education: There's a lot of uncertainty about what works and what doesn't.

In the meantime, schools invest millions of dollars in innovations and reforms that sound impressive, but don't amount to much more than a new coat of paint on a rickety old house. And when those reforms don't deliver the promised results, schools cling to them anyway because they've spent too much money and time to walk away.

Wow. That's pretty good. A journalist actually understanding the quality of research. How often do you see newspapers acknowledge that research sometimes isn't all that good? Usually, they just accept the researcher's conclusion uncritically, especially if it coincides with their ideological agenda, and play up whatever data is handed to them.

The emphasis on student achievement in the federal No Child Left Behind Act and the requirement to use data to substantiate outcomes are prompting researchers to devise more reliable ways to capture effectiveness.

"The biggest revolution caused by No Child Left Behind is the revolution in education research," says Georgia State University's Gary Henry, a scholar in educational policy and evaluation. "We are getting better at figuring out what works. But what we are seeing is almost nothing that has a very large effect."

Even when the research shows a gain, it's a very small gain produced under the best of circumstances. That's because most reforms only tug at the edges and don't address central flaws in public education: A teacher's track record of improving student performance is hidden from public view, and that performance is not used as a factor in teacher salaries.

Now go back and re-read the part I emphasized; it's probably the most important thing you'll read this week. If you're a journalist or edu-blogger, understanding this little truism will result in you making fewer silly statements, such as touting your favorite edu-theory which inevitably lacks any indicia of success.

Here are a few simple rules to live by:

1. If the results of an experiment are not statistically significant (that is, they fail to reach p ≤ 0.05), then the results are not reliable. If the researcher fails to indicate the p value, it is safe to assume the results are not statistically significant. However, a p value of less than 0.05 does not transform a statistical association into a causal association. It just means that there is the requisite level of confidence that the observed statistical difference between the comparison groups is not due to chance. Whether the statistical association is in fact a causal association will depend on additional factors, such as a well-designed experiment.

2. If the effect size is less than 0.25 standard deviations, the results are generally not considered to be educationally significant. This just means that we don't want to waste good money pursuing educational interventions with small effect sizes, because invariably the effect size of the intervention in actual use will be less than the results observed in the lab. To put this in perspective, an intervention with a ¼ SD effect size will only raise the performance of a typical Title I school from the 20th percentile to the 28th percentile. It takes an effect size of a bit under one standard deviation (about 0.84) to raise their performance to that of the average school, the 50th percentile. In education research, effect sizes larger than ¼ standard deviation are rare.

3. Poorly designed experiments are neither science nor reliable. The big shortcomings here are lack of comparison groups, lack of randomization or pre-testing of the experimental groups, not accounting for attrition effects, control group participation in the intervention, the presence of selection or confirmation bias, and other methodological flaws. To put it bluntly, most education researchers do not know how to conduct valid experiments.

4. Results cannot be extrapolated beyond the parameters of the experiment. This one is a biggie. Experiments are only valid for the conditions under which the experiment is conducted. So, if the experiment properly concludes that reducing class sizes to 13 students may benefit kindergarten-age students, it is improper to extrapolate those results and claim that similar class size reduction might benefit all K-8 students or that a class size reduction to 20 students would have a similar effect. Such conclusions are not supported by the evidence.
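
The percentile figures in rule 2 can be checked with a few lines of Python. This is only a sketch: it assumes school performance follows a normal distribution, and the function name is made up for illustration.

```python
from statistics import NormalDist


def percentile_after_gain(start_percentile, effect_size_sd):
    """Percentile reached after improving by effect_size_sd standard deviations,
    assuming a normal distribution of performance."""
    normal = NormalDist()  # standard normal: mean 0, SD 1
    start_z = normal.inv_cdf(start_percentile / 100)
    return 100 * normal.cdf(start_z + effect_size_sd)


quarter_sd = percentile_after_gain(20, 0.25)     # lands near the 28th percentile
to_the_median = percentile_after_gain(20, 0.84)  # lands near the 50th percentile

print(round(quarter_sd), round(to_the_median))
```

Trying a few other effect sizes with the same function makes the point of rule 2 concrete: gains well under ¼ SD barely move a school starting at the 20th percentile.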

There is very little education research (and subsequent advocacy) that meets all these criteria. Typically, when any serious meta-analysis of education research is conducted, about 90% of it has to be discarded as lacking. Journalists and edu-bloggers should be very leery of relying on shaky education research. You don't want to make the mistake the new Think Tank Review Project made this week by relying on shaky research to make over-hyped points criticizing the Reason Foundation's universal preschool report, and wind up as the laughing stock of the edu-sphere.

Follow-up Post: Ed Research that Isn't So Stinky