The public risks whiplash keeping up with the latest twists and turns in the education research. One day, block scheduling is a fresh approach that all high schools ought to adopt. The next, it's derided as stale and moldy.

Wow. That's pretty good. A journalist actually understanding the quality of research. How often do you see newspapers acknowledge that research sometimes isn't all that good? Usually, they just accept the researcher's conclusions uncritically, especially if they coincide with their ideological agenda, and play up whatever data is handed to them.
These conflicts reflect a seldom acknowledged truth in education: There's a lot of uncertainty about what works and what doesn't.
In the meantime, schools invest millions of dollars in innovations and reforms that sound impressive, but don't amount to much more than a new coat of paint on a rickety old house. And when those reforms don't deliver the promised results, schools cling to them anyway because they've spent too much money and time to walk away.
Now go back and re-read the part I emphasized; it's probably the most important thing you'll read this week. If you're a journalist or edu-blogger, understanding this little truism will result in you making fewer silly statements, such as touting your favorite edu-theory which inevitably lacks any indicia of success.
The emphasis on student achievement in the federal No Child Left Behind Act and the requirement to use data to substantiate outcomes are prompting researchers to devise more reliable ways to capture effectiveness.
"The biggest revolution caused by No Child Left Behind is the revolution in education research," says Georgia State University's Gary Henry, a scholar in educational policy and evaluation. "We are getting better at figuring out what works. But what we are seeing is almost nothing that has a very large effect."
Even when the research shows a gain, it's a very small gain produced under the best of circumstances. That's because most reforms only tug at the edges and don't address central flaws in public education: A teacher's track record of improving student performance is hidden from public view, and that performance is not used as a factor in teacher salaries.
Here are a few simple rules to live by:
1. If the results of an experiment are not statistically significant (p ≤ 0.05), then the results are not reliable. If the researcher fails to report a p value, it is safe to assume the results are not statistically significant. However, a p value of 0.05 or less does not transform a statistical association into a causal association. It just means that there is the requisite level of confidence that the observed statistical difference between the comparison groups is not due to chance. Whether the statistical association is in fact a causal association will depend on additional factors, such as a well-designed experiment.
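To make rule 1 concrete, here is a minimal sketch of one simple way to estimate a p value: a permutation test. The score data and group sizes are made up for illustration; the point is only what the p value measures, namely how often random relabeling of the groups produces a difference at least as large as the one actually observed.

```python
import random

random.seed(42)

# Hypothetical post-test scores for a control group and an intervention group.
control = [71, 68, 75, 70, 66, 73, 69, 72, 67, 74]
treated = [74, 70, 78, 72, 69, 76, 71, 75, 70, 77]

observed_diff = sum(treated) / len(treated) - sum(control) / len(control)

# Permutation test: shuffle the group labels many times and count how
# often chance alone yields a gap at least as large as the observed one.
pooled = control + treated
n = len(treated)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:n]) / n - sum(pooled[n:]) / n
    if diff >= observed_diff:
        count += 1

p_value = count / trials
print(f"observed difference: {observed_diff:.2f}, p ≈ {p_value:.3f}")
```

If `p_value` comes out above 0.05, the observed gap is the kind of thing chance produces routinely, which is exactly why such a result should not be treated as reliable.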
2. If the effect size is less than 0.25 standard deviation, the results are generally not considered to be educationally significant. This just means that we don't want to waste good money pursuing educational interventions with small effect sizes, because invariably the effect size of the intervention in actual use will be smaller than the results observed in the lab. To put this in perspective, an intervention with a ¼ SD effect size will only raise the performance of a typical Title I school from the 20th percentile to the 28th percentile. It takes an effect size of about one standard deviation to raise their performance to that of the average school--the 50th percentile. In education research, effect sizes larger than ¼ standard deviation are rare.
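The percentile arithmetic in rule 2 can be checked directly against the standard normal distribution. This is a minimal sketch (the function name is mine, and it assumes the percentile scale is normally distributed): shift the starting percentile's z-score by the effect size and convert back.

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal: mean 0, standard deviation 1

def percentile_after_gain(start_percentile: float, effect_size_sd: float) -> float:
    """Shift a starting percentile by an effect size measured in
    standard deviations, and return the resulting percentile."""
    z = norm.inv_cdf(start_percentile / 100)
    return norm.cdf(z + effect_size_sd) * 100

# The post's example: a typical Title I school at the 20th percentile.
print(round(percentile_after_gain(20, 0.25)))  # 28 -- a 1/4 SD gain
print(round(percentile_after_gain(20, 0.84)))  # 50 -- "about one" SD to reach average
```

The second line shows why "about one standard deviation" is the right order of magnitude: closing the gap from the 20th to the 50th percentile takes roughly 0.84 SD.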
3. Poorly designed experiments are neither science nor reliable. The big shortcomings here are lack of comparison groups, lack of randomization or pre-testing of the experimental groups, failure to account for attrition effects, control-group participation in the intervention, the presence of selection or confirmation bias, and other methodological flaws. To put it bluntly, most education researchers do not know how to conduct valid experiments.
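Of the flaws listed in rule 3, the lack of randomization is the most mechanical to avoid. Here is a minimal sketch (roster and pre-test scores are made up) of randomly assigning students to treatment and control groups, with a pre-test baseline check:

```python
import random
from statistics import fmean

random.seed(1)

# Hypothetical roster of students, each with a pre-test score.
students = [(f"student_{i}", random.gauss(70, 10)) for i in range(40)]

# Randomize assignment so the groups differ only by chance.
random.shuffle(students)
half = len(students) // 2
treatment, control = students[:half], students[half:]

def mean_score(group):
    return fmean(score for _, score in group)

# Baseline equivalence check: the pre-test means should be close.
# A large gap means the analysis should at least adjust for the pre-test.
print(f"pre-test means: treatment {mean_score(treatment):.1f}, "
      f"control {mean_score(control):.1f}")
```

Randomization plus a pre-test is what lets a difference at post-test be attributed to the intervention rather than to pre-existing differences between the groups.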
4. Results cannot be extrapolated beyond the parameters of the experiment. This one is a biggie. Experiments are only valid for the conditions under which the experiment is conducted. So, if an experiment properly concludes that reducing class sizes to 13 students may benefit kindergarten-age students, it is improper to extrapolate those results and claim that a similar class-size reduction might benefit all K-8 students, or that a reduction to 20 students would have a similar effect. Such conclusions are not supported by the evidence.
There is very little education research (and subsequent advocacy) that meets all these criteria. Typically, when any serious meta-analysis of education research is conducted, about 90% of it has to be discarded as lacking. Journalists and edu-bloggers should be very leery of relying on shaky education research. You don't want to make the mistake the new Think Tank Review Project made this week by relying on shaky research to make over-hyped points criticizing the Reason Foundation's universal preschool report and wind up as the laughing stock of the edu-sphere.
Follow-up Post: Ed Research that Isn't So Stinky