January 31, 2008

Statistical Significance in Education Research

getting away from the whole notion of statistical significance that's been drummed into us, which apparently isn't the gold standard we think it is.
What we really need to do is get away from the layman's confusion with the term "statistical significance." From Wikipedia:

In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. "A statistically significant difference" simply means there is statistical evidence that there is a difference; it does not mean the difference is necessarily large, important or significant in the common meaning of the word.

To put it in layman's terms, when we say something is statistically significant, we merely mean that a difference as large as the one observed would be unlikely (typically, a probability below 5%) if chance alone were at work. But here's where the confusion comes in.

A common misconception is that a statistically significant result is always of practical significance, or demonstrates a large effect in the population. Unfortunately, this problem is commonly encountered in scientific writing. Given a sufficiently large sample, extremely small and non-notable differences can be found to be statistically significant, and statistical significance says nothing about the practical significance of a difference.

Round up enough students in your experiment and even tiny differences in academic performance between the experimental group and the control group pass the test of statistical significance.
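To make that concrete, here is a rough sketch, not taken from any particular study. It assumes a fixed, tiny difference of 0.05 standard deviations between the two groups, equal group sizes, and a simple two-sided z-test with known and equal standard deviations. The function name and the sample sizes are made up for illustration.

```python
import math

def two_sided_p_value(d, n_per_group):
    """Two-sided p-value of a z-test for a standardized mean difference d
    between two groups of n_per_group students each.
    Assumes known, equal standard deviations in both groups."""
    # Standard error of the difference in SD units is sqrt(2 / n),
    # so the z statistic is d / sqrt(2 / n) = d * sqrt(n / 2).
    z = d * math.sqrt(n_per_group / 2)
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))

d = 0.05  # hypothetical effect: one twentieth of a standard deviation
for n in (100, 1_000, 10_000, 100_000):
    p = two_sided_p_value(d, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>7}: p = {p:.4f} ({verdict})")
```

Under these assumptions, the same 0.05 sd difference fails the test with 100 or 1,000 students per group and passes it with 10,000. The effect never changed; only the sample size did.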

That's why we look to the effect size of the experiment, which is the magnitude of the observed effect of the intervention being tested. In education research, an effect size less than 0.25 of a standard deviation is not typically considered to be educationally significant. Here are the rules of thumb that are typically used for classifying effect size in education research: small effect size (> 0.25), medium effect size (> 0.5), and large effect size (> 0.8).
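For readers who want to see the arithmetic, here is a minimal sketch of one common way to compute an effect size and sort it into the buckets above. It assumes the effect size is the difference in group means divided by the pooled standard deviation (often called Cohen's d), which is only one of several conventions. The function names and the test scores are invented for illustration.

```python
import math
import statistics

def cohens_d(treatment, control):
    """Difference in means divided by the pooled sample standard deviation.
    This is an assumed convention; studies may standardize differently."""
    n1, n2 = len(treatment), len(control)
    v1 = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

def classify(d):
    """Apply the rules of thumb from the post to the magnitude of d."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size > 0.5:
        return "medium"
    if size > 0.25:
        return "small"
    return "not educationally significant"

# Made-up test scores, purely for illustration.
treatment_scores = [78, 85, 90, 72, 88, 95, 81, 79]
control_scores = [70, 82, 85, 68, 80, 91, 77, 75]

d = cohens_d(treatment_scores, control_scores)
print(f"effect size = {d:.2f} sd ({classify(d)})")
```

Note that nothing in this calculation depends on whether the result is statistically significant. The two questions are answered separately.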

So, when it comes to education research we want to see studies that conform to the standards of the behavioral sciences (this eliminates 90% of all "research" in education). Once we've culled the herd, we throw out all the research whose results are not both statistically significant and educationally significant.

What is left? Not much at all. Such is the sad state of educational research.

For more on the common scams you'll find in education research see this post and the linked paper by Slavin.

Michael said...

May I suggest replacing "educationally significant" with "practically significant"? The former phrase seems ambiguous. My $0.02.

KDeRosa said...

Michael, generally speaking I agree with the suggested replacement since it clarifies the point; however, "educationally significant" is a term of art in education research.

The confusion is the result of me glossing over the practical significance of requiring a 1/4 sd change in the first place. For educational interventions, when the experiment shows a change of less than 0.25 sd, the real world effect of that intervention will likely be imperceptible. Therefore, it would not be practical to implement the intervention outside of an experimental setting.

I generally discount any educational intervention with results less than 0.50 sd. The real world results will likely be small and not worth the opportunity cost associated with forgoing other more successful interventions.

The problem is that almost no education interventions achieve a consistent effect size greater than 0.50, which is why I find myself stuck writing about Direct Instruction so much.

Michael said...

Thank you for the clarification, Ken.

rightwingprof said...

More to the point, I think we need to get people to understand that statistics is always about probability, and that at an alpha of 0.05, random variation alone will produce a statistically significant difference about 5% of the time when there is no real effect. That's precisely why replication is so important.