June 1, 2006

Education Research Stinks

I almost fell over as I read this editorial in the Atlanta Journal-Constitution. The editorial discusses Ed research and how lousy it is:
The public risks whiplash keeping up with the latest twists and turns in the education research. One day, block scheduling is a fresh approach that all high schools ought to adopt. The next, it's derided as stale and moldy.


These conflicts reflect a seldom acknowledged truth in education: There's a lot of uncertainty about what works and what doesn't.

In the meantime, schools invest millions of dollars in innovations and reforms that sound impressive, but don't amount to much more than a new coat of paint on a rickety old house. And when those reforms don't deliver the promised results, schools cling to them anyway because they've spent too much money and time to walk away.

Wow. That's pretty good. A journalist actually understanding the quality of research. How often do you see a newspaper acknowledge that research sometimes isn't all that good? Usually, they just accept the researcher's conclusions uncritically, especially when they coincide with their ideological agenda, and play up whatever data is handed to them.

The emphasis on student achievement in the federal No Child Left Behind Act and the requirement to use data to substantiate outcomes are prompting researchers to devise more reliable ways to capture effectiveness.

"The biggest revolution caused by No Child Left Behind is the revolution in education research," says Georgia State University's Gary Henry, a scholar in educational policy and evaluation. "We are getting better at figuring out what works. But what we are seeing is almost nothing that has a very large effect."

Even when the research shows a gain, it's a very small gain produced under the best of circumstances. That's because most reforms only tug at the edges and don't address central flaws in public education: A teacher's track record of improving student performance is hidden from public view, and that performance is not used as a factor in teacher salaries.

Now go back and re-read the part I emphasized; it's probably the most important thing you'll read this week. If you're a journalist or edu-blogger, understanding this little truism will result in you making fewer silly statements, such as touting your favorite edu-theory which inevitably lacks any indicia of success.

Here are a few simple rules to live by:

1. If the results of an experiment are not statistically significant (p ≤ 0.05), then the results are not reliable. If the researcher fails to report a p value, it is safe to assume the results are not statistically significant. However, a p value of less than 0.05 does not transform a statistical association into a causal association. It just means that there is the requisite level of confidence that the observed difference between the comparison groups is not due to chance. Whether the statistical association is in fact a causal association will depend on additional factors, such as a well-designed experiment.

2. If the effect size is less than 0.25 standard deviation, the results are generally not considered to be educationally significant. This just means that we don't want to waste good money pursuing educational interventions with small effect sizes, because invariably the effect size of the intervention in actual use will be smaller than the results observed in the lab. To put this in perspective, an intervention with a ¼ SD effect size will only raise the performance of a typical Title I school from the 20th percentile to the 28th percentile. It takes an effect size of about one standard deviation to raise its performance to that of the average school--the 50th percentile. In education research, effect sizes larger than ¼ standard deviation are rare.
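The percentile arithmetic in rule 2 is easy to verify yourself. Here is a quick sketch using Python's standard-library normal distribution, starting from the Title I example above (a school at the 20th percentile):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal distribution

# A school at the 20th percentile sits at this z-score:
z_20 = nd.inv_cdf(0.20)          # about -0.84

# Apply an intervention with a 0.25-SD effect size:
new_pct = nd.cdf(z_20 + 0.25)    # about 0.28, i.e. the 28th percentile

# Effect size needed to bring the school up to the 50th percentile (z = 0):
needed = 0.0 - z_20              # about 0.84 SD

print(round(new_pct * 100), round(needed, 2))
```

The 0.84 figure matches the "about one standard deviation" claim above: a quarter-SD intervention moves a 20th-percentile school less than a third of the way to average.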

3. Poorly designed experiments are neither science nor reliable. The big shortcomings here are lack of comparison groups, lack of randomization or pre-testing of the experimental groups, failure to account for attrition effects, control-group participation in the intervention, the presence of selection or confirmation bias, and other methodological flaws. To put it bluntly, most education researchers do not know how to conduct valid experiments.

4. Results cannot be extrapolated beyond the parameters of the experiment. This one is a biggie. Experiments are only valid for the conditions under which they are conducted. So, if an experiment properly concludes that reducing class sizes to 13 students may benefit kindergarten-age students, it is improper to extrapolate those results and claim that a similar class-size reduction might benefit all K-8 students, or that a class-size reduction to 20 students would have a similar effect. Such conclusions are not supported by the evidence.

There is very little education research (and subsequent advocacy) that meets all these criteria. Typically, when any serious meta-analysis of education research is conducted, about 90% of it has to be discarded as lacking. Journalists and edu-bloggers should be very leery of relying on shaky education research. You don't want to make the mistake the new Think Tank Review Project made this week by relying on shaky research to make over-hyped points criticizing the Reason Foundation's universal preschool report and wind up the laughingstock of the edu-sphere.

Follow-up Post: Ed Research that Isn't So Stinky


SteveH said...

Many education researchers are really just after-the-fact validators. You have to follow the money.

Even statistically valid results may not be useful if the assumptions are all wrong. Supplemented Everyday Math might show real improvement (over what?), but won't get the kids to a proper course in algebra in 8th grade.

Laura said...

Here's where our dilemma lies, though:

"Apparently, that's just what schools fear, a flood of active parents armed with data showing that Ms. X raises test scores year after year and Mr. Y does not. Because that would force schools to do something about Mr. Y— either offer him professional development or suggest a career change."

We can get rid of Mr. Y., but then we're a teacher short. Where are we going to find another Ms. X. to replace Mr. Y.? The supply is short, my friend.

Okay, so let's say we seek to make Y an X. How do we do that? We study exactly HOW X gets those results and attempt to duplicate her methods. That's what all of these ill-researched theories are supposed to do. We can't simply bottle "essence of X" or previous experience. We have to pick apart her methods and see what she does that causes her to get consistently better scores, or have her teach entire grade levels, which surely you know would not work.

"No reform will revitalize public education until schools stop protecting ineffective teachers and start rewarding effective ones. Parents have the right and the responsibility to try to get their kids in the classes of the teachers with proven skills."

But just where are the magical "effective teachers" going to come from? They come from theories, misguided or not. That's how skills are gained, forming and re-forming theories plus experience. I bet even Ms. X. didn't ALWAYS have previous good scores to point to. Unless the contention is that good teachers are born and not made. Then we are hopelessly lost.

KDeRosa said...

Hi Laura.

That's why I stopped quoting the article where I did. I disagreed with the rest of the conclusions drawn for many of the same reasons you point out.

The idea is to start using what we know works and improving upon those methods, not looking for great teachers. The teacher is only one important part in the scheme of education. There are many other components that must also be present.

SteveH said...

"But just where are the magical "effective teachers" going to come from?"

Can you define and calibrate "effective" for me? One should make a distinction between inexperienced, incompetent, lazy, and just plain bad. Schools can and should bring inexperienced teachers up-to-speed quickly. Unfortunately, many ed school grads need a whole lot of help beyond the opinion-based pedagogy they learned in school. Being an effective teacher starts in school and does not require only experience or polyjuice. Parents should be able to expect a certain level of competence from a college graduate.

I have also heard the argument that public schools can be fine if you get the "good" teachers. They weren't talking about experience here. Some of the teachers with the highest seniority were the worst. One of our public schools couldn't get rid of a bad teacher, so they had to distribute his poor teaching across all of the kids so it wasn't concentrated on just a few.

However, this "effective" teacher solution to fixing schools hides the fact that one of the reasons that some teachers are effective is that they (try to) make up for bad curricula. This only goes so far.

You will have far more "effective" teachers with a proper curriculum and teaching methods. Effectiveness is more than just experience.

Laura said...

Thank you, KDeRosa. I think we have reached common ground!

SteveH, I don't believe I ever implied that it was only experience that made a good teacher. I would, however, suggest that it takes any teacher a few years to come into his/her own. That is why in my state we are only "initially licensed" for the first 3 years.

College graduates can only run classrooms so well. No amount of student teaching can FULLY prepare one for running one's OWN classes. Experience is not the only thing, but it IS an essential factor.

SteveH said...

"Experience is not the only thing, but it IS an essential factor."

I need to be more specific. Schools must be able to deal with teachers at all experience levels. Curricula and teaching methods can make this problem better or worse.

Schools cannot use lack of experience as an excuse. Many progressive teaching ideas (thematic coverage, teacher as guide on the side, full-inclusion, child-centered mixed ability group learning) require a much higher degree of teacher experience to be effective, if at all. My complaint is that schools implement techniques like differentiated instruction with only a fuzzy plan on how they will work. At best, schools ask for more money for teacher training in the hope that sometime in the future, everything will work out.

Experience and effective teaching are not the only things, but most schools don't want to talk about anything else.

allen said...

I'd like to point out that the observation that better teachers teach better is hardly groundbreaking. It might be worthwhile to give some consideration to why that observation has never made it into practice.

John at AFT said...


I'm interested in rule to live by #2.

Can you point out a few studies that show reforms or conditions that produce an effect size large enough to meet your criterion?

Ed Researcher said...

Thanks for posting these rules of thumb. This is useful for consumers of research.

However, they are only rules of thumb, and each piece of research evidence needs to be assessed in context.

I really don't like #2 (effect size needs to be 0.25). It all depends on what you are comparing. If you contrast small classes with regular classes, then yes, you need a large effect to justify the added resources (extra teacher salary costs). But if you compare class-size reduction with another use of the same amount of money (e.g. spending it all on professional development), then even a very tiny effect size is evidence that one alternative is the more worthwhile use.

In other words, you need to set effect sizes into a cost-benefit or cost-effectiveness framework to evaluate their size.

When costs are unknown or hard to measure, I would at least want to translate effect size into another metric, such as the weeks of achievement growth or movement along the skill distribution (say, 50th to the 55th percentile).

KDeRosa said...

Hi Ed Researcher.

Everything has an opportunity cost. If a school wants to implement x, this will probably prevent it from implementing y. If x is more effective than y, then you'd favor implementing x. But if the predicted effect size of x is small, you're probably not going to want to implement x either. You'll want to save your resources until you find z, which has a larger effect size.

As I point out in a later post, the typical Title I school needs to improve performance by about 0.84 SD to perform like a mainstream school. This is a large effect size. It's unlikely that schools will stumble upon the right mix of small interventions that will add up to such a large effect size. Better to pick the intervention that's close to what you need.

Ed Researcher said...

This is dumb. What if you spend $5 implementing a program that affects 100 kids? Then if you get an average effect size of 0.01 sd you have a cost effective program.

However, if you spend $1,000 per kid reducing class size and you get an effect size impact of 0.20, what do you have? Not so clear. With some other intervention that costs $1,000/student you might get a larger or smaller effect size.
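The comparison Ed Researcher is making can be put in concrete terms by normalizing effect size by per-student cost. A back-of-the-envelope sketch, using only the hypothetical dollar figures from the comment above:

```python
# Effect size gained per dollar spent per student,
# using the hypothetical figures from the comment above.

# $5 spread across 100 kids -> $0.05 per student, for a 0.01-SD effect
cheap_program = 0.01 / (5 / 100)   # about 0.2 SD per dollar

# Class-size reduction: $1,000 per student, for a 0.20-SD effect
class_size_cut = 0.20 / 1000       # about 0.0002 SD per dollar

print(cheap_program, class_size_cut)
```

On this crude per-dollar metric the cheap program looks a thousand times more cost-effective, which is the point: the 0.25-SD rule of thumb ignores what the intervention costs.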

KDeRosa said...

What if you spend $5 implementing a program that affects 100 kids? Then if you get an average effect size of 0.01 sd you have a cost effective program.

I'd characterize it more as an educator's choice between things that cost and perform pretty much the same. In the real world, programs are not mutually exclusive and effect sizes aren't stackable.

Let's say this school needs to improve by .20 SD and decides to implement 10 different programs, each having an effect size of .02 and each costing $10. What's the chance that the school will see a combined .20 effect size, or a .10 effect size for that matter? Probably zero, but the $100 will still be gone. What is really lost is the opportunity to have done something different, something more effective.