October 22, 2009

Actually, That's Not What the "Research" Shows Either

In the third post on her new blog, The Educated Reporter, Linda Perlstein makes an (all-too-common) error that reporters should not be making, especially reporters claiming to be educated.

Perlstein starts off well by criticizing the inane trope President Obama repeated about "teacher effectiveness" in a recent speech.

In his education speech to the Hispanic Chamber of Commerce in March, President Obama said, “From the moment students enter a school, the most important factor in their success is not the color of their skin or the income of their parents. It’s the person standing at the front of the classroom.”


To put it bluntly: “He’s wrong.”

Indeed.  First of all, the research on this issue isn't really research in the conventional sense; there are no properly conducted controlled studies on point.  The "research" consists merely of correlational studies, which do not rise to the level of real scientific research.  At best, these studies might suggest profitable avenues to pursue in future, properly controlled research.

Furthermore, these correlational studies are on teacher effectiveness, not "the person standing at the front of the classroom."  There are a lot of variables tied up in the term "teacher effectiveness": the teacher, the pedagogy, the curriculum, the classroom environment, and the like.  Only some of these variables are under the control of the person standing at the front of the classroom, that is, the teacher.  Moreover, the correlational studies are, in any event, incapable of teasing out which variable or variables are responsible for the correlation with student outcomes.

Nonetheless, this trope gets trotted out all the time in the dopier quarters of the edusphere.  And Perlstein is right to jump on it.

Perlstein, however, steps in it when she tries to state what the research actually shows:

Of the various factors inside school, teacher quality has had more effect on student scores than any other that has been measured. (emphasis in original)

To put it bluntly: “She’s wrong.”

First, to the extent that the studies are correlational in nature, they are incapable of showing that a variable, in this case teacher effectiveness, had "more of an effect" on anything, including student scores.  The studies show only that "teacher effectiveness" (however each study attempted to define the variable) is correlated with student scores to some small degree.  Correlation is not causation.
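To see the problem concretely, here is a toy simulation of my own (the numbers are invented and are not drawn from any of the studies at issue) in which a hidden variable, the students' prior preparation, drives both a teacher's measured "effectiveness" rating and the class's test scores. The rating has zero causal effect on the scores, yet the two still correlate:

```python
# Toy simulation (illustrative only): a hidden confounder produces a
# teacher-score correlation even though the rating has no causal effect.
import random

random.seed(1)

def simulate_classroom():
    prep = random.gauss(0, 1)                 # students' prior preparation (hidden)
    rating = 0.7 * prep + random.gauss(0, 1)  # measured "teacher effectiveness"
    scores = 0.7 * prep + random.gauss(0, 1)  # class test scores
    return rating, scores                     # note: rating never feeds into scores

data = [simulate_classroom() for _ in range(10_000)]

def pearson_r(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5

print(f"r = {pearson_r(data):.2f}")  # roughly 0.33: correlated, yet not causal
```

A correlational study run on data like this would dutifully report that "teacher effectiveness" correlates with student scores, and any inference that raising the rating would raise the scores would be flatly wrong.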

Second, "teacher effectiveness" is not the school factor with the largest measured effect.  See here and here.  I don't know how this particular trope got started, but it is amazing how often it gets uncritically trotted out by education reporters and bloggers.  Look at Perlstein's conclusion:

But for now, just remember: When you read that teachers are the most important school factor, you can’t drop the “school” and pass it on.

Regardless of whether you drop the "school" caveat or not, you should not be passing it on, because it's not accurate.

Any educated reporter commenting on education should know this.

Welcome to the edusphere, Linda.

23 comments:

Corey Bunje Bower said...

So what's the most important in-school factor?

KDeRosa said...

Corey, I don't think we know which in-school factor is most important. I merely pointed out one curriculum-based reform that consistently shows a bigger effect size than what the teacher-effectiveness research reports. That counterexample is sufficient to disprove Perlstein's premise.
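For anyone unfamiliar with the term: an effect size here is a standardized mean difference between a treatment group and a comparison group. A minimal sketch of the usual computation (Cohen's d), with made-up scores purely for illustration:

```python
# Cohen's d: the difference in group means divided by the pooled standard
# deviation. The score lists below are hypothetical, for illustration only.

def cohens_d(treatment, control):
    n1, n2 = len(treatment), len(control)
    m1, m2 = sum(treatment) / n1, sum(control) / n2
    v1 = sum((x - m1) ** 2 for x in treatment) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

reform_scores  = [78, 85, 90, 74, 88, 81]  # hypothetical curriculum-reform group
control_scores = [70, 72, 80, 65, 75, 69]  # hypothetical comparison group

print(f"d = {cohens_d(reform_scores, control_scores):.2f}")
```

By convention, a d of 0.2 is called small and 0.8 large. That's why a curriculum reform that consistently posts large effect sizes is a sufficient counterexample.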

This reminds me that I have to add your blog to my blogroll. In fact, I'm going to do that right now before I forget again.

Matthew K. Tabor said...

Ken,

I'm glad you pointed out in such detail that what too many policy wonks and ed writers assume is valid research simply... isn't. I agree that, most of the time, what we read as scientific research at best paves the way for real research.

My gripe was that Perlstein apparently didn't even read the speech she linked to. There was no need for her or others to point out that factors off school grounds are more important; his comments were explicitly about the school environment. When one reads the rest of President Obama's paragraph, one wonders why Perlstein went in this direction at all.

If we want to talk about whether curriculum or teaching matters most, *there's* a conversation worth having.

Corey Bunje Bower said...

I wholeheartedly agree with both of you that there's a lot of bad research out there that deserves closer scrutiny. But I have to ask if you (Ken) are implying that only randomized experiments count as research.

And, by the way, determining what has the largest effect size and determining what's most important are not the same thing. The former applies when looking at what will induce the biggest change, while the latter applies when looking at what contributes most to the way things are.
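A contrived example of the distinction, with all numbers invented purely for illustration: suppose home background carries most of the weight in scores as they stand, but an intervention can move the teacher variable much further than it can move the home variable. Then home is the "most important" factor even though the teacher intervention has the larger effect:

```python
# Contrived illustration (invented weights): "what contributes most to the
# way things are" vs. "what will induce the biggest change".
import random

random.seed(3)

HOME_WEIGHT, TEACHER_WEIGHT = 0.9, 0.3  # hypothetical weights

def score(home, teacher):
    return HOME_WEIGHT * home + TEACHER_WEIGHT * teacher

kids = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10_000)]
baseline = [score(h, t) for h, t in kids]

# Suppose an intervention can move "teacher" by 2 SD, but "home" by only 0.5 SD.
home_boost    = [score(h + 0.5, t) for h, t in kids]
teacher_boost = [score(h, t + 2.0) for h, t in kids]

avg = lambda xs: sum(xs) / len(xs)
print(f"gain from home intervention:    {avg(home_boost) - avg(baseline):.2f}")    # 0.45
print(f"gain from teacher intervention: {avg(teacher_boost) - avg(baseline):.2f}")  # 0.60
```

Home background explains most of the existing variance, yet the teacher intervention induces the bigger change. Conflating those two claims is exactly the error I'm pointing at.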

Stephen Downes said...

Her point is more accurately stated as follows: "Certainly nobody has ever proven that good teaching matters more than, say, genetic endowment, or home environment."

Which is how she represents the perspective of the researchers she cites: quality researcher Dale Ballou at Vanderbilt’s National Center on Performance Incentives, as well as Doug Harris at the University of Wisconsin and Doug Staiger at Dartmouth.

Of course, you don't mention that she actually cites researchers, or that she has some other, more substantial point to make than the trivial quip you've picked up on.

Instead, you criticize some generic un-named and uncited 'research' for not satisfying your particular standards. Unless you're going to get her for a point she is actually making, you have nothing. Nothing!

Welcome to the edusphere, indeed. The seamy side.

Parry Graham said...

Ken,

I quibble with your post.

The overall tenor of the post strikes me as arguing that “teacher effectiveness” isn’t that big a deal, and I disagree. Despite your statement that “correlation is not causation”, a body of correlational studies that consistently suggests a relationship between variables is a pretty darn good place to start. And there is a body of correlational studies suggesting that the teacher to whom a student is assigned matters. To put it more specifically, research seems to suggest that some teachers are able to consistently get students to learn and achieve at higher levels than other teachers, and that these disparities in student learning are related to teacher-specific variables.

The research also seems to suggest that these teacher-specific variables are not certification, years of experience (after the first couple years), or advanced degrees. Instead, it seems to be something else.

I agree that these research studies are not conclusive, and I doubt that many studies examining student learning will get close to anything approaching conclusive. Random assignment experiments are incredibly difficult to accomplish in education.

I also agree that teachers do not teach in a vacuum, and that there are factors that are not teacher-specific that likely confound “teacher effectiveness”. Curriculum matters (as suggested by studies of Direct Instruction), leadership within the school matters, and a study I read recently suggested that the effectiveness of colleagues matters.

Nevertheless, I have a hard time finding research that identifies school-based variables that appear to have a more significant impact on student learning than “teacher effectiveness”. Project Follow Through, if I am representing it correctly, demonstrated large effect sizes for DI for students within specific grade levels (primary and elementary, correct?) and specific subjects. But is there a DI equivalent for middle or high school subjects? Are there any other curricular programs that have demonstrated similar effect sizes?

I think one of the reasons “teacher effectiveness” is such an important issue is that it affects every child, and it appears to be a variable that varies quite a bit.

(One might even make an argument that scripted curricula, such as DI, are really an attempt to improve “teacher effectiveness” by dictating the curricular, assessment, and instructional practices that are typically left to individual teachers to decide.)

Parry

KDeRosa said...

Corey, I'm not trying to imply that only randomized research counts as research. I think that quasi-experimental research designs should also count.

How would you determine what is important?

Dick Schutz said...

Not sure that this is a "trope," but it's certainly a deep and wide misconception. It can be traced back to a mis-analysis of the "First Grade Reading Studies" which preceded Follow Through and which also used "Planned Variations" methodology.

http://www.ciera.org/library/archive/2001-08/200108.htm

"In the early 1960s, in an effort
to settle the debate about the best way to teach beginning reading once and for all (this time with the tools of empirical scholarship rather than rhetoric), the Cooperative Research Branch of the United States Office of Education funded an elaborate collection of 'First Grade Studies,' loosely coupled forays into the highly charged arena of preferred approaches to beginning reading instruction.[30] While each of the studies differed from one another in the particular emphasis, most of them involved a comparison of different methods of teaching beginning reading. They were published in a brand new journal, Reading Research Quarterly, in 1966. . . The First Grade Studies had an enormous impact on beginning reading instruction and indirectly on reading pedagogy more generally. One message of the First Grade Studies was that just about any alternative, when compared to the business-as-usual basals (which served as a common control in each of 20+ separate quasi-experimental studies), elicited equal or greater performance on the part of first graders(and, as it turned out, second graders).[31] It did not seem to matter much what the alternative was-language experience, a highly synthetic phonics approach, a linguistic approach (control the text so that young readers are exposed early on only to easily decodable words grouped together in word families, such as the -an family, the -at family, the -ig family, etc.), a special alphabet (i.e., the Initial Teaching Alphabet), or even
basals infused with a heavier-than-usual dose of phonics right up front–they were all the equal or the better of the ubiquitous basal. . . By accepting this
message, the reading research community was free to turn its efforts to other, allegedly more fruitful, issues and questions–the importance of the teacher quite irrespective of method, the significance of site, and the
press of other aspects of the curriculum such as comprehension and writing.[32] With the notable exception of the Follow-Through Studies in the 1970s, which are only marginally related to reading, it would take another twenty-five years for large-scale experiments to return to center stage in reading.[33]"
________
Some studies turn out to be poster-child research. Teachers loved the fact that they were the center of the universe. Teacher ed institutions loved the fact that "more training was called for." Publishers loved the fact that they were off the hook.

The most important variables in instruction are what is taught (task differences) and product/protocol differences (what is used to teach the specified aspirations). Kids learn (or try to learn) what they're being taught. But to see what they've learned, one has to look at what has been taught; not just any test that happens to be handy will yield useful instructional feedback.

Ironically (present company excepted), the nuts and bolts of instructional programs are the last thing anyone is interested in looking at.

KDeRosa said...

Parry, I agree with many of your points.

However, many educational reform failures have found support in a body of correlational studies. To me, a body of correlational studies suggests, as a first step, a small-scale controlled study to confirm the hypothesis.

I do agree that some teachers do seem to get consistently better results. But I also think that we don't know what key qualities those teachers possess, or whether those qualities could be replicated at full scale.

Also, I am not convinced that the expense is a good reason not to do randomized experiments, or at least quasi-experimental studies. We do it for testing medical interventions; why not for educational interventions? (A toy sketch of the basic logic appears at the end of this comment.)

Lastly, if you are comfortable relying on the shaky teacher-effectiveness evidence for an across-the-board reform, why not also assume that the early-grade DI evidence would be applicable to later grades?
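As promised, a toy sketch (my own, with a made-up hidden variable) of the basic logic: random assignment balances hidden student differences across groups in expectation, so any score gap that later emerges can be credited to the intervention rather than to the students:

```python
# Toy sketch of why random assignment helps: hidden student differences
# (e.g., home background) come out balanced across groups in expectation.
import random

random.seed(2)

# 1,000 hypothetical students, each with a hidden "background advantage" score.
students = [random.gauss(0, 1) for _ in range(1000)]

random.shuffle(students)
treatment, control = students[:500], students[500:]

avg = lambda xs: sum(xs) / len(xs)
print(f"hidden-advantage means: treatment = {avg(treatment):.3f}, "
      f"control = {avg(control):.3f}")  # nearly equal, from randomization alone
```

A pile of correlational studies can never deliver that guarantee, which is why the medical analogy seems apt to me.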

KDeRosa said...

Stephen, you have again mischaracterized a statement (in this case Perlstein's) as the premise upon which your argument is built. Perlstein's actual statement is exactly the opposite of your characterization.

KDeRosa said...

Dick, I like the way the meaning of two of the words in the title of Reading Research Quarterly is completely different from the way the rest of the world uses those words.

Dick Schutz said...

Ken says: "I am not convinced that the expense is a good reason not to do randomized experiments, or at least quasi-experimental studies. We do it for testing medical interventions; why not for educational interventions?"

Here's why not. In testing pharmaceutical products, the randomized trials come at the tail end of a long series of tests: first lab tests, then animal tests. In human tests, there is concern both about the placebo effect (as a rule of thumb, a third of patients improve on a placebo) and about detrimental side effects. Further, in such tests one has a defined population who have been carefully diagnosed as having the same etiology.

In instruction, none of these conditions are met. In instruction we are seeking robust/reliable outcomes. For example, we want to teach kids how to read, to do arithmetic, and to accomplish other specified academic aspirations. These are transparent matters that don't require artificial, ungrounded paper-pencil tests.

The students are their own controls: Before instruction they don't have expertise. The "treatment" is intended to deliver the acquisition of expertise. Each individual has to learn personally. The comparative "control" is irrelevant.

Moreover, there is a new cohort of kids entering schools each year and there is a very large population involved.

Each kid is different, but what goes unrecognized is that all kids are alike in many respects. The same goes for teachers and schools.
With a defined product/protocol, analogous to the medical "treatments," it's possible to sort out the differential effects of programs, students, teachers, and schools, aggregated by the various demographic characteristics of common interest (e.g., gender, geographical locale, SES, and so on).

For proof that the randomized controlled experiment is a wild goose chase, one has to look no further than the "What Works Clearinghouse." Nothing that WWC claims "has promise" will robustly, reliably fulfill any promise.

Rather than wielding weapons of mass instruction, each teacher has to hardscrabble together idiomatic instruction on the fly.

(DI is successful because the route to instructional accomplishments is scripted. But that's a whole nother story.)

In medicine, "research" is just the beginning. The process of confirming the appropriate release/adoption is a matter of development. Education turns the matter on its head. We have scads of "research" but no development to speak of, and no regulation of the instructional junk and rhetoric that is foisted on kids and teachers.

KDeRosa said...

Dick, I understand that, as currently practiced, most of these conditions are not met in education, but that doesn't mean that most can't be or shouldn't be met as a first step in education reform. That's why I've been spending so much time discussing the foundation of education: knowledge, what it is, how it is acquired, how it is mastered, and its benefits.

Curriculum designers should be doing pre-publication field try-outs. Testing instruments should be improved.

Tracy W said...

For example, we want to teach kids how to read, to do arithmetic, and to accomplish other specified academic aspirations. These are transparent matters that don't require artificial, ungrounded paper-pencil tests.


Nope, they are not transparent. You can't look at a child and say "this kid can do arithmetic" without doing some testing. And if you test a kid and find out that they can do multi-digit multiplication, you can't determine from that whether or not they can do long division; you have to test that separately if you want to find out.

And while I agree that ungrounded paper-and-pencil tests are not required, some sort of standardisation is needed to be able to conclude with some confidence that the differences in results come from differences in teaching effectiveness as opposed to differences in the examiners' methods. Working out standardisation is hard work; it's not transparent. For example, quite often students read things into a question that the examiner did not intend. My mother took some courses in designing test questions when she did her teacher training, and she said it was a fascinating topic and a really hard one: at first, out of every 4 questions she wrote, 3 would trip up students in ways she hadn't expected.

Tracy W said...

The students are their own controls: Before instruction they don't have expertise.

Actually we don't know this. There are many anecdotal reports of children learning things like reading before school, either from being taught explicitly or from being one of the lucky ones who can work reading out from scratch.

While I didn't learn to read before starting school, I had an uncle who used to toss maths facts my way at random. Consequently I knew bits and pieces about maths, like negative numbers, well before the school got around to that subject.

On a more extreme level, my grandfather's older sister, a school teacher herself, decided for some reason that her brother was going to go to university, and tutored him after school herself. Any assessment of his performance at school would have been affected by the results of what he was learning outside of school.

Any study that doesn't control somehow for this varying background is flawed.

Tracy W said...

For proof that the randomized controlled experiment is a wild goose chase, one has to look no further than the "What Works Clearinghouse"

This is faulty logic. Whatever the vices of the "What Works Clearinghouse", it cannot by itself prove that randomised controlled experiments are a wild goose chase, as any faults in the WWC may be specific to the WWC.

Dick Schutz said...

Ken, I agree.

Tracy, you're pulling text out of context and missing the general point of the post.

Reading and doing arithmetic are transparent. Put a text in front of a kid and say "Read it and tell me about it." If the kid can do this the kid can read. The issue is what to do if the kid can't. But that's an instructional issue, not a testing issue.

I'd said, "The students are their own controls: Before instruction they don't have expertise."

Tracy said, "Actually we don't know this."

That's true only if you are referring to stupid-head instruction. Kids who have the expertise don't need instruction.

Tracy said: "... whatever the vices of the "What Works Clearinghouse", it cannot prove by itself that randomised controlled experiments are a wild goose chase, as any faults in the WWC may be specific to the WWC."

The thing is, the faults are not specific to WWC. WWC demands randomized control data to determine "what works." Neither the research WWC cites nor WWC itself is "working," in the sense of providing useful information.

urbanteach said...

Reading and doing arithmetic are transparent. Put a text in front of a kid and say "Read it and tell me about it." If the kid can do this the kid can read.

That's an almost perfect, succinct description of the Developmental Reading Assessment (DRA).

Guess the tests schools are using are good after all.

Parry Graham said...

“I do agree that some teachers do seem to get consistently better results. But I also think that we don't know what key qualities those teachers possess, or whether those qualities could be replicated at full scale…

Lastly, if you are comfortable relying on the shaky teacher-effectiveness evidence for an across-the-board reform, why not also assume that the early-grade DI evidence would be applicable to later grades?”


As for the key qualities piece, we don’t know for sure, and the research around the “black box” of teaching effectiveness is anything but definitive. Nevertheless, I have my suspicions, both from what I have read of the literature and from my own (admittedly anecdotal) experiences working in a variety of schools. From what I can tell, a good candidate for a primary piece of teacher effectiveness is curricular, assessment, and instructional decision-making: in other words, the decisions teachers make about what to teach (which often means translating vague state standards into practical daily objectives), how to teach it, and how to measure what has been taught and ostensibly learned.

In my own experience, I see effective teachers keeping their students cognitively engaged in thinking about curricular content for large portions of instructional time; aligning activities, lessons, questions, etc. with where students are, which essentially means they frequently meet Willingham’s litmus test of creating activities that are neither too easy nor too hard (which requires frequent assessment of where students are); providing students with frequent, accurate, specific feedback about their academic progress; and developing positive relationships with students, which aids in student motivation.

As far as across-the-board reform goes, I am skeptical of any large-scale reform that could have a dramatic positive impact on the teacher-effectiveness components I mentioned. What is key for me, however, is the belief that any reform that doesn’t somehow lead to positive changes in those components is unlikely to accomplish anything. I would suspect that DI’s results have something to do with the fact that it standardizes those components in ways that have been proven to lead to positive results. But I have a hard time conceptualizing a scripted curriculum that would work as well in later grades, as curricula become increasingly complex.

Parry

Dick Schutz said...

"That's an almost perfect, succinct description of the Developmental Reading Assessment (DRA). Guess the tests schools are using are good after all."

Guess not, unfortunately, for 3 reasons:

--The DRA is based on texts "leveled" with respect to characteristics other than the Alphabetic Code, so it implicitly supports "whole language" instruction.

--The DRA has tight restrictions on the "retell." Some children can read, but have not been taught how to meet the retell requirements.

--The DRA purports to measure the pupil's status in learning to read, not to determine whether or not the child "can read" and therefore requires no further instruction. If a child can't read and understand texts containing words within the child's spoken vocabulary, that's an instructional matter, not a testing matter. The DRA sheds no light on where the instruction went wrong or what to do about it.

The title "Developmental" is a "tell." Reading is not a "developmental" matter. A few kids do learn to read without any (apparent) formal instruction. And some learn to read despite unintended mal-instruction. Schools take credit for this placebo effect, and the DRA contributes to the fog that enables schools to attribute instructional failures to the kids.

Unfortunately, the DRA is part of the problem, not part of the solution.

Teachers do need indicators of each pupil's status/performance, with the feedback tied to whatever program is being relied upon to deliver reading capability. But the DRA is devoid of such feedback. It serves only to transfer instructional responsibility from the program and the teacher to the child.

Spedvet said...

I think that when we talk about teacher quality, we are talking about the gestalt of what the teacher is doing in the classroom, and that is the teacher's use of methodology.

Teacher quality is synonymous with teaching methodology. A quality teacher is one who is proficient in implementing a successful teaching methodology.

Therefore I don't have a per se problem with the term.

Dick Schutz said...

"I think when we think about teacher quality, we are talking about the gestalt of what the teacher is doing in the classroom, and that is: that teacher's use of methodology."

Now if anyone were to specify the "gestalts" and the "methodologies" that are involved, we might have something. As the statement stands, if you added a couple of dollars it would buy a cup of coffee.

Tracy W said...

Tracy, you're pulling text out of context and missing the general point of the post.

Nope. I got the context exactly right, and got a perfect bullseye on the general point of your comment.

And if you doubt my statement here, on what basis do you expect me to believe yours?

Reading and doing arithmetic are transparent. Put a text in front of a kid and say "Read it and tell me about it." If the kid can do this the kid can read.

Nope. Firstly, all you have shown is that the kid can read the text in question. We don't know if the kid can read other texts: if a kid can read "Where the Wild Things Are", we cannot safely conclude from that that they can read a local newspaper. Perhaps they can, perhaps they can't; this needs to be tested.

Secondly, "tell me about it" can mean different things to different people. I suggest you have a look at page 3 of this sample reading lesson.

On this page, the kids first read a short paragraph about a ranch (it starts on page 2) and then answer questions about the story. Some of the questions simply ask for basic descriptive information easily available in the story, e.g. "What was the name of the rancher?" Some require a small bit of critical thinking, e.g. "Why did Flop have the name Flop?" The story says that the horse Flop had a tendency to rear up so that any rider would go "flop" in the grass, so a reader has to deduce that that is why Flop got its name. Not exactly Sherlock Holmes material, but a tough job for a computer programme to get right. Then there's a third type of question, "Why did the workers think that it was good to have Emma on their side?", which goes into human motivations as well as the text. And these types of questions are probably not exhaustive.

Now if a kid is capable of answering only simple descriptive questions like "What was the name of the rancher?", can they read? And how many examiners will count them as reading? Different examiners can have different definitions of what it means to be able to read without consciously realising it. It's even possible for the same examiner to change their definition over time without consciously realising it. That's why standardised tests are vital for experiments: the intent is that we can be reasonably sure that a difference in results was due to a difference in experimental methods and not due to a difference in measurement.

That's true only if you are referring to stupid-head instruction. Kids who have the expertise don't need instruction.

Kids who already have the expertise may not need the instruction, but that doesn't mean that they won't get the instruction. And how do you know that the instruction being tested isn't "stupid-head"? Ever heard of Murphy's Law? Science has developed a variety of means to get around humanity's tendency to stupid-headedness; randomised trials are one of those methods.

Furthermore, there is no reason to believe that kids divide neatly into "kids who have no expertise" and "kids who have all the expertise and don't need the instruction". There may well be kids who have part of the expertise but still do need some instruction to achieve mastery. Their learning outcomes risk biasing the results unless controlled for.

As for your comment about WWC, I notice you say nothing to support your claim that randomised controlled trials (RCTs) are a wild goose chase. Do you now agree that RCTs are a worthwhile investigative technique?