February 28, 2009

The School District of Philadelphia's Imagine 2014 Plan

Philadelphia got itself a brand spanking new "CEO" of its schools, who promptly issued her plan to reform Philadelphia's well-funded yet under-performing schools. Here it is.

It's full of pretty words and lofty goals and, by my calculation, stands a zero percent chance of working.

The plan's primary tell is found on page 8.

Elementary Schools


o Ensure that all elementary students are proficient in reading by third grade.

o Utilize multiple assessment tools to identify students who need targeted reading interventions.

o Provide students reading below the 40th percentile with the opportunity to receive half‐hour daily lessons for 12 ‐ 20 weeks from teachers specially trained in the Reading Recovery model.

o Improve reading fluency and comprehension of ELL and elementary students with language deficits by using guided oral reading instruction and building sight word knowledge in order to recognize words quickly when reading.

Guided reading instruction and Reading Recovery for the strugglers: that's a recipe for reading failure.

Last I checked, poor readers don't typically find academic success. They also tend to behave poorly -- better to act up than to look stupid in front of your peers.

And just in case a few students learn to read with some proficiency, they're going to be hit with:

instructional practices, such as the use of cooperative learning, inquiry‐based instruction, thinking maps, project‐based learning, simulations, hands‐on learning, and integrated technology

because this works so well with students having little background knowledge and low language skills, i.e., many of Philadelphia's currently failing students.

If there were a way to short these plans, you could make a killing.

February 27, 2009

Some Critical Thinking Skills Are Critical

This is a long one, kids, so bear with me. There is a payoff at the end.

Wikipedia provides a reasonable definition of critical thinking:

Critical thinking is purposeful and reflective judgment about what to believe or do in response to observations, experience, verbal or written expressions, or arguments. Critical thinking might involve determining the meaning and significance of what is observed or expressed, or, concerning a given inference or argument, determining whether there is adequate justification to accept the conclusion as true.

Critically evaluating assertions, arguments, and proposals, whether presented orally or in print, is an important comprehension skill. Many personal, professional, and social decisions are based on what we are told by other people. Because faulty arguments and propaganda are so common, critical thinking has a role in almost every important decision we make.

The consensus is that the typical K-12 education does not result in students with good critical thinking skills. I agree with this consensus.

What I do not agree with, however, is the notion that critical thinking is some generalized skill that is independent of domain knowledge.

Let's see why this is so by looking at the skills needed to critically read a passage of text. I am going to use the procedure outlined in Direct Instruction Reading, 4th edition, chapter 22, which is a simplified version of the skill suitable for elementary school students, but which, sadly, most students never acquire.

The four steps in the critical-reading process can be treated as the major component skills:

(1) Identify the author's conclusion; that is, what does the author want the reader to believe?

(2) Determine what evidence is presented; that is, what does the author present to convince the reader? Evidence or opinion?

(3) Determine the trustworthiness of the author; that is, can the reader trust what the author says?
  • Does the evidence come from a qualified person?
  • Does the person have biases?

(4) Determine if the conclusion derives from the evidence. Identify any faulty arguments.
  • Tradition, either old or new (sometimes called a bandwagon effect)
  • Improper generalization
  • Confusing correlation with causation (or coincidence)

Those who think critical thinking/reading is a generalized skill are confusing the general procedure with the act of performing the procedure itself. A student might have learned the procedure, but be unable to perform the procedure adequately for a particular text. That's where domain knowledge comes into play.

The best way to see this is by way of example. So, let's use the procedure to critically read a typical passage that students might encounter.

Thomas Edison, the inventor of the lightbulb, was seriously concerned about the increasing use of alternating current as a form of electricity. Edison believed that because alternating current involved so much more current than direct current, alternating current was a threat to the nation. Many fires were caused by alternating currents. In fact, alternating current was used in Sing Sing to electrocute criminals. Direct current was used with light bulbs for many years. Edison felt direct current was still the best form of electricity.

Another example of the dangers of alternating current has just occurred. A house wired with alternating current caught fire and burned to the ground. The fire started when an electrical wire became so hot that a wall caught fire. Alternating current will eventually cause a fire whenever it is used. Direct current rather than alternating current should be used for lighting.

The Daily Post used direct current to light its press room for over a year. Reporters are much happier now. They write more interesting stories. Sales of the newspaper have increased dramatically. The Daily Post is now the most popular newspaper.

Identify the Author's Conclusion

First the students must use details from the passage (seriously concerned, a threat to the nation, direct current is still the best) to form a main idea (or author's conclusion). Identifying an author's conclusion is a continuation of summarization skills.

Right off the bat you can see that critical reading/thinking is dependent on domain knowledge and the student's ability to extract the author's conclusion from the text and justify that conclusion with supporting details from the text. A student might be able to extract the main idea from Dr. Seuss's Green Eggs and Ham, but be unable to extract the main idea from one of Plato's Dialogues. If critical thinking/reading were a generalized skill, then this wouldn't be the case. The student's inability to identify the author's conclusion will impact his ability to perform the remainder of the procedure.

Discriminating Evidence from Opinion

The second step is for the student to decide whether the author's conclusion is based on opinion or evidence. If it is based on opinion, students must understand that the conclusion is nothing more than a suggestion by the author about what people should think. A conclusion based on opinion does not imply that the student should believe or act on it.

In the example passage, both opinion and evidence are used to support the author's conclusion. The statement that alternating current is a threat to the nation is opinion. The other details are evidence used to justify the author's conclusion (occurrence of fires, electrocution of criminals, initial use of direct current as an energy source).

This step is also intertwined with domain knowledge. Being able to discriminate between fact and opinion requires that you understand the underlying statement in the first place.

Fact or opinion?

In my judgment, the total enthalpy of any non-isolated thermodynamic system tends to decrease over time, approaching a minimum value.

It's not an opinion, even though it uses a phrase ("In my judgment") that typically indicates opinion. It is evidence. The statement is the second law of thermodynamics. But, you wouldn't know this unless you knew thermodynamics, and, thus, you wouldn't be able to discriminate between opinion and evidence.

In actuality the statement is inaccurate evidence. The statement is not the second law of thermodynamics; I changed some of the terms to make it inaccurate. Again, you wouldn't know this unless you knew thermodynamics.
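For reference (and in case your thermodynamics is rusty), the actual second law says roughly the opposite of the doctored statement: the entropy of an isolated system never decreases, approaching a maximum at equilibrium.

```latex
\frac{dS}{dt} \geq 0 \qquad \text{(isolated system; } S \to S_{\max} \text{ at equilibrium)}
```

The doctored version swaps entropy for enthalpy, isolated for non-isolated, and maximum for minimum, which is precisely the kind of error you cannot catch without the domain knowledge.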

Determining the Trustworthiness of an Author

The third step consists of several questions, all relating to the reliability or trustworthiness of the person presenting the argument.

Question (a) is whether the evidence comes from a qualified person. Since Edison was definitely an expert on electricity in the late 1800s, he was qualified.

Again, this determination requires that you know something about Thomas Edison. That's domain knowledge.

Question (b) concerns biases the expert might have. In Edison's case, two major biases existed. One was his deep personal and financial involvement in a company that provided direct current. He stood to lose money if alternating current replaced direct current. Also, his reputation was at stake. He became famous, in part, because of his invention of the lightbulb and a distribution system for electricity based on the use of direct current. If alternating current replaced direct current, his reputation might be diminished.

Again, we need domain knowledge to make the determination.

Since Edison's biases contribute to the passage's conclusion, the evidence he cites may not be trustworthy.

Since there is doubt about the trustworthiness of the author, students must seek information from different experts. The statement that direct current is the best form of electricity is disputed by many experts. Alternating current can be transmitted great distances, but direct current cannot (at least during the relevant period). If remote areas are to receive electricity at a reasonable rate, alternating current is a necessity.

So, since the expert is biased and alternate interpretations of the evidence are compelling, the evidence is probably not trustworthy.

Again, the student must know quite a bit about electricity and its transmission to seek out the appropriate sources and understand the information presented. Take a look at this article on electric power transmission and see if you can make sense of the differences between direct current transmission and alternating current transmission. Then you have to be aware of the history of electric power transmission to understand that the information this article presents was unknown at the time our passage was written. But how would you know this without quite a bit of domain knowledge in both electricity and the history of electrical power transmission? Google may be your friend, but it's not that good a friend.

Identifying Faulty Arguments

The final step in the critical-reading process is deciding whether a conclusion legitimately derives from the evidence. In many arguments, valid evidence will be presented, but then a conclusion will be drawn that does not derive from the evidence. In the alternating-current example, one possible interpretation is that since direct current has been used with lightbulbs for many years, it should continue to be the best form of electricity. This faulty argument illustrates the use of tradition: what has been the best must continue to be the best. Conclusions based on tradition are not necessarily true. What has worked well may continue to be the best procedure, or a better procedure may be developed. Students can disregard conclusions based on tradition. (Note that the same attitude can be taken toward newly developing traditions; i.e., "Everybody is starting to use alternating current; therefore, you should, too." A conclusion that a product or procedure is better because it is popular is faulty.)

In the second paragraph, there is an example of improper generalization. One valid example is presented, but then a conclusion is drawn that applies to all examples. One fire caused by alternating current does not mean that alternating current will cause a fire every place it is used. Improper generalization occurs often: "I saw a rich person who was rude. What makes rich people rude?" "We sat next to a long-haired man in the movies. He smelled. I'll bet he hadn't bathed in weeks. Long-hairs should take better care of their bodies."

The third paragraph involves a confusion of causation and correlation. An event that is associated with success or some other positive outcome through coincidence is erroneously concluded to be the cause of the positive outcome.

Direct-current lighting is associated with happier reporters, more interesting stories, and greater sales; however, direct current did not necessarily cause reporters to be happier. The electric lighting that produced the positive outcomes could have been achieved with direct or alternating current. Conclusions suggesting causation that are, in fact, based on correlation can be disregarded. Confusion of correlation and causation is often made: "Joe Blow uses Squirt-Squirt deodorant, and girls always chase him." "Sally took You-Bet-Your-Life vitamins every day. She lived to be 106."

Of course, there are many other logical fallacies and faulty arguments that the student should know, but these three are a good start for primary grade students since they are among the most common.

Sadly, this last step is often completely ignored in most schools. Most students are simply not taught how to identify faulty arguments at all. And let me be crystal clear here: it should be taught. I'm guessing that the reason it isn't taught more often, or at least isn't learned by most students, is that it is the culmination of basic reading instruction, a point many students never reach.

Many students never learn how to decode accurately and proficiently in grade-level texts. This is criminal, quite frankly, because it is something we have known how to teach effectively for some time now. Then students need to comprehend what they've decoded well enough to extract the main idea and the justifications for it. Most students can't do this competently even when the main idea is presented to them as an option in a multiple-choice format, much less generate one on their own. And if a student can't do this initial step in critical reading, how is she going to accomplish the remaining steps, which require an understanding of the author's conclusion in the first place?

So, what we have are two separate problems. One problem is that students are often not taught how to spot faulty arguments and logical fallacies, which are important skills needed to think and read critically. No one seriously disputes that this is a problem. These skills should be taught. Period.

But even if students are taught these skills, they will not necessarily be able to think and read critically, because knowing how to identify and distinguish faulty arguments and logical fallacies is not the same thing as critical thinking/reading, as many proponents of 21st century skills seem to believe. Critical thinking/reading involves much more than identifying faulty arguments, as I described above. And the ability to identify faulty arguments is not the generalized form of critical thinking that some take it to be.

There is another problem that must also be dealt with. That problem is that students can't comprehend well enough to extract the author's conclusion, to discriminate evidence from opinion, and to determine if the author is trustworthy. All of these skills require and are a function of the student's domain knowledge. You have to know a lot of stuff, to think about a lot of stuff.

This is just a roundabout way of saying that there are prerequisites to being able to think/read critically that also need to be addressed (and have never been adequately addressed) before students can make use of the "21st century skills" many think are important, like being able to identify faulty arguments and logical fallacies.

What's the sense of teaching a student astrophysics if the student doesn't know how to do basic arithmetic or understand basic science? Perhaps this is why schools haven't traditionally taught these things in the first place.

February 24, 2009

Rebranding our way to better schools

Our new Education Secretary, Arne Duncan, wants to fix our education woes by, wait for it:

“Let’s rebrand it,” he said in an interview. “Give it a new name.”

How about the We Still Don't Know How to Spend the Money Effectively Act?

That gets right at the heart of the matter. It also implies a solution: find out how to spend the money effectively.

This seems to me like the logical starting point. So, who knows of any federal programs with the goal of finding out what predictably works backed by evidence?

February 18, 2009

Today's Video

Classroom Management presentation by Saul Axelrod.

Lots of good tips. Long.

Direct Link

February 17, 2009

Alfie Kohn and the Murray Gell-Mann Amnesia effect Part II

Continuing on from Part I.

We're now getting into the last prong of Kohn's main argument, after which he takes a kitchen-sink approach, throwing in all the remaining negative information he could find.

Kohn's last argument is based on a logical fallacy:

Finally, outside evaluators of the project – as well as an official review by the U. S. General Accounting Office – determined that there were still other problems in its design and analysis that undermined the basic findings. Their overall conclusion, published in the Harvard Educational Review, was that, “because of misclassification of the models, inadequate measurement of results, and flawed statistical analysis,” the study simply “does not demonstrate that models emphasizing basic skills are superior to other models.”

As a preliminary matter, Kohn fails to mention that the "outside evaluators" he's referring to, House et al., were funded by the Ford Foundation, which had also funded a few of the losing models in PFT. As such, Kohn's source has "potential bias" issues that Kohn fails to alert his readers to. Kohn also fails to alert his readers to all the other similar "outside evaluators" who analyzed both the PFT data and House's analysis and came to a different conclusion. These other outside evaluators are no more biased than House, so it's curious that Kohn would fail to mention them.

Next comes the first of Kohn's logical fallacies. Kohn commits the fallacy of division (or whole-to-part fallacy) when he claims PFT "simply 'does not demonstrate that models emphasizing basic skills are superior to other models.'" Even if the basic skills models as a whole weren't superior to the other models, that doesn't mean the DI model alone wasn't. The data certainly show that the DI model was the superior performer.

The point here is that Kohn is criticizing DI yet has already started to veer off and is trying to mislead readers by dragging in information pertaining to other programs or the more general classification of basic skills programs.

Next Kohn makes an appeal to authority, another logical fallacy, when he mentions that the results were "published in the Harvard Educational Review." I'd be willing to cut Kohn some slack had he mentioned the journals for all the other research he cites. But this is the only one he mentions. Coincidence? What I do know is that he failed to mention another study on PFT, also published in the Harvard Educational Review, that affirmed the findings of PFT. Another convenient oversight.

Kohn again buries the weakest parts of his argument in a footnote. Kohn parrots the House study's findings, which are based on a reanalysis of the PFT data. This reanalysis was not without its own problems, as set forth in another study by Bereiter et al., which reanalyzed the House reanalysis. (It should be mentioned that this study has the same potential bias problems as the House study, since Bereiter had professional ties to DI before PFT.)

Let us therefore consider carefully what the House committee did in their reanalysis. First, they used site means rather than individual scores as the unit of analysis. This decision automatically reduced the Follow Through planned variation experiment from a very large one, with an N of thousands, to a rather small one, with an N in the neighborhood of one hundred. As previously indicated, we endorse this decision. However, it seems to us that when one has opted to convert a large experiment into a small one, it is important to make certain adjustments in strategy. This the House committee failed to do. If an experiment is very large, one can afford to be cavalier about problems of power, since the large N will presumably make it possible to detect true effects against considerable background noise. In a small experiment, one must be watchful and try to control as much random error as possible in order to avoid masking a true effect.

However, instead of trying to perform the most powerful analysis possible in the circumstances, the House committee weakened their analysis in a number of ways that seem to have no warrant. First, they chose to compare Follow Through models on the basis of Follow Through/Non-Follow Through differences, thus unnecessarily adding error variance associated with the Non-Follow Through groups. Next, they chose to use adjusted differences based on the "local" analysis, thus maximizing error due to mismatch. Next, they based their analysis on only a part of the available data. They excluded data from the second kindergarten-entering cohort, one of the largest cohorts, even though these data formed part of the basis for the conclusions they were criticizing. This puzzling exclusion reduced the number of sites considered, thus reducing the likelihood of finding significant differences. Finally, they divided each effect-size score by the standard deviation of test scores in the particular cohort in which the effect was observed. This manipulation served no apparent purpose. And minor though its effects may be, such as they are would be in the direction of adding further error variance to the analysis.

The upshot of all these methodological choices was that, while the House group's reanalysis largely confirmed the ranking of models arrived at by Abt Associates, it showed the differences to be small and insignificant. Given the House committee's methodology, this result is not surprising. The procedures they adopted were not biased in the sense of favoring one Follow Through model over another; hence it was to be expected that their analysis, using the same effect measures as Abt, would replicate the rankings obtained by Abt. (The rank differences shown in Table 7 of the House report are probably mostly the result of the House committee's exclusion of data from one of the cohorts on which the Abt rankings were based.) On the other hand, the procedures adopted by the House committee all tended in the direction of maximizing random error, thus tending to make differences appear small and insignificant.
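The variance-inflation point in the quoted passage can be seen with a toy simulation (made-up numbers, not PFT data). Comparing models on Follow Through minus Non-Follow Through differences stacks the comparison group's noise on top of the model's own, since for independent X and Y, Var(X - Y) = Var(X) + Var(Y):

```python
import random
import statistics

random.seed(42)

# Toy illustration (made-up numbers, not PFT data) of one point in the
# Bereiter critique: once the analysis shrinks to ~100 site means,
# differencing two independent noisy site scores roughly doubles the
# error variance against which the same true effect must be detected.
TRUE_EFFECT = 0.5   # hypothetical model advantage, in test-score units
SITE_NOISE = 1.0    # per-site random error (standard deviation)
N_SITES = 100       # site-level analysis reduces N to about this size

ft_sites = [TRUE_EFFECT + random.gauss(0, SITE_NOISE) for _ in range(N_SITES)]
nft_sites = [random.gauss(0, SITE_NOISE) for _ in range(N_SITES)]
differences = [ft - nft for ft, nft in zip(ft_sites, nft_sites)]

var_raw = statistics.variance(ft_sites)
var_diff = statistics.variance(differences)
print(f"error variance, raw site means:       {var_raw:.2f}")
print(f"error variance, FT - NFT differences: {var_diff:.2f}")
```

The committee's other choices described in the quote (excluding a large cohort, rescaling effect sizes per cohort) push in the same direction: more noise around the same signal, hence fewer significant differences.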

It is one thing to fail to mention that your source is potentially biased due to its financial ties to both some of the losing programs in PFT and the "outside" evaluators. But failing to mention that your source has itself been criticized for systematically adopting procedures that maximize random error, many with dubious or no scientific justification, in favor of the very programs with those financial ties, is quite another. The prudent advocate would at least attempt to explain these criticisms away; in any event, readers should be made aware of these significant infirmities in your underlying studies.

Kohn concludes his main argument with a gratuitous swipe:

Furthermore, even if Direct Instruction really was better than other models at the time of the study, to cite that result today as proof of its superiority is to assume that educators have learned nothing in the intervening three decades about effective ways of teaching young children. The value of newer approaches – including Whole Language, as we’ll see -- means that comparative data from the 1960s now have a sharply limited relevance.

I bet Kohn wishes he could take that crack about whole language (an educational philosophy so bad its advocates had to re-brand it to disassociate it from the lengthy trail of negative data it amassed) back now.

And what evidence is there that today's educators have learned anything since the mid-1970s (not the '60s, as Kohn claims)? The longitudinal NAEP data tell a much different story.

Next Kohn gets into the kitchen sink part of his argument. Let's take each in turn.

  1. First Kohn cites some newspaper accounts on DI. Kohn admits these accounts are anecdotal. I agree with Kohn, but I'm wondering why he included them in his argument anyway. I'm guessing lurid innuendo. I could go into detail refuting the points Kohn recounts, but it's not worth the effort for anecdotes such as these.
  2. Next, Kohn makes another fallacy of division when he states "it’s common knowledge among many inner-city educators that children often make little if any meaningful progress with skills-based instruction." Of course, there's lots of data from inner-city schools pertaining to DI that show that this statement isn't true with respect to DI.
  3. Last, Kohn claims that there is "a lot more research dating back to the same era as the Follow Through project supports a very different conclusion" and then goes about citing various longitudinal studies which purport to show better long-term outcomes for various child-centered P-3 programs (which Kohn prefers) compared to DI programs. Apparently, Kohn hasn't heard of "confounding variables." He chooses to ignore the fact that many of these studies were conducted by the sponsors of these programs, that some had serious methodological flaws, and that the High Scope study was sponsored by the same people responsible for one of the worst performers in PFT.

Here's Kohn's big conclusion:

Still, with the single exception of the Follow-Through study (where a skills-oriented model produced gains on a skills-oriented test, and even then, only at some sites), the results are striking for their consistent message that a tightly structured, traditionally academic model for young children provides virtually no lasting benefits and proves to be potentially harmful in many respects.

This is only true for the studies Kohn has chosen to cite, which are only the "negative" ones. Kohn has basically cherry-picked the studies and excluded all the ones showing positive effects, such as Gary Adams' meta-analysis and its underlying studies. Kohn also cites his research uncritically and ignores all the criticism directed at the studies he cites. He doesn't even offer an explanation; he simply ignores it. He also ignores all the potential bias problems and methodological flaws in his cited studies. Kohn seems to have an unrealistically high standard for DI studies and a very low one for research with conclusions he likes. I find it hard to believe that any study cited by Kohn in any of his books could withstand the constraints imposed by House et al. on the PFT data.

In short, Kohn's "hard evidence" against DI appears to be almost exclusively opinion, rather than fact. Little of this opinion is supported by data, though Kohn's use of selective quotes from "research" attempts to convey that impression. And completely ignoring all the contrary evidence and presenting such a one-sided evaluation based on that cherry-picked "evidence" is reprehensible for anyone claiming to be dispassionate.

In this analysis of DI, Kohn has shown himself to be an untrustworthy advocate and the same pattern of scholarly malfeasance is evident in all his writings I've read.

February 13, 2009

Alfie Kohn and the Murray Gell-Mann Amnesia effect

(the introduction can be found here)

In the comments of the recent Willingham-Kohn dust-up, edu-blogger Stuart Buck brought up DI, and Kohn responded by citing this article of his, which immediately reminded me of the Murray Gell-Mann Amnesia effect.

The late Michael Crichton once gave a speech describing what he termed the Murray Gell-Mann Amnesia effect.

Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well... You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them.

In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know.

I know enough about the research on DI to know that Kohn's description of the DI research qualifies as one of the worst hatchet jobs in education policy advocacy. As such, it should serve as evidence that Alfie Kohn might not be a trustworthy source on education policy and that his analysis of education research should be closely scrutinized in order to stave off the Murray Gell-Mann Amnesia effect.

But don't take my word for it, let's review Kohn's description of the DI research.

After we get past an initial paragraph of over-heated inflammatory language, Kohn's first argument, related to the results of Project Follow Through (PFT), is:

Of course, even if these results could be taken at face value, we don’t have any basis for assuming that the model would work for anyone other than disadvantaged children of primary school age.

But PFT involved some schools in middle-class neighborhoods, and middle-class kids took part and were evaluated as part of the research. In fact, a very diverse set of students was evaluated.

The DI model was the best performing model for "disadvantaged children" as Kohn acknowledges, but it was also the best performing model for high-performing students, White children, Native Americans, African-American students, Hispanic students, English language learners, urban children, rural children, and the very lowest of the disadvantaged children. So there is a basis, a good basis, for assuming that the DI model works for most primary school aged children. And, in fact, these results have been replicated numerous times in subsequent studies which Kohn also fails to acknowledge.

Kohn's next argument is a repetition of the "variability" argument that others have levied against the PFT results:

To begin with, the primary research analysts wrote that the “clearest finding” of Follow Through was not the superiority of any one style of teaching but the fact that “each model’s performance varies widely from site to site.”[1] In fact, the variation in results from one location to the next of a given model of instruction was greater than the variation between one model and the next. That means the site that kids happened to attend was a better predictor of how well they learned than was the style of teaching (skills-based, child-centered, or whatever).

This is a spurious conclusion because most of the variability is attributable to the inclusion in the analysis of two cohorts from the Grand Rapids site, which had severed ties with the DI sponsor well before the end of the study. This is well documented. The Grand Rapids site was the only "DI site" with low performance (a half standard deviation below the mean of the other DI sites), and the only "DI site" that consistently fell below national norms. In fact, most of the variability in the remaining DI sites is above national norms.

In addition, subsequent research, involving researchers with at least some professional/reputational ties to DI, has shown that the variability between sites is mostly attributable to demographic factors and experimental error, and not to the DI program.

We disagree with both Abt and House et al. in that we do not find variability among sites to be so great that it overshadows variability among models. It appears that a large part of the variability observed by Abt and House et al. was due to demographic factors and experimental error. Once this variability is brought under control, it becomes evident that differences between models are quite large in relation to the unexplained variability within models.

In any event, even if the variability finding was characterized as the main finding by the primary researchers, this in no way diminishes the finding that DI was the superior performing program across the board for all measures tested and for all groups tested. It's still a valid finding and has been upheld by numerous researchers examining the findings since the initial evaluation.

So, best performing program and the only program whose variability was mostly above national norms does not a valid criticism make. Strike two for Kohn.

Next, Kohn attacks the testing instruments used in PFT:

Second, the primary measure of success used in the study was a standardized multiple-choice test of basic skills called the Metropolitan Achievement Test. [(MAT)]

The MAT is not just a test of basic skills (such as Listening for Sound (sound-symbol relationships), Word Knowledge (vocabulary words), Word Analysis (word identification), Mathematic Computation (math calculations), Spelling, and Language (punctuation, capitalization, and word usage)).

It is also a test of cognitive skills. Several Metropolitan subtests measure indirect cognitive consequences of learning, such as the Reading subtest (which is, in effect, paragraph comprehension), the Mathematics Problem-Solving subtest, and the Mathematics Concepts subtest (knowledge of math principles and relationships).

This is important because Kohn goes on to claim:

While children were also given other cognitive and psychological assessments, these measures were so poorly chosen as to be virtually worthless.

Even if the other cognitive and psychological assessments were "poorly chosen," it does not diminish the fact that the MAT is a well-respected test of both basic and cognitive/conceptual skills, as acknowledged by subsequent researchers. The other cognitive/conceptual skills test used was the Raven's Colored Progressive Matrices, but it did not prove to discriminate between models or show change in scores over time.

Also, the affective skills were assessed using two instruments: the Intellectual Achievement Responsibility Scale (to assess whether children attribute their success (+) or failures (-) to themselves or external forces) and the Coopersmith Self-Esteem Inventory (to assess how children feel about themselves, the way they think other people feel about them, and their feelings about school).

Kohn buries the reason why he believes the cognitive and affective skills tests were "poorly chosen" in a footnote.

There is strong reason to doubt whether tests billed as measuring complex “cognitive, conceptual skills” really did so. Even the primary analysts conceded that “the measures on the cognitive and affective domains are much less appropriate” than is the main skills test (Stebbins et al., 35). A group of experts on experimental design commissioned to review the study went even further, stating that the project “amounts essentially to a comparative study of the effects of Follow Through models on the mechanics of reading, writing, and arithmetic” (House et al., 1978, p. 145). (This raises the interesting question of whether it is even possible to measure the conceptual understanding or cognitive sophistication of young children with a standardized test.)

Let's take these "strong reasons" in order. Even if the primary analysts believed that these tests were "much less appropriate," this doesn't mean the tests didn't measure what they purported to measure. There is no evidence that they didn't. It could also be that the primary researchers believed that the basic skills tests were best suited for measuring what K-3 students are typically expected to know. In any event, the opinion that the tests were "much less appropriate" does not lead one to conclude that the tests didn't measure "complex 'cognitive, conceptual skills.'" This is an empirical question, and Kohn provides no empirical support for his conclusion.

Next, Kohn relies on the opinions of the "experts" commissioned and funded by the Ford Foundation (which I'll get to later) for the proposition that PFT only measured the effects of "the mechanics of reading, writing, and arithmetic." Apparently, what these experts were getting at was that students who hadn't learned the mechanics of reading, writing, and doing arithmetic might not be able to demonstrate their cognitive skills. This is also an empirical question. But these experts provided no empirical support for their conclusion. Other researchers, however, did look into the question once it was raised.

Conceivably, certain models-let us say those that avowedly emphasize "cognitive" objectives-are doing a superior job of teaching the more cognitive aspects of reading and mathematics, but the effects are being obscured by the fact that performance on the appropriate subtests depends on mechanical proficiency as well as on higher-level cognitive capabilities. If so, these hidden effects might be revealed by using performance on the more "mechanical" subtests as covariates.

This we did. Model differences in Reading (comprehension) performance were examined, including Word Knowledge as a covariate. Differences in Mathematics Problem Solving were examined, including Mathematics Computation among the covariates. In both cases the analyses of covariance revealed no significant differences among models. This is not a surprising result, given the high correlation among Metropolitan subtests. Taking out the variance due to one subtest leaves little variance in another. Yet it was not a forgone conclusion that the results would be negative. If the models that proclaimed cognitive objectives actually achieved those objectives, it would be reasonable to expect those achievements to show up in our analyses.

So, again we have no valid reason for discounting the results of the non-basic skills tests. Unsupported opinion is not a valid reason last I checked.
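For readers who want to see the mechanics of the covariance adjustment the researchers describe, here is a minimal sketch with made-up data (not the Follow Through dataset): a comprehension score is regressed on a mechanical subtest, and the group difference is compared before and after partialing out the covariate. The model names, numbers, and effect sizes are illustrative assumptions only.

```python
import random

random.seed(0)

# Synthetic data (illustrative only): comprehension is driven almost
# entirely by mechanical proficiency (the word knowledge subtest).
data = []
for model in ("DI", "other"):
    for _ in range(50):
        word_knowledge = random.gauss(60 if model == "DI" else 50, 10)
        comprehension = 0.8 * word_knowledge + random.gauss(0, 5)
        data.append((model, word_knowledge, comprehension))

def mean(xs):
    return sum(xs) / len(xs)

# Step 1: simple OLS regression of comprehension on word knowledge.
wk = [d[1] for d in data]
comp = [d[2] for d in data]
wk_bar, comp_bar = mean(wk), mean(comp)
slope = sum((x - wk_bar) * (y - comp_bar) for x, y in zip(wk, comp)) / \
        sum((x - wk_bar) ** 2 for x in wk)
intercept = comp_bar - slope * wk_bar

# Step 2: compare group means of the residuals -- the "adjusted" difference.
resid = {"DI": [], "other": []}
for model, x, y in data:
    resid[model].append(y - (intercept + slope * x))

raw_gap = mean([y for m, _, y in data if m == "DI"]) - \
          mean([y for m, _, y in data if m == "other"])
adjusted_gap = mean(resid["DI"]) - mean(resid["other"])
print(f"raw gap: {raw_gap:.1f}, covariate-adjusted gap: {adjusted_gap:.1f}")
```

As the quoted passage puts it, "taking out the variance due to one subtest leaves little variance in another": once the mechanical subtest is controlled for, the remaining group difference shrinks toward zero.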

Lastly, Kohn raises a question:

This raises the interesting question of whether it is even possible to measure the conceptual understanding or cognitive sophistication of young children with a standardized test.

And then conspicuously fails to answer it. This is the poor man's version of debate. Moreover, nothing that precedes this "interesting question" is capable of actually raising it. Also, I'm not sure if Kohn is trying to claim that you can't measure these skills at all or that you can't measure them with a standardized test. Either way, Kohn provides no support for either claim.

Kohn's innuendo is that the students might have had conceptual understanding or cognitive sophistication that we must accept on faith despite the evidence that more students in the non-DI models were incapable of demonstrating these magical immeasurable skills on simple tests of comprehension of written paragraphs, mathematical problem-solving and knowledge of math principles and relationships, not to mention all the other "basic skills" that were measured.

Unfortunately, Kohn fails on all three counts to provide evidence that would compel a reader to follow him down his opinionated path that PFT only measured basic skills and that the other measures were "virtually worthless." Maybe this is why he buried this one in a footnote.

Next Kohn claims:

Some of the nontraditional educators involved in the study weren’t informed that their programs were going to end up being judged on this basis.

First of all, the DI educators were just as non-traditional as the other models' educators. DI is about as far removed from traditional pedagogy as the other models are.

Also, even if some of the other educators claimed that they were never initially told that their models were going to be judged on reading comprehension, math problem solving, and the like, they would have quickly learned what was coming down the pike, since the PFT students were extensively tested throughout the study. And it was the third and fourth cohorts that formed the basis of the evaluation. Whoever claims not to have known initially would certainly have found out during the time the first two cohorts passed through.

Next Kohn claims:

The Direct Instruction teachers methodically prepared their students to succeed on a skills test and, to some extent at least, it worked.

Actually, the DI model systematically prepared its students to read, understand the conventions of language, spell, and do arithmetic, with an emphasis "placed on the children's learning intelligent behavior rather than specific pieces of information by rote memorization." And the students outperformed the other students on tests of sound-symbol relationships, vocabulary words, word identification, math calculations, spelling, punctuation, capitalization, word usage, paragraph comprehension, mathematics problem-solving, knowledge of math principles and relationships, and the affective measures. There was no evidence that the DI students engaged in the kind of test preparation Kohn alludes to.

PFT demonstrated, once again, that teaching these skills directly was more effective than teaching them obliquely which was what the other models believed would lead to superior performance. It turns out they were wrong and they continue to be wrong to this day.

For those keeping track at home, Kohn has now failed to establish the first two prongs of his argument. He has one more prong left which I'll take up in my next post.

February 12, 2009

The Intellectual Dishonesty of Alfie Kohn

In case you missed it, cognitive scientist Daniel Willingham recently criticised author Alfie Kohn for making factual errors, misinterpreting and oversimplifying the research, and making logical errors.

Willingham was too kind to Kohn.

Kohn responded and denied the allegations, casting most of the disagreements as merely differences of opinion. Hopefully, Willingham will respond to Kohn's response and give him the smack-down he so rightly deserves, because I, like Willingham, believe that Kohn is butchering a fair reading of the research to lend credence to his crackpot opinions and agenda.

Kohn is not the dispassionate advocate he pretends to be. He is an intellectually dishonest muck-raker with an agenda. A dangerous agenda for at-risk children.

You see, what Kohn does is prey on the sorry state of instruction and education research as a springboard for his opinions. For example, we know that praising students to increase motivation is difficult to do properly. It is easy to get wrong, and it is even more difficult to show positive academic results, because those results also depend on the quality of the delivered instruction, which is often ineffective with at-risk kids, i.e., the ones who need the motivational praise. You see the problem -- because Kohn doesn't. To Kohn, all praise or positive reinforcement is detrimental, unless you want to count the carefully worded weasel language Kohn includes at the end of a long diatribe for plausible deniability. Here's the weasel language that comes at the end of a long article informing the reader of how bad positive reinforcement is:

It’s not a matter of memorizing a new script, but of keeping in mind our long-term goals for our children and watching for the effects of what we say. The bad news is that the use of positive reinforcement really isn’t so positive. The good news is that you don’t have to evaluate in order to encourage.

Kohn ignores the large body of research in which the proper use of positive reinforcement was found to be effective in getting disruptive students to stop being disruptive so they can learn. What I haven't seen is a teacher of a classroom of disruptive kids following Kohn's advice and being able to get the classroom under control and then teach them effectively.

And, ultimately, that's Kohn's main problem. He has lots of opinions on education, but no evidence of his opinions being put into practice and being effective. In fact, I'll go so far as to say that to the extent his condoned practices have actually been used, they've been failures. Miserable failures.

Look how poorly the Open education model and the other child-centered models fared in Follow-Through. That's some very inconvenient evidence for Kohn which he realizes and attacks. And, it's that hatchet job which I'll deconstruct in my next post.

February 10, 2009

The Cheese Stands Alone

Stephen Downes has finally seen the light as to the benefits of worked examples for inducing learning:

Today's newsletter is delayed a bit because I could not tear myself away from this wonderfully detailed set of instructions on how to make cheese. Makes me just want to go out and get myself some rennet. You'd probably have to practice a bit to really learn how to make cheese, but these instructions really look like all you'd need to get going. It's also important to have previously seen and tasted cheese, so you know what success looks like. (emphasis added)

At least when it comes to novice cheesemakers learning how to make cheese.

But apparently not for learning academic content which seems to require a more constructivist approach, according to Stephen.

Wouldn't the budding cheesemaker learn more from being handed all the necessary cheesemaking ingredients and provided the wonderfully engaging opportunity of floundering around making cheese on his own, with some minimal guidance provided by the instructor?

I guess not. It doesn't work for chick-sexing apprentices. Why should it work for novice cheesemakers?

And for that matter why should it work for novice students of algebra?

Here's the analog in the algebra world. Behold Algebra: Structure and Method, Book 1, Dolciani (1981 Ed.). (Click to enlarge)

The classic worked example for teaching how to solve simple equations using the multiplication property of equality (a page I selected at random).

The "lesson" is followed by a few more examples, and then the student is provided the opportunity to practice what has been taught by working various oral, written, and open-ended problems relevant to the lesson, so they can "practice a bit to really learn" it.

This is the traditional way algebra is taught. Apparently, it's only "tedious lecture" and "rote learning." Of course, if it were rote learning, the student would only be able to solve 4x = 52 and would have to be separately taught 5x = 50 and 3x = 36. But as any good connectivist will tell you, the student should be able to generalize a solution for any similar problem fitting the pattern of the worked example after sufficient practice.
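The generalization the worked example is meant to induce is just the multiplication property of equality: multiply both sides of ax = b by 1/a. Here's a quick sketch (mine, not the textbook's) of the abstracted rule applied to each instance of the pattern:

```python
def solve_linear(a, b):
    """Solve a*x = b by multiplying both sides by 1/a
    (the multiplication property of equality)."""
    if a == 0:
        raise ValueError("no unique solution when a is 0")
    return b / a

# The same one-line rule handles every instance of the pattern:
for a, b in [(4, 52), (5, 50), (3, 36)]:
    print(f"{a}x = {b}  ->  x = {solve_linear(a, b):g}")
# 4x = 52  ->  x = 13
# 5x = 50  ->  x = 10
# 3x = 36  ->  x = 12
```

The student who has abstracted the rule, rather than memorized 4x = 52, gets all three for free, which is the opposite of rote learning.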

It seems to me that the primary difference between this method of learning and the constructivist method is that the "wonderfully detailed set of instructions" isn't provided to the student beforehand. The student is supposed to figure out (i.e., construct) this knowledge for himself. At least that's the theory.

Of course, in the real world, even the constructivists would rather see the instructions beforehand.


In his new book, Outliers, Malcolm Gladwell makes the case that opportunities and practice determine success. No doubt luck plays a part in success. And you'll get no argument from me regarding the need for lots of practice. But Gladwell greatly underestimates the role that innate talent plays in success. Mozart surely did practice a lot, but he was also very talented. The Beatles practiced quite a bit during their Hamburg days, but John Lennon and Paul McCartney were also very talented songwriters. Talent is an important component. The talented make better use of their practice time and will be more successful, and hence more motivated, in their practice. Just ask this kid.

February 7, 2009

From the Department of Huh?

Comes the conclusion of this study out of Ohio State.

A study of college freshmen in the United States and in China found that Chinese students know more science facts than their American counterparts -- but both groups are nearly identical when it comes to their ability to do scientific reasoning.

But when you look at the researchers' description of the underlying study, you see that this conclusion isn't supported, which leads me to question the researchers' own ability to reason scientifically, at least in the domain of education.

The researchers administered three tests to incoming college freshmen from China and America who had just enrolled in a calculus-based introductory physics course.

The first test, the Force Concept Inventory (FCI), measures students’ basic understanding of mechanics and forces. "The Force Concept Inventory is not 'just another physics test.' It assesses a student’s overall grasp of the Newtonian concept of force. Without this concept the rest of mechanics is useless, if not meaningless." (Force Concept Inventory, Hestenes, Wells, and Swackhamer, The Physics Teacher, Vol. 30, March 1992, 141-158).

The second test, the Brief Electricity and Magnetism Assessment, measures students’ understanding of electric forces, circuits, and magnetism.

The third test, the Lawson Classroom Test of Scientific Reasoning, measures generic science reasoning skills. You can see the kinds of questions on the exam in the appendix of this study.

The tests were given to Chinese students and American students. According to the researchers, in China, "every student in every school follows exactly the same curriculum, which includes five years of continuous physics classes from grades 8 through 12" and "schools emphasize a very extensive learning of STEM content knowledge." In the United States, "only one-third of students take a year-long physics course before they graduate from high school. The rest only study physics within general science courses. Curricula vary widely from school to school, and students can choose among elective courses" and "science courses are more flexible, with simpler content but with a high emphasis on scientific methods."

Keep those descriptions in mind because they'll be important for the conclusions drawn by the researchers.

Now let's turn to the results.

On the FCI, "[m]ost Chinese students scored close to 90 percent, while the American scores varied widely from 25-75 percent, with an average of 50." Clearly the Chinese students understand mechanics better than their American counterparts. On the BEMA, "Chinese students averaged close to 70 percent while American students averaged around 25 percent -- a little better than if they had simply picked their multiple-choice answers randomly." I guess all those physics courses helped the Chinese students understand physics, whereas all that emphasis on scientific methods at the expense of content didn't pan out so well for the Americans. These results are hardly surprising. Knowledge is domain specific, and transference between domains is generally minimal.

On the Lawson Classroom Test of Scientific Reasoning, "[b]oth American and Chinese students averaged a 75 percent score." So the Chinese students were just as capable as the American students, even though their courses supposedly didn't emphasize "scientific methods" the way the American students' courses did.

The researchers, however, concluded:

Lei Bao, associate professor of physics at Ohio State University and lead author of the study, said that the finding defies conventional wisdom, which holds that teaching science facts will improve students’ reasoning ability.

“Our study shows that, contrary to what many people would expect, even when students are rigorously taught the facts, they don’t necessarily develop the reasoning skills they need to succeed,” Bao said. “Because students need both knowledge and reasoning, we need to explore teaching methods that target both.”

What? This isn't the conventional wisdom. The conventional wisdom is that learning facts in a domain will improve the ability to reason in that domain. This wasn't tested in the study. What was tested in the study, via the FCI and the BEMA, was the students' understanding in the domain (physics), which was significantly higher for the Chinese students compared to the American students. Not surprisingly, the American students didn't understand much physics since they didn't learn many physics facts and their "scientific methods" instruction failed to fill the void. Constructivists take heed.

What the study also showed is that learning facts in one domain will not necessarily lead to transference to a different domain and an improvement in reasoning skills in general, whatever they may be (assuming they exist). Again, not a surprising outcome. But, the researchers' spin obscures this conclusion.

And here's the kicker.

Bao explained that STEM students need to excel at scientific reasoning in order to handle open-ended real-world tasks in their future careers in science and engineering.

Ohio State graduate student and study co-author Jing Han echoed that sentiment. “To do my own research, I need to be able to plan what I’m going to investigate and how to do it. I can’t just ask my professor or look up the answer in a book,” she said.

The irony is that this physicist didn't do a very good job conducting an investigation in a foreign domain (education). If he wanted to know who was more capable of "handl[ing] open-ended real-world tasks" he should have tested this in a domain specific way. He should have given both groups open-ended real world physics problems and determined which group handled them better. I'm thinking it would have been the Chinese students.

And then we have the most unsupported conclusion of the study:

“The general public also needs good reasoning skills in order to correctly interpret scientific findings and think rationally,” he said.

How to boost scientific reasoning? Bao points to inquiry-based learning, where students work in groups, question teachers and design their own investigations. This teaching technique is growing in popularity worldwide.

The American students, who presumably were instructed in inquiry-based techniques, fared no better than the Chinese students in general reasoning ability. Inquiry-based teaching once again failed to show results. And it certainly did the students no favors when it came to their understanding of physics, on which they performed poorly.

I see nothing in this study that shows any benefits for inquiry learning. If anything, the study supports the notion that you can't teach general reasoning directly; both methods of teaching failed to do so. What the study also clearly shows is the continuing importance of learning content if you want to understand something.

February 3, 2009

Whitmire Phones One In

Richard Whitmire riffs off the recent, and soon to be disappointing, Obama effect in his latest USA News editorial as a springboard for pointing out why he believes the "pathways to success" are not in place for NAMs to find academic success. The problem, however, is that Whitmire's claim that the pathways are "closing up" is all wrong.

First he gives the Grey Lady's education reporting way too much credit vis-a-vis the Obama Effect.

Even the New York Times weighed in with a story that made the Obama effect appear based on science (relying on a single study; am I alone in thinking that was sub-NYT standards?) by writing up a study claiming that black test takers upped their scores post-Inauguration Day, apparently the dividend of a "Yes we can" self-esteem movement.

Sadly, this kind of education reporting for the NYT is very much the rule and not the exception.

But on to Whitmire's closing pathways.

First, he claims that college is not sufficiently accessible. I'm not sure that's really the problem. State colleges already admit many students who aren't sufficiently prepared for college-level work. Most of these ill-prepared students aren't going to make it out of college anyway, so I don't see accessibility as the problem; lack of preparation is the problem.

Whitmire recognizes this lack of preparation as a problem, but unfortunately blames the wrong culprit:

The stimulus bill proposed by the House would bump up Pell grants for poor students to make college more affordable, but that does not solve the biggest problem faced by these students: As a result of attending subpar high schools, they are not ready for college work.

The achievement gaps are present long before high school. It is debatable whether elementary schools have really improved, as Whitmire claims, but one thing is clear: they haven't improved enough yet. Middle-schoolers remain woefully unprepared for high-school-level work, so why are we blaming high schools for being unable to deal with all these ill-prepared children?

Next, Whitmire claims that "[n]ational education reforms have pushed curriculum demands lower into the grades, handing kindergartners the verbal tasks that two decades ago confronted second graders." This is only partially true. Today's kindergartners are still doing the same stuff that many kindergartners of twenty years ago did. The only real difference is that back then we allowed the struggling students to wait until they were "developmentally ready," which has since proven to be a large waste of valuable academic time. Yet the problem remains that we are still often not very successful in teaching these at-risk kids. In this respect Whitmire has a valid point, and literacy rates will have to soar for there to be an improvement.

Whitmire's next point, that black boys need to be rescued, is also valid, as long as by "rescued" he means providing them with the effective, commercially available curricula that have existed for decades.

Last, Whitmire jumps on the teacher quality bandwagon with both feet:

Knowing what we know about the value of a high quality teacher, we should be on the verge of delivering those teachers to inner-city students.

Really? I thought the "research" pretty much indicated that we don't have the foggiest idea how to turn average teachers into superstar teachers. We actually do know how to improve the effectiveness of all teachers: hand them an effective curriculum and teach them how to use it. But this isn't what most people mean when they talk about teacher quality.

Ultimately, I disagree with Whitmire's major premise that a lack of pathways is what's holding back students. The pathways have been in place for all children of a certain ability level and family stability to take advantage of. It is access to those pathways that needs to be improved, to accommodate a level of student ability that has never been able (or had the opportunity) to take advantage of them until recently.

Today's Quote

Practice makes you good at learning, but being smart makes you good at practice.*

-- me (this morning)

*A less pithy, but more accurate quote might go: Practice makes you good at learning, but being smart increases the likelihood of initial success which increases your motivation to practice, and, thus, your willingness to practice.