February 23, 2007

Some Data

Over at edwonk’s, Mike challenges me to show some data on the failed state of our schools.

Usually, my questions end the discussion because no specific answers are forthcoming.

Let’s use some data from Pennsylvania, an average-performing state according to NAEP.

Let’s look at the results from PA’s 11th grade PSSA exam (2005).

As a preliminary matter, let’s establish that the PSSA is a valid test of student performance. PA takes their testing seriously and has conducted numerous analyses to determine the validity of their test. Those studies can be found here. Here’s another independent evaluation done by Achieve, Inc.

According to the evaluations, the PSSA correlates nicely with such respected measures of performance as the CTBS, TerraNova, SAT-9, and SAT. One consistent complaint made by the evaluators is that the PSSA lacks rigor, especially at the 11th grade level. See pages 33-35 and 50-52 of the Achieve evaluation.

The math portion of the test contains 66 questions. A student needs to answer 31 of those questions correctly to score at the basic level, not to be confused with the proficient level. This does not mean that the student needs to know the answer to 31 questions. Because the questions are multiple choice, a student only needs to know the answer to 20 of those questions. By filling in the rest of the bubbles at random, the student will on average receive credit for 11 additional correct answers. So, to score at the basic level a student only needed to know 30% of the answers on a low-rigor test. Worse than that, according to Achieve, Inc., 26% of the questions on the exam (almost the same amount the basic student needs to know) were at the lowest level of cognitive demand (Items require the recall of information such as a fact, definition, term or simple procedure.). Yet despite this low level of rigor and low cut score, 33% of PA 11th graders could not meet this threshold of performance. This does not include all the students who have already dropped out by the 11th grade.
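For anyone who wants to check the guessing arithmetic, here is a minimal sketch. It assumes every question has four answer choices and that unknown questions are guessed uniformly at random; the number of choices is my assumption for illustration, not something stated above, and the function names are made up for this example. Under that assumption, the smallest number of answers a student must actually know for the expected score to reach the cut of 31 comes out to 20.

```python
from fractions import Fraction

TOTAL_QUESTIONS = 66   # math questions on the test (from the post)
BASIC_CUT = 31         # correct answers needed for "basic" (from the post)
OPTIONS = 4            # ASSUMPTION: four choices per question


def expected_score(known, total, options):
    """Expected number correct: known answers plus random guesses on the rest."""
    guessed = total - known
    return known + Fraction(guessed, options)


def min_known_for_cut(total, cut, options):
    """Smallest count of known answers whose expected score reaches the cut."""
    for known in range(total + 1):
        if expected_score(known, total, options) >= cut:
            return known
    return total


known = min_known_for_cut(TOTAL_QUESTIONS, BASIC_CUT, OPTIONS)
share = known / TOTAL_QUESTIONS
print(f"Answers a student must know: {known} ({share:.0%} of the test)")
print(f"Expected score with guessing: {float(expected_score(known, TOTAL_QUESTIONS, OPTIONS))}")
```

With five choices per question instead of four, the required number of known answers would be somewhat higher, so the 30% figure depends on the four-choice assumption.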

The analysis for the reading portion is similar. A student only needs to answer 54% of the questions correctly to score at the basic level. 27% of students couldn’t do this. According to Achieve, 81% of the test questions fall within the lowest level of cognitive ability.

If we look at all the high schools in PA we find 665 schools with reported PSSA scores for the 11th grade. 25% of those schools have 40% or more of their students performing at the “below basic” level. That seems to me to be an unconscionably large number. The data for all these schools are readily obtainable at schoolmatters. Hopefully, this’ll be enough data to give Mike a small taste of the utter failure that permeates our schools. Let’s see if Mike can spin the data.

9 comments:

Anonymous said...

At one point in this post, you point out the percentage of knowledge level questions, and you kind of discount this type of assessment.

I'm a big fan of knowing specific things, and although I recognize that knowing things at the kind of "trivia" level isn't as good as being able to use the knowledge at a higher level, I also think that test must access knowledge at a simple level, at least in part, so that we can see that the problem is the lack of basic knowledge, rather than the failure of a higher level skill.

An only loosely connected question: what does Direct Instruction look like for high school? I guess I had a flawed belief that it was a knowledge heavy method of teaching. Are the assessments always at the application level (or above)?

Mike said...

Dear KDeRosa:

Very well. I pick up your gauntlet, but since you initiated the challenge, it is I who has the right to determine the weapons! And I choose...automatic grenade launchers inside a phone booth! To quote Howard Dean: "Yeeeaarrrgggggh!"

Should it be a surprise to anyone that those whose reputations are tied up in the validity and utility of a given testing instrument think it to be valid and utilitarian? No doubt one has merely to ask them and they'll confirm it. I'm sure one can find others like them to issue similar confirmations, as could one find professionals able and willing to suggest the opposite. Focusing on the results of one test (2005 in PA, no?) in one state would tend to support my discomfort with those who issue blanket condemnations regarding the state of American education. But just for the sake of argument, let's agree on the following points:

(1) The test was valid and measures what it purports to measure.

(2) The scores obtained were all properly derived and are accurate.

(3) The test measures "basic," perhaps even elementary competency.

That said, there still remain many difficulties. Ultimately, the unanswered questions here are what problem that test and others like it purport to address, and can those instruments really address the problems? Are such tests, which are famously expensive, expensive to the point that many states are loath to provide full data on their costs, a better source of information than pre-existing measures? And finally, do such tests actually improve or harm the education environment and process?

It strikes me that the most efficient means for assessing student performance and progress always has been, and remains, the individual classroom teacher, or on the high school level, teachers, who see and deal with a given student daily. Rather than judging the educational worth of the individual from the score of a single test, even a single test repeated once yearly, teachers have the benefit of direct daily observation, and student performance determined through 100+ scores on a wide variety of assignments over a school year. Teachers provide a valuable, vital service that no mandatory, high stakes test can: direct, one-on-one and immediate correction/feedback leading to directly measurable and immediate improvement.

But let us assume that a given teacher or school is not doing what they should (providing the best educational opportunity of which they are capable) and students (who are whole-heartedly and with great dedication striving to take full advantage of their educational opportunities) are not learning as well as they should. The mechanisms for discovering such deficiencies and for their improvement are present in each and every school district in the land, and they consist of voters, who install and can remove school boards; school boards, who appoint and fire superintendents; superintendents who appoint and fire principals; and principals who appoint and fire teachers. Mirabile dictu! Accountability at each step of the process from top to bottom! If this mechanism is not working as well as it might, who can argue that a mandatory, high stakes test can fix it? Such tests are ingenious solutions in search of a problem (actually, expensive solutions in search of gullible buyers).

Again, to buy into high stakes, mandatory testing, one must believe that an educrat staring at a single score in a state capital knows so much more about an individual student they have never, and will never meet than the teachers who work intimately with that student over time. Ask the educrat to justify their conclusions about that student and they'll produce nicely footnoted studies lauding the validity and majesty of the testing instrument. Ask the teacher the same question and they'll produce portfolios of that student's actual work that will allow anyone willing to spend a bit of time to determine exactly how that student has progressed over the course of a semester or year. They will also be able to explain exactly why a student is doing well or poorly, and what can be done for short and long term improvement. Which method is cheaper, more effective and more meaningful to the student, their parents, and society?

As to the final question I posed, our experience in Texas, which is a leader in mandatory high stakes testing, and which has an overbearing, yet magnificent (if you don't believe it, just ask them!) education bureaucracy that is the envy of education bureaucrats everywhere, might be instructive. Remember, this was the question: "...do such tests actually improve or harm the education environment and process?"

A few years back, the state completely revamped the test (we'll talk about English here). All of the assumptions and beliefs about what was state of the art were put aside, and a new test was necessary. This put the educrats in a bit of a quandary. How does one explain that the old test, which was, of course, the epitome of validity and measurement of all that was good and true is no longer so virtuous, and not only that, it is so flawed that it must be entirely, in format and content, replaced with a new instrument that is the real epitome of validity and measurement of all that is really, no kidding, we've got it right this time, good and true? Well, they just ignored the old test and everything they ever said about it.

The first year the new test was given, scores dropped dramatically compared to the last year of the old test. The following year, test scores began to climb and have continued that ascent. But here's the problem: If the tests really only measure the lowest common denominator, what is their utility in the first place? Let's observe briefly that it would hardly be unexpected that a given percentage of any population taking any test would fail, a given percentage would excel and most would fall toward the average (Lake Wobegon, where all the children are above average, only exists on the radio). When we realize that as much as 15% of a given school population might simply lack the IQ or innate intelligence to pass such tests, it tells us little to decry the fact that X percent of students couldn't pass a basic test, particularly when we realize that some percentage of the population just won't be able to do it. Ever. And interestingly, next year will see another significant drop in test scores when only 1% of the special education population of every school will be able to take an alternate test. Boy oh boy, won't American education be failing next year!

This is one of our little societal oddities. We have no difficulty in accepting, indeed, even in celebrating, the fact that most people will never qualify to play on the varsity football team. In fact, those players will commonly comprise far less than a single percent of a given student body. Yet, we panic and forecast doom when not every student can function as an intellectual varsity. But back to the TAKS (Texas Assessment of Knowledge and Skills).

Why did the students do poorly the first year of the new test and improve the next year? Did teachers who were previously successful suddenly become incompetent? And did they regain that competence the following year? What happened is the same thing that happens across the nation.

Such tests must, of necessity, measure very specific and narrow competencies, or in English writing, must demand and reward very specific writing strategies. The educrats will deny this, and will say (just ask them) that any teacher who is teaching what they should will automatically, ipso facto properly prepare students to excel on the test. Teaching to the test is absolutely not required. In making such statements, they are engaging in what we in English education call "lying." OK, let's be charitable. They may well believe what they say, and because they are far removed from actual teaching experience and practice, even though many of them may have taught at some time in the past, they're merely mistaken.

Teachers didn't lose competency and miraculously regain it; they were able to puzzle out (with little help from the educrats who drape a curtain of secrecy around the test that would put most spy agencies to shame) the exact skills/tricks necessary to pass the TAKS. They were also able to determine that many of those skills did not transfer well to competent writing and, in fact, would guarantee failure in any college freshman English course in the land. So what did teachers do, and what do they do to this day?

They teach to the test. They spend about a month--often more--each year doing little but specific drills necessary to ensure that kids have the best chance of passing the TAKS, which ultimately tests one thing reasonably well: The ability of a given student to pass that specific type of test, a test that is literally dripping with validity (if you don't believe me, just ask the educrats who will be happy to set you straight, no doubt with neatly footnoted studies, and perhaps even Powerpoint presentations!). And each year, some students who can barely complete a coherent paragraph pass the TAKS because they have mastered the tricks, and each year, some of the most intelligent, capable, literate students in the state fail the test. In either case, competent teachers tell the kids "do what we've taught you only on the TAKS, and for God's sake, never write that way in class or anywhere else."

Is that a good use of time, money and educational opportunity? I certainly don't believe it, and most teachers I know don't either. But that's a part of the problem. Virtually no one listens to teachers these days. But then again, if tests can solve all education problems, why should they? The role of a teacher is merely to administer such tests. They don't know nothin' 'bout birthin' no education.

Vaya con Dios.

Parentalcation said...

"The mechanisms for discovering such deficiencies and for their improvement are present in each and every school district in the land, and they consist of voters, who install and can remove school boards; school boards, who appoint and fire superintendents; superintendents who appoint and fire principals; and principals who appoint and fire teachers."

I see a problem with this argument for several reasons.

1. parents are too emotionally involved with their kids to make rational judgements

2. there are very few alternate models out there to compare their schools to.

3. it is pretty much established that the reform models like "DI" aren't even covered at Education Schools, so to expect parents to have a grasp of its potential is ridiculous

All in all, this premise would only work if they had viable options, thus you make a strong argument for school choice.

It seems to me that most competent teachers would welcome high stakes testing. After all it is human nature to want validation.

I also get the feeling that you are probably a middle school or high school teacher. If so, you are victimized by education failures at the lower levels just as much as everyone else.

rws1st said...

"It strikes me that the most efficient means for assessing student performance and progress always has been, and remains, the individual classroom teacher"

This is true for an individual. I think it's pretty clear from my early reading (or maybe it was the video on Zig's site) of DI, that they encourage direct teacher observation of students, evaluation of where they are at, and material aimed at their needs.

However, for assessing large populations of students, tests can be effective. If one's aim is to compare populations of students, to weigh alternatives, or to track progress over time, testing will probably give you a better view than an attempt to aggregate a multitude of different teacher evaluations. And if you did come up with a good way to use teachers to evaluate students that could be used for comparisons across diverse populations, I would expect such a method to be far more expensive than a normal test. Psychophysics has a number of statistical techniques for comparing human observations and evaluations, but they are expensive to implement for large sample sizes.

That a test can be effective does not mean all tests are. That would have to be evaluated on a test by test basis.

"The mechanisms for discovering such deficiencies and for their improvement are present in each and every school district in the land, and they consist of voters, who install and can remove school boards; school boards, who appoint and fire superintendents"

Are you familiar with the public choice theory of economics and in particular of the rational ignorance of voters? Given your later objections to the testing process that the political system in Texas has developed, it is odd that you would have confidence here.

NYC Math Teacher said...

It seems to me that most competent teachers would welcome high stakes testing.

As a self-described competent teacher, I do, in fact, welcome high-stakes testing. However, for a better measure of my teaching abilities, I need the option of long-term removal of highly-disruptive students. (Currently, my options are limited to periodic removal for a class at a time, with no consequences aimed at changing the behavior.)

rightwingprof said...

And exactly as I predicted, your educrat discounted every piece of data and research with a wave of his hand -- as educrats always do.

Tracy said...

It strikes me that the most efficient means for assessing student performance and progress always has been, and remains, the individual classroom teacher, or on the high school level, teachers, who see and deal with a given student daily.

These teachers may indeed be the most efficient means for assessing student performance and progress.

However that doesn't mean they're actually doing it - or doing anything useful with their testing results. The number of students who fail to learn to read is shocking. For example, according to the 2001 AMA Survey on Workplace testing, 34.1% of job applicants tested in 2000 lacked sufficient skills for the positions they sought. (ref http://www.amanet.org/research/pdfs/bjp_2001.pdf)

It is of course entirely possible that teachers are effectively and efficiently measuring their students' actual performance but are being prevented by the education system from effectively remedying the problem. Forcing schools to pay attention to the results of standardised tests is a possible solution to thereby force schools to improve performance.

The mechanisms for discovering such deficiencies and for their improvement are present in each and every school district in the land, and they consist of voters, who install and can remove school boards; school boards, who appoint and fire superintendents; superintendents who appoint and fire principals; and principals who appoint and fire teachers.

Notably, these voters, who according to your argument are well-placed to assess improvements in education, have elected politicians who have implemented a law called No Child Left Behind, which relies on mandatory high-stakes testing in the hope of improving school performance.

So voters have used the mechanisms available to them to develop a mechanism to improve school teaching. This implies a dissatisfaction with how current schools are achieving.

KDeRosa said...

Mike, you're all over the board with your response. Let's refocus.

Let's stick to the data I offered.

You allege that the validation analyses were biased. The potential for bias always exists; however, the analysts have made their findings available and their analyses transparent. As such, you should be able to give us at least one example of actual bias or withdraw your allegation.

Now getting back to the data.

Your criticisms go to policy reasons relating to your premise that standardized testing is not desirable. This is all well and good, but outside the scope of our argument.

You haven't answered the main question. Do you accept the premise that the data I've provided shows academic failure for a segment of the student population? If not, why not? How is the data faulty?

Roger Sweeny said...

Mike,

My experience may not be relevant--I teach high school science--but here goes.

It has been my experience that assessments throughout the year and once a year high stakes tests measure two different things.

Tests and quizzes and the other things we do largely measure what students are able to remember from the last few weeks or days. Someone who has paid attention and has a decent memory can do well whether she has learned and understood the material or not.

Tests given at longer intervals are more likely to measure actual understanding, though they may also show organization and the ability to "cram" (which are, I suppose, actually very useful life skills).

Ask a hundred teachers if their students do worse on midterms and finals than on regular tests and quizzes and I'll bet almost all of them will say "worse." Some will say "much worse." And some will say, "About the same--but we review for two weeks beforehand."