By James Noll
Standardized testing has a surprisingly sinister origin. It was invented in 15th Century Eastern Europe by none other than Vlad the Impaler (née Dracul). Most people believe he earned his nickname because of the time he invited hundreds of his nobles to a banquet, plied them with food and alcohol, and then, as they digested their repast, ordered his soldiers to stab them where they sat, drag them out to the courtyard, and impale them on stakes.
Nothing could be further from the truth.
Yes, he invited the nobles to his palace, and yes, he plied them with booze and meat; however, far from engaging in the despicable act of digestus-interruptus, Vlad, an inveterate tinkerer and inventor of things, merely wanted to show off his latest contrivance: a bubble test. Vlad knew he couldn’t just spring an assessment on the nobles without giving them time to study, so he presented it as a brute challenge: Take these number two pencils and get to work, or be impaled on those sharp sticks in the garden.
Not a single one of the nobles subjected himself to Vlad’s bubble test.
Vlad was wholly disappointed. Didn’t anybody care about how long he worked on that thing? Disappointment turned to anger, anger to action, and his subsequent bloody tear through the Ottoman Empire eventually inspired Bram Stoker to pen the script for one of the most famous true crime podcasts of all time, Dracula.
I’m kidding, of course.
Vlad the Impaler never set up those sharp sticks. The nobles did that on their own.
In reality, standardized testing has its roots in Han Dynasty-era Chinese bureaucracy. In order to create a meritocracy based on knowledge (not family wealth or connections), officials created a test that was based on rote memorization and regurgitation of Confucian principles. It was also less a test than it was an exercise in endurance, as test-takers had to sit for days to complete the assessments. Ultimately, it backfired. Rich families hired tutors to train their children to pass the test. Poor candidates, lacking the resources to eat, let alone hire someone to help their kids memorize something, often failed. Here’s a nice red herring to throw at someone anytime he tries to use National Assessment of Educational Progress data to prove anything: standardized tests actually had a hand in the Taiping Rebellion, which left at least 20 million people dead.
In America, standardized tests have even less savory if familiar origins. They were introduced in the mid-19th Century by the celebrated education reformer Horace Mann. Mann wanted students’ scores to be longer lasting than the oral method of evaluation that was popular at the time–something they could carry with them or which future schools or employers could review. However, it didn’t quite work out the way he intended.
Mann had the Boston Board of Education make up written exams and give them to students whom the city’s teachers had never seen. The board’s examiners used the results, which were predictably dismal, to blast the teachers and the quality of local education. Teachers, also predictably, argued that the written questions had little to do with what they were teaching and so were invalid. Heads rolled. Teachers were fired. School board members were forced out of their jobs.
And standardized tests were declared an unqualified success–so successful, in fact, that American eugenicists, led by psychologist and unrepentant racist Carl Brigham, began using them to prove the superiority of the Nordic race. Brigham concluded they proved the negative impact of miscegenation upon whites.
When I was in school (way back in the Analog Era), standardized tests were seen as a break from the far more difficult tasks our teachers assigned. In class, I navigated the knotty labyrinth of German grammar, wrote personal narratives, argued the ethical imperatives of conservation, interpreted Shakespeare, struggled with Algebra, did my best with physics. In comparison, one standardized test I took in tenth grade, which was designed to predict my future career, told me, a fifteen-year-old boy from Reston, the most suburban suburb of all the suburbs, whose maximum interest in animal care was feeding my cat, Mushka, dry kibble purchased at the grocery store, that I was most certainly destined to be a zookeeper. It also informed me of my excellent skills in typing and secretarial work.
These days, American students spend their entire elementary, middle, and high school careers struggling through battery after battery of wholly inappropriate high-stakes standardized tests. It starts in third grade and doesn’t let up until after graduation.
For a period of time, one of the counties that I worked for demanded that we administer standardized tests in between the standardized tests we already administered to make sure our students were prepared for the standardized tests they had to take at the end of the year. At first, these assessments, called Benchmarks, were created by teachers.
Done correctly, the process of designing, writing, and implementing a standardized test is entirely manageable. You can read the following section of this essay to get a sense of how it’s supposed to work–at least as far as any reasonable person (me) understands it–but rarely does.
Or you can trust me that what follows is NOT how the standardized test design happened in Virginia and just skip to the next subhead where it says “END OF SKIPPABLE SECTION.”
First, before unleashing a standardized test on the general population, teachers have to ensure that the test tests what it says it’s going to test (that’s a part of validity). In other words, for the EOC (End of Course) English SOL, the VDOE lists ten different standards in four different categories with several sub-standards. Under the Oral Language category there are two standards with a total of six sub-standards between them. Reading Analysis has a four to eleven ratio (standards to total sub-standards), Writing a three to sixteen ratio, and Research a one to ten ratio.
If the first quarter county benchmark test only covered Reading Analysis, but I taught Reading Analysis through a research project, that test could be invalid. OR, let’s say the first quarter county benchmark DID cover research but only in a very superficial manner, or a few of the teachers don’t teach research until the middle of the third quarter, that test could be invalid.
“Why can’t English Teachers just teach the same things at the same time across multiple grade levels using varying content?”
Because reading comprehension doesn’t work that way. Even if one ignores the context of the text selection, or how the cultural backgrounds of the students dictate their enthusiasm towards what is being read (which in turn affects engagement, which in turn affects comprehension); even if one turns a blind eye to the mounds of research demonstrating that students improve their reading comprehension when they are allowed to read and work with content that appeals to them personally; even if one ignores the subsequent research that explains how increased reading comprehension improves students’ abilities across all disciplines, it is impossible to force one person to read at the same pace as someone else.
In addition, English is a recursive subject. It doesn’t follow a necessary order like Mathematics or Science. I can teach all four categories of standards with a research project alone, or, with an assignment on plot diagramming, I could focus entirely on a couple of sub-standards in the Reading Analysis category. In this, I see beauty and art. It lends the subject a certain openness when it comes to pacing, choice of literature, and nature of instruction and classwork–something standardized test proponents either understand but choose to ignore or don’t understand at all.
But, hey, let’s pretend our Benchmark is valid despite all of that.
After the test is given the first time, teachers have to norm the responses, excise questions that every student misses, and check to make sure that those questions weren’t mis-answered because the passage or answers evoke some kind of negative emotional response, contain any conflict, or put any student at a cultural disadvantage. Finally, they have to administer the test a second time to the same group of students, do all the excising and cultural/negative emotional response/conflict stuff again, and confirm that each student earned the same or similar score as the first time they took the test (that’s basically what reliability means). If this is not the case, say the test has a reliability coefficient of .50 or lower out of 1.0 (anything above .50 is considered increasingly more reliable), the test should be revised and the whole process has to start over again.
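For the spreadsheet-inclined, the excise-and-retest routine described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how the VDOE or any county actually does it: the function names, the 0/1 item-scoring format, and the choice of Pearson's correlation as the reliability coefficient are all hypothetical, picked only for illustration. The .50 cutoff is the one given in the paragraph above.

```python
from math import sqrt


def drop_items_everyone_missed(responses):
    """Remove questions that no student answered correctly.

    `responses` is an assumed format: one list per student, holding
    1 (correct) or 0 (incorrect) for each question, in the same order.
    """
    n_items = len(responses[0])
    keep = [i for i in range(n_items) if any(student[i] for student in responses)]
    return [[student[i] for i in keep] for student in responses]


def total_scores(responses):
    """Each student's raw score: the number of questions answered correctly."""
    return [sum(student) for student in responses]


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores do not vary, so the correlation is undefined")
    return cov / sqrt(var_x * var_y)


def needs_revision(reliability, cutoff=0.50):
    """Apply the rule of thumb above: .50 or lower means start over."""
    return reliability <= cutoff


if __name__ == "__main__":
    # Made-up scores for four students on a four-question test, given twice.
    first = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 0], [1, 1, 1, 0]]
    second = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 0], [1, 1, 1, 0]]
    r = pearson(
        total_scores(drop_items_everyone_missed(first)),
        total_scores(drop_items_everyone_missed(second)),
    )
    print(f"test-retest reliability: {r:.2f}, revise: {needs_revision(r)}")
```

Note what the sketch leaves out: the check for passages that evoke a negative emotional response or put a student at a cultural disadvantage is a human judgment call, and no function is going to make it for you.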
END OF SKIPPABLE SECTION
This, of course, was not the way it played out. We were given no training, no time, and no framework to guide us through the process of designing standardized tests that would provide a valid measure of how well our students met the state-mandated Standards of Learning. We did, however, receive hard deadlines for administering each Benchmark (one a quarter), transcribing our students’ (invalid) scores into a spreadsheet for further (faulty) analysis, documenting exactly what we learned about our students’ abilities, and crafting lessons to reteach any disparities.
At the time, I was teaching Creative Writing I and II. You can already guess where this is going. Upon hearing from my principal that every class was supposed to take a Benchmark test, I thought (which was my first mistake), “Surely they can’t mean Creative Writing.” Technically, Creative Writing was a part of the English Department, but it’s more of a fine art than anything else. So I made my second mistake. I asked my principal if I had to write a multiple-choice test to assess my Creative Writing students. My principal, who was not unsympathetic, pressed his lips together, nodded at the ground, and said, “Every class.”
Like any good employee, I threw myself into the project. I spent many seconds Googling “Multiple Choice Test for Creative Writing.” This must have pushed my antiquated state-assigned laptop beyond its capabilities because the fan died and the motherboard overheated. I was then forced to come up with the test materials on my own, using pencil and paper like a Neanderthal. I over-delivered. I wrote a whopping three and a half questions before I quit and went home.
I cannot say that I’ve had too many proud moments in my career. Life isn’t a movie. Not once have I experienced a montage sequence of me drilling students, showing them devouring Shakespearean sonnets, or portraying them enthusiastically raising their hands in response to one of my questions. Not once have I ever shot a scene in which a difficult student finally related to a piece of literature because I did or said something to make him feel wanted. Not once have I, undaunted by bureaucratic mandates, led my students to glorious victory over the forces working against them.
But one moment does stand out.
After collecting my first and only Creative Writing standardized test from my bemused and incredulous students, I gleefully observed that one of them had used the bubbles to spell, in dark, ground-in, capital letters, the word NO.
Once Central Office realized their first efforts at homegrown standardized tests were a bust, their next step was to create the damn things themselves, and by this I mean they gathered a bunch of teachers over the summer to cobble together three benchmark tests that consisted of previously released passages and questions from a variety of other tests, all of which were designed for vastly different purposes: ACTs, SATs, GREs, STAR tests—anything they could find. The end result was the Frankenstein’s Monster of all assessments, about as invalid and unreliable as one could get. When that test failed to provide the necessary results, we finally just received, out of the blue, a different one.
I don’t really know how to describe it. On the surface, it looked like an assessment. There were words and multiple-choice responses. I think one of the reading passages was about crickets. But upon further review, it was clear that the thing had been created by someone who was not a data analyst, had never undergone training in standardized test creation, and apparently never took a class on basic grammar or elementary communication. I can’t begin to express the wholly inappropriate nature of forcing students to read what amounted to a novelette’s worth of randomly chosen passages and respond to questions so profoundly poorly conceived, written, and executed that I doubt whoever wrote them understood the mechanics of the hinge. It was the worst case of educational malpractice I’d ever seen.
We used that test for almost a decade.
(NEXT MONTH: TWO MORE THINGS DESTROYING THE SOULS OF THE TEACHERS OF AMERICA, CONT’D: THING TWO)
***James Noll has worked as a sandwich maker, a yogurt dispenser, a day care provider, a video store clerk, a day care provider (again), a summer camp counselor, a waiter, a prep cook, a sandwich maker (again), a line cook, a security guard, a line cook (again), a bartender, a librarian, and a teacher. Somewhere in there he played drums in punk rock bands, recorded several albums, and wrote dozens of short stories and a handful of horror, sci-fi, and post-apocalyptic novels, including Raleigh’s Prep, Tracker’s Travail, Topher’s Ton, The Hive, The Rabbit, The Jaguar & The Snake, and Mungwort. You can check him out online at silverhammer.studio.