The High-Stakes Testing Playbook: False Bill of Goods?
In the summer of 2012, a Houston Chronicle editorial argued that the state faced “widespread panic” over the low percentage of students passing the State of Texas Assessments of Academic Readiness (STAAR), TEA and Pearson’s new testing regime.
Don’t panic: the scores will go up, and they will do so over the next 5-7 years or so as Pearson collects $1 billion from the state of Texas (and more from other states). How do we know this?
To examine Pearson’s playbook for high-stakes exams, I examined cross-sectional high school exit exam data from the inception of high-stakes testing and accountability in 1994 through 2010. During this time, Texas used two generations of accountability assessment systems. The first generation relied on the Texas Assessment of Academic Skills (TAAS) and lasted from 1994 to 2002. The second generation used the Texas Assessment of Knowledge and Skills (TAKS) and covers 2003 to 2010. The descriptive statistical analyses focus on African American and Latina/o high-stakes high school exit test score trends for 10th graders (TAAS, 1994-2002) and 11th graders (TAKS, 2003-2010).
TAAS Exit Exam
Figure 1 shows that African Americans dramatically increased their achievement on the TAAS Exit Math, from only 32% meeting minimum standards in 1994 to 85% by 2002. Concurrently, the percent of Latina/os meeting minimum standards increased from 40% to 88%. Although the achievement gap between minorities and Whites remained, the gap for Latina/os and African Americans narrowed to 8% and 11%, respectively, between 1994 and 2002.
Figure 2 also shows large gains in the percent of African American and Latina/o students meeting minimum standards on the TAAS Exit Reading. By 2002, TEA reported that 92% of African Americans and 90% of Latina/os in the state had met minimum standards on the TAAS Exit Reading. The percentage of African Americans meeting minimum standards increased by 32%, while Latina/os showed an overall increase of 29%. The achievement gap closed to 8% for Latina/os and 6% for African Americans.
Figure 1. Texas Assessment of Academic Skills (TAAS) Exit Math: Percent meeting minimum standards (1994–2002). Source: Statewide TAAS Results, by the Texas Education Agency, 2003b.
Figure 2. Texas Assessment of Academic Skills (TAAS) Exit Reading: Percent meeting minimum standards (1994–2002). Source: Statewide TAAS Results, by the Texas Education Agency, 2003b.
TAKS Exit Exam
In 2003, the TAKS replaced the TAAS as the exit exam in Texas. As shown in Figure 3, between 2003 and 2010 the percentage of African Americans passing the TAKS Exit Math increased from 25% to 81%, a gain of 56%. Latina/os showed a similar gain of 55% on the TAKS Exit Math (from 30% to 85%). Similar to the closing of the achievement gap on the TAAS Exit Math, the TAKS Exit Math gap for African Americans and Latina/os decreased to 4% and 9% by 2010 (see Figure 3).
Figure 3. Texas Assessment of Knowledge and Skills (TAKS) Exit Math: Percent meeting minimum standards (2003–2009). Source: Statewide TAKS Performance Results, by the Texas Education Agency, 2009.
Figure 4. Texas Assessment of Knowledge and Skills (TAKS) Exit English Language Arts: Percent meeting minimum standards (2003–2009). Source: Statewide TAKS Performance Results, by the Texas Education Agency, 2009.
During the past 8 years of TAKS Exit testing, the percentage of African Americans passing the TAKS Exit English Language Arts increased 43%, while the proportion of Latina/os meeting minimum standards increased 38% (see Figure 4). Similar to the closing of the achievement gap noted on the TAAS Exit Reading, the gap between African American and White students decreased to 6%. By 2010, the gap between the percent of Whites and Latina/os passing the TAKS Exit English Language Arts had declined to 7%.
Have we been sold a false bill of goods?
Critics questioned the validity of TAKS and TAAS score growth over time due to the lowering of cut scores in successive state-mandated testing regimes (Mellon, 2010; Stutz, 2011). What is a cut score? If a test includes 30 items, for example, how many must a test taker get right to be deemed proficient in that area? Must they get 29 correct? 28? As the stakes of doing well (or poorly) increase, there are explicit and implicit forms of gaming associated with how these arbitrary cut scores are determined (Glass, 2003). For example, in the spring of 2005, the Arizona State Board of Education and Schools Chief Tom Horne publicly debated the merits of two different cut scores: one that would have resulted in 71% of Arizona students passing, the other in 60%. In short, the state board wanted “easier” standards while Tom Horne argued for “tougher” standards (Kossan, 2005).
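How sensitive pass rates are to where the cut score lands can be illustrated with a small sketch. The raw scores below are invented for illustration only (not Arizona’s or Texas’s actual score distribution); the point is simply that moving the cut by a couple of items swings the headline pass rate dramatically:

```python
# Hypothetical raw scores for 20 students on a 30-item test.
# These numbers are invented for illustration only.
scores = [12, 14, 15, 16, 17, 18, 18, 19, 20, 20,
          21, 21, 22, 23, 24, 24, 25, 26, 27, 29]

def pass_rate(scores, cut_score):
    """Fraction of students scoring at or above the cut score."""
    passing = sum(1 for s in scores if s >= cut_score)
    return passing / len(scores)

# Shifting the cut score by two items changes who "meets standards."
for cut in (18, 20, 22):
    print(f"cut score {cut}: {pass_rate(scores, cut):.0%} pass")
# -> cut score 18: 75% pass
# -> cut score 20: 60% pass
# -> cut score 22: 40% pass
```

Nothing about the students changes between those three lines; only the policy decision about the cut score does.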
In Texas, TEA has consistently lowered the standards in successive testing regimes. Stutz (2001) reported that TEA lowered testing standards:
1.9 million students tested in math, reading and other subjects… were required to correctly answer significantly fewer questions to pass the high-stakes Texas Assessment of Academic Skills.
For example, in math, students had to get only about half the questions right. Two years earlier, they had to get about 70 percent of the TAAS questions correct. TEA also made similar reductions to the TAKS testing standards. Mellon (2010) revealed:
The biggest change involved the social studies test for students in grades 8 and 10. This year, for example, eighth-graders had to answer correctly 21 of 48 questions — or 44 percent. Last year, the passing standard was 25 questions, or 52 percent.
The lower passing standards that TEA has consistently implemented over time call into question the much-touted improvements on the state-mandated TAAS and TAKS testing regimes.
In 2005, Achieve Inc. compared state high-stakes test proficiency levels with those set by the National Assessment of Educational Progress (NAEP), a federally funded achievement test viewed as a comparable assessment to most state tests. When it came to fourth-grade math performance in 2005, states varied widely in how they defined “proficient” (Achieve, 2005). Compared to NAEP’s standard of proficiency, Mississippi’s tests were the “easiest,” whereas Massachusetts’ assessments were much “harder.” Texas exams fell on the easier side.
The nature of cut scores brings into question the meaningfulness (or content-related validity) of resultant high-stakes testing performance. The data we review from Texas (Figures 1-4) suggest that over time greater proportions of African American and Latina/o students have attained minimum levels of competency on the state’s TAAS/TAKS tests. Although this pattern may represent the “truth” in the public policy sphere, the empirical research on the consequential nature and limited validity of testing in Texas makes this interpretation somewhat suspect. As others have pointed out, favorable patterns of student performance on Texas’s high-stakes tests are more likely the result of lowered cut scores, suspicious exclusionary practices (removing low scorers from test taking), and other forms of data manipulation (e.g., misrepresentation of dropout/graduation rates), particularly with respect to African American and Latina/o populations (Haney, 2000; Linn, Graue, & Sanders, 1990; Mellon, 2010; Shepard, 1990; Stutz, 2001; Vasquez Heilig & Darling-Hammond, 2008). Thus, it is difficult to ascertain how much students actually learn under high-stakes testing environments (Amrein & Berliner, 2002; Nichols, Glass, & Berliner, 2006).
With the STAAR, I suspect the playbook will be the same one we saw with the TAAS and TAKS: set a very low bar initially that is easy to leap over in successive years, providing the appearance of improvement year after year. Meanwhile, as we showed in our report examining the past decade of Texas education data, independent national exams (ACT, SAT, and NAEP) taken by Texas students don’t show such rosy results relative to the rest of the nation.
Have we been sold a false bill of goods?
Score: Pearson $1 billion, Public 0
For references see: Vasquez Heilig, J. & Nichols, S. (2013). A quandary for school leaders: Equity, high-stakes testing and accountability. L. C. Tillman & J. J. Scheurich. eds., Handbook of Research on Educational Leadership for Diversity and Equity, New York: Routledge.