A-F Accountability Letter Grades: Reform or Different Name, More of the Same?
Education “reformers” are still pushing A-F accountability letter grades. In this approach, districts and schools are given letter grades like a student might receive. Are A-F letter grades a “reform”? I was inspired to write this post because Jimmie Don Aycock, current Texas House Education Committee Chair, mentioned at the TFA Policy Forum held at the UT-Austin LBJ School of Public Affairs on 4/22/14 that A-F was his preferred approach to accountability. Current Texas Republican gubernatorial candidate Greg Abbott also released an education platform this week that supports the A-F letter grade system. Policymakers (in Texas and elsewhere) think A-F is a sexy education “reform” idea that works. So the question is whether an A-F accountability grading system is a new, more desirable reform for education policy, or whether A-F is a different name and more of the same. It turns out there is already a lot of evidence out there…
Test-Based Evidence on A-F
In the post The Test-Based Evidence on the “Florida Formula,” Matthew Di Carlo, a senior fellow at the non-profit Albert Shanker Institute in Washington, D.C., examined the research on A-F. He wrote:
In the late-1990s, Florida was one of the first states to adopt its own school grading system, now ubiquitous throughout the nation (see this post for a review of how Florida currently calculates these grades and what they mean).
The main purposes of these rating systems are to inform parents and other stakeholders and incentivize improvement and innovation by attaching consequences and rewards to the results. Starting in the late 1990s, the grades in Florida were high-stakes – students who attended schools that received an F for multiple years were made eligible for private school vouchers (the voucher program itself was shut down in 2006, after being ruled unconstitutional by the state’s Supreme Court).
In addition to the voucher threat, low-rated schools received other forms of targeted assistance, such as reading coaches, while high-rated schools were eligible for bonuses (discussed below). In this sense, the grading system plays a large role in Florida’s overall accountability system (called the “A+ Accountability Plan”).
Among the best analyses of the effect of the system is presented in this paper, which was originally released (in working form) in 2007. Using multiple tests, as well as surveys of principals over a five-year period during the early-2000s, the authors sought to assess both the test-based impact of the grading system as well as, importantly, how low-rated schools responded to the accountability pressure in terms of changes in concrete policy and practice.
The researchers concluded that test-based performance did indeed improve among the schools that had received F grades during the early 2000s, relative to similar schools that had received a higher grade. The difference was somewhat modest but large enough to be educationally meaningful, and it persisted in future years. A fair amount of the improvement appeared to be associated with specific steps that the schools had taken, such as increasing their focus on lower-performing students and lengthening instruction time (also see here). This, along with the inclusion of low-stakes exam data in the analysis, suggests that the improvements were not driven, at least not entirely, by “gaming” strategies, such as so-called “teaching to the test.”**
A different paper, also released in 2007 but using older data from the mid-1990s to early-2000s, reached the same conclusion – that F-rated schools responded to the pressure and were able to generate modest improvements in their performance. In this analysis, however, there was more evidence of “gaming” responses, such as focusing attention on students directly below the proficiency cutpoints and redirecting instruction toward subjects, writing in particular, in which score improvements were perceived as easier to achieve (also see here and here).
It’s important to note, however, that these findings only apply to F-rated schools, which, in any given year, are assigned to only 2-3 percent of the state’s schools. There is some tentative evidence that schools receiving D grades improved a little bit (relative to those receiving C’s), but that schools receiving A-C grades did not vary in their performance (this doesn’t necessarily mean they didn’t improve, only that, according to this analysis, they didn’t do any better than schools receiving higher grades).
(Interesting side note: These findings for D-rated schools, which did not face the voucher threat, in addition to other analyses [see this 2005 working paper and this very recent conference paper], suggest that the impact of the grading system may have as much to do with the response to the stigma of receiving a poor grade as that to the threat of voucher eligibility.)
Overall, based on this work, which is still growing, it’s fair to say that the small group of Florida schools that received low ratings, when faced with the threat of punishment for and/or stigma attached to those ratings, responded in strategic (though not always entirely “desirable”) ways, and that this response may have generated small but persistent test-based improvements. This is consistent with research on grading systems elsewhere (see here and here for national analyses of accountability effects).
However, the degree to which these increases reflected “real” improvements in student/school performance is not easy to isolate. For instance, at least some schools seem to have responded in part by focusing on students near the cutoffs, and even the more “desirable” strategies may have less-than-ideal side effects – e.g., a school’s increased focus on some students/subjects may come at the expense of others. These are very common issues when assessing test-based accountability systems’ impact on testing outcomes.
I took up the impact of the Florida A-F in the post Lurking in the Bushes: Peeking at Florida Education Miracle. I wrote:
There is another education presidential candidate lurking in the Bushes with an education “miracle” being discussed extensively in the media and elsewhere. Critics have pointed out that the miracle in Florida is no more real than the education miracle in Texas that spawned No Child Left Behind a decade ago— another elegant illusion of numbers? Some say the skeptics are wrong in their analyses of recent educational success in Florida.
What were my conclusions from looking at a decade of Florida data (NAEP, graduation rates, ACT, SAT) compared to peer states and the nation? In sum, NAEP scores seemed positive (with caveats). However, do NAEP scores determine the future of Florida’s students? When we consider the measures that actually matter for many kids’ lives (graduation rates, ACT, and SAT), the results are dismal for Florida. Granted, this post was only a peek at the Florida system…
So what is “the rest of the story” in Florida, as Paul Harvey used to say?
The Florida A-F Miracle
Fortunately for us, Mercedes Schneider, a Louisiana high school teacher, statistician, and blogger, has dedicated an ENTIRE chapter in her new book to the implementation of Jeb Bush’s A-F in Florida. Her book is entitled Chronicle of Echoes: Who’s Who in the Implosion of American Public Education. The following is a never-before-seen excerpt from Chapter 13.
Let’s begin with school letter grades. The school letter grade is a means of narrowly defining school “success” chiefly based upon student standardized test scores in select subjects. This narrowness is illustrated in the declared purpose of school letter grades according to Bush’s FFF:
School grades reflect whether students are learning a year’s worth of knowledge in a year’s time, which is the leading indicator of a quality education…. Parents use school grades to understand the quality of education their child is receiving so they can make informed decisions for their family.15
As a statistician, professional researcher, and classroom teacher, I cannot emphasize enough just how naïve and limited the above statement is. For one, learning is not linear. Learning is complex and cannot be partitioned “a year at a time.” Nor can it be partitioned “by subject.” Nor is it reasonable to conclude that learning occurs at some standardized rate for all learners. Finally, it is foolish to believe that student test scores are clearly and singularly connected to the school the student attends.
School letter grades are supposed to provide parents with a means of “understanding the quality of education their child is receiving.” The Hoover Institute refers to the letter grading of schools as an “intuitive” system.16 However, immediately following this stated purpose is information noting that the school letter grades are dependent upon standardized tests in only certain subjects. How “intuitive” is that?
School grades are based on FCAT (Florida Comprehensive Assessment Test) scores in reading, writing, math and science. Half of the grade is based on performance, which is the percentage of students who have the knowledge and skills required for their grade level. Half of the grade is based on progress, which is the percentage of students who gained knowledge and improved their skills from one year to the next even if they are not yet on grade level.17
Thus, Bush and the State of Florida discount untested subjects, including all social studies, fine arts, and physical education. If testing is the end-all, then these courses simply do not matter to a well-rounded, “quality” education. Since such courses (and others, especially at the high school level) are not measured using standardized test scores, their contributions to student education are in effect declared useless.
According to the same FFF information, there are also other factors that contribute to school grade calculation:
First, schools must test at least 90% of their students to earn a grade and 95% of their students to be eligible for an A. Schools that test fewer than 90% of their students are given an Incomplete, or “I.”
Second, at least half of the students who score in the lowest 25% must show progress from one year to the next. Schools who fail to meet this requirement lose a letter grade (from an A to a B, from a B to a C, and so on).18
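To make the stacking of these rules concrete, here is a minimal Python sketch of the calculation as the FFF passages above describe it: half performance, half progress, the 90% and 95% testing thresholds, and the one-letter penalty tied to the lowest 25%. The 0-100 composite, the letter cutoffs, capping a would-be A at a B when fewer than 95% of students are tested, and the order in which the rules are applied are all assumptions for illustration; the excerpt does not give Florida's actual point values or cutoffs, and the function names are hypothetical.

```python
# Hypothetical sketch of Florida's school grade rules as the FFF excerpt describes them.
# ASSUMPTIONS (not from the source): the 0-100 composite, the letter cutoffs,
# capping a would-be A at B when fewer than 95% of students are tested,
# and the order in which the rules are applied.

LETTERS = ["A", "B", "C", "D", "F"]

# Hypothetical cutoffs; the excerpt does not state Florida's real ones.
HYPOTHETICAL_CUTOFFS = [(80.0, "A"), (70.0, "B"), (60.0, "C"), (50.0, "D")]


def base_letter(performance_pct: float, progress_pct: float) -> str:
    """Weight performance and progress equally, then map to a letter."""
    composite = 0.5 * performance_pct + 0.5 * progress_pct
    for cutoff, letter in HYPOTHETICAL_CUTOFFS:
        if composite >= cutoff:
            return letter
    return "F"


def drop_one_letter(letter: str) -> str:
    """A becomes B, B becomes C, and so on; F stays F."""
    return LETTERS[min(LETTERS.index(letter) + 1, len(LETTERS) - 1)]


def school_grade(tested_pct: float, performance_pct: float,
                 progress_pct: float, lowest25_progress_pct: float) -> str:
    if tested_pct < 90:
        return "I"  # Incomplete: fewer than 90% of students tested
    letter = base_letter(performance_pct, progress_pct)
    if letter == "A" and tested_pct < 95:
        letter = "B"  # assumption: below 95% tested, an A is capped at B
    if lowest25_progress_pct < 50:
        # fewer than half of the lowest-scoring 25% showed progress
        letter = drop_one_letter(letter)
    return letter


print(school_grade(96, 85, 80, 55))  # composite 82.5 -> A
print(school_grade(92, 85, 80, 55))  # same scores, only 92% tested -> B
print(school_grade(96, 85, 80, 45))  # same scores, 45% of lowest quartile progressed -> B
```

Under these assumptions, the same school with the same test scores moves from an A to a B on either threshold alone, which is the kind of knife-edge behavior the excerpt goes on to criticize.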
Such obvious capriciousness in school letter grade calculation. For example, setting threshold values requiring the “testing at least 90%” is an arbitrary practice. The practice of calculating letter grades is far from consistent from one year to the next. Consider this explanation of school letter grade calculation changes from 1999 up to 2011:
The purpose of this technical assistance paper is to provide a description of the procedures used to determine school grades for the 2011 school year. In 2011, all school grades include four measures of student achievement and four measures of student learning gains plus several non-FCAT-based components for high schools. Florida’s current school accountability system originated with state legislation passed in 1999 (the “A+ Plan”) and has been revised periodically to reflect increased standards and expectations for student performance. Florida is the first state to track annual student learning gains based on the state’s academic standards.
School grades have been issued since 1999, with the Florida Comprehensive Assessment Test (FCAT) being the primary instrument in calculating school grades. In 2002, significant improvements were made in how school grades were calculated to fully implement the intent of Florida’s original plan. The most noteworthy improvement was the inclusion of student learning gains. Additionally, a measure was added to determine whether the lowest performing students are making annual improvements in specified subjects. Florida’s accountability system allows the improvement of individual students to be tracked from one year to the next based on FCAT developmental scores in reading and mathematics in grades 3 through 10. In 2010, Florida’s school grading system was further revised to include several additional measures for high schools, including the four-year graduation rate, the graduation rate for at-risk students, participation and performance in accelerated curricula, and postsecondary readiness, as well as a component for measuring annual growth or decline in these measures. In 2011, the Grade 9 FCAT Mathematics Assessment was discontinued with the phase-in of the state’s Algebra 1 End-of-Course (EOC) Assessment (which will not be used in school grades until the 2011-12 school year). Also in 2011, the “percent proficient” criterion for the FCAT Writing component was changed to the percent scoring at 4.0 and above (from the percent scoring at 3.5 and above).19 [Emphasis added.]
If school letter grade calculation is “revised,” it is a tacit admission that the grade calculation was not as “good” as it should have been in previous years, yet schools were still subjected to the consequences of the “not as good” system. Furthermore, there is the issue of the inability to compare school letter grades from one year to the next. If the criteria for calculation are in constant flux, then comparisons from one year to the next are meaningless.
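The 2011 change to the FCAT Writing criterion described in the passage above (4.0 and above instead of 3.5 and above) shows why. Below is a minimal sketch with invented scores for ten hypothetical students; the function name is also hypothetical. The students and their scores stay exactly the same, yet the share counted as proficient moves from 80% to 50% only because the cutoff moved.

```python
# Ten invented FCAT Writing scores for one hypothetical school (illustration only).
scores = [2.5, 3.0, 3.5, 3.5, 4.0, 4.0, 4.5, 5.0, 3.5, 6.0]


def percent_proficient(scores: list[float], cutoff: float) -> float:
    """Percentage of students scoring at or above the cutoff."""
    return 100 * sum(s >= cutoff for s in scores) / len(scores)


before = percent_proficient(scores, 3.5)  # pre-2011 criterion: 3.5 and above -> 80.0
after = percent_proficient(scores, 4.0)   # 2011 criterion: 4.0 and above -> 50.0
print(f"3.5 and above: {before:.0f}%; 4.0 and above: {after:.0f}%")
```

A year-over-year comparison that spans the rule change would read this 30-point drop as a decline in the school, when nothing about the students changed.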
Bush doesn’t talk too much about such issues. And the Hoover Institute, in its promoting of the Bush 1999 A+ Plan, did not anticipate the potential chaos that floating criteria could make of the school letter grading system. According to Hoover:
The grading system under A+ (Bush’s 1999 Florida education reform package) does a satisfactory job of identifying higher quality schools and an even better job of identifying those that are the least effective.20
Notice the change of exam in 2012 from the FCAT to the EOC (End of Course Test):
The purpose of this technical assistance paper is to provide a description of the procedures used to determine school grades for the 2012 school year. School grades include four measures of student achievement and four measures of student learning gains plus several components for high schools that are based on measures other than state assessments, as well as a new measure for middle schools that measures participation in and performance on high-school-level end-of-course (EOC) assessments. Florida’s current school accountability system originated with state legislation passed in 1999 (the “A+ Plan”) and has been revised periodically to reflect increased standards and expectations for student performance. Florida is the first state to track annual student learning gains based on the state’s academic standards.
Additional substantive changes to the school grading system were adopted by the State Board of Education in 2012, including new assessments and achievement level cut scores, expansion of performance measures to include students with disabilities and English language learners, implementation of a new middle-school component measuring participation in and performance on high-school level end-of-course (EOC) assessments, and a more rigorous graduation rate formula for high school grading.21 [Emphasis added.]
“Substantive changes” translates to “the school letter grades are no longer comparable from year to year, but we will continue to compare, anyway.” The press certainly does so.
So many revisions to the now-tedious Florida school letter grade formula bring increased opportunities for calculation errors. In 2012, the Florida Department of Education forgot part of the formula in the calculation of school letter grades for 213 Florida schools. This omission did not inspire confidence in an already-capricious school letter grade system:
State education administrators, who are in charge of grading schools and students, failed to follow their own formula.
In fact, they forgot part of it.
The error means 48 schools in South Florida will get higher, revised grades: 31 in Miami-Dade and 17 in Broward.
The mistake has piled more doubt on the state’s accountability system.
“A flawed accountability system that forgets to embed a critical element in its formula … is an accountability system that needs reform,” said Miami-Dade Superintendent Alberto Carvalho Monday. “And those that lead it need to consider the implications of their actions.”
The state’s accountability system has come under fire by parents who think their children take too many tests; by teachers whose evaluations now depend in part on test scores; and by educators who believe the state has made too many policy changes, too fast. The state Department of Education announced the revision of letter grades at 213 schools statewide —with the most in Miami-Dade — in a news release late Friday night. All had their grade raised one letter grade.
Carvalho joined the chorus of criticism, even though Miami-Dade schools benefited from the correction. “I have lost confidence in an accountability system that is not only ever-changing but fails to accurately depict student learning and the effectiveness of teachers,” he said.22 [Emphasis added.]
Doesn’t sound like much of a “miracle,” does it?
Are you still with me? A-F = More convoluted “accountability” and definitely not a miracle in Florida.
Almost a Decade Post-Katrina: Give A-F an F in Louisiana
On her blog, Mercedes Schneider begins the discussion of Louisiana A-F in the post New Orleans’ Recovery School District: The Lie Unveiled with the following:
Yesterday, I was watching a video clip of the 2011 Aspen Institute debate between Wendy Kopp and Diane Ravitch. In a final effort to defend corporate reform, Kopp tells the audience, “I encourage everyone to see for yourself, study for yourself… New Orleans….”
So let’s take Wendy up on her challenge.
Schneider writes about the A-F grades in the “Recovery District”:
Of the 60 state-run RSD schools (59 from the RSD website plus omitted Nelson) included on the DOE 2012 school-level data spreadsheet (both admin and public versions), none received an A as a school letter grade.
Of 60 state-run RSD schools, only 6 received a B in 2012. That’s 10%.
One RSD school, Gentilly Terrace, received a T, meaning no grade this year. A free pass.
According to Jindal’s and the State of Louisiana’s definition of a failing school, the remainder of the RSD schools given letter grades are failing. That’s 90%.
In 2012, 5 state-run RSD schools received a C.
In 2012, 19 state-run RSD schools received a D.
In 2012, 29 state-run RSD schools received an F.
Given that RSD is overwhelmingly comprised of charter schools (83%, based upon information available on the RSD website), I think it safe to write that praise of unqualified charter school success in New Orleans is unfounded.
It has been almost a decade since Katrina, and this is all Gov. Jindal and Supt. John White have to show for it? The “Recovery District” and A-F should be given an F.
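Schneider's counts can be re-tallied directly. The sketch below uses only the 2012 numbers listed in her excerpt (the variable names are just for illustration). Reading “the remainder of the RSD schools given letter grades” as the 53 schools that received a C, D, or F out of the 59 that received any letter, the failing share comes to about 89.8%, which rounds to the 90% she reports.

```python
# 2012 letter grades for the 60 state-run RSD schools, as counted in Schneider's excerpt.
# "T" means no grade was issued that year.
rsd_2012 = {"A": 0, "B": 6, "C": 5, "D": 19, "F": 29, "T": 1}

total = sum(rsd_2012.values())                           # all 60 schools
graded = total - rsd_2012["T"]                           # schools that received a letter
below_b = rsd_2012["C"] + rsd_2012["D"] + rsd_2012["F"]  # the "remainder" counted as failing

print(f"Received a B: {rsd_2012['B'] / total:.0%} of {total}")
print(f"C, D or F: {below_b} of {graded} graded schools = {below_b / graded:.1%}")
```

The counts also sum to 60, matching the number of state-run RSD schools in her spreadsheet.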
Notably, the policy brief entitled Review of the Louisiana Recovery District, released by the National Education Policy Center (NEPC), found that Louisiana had raised and lowered its accountability ratings “to justify converting public schools into charter schools, and then to justify keeping them as charter schools.” Has the manipulation of A-F for political gain and/or ideology been limited to Louisiana?
Arbitrary A-F In Indiana
Indiana also decided to adopt A-F under the guidance and leadership of Tony Bennett (who later ended up in Florida). What is really clear about A-F, and about accountability in general, is that both are arbitrary and political. I wrote the following in the post “Oh, crap”: Accountability is Arbitrary and Political:
In the 1990s, I worked for the Houston Independent School District, and we had our own accountability system that ran parallel to the Texas accountability system. One year I was responsible for setting the formulas for our rating system (Exemplary, Acceptable, etc.). I remember one day sending my calculations for the accountability system to Coach Paige’s (former Secretary of Education and Godfather of NCLB) office and getting the reply back that I had set the bar too high: we had too many low-performing schools (~35). So I reset the accountability formula, and it yielded too few low-performing schools (~5) according to the edicts from above. Then I set the formula so that it yielded approximately 15 low-performing schools, and, like the Three Little Bears’ porridge, it was just right for the Superintendent.
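The “too many, too few, just right” process described above amounts to choosing the cut score after seeing the results. Here is a minimal sketch, assuming 100 hypothetical schools with invented passing rates (this is not HISD's actual formula, and the function names are hypothetical): any target number of low-performing schools can be hit simply by reading the cut score off the sorted results.

```python
# Hypothetical passing rates for 100 schools: 7*i mod 101 visits every
# integer from 1 to 100 exactly once, giving distinct, shuffled scores.
scores = [(7 * i) % 101 for i in range(1, 101)]


def count_low(scores: list[int], cutoff: int) -> int:
    """Schools scoring strictly below the cutoff are labeled low-performing."""
    return sum(s < cutoff for s in scores)


def cutoff_for_quota(scores: list[int], target: int) -> int:
    """Pick the cutoff that labels exactly `target` schools low-performing.

    Assumes distinct scores; with ties, the count could fall short of the target.
    """
    if not 0 <= target < len(scores):
        raise ValueError("target must be between 0 and len(scores) - 1")
    return sorted(scores)[target]


# Too high, too low, "just right": the same data yields whatever count is requested.
for target in (35, 5, 15):
    cutoff = cutoff_for_quota(scores, target)
    print(f"cutoff {cutoff}: {count_low(scores, cutoff)} low-performing schools")
```

Nothing about the schools changes between the three runs; only the cut score does, which is the point of the anecdote.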
Just in case you still think that accountability is neither arbitrary nor political, there is interesting news today out of Indiana uncovered by the AP. It turns out that emails obtained by the AP show that Tony Bennett, former Indiana State School Superintendent and current head honcho of Florida schools, changed the ENTIRE Indiana accountability system just to benefit a top GOP donor’s charter school #?!?
#cronyism #charters. The AP reported:
Former Indiana and current Florida schools chief Tony Bennett built his national star by promising to hold “failing” schools accountable. But when it appeared an Indianapolis charter school run by a prominent Republican donor might receive a poor grade, Bennett’s education team frantically overhauled his signature “A-F” school grading system to improve the school’s marks.
Emails obtained by The Associated Press show Bennett and his staff scrambled last fall to ensure influential donor Christel DeHaan’s school received an “A,” despite poor test scores in algebra that initially earned it a “C.”
“They need to understand that anything less than an A for Christel House compromises all of our accountability work,” Bennett wrote in a Sept. 12 email to then-chief of staff Heather Neal, who is now Gov. Mike Pence’s chief lobbyist.
The emails, which also show Bennett discussed with staff the legality of changing just DeHaan’s grade, raise unsettling questions about the validity of a grading system that has broad implications. Indiana uses the A-F grades to determine which schools get taken over by the state and whether students seeking state-funded vouchers to attend private school need to first spend a year in public school. They also help determine how much state funding schools receive.
…trouble loomed when Indiana’s then-grading director, Jon Gubera, first alerted Bennett on Sept. 12 that the Christel House Academy had scored less than an A.
“This will be a HUGE problem for us,” Bennett wrote in a Sept. 12, 2012 email to Neal.
Neal fired back a few minutes later, “Oh, crap. We cannot release until this is resolved.”
By Sept. 13, Gubera unveiled it was a 2.9, or a “C.”
A weeklong behind-the-scenes scramble ensued among Bennett, assistant superintendent Dale Chu, Gubera, Neal and other top staff at the Indiana Department of Education. They examined ways to lift Christel House from a “C” to an “A,” including adjusting the presentation of color charts to make a high “B” look like an “A” and changing the grade just for Christel House.
It’s not clear from the emails exactly how Gubera changed the grading formula, but they do show DeHaan’s grade jumping twice.
“That’s like parting the Red Sea to get numbers to move that significantly,” Jeff Butts, superintendent of Wayne Township schools in Indianapolis, said in an interview with The Associated Press.
“I am more than a little miffed about this,” Bennett wrote. “I hope we come to the meeting today with solutions and not excuses and/or explanations for me to wiggle myself out of the repeated lies I have told over the past six months.”
Bennett said Monday that email expressed his frustration at having assured top-performing schools like DeHaan’s would be recognized in the grading system, but coming away with a flawed formula that would undo his promises.
When requested a status update Sept. 14, his staff alerted him that the new school grade, a 3.50, was painfully close to an “A.” Then-deputy chief of staff Marcie Brown wrote that the state might not be able to “legally” change the cutoff for an “A.”
“We can revise the rule,” Bennett responded.
Over the next week, his top staff worked arduously to get Christel House its “A.” By Sept. 21, Christel House had jumped to a 3.75. Gubera resigned shortly afterward.
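The AP account reports three scores for Christel House (2.9, then 3.50, then 3.75) and Bennett's reply that “we can revise the rule.” The sketch below shows how a cutoff change alone can move a letter. The 4-point cutoffs are an assumption chosen only to be consistent with the AP story (2.9 is a “C,” and 3.50 is a high “B” just short of an “A”); they are not quoted from Indiana's rule, and the sketch does not reconstruct how Gubera actually changed the formula, which the emails reportedly do not show.

```python
# ASSUMED cutoffs on a 4-point scale, chosen only to match the AP account
# (2.9 is a "C"; 3.50 falls just short of an "A"). Not Indiana's published rule.
ASSUMED_CUTOFFS = {"A": 3.51, "B": 3.00, "C": 2.50, "D": 2.00}


def letter_for(score: float, cutoffs: dict[str, float]) -> str:
    """Return the highest letter whose minimum score is met; otherwise F."""
    for letter in ("A", "B", "C", "D"):
        if score >= cutoffs[letter]:
            return letter
    return "F"


# The three reported Christel House scores under the assumed cutoffs.
for score in (2.9, 3.50, 3.75):
    print(score, letter_for(score, ASSUMED_CUTOFFS))

# "We can revise the rule": lowering only the A cutoff turns the same 3.50 into an A.
revised = dict(ASSUMED_CUTOFFS, A=3.50)
print(3.50, letter_for(3.50, revised))
```

Under these assumed cutoffs, a one-hundredth-of-a-point change to a single threshold is enough to turn a “B” into an “A” for a school whose score did not change.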
Is this isolated to Indiana? Of course not. All accountability levels and ratings are subjective and arbitrary; they have absolutely no empirical evidence base. No Child Left Behind, for all its demands for scientifically based research, was never based on scientific evidence that it would close the achievement or opportunity gaps. Thus, it is no surprise that it won’t deliver what the law promised by 2014.
So let’s summarize the evidence on A-F (because you may have scrolled all the way down here anyway):
1) The test-based evidence in the research literature is mixed, and the gaming responses noted by researchers are especially problematic. So, different name and more of the same.
2) The Florida A-F has become more convoluted, not simpler. So, different name and more of the same.
3) A-F in New Orleans gets an F because 90% of the RSD schools given letter grades were still failing. So, different name and more of the same.
4) A-F accountability is arbitrary and political. So, different name and more of the same.
In conclusion, while Florida and Louisiana are prominently cited and lauded by “reformers” (e.g., Michelle Rhee) and derided by critics, what is not under debate about those two states is that they still consistently perform in the bottom fifth of all states on NAEP (see Who’s Smarter Than Texans?: Math and Science Test Scores Compared to the World and Nation). Should we really allow Louisiana and Florida “reforms” to lead by example from behind, somewhere very near the bottom? Wait, that is exactly the approach that policymakers in D.C. chose for national education policy reform by importing NCLB from Texas. The public mantra of education “reformers” on A-F should be: “Never let evidence trump our ideology.”
Want to know about Cloaking Inequity’s freshly pressed conversations about educational policy? Click the “Follow blog by email” button in the upper left hand corner of this page.
Please blame Siri for any typos.
Interested in a Masters in Educational Policy and Planning from UT-Austin? It’s not too late to apply. Go here.