A Lot Has Changed Over the Past 40 Years — But Not America’s School System. Why?
By Margaret Raymond
The 74 is partnering with Stanford University’s Hoover Institution to commemorate the 40th anniversary of the ‘A Nation At Risk’ report. Hoover’s A Nation At Risk +40 research initiative spotlights insights and analysis from experts, educators and policymakers on what the evidence shows about the broader impact of 40 years of education reform and how America’s school system has (and hasn’t) changed since the groundbreaking 1983 report. Below is the project’s conclusion, penned by Margaret Raymond. (See our full series.)
In 1983, the National Commission on Excellence in Education (NCEE) released A Nation at Risk (ANAR), which issued a wake-up call, declared the state of US education a crisis, and presented thirty recommendations for action. It bears noting that the Commission’s recommendations were targeted in focus and scope, leaving the prevailing “one best” district-based education model intact. We will never know whether larger-scale interventions were considered. Whatever the genesis, the final recommendations left education policymakers with an organizational checklist, and as the essays in this series have demonstrated, they responded accordingly.
A Nation at Risk + 40 brought together twelve exceptional scholars and thought leaders to review the nation’s response to the Commission’s challenge. At the outset of this research collaboration, compiling the record of forty years of school improvement efforts and summarizing the available evidence of their respective impacts on student outcomes appeared straightforward, if a bit tedious. It turned out to be anything but.
Each of the twelve essays fulfilled its assignment. In each strand of investigation, the authors documented the evolution of improvement activity and, where evidence exists, described the degree to which the efforts paid off. On its own, every one of the essays makes an important contribution to our ongoing national conversation about the critical state of the public K–12 education sector. While we make no claim that the scope of inquiry was definitive, the separate reviews cover billions of dollars in major programs and initiatives pursued by districts, states, and philanthropies. Many of these initiatives were incentivized by Congress and span Republican and Democratic presidential administrations. Our authors offer their own recommendations that, if followed, hold promise to improve conditions in the spheres they examined.
The research collaborative delivered an even more valuable asset, as the result is far more than the sum of the parts. Until the essays were gathered into a collection, the aggregate record of attempts to improve the K–12 education system in the United States was uncharted and unrecognized. We know of no other compilation that illuminates the sheer breadth of reform activity.
For the first time, we can compare the impacts across different areas of investment. Beyond this, taking the full collection as a whole augments the strand-specific recommendations with several crosscutting observations to inform future action.
What did we do?
There can be no dispute that, as a nation, we certainly tried hard to fix the problem. Practically speaking, we addressed every node that was mentioned by the Commission and several that weren’t. It is remarkable how doggedly educators, policy leaders, advocates, and funders have augmented policy and practice with interventions. The sheer volume and spread of reform efforts are worth examining, as they begin to shed light on the situation we currently face in public K–12 education.
Other scholars (Hattie 2023) have used evaluations and other research to rank various reforms by their impact on student performance. The impact estimates are drawn from a vast collection of meta-analyses, yielding a super-meta-analysis that rank-orders reported results across different interventions. The rankings are widely interpreted as the definitive, adjudicated, and authoritative guide to improving student performance. In statehouses, state education agencies, and school districts, the rankings have taken on mythic proportions in guiding policy decisions about school improvement.
It is easy to see the appeal. The aim is noble, and the appetite is intense. Sadly, deeper inquiry into the rankings shows significant problems with the work: the desire to be expansive sits in tension with the need to apply stringent criteria about which meta-analyses are fed into the rankings. We learned that the underlying quality of the reform interventions themselves and the rigor of the research about their effects varied widely. To illustrate with a hypothetical: in the rankings, one thousand low-quality interventions with medium-strength evidence receive higher weight than one hundred high-quality interventions with a high-quality evaluation.
The concerns go beyond the problem of the quality of evidence. The implication for policymaking and educator practice is that the rankings encourage devotion to one or two marginal adjustments to schooling at the expense of lower-ranked options. The greatest risk lies in overlooking emerging successes for years until the next update to the rankings occurs.
Wishing to avoid a similar result, we chose a different approach to exploring the body of evidence. Beyond the notable volume of reform efforts attempted over the past forty years, it is useful to consider the points of the system that the various reforms were designed to change. This is important because many of the checklist items from ANAR’s recommendations aim at strengthening only one facet of the K–12 system, and the Commission did not offer recommendations on mixing, matching, or stacking multiple reform efforts.
The stability of the basic model of US K–12 public education over four decades is advantageous for our purposes because it supports a generalized theory of action, sometimes called a “logic model.” Theories of action specify the types of capital, staffing, and other resources that are needed to provide K–12 education. Theories of action also detail the policies and practices that are followed. Inputs and processes combine to produce a near-term result referred to as “outputs.” The eventual value of the results is identified as “outcomes.” With this lens, we classify the policies, programs, and initiatives discussed by the essay authors in order to learn about the targets and yields of reform activity. To be clear, some improvement efforts span our classification categories (e.g., some professional development includes input and process features); these are assigned by their most prevalent attributes.
Our authors are highly sensitive to the availability and caliber of research and evaluation. In many areas, such as public school choice and inclusion of master teachers in educator preparation programs, no evidence exists. In other areas, impact information is hindered by studies involving few examples, fuzzy specifications, or weak counterfactuals. Evaluative studies of school-based health centers and socio-emotional learning are examples where evidence of impact is lacking. The field of impact studies has evolved in constructive ways, but it is still held back by a weak commitment to objective assessment of impacts and by a lack of discipline in incorporating the insights into practice.
Inputs
A preponderance of the improvement efforts identified by the authors sought to adjust the inputs used by the education system. These include teacher-focused efforts such as alternative certification and incentive pay arrangements, adding school-based health centers, strengthening early childhood programs, and overhauling curriculum. System-focused input changes seek to expand the variety of inputs or the overall structure of the system, whereas marginal input reforms seek to improve the quality of selected resources within the existing stock.
Taken together, these efforts aimed to enrich the ingredients in the “recipe” for K–12 education. Focusing reform attention on adjusting the quantity, quality, or intensity of a factor before it is used keeps the reform at arm’s length from the actual production of education. Think of upgrading tires on a race car — the improvement to the equipment takes place offline and then is brought online in the hopes of improved performance.
The evidence shows that the impacts of input-focused reforms range from zero to as much as three-quarters of a year of additional achievement for students. About half the input reforms have negligible or no effect on student academic achievement. The options that show no impact share the attribute of shallow or isolated treatment, such as a few hours of professional development or play-based preschool. For both system-focused and marginal input reforms, positive results point to interventions that have significant weight, scale, and duration to create and sustain the momentum for change. We see this, for example, in the small-schools movement (system-focused) and in laser-focused teacher professional development (marginal adjustment).
Input reforms assume that the rest of the system will respond organically to the change in the treated input. As the evidence shows, many efforts provided too little leverage to lift the rest of the operation. Worse, an exclusive input focus ignores the possible interactions with other components that may react in different ways than expected.
Processes
Process reforms aim to change the way education is created, delivered, and monitored by schools and their oversight bodies. To extend the recipe analogy, processes are the mixing and cooking instructions. Marginal process reforms attempt to mix inputs in new ways or interact inputs with new policies or protocols. Systemwide process changes try to reengineer old ways of doing things across the board to produce better results, as in Washington, DC’s adoption of the IMPACT teacher evaluation and compensation initiative or the implementation of a digital learning platform across all the middle schools in a district.
Given the challenges of designing and implementing new programs, it is little wonder that our authors found fewer process reform examples in their scans. Across the essays, the authors identified three general areas of process reforms.
Teacher professional development falls largely into the process category—selected areas of knowledge and skills are targeted to expand the capacity of teachers to perform their duties. This differs from input reforms, which are directed toward improving the number or quality of candidates at the point of hiring. The available evidence suggests that for much of the past forty years, there was little or no effect from a large proportion of professional development. Recent evidence, however, shows positive impacts when the programs are strictly focused, multifaceted, and sustained, producing between one and four months of extra achievement.
Incentive programs for higher teacher performance have strong impacts on student academic achievement for their duration, from about two months to an extra year of added achievement. However, these impacts are largely one-sided; they did not induce low-performing teachers to move up or move out. Rather, they provided financial and work assignment flexibility incentives for teachers. Similar programs that trade extra compensation for teaching in the most challenging settings also produce strong student gains of similar magnitudes. Both types of reforms are highly vulnerable to political disruption at all points of the program, especially if teachers’ participation requires evaluation of their performance.
Technology adoptions can also be classified as process reforms. Once technology has been purchased and distributed, it serves a process function. The evidence of impact from the broad provision of education technologies has, for the most part, been disappointing, showing no impact and substantial stranding of investments. Despite that general trend, however, a number of significant and strongly positive examples of technology-supported education have emerged as promising proof points.
The third area of process reforms occurs at the governance level of the system. Since ANAR’s release, states have changed the way they fill key positions on their boards of education and within the Council of Chief State School Officers. The change in appointment mechanisms is a process change whose influence is systemwide. Likewise, changes in district school boards to a portfolio management model also flow across the district system. The evidence on these governance changes has been mixed.
It is clear that important differences exist between systemwide process changes and those that are marginal in nature. Some process reforms can work only if introduced systemwide, such as adoption of student safety protocols or school-based disciplinary programs; a “half a loaf” approach won’t work. Alternatively, marginal process change can be narrow in scope, in terms of either the focus of the reform or the organizational level that is targeted. Pilot programs are a clear example. In marginal process reforms, the rest of the schooling equation remains untouched. The balance between systems and marginal processes can shift either way depending on the interplay of cost, the scope of the planned innovation, friction with adjacent policies or practices, and political resistance.
Moreover, estimating the effects of process changes is technically and practically more difficult than measuring the effects of input shifts. The interactions of new processes with other factors and their dynamic nature over time create complexity that is difficult to measure. The body of evidence is therefore smaller than exists for input-focused changes. New instructional models such as discovery or expeditionary learning are process changes. The evidence on these is thin, except for personalized learning modalities, which show strongly positive effects on learning gains and graduation rates.
Likewise, the expansion of technology in schools (equipment, connectivity, and content) is a process change that has altered the way curriculum and instruction are organized and deployed. The impacts are sobering: unused resources cannot advance learning, but where strong implementation occurs, we also see improved student academic achievement.
The final set of process changes can be grouped as “infusion” efforts. Extended school years appear not to improve student results, but additional time in focused instruction helps; the extra time matters only if it is used well. Similarly, teacher and leader professional learning programs are seen as a mixed bag. As with extra time in school, the evidence shows that focused and targeted experience can produce positive impacts on student learning, but those conditions do not appear to be the norm.
Although they have a smaller evidence base, process reforms deal with larger segments of the education enterprise than input reforms do. Those that work share the attribute of internal design coherence, even if they do not fit well into the rest of the system. Finally, the larger the process reform, the more of a political target it offers to opponents.
Outputs
When we consider the near-term results of elementary and secondary education or the milestones on the way to reach these results, we are discussing outputs. These are the immediate products that reflect the end state that inputs and processes have created. In K–12 education, common outputs include meeting learning benchmarks for grade promotion, satisfying graduation requirements, and implementing performance measures for teachers and leaders. It bears noting that outputs are agnostic to inputs and processes: many combinations are possible to create a particular output.
Systems-oriented improvement efforts have been judged by both outputs and outcomes. In Cami Anderson’s essay on the results of districtwide reform strategies in Newark, New Jersey (chapter 12), one output was a 35-percentage-point increase in early childhood enrollment. Another was a 20-point rise in the percentage of Black students enrolled in above-average schools, followed by significant early gains in reading achievement and eventual gains in math. Ironically, the impressive improvements in Newark were not judged a successful outcome, largely because of friction in the community and with elected leaders. Similar efforts under the US Department of Education School Improvement Program did not create positive results.
There are other examples of reforms that aim to change outputs. Redirecting school board activity to prioritize academics and student learning has been shown to produce positive movement on outcome measures for schools and districts.
The largest efforts to move outputs of elementary and secondary schooling lie in the national adoption of accountability programs. The consequential approach to school-based accountability advanced by the No Child Left Behind Act (NCLB) improved learning by one-half year of student achievement and narrowed achievement gaps between groups of students. High school graduation rates increased by 15 percentage points, with concomitant increases in college enrollments. These improvement trends persisted through 2015, but they have all but reversed over the past eight years, with student learning falling dramatically over the course of the COVID-19 global pandemic.
Other efforts to affect teacher preparation programs also looked at outputs, but to no avail: current teacher certification exams are unable to predict future variations in teachers’ performance once they are in the classroom. Other common indicators, such as academic credentials or years of experience (also inputs), are similarly disconnected from future teacher performance.
Finally, some reform activities deliberately circumvent mainstream institutions and channels in an attempt to create better outputs. Extra-system initiatives can take the form of inputs or processes, or they can combine the two. Some options that have shown positive impacts for student results include mayoral control (significant gains in achievement and better fiscal controls) and gubernatorial appointment of state board members (better performance on the National Assessment of Educational Progress assessments).
As noted by other scholars, school choice can arise within, across, or outside of school systems (Lake 2020). Intradistrict school choice redistributes seats in schools by changing the way students are assigned to schools; it aims to improve the outputs for the students who access better classrooms. As a process reform, it is associated with stronger achievement in math for minority students. Interdistrict choice is rare, and its effects are not well studied. Charter schools operate in a separate policy stream and deliver stronger growth and achievement in reading and math, especially in urban charter school networks (CREDO 2023). For vouchers, the impact on students has, on balance, not been positive; the evidence shows weaker achievement for enrolled students even as vouchers create positive spillover impacts on public schools. Other efforts that move outside the usual institutional arrangements are less understood. Newer options such as education savings accounts (ESAs) and microschools have yet to be examined in depth.
Outcomes
In an education theory of action, outcomes are the final results of the entire enterprise. Outcomes differ from outputs because they apply external standards and criteria to the nominal outputs to make judgments about what is “good enough.” So, while outputs may be expressed as test scores, CTE credentials, or course completions, when we apply evaluation standards such as postsecondary readiness, we are making judgments about the performance that was produced.
Since ANAR was released, we have gained clarity, if not conviction, about what we intend our schools to produce. Performance frameworks that illustrate the results that stakeholders deem desirable have grown in number and complexity. Across the country, charter school authorizers and state and local school boards use performance frameworks as central elements of school and district oversight and accountability. Newer examples of our collective expectations are seen in the work in some states to define the profile of a graduate, setting explicit criteria for what a high school diploma should represent.
By law, every state reports publicly on how its students and schools are performing. State-issued “report cards” for districts and schools generally include demographic information for teachers and students, operational and financial information, and student academic performance information. States set thresholds for student and school performance expectations, though these thresholds vary widely. Whatever the aspirations, we are not in vastly different territory today than in 1983. Disappointing outcomes (e.g., high school math performance) have even prompted attempts to improve the optics by diluting some of the criteria (such as watering down the instructional frameworks or course requirements), but such maneuvers do nothing to alter the underlying reality.
Insights from the audience
As Walt Kelly’s cartoon character Pogo said, “We have met the enemy and he is us.” Indeed, the staggering array of treatments, interventions, redesigns, and innovations that our authors identified makes it a challenge to rationalize our collective experience into any semblance of order. If we had aimed for chaos at the outset, it is hard to imagine a better result.
Despite the cacophony, the catalog of activity amassed by the authors supports a few observations about our forty-year reform effort that may illuminate future directions for elementary and secondary education in our country. Taken as a whole, the record of reform efforts can be characterized by six I’s: impulsive, incremental, incoherent, impatient, intransigent, and ineffective, as discussed below.
Impulsive
Most of the reforms were adopted at full scale—across an entire state or the nation. Many efforts to push programs across states or regions had roots in advocacy pressure to move reforms quickly. Many state leaders were game to bring new policies to their state if they were perceived as having been successful elsewhere, as it reduced the perception of risk and provided an existing model to copy.
Doing the “here, too” dance hobbled the new adopters in two ways. First, it skipped over analysis of the “fit” of the reform in the local context on the receiving end, including the important variation among local contexts. It is impossible in hindsight to determine how many of the “mixed result” outcomes stemmed from differences in the settings on the ground, but it seems safe to say that local contours were likely overlooked as most of the programs or policies were advanced. Second, jurisdiction-wide adoption curtailed the ability to evaluate implementation and impacts in real time, so valuable learning was lost at the outset.
Incremental
The most pervasive attribute is the incremental nature of the interventions. This stems in part from the original recommendations of the ANAR Commission, framed as commonsensical and achievable changes. The commitment to incrementalism continued even when earlier efforts proved ineffective. One might argue that it made sense to aim small to soften implementation friction. The record suggests otherwise. Because the interventions were mostly narrowly focused, not only did they lack the scope or initial scale necessary to drive needed system changes, but in their sheer volume—so many reforms in so many areas—they led to a reform fatigue that lasts to this day.
It is important to note that the essays identified examples of successful reform that did not involve incremental adjustments. Systemwide efforts as described for Newark and new systems building as seen with charter schools have larger blueprints and therefore greater areas for change.
Incoherent
A third observation is that most of the changes undertaken over the past decades were launched with no consideration for how the reform would interact with the rest of the K–12 system. Changes to piece parts were designed and adopted as autonomous endeavors. This partially explains why many innovations fail to scale effectively.
This does not mean that things were only tried one at a time. Many examples exist of multiple incremental reforms launched simultaneously without an understanding of the interplay between them or with the rest of the equation. Reforms were “bolted on,” one after another, without regard for how they fit together. And each one that was added “diluted” the impact of the others. The resulting lack of coherence often led to unintended consequences that were never even considered, much less planned for.
One important implication of incoherence is a lost opportunity to ensure that stakeholders — especially the ground-level personnel—function with an understanding of the way the system works and how they belong in it; a well-crafted plan of action can provide that. A second implication is that it is difficult to objectively learn from experience, especially from unsuccessful ventures. When the general model is unorganized, it is hard to assign causality, for example, between lack of implementation fidelity of a sound design and a design that does not fit the context it is meant to improve.
Impatient
A separate issue that permeates the essays is the (often unstated) expectation that improvement efforts produce large, demonstrable results almost immediately, without regard to the time the change requires. Changes to organizational culture need to occur rapidly, but other changes take time. Shifts in instructional methods often require more than a single year to stabilize enough to know how well they work. New systems such as new-teacher onboarding can take even longer to reveal their true value and impact.
The expectation of quick results creates multiple harms. It doesn’t give the good parts time to take root or provide the space to iterate toward success. Moreover, it seeds unrealistic expectations about the diligence needed to give new approaches their due. From a political vantage, it gives the doubters and pouters a head start on declaring new reforms a failure. It also contributes to the “carousel,” as one teacher described it: “I don’t have to do anything but wait—in three years there will be something new.”
Compounding the problem, the governance side of the equation needs strong and enduring leadership to be patient with complicated, multifaceted reform efforts and to plan and invest for the long term. Even if the enabling conditions are understood and a proven scaling strategy is in place—such as with charter management organizations—when the reform in question needs ten to twenty years to come to fruition, rapid turnover cycles of education leaders lose important institutional knowledge, and politicians are short on patience (or incentive) to see it through.
All too often, the time needed to see results is longer than the amount of time politicians have in their seats, and it does not line up with the campaign and election cycle. Short-run wins are coveted by political actors seeking to establish a record of success on which to build advancement. The bias toward quick returns and the lack of political will or appetite to invest in long-run solutions have a serious trickle-down effect: (1) a constant churn of reform that does not give space or time to realize success and (2) systems that learn to wait out the current wave of reforms, as “this, too, shall pass.” When the need for improvement is glaring but the actors in legislatures and education agencies prioritize their own short-run interests, we face compound system failure.
Intransigent
The authors carefully identified examples of reforms that produced positive student learning impacts, but many were subject to political interference or failed to perform at scale. Still, the examples show what may be possible. What they do not show is the complementing picture of the myriad reforms that went nowhere and evaporated into history. There is no tally of their number.
But anecdotal reports have consistently told the story of reform churn. Charles Payne’s phrase, “So much reform, so little change,” seems to apply. Instead of forty years of sustained and coherent reform, we have forty short-run reforms that each last three years. School teams are introduced to new practices during the professional development days that accompany the start of school each fall, with short windows of time to prepare for deployment and little implementation support during the year. The school teams learn about impacts indirectly — and often too late to try modifications. Decisions about continuing or terminating the effort usually do not include input from those on the front line. More often than not, new initiatives are quietly abandoned, with the cycle left to repeat itself the following year.
It is notable that, despite this endless churn of reforms and repeated pressure to adapt, the prevailing institutional structure of “SEA, LEA school board, district administration, school leadership, grade/class grouping, teacher” remains largely unchanged. It is possible that the cumulative effect of all the efforts over the years has fostered a resiliency to any improvement efforts: an adaptive state of resistance to change in the system’s core activities. This may help to explain the tendency to shift focus to other facets of students, teachers, or teaching, where the ground may be more fertile for positive experience. There is no way to test this idea empirically, but it fits the pattern of the evidence and explains the abundant cynicism and burnout.
Ineffective
The strongest case for learning from our experience lies in our national trends on student performance. Given the authors’ reports, it is little wonder that, even before the blow to student learning of COVID-19 school closures, the long-run reports noted that US student performance was stagnant or in decline.
Two considerations help to explain our current state. Part of the problem is that, apart from formal pilots, most reforms launch without considering how to learn from them. We are seriously underresourced across the sector in measuring local conditions and reform effectiveness.
In addition, even after forty years, the system has significant internal inconsistency—it lacks a “unified theory” of how reform should be done. This essay collection recounts how many reforms were launched without a sufficient discussion of which level of the system (e.g., state, district, school) might be the most effective to lead the transformation efforts.
Conclusion
We face an even more daunting challenge today, which is that forty years of reform have exhausted everyone involved. The one thing we may have conclusively proven is that the system, as presently constituted, has been resilient to reforms at scale. A modern ANAR report might not fall on deaf ears—the need for school reform is real—but it would fall on ears that are tired of hearing about it.
What is clear is that we have a thin collection of reforms that have been shown to work and that can scale. None of the proven reforms seek to integrate with other proven reforms to concentrate their success. The larger the scale of innovation/reform, the larger the political target it presents for opponents of change.
What we do have is an impressive record of what not to do. We can’t assume that ideas that have been proven effective in one setting will be effective in every setting. We can’t expect changes at the margins (no matter how well they are done) to leverage an entire school model. We can’t impose reforms that ignore how the change affects other parts of the enterprise. We should accept these lessons as a form of learning in itself and perhaps the best final message of this exercise. The six I’s (impulsive, incremental, incoherent, impatient, intransigent, and ineffective) may serve as lodestars by which to assess new proposals and move toward more effective approaches to delivering strong education to our nation’s students.
See the full Hoover Institution initiative: A Nation At Risk +40.