Stimulating Innovation and Evidence
February 2010
After a massive infusion of stimulus money into K-12 technology through the Title IID “Enhancing Education Through Technology” (EETT) grants, known also as “ed-tech” grants, the administration is planning to cut funding for the program in future budgets.
Well, they’re not exactly “cutting” funding for technology, but consolidating the dedicated technology funding stream into a larger enterprise, awkwardly named the “Effective Teaching and Learning for a Complete Education” program. For advocates of educational technology, here’s why this may not be so much a blow as a challenge and an opportunity.
Consider the approach stated in the White House “fact sheet”:
“The Department of Education funds dozens of programs that narrowly limit what states, districts, and schools can do with funds. Some of these programs have little evidence of success, while others are demonstrably failing to improve student achievement. The President’s Budget eliminates six discretionary programs and consolidates 38 K-12 programs into 11 new programs that emphasize using competition to allocate funds, giving communities more choices around activities, and using rigorous evidence to fund what works...Finally, the Budget dedicates funds for the rigorous evaluation of education programs so that we can scale up what works and eliminate what does not.”
From this, technology advocates might worry that policy is being guided by the findings of “no discernable impact” from a number of federally funded technology evaluations (including the evaluation mandated by the EETT legislation itself).
But this is not the case. The White House declares, “The President strongly believes that technology, when used creatively and effectively, can transform education and training in the same way that it has transformed the private sector.”
The administration is not moving away from the use of computers, electronic whiteboards, data systems, Internet connections, web resources, instructional software, and so on in education. Rather, the intention is that these tools be integrated, where appropriate and effective, into all of the other programs.
This does put technology funding on a very different footing. It is no longer in its own category. Where school administrators are considering funding from the “Effective Teaching and Learning for a Complete Education” program, they may place a technology option up against an approach to lower class size, a professional development program, or other innovations that may integrate technologies as a small piece of an overall intervention. Districts would no longer write proposals to EETT to obtain financial support to invest in technology solutions. Technology vendors will increasingly be competing for the attention of school district decision-makers on the basis of the comparative effectiveness of their solution—not just in comparison to other technologies but in comparison to other innovative solutions. The administration has clearly signaled that innovative and effective technologies will be looked upon favorably. It has also signaled that effectiveness is the key criterion.
As an Empirical Education team prepares for a visit to Washington DC for the conference of the Consortium for School Networking and the Software and Information Industry Association’s EdTech Government Forum (we are active members of both organizations), we have to consider our message to the education technology vendors and school system technology advocates. (Coincidentally, we will also be presenting research at the annual conference of the Society for Research on Educational Effectiveness, also held in DC that week.) As a research company we are constrained from taking an advocacy role; in principle we have to maintain that the effectiveness of any intervention is an empirical issue. But we do see the infusion of short-term stimulus funding into educational technology through the EETT program as an opportunity for schools and publishers. Working jointly to gather the evidence from the technologies put in place this year and next will put schools and publishers in a strong position to advocate for continued investment in the technologies that prove effective.
While it may have seemed so in 1993 when the U.S. Department of Education’s Office of Educational Technology was first established, technology can no longer be considered inherently innovative. The proposed federal budget is asking educators and developers to innovate to find effective technology applications. The stimulus package is giving the short-term impetus to get the evidence in place. — DN
Rigor AND Relevance
January 2010
One of the conversations at the Institute of Education Sciences (the federal research agency) in 2010 is about rigor. How do we adhere to strict rules about what is accepted as scientific evidence while making the work sponsored by the agency more relevant to educators, as the director, John Easton, wants to do?
The conflict between rigor and relevance arises for a number of reasons that we will illustrate in this entry. The basic problem arises when rigor is defined in terms of specific methodologies such as randomized experiments or a specific criterion such as a 95% confidence interval. Defining rigor by such procedural rules restricts the body of evidence to a small number of studies and to a narrow range of questions that can be answered with the methods that would be considered acceptable. Our position is not that the education sciences have to become less rigorous in order to become more relevant. Instead, our position is that the concept of scientific rigor is being misunderstood.
Rigor, in ordinary English, is used to suggest rigidly following rules and procedures. However, because blind adherence to procedures is inappropriate in any area of science, the usage within the education sciences needs clarification and realignment. Our suggestion to IES is to focus on the underlying scientific principles rather than the procedures and criteria derived from the principles. Here are some examples.
The standard rules of research require that a positive outcome identified in a study be very unlikely to be an artifact of a particular sample. There is a very important principle behind this that researchers must rigorously understand, and the appropriate statistical calculations must be applied. The rigor, however, is in understanding the trade-off between two mistakes: accepting a false positive result as real, and erroneously rejecting a new program’s positive effect as statistically insignificant when, in fact, there is a real difference. Scientific practice favors protecting against the first kind of mistake and conventionally sets the bar high. But changing the trade-off to favor avoiding the mistake of considering a program ineffective when it is really effective would not constitute less rigor. Faced with a very serious problem, a policy maker may prefer the risk of spending money on something that might not work rather than rejecting a promising program that narrowly missed the conventional threshold for statistical significance.
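To make the trade-off concrete, here is a minimal sketch that assumes a simple two-group comparison analyzed with a two-sided z-test. The effect size (0.25 standard deviations) and sample size (100 per group) are hypothetical values chosen only for illustration, and the function name is ours. The sketch shows that relaxing the threshold for a false positive lowers the risk of dismissing a program that really works.

```python
from statistics import NormalDist


def power_two_sided_z(effect_size, n_per_group, alpha):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardized mean difference (hypothetical value);
    n_per_group is the number of units in each arm.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# A looser alpha accepts more false positives but misses fewer real effects.
for alpha in (0.01, 0.05, 0.10, 0.20):
    power = power_two_sided_z(effect_size=0.25, n_per_group=100, alpha=alpha)
    print(f"alpha={alpha:.2f}  false-negative risk={1 - power:.2f}")
```

Which row of that output is the right one to live with is a policy judgment about the relative cost of the two mistakes, not a question that the statistics can settle on their own.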
Randomization provides another example. The fundamental principle that has to be understood is how results of quantitative studies can be biased by confounding and how controlling for the effects of confounders produces a more accurate estimate of the treatment effect. While randomizing units (e.g., teachers, grade-level teams, schools) into treatment and control groups is recognized as the gold standard for controlling for the effects of potential confounding variables so as to isolate the impact of treatment, rigor is not accomplished by restricting education science to randomized experiments. A relevant study can often benefit from the use of observational data stored in school district information systems. Rigor would then consist of understanding how other designs and statistical controls can be appropriately applied to reduce potential bias (and when statistics can’t fix a bad design). There is nothing rigorous in discarding a dataset outright because it has not been created in a fully controlled experimental setting or because it is not free of measurement error.
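The principle of confounding can be illustrated with a small simulation. This is a sketch under invented assumptions, not an analysis of real district data: students with high prior achievement are both more likely to receive the program and more likely to score well, the true program effect is set to 2 points, and all names and numbers are hypothetical.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=2.0, seed=0):
    """Generate hypothetical observational data with one binary confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_prior = rng.random() < 0.5
        # Assumption: high-achieving students are more likely to get the program.
        p_treat = 0.7 if high_prior else 0.3
        treated = rng.random() < p_treat
        outcome = 5.0 * high_prior + true_effect * treated + rng.gauss(0, 1)
        rows.append((high_prior, treated, outcome))
    return rows


def diff_in_means(rows):
    """Treated mean minus control mean, ignoring the confounder."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def stratified_estimate(rows):
    """Compare within each prior-achievement group, then weight by group size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += diff_in_means(stratum) * len(stratum) / len(rows)
    return total


rows = simulate()
print(f"naive estimate:    {diff_in_means(rows):.2f}")
print(f"adjusted estimate: {stratified_estimate(rows):.2f}")
```

The naive comparison overstates the effect because it credits the program with the advantage the treated students already had. Stratifying recovers something close to the true value here only because the confounder was measured; an unmeasured confounder is exactly the case where statistics can’t fix a bad design.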
While controlling selection bias through experimental designs and statistical adjustments must be understood by education scientists, it is also essential to attend to the context of the study and the range of its generalizability—what we can usefully conclude from the research. The experiment itself may have interfered with usual processes (a situation called ecological invalidity) such as when teacher-level randomization breaks up the existing team teaching within a grade-level team. We need a record of differences in program implementation that shows the relationship between quality of implementation and student performance and also prevents us from mistaking attributes of better-implementing teachers for attributes of the program. While the world of schools can be a messy place to conduct research, taking implementation issues seriously in the study design does not equal less rigor.
Ultimately, it comes down to knowing what we can say to the stakeholders, whether they are educators, publishers, or government agencies. What can be said derives from rigorous application of research principles and, to some extent, calls upon the art of careful audience-sensitive communication. It is not more rigorous to leave out the results of post-hoc explorations. Rigor in education science includes framing the results with appropriate cautions about preliminary findings, limitations on generalization, and results that are interesting and warrant continued tracking or more targeted investigations. Making progress in education science calls for rigor, and rigor includes clear communication and the participation of stakeholders in interpreting results. — DN
New ED Research Agenda Taking Shape
November 2009
We’ve heard administration officials say that the stimulus programs provide a laboratory for ideas that can be built into the ESEA (aka NCLB) reauthorization, as well as into the reauthorization of the Education Sciences Act. So we are studying closely the RFAs, draft RFAs, and other guidance from the US Department of Education for stimulus programs such as Race to the Top (R2T), Investing in Innovation (i3), Enhancing Education Through Technology (EETT), and the State Longitudinal Data Systems (SLDS) looking for clues about the new research agenda. They are not hard to find.
As a general trend, there is no doubt that the new administration is seriously committed to evidence-based policy. Peter Orszag of the White House budget office has recently called for (and made the case for, in his blog) systematic evaluations of federal policies consistent with the president’s promise to “restore science to its rightful place.” But how does this play out with ED?
First, we are seeing a major shift from a static notion of “scientifically based research” (SBR) to a much more dynamic approach to continuous improvement. In NCLB there was constant reference to SBR as a necessary precondition for spending ESEA funds on products, programs, or services. In some cases, it meant that the product’s developers had to have consulted rigorous research. In other cases, it was interpreted as there having to be rigorous research showing the product itself was effective. But in either case, the SBR had to precede the purchase.
Evidence of a more dynamic approach is found in all of the competition-based stimulus programs. Take for example the discussion of “instructional improvement systems.” While this term usually refers to classroom-based systems for formative testing with feedback to the teacher allowing differentiation of instruction, it is used in a broader sense in the current RFAs and guidance documents. The definition provided in the R2T draft RFA reads as follows (bullets and highlights added for clarity):
“Instructional improvement systems means technology-based tools and other strategies that provide
- teachers,
- principals,
- and administrators
with meaningful support and actionable data to systemically manage continuous instructional improvement, including activities such as:
- instructional planning;
- gathering information (e.g., through formative assessments (as defined in this notice), interim assessments (as defined in this notice), summative assessments, and looking at student work and other student data);
- analyzing information with the support of rapid-time (as defined in this notice) reporting;
- using this information to inform decisions on appropriate next steps;
- and evaluating the effectiveness of the actions taken.”
It is important to notice, first of all, that tools are provided to administrators, not just to teachers. Moreover, the final activity in the cycle is to evaluate the effectiveness of the actions. (Joanne Weiss, who heads up the R2T program, uses the same language inclusive of effectiveness evaluation by district administrators in a recent speech).
We have pointed out in a previous entry that the same cycle of needs analysis, action, and evaluation that works for teachers in the classroom also works for district-level administrators. The same assessments that help teachers differentiate instruction can, in many cases, be aggregated up to the school and district level where broader actions, programs, and policies can be implemented and evaluated based on initial identification of the needs. An important difference exists between these parallel activities at the classroom and central office level. At the district level, where larger datasets extend over a longer period, evaluation design and statistical analysis are called for. In fact this level of information calls for scientifically based research.
Research is now viewed as integral to the cycle of continuous improvement. Research may be carried out by the district’s or state’s own research department or data may be made available to outside researchers as called for in the SLDS and other RFAs. The fundamental difference now is that the research conducted and published before federal funds are used is not the only relevant research. Of course, ED strongly prefers (and at the highest level of funding in i3 requires) that programs have prior evidence. But now the further gathering of evidence is required both in the sense of a separate evaluation and in the sense that funding is to be put toward continuous improvement systems that build research into the innovation itself.
Our recent news item about the i3 program takes note of other important ideas about the research agenda we can expect to influence the reauthorization of ESEA. It is worth noting that the methods called for in i3 are also those most appropriate and practical for local district evaluations of programs. We welcome this new perspective on research considered as a part of the cycle of continuous instructional improvement. — DN
Research as Innovation
September 2009
Many of us heard Jim Shelton, the ED Assistant Deputy Secretary for Innovation and Improvement, speak to the education publishing industry last week about the $650 million fund now called “Investing in Innovation” (i3). Through i3, Shelton wants to fund the scaling up of innovations having some evidence that they’re worth investing in. These i3 grants could be as large as $50 million.
With that amount at stake, it makes sense for government funders to look for some track record of scientifically documented success. The frequent references in ED documents to processes of “continuous improvement” as part of innovations suggest that proposers would do well to supplement the limited evidence for their innovation by showing how scientific evidence can be generated as an ongoing part of a funded project, that is, how in-course corrections and improvements can be made to the innovation as it is being put into place in a school system.
In his speech to the education industry, Shelton complained about the low quality of the evidence currently being put forward. Although some publishers have taken the initiative and done serious tests of their products, there has never been a strong push for them to produce evidence of effectiveness.
School systems usually haven’t demanded such evidence, partly because there are often more salient decision criteria and partly because little qualified evidence exists, even for programs that are effective. Moreover, district decision makers may find studies of a product conducted in schools that are different from their schools to have marginal relevance, regardless of how “rigorously” the studies were conducted.
The ED appears to recognize that it will be counter-productive for grant programs like i3 to depend entirely on the pre-existing scientific evidence. An alternative research model based on continuous improvement may help states and districts to succeed with their i3 proposals—and with their projects, once funded.
Now that improved state and district data systems are increasing the ability of school systems to quickly reference several years of data on students and teachers, i3 can start looking at how rigorous research is built into the innovations it funds, not just at the one-time evaluation typically built into federal grant proposals.
This kind of research for continuous improvement is an innovation in itself—an innovation that may start with the “data-driven decision making” mode in which data are explored to identify an area of weakness or a worrisome trend. But the real innovation in research would consist of states and districts building their own capacity to evaluate whether the intervention they decided to implement actually strengthened the area of weakness or arrested the worrisome trend they have identified and chosen to address. Perhaps it did so for some schools but not others, or maybe it caught on with some teachers but not with all. The ability of educators to look at this progress in relation to the initial goals completes the cycle of continuous improvement and sets the stage for refocusing, tweaking, or fully redesigning the intervention under study.
We predict that i3 reviewers, rather than depending solely on strong existing evidence, will look for proposals that also include a plan for continuous improvement that can be a part of how the innovation assures its success. In this model, research need not be limited to the activity of an “external evaluator” that absorbs 10% of the grant. Instead, routine use of research processes can be an innovation that builds the internal capacity of states and districts for continuous improvement. — DN
Easton Sets a New Agenda for IES
August 2009
John Easton, now officially confirmed as the director of the Institute of Education Sciences, gave a brief talk July 24th to explain his agenda to the directors and staff of the Regional Education Labs. This is a particularly pivotal time, not only because the Obama administration is setting an aggressive direction for changes in the K-12 schools, but also because Easton is starting his six-year term just as IES is preparing its RFP for the re-competition for the 10 RELs. (The budget for the RELs accounts for about 11% of the approximately $600 million IES budget.)
Easton made five points.
First, he is not retreating from the methodological rigor that was the hallmark of his predecessor, Russ Whitehurst. This simply means that IES will not be funding poorly designed research that does not have the proper controls to support conclusions the researcher wants to assert. Randomized control is still the strongest design for effectiveness studies, although weaker designs are recognized as having value.
Second, there has to be more emphasis on relevance and usability for practitioners. IES can’t ignore how decisions are made and what kind of evidence can usefully inform them. He sees this as requiring a new focus on school systems as learning organizations. This becomes a topic for research and development.
Third, although randomized experiments will still be conducted, there needs to be a stronger tie to what is then done with the findings. In a research and development process, rigorous evaluation should be built in from the start and should relate more specifically to the needs of the practitioners who are part of the R&D process.
Fourth, IES will move away from the top-down dissemination model in which researchers seem to complete a study and then throw the findings over the wall to practitioners. Instead, researchers should engage practitioners in the use of evidence, understanding that the value of research findings comes in their application, not simply in being released or published. IES will take on the role of facilitating the use of evidence.
Fifth, IES will take on a stronger role in building capacity to conduct research at the local level and within state education agencies. There’s a huge opportunity presented by the investment (also through IES) in state longitudinal data systems. The combination of state systems and the local district systems makes gathering the data to answer policy questions and questions about program effectiveness much easier. The education agencies, however, often need help in framing their questions, applying an appropriate design, and deploying the necessary and appropriate statistics to turn the data into evidence.
These five points form a coherent picture of a research agency that will work more closely through all phases of the research process with practitioners, who will be engaged in developing the findings and putting them into practice. This suggests new roles for the Regional Education Labs in helping their constituencies to answer questions pertinent to their local needs, in engaging them more deeply in using the evidence found, and in building their local capacity to answer their own questions. The quality of work will be maintained, and the usability and local relevance will be greatly increased. — DN
The Problem with National Experiments
June 2009
We welcome the statement of the director of the Office of Management and Budget (OMB), Peter R. Orszag, issued as a blog entry, calling for the use of evidence.
“I am trying to put much more emphasis on evidence-based policy decisions here at OMB. Wherever possible, we should design new initiatives to build rigorous data about what works and then act on evidence that emerges — expanding the approaches that work best, fine-tuning the ones that get mixed results, and shutting down those that are failing.”
This suggests a continuous process of improving programs based on evaluations built into the fabric of program implementations, which sounds very valuable. Our concern, however, at least in the domain of education, is that Congress or the Department of Education will contract for a national experiment to prove a program or policy effective. In contrast, we advocate a more localized and distributed approach based on the argument Donald Campbell made in the early 70s in his classic paper “The Experimenting Society” (updated in 1988). He observes that “the U.S. Congress is apt to mandate an immediate, nationwide evaluation of a new program to be done by a single evaluator, once and for all, subsequent implementations to go without evaluation.” Instead, he describes a “contagious cross-validation model for local programs” and recommends a much more distributed approach that would “support adoptions that included locally designed cross-validating evaluations, including funds for appropriate comparison groups not receiving the treatment.” Using such a model, he predicts that “After five years we might have 100 locally interpretable experiments.” (p.303)
Dr. Orszag’s adoption of the “top tier” language from the Coalition for Evidence-Based Policy is buying into the idea that an educational program can be proven effective in a single large-scale randomized experiment. There are several weaknesses in this approach.
First, the education domain is extremely diverse and, without the “100 locally interpretable experiments,” it is unlikely that educators would have an opportunity to see a program at work in a sufficient number of contexts to begin to build up generalizations. Moreover, as local educators and program developers improve their programs, additional rounds of testing are called for (and even the “top tier” programs should engage in continuous improvement).
Second, the information value of local experiments is much higher for the decision-maker who will always be concerned with performance in his or her school or district. National experiments generate average impact estimates, while giving little information about any particular locale. Because concern with achievement gaps between specific populations differs across communities, it follows that, in a local experiment, reducing a specific gap—not the overall average effect—may well be the effect of primary interest.
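A small simulation can illustrate why an average impact says little about any one locale. The sketch below is purely hypothetical: it assumes 100 local experiments whose true effects vary from place to place around an average of 0.10 standard deviations, with 200 students per arm in each. None of these numbers comes from an actual study.

```python
import random
from statistics import mean


def run_local_experiments(n_sites=100, n_per_arm=200, seed=1):
    """Simulate hypothetical local experiments with site-specific true effects."""
    rng = random.Random(seed)
    results = []
    for site in range(n_sites):
        # Assumption: true effects differ by locale around an average of 0.10 SD.
        true_effect = rng.gauss(0.10, 0.20)
        treated = [rng.gauss(true_effect, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0.0, 1) for _ in range(n_per_arm)]
        results.append((site, true_effect, mean(treated) - mean(control)))
    return results


results = run_local_experiments()
estimates = [est for _, _, est in results]
print(f"average estimated effect: {mean(estimates):.2f}")
print(f"range across sites: {min(estimates):.2f} to {max(estimates):.2f}")
print(f"sites with a negative estimate: {sum(e < 0 for e in estimates)}")
```

Under these assumptions the single average is modest and positive, while individual sites range from clearly helped to apparently harmed. Part of that spread is sampling noise, but it is the site-level pattern, not the average, that a local decision-maker needs to see.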
Third, local experiments are vastly less expensive than nationally contracted experiments, even while obtaining comparable statistical power. Local experiments can easily be one-tenth the cost of national experiments, so conducting 100 of them is quite feasible. (We say more about the reasons for the cost differential in a separate policy brief.) Better yet, local experiments can be completed in a more timely manner; it need not take five years to accumulate a wealth of evidence. Ironically, one factor making national experiments expensive, as well as slow, is the review process required by OMB!
So while we applaud Dr. Orszag’s leadership in promoting evidence-based policy decisions, we will continue to be interested in how this impacts state and local agencies. We hope that, instead of contracting for national experiments, the OMB and other federal agencies can help state and local agencies to build evaluation for continuous improvement into the implementation of federally funded programs. If nothing else, it helps to have OMB publicly making evidence-based decisions. —DN
Campbell, D. T. (1988). The experimenting society. In E. S. Overman (Ed.), Methodology and epistemology for social science: Selected papers (p. 303). Chicago: University of Chicago Press.
It’s Not the Money, It’s What You Spend It On
June 2009
Our neighbor from the Hoover Institution, Eric (Rick) Hanushek, who also currently chairs the National Board for Education Sciences, has just published a very interesting book (with co-author Alfred Lindseth) on the financing of schools [1]. It provides a very readable narrative of the last couple of decades’ court decisions about how much money it should take to provide an equitable and adequate K-12 education. The authors’ basic thesis is that the amounts of money schools spend are generally unrelated to increases in achievement, unless one considers what the money is spent on. Clearly, if spending were focused on policies and programs that lead to achievement gains and to decreases in the achievement gaps between populations, things would improve. But court-ordered increases in education spending have seldom used credible estimates of likely impact of various programs, even though the programs' costs were used in calculating how much an equitable or adequate education will cost. The authors document in fascinating detail the irrationality of the process of producing these cost estimates.
Hanushek and Lindseth propose that, where administrators and teachers are accountable and rewarded for results, they will consider the trade-offs in efficiency of spending money one way or another. For example, smaller class size may lead to better results but, if the same money were spent to increase teacher quality, the results may be much more substantial. This proposal, of course, depends on there being sufficient evidence that various programs, policies, or approaches have a measurable impact. And they further acknowledge that getting this information is not a matter of running one-time experimental evaluations. The wide variation of populations, resources, and standards in US school systems means that a large number of smaller scale evaluations are called for. If states and school districts were to get into the habit of routinely pilot testing programs locally (and collecting and analyzing the data systematically) before scaling up within the district or state, the gains in efficiency could be substantial.
Hanushek and Lindseth do not address the question of how local evaluations of sufficient quality and quantity can be paid for. If one depends solely on the Institute of Education Sciences for grants and contracts, the process will be slow and the resources inadequate. Setting aside a certain percentage of federal grants to states and districts for evaluations is often unproductive because the evaluations are not designed or timed to provide feedback for continuous improvement. Too often, educators and administrators treat the evaluation as a requirement that takes money from the program. We have argued elsewhere that integrating research into program implementation at the local level calls for building local school district capacity for rigorous evaluations. It also calls for a reform agenda that changes how decisions are connected both to explorations of district data and to locally generated evidence as to whether programs and policies are having the desired impact. This is different from contracting with the evaluator once a program is underway because the plan for the evaluation is part of the plan for implementation. Directing a good portion of the program funds to a process of continuous improvement will make the program more efficient and provide educators with the hundreds of studies that will begin to accumulate the kind of evidence that they need to make a rational choice about what programs are worth trying out in their own locale.
Educators, especially those who spend their days engaged with children in a classroom, may find the rational economic model on which the authors’ proposals are based unsatisfying and perhaps simplistic. Most people don’t go into education because they are maximizing their economic return. Nonetheless, it is hard to find a rationale for retaining teachers who are demonstrably ineffective beyond the traditional practice of union solidarity that militates against differentiation of skills among its members. The authors' arguments are thought provoking in that they demonstrate in rich narrative detail the obvious irrationality of considering only the amount of money put into schools and not considering the effectiveness of the programs, policies, and approaches that the money is spent on. —DN
1 Hanushek, E.A. & Lindseth, A.A. (2009). Schoolhouses, courthouses and statehouses: Solving the funding-achievement puzzle in America’s schools. Princeton NJ: Princeton University Press.
Compliance Anxiety
May 2009
Stimulus funds are beginning to flow. But not as quickly as needed to provide a boost to the economy. One source of hesitation might be called “compliance anxiety.” People in school systems know that the Department of Education is looking for bold innovations and progress toward lasting reforms of the schools (see, for example, the recently published suggestions), but are not sure exactly what is going to be asked of them in terms of accounting for the funds they spend. The third guiding principle of ARRA calls for K-12 districts to “ensure transparency, reporting, and accountability.” This is meant to prevent fraud and abuse, to support the most effective uses of ARRA funds, and to accurately measure and track results.
Over the past few weeks, in webinars and similar venues, educators have been asking what this means. Many are hesitant to commit funds without knowing what evidence of compliance will be called for. The following quotes were compiled by Jennifer House, Ph.D., Founder of Redrock Reports:
Superintendent of a large suburban district: “We just need to know what kind of data needs to be collected for the accountability portion of ARRA—especially funds in the State Fiscal Stabilization Fund.”
Superintendent of an urban district: “When is the Department of Education going to tell us what data they need for the accountability and reporting requirements of ARRA?”
Title I director of a major urban district: “I know what I need to do for Title I reporting. Is there any other data I need to collect to report on the use and impact of the ARRA funds?”
IDEA director of a large suburban district: “What other data is needed about ARRA funds?”
Paraphrase of seven questions from a single MDR webinar: “When will we hear what the accountability requirements are for ARRA?”
CIO of a large suburban district: “We need to accommodate the data that needs to be collected in our system for ARRA. When do we get the word?”
These educators need to know what is meant by “accurately measure and track results.” Will this information just be used to audit who was paid for what? Or will ED be calling for a measure of results in terms of impact on schools, teachers, and student achievement?
State Education Agencies are asked for “baseline data that demonstrates the State’s current status in each of the four education reform areas.” Will the states and districts be asked for subsequent data showing an improvement over baseline?
Educators have heard that, in the near future, ED will describe specific data metrics that states will use to make transparent their status in the four education reform areas for the purpose of “showing how schools are performing and helping schools improve.” They expect that this will not be a one-time data collection; instead, they expect an element of tracking to help them with continuous improvement.
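To make the idea of tracking against a baseline concrete, the following is a minimal Python sketch of one way a district might look at a short series of yearly results. It is an illustration only, not anything ED has specified: the years, the percent-proficient figures, and the names `fit_line`, `baseline_rates`, and `post_rates` are all hypothetical. The sketch fits a straight line to the baseline years, projects that trend forward, and reports how far the later observations depart from the projection.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


# Hypothetical baseline: percent of students proficient before the funds arrive.
baseline_years = [2005, 2006, 2007, 2008]
baseline_rates = [60.0, 61.0, 62.0, 63.0]

# Hypothetical subsequent data collected after the funds are spent.
post_years = [2009, 2010]
post_rates = [66.0, 68.0]

# Project the baseline trend into the later years.
slope, intercept = fit_line(baseline_years, baseline_rates)
projected = [slope * year + intercept for year in post_years]

# Departure of each later observation from the projected baseline trend.
departures = [obs - proj for obs, proj in zip(post_rates, projected)]

for year, obs, proj, dep in zip(post_years, post_rates, projected, departures):
    print(f"{year}: observed {obs:.1f}, projected {proj:.1f}, departure {dep:+.1f}")
```

A departure from the projected trend is only descriptive. Without a comparison group it cannot rule out other explanations for the change, but it does show why a single baseline snapshot is not enough to track results.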
ED has a one-time opportunity to move education toward an evidence-based enterprise on a massive scale by calling for evidence of outcomes—not just the starting baseline. Conditions are ripe for quickly and easily promoting a major reform in how districts measure their own results. Educators already expect this. A simple time series design is all that is needed. Training and support for this can be readily supplied through existing IES funding mechanisms. —DN
The Advantages of Research on Local Problems
April 2009
The nomination of John Q. Easton as the new head of IES highlights a debate that has been going on for quite a long time. As Donald Campbell noted in the early 70s in his classic paper “The Experimenting Society” (updated in 1988), “The U.S. Congress is apt to mandate an immediate, nationwide evaluation of a new program to be done by a single evaluator, once and for all, subsequent implementations to go without evaluation.” In contrast, he describes a “contagious cross-validation model for local programs” and recommends a much more distributed approach that would “support adoptions that included locally designed cross-validating evaluations, including funds for appropriate comparison groups not receiving the treatment.” Using such a model, he predicts that “After five years we might have 100 locally interpretable experiments.” (p.303) The work of the Consortium on Chicago School Research, which Easton has led, has a local focus on Chicago schools consistent with the idea that experiments should be locally interpretable. Elsewhere, we have argued that local experiments can also be vastly less expensive; thus having 100 of them is quite feasible. These experiments also can be completed in a more timely manner—it need not take five years to accumulate a wealth of evidence. We welcome a change in orientation at IES from organizing single large national experiments to the more useful, efficient, and practical model of supporting many local rigorous experiments. —DN
Campbell, D. T. (1988). The Experimenting Society. In E. S. Overman (Ed.), Methodology and epistemology for social science: Selected Papers. (pp. 303). Chicago: University of Chicago Press.
Stimulating Times!
March 2009
Very soon, an additional $80 billion or so will be flowing into K-12 schools. Spending it in a way that will improve education will be a challenge. While the majority of the funds are intended to ward off teacher layoffs or to supplement existing funding streams, and will therefore keep things mostly as they are, other large chunks of money are aimed at promoting change. It is hard to keep the zeros straight, but it appears that $650 million goes to education technology; $250 million to state data systems; $200 million to teacher incentive projects; and billions to what Secretary Duncan is calling a “Race to the Top” fund. Of course, in the current context, a mere billion dollars seems like small change. But it is enormous, considering that it has to be spent quickly, and there is little precedent for how to do the spending.
Mike Smith, a senior advisor to Secretary Duncan, recently outlined the goals of the stimulus plan this way: (1) get the money out fast; (2) create jobs; and (3) stimulate reform. While some hope that this funding amount will constitute a new default level, we are being warned to expect a “cliff” when the one-time funding is exhausted. Thus a wise use of this money would be for new jobs that have the effect of creating something that won’t have to be paid for at the same level on an ongoing basis. Repairing a school building illustrates the idea. The work can start quickly and generate short-term jobs and, once completed, the repaired building will be around for a long time before another infusion of repair money is needed.
What is the analogy in the domain of education reform? If a school system uses stimulus money to purchase services with a fixed annual cost, the service may have to be discontinued when the money is spent. On the other hand, if the school system purchases two years of professional development, then teachers may be able to carry their new practices forward without continuing operational costs. Similarly, investing in data systems and building school district capacity for analyzing local data and for evaluating programs could pay off in sustained systemic improvements in district decision processes.
The stimulus package does contain funds for data systems, an investment that will pay off beyond the initial implementation. By themselves, data systems are technical capacities, not reforms; alone they do not change how educators can generate useful evidence that can improve instruction. With the stimulus funding, we now have an opportunity to put in place local district capacities for data use, not just at the classroom level but also at the level of district and state policies and strategies.
To harness the stimulus funds for reforms involving data systems, we need to look at the instances in the stimulus package where data would actually be used. This leads us to the places calling for evaluations. For example, the $650 million in technology grants will flow through the mechanism of Title II D, which includes the suggestion to use funds to “support the rigorous evaluation of programs funded under this part, particularly regarding the impact of such programs on student academic achievement…” Thus these funds may be used to enhance the use of data systems to conduct local evaluations of the technologies, thereby building the capacity of districts to generate useful evidence.
In a section on teacher incentive programs, the stimulus bill specifically calls for “a rigorous national evaluation.” We don’t think this should be interpreted as a single large national evaluation. In our view, a large number of local evaluations would be a more productive use of the funds, as long as each is rigorous. Distributing the available funds to the local districts implementing the incentive programs would also stimulate district hiring (or the retention of district staff who specialize in evaluations).
We should be viewing data use and evidence generation in districts as an important part of the reform agenda and not just as an isolated technical problem of data warehousing. By distributing program evaluation to districts, we can fulfill the goals of creating (or retaining) jobs while stimulating reform that will endure beyond the two years when the extraordinary funds are available. —DN
Education Week Reports that ‘Scientifically Based’ is Giving Way to ‘Development’ and ‘Innovation’
February 2009
The headline in the January 28 issue of Education Week suggests that the pendulum is swinging from obtaining rigorous scientific evidence to providing greater freedom for development and innovation.
Is there reason to believe this is more than a war of catch phrases? Does supporting innovative approaches take resources away from “scientifically based research”? A January 30 interview with Arne Duncan, our new Secretary of Education, by CNN’s Campbell Brown is revealing. She asked him about the innovative program in Chicago that pays students for better grades. Here is how the conversation went:
Duncan: ...in every other profession we recognize, reward and incent excellence. I think we need to do more of that in education.
CNN: For the students specifically, you think money is the way to do that? It’s the best incentive?
Duncan: I don’t think it is the best incentive; I think it's one incentive. This is a pilot program we started this fall so it’s very early on. But so far the data is very encouraging—so far the students’ attendance rates have gone up, students’ grades have gone up, and these are communities where the drop out rate has been unacceptably high and whatever we can do to challenge that status quo. When children drop out today, Campbell as you know, they are basically condemned to social failure. There are no good jobs out there so we need to be creative; we need to push the envelope. I don’t know if this is the right answer. We’ve got a control group.
CNN: But is it something that you would like to try across the country, to have other schools systems adopt?
Duncan: Again, Campbell, this is...we are about four months into it in Chicago. We have a control group where this is not going on, so we’re going to follow what the data tells us. And if it’s successful, we’ll look to expand it. If it’s not successful, we’ll stop doing it. We want to be thoughtful but I think philosophically I am pro pushing the envelope, challenging the status quo, and thinking outside the box...
Notice that he is calling for innovation: “pushing the envelope, challenging the status quo, and thinking outside the box.” But he is not divorcing innovation from rigorously controlled effectiveness research. He is also looking at preliminary findings of changes such as attendance rates that can be detected early. The “scientifically based” research is built into the innovation’s implementation at the earliest stage. And it is used as a basis for decisions about expansion.
While there may be reason to increase funding for development, we can’t divorce rigorous research from development; nor should we consider experimental evaluations as activities that kick in after an innovation has been fielded. We are skeptical that there is a real conflict between scientific research and innovation. The basic problem for research and development in education is not too much attention to rigorous research, but too few resources overall going into education. The graphic included in the Ed Week story makes clear that the R&D investment in education is minuscule compared to R&D for the military, energy, and health sectors (health gets 100 times as much as education, whereas the category “other” gets 16 times as much).
The Department of Defense, of course, gets a large piece of the pie, and we often see the Defense Advanced Research Projects Agency (DARPA) held up as an example of a federal agency devoted to addressing innovative, and often futuristic, requirements. The Internet started as one such engineering project, the ARPANET. In this case, researchers needed to access information flexibly from computers situated around the country, and the innovative distributed approach turned out to be massively scalable and robust. Although we don’t always see that research is built in as a continuous part of engineering development, every step was in fact an experiment. While testing an engineering concept may take a few minutes of data collection, in education, the testing can be more cumbersome. Cognitive development and learning take months or years to generate measurable gains, and experiments need careful ways to eliminate confounders that often don’t trouble engineering projects. Education studies are also not as amenable to clever technical solutions (although the vision of something as big as the Internet coming in to disrupt and reconfigure education is tantalizing).
It is always appealing to see the pendulum swing with a changing of the guard. In the current transition, what is coming about looks more like a synthesis in which research and development are no longer treated as separate, and certainly not seen as competitors. —DN
Role of Technology as Infrastructure for Schools
January 2009
We are excited and optimistic about the New Year. It will be a time of great challenges as well as critical transitions and important debates about the future of education in this country. The emerging proposal for a massive stimulus package gives reason both for optimism and caution. Thus far the package has included repairing school buildings, improving their broadband connections, and bringing in more technology.
The Consortium for School Networking (CoSN), of which Empirical Education is a member and longtime supporter, advocates for technology for schools. In an article entitled “Why Obama Can’t Ignore Ed Tech”, Jim Goodnight, founder and CEO of SAS, and Keith Krueger, CEO of CoSN, argue for investing in education technology as a way to support “21st century learning” while creating jobs in the technology and telecommunications sectors. They also suggest that the investment will lead school districts to hire staff members specializing in technical and technology curricula, a function they note as currently being “vastly understaffed.”
As a research organization, we have to maintain a cautious attitude about claims, such as those in the CoSN article, that technology products will reduce discipline problems and dropout rates generally. We do agree that an investment in school technology will call for increased staffing—that is, creating jobs—which is the primary goal of the stimulus package.
But we believe there is a better argument for an investment in technical infrastructure. Network and data warehouse technologies inherently provide the mechanisms for measuring whether the investments are making a difference. Combined with online formative testing, automatic generation of usage data, and analytic tools, these technologies will put schools in a position to keep technology accountable for promised results. Using technology as a tool for tracking results of the stimulus package will, of course, create jobs. It will call for the creation of additional positions for data coaches, data analysts, trainers, and staff to handle the test administration, data cleaning, and communication functions.
The fear that a stimulus package will just throw money at the problem is justified. Yes, it will provide jobs and benefits to certain industries in the short term, whereas any lasting improvement may be elusive. While building a new bridge employs construction workers and the lasting benefit can be measured, for example, by improved traffic, the lasting benefits of school technology may seem more subtle. We would argue, to the contrary, that a technology infrastructure for schools contains its own mechanism for accountability. The argument for school technology should drive home the notion that schools can be capable of determining whether the stimulus investment is having an impact on learning, discipline, graduation rates, and other measurable outcomes. Policy makers will not have to depend on promises of new forms of learning when they can put in place a technology infrastructure that provides school decision-makers with the information about whether the investment is making a difference. —DN
Focusing Evaluations on Achievement Gaps
December 2008
The standard design for experimental program evaluations in educational settings may not be doing justice to the questions that matter most to district decision makers. In many sites where we have worked, the most important question had to do with a gap between two populations within the district. For example, one district’s improvement plan specifically targeted the gap in science achievement between black students and white students. In another, there was a specific concern with the performance of new, and often uncertified, teachers compared to experienced teachers. NCLB, with its requirement for disaggregating the performance of specific subgroups, has reinforced this perspective. A new science curriculum that has a modest positive impact on performance across the district could be rejected if it had the effect of increasing the gap between the two populations of concern.
When a new program favors one kind of student or teacher over another, we call it an interaction, that is, an interaction between the experimental “treatment” and some pre-existing “trait” of the population involved. In experimental design, we call these characteristics of the people or the setting moderators because they are seen as moderating the impact of the new program. Moderators are often considered secondary or even exploratory outcomes in experimental program evaluations, which are designed primarily to find out whether the new program makes an overall difference for the study population as a whole. Who gets and doesn’t get the program can be manipulated experimentally. By contrast, the moderator is a pre-existing characteristic that (usually) can’t be manipulated. While the experiment focuses on a specific program (treatment), any number of moderators can be examined after the fact.
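As a rough illustration of what an interaction between a treatment and a moderator looks like in numbers, here is a minimal Python sketch. The records, the subgroup labels "A" and "B", and the function names `mean_score` and `impact` are hypothetical and chosen only to make the arithmetic easy to follow; a real evaluation would also need a statistical test and a design that accounts for how students are grouped into classrooms and schools.

```python
from statistics import mean

# Hypothetical records: (test score, subgroup label, received the program?)
records = [
    (70, "A", True), (74, "A", True), (60, "A", False), (64, "A", False),
    (80, "B", True), (82, "B", True), (78, "B", False), (80, "B", False),
]


def mean_score(group, treated):
    """Mean score for one subgroup in one condition (program or comparison)."""
    return mean(s for s, g, t in records if g == group and t == treated)


def impact(group):
    """Program-minus-comparison difference in mean score within one subgroup."""
    return mean_score(group, True) - mean_score(group, False)


impact_a = impact("A")  # program impact for subgroup A
impact_b = impact("B")  # program impact for subgroup B

# The interaction is the difference between the two subgroup impacts.
interaction = impact_a - impact_b

# The same quantity, seen as the change in the achievement gap (B minus A)
# between the comparison condition and the program condition.
gap_control = mean_score("B", False) - mean_score("A", False)
gap_treated = mean_score("B", True) - mean_score("A", True)

print(f"impact for A: {impact_a}, impact for B: {impact_b}")
print(f"interaction: {interaction}")
print(f"gap without program: {gap_control}, gap with program: {gap_treated}")
```

In this made-up data the program helps subgroup A more than subgroup B, so the gap between the groups is smaller with the program than without it. The size of that narrowing is exactly the interaction.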
Many of our experiments in school systems are aimed at answering a question of local interest. In this case, we often find that the most important question concerns an interaction rather than the average impact of the experimental intervention itself. The potential moderator of interest, such as minority status, under-achievement, or certification, can be specified in advance, based on the identified gap in performance the new program was intended to address in the first place. When the interaction is the primary outcome of interest, its status goes beyond even the emphasis that many experts put on interactions as a means for getting a fuller picture of the effectiveness of an intervention (Cook, 2002; Shadish, Cook, & Campbell, 2002). But because investigations of interactions are usually exploratory and not the primary question (except perhaps for the specific setting in which the experiment took place), it is difficult to look across studies of the same intervention to come to any generalization about the moderating effects of certain variables. Research reviews that synthesize multiple studies of the same intervention, such as those found on the What Works Clearinghouse and the Best Evidence Encyclopedia, are not concerned with interactions, even if an individual study finds one to be quite substantial. This is unfortunate because, in many studies that find no overall impact for a program, we may discover that it is differentially effective for an important subgroup. It would therefore be useful, for example, to examine whether the moderating effect of a certain variable varies more than is expected by chance across experimental settings. This would indicate whether the moderating effect is robust or whether it depends on local circumstances.
This situation points to the importance of conducting local program evaluations that can focus on the achievement gap of greatest concern. Fortunately, recent theoretical work by Howard Bloom (Bloom, 2005) of MDRC provides an indication that statistical power for detecting differences among subgroups of students in the impact of an intervention (that is, the interaction) can be larger than for detecting a net impact of the same size for that program. This means that a local experiment primarily interested in an interaction can be smaller, and less expensive, than a traditional experiment looking for an overall average effect. The need for information about gaps, as well as the possible greater efficiency of studying gaps, provides support for a strategy of conducting relatively small experiments to answer questions of local interest to a school district (Newman, 2008). Small, and less expensive, experimental program evaluations focused on moderating effects can provide more valuable information to decision makers than large-scale experiments intended for broad generalization, which cannot provide useful evidence for all interactions of interest to schools.
Empirical Education is now engaged in research to empirically verify Bloom’s observation about statistical power; we expect to be reporting the results next spring. —DN
Bloom, H. S. (2005). Randomizing groups to evaluate place-based programs. In H. S. Bloom (Ed.), Learning More From Social Experiments. New York, NY: Sage.
Cook, T. D. (2002). Randomized experiments in educational policy research: A critical examination of the reasons the education evaluation community has offered for not doing them. Educational Evaluation and Policy Analysis, 24, 175-199.
Newman, D. (2008). Toward School Districts Conducting Their Own Rigorous Program Evaluations: Final Report on the “Low Cost Experiments to Support Local School District Decisions” Project. Empirical Education Research Reports. Palo Alto, CA: Empirical Education Inc.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton Mifflin.
Climate Change: Innovation
November 2008
Congratulations to Barack Obama on his sweeping victory. We can expect a change of policy climate with a new administration bringing new players and new policy ideas to the table. The appointment of a new director of the Institute of Education Sciences will provide an early opportunity to set direction for research and development. Reauthorization of NCLB and related legislation — including negotiating the definition and usage of “scientific research” — will be another, although pundit consensus was that this change will take two more years, given the urgency of fixing the economy and resolving the war in Iraq. But already change is in the air with proposals for dramatic shifts in priorities. Here we raise a question about the big new idea that is getting a lot of play: innovation.
The educational innovation being called for includes funding for research and development (R&D, with a capital D for a focus on new ideas), acquisition of school technology, and funding for dissemination of new charter school models. The Brookings Institution recently published a policy paper, Changing the Game: The Federal Role in Supporting 21st Century Educational Innovation, by Sara Mead and Andy Rotherham. The paper imagines a new part of the US Department of Education called the Office of Educational Entrepreneurship and Innovation (OEEI) that would be charged with the job of implementing “a game-changing strategy [that] requires the federal government to make new types of investments, form new partnerships with philanthropy and the nonprofit sector, and act in new ways to support the growth of entrepreneurship and innovation within the public education system” (p34). The authors see this as complementary to standards-based reform, which is yielding diminishing returns. “To reach the lofty goals that standards-based reform has set, we need more than just pressure. We need new models of organizing schooling and new tools to support student learning that are dramatically more effective or efficient than what schools [are] doing today” (p35).
As an entrepreneurial education business, we applaud the idea behind the envisioned OEEI. The question for us arises when we think about how OEEI would know whether a game-changing model is “dramatically more effective or efficient.” How will the OEEI decide which entrepreneurs should receive or continue to receive funds? Although the authors call for a “relentless focus on results,” they do not say how results would be measured. The venture capital (VC) model bases success on return on investment. Many VC investments fail but, if a good percentage succeeds, the overall monetary return to the VC is positive. While venture philanthropies often work the same way, the profits go back into supporting more entrepreneurs instead of back to the investors. Scaling up profitably is a sufficient sign of success. Perhaps we can assume that parents, communities, and school systems would not choose to adopt new products if they were ineffective or inefficient. If this were true, then scaling up would be an indirect indication of educational effectiveness. Will positive results for innovations in the marketplace be sufficient, or should there perhaps be a role for research to determine their effectiveness?
The authors suggest a $300 million per year “Grow What Works” fund of which less than 5% would be set aside for “rigorous independent evaluations of the results achieved by the entrepreneurs” (p48). Similarly, their suggestion for a program like the Defense Advanced Research Projects Agency (DARPA) would allow only up to 10%. Budgeting research at this level is unlikely to have much influence over what is likely to be an overwhelming imperative for market success. Moreover, what will be the role of independent evaluations if they fail to show the innovation to be dramatically more effective or efficient? Funding research as a set-aside from a funded program is always an uphill battle because it appears to take money away from the core activity. So let’s be innovative and call this R&D with the intention of empowering both the R and the D. Rather than offer a token concession to the research community, build ongoing formative research and impact evaluations into the development and scale-up processes themselves. This may more closely resemble the “design-engineering-development” activities that Tony Bryk describes.
Integrating the R with the D will have two benefits. First, it will provide information to federal and private funding agencies on the progress toward whatever measurable goal is set for an innovation. Second, it will help the parents, communities, and school systems make informed decisions about whether the innovation will work locally. The important partner here is the school district, which can take an active role in evaluation as well as development. These are the entities that ultimately have to decide whether the innovations are more effective and efficient than what they already do. They are also the ones with all the student, teacher, financial, and other data needed to conduct quasi-experiments or interrupted time series studies. If an agency like OEEI is created, it should insist that school districts become partners in the R&D for innovations they consider introducing. —DN
Needed: A Developmental Approach to District Data Use
October 2008
The recent publication of Data Driven School Improvement: Linking Data and Learning, edited by Ellen Mandinach and Margaret Honey, takes a useful step toward documenting innovative practices at the classroom, school, district, and state levels. The book’s 14 chapters (and 30 authors) avoid the advocacy orientation frequently found in discussions of data-driven decision making (D3M). Case studies provide rich detail that is often missing in other discussions. It is useful to get a sense of some of the actual questions that were addressed and that motivated the setup of the technology of data warehouses, assessment tools, and dashboards.
The book provides a good introduction to a complicated field that is currently attracting much attention from practitioners and researchers, as well as from technology vendors. In some ways, however, it does not go deep enough in providing a framework for understanding the topic. While one of the key chapters provides a conceptual framework in terms of a set of processes and related skills, such as collecting, analyzing, and prioritizing data, the framework is static in the sense that there is no account of, or theory as to, how teachers, principals, or district administrators might acquire these skills or come to be interested in using them. Without a developmental theory, we can’t predict what processes or skills are likely to be prerequisites for others or how processes can be scaffolded, for example, by using some of the useful technologies described in several of the chapters. Many of the examples of how data are used can be loosely described as data mining aimed at identifying needs, gaps, or problems. Situations where statistical analysis (beyond averages of descriptive data) is called for are mentioned only occasionally. Such a situation might call for comparing what happened when a new program was put in place with what would have happened without the program, as well as with the level of need that the program was meant to address. The chapters for the most part keep the discussion at a level that does not call for a statistical test or an examination of a correlation. This may be reasonable when considering decisions within a classroom, but is an oversimplification when it comes to decisions considered at the district central office.
It is reasonable to posit stages in a developmental sequence where descriptive needs assessment would be a logical first step before moving on to more complex analyses that would, for example, introduce statistical controls. On a technical level, it is reasonable to consider data on a single school year to be both more readily available to school district administrators and better suited to more straightforward questions than multi-year longitudinal data. For example, a question about mean differences among ethnic groups calls for simpler analytic tools than a question about changes over time in the size of the gap between groups. Both may feed into a needs analysis, but the latter calls for statistical calculations that go beyond a simple comparison. Similarly, a question about whether a new program had an impact not only calls for statistical machinery but also requires the introduction of experimental design in setting up an appropriate comparison. Again, it is reasonable to posit that incorporating research design into “data-driven” decisions is a more advanced stage that builds upon the tools and processes that explore correlations to identify potential areas of need. A developmental theory of data-driven school improvement may provide a basis for tools, supports, and professional development for school district personnel that can accelerate adoption of these valuable processes. A developmental theory would provide a guide for starting where district personnel are and for providing a scaffold to a next level that builds incrementally on what is already in place. —DN
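The difference between the single-year question and the longitudinal question can be illustrated with a small sketch. Everything here is an assumption made for illustration: the records, the group labels "A" and "B", and the function names are hypothetical and do not come from the book. The sketch shows only the arithmetic; judging whether a change in the gap is larger than chance would still require the statistical calculations the paragraph above describes.

```python
from statistics import mean

# Hypothetical records: (school_year, student_group, scale_score).
records = [
    (2007, "A", 310), (2007, "A", 330), (2007, "B", 290), (2007, "B", 300),
    (2008, "A", 320), (2008, "A", 340), (2008, "B", 310), (2008, "B", 320),
]

def group_means(rows, year):
    """Mean score per group for a single school year."""
    by_group = {}
    for yr, group, score in rows:
        if yr == year:
            by_group.setdefault(group, []).append(score)
    return {group: mean(scores) for group, scores in by_group.items()}

def gap(rows, year, group_a, group_b):
    """Single-year question: difference between two group means."""
    means = group_means(rows, year)
    return means[group_a] - means[group_b]

# Single-year (descriptive) comparisons.
gap_2007 = gap(records, 2007, "A", "B")
gap_2008 = gap(records, 2008, "A", "B")

# Longitudinal question: how did the gap itself change between years?
gap_change = gap_2008 - gap_2007

print(gap_2007, gap_2008, gap_change)
```

The first two values answer the simpler descriptive question for each year; the third needs data linked across years, which is why it belongs to a later stage in the sequence.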
Mandinach, E., & Honey, M. (Eds.). (2008). Data Driven School Improvement: Linking Data and Learning. New York: Teachers College Press.
Where Advocating Accountability Falls Short
September 2008
Calling for greater accountability continues to be a theme in American education policy. Recently, Senator Barack Obama made this proposition: “I’ll recruit an army of new teachers, and pay them higher salaries, and give them more support. And in exchange, I’ll ask for higher standards and more accountability” (August 28, 2008).
Although the details of policy positions are not generally provided in political speeches, this one is worth pulling apart to see what might be the issues in implementing such a policy.
First, what is the accountability that there will be more of? In this case we may presume that, since accountability is linked specifically to teacher salaries, teachers will be held accountable. Is this appropriate—or even possible—as federal policy? The educational enterprise can be held accountable at many levels. While teachers have face-to-face contact with students who may do well or poorly, the team of teachers working at a grade level or in a small school could be collectively accountable. Moving up a level, a principal could be held accountable for the school’s results. And the district superintendent and the state schools chief can also be accountable for results in their jurisdictions. From purely an accountability point of view, teachers are not necessarily the best focus for federal policy. Certainly, recruiting and incentive efforts can be federally funded, but it seems at best awkward to legislate sanctions for individual teachers based on holding them accountable for their individual performance in raising their students’ scores.
It is now technically possible to hold teachers accountable. Database and statistical technologies are available to link teacher identities to student records. District data systems routinely provide links between teachers and their rosters of students and, in many cases, these are extended longitudinally. Many state data systems have also begun providing unique teacher IDs so that linkages to the achievement of individual students can be tracked. And drawing on these longitudinal linkages, “value-added” analyses are being used to quantify the contribution over time of individual teachers to students in their classes.
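A minimal sketch of the kind of linkage described above, under loudly stated assumptions: the roster, the scores, and the IDs are hypothetical, and averaging raw score gains per teacher is a deliberately naive stand-in. It is not a value-added model, which would add statistical controls for prior achievement and other factors.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical roster for one year: student_id -> teacher_id.
roster = {"s1": "t1", "s2": "t1", "s3": "t2", "s4": "t2"}

# Hypothetical scores: student_id -> (prior_year_score, current_year_score).
scores = {
    "s1": (300, 320),
    "s2": (310, 320),
    "s3": (280, 285),
    "s4": (330, 345),
}

def mean_gain_by_teacher(roster, scores):
    """Link students to teachers and average each teacher's raw score gains."""
    gains = defaultdict(list)
    for student, teacher in roster.items():
        if student in scores:  # skip students with no linked score record
            prior, current = scores[student]
            gains[teacher].append(current - prior)
    return {teacher: mean(g) for teacher, g in gains.items()}

result = mean_gain_by_teacher(roster, scores)
print(result)
```

The sketch assumes each student has exactly one teacher of record. Informal arrangements in which teachers share students would break that assumption and distort the per-teacher averages.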
However, as an approach to federal policy, taking a top-down tactic of making superintendents and principals accountable may be better than promoting technologies that attempt to measure individual teachers. (The technical controversies about the statistics used in some versions of value-added analysis are a noteworthy topic that we’ll save for another day.) A more productive approach may be to focus on the disincentives for teachers to collaborate or help one another when accountability is at the individual level. We find that many teachers report teaching students other than those officially registered in their classes. Frequently these are informal arrangements that can increase aggregate achievement for the students involved but muddy the district or state records for individual teachers. A school may be a more appropriate unit of accountability and, in that local context, data on individual teachers can be evaluated more accurately. The school principals will know their schools’ standing among other schools in the district and will have school-level data to help in making staffing decisions. The central office staff, in turn, will have a broader view of the progress of the individual schools on which to base decisions about allocation of resources.
For any achievement-based accountability approach—whether at the district, school, or teacher level—it is important to understand achievement in relation to challenges related to, for example, the economic status of the district or neighborhood or the prior preparation of the students in the classroom. We must also consider the growth of the students, not just their proficiency status at the end of the year. These considerations require statistical calculations, not just counting up percentages of proficient students. And once we begin looking at analyses such as trajectories of some schools compared to others facing similar challenges, we can take the next step and begin tracking the success of interventions, professional development programs, and other local policies aimed at addressing areas of weakness or supporting teachers who are not helping their students make the kind of progress the school is looking for. The data systems and the analytic tools needed to track a teacher’s or a school’s progress over time can also be turned to guiding resource allocations and interventions and, as a next logical step, providing the capability of tracking whether the additional interventions, support, or professional development are having the desired impact.
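The contrast between proficiency status and growth can be sketched as follows. The school names, scores, and cut score are hypothetical, chosen only to show that the two measures can rank schools differently. A fuller analysis would also adjust for the challenges each school faces, which this sketch does not attempt.

```python
from statistics import mean

PROFICIENT_CUTOFF = 300  # hypothetical cut score

# Hypothetical data: school -> list of (fall_score, spring_score) per student.
schools = {
    "North": [(310, 312), (320, 321), (305, 306), (290, 292)],
    "South": [(260, 285), (270, 301), (280, 305), (250, 270)],
}

def percent_proficient(pairs):
    """Status measure: share of students at or above the cut score in spring."""
    proficient = sum(spring >= PROFICIENT_CUTOFF for _, spring in pairs)
    return 100 * proficient / len(pairs)

def mean_growth(pairs):
    """Growth measure: average fall-to-spring score gain."""
    return mean(spring - fall for fall, spring in pairs)

summary = {
    name: (percent_proficient(pairs), mean_growth(pairs))
    for name, pairs in schools.items()
}

for name, (status, growth) in summary.items():
    print(f"{name}: {status:.0f}% proficient, mean growth {growth:.2f}")
```

In this made-up data, "North" has the higher percentage of proficient students while "South" shows far more growth, so counting proficient students alone would miss the progress that "South" is making.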
Given the capacities of current data systems, how might a policy involving greater support and greater accountability for teachers be implemented? Here is one example. Federal funding to districts for principal leadership training could be tied to district-level labor contracts giving the building leaders greater control over personnel decisions. The leadership training would include the interpretation and use of longitudinal data, including tools for comparing the principal’s school to others facing similar challenges. Professional learning communities for the principals could be part of this leadership program, assisting district teams to work through ideas for interventions. While the achievement-based accountability measures of teacher performance could be used as one of the factors in building-level decisions, the leadership training would include how to use the data systems for tracking both teacher job performance and the impact of support and training on that performance. —DN
