Evidence from 2011
- Need for Product Evaluations Continues to Grow
- Comment on the NY Times: “In Classroom of Future, Stagnant Scores”
- A Conversation About Building State and Local Research Capacity
- Looking Back 35 Years to Learn about Local Experiments
Need for Product Evaluations Continues to Grow
There is a growing need for evidence of the effectiveness of products and services being sold to schools. A new release of SIIA’s product evaluation guidelines is now available at the Selling to Schools website (with continued free access for SIIA members) to help guide publishers in measuring the effectiveness of the tools they are selling to schools.
It’s been almost a decade since NCLB made its call for “scientifically-based research,” but the calls for research haven’t faded away. This is because resources available to schools have diminished over that time, heightening the importance of cost-benefit trade-offs in spending.
NCLB has focused attention on test score achievement, and this metric is becoming more pervasive, e.g., through a tie to teacher evaluation and through linkages to dropout risk. While NCLB fostered a compliance mentality (product specs had to have a check mark next to SBR), the need to assure that funds are not wasted is now leading to a greater interest in research results. Decision-makers are now very interested in whether specific products will be effective, or how well they have been working, in their districts.
Fortunately, the data available for evaluations of all kinds is getting better and easier to access. The US Department of Education has poured hundreds of millions of dollars into state data systems. These investments make data available to states and drive the cleaning and standardizing of data from districts. At the same time, districts continue to invest in data systems and warehouses. Although it is still not a trivial task, school district researchers now find it much easier than a decade ago to get the data needed to determine whether an investment paid off in terms of increased student achievement or attendance.
The reauthorization of ESEA (i.e., NCLB) is maintaining the pressure to evaluate education products. We are still a long way from the draft reauthorization introduced in Congress becoming law, but the initial indications are quite favorable to the continued production of product effectiveness evidence. The language has changed somewhat. Look for the phrase “evidence-based.” Along with the term “scientifically-valid,” this new language is actually more sophisticated and potentially more effective than the old SBR neologism. Bob Slavin, one of the reviewers of the SIIA guidelines, says in his Ed Week blog that “This is not the squishy ‘based on scientifically-based evidence’ of NCLB. This is the real McCoy.” It is notable that the definition of “evidence-based” goes beyond just setting rules for the design of research, such as the SBR focus on the single dimension of “internal validity,” for which randomization gets the top rating. It now also asks about the research’s “external validity,” that is, how generalizable it is: does it have any relevance for decision-makers?
One of the important goals of the SIIA guidelines for product effectiveness research is to improve the credibility of publisher-sponsored research. It is important that educators see it as more than just “market research” producing biased results. In this era of reduced budgets, schools need to have tangible evidence of the value of products they buy. By following the SIIA’s guidelines, publishers will find it easier to achieve that credibility.
Comment on the NY Times: “In Classroom of Future, Stagnant Scores”
The New York Times is running a series of front-page articles on “Grading the Digital School.” The first one ran Labor Day weekend and raised the question as to whether there’s any evidence that would persuade a school board or community to allocate extra funds for technology. With the demise of the Enhancing Education Through Technology (EETT) program, federal funds dedicated to technology will no longer be flowing into states and districts. Technology will have to be measured against any other discretionary purchase. The resulting internal debates within schools and their communities about the expense vs. value of technology promise to have interesting implications and are worth following closely.
The first article by Matt Richtel revisits a debate that has been going on for decades between those who see technology as the key to “21st Century learning” and those who point to the dearth of evidence that technology makes any measurable difference to learning. It’s time to try to reframe this discussion in terms of what can be measured. And in considering what to measure, and in honor of Labor Day, we raise a question that is often ignored: what role do teachers play in generating the measurable value of technology?
Let’s start with the most common argument in favor of technology, even in the absence of test score gains. The idea is that technology teaches skills “needed in a modern economy,” and these are not measured by the test scores used by state and federal accountability systems. Karen Cator, director of the US Department of Education office of educational technology, is quoted as saying (in reference to the lack of improvement in test scores), “...look at all the other things students are doing: learning to use the Internet to research, learning to organize their work, learning to use professional writing tools, learning to collaborate with others.” Presumably, none of these things directly impact test scores. The problem with this perennial argument is that many other things that schools keep track of should provide indicators of improvement. If as a result of technology, students are more excited about learning or more engaged in collaborating, we could look for an improvement in attendance, a decrease in drop-outs, or students signing up for more challenging courses.
Information on student behavioral indicators is becoming easier to obtain since the standardization of state data systems. There are some basic study designs that use comparisons among students within the district or between those in the district and those elsewhere in the state. This approach uses statistical modeling to identify trends and control for demographic differences, but is not beyond the capabilities of many school district research departments¹ or the resources available to the technology vendors. (Empirical has conducted research for many of the major technology providers, often focusing on results for a single district interested in obtaining evidence to support local decisions.) Using behavioral or other indicators, a district such as that in the Times article can answer its own questions. Data from the technology systems themselves can be used to identify users and non-users and to confirm the extent of usage and implementation. It is also valuable to examine whether some students (those in most need or those already doing okay) or some teachers (veterans or novices) receive greater benefit from the technology. This information may help the district focus resources where they do the most good.
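To make this kind of comparison concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the student records, group labels, and attendance rates are invented, the function name is hypothetical, and a simple stratified comparison stands in for the fuller statistical modeling a district research department would likely use. Users and non-users are compared within each demographic group, and the group-level differences are then averaged, weighted by group size.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (used_technology, demographic_group, attendance_rate).
# All values are invented for illustration only.
students = [
    (True,  "low_income", 0.92),
    (True,  "low_income", 0.90),
    (False, "low_income", 0.88),
    (False, "low_income", 0.86),
    (True,  "other",      0.96),
    (False, "other",      0.95),
    (False, "other",      0.93),
]


def stratified_difference(records):
    """Average user-minus-non-user difference in the outcome, computed
    within each demographic group and weighted by group size."""
    by_group = defaultdict(lambda: {True: [], False: []})
    for used, group, outcome in records:
        by_group[group][used].append(outcome)

    weighted_sum = 0.0
    total = 0
    for arms in by_group.values():
        if not arms[True] or not arms[False]:
            continue  # no within-group comparison is possible
        n = len(arms[True]) + len(arms[False])
        weighted_sum += n * (mean(arms[True]) - mean(arms[False]))
        total += n
    if total == 0:
        raise ValueError("no group contains both users and non-users")
    return weighted_sum / total


# Unadjusted comparison: all users versus all non-users.
raw = (mean(o for used, _, o in students if used)
       - mean(o for used, _, o in students if not used))
adjusted = stratified_difference(students)
print(f"raw difference: {raw:.3f}, adjusted difference: {adjusted:.3f}")
```

In this made-up data the raw and adjusted differences disagree because users and non-users are not spread evenly across the two groups, which is the reason for controlling for demographics in the first place.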
A final thought about where to look for impacts of technologies comes from a graph of the school district’s budget. While spending on technology and salaries have both declined over the last three years, spending on salaries is still about 25 times as great as on technologies. Any discussion of where to find an impact of technology must consider labor costs, which are the district’s primary investment. We might ask whether a small investment in technology would allow the district to reduce the number of teachers by, for example, allowing a small increase in the number of students each teacher can productively handle. Alternatively, we might ask whether technology can make a teacher more effective, by whatever measures of effective teaching the district chooses to use, with their current students. We might ask whether technologies result in keeping young teachers on the job longer or encouraging initiative to take on more challenging assignments.
It may be a mistake to look for a direct impact of technology on test scores (aside from technologies aimed specifically at that goal), but it is also a mistake to assume the impact is, in principle, not measurable. We need a clear picture of how various technologies are expected to work and where we can look for the direct and indirect effects. An important role of technology in the modern economy is providing people with actionable evidence. It would be ironic if education technology were inherently opaque to educational decision makers.
¹ Or, we would hope, the New York Times. Sadly, the article provides a graph of trends in math and reading for the district highlighted in the story compared to trends for the state. The graphic is meant to show that the district is doing worse than the state average. But the article never suggests that we should consider the population of the particular district and whether it is doing better or worse than one would expect, controlling for demographics, available resources, and other characteristics.
A Conversation About Building State and Local Research Capacity
John Q. Easton, director of the Institute of Education Sciences (IES), came to New Orleans recently to participate in the annual meeting of the American Educational Research Association. At one of his stops, he was the featured speaker at a meeting of the Directors of Research and Evaluation (DRE), an organization composed of school district research directors. (DRE is affiliated with AERA and was recently incorporated as a 501(c)(3).) John started his remarks by pointing out that for much of his career he was a school district research director and felt great affinity with the group. He introduced the directions that IES was taking, especially how it was approaching working with school systems. He spent most of the hour fielding questions and engaging in discussion with the participants. Several interesting points came out of the conversation about roles for the researchers who work for education agencies.
Historically, most IES research grant programs have been aimed at university or other academic researchers. It is noteworthy that even in a program for “Evaluation of State and Local Education Programs and Policies,” grants have been awarded only to universities and large research firms. There is no expectation that researchers working for the state or local agency would be involved in the research beyond the implementation of the program. The RFP for the next generation of Regional Education Labs (REL) contracts may help to change that. The new RFP expects the RELs to work closely with education agencies to define their research questions and to assist alliances of state and local agencies in developing their own research capacity.
Members of the audience noted that, as district directors of research, they often spend more time reviewing research proposals from students and professors at local colleges who want to conduct research in their schools than actually answering questions initiated by the district. Funded researchers treat the districts as the “human subjects,” paying incentives to participants and sometimes paying for data services. But the districts seldom participate in defining the research topic, conducting the studies, or benefiting directly from the reported findings. The new mission of the RELs to build local capacity will be a major shift.
Some in the audience pointed out reasons to be skeptical that this REL agenda would be possible. How can we build capacity if research and evaluation departments across the country are being cut? In fact, very little is known about the number of state or district practitioners whose capacity for research and evaluation could be built by applying the REL resources. (Perhaps a good first research task for the RELs would be to conduct a national survey to measure the existing capacity.)
John made a good point in reply: IES and the RELs have to work with the district leadership, not just the R&E departments, to make this work. The leadership has to have a more analytic view. They need to see the value of having an R&E department that goes beyond test administration and is able to obtain evidence to support local decisions. By cultivating a research culture in the district, evaluation could be routinely built into new program implementations from the beginning. The value of the research would be demonstrated in the improvements resulting from informed decisions. Without a district leadership team that values research to find out what works for the district, internal R&E departments will not be seen as an important capacity.
Some in the audience pointed out that in parallel to building a research culture in districts, it will be necessary to build a practitioner culture among researchers. It would be straightforward for IES to require that research grantees and contractors engage the district R&E staff in the actual work, not just review the research plan and sign the FERPA agreement. Practitioners ultimately hold the expertise in how the programs and research can be implemented successfully in the district, thus improving the overall quality and relevance of the research.
Looking Back 35 Years to Learn about Local Experiments
With the growing interest among federal agencies in building local capacity for research, we took another look at an article by Lee Cronbach published in 1975. We found it has a lot to say about conducting local experiments and implications for generalizability. Cronbach worked for much of his career at Empirical’s neighbor, Stanford University, and his work has had a direct and indirect influence on our thinking. Some may interpret Cronbach’s work as stating that randomized trials of educational interventions have no value because of the complexity of interactions between subjects, contexts, and the experimental treatment. In any particular context, these interactions are infinitely complex, forming a “hall of mirrors” (as he famously put it, p. 119), making experimental results—which at most can address a small number of lower-order interactions—irrelevant. We don’t read it that way. Rather, we see powerful insights as well as cautions for conducting the kinds of field experiments that are beginning to show promise for providing educators with useful evidence.
We presented these ideas at the Society for Research on Educational Effectiveness conference in March, building the presentation around a set of memorable quotes from the 1975 article. Here we highlight some of the main ideas.
Quote #1: “When we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion...positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (p. 125).
Practitioners are making decisions for their local jurisdiction. An experiment conducted elsewhere (including over many locales, where the results are averaged) provides a useful starting point, but not “proof” that it will or will not work in the same way locally. Experiments give us a working hypothesis concerning an effect, but it has to be tested against local conditions at the appropriate scale of implementation. This brings to mind California’s experience with class size reduction following the famous experiment in Tennessee, and how the working hypothesis corroborated through the experiment did not transfer to a different context. We also see applicability of Cronbach’s ideas in the Investing in Innovation (i3) program, where initial evidence is being taken as a warrant to scale up an intervention, but where the grants included funding for research under new conditions where implementation may head in unanticipated directions, leading to new effects.
Quote #2: “Instead of making generalization the ruling consideration in our research, I suggest that we reverse our priorities. An observer collecting data in one particular situation...will give attention to whatever variables were controlled, but he will give equally careful attention to uncontrolled conditions.... As results accumulate, a person who seeks understanding will do his best to trace how the uncontrolled factors could have caused local departures from the modal effect. That is, generalization comes late, and the exception is taken as seriously as the rule” (pp. 124-125).
Finding or even seeking out conditions that lead to variation in the treatment effect facilitates external validity, as we build an account of the variation. The fact that an estimate of average impact is not robust across conditions should not be seen as a threat to generalizability. We should spend some time looking at the ways that the intervention interacts differently with local characteristics, in order to determine which factors account for heterogeneity in the impact and which ones do not. Though this activity is exploratory and not necessarily anticipated in the design, it provides the basis for understanding how the treatment plays out, and why its effect may not be constant across settings. Over time, generalizations can emerge, as we compile an account of the different ways in which the treatment is realized and the conditions that suppress or accentuate its effects.
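As a rough illustration of what looking at local departures can mean in practice, the following Python sketch computes a treatment-minus-control difference for each site and then each site’s departure from the average across sites. The site names, scores, and function name are all hypothetical, and a simple mean of site effects stands in for Cronbach’s “modal effect”; this is a sketch under those assumptions, not a recommended analysis.

```python
from statistics import mean

# Hypothetical outcome scores by site and condition; invented for illustration.
sites = {
    "site_A": {"treatment": [72, 75, 78], "control": [70, 71, 72]},
    "site_B": {"treatment": [64, 66, 65], "control": [65, 66, 64]},
    "site_C": {"treatment": [81, 85, 83], "control": [74, 76, 75]},
}


def site_effects(data):
    """Treatment-minus-control mean difference for each site."""
    return {
        name: mean(arms["treatment"]) - mean(arms["control"])
        for name, arms in data.items()
    }


effects = site_effects(sites)

# Simple average of the site-level effects (an assumption standing in
# for the "modal effect"; other summaries are possible).
average_effect = mean(effects.values())

# How far each site's effect sits from that average.
departures = {name: effect - average_effect for name, effect in effects.items()}

for name in sites:
    print(f"{name}: effect {effects[name]:+.1f}, "
          f"departure from average {departures[name]:+.1f}")
```

The departures themselves explain nothing; they only flag where to look. The exploratory work is in tracing which local characteristics could account for a site sitting well above or below the average.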
Quote #3: “Generalizations decay” (p. 122).
In the social policy arena, and especially with the rapid development of technologies, we can’t expect interventions to stay constant. And we certainly can’t expect the contexts of implementation to be the same over many years. The call for quicker turn-around in our studies is therefore necessary, not just because decision-makers need to act, but because any finding may have a short shelf life.
—AJ & DN
Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30(2), 116-127.