
Ending a Two-Decade-Old Research Legacy

This is the first post in a four-part series about the changes needed to the legacy of NCLB to make research more useful to school decision-makers. This post focuses on the research methods established over the last 20 years and the reasons that much of the resulting research hasn’t been useful to schools. Subsequent posts will present an approach that works for schools.

With the COVID-19 crisis and school closures in full swing, use of edtech is predicted not just to continue but to expand when the classrooms it has replaced come back into use. Yet little information is available about which edtech products work, for whom, and under what conditions, and this lack of evidence has been noted. As a research organization specializing in rigorous program evaluations, Empirical Education, working with Evidentally, Inc., has been developing methods for providing schools with useful and affordable information.

The “modern” era of education research began with No Child Left Behind (NCLB), the education law signed in 2002, and with the Institute of Education Sciences (IES), created later that year by the Education Sciences Reform Act. The declaration of “scientifically-based research” in NCLB sparked a major undertaking assigned to the IES. To meet NCLB’s goals for improvement, it was essential to overthrow education research methods that lacked the practical goal of determining whether programs, products, practices, or policies moved the needle for an outcome of interest. These outcomes could be test scores or other measures that schools were concerned with, such as discipline referrals.

The kinds of questions that the learning sciences of the time answered were different: for example, how can we compare tests with tasks outside of the test? Or what is the structure of the classroom dialogue? Instead of these qualitative methods, IES looked to the medical field and to other areas of policy, such as workforce training, where randomization was used to assign subjects to a treatment group and a control group. Statistical summaries are then used to decide whether the researcher can reasonably conclude that the observed difference between the groups is large enough that it would be unlikely to occur without a real effect.
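For readers who want to see that logic in concrete form, the following is a minimal Python sketch of random assignment followed by a permutation test. It is an illustration only, not a procedure taken from IES or the WWC: the function names, the simulated test scores, and the assumed five-point effect are all hypothetical.

```python
import random
from statistics import fmean


def simulate_trial(n_students=200, true_effect=5.0, seed=1):
    """Simulate a randomized trial with made-up scores (illustration only)."""
    rng = random.Random(seed)
    students = list(range(n_students))
    rng.shuffle(students)  # random assignment: first half gets the program
    treated = set(students[: n_students // 2])
    treatment, control = [], []
    for student in range(n_students):
        score = rng.gauss(70, 10)  # hypothetical baseline test score
        if student in treated:
            treatment.append(score + true_effect)
        else:
            control.append(score)
    return treatment, control


def permutation_test(treatment, control, n_permutations=10_000, seed=0):
    """Return (observed difference in means, two-sided p-value)."""
    rng = random.Random(seed)
    observed = fmean(treatment) - fmean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        # Re-deal the group labels at random, as if there were no real effect.
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_treat]) - fmean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

Calling `permutation_test(*simulate_trial())` returns the observed difference and the share of random relabelings that produce a difference at least that large. A small share is what licenses the conclusion that the difference is unlikely without a real effect.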

In an attempt to mimic the medical field, IES set up the What Works Clearinghouse (WWC), which became the arbiter of acceptable research. WWC focused on getting the research design right. Many studies would be disqualified, leaving very few, and sometimes just one, that met the design requirements considered necessary to provide valid causal evidence. This led to the idea that every program, product, practice, or policy needed at least one good study that would prove its efficacy. The focus on design (or “internal validity”) came at the expense of generalizability (or “external validity”). Our team has conducted dozens of RCTs and appreciates the need for careful design. But we try to keep in mind the information that schools need.

Schools need to know whether the current version of the product will work when implemented using district resources and with the district’s specific student and teacher populations. A single acceptable study may have been conducted a decade ago with a student population whose ethnic makeup differs from that of the district looking for evidence of what will work for it. The WWC has worked hard to be fair to each intervention, especially where there are multiple studies that meet its standards. But the key is that each separate study comes with an average conclusion. While the WWC notes the demographic composition of the subjects in the study, different results for subgroups, when the study tests for them, are considered secondary.

The Every Student Succeeds Act (ESSA) was passed with bipartisan support in late 2015 and replaced NCLB. ESSA replaced the “scientifically-based research” required by NCLB with four evidence tiers. While this was an important advance, ESSA retained the WWC as the source of the definition of the top two tiers. The WWC, which remains the arbiter of evidence validity, gives the following explanation of the purpose of the evidence tiers.

“Evidence requirements under the Every Student Succeeds Act (ESSA) are designed to ensure that states, districts, and schools can identify programs, practices, products, and policies that work across various populations.”

The key to ESSA’s failings is in the final clause: “that work across various populations.” As a legacy of the NCLB era, the WWC is only interested in the average impact across the various populations in the study. The problem is that district or school decision-makers need to know whether the program will work in their schools, given their specific population and resources.

The good news is that the education research community is recognizing that ignoring the population, region, product, and implementation differences is no longer necessary. Strict rules were needed to make the paradigm shift to the RCT-oriented approach. But now that NCLB’s paradigm has been in place for almost two decades, and generations of educational researchers have been trained in it, we can broaden the outlook. Mark Schneider, IES’s current director, defines the IES mission as being “in the business of identifying what works for whom under what conditions.” This framing is a move toward a broader focus with more relevant results. Researchers are looking at how to generalize results; for example, the Generalizer tool developed by Tipton and colleagues uses demographics of a target district to generate an applicability metric. The Jefferson Education Exchange’s EdTech Genome Project has focused on implementation models as an important factor in efficacy.

Our methods move away from the legacy of the last 20 years to lower the cost of program evaluations, while retaining the scientific rigor and avoiding biases that give schools misleading information. Lowering cost makes it feasible for thousands of small local studies to be conducted on the multitude of school products. Instead of one or a small handful of studies for each product, we can use a dozen small studies encompassing the variety of contexts so that we can determine for whom and under what conditions the product works.
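To illustrate why a set of small studies across contexts can say more than a single average, here is a minimal Python sketch that pools study results with standard inverse-variance (fixed-effect) weights, first overall and then by context. This is a textbook pooling technique swapped in for illustration, not a description of our methods, and the study records, field names, and effect sizes are invented.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical small studies of one product; every number here is made up.
STUDIES = [
    {"context": "urban", "effect": 0.30, "se": 0.10},
    {"context": "urban", "effect": 0.25, "se": 0.12},
    {"context": "rural", "effect": 0.02, "se": 0.11},
    {"context": "rural", "effect": -0.03, "se": 0.09},
]


def pool(studies):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / s["se"] ** 2 for s in studies]
    total = sum(weights)
    effect = sum(w * s["effect"] for w, s in zip(weights, studies)) / total
    return effect, sqrt(1.0 / total)


def pool_by(studies, key):
    """Pool studies separately within each value of `key` (e.g. context)."""
    groups = defaultdict(list)
    for study in studies:
        groups[study[key]].append(study)
    return {name: pool(group) for name, group in groups.items()}


if __name__ == "__main__":
    overall_effect, overall_se = pool(STUDIES)
    print(f"overall: {overall_effect:.2f} (SE {overall_se:.2f})")
    for name, (effect, se) in pool_by(STUDIES, "context").items():
        print(f"{name}: {effect:.2f} (SE {se:.2f})")
```

In this invented data the overall average sits between two quite different context-level results, which is the kind of pattern a single average conclusion would hide from a district deciding whether the product fits its own setting.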

Read the second part of this series next.