blog posts and news stories

Classrooms and Districts: Breaking Down Silos in Education Research and Evidence

I just got back from Edsurge’s Fusion conference. The theme, aimed at classroom and school leaders, was personalizing classroom instruction. This is guided by learning science, which includes brain development and the impact of trauma, as well as empathetic caregiving, as Pamela Cantor beautifully explained in her keynote. It also leads to detailed characterizations of learner variability being explored at Digital Promise by Vic Vuchic’s team, which is providing teachers with mappings between classroom goals and tools and strategies that can address learners who vary in background, cognitive skills, and socio-emotional character.

One of the conference tracks that particularly interested me was the workshops and discussions under “Research & Evidence”. Here is where I experienced a disconnect between Empirical’s policy-oriented research work interpreting ESSA and Fusion’s focus on improving the classroom.

  • The Fusion conference is focused at the classroom level, where teachers along with their coaches and school leaders are making decisions about personalizing the instruction to students. They advocate basing decisions on research and evidence from the learning sciences.
  • Our work, also using research and evidence, has been focused on the school district level where decisions are about procurement and implementation of educational materials including the technical infrastructure needed, for example, for edtech products.

While the classroom and district levels have different needs and resources and look to different areas of scientific expertise, they need not form conceptual silos. But the differences need to be understood.

Consider the different ways we look at piloting a new product.

  • The Digital Promise edtech pilot framework attempts to move schools toward a more planful approach by getting them to identify and quantify the problem for which the product being piloted could be a solution. Success in the pilot classrooms is evaluated by the teachers, whose detailed understanding of their own classrooms doesn’t call for statistical comparisons. Their framework points to tools such as the RCE Coach that can help with the statistics to support local decisions.
  • Our work looks at pilots differently. Pilots are excellent for understanding implementability and classroom acceptance (and working with developers to improve the product), but even with rapid cycle tools, the quantitative outcomes are usually not available in time for local decisions. We are more interested in how data can be accumulated nationally from thousands of pilots so that teachers and administrators can get information on which products are likely to work in their classrooms given their local demographics and resources. This is where review sites like Edsurge product reviews or Noodle’s ProcureK12 could be enhanced with evidence about for whom, and under what conditions, the products work best. With over 5,000 edtech products, an initial filter to help choose what a school should pilot will be necessary.

A framework that puts these two approaches together is promulgated in the Every Student Succeeds Act (ESSA). ESSA defines four levels of evidence, based on the strength of the causal inference about whether the product works. More than just a system for rating the scientific rigor of a study, it is a guide to developing a research program with a basis in learning science. The base level says that the program must have a rationale. This brings us back to the Digital Promise edtech pilot framework needing teachers to define their problem. The ESSA base-level rationale is what the pilot framework calls for. Schools must start thinking through what the problem is that needs to be solved and why a particular product is likely to be a solution. This base level sets up the communication between educators and developers about not just whether the product works in the classroom, but how to improve it.

The next level in ESSA, called “correlational,” is considered weak evidence, because it shows only that the product has “promise” and is worth studying with a stronger method. However, this level is far more useful as a way for developers to gather information about which parts of the program are driving student results, and which patterns of usage may be detrimental. Schools can see if there is an amount of usage that maximizes the value of the product (rather than depending solely on the developer’s rationale). This correlational level calls for piloting the program and examining quantitative results. To get correlational results, the pilot must have enough students and may require going beyond a single school. This is a reason that we usually look for a district’s involvement in a pilot.
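
As a rough illustration of what a correlational look at pilot data might involve, here is a minimal Python sketch. The usage and score-gain numbers are invented, and the function and variable names are hypothetical; a real analysis would need many more students and would account for other differences among them.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Invented pilot data: weekly minutes of product use and test-score gain
minutes_per_week = [10, 20, 30, 45, 60, 75, 90, 120]
score_gain = [1.0, 2.5, 3.0, 4.5, 5.0, 5.5, 5.0, 4.0]

r = pearson_r(minutes_per_week, score_gain)
print(f"usage vs. gain: r = {r:.2f}")
```

A single correlation coefficient can hide a pattern like the one in this invented data, where gains level off and then dip at the highest usage. Plotting or tabulating gains by usage band is one way to look for the amount of usage that seems to work best.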

The top two levels in the ESSA scheme involve comparisons of students and teachers who use the product to those who do not. These are the levels where it begins to make sense to combine a number of studies of the same product from different districts in a statistical process called meta-analysis so we can start to make generalizations. At these levels, it is very important to look beyond just the comparison of the program group and the control group and gather information on the characteristics of schools, teachers, and students who benefit most (and least) from the product. This is the evidence of most value to product review sites.
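
To make the idea of combining studies concrete, here is a minimal Python sketch of one common meta-analytic approach, inverse-variance (fixed-effect) pooling. The effect sizes and standard errors are invented, and the names are hypothetical. It is not a description of any particular organization's method.

```python
from math import sqrt

def pooled_effect(studies):
    """Inverse-variance (fixed-effect) pooling.

    `studies` is a list of (effect_size, standard_error) pairs.
    Returns the pooled effect and its standard error.
    """
    if not studies:
        raise ValueError("need at least one study")
    # More precise studies (smaller standard error) get more weight
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    estimate = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    return estimate, sqrt(1.0 / total)

# Invented results for the same product in three districts
district_studies = [(0.20, 0.10), (0.05, 0.08), (0.30, 0.15)]

estimate, std_err = pooled_effect(district_studies)
print(f"pooled effect = {estimate:.3f} (SE {std_err:.3f})")
```

Fixed-effect pooling assumes that every district shares one true effect. When effects are expected to differ across districts, as argued here, a random-effects model or an analysis of moderators (school, teacher, and student characteristics) would be the more fitting choice.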

When it comes to characterizing schools, teachers, and students, the “classroom” and the “district” approaches have different, but equally important, needs.

  • The learner variability project has very fine-grained categories that teachers are able to establish for the students in their class.
  • For generalizable evidence, we need characteristics that are routinely collected by the schools. To make data analysis for efficacy studies a common occurrence, we have to avoid expensive surveys and testing of students that are used only for the research. Furthermore, the research community must reach consensus on a limited number of variables that will be used in research. Fortunately, another aspect of ESSA is the broadening of routine data collection for accountability purposes, so that information on improvements in socio-emotional learning or school climate will be usable in studies.

Edsurge and Digital Promise are part of a west coast contingent of researchers, funders, policymakers, and edtech developers that has been discussing these issues. We look forward to continuing this conversation within the framework provided by ESSA. When we look at the ESSA levels as not just vertical but building out from concrete classroom experience to more abstract and general results from thousands of school districts, then learning science and efficacy research are combined. This strengthens our ability to serve all students, teachers, and school leaders.


Jefferson Education Accelerator Contracts with Empirical for Evidence as a Service™

Jefferson Education Accelerator (JEA) has contracted with Empirical Education Inc. for research services that will provide evidence of the impact of education technology products developed by their portfolio companies. JEA’s mission is to support and evaluate promising edtech solutions in order to help educators make more informed decisions about the products they invest in. The study is designed to meet level 2 or “moderate” evidence as defined by the Every Student Succeeds Act. Empirical will provide a Student Impact Report under its Evidence as a Service offering, which combines student-level product usage data and a school district’s administrative data to conduct a comparison group study. Denis Newman, Empirical’s CEO, stated, “This is a perfect application of our Evidence as a Service product, which provides fast answers to questions about which kids will benefit the most from any particular learning program.” Todd Bloom, JEA’s Chief Academic Officer and Research Associate Professor at UVA’s Curry School of Education, commented: “Empirical Education is a highly respected research firm and offers the type of aggressive timeline that is sorely needed in the fast-paced edtech market.” A report on impact in the school year 2017-2018 is expected to be completed in July.


Sure, the edtech product is proven to work, but will it work in my district?

It’s a scenario not uncommon in your district administrators’ office. They’ve received sales pitches and demos of a slew of new education technology (edtech) products, each one accompanied by “evidence” of its general benefits for teachers and students. But underlying the administrator’s decision is a question often left unanswered: Will this work in our district?

In the conventional approach to research advocated, for example, by the U.S. Department of Education and the Every Student Succeeds Act (ESSA), the finding that is reported and used in the review of products is the overall average impact for any and all subgroups of students, teachers, or schools in the study sample. In our own research, we have repeatedly seen that who it works for and under what conditions can be more important than the average impact. There are products that are effective on average but don’t work for an important subgroup of students, or vice versa, work for some students but not all. Some examples:

  • A math product, while found to be effective overall, was effective for white students but ineffective for minority students. This effect would be relevant to any district wanting to close (rather than further widen) an achievement gap.
  • A product that did well on average performed very well in elementary grades but poorly in middle school. This has obvious relevance for a district, as well as for the provider who may modify its marketing target.
  • A teacher PD product greatly benefitted uncertified teachers but didn’t help the veteran teachers do any better than their peers using the conventional textbook. This product may be useful for new teachers but a poor choice for others.
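
The pattern in these examples can be shown with a minimal Python sketch. The student records are invented, the field and function names are hypothetical, and the simple difference in means stands in for the adjusted comparison a real study would use.

```python
from collections import defaultdict
from statistics import mean

def impact(records):
    """Difference in mean gain: product users minus non-users."""
    users = [r["gain"] for r in records if r["uses_product"]]
    non_users = [r["gain"] for r in records if not r["uses_product"]]
    if not users or not non_users:
        raise ValueError("need both users and non-users to compare")
    return mean(users) - mean(non_users)

def impact_by_subgroup(records, key):
    """Compute the same difference separately for each value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {name: impact(rows) for name, rows in groups.items()}

# Invented records: the product helps in elementary grades only
records = [
    {"grade_band": "elementary", "uses_product": True, "gain": 8.0},
    {"grade_band": "elementary", "uses_product": True, "gain": 7.0},
    {"grade_band": "elementary", "uses_product": False, "gain": 3.0},
    {"grade_band": "elementary", "uses_product": False, "gain": 4.0},
    {"grade_band": "middle", "uses_product": True, "gain": 3.0},
    {"grade_band": "middle", "uses_product": True, "gain": 4.0},
    {"grade_band": "middle", "uses_product": False, "gain": 4.0},
    {"grade_band": "middle", "uses_product": False, "gain": 3.0},
]

overall = impact(records)
by_band = impact_by_subgroup(records, "grade_band")
print(f"overall impact: {overall:+.1f}")
for band, value in by_band.items():
    print(f"{band}: {value:+.1f}")
```

With these invented numbers, the overall average shows a positive impact even though the middle school students show none. That is exactly the kind of difference a rating based only on the average would miss.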

As a research organization, we have been looking at ways to efficiently answer these kinds of questions for products. Especially now, with the evidence requirements built into ESSA, school leaders can ask the edtech salesperson: “Does your product have the evidence that ESSA calls for?” They may well hear an affirmative answer supported by an executive summary of a recent study. But there’s a fundamental problem with what ESSA is asking for. ESSA doesn’t ask for evidence that the product is likely to work in your specific district. This is not the fault of ESSA’s drafters. The problem is built into the conventional design of research on “what works”. The U.S. Department of Education’s What Works Clearinghouse (WWC) bases its evidence rating only on an average; if there are different results for different subgroups of students, that difference is not part of the rating. Since ESSA adopts the WWC approach, that’s the law of the land. Hence, your district’s most pressing question is left unanswered: will this work for a district like mine?

Recently, the Software & Information Industry Association, the primary trade association of the software industry, released a set of guidelines for research explaining to its member companies the importance of working with districts to conduct research that will meet the ESSA standards. As the lead author of this report, I can say it was our goal to foster an improved dialog between the schools and the providers about the evidence that should be available to support buying these products. As an addendum to the guidelines aimed at arming educators with ways to look at the evidence and questions to ask the edtech salesperson, here are three suggestions:

  1. It is better to have some information than no information. The fact that there’s research that found the product worked somewhere gives you a working hypothesis that it could be a better than average bet to try out in your district. In this respect, you can treat the study ratings from the WWC and newer sites such as Evidence for ESSA as a screening tool: they will point you to valid studies about the product you’re interested in. But you should treat previous research as a working hypothesis rather than proof.
  2. Look at where the research evidence was collected. You’ll want to know whether the research sites and populations in the study resemble your local conditions. WWC has gone to considerable effort to code the research by the population in the study and provides a search tool so you can find studies conducted in districts like yours. And if you download and read the original report, it may tell you whether it will help reduce or increase an achievement gap of concern.
  3. Make a deal with the salesperson. In exchange for your help in organizing a pilot and allowing them to analyze your data, you get the product for a year at a steep discount and a good ongoing price if you decide to implement the product on a full scale. While you’re unlikely to get results from a pilot (e.g., based on spring testing) in time to support a decision, you can at least lower your cost for the materials, and you’ll help provide a neighboring district (with similar populations and conditions) with useful evidence to support a strong working hypothesis as to whether it is likely to work for them as well.