blog posts and news stories

Updating Evidence According to ESSA

The U.S. Department of Education (ED) sets validity standards for evidence of what works in schools through The Every Student Succeeds Act (ESSA), which provides usefully defined tiers of evidence.

When we helped develop the research guidelines for the Software & Information Industry Association, we took a close look at ESSA and how it is often interpreted. Now, as research is evolving with cloud-based online tools that automatically report usage data, it is important to review the standards and to clarify both ESSA’s useful advances and how the four tiers fail to address some critical scientific concepts. These concepts are needed for states, districts, and schools to make the best use of research findings.

We will cover the following in subsequent postings on this page.

  1. Evidence According to ESSA: Since the founding of the Institute of Education Sciences and NCLB in 2002, the philosophy of evaluation has been focused on the perfection of one good study. We’ll discuss the cost and technical issues this kind of study raises and how it sometimes reinforces educational inequity.
  2. Who is Served by Measuring Average Impact: The perfect design focused on the average impact of a program across all populations of students, teachers, and schools has value. Yet school decision makers also need to know about the performance differences between specific groups, such as students who are poor or middle class, teachers with one kind of preparation or another, or schools with AP courses vs. those without. Mark Schneider, IES’s director, defines the IES mission this way: “IES is in the business of identifying what works for whom under what conditions.” This framing is a move toward a broader focus with more relevant results.
  3. Differential Impact is Unbiased. According to the ESSA standards, studies must statistically control for selection bias and other sources of bias. But biases that impact the average for the population in the study don’t impact the size of the differential effect between subgroups. The interaction between the program and the population characteristic is unaffected. And that’s what the educators need to know about.
  4. Putting many small studies together. Instead of the One Good Study approach, we see the need for multiple studies, each collecting data on differential impacts for subgroups. As Andrew Coulson of MIND Research Institute put it, we have to move on from the One Good Study approach to valuing multiple studies with enough variety to be able to account for commonalities among districts. We add that meta-analysis of interaction effects is entirely feasible.
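
The point in item 3 can be illustrated with a small sketch. It assumes the simplest case, in which selection bias is a single additive shift that affects both subgroups equally; the subgroup names and all numbers are hypothetical and are not results from any study.

```python
# Hypothetical true program effects (in test-score points) for two subgroups.
true_effect = {"low_income": 2.0, "middle_income": 5.0}

# Assumed selection bias: one additive shift applied equally to both subgroups.
selection_bias = 3.0

# What a biased study would estimate for each subgroup.
estimated = {group: effect + selection_bias for group, effect in true_effect.items()}

# The average estimate is off by the full amount of the bias...
true_average = sum(true_effect.values()) / len(true_effect)
estimated_average = sum(estimated.values()) / len(estimated)

# ...but the difference between subgroups is unchanged, because the bias cancels.
true_gap = true_effect["middle_income"] - true_effect["low_income"]
estimated_gap = estimated["middle_income"] - estimated["low_income"]

print(f"average: true={true_average}, estimated={estimated_average}")
print(f"subgroup gap: true={true_gap}, estimated={estimated_gap}")
```

The cancellation depends on the stated assumption. If the bias were larger for one subgroup than for the other, the estimated gap would be off by the difference between the two biases.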

Our team works closely with the ESSA definitions and has addressed many of these issues. Our design for an RCE for the Dynamic Learning Project shows how Tiers 2 and 3 can be combined to answer questions involving intermediate results (or mediators). If you are interested in more information on the ESSA levels of evidence, the video on this page is a recorded webinar that provides clarification.

2020-05-12

Evidentally, a New Company Taking on Edtech Efficacy Analytics

Empirical Education has launched Evidentally, Inc., a new company that specializes in helping edtech companies and their investors make more effective products. Founded by Denis Newman, CEO, and Val Lazarev, Chief Product Architect, the company conducts rapid cycle evaluations that meet the federal Every Student Succeeds Act standards for moderate and promising evidence. The efficacy analytics leverage the edtech product’s usage metrics to efficiently identify states and districts with sufficient usage to make impact studies feasible. Evidentally is actively servicing and securing initial clients, and is seeking seed funding to prepare for expansion. In the meantime, the company is being incubated by Empirical Education, which has transferred intellectual property relating to its R&D prototypes of the service and is providing staffing through a services agreement. The Evidentally team will be meeting with partners and investors at SXSW EDU, EdSurge Immersion, ASU GSV Summit, and ISTE. Let’s talk!

2019-02-06

For Quasi-experiments on the Efficacy of Edtech Products, it is a Good Idea to Use Usage Data to Identify Who the Users Are

With edtech products, the usage data allows for precise measures of exposure and whether critical elements of the product were implemented. Providers often specify an amount of exposure or the kind of usage that is required to make a difference. Furthermore, educators often want to know whether the program has an effect when implemented as intended. Researchers can readily use data generated by the product (usage metrics) to identify compliant users, or to measure the kind and amount of implementation.

Since researchers generally track product implementation and statistical methods allow for adjustments for implementation differences, it is possible to estimate the impact on successful implementers, or technically, on a subset of study participants who were compliant with treatment. It is, however, very important that the criteria researchers use in setting a threshold be grounded in a model of how the program works. This will, for example, point to critical components that can be referred to in specifying compliance. Without a clear rationale for the threshold set in advance, the researcher may appear to be “fishing” for the amount of usage that produces an effect.
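
As a minimal sketch of what setting the threshold in advance can look like in practice, the example below flags compliant users from usage metrics. The threshold value, the `StudentUsage` record, and the field names are all hypothetical; a real study would take the threshold from the provider's model of how the program works.

```python
from dataclasses import dataclass

# Hypothetical threshold, fixed before any outcome data are examined and
# justified by the program's model of how it works.
MIN_WEEKLY_MINUTES = 60


@dataclass
class StudentUsage:
    student_id: str
    weekly_minutes: float  # average weekly usage reported by the product


def is_compliant(record: StudentUsage, threshold: float = MIN_WEEKLY_MINUTES) -> bool:
    """Return True when usage meets the pre-specified threshold."""
    return record.weekly_minutes >= threshold


# Made-up usage records for illustration only.
usage_log = [
    StudentUsage("s1", 75.0),
    StudentUsage("s2", 20.0),
    StudentUsage("s3", 60.0),
]

compliers = [r.student_id for r in usage_log if is_compliant(r)]
print(compliers)
```

Keeping the threshold as a named constant that is set before the analysis makes it easy to show that the cutoff was not tuned to produce an effect.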

Some researchers reject comparison studies in which identification of the treatment group occurs after the product implementation has begun. This is based in part on the concern that the subset of users who comply with the suggested amount of usage will get more exposure to the program. More exposure will result in a larger effect. This assumes, of course, that the product is effective; otherwise the students and teachers will have been wasting their time and will likely perform worse than the comparison group.

There is also the concern that the “compliers” may differ from the non-compliers (and non-users) in some characteristic that isn’t measured. Even after controlling for measurable variables (prior achievement, ethnicity, English proficiency, etc.), there could be a personal characteristic that results in an otherwise ineffective program becoming effective for them. We reject this concern and take the position that a product’s effectiveness can be strengthened or weakened by many factors. A researcher conducting any matched comparison study can never be certain that there isn’t an unmeasured variable that is biasing it. (That’s why the What Works Clearinghouse only accepts Quasi-Experiments “with reservations.”) However, we believe that as long as the QE controls for the major factors that are known to affect outcomes, the study can meet the Every Student Succeeds Act requirement that the researcher “controls for selection bias.”

With those caveats, we believe that a QE that identifies users by their compliance with a pre-specified level of usage is a good design. Studies that look at the measurable variables that modify the effectiveness of a product can not only be useful for schools in answering their question, “is the product likely to work in my school?” but can also point the developer and product marketer to ways the product can be improved.

2018-07-27

How Are Edtech Companies Thinking About Data and Research?

Forces of the rebellion were actively at work at SIIA’s Annual Conference last week in San Francisco. Snippets of conversation revealed a common theme of harnessing and leveraging data in order to better understand and serve the needs of schools and districts.

This theme was explored in depth during one panel session, “Efficacy and Research: Why It Matters So Much in the Education Market”, where edtech executives discussed the phases and roles of research as it relates to product improvement and marketing. Moderated by Pearson’s Gary Mainor, session panelists included Andrew Coulson of the MIND Research Institute, Kelli Hill of Khan Academy, and Shawn Mahoney of McGraw-Hill Education.

Coulson, who was one of the contributing reviewers of our Research Guidelines, stated that all signs are pointing to an “exponential increase” of school district customers asking for usage data. He advised fellow edtech entrepreneurs to start paying attention to fine-grained usage data, as it is becoming necessary to provide this for customers. Panelist Kelli Hill agreed with the importance of making data visible, adding that Khan Academy proactively provides users with monthly usage reports.

In addition to providing helpful advice for edtech sales and marketing teams, the session also addressed a pervasive misconception that all it takes is “one good study” to validate and prove the effectiveness of a program. A company could commission one rigorous randomized trial reporting positive results and obtaining endorsement from the What Works Clearinghouse, but that study might be outdated, and more importantly, not relevant to what schools and districts are looking for. Panelist Shawn Mahoney, Chief Academic Officer of McGraw-Hill Education, affirmed that school districts are interested in “super contextualized research” and look for recent and multiple studies when evaluating a product. Q&A discussions with the panelists revealed that school decision makers are quick to claim “what works for someone else might not work for us”, supporting the notion that the conduct of multiple research studies, reporting effects for various subgroups and populations of students, is much more useful and reflective of district needs.

SIIA’s gathering proved to be a fruitful event, allowing us to reconnect with old colleagues and meet new ones, and leaving us with a number of useful insights and optimistic possibilities for new directions in research.

2018-06-22

AERA 2018 Recap: The Possibilities and Necessity of a Rigorous Education Research Community

This year’s AERA annual meeting on “The Dreams, Possibilities, and Necessity of Public Education,” was fittingly held in the city with the largest number of public school students in the country—New York. Against this radically diverse backdrop, presenters were encouraged to diversify both the format and topics of presentations in order to inspire thinking and “confront the struggles for public education.”

AERA’s sheer size may risk overwhelming its attendees, but in other ways, it came as a relief. At a time when educators and education remain under-resourced, it was heartening to be reminded that a large, vibrant community of dedicated and intelligent people exists to improve educational opportunities for all students.

One theme that particularly stood out is that researchers are finding increasingly creative ways to use existing usage data from education technology products to measure impact and implementation. This is a good thing when it comes to reducing the cost of research and making it more accessible to smaller businesses and nonprofits. For example, in a presentation on a software-based knowledge competition for nursing students, researchers used usage data to identify components of player styles and determine whether these styles had a significant effect on student performance. In our Edtech Research Guidelines, Empirical similarly recommends that edtech companies take advantage of their existing usage data to run impact and implementation analyses, without using more expensive data collection methods. This can help significantly reduce the cost of research studies—rather than one study that costs $3 million, companies can consider multiple lower-cost studies that leverage usage data and give the company a picture of how the product performs in a greater diversity of contexts.

Empirical staff themselves presented on a variety of topics, including quasi-experiments on edtech products; teacher recruitment, evaluation, and retention; and long-term impact evaluations. In all cases, Empirical reinforced its commitment to innovative, low-cost, and rigorous research. You can read more about the research projects we presented in our previous AERA post.

photo of Denis Newman presenting at AERA 2018

Finally, Empirical was delighted to co-host the Division H AERA Reception at the Supernova bar at Novotel Hotel. If you ever wondered if Empirical knows how to throw a party, wonder no more! A few pictures from the event are below. View all of the pictures from our event on Facebook!


We had a great time and look forward to seeing everyone at the next AERA annual meeting!

2018-05-03

Where's Denis?

It’s been a busy month for Empirical CEO Denis Newman, who’s been absent from our Palo Alto office as he jet-sets around the country to spread the good word of rigorous evidence in education research.

His first stop was Washington, DC and the conference of the Society for Research on Educational Effectiveness (SREE). This was an opportunity to get together with collaborators, as well as plot proposal writing, blog postings, webinars, and revisions to our research guidelines for edtech impact studies. Andrew Jaciw, Empirical’s Chief Scientist, kept up the company’s methodological reputation with a paper presentation on “Leveraging Fidelity Data to Make Sense of Impact Results.” For Denis, a highlight was dinner with Peg Griffin, a longtime friend and his co-author on The Construction Zone. Then it was on to Austin, TX, for a very different kind of meeting—more of a festival, really.

At this year’s SXSWEDU, Denis was one of three speakers on the panel, “Can Evidence Even Keep Up with Edtech?” The problem presented by the panel was that edtech, as a rapidly moving field, seems to be outpacing the rate of research that stakeholders may want to use to evaluate these products. How, then, could education stakeholders make informed decisions about whether to use edtech products?

According to Denis, the most important thing is for a district to have enough information to know whether a given edtech product may or may not work for that district’s unique population and context. Therefore, researchers may need to adapt their methods both to differentiate a product’s impact between subgroups and to meet the faster timelines of edtech product development. Empirical’s own solution to this quandary, Evidence as a Service™, offers quick-turnaround research reports that can examine impact and outcomes for specific student subgroups, with methodology that is flexible but rigorous enough to meet ESSA standards.

Denis praised the panel, stating, “In the festival’s spirit of invention, our moderator, Mitch Weisberg, masterfully engaged the audience from the beginning to pose the questions for the panel. Great questions, too. I got to cover all of my prepared talking points!”

You can read more coverage of our SXSWEDU panel on EdSurge.

After the panel, a string of meetings and parties kept the energy high and continued to show the growing interest in efficacy. The ISTE meetup was particularly important in following this theme. The concern raised by the ISTE leadership and its members, who are school-based technology users, was that traditional research doesn’t tell practitioners whether a product is likely to work in their school, given its resources and student demographics. Users are faced with hundreds of choices in any product category and have little information for narrowing down the choice to a few that are worth piloting.

Following SXSWEDU, it was back to DC for the Consortium for School Networking (CoSN) conference. Denis participated in the annual Feedback Forum hosted by CoSN and the Software & Information Industry Association (SIIA), where SIIA—representing edtech developers—looked for feedback from the CIOs and other school district leaders. This year, SIIA was looking for feedback that would help the Empirical team improve the edtech research guidelines, which are sponsored by SIIA’s Education Technology Industry Network (ETIN). Linda Winter moderated and ran the session like a focus group, asking questions such as:

  • What data do you need from products to gauge engagement?
  • How can the relationship of engagement and achievement indicate that a product is working?
  • What is the role of pilots in measuring success?
  • And before a pilot decision is made, what do CoSN members need to know about edtech products to decide if they are likely to work?

The CoSN members were brutally honest, pointing out that as the leaders responsible for the infrastructure, they were concerned with implementability, bandwidth requirements, and standards such as single sign-on. Whether the software improved learning was secondary—if teachers couldn’t get the program to work, it hardly mattered how effective it may be in other districts.

Now, Denis is preparing for the rest of the spring conference season. Next stop will be New York City and the American Educational Research Association (AERA) conference, which attracts over 20,000 researchers annually. The Empirical team will be presenting four studies, as well as co-hosting a cocktail reception with AERA’s school research division. Then, it’s back on the plane for ASU-GSV in San Diego.

For more information about Evidence as a Service, the edtech research guidelines, or to invite Denis to speak at your event, please email rmeans@empiricaleducation.com

2018-03-26

Join Our Webinar: Measuring Ed Tech impact in the ESSA Era

Tuesday, November 7, 2017 … 2:00 - 3:00pm PT

Our CEO, Denis Newman, will be collaborating with Andrew Coulson (Chief Strategist, MIND Research Institute) and Bridget Foster (Senior VP and Managing Director, SIIA) to bring you an informative webinar next month!

This free webinar (co-hosted by edWeb.net and MCH Strategic Data) will introduce you to a new approach to evidence about which edtech products really work in K-12 schools. ESSA has changed the game when it comes to what counts as evidence. This webinar builds on the Education Technology Industry Network’s (ETIN) recent publication of Guidelines for EdTech Impact Research, which explains the new ground rules.

The presentation will explore how we can improve the conversation between edtech developers and vendors (providers), and the school district decision makers who are buying and/or piloting the products (buyers). ESSA has provided a more user-friendly definition of evidence, which facilitates the conversation.

  • Many buyers are asking providers if there’s reason to think their product is likely to work in a district like theirs.
  • For providers, the new ESSA rules let them start with simple studies to show their product shows promise without having to invest in expensive trials to prove it will work everywhere.

The presentation brings together two experts: Andrew Coulson, a developer who has conducted research on their products and is concerned with improving the efficacy of edtech, and Denis Newman, a researcher who is the lead author of the ETIN Guidelines. The presentation will be moderated by Bridget Foster, a long-time educator who now directs the ETIN at SIIA. This edWebinar will be of interest to edtech developers, school and district administrators, education policy makers, association leaders, and any educator interested in the evidence of efficacy in edtech.

If you would like to attend, click here to register.

2017-09-28

Academic Researchers Struggle with Research that is Too Expensive and Takes Too Long

I was in DC for an interesting meeting a couple weeks ago. The “EdTech Efficacy Research Academic Symposium” was very much an academic symposium.

The Jefferson Education Accelerator (out of the University of Virginia school of education) and Digital Promise (an organization that invents ways for school districts to make interesting use of edtech products and concepts) sponsored the get-together. About 32% of the approximately 260 invited attendees were from universities or research organizations that conduct academic style research. About 16% represented funding or investment organizations and agencies, and another 20% were from companies that produce edtech (often being funded by the funders). 6% were school practitioners and, as would be expected at a DC event, about 26% were from associations and the media.

I represented a research organization with a lot of experience evaluating commercial edtech products. While in the midst of writing research guidelines for the software industry, i.e., the Software & Information Industry Association (SIIA), I felt a bit like an anthropologist among the predominantly academic crowd. I was listening to the language and trying to discern thinking patterns of professors and researchers, both federally- and foundation-funded. A fundamental belief is that real efficacy research is expensive (in the millions of dollars) and slow (a minimum of several years for a research report). A few voices said the cost could be lowered, especially for a school-district-initiated pilot, but the going rate for a simple study, according to discussions at the meeting, starts at $250,000. Given a recent estimate of 4,000 edtech products (and assuming that new products and versions of existing products are being released at an accelerating rate), the annual cost of evaluating all edtech products would be around $1 billion, an amount unlikely to be supported in the current school funding climate.
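
The $1 billion figure follows from the two numbers quoted above. Here is a back-of-the-envelope sketch, which assumes one simple study per product per year (an assumption added for illustration; the variable names are also illustrative):

```python
# Figures quoted in the post.
cost_per_study = 250_000  # dollars, the quoted going rate for a simple study
products = 4_000          # recent estimate of the number of edtech products

# Assumption for this sketch: one simple study per product per year.
annual_cost = cost_per_study * products

print(f"${annual_cost:,}")
```

Because $250,000 is described as the starting rate, this reads as a floor on the annual cost under that assumption, not a precise estimate.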

Does efficacy research need to be that expensive and slow given the widespread data collection by schools, widely available datasets, and powerful computing capabilities? Academic research is expensive for several reasons. There is little incentive for research providers to lower costs. Federal agencies offer large contracts to attract the large research organizations with experience and high overhead rates. Other funders are willing to pay top dollar for the prestige of such organizations. University grant writers aim to support a whole scientific research program and need to support grad students and generally conduct unique studies that will be attractive to journals. In conventional practice, each study is a custom product. Automating repeatable processes is not part of the culture. Actually, there is an odd culture clash between the academic researchers and the edtech companies needing their services.

Empirical Education is now working with Reach Capital and their portfolio to develop an approach for edtech companies and their investors to get low-cost evidence of efficacy. We are also getting our recommendations down in the form of guidelines for edtech companies to get usable evidence. The document is expected to be released at SIIA’s Education Impact Symposium in July.

2017-05-30

Importance is Important for Rules of Evidence Proposed for ED Grant Programs

The U.S. Department of Education recently proposed new rules for including serious evaluations as part of its grant programs. The approach is modeled on how evaluations are used in the Investing in Innovation (i3) program where the proposal must show there’s some evidence that the proposed innovation has a chance of working and scaling and must include an evaluation that will add to a growing body of evidence about the innovation. We like this approach because it treats previous research as a hypothesis that the innovation may work in the new context. And each new grant is an opportunity to try the innovation in a new context, with improved approaches that warrant another check on effectiveness. But the proposed rules definitely had some weaknesses that were pointed out in the public comments available online. We hope ED heeds these suggestions.

Mark Schneiderman, representing the Software and Information Industry Association (SIIA), recommends that outcomes used in effectiveness studies should not be limited to achievement scores.

SIIA notes that grant program resources could appropriately address a range of purposes from instructional to administrative, from assessment to professional development, and from data warehousing to systems productivity. The measures could therefore include such outcomes as student test scores, teacher retention rates, changes in classroom practice or efficiency, availability and use of data or other student/teacher/school outcomes, and cost effectiveness and efficiency that can be observed and measured. Many of these outcome measures can also be viewed as intermediate outcomes—changes in practice that, as demonstrated by other research, are likely to affect other final outcomes.

He also points out that quality of implementation and the nature of the comparison group can be the deciding factors in whether or not a program is found to be effective.

SIIA notes that in education there is seldom a pure control condition such as can be achieved in a medical trial with a placebo or sugar pill. Evaluations of education products and services resemble comparative effectiveness trials in which a new medication is tested against a currently approved one to determine whether it is significantly better. The same product may therefore prove effective in one district that currently has a weak program but relatively less effective in another where a strong program is in place. As a result, significant effects can often be difficult to discern.

This point gets to the heart of the contextual issues in any experimental evaluation. Without understanding the local conditions of the experiment the size of the impact for any other context cannot be anticipated. Some experimentalists would argue that a massive multi-site trial would allow averaging across many contextual variations. But such “on average” results won’t necessarily help the decision-maker working in specific local conditions. Thus, taking previous results as a rough indication that an innovation is worth trying is the first step before conducting the grant-funded evaluation of a new variation of the innovation under new conditions.

Jon Baron, writing for the Coalition for Evidence-Based Policy, expresses a fundamental concern about what counts as evidence. Jon, who is a former Chair of the National Board for Education Sciences and has been a prominent advocate for basing policy on rigorous research, suggests that

“the definition of ‘strong evidence of effectiveness’ in §77.1 incorporate the Investing in Innovation Fund’s (i3) requirement for effects that are ‘substantial and important’ and not just statistically significant.”

He cites examples where researchers have reported statistically significant results, which were based on trivial outcomes or had impacts so small as to have no practical value. Including “substantial and important” as additional criteria also captures the SIIA’s point that it is not sufficient to consider the internal validity of the study—policy makers must consider whether the measure used is an important one or whether the treatment-control contrast allows for detecting a substantial impact.

Addressing the substance and importance of the results gets us appropriately into questions of external validity, and leads us to questions about subgroup impact, where, for example, an innovation has a positive impact “on average” and works well for high scoring students but provides no value for low scoring students. We would argue that a positive average impact is not the most important part of the picture if the end result is an increase in a policy-relevant achievement gap. Should ED be providing grants for innovations where there has been a substantial indication that a gap is worsened? Probably yes, but only if the proposed development is aimed at fixing the malfunctioning innovation and if the program evaluation can address this differential impact.

2013-03-17

Final Report Released on The Efficacy of PCI’s Reading Program

Empirical has released the final report of a three-year longitudinal study on the efficacy of the PCI Reading Program, which can be found on our reports page. This study, the first formal assessment of the PCI Reading Program, evaluated the program among a sample of third- through eighth-grade students with supported-level disabilities in Florida’s Brevard Public Schools and Miami-Dade County Public Schools. The primary goal of the study was to identify whether the program could achieve its intended purpose of teaching specific sight words. The study was completed in three “phases,” or school years. The results from Phases 1 and 2 showed a significant positive effect on student sight word achievement, and Phase 2 supported the initial expectation that two years of growth would be greater than one year (read more on results of Phase 1 and Phase 2).

“Working with Empirical Education was a win for us on many fronts. Their research was of the highest quality and has really helped us communicate with our customers through their several reports and conference presentations. They went beyond just outcomes to show how teachers put our reading program to use in classrooms. In all their dealings with PCI and with the school systems they were highly professional and we look forward to future research partnership opportunities.” - Lee Wilson, President & CEO, PCI Educational Publishing

In Phase 3, the remaining sample of students was too small to conduct any impact analyses, so researchers investigated patterns in students’ progress through the program. The general findings were positive in that the exploration confirmed that students continue to learn more sight words with a second year of exposure to PCI although at a slower pace than expected by the developers. Furthermore, findings across all three phases show high levels of teacher satisfaction with the program. Along with this positive outcome, teacher-reported student engagement levels were also high.

2011-12-09