blog posts and news stories

Academic Researchers Struggle with Research that is Too Expensive and Takes Too Long

I was in DC for an interesting meeting a couple weeks ago. The “EdTech Efficacy Research Academic Symposium” was very much an academic symposium.

The Jefferson Education Accelerator—out of the University of Virginia school of education—and Digital Promise—an organization that invents ways for school districts to make interesting use of edtech products and concepts—sponsored the get together. About 32% of the approximately 260 invited attendees were from universities or research organizations that conduct academic style research. About 16% represented funding or investment organizations and agencies, and another 20% were from companies that produce edtech (often being funded by the funders). 6% were school practitioners and, as would be expected at a DC event, about 26% were from associations and the media.

I represented a research organization with a lot of experience evaluating commercial edtech products. While in the midst of writing research guidelines for the software industry, i.e., the Software & Information Industry Association (SIIA), I felt a bit like an anthropologist among the predominantly academic crowd. I was listening to the language and trying to discern thinking patterns of professors and researchers, both federally- and foundation-funded. A fundamental belief is that real efficacy research is expensive (in the millions of dollars) and slow (a minimum of several years for a research report). A few voices said the cost could be lowered, especially for a school-district-initiated pilot, but the going rate—according to discussions at the meeting—for a simple study starts at $250,000. Given a recent estimate of 4,000 edtech products, (and assuming that new products and versions of existing products are being released at an accelerating rate), the annual cost of evaluating all edtech products would be around $1 billion—an amount unlikely to be supported in the current school funding climate.

Does efficacy research need to be that expensive and slow given the widespread data collection by schools, widely available datasets, and powerful computing capabilities? Academic research is expensive for several reasons. There is little incentive for research providers to lower costs. Federal agencies offer large contracts to attract the large research organizations with experience and high overhead rates. Other funders are willing to pay top dollar for the prestige of such organizations. University grant writers aim to support a whole scientific research program and need to support grad students and generally conduct unique studies that will be attractive to journals. In conventional practice, each study is a custom product. Automating repeatable processes is not part of the culture. Actually, there is an odd culture clash between the academic researchers and the edtech companies needing their services.

Empirical Education is now working with Reach Capital and their portfolio to develop an approach for edtech companies and their investors to get low-cost evidence of efficacy. We are also getting our recommendations down in the form of guidelines for edtech companies to get usable evidence. The document is expected to be released at SIIA’s Education Impact Symposium in July.


Carnegie Summit 2017 Recap

If you’ve never been to Carnegie Summit, we highly recommend it.

This was our first year attending Carnegie Foundation’s annual conference in San Francisco, and we only wish we had checked it out sooner. Chief Scientist Andrew Jaciw attended on behalf of Empirical Education, and he took over our twitter account for the duration of the event. Below is a recap of his live tweeting, interspersed with additional thoughts too verbose for twitter’s strict character limitations.

Day 1

Curious about what I will learn. On my mind: Tony Bryk’s distinction between evidence-based practice and practice-based evidence. I am also thinking of how the approaches to be discussed connect to ideas of Lee Cronbach - he was very interested in timeliness and relevance of research findings and the limited reach of internal validity.

I enjoyed T. Bryk’s talk. These points resonated.

Improvement Science involves a hands-on approach to identifying systemic sources of predictable failure. This is appealing because it puts problem solving at the core, while realizing the context-specificity of what will actually work!

Day 2

Jared Bolte - Great talk! Improvement Science contrasts with traditional efficacy research by jumping right in to solve problems, instead of waiting. This raises an important question: What is the cost of delaying action to wait for efficacy findings? I am reminded of Lee Cronbach’s point: the half-life of empirical propositions is short!

This was an excellent session with Tony Bryk and John Easton. There were three important questions posed.

Day 3

Excited to Learn about PDSA cycles


Presenting at AERA 2017

We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in San Antonio, TX from April 27 – 30, 2017.

Research Presentations will include the following.

Increasing Accessibility of Professional Development (PD): Evaluation of an Online PD for High School Science Teachers
Authors: Adam Schellinger, Andrew P Jaciw, Jenna Lynn Zacamy, Megan Toby, & Li Lin
In Event: Promoting and Measuring STEM Learning
Saturday, April 29 10:35am to 12:05pm
Henry B. Gonzalez Convention Center, River Level, Room 7C

Abstract: This study examines the impact of an online teacher professional development, focused on academic literacy in high school science classes. A one-year randomized control trial measured the impact of Internet-Based Reading Apprenticeship Improving Science Education (iRAISE) on instructional practices and student literacy achievement in 27 schools in Michigan and Pennsylvania. Researchers found a differential impact of iRAISE favoring students with lower incoming achievement (although there was no overall impact of iRAISE on student achievement). Additionally, there were positive impacts on several instructional practices. These findings are consistent with the specific goals of iRAISE: to provide high-quality, accessible online training that improves science teaching. Authors compare these results to previous evaluations of the same intervention delivered through a face-to-face format.

How Teacher Practices Illuminate Differences in Program Impact in Biology and Humanities Classrooms
Authors: Denis Newman, Val Lazarev, Andrew P Jaciw, & Li Lin
In Event: Poster Session 5 - Program Evaluation With a Purpose: Creating Equal Opportunities for Learning in Schools
Friday, April 28 12:25 to 1:55pm
Henry B. Gonzalez Convention Center, Street Level, Stars at Night Ballroom 4

Abstract: This paper reports research to explain the positive impact in a major RCT for students in the classrooms of a subgroup of teachers. Our goal was to understand why there was an impact for science teachers but not for teachers of humanities, i.e., history and English. We have labelled our analysis “moderated mediation” because we start with the finding that the program’s success was moderated by the subject taught by the teacher and then go on to look at the differences in mediation processes depending on the subject being taught. We find that program impact teacher practices differ by mediator (as measured in surveys and observations) and that mediators are differentially associated with student impact based on context.

Are Large-Scale Randomized Controlled Trials Useful for Understanding the Process of Scaling Up?
Authors: Denis Newman, Val Lazarev, Jenna Lynn Zacamy, & Li Lin
In Event: Poster Session 3 - Applied Research in School: Education Policy and School Context
Thursday, April 27 4:05 to 5:35pm
Henry B. Gonzalez Convention Center, Ballroom Level, Hemisfair Ballroom 2

Abstract: This paper reports a large scale program evaluation that included an RCT and a parallel study of 167 schools outside the RCT that provided an opportunity for the study of the growth of a program and compare the two contexts. Teachers in both contexts were surveyed and a large subset of the questions are asked of both scale-up teachers and teachers in the treatment schools of the RCT. We find large differences in the level of commitment to program success in the school. Far less was found in the RCT suggesting that a large scale RCT may not be capturing the processes at play in the scale up of a program.

We look forward to seeing you at our sessions to discuss our research. You can also view our presentation schedule here.


SREE Spring 2017 Conference Recap

Several Empirical Education team members attended the annual SREE conference in Washington, DC from March 4th - 5th. This year’s conference theme, “Expanding the Toolkit: Maximizing Relevance, Effectiveness and Rigor in Education Research,” included a variety of sessions focused on partnerships between researchers and practitioners, classroom instruction, education policy, social and emotional learning, education and life cycle transitions, and research methods. Andrew Jaciw, Chief Scientist at Empirical Education, chaired a session about Advances in Quasi-Experimental Design. Jaciw also presented a poster on developing a “systems check” for efficacy studies under development. For more information on this diagnostic approach to evaluation, watch this Facebook Live video of Andrew’s discussion of the topic.

Other highlights of the conference included Sean Reardon’s keynote address highlighting uses of “big data” in creating context and generating hypotheses in education research. Based on data from the Stanford Education Data Archive (SEDA), Sean shared several striking patterns of variation in achievement and achievement gaps among districts across the country, as well as correlations between achievement gaps and socioeconomic status. Sean challenged the audience to consider how to expand this work and use this kind of “big data” to address critical questions about inequality in academic performance and education attainment. The day prior to the lecture, our CEO, Denis Newman, attended a workshop lead by Sean and colleagues that provided a detailed overview of the SEDA data and how it can be used in education research. The psychometric work to generate equivalent scores for every district in the country, the basis for his findings, was impressive and we look forward to their solving the daunting problem of extending the database to encompass individual schools.


Presentation at the 2016 Learning Forward Annual Conference

Learning Forward announced that our proposal was accepted for the 2016 annual conference being held in Vancouver, Canada this year. Teacher Evaluation Specialist K.C. MacQueen will join Fort Wayne Community Schools’ (FWCS) Todd Cummings and Learning Forward’s Kay Psencik in presenting “Principals Collaborating to Deepen Understanding of High-Quality Instruction.” They will highlight how FWCS is engaged in a process to ensure equitable evaluation of teacher effectiveness using Observation Engine™. If you or someone you know is attending the annual conference in December 2016, here are the details of the presentation.

  • Day/time: Tuesday, December 6, 2016 from 10AM-Noon
  • Session: I 15

SREE Spring 2016 Conference Presentations

We are excited to be presenting two topics at the annual Spring Conference of The Society for Research on Educational Effectiveness (SREE) next week. Our first presentation addresses the problem of using multiple pieces of evidence to support decisions. Our second presentation compares the context of an RCT with schools implementing the same program without those constraints. If you’re at SREE, we hope to run into you, either at one of these presentations (details below) or at one of yours.

Friday, March 4, 2016 from 3:30 - 5PM
Roosevelt (“TR”) - Ritz-Carlton Hotel, Ballroom Level

6E. Evaluating Educational Policies and Programs
Evidence-Based Decision-Making and Continuous Improvement

Chair: Robin Wisniewski, RTI International

Does “What Works”, Work for Me?: Translating Causal Impact Findings from Multiple RCTs of a Program to Support Decision-Making
Andrew P. Jaciw, Denis Newman, Val Lazarev, & Boya Ma, Empirical Education

Saturday, March 5, 2016 from 10AM - 12PM
Culpeper - Fairmont Hotel, Ballroom Level

Session 8F: Evaluating Educational Policies and Programs & International Perspectives on Educational Effectiveness
The Challenge of Scale: Evidence from Charters, Vouchers, and i3

Chair: Ash Vasudeva, Bill & Melinda Gates Foundation

Comparing a Program Implemented under the Constraints of an RCT and in the Wild
Denis Newman, Valeriy Lazarev, & Jenna Zacamy, Empirical Education


Learning Forward Presentation Highlights Fort Wayne Partnership

This past December, Teacher Evaluation Specialist K.C. MacQueen presented at the annual Learning Forward conference. MacQueen presented alongside Fort Wayne Community Schools’ (FWCS) Todd Cummings and Laura Cain, and Learning Forward’s Kay Psencik. The presentation titled, “Implementing Inter-Rater Reliability in a Learning System,” highlighted how FWCS has used Calibration & Certification Engine (CCE), School Improvement Network’s branded version of Observation Engine™, to ensure equitable evaluation of teacher effectiveness. FWCS detailed the process they used to engage instructional leaders in developing a common rubric vocabulary around their existing teacher observation rubric. While an uncommon step and one that definitely added to the implementation timeline, FWCS prioritized this collaboration and found that it increased both inter-rater reliability and buy-in to the process with the ultimate goal of assisting teachers in improving classroom instruction to result in greater student growth.


Empirical Education Visits Chicago

We had such a great time in windy Chicago last month for the annual meeting of the American Educational Research Association (AERA). All of the presentations we attended were thought-provoking, and our presentations also seemed to be well-received.

The highlight of our trip, as always, was our annual reception. This year was our first time entertaining friends in a presidential suite, and the one at the Fairmont did not disappoint. Thanks to everyone who came and enjoyed an HLM, our signature cocktail. (The Hendricks Lemontwist Martini of course, what else would it stand for?)

Many of the pictures taken at our AERA reception can be found on facebook, but here is a sneak peek of some of our favorites.


Conference Season 2015

Empirical researchers are traveling all over the country this conference season. Come meet our researchers as we discuss our work at the following events. If you plan to attend any of these, please get in touch so we can schedule a time to speak with you, or come by to see us at our presentations.


We are pleased to announce that we will have our fifth appearance at the 40th annual conference of the Association for Education Finance and Policy (AEFP). Join us in the afternoon on Friday, February 27th at the Marriott Wardman Park, Washington DC as Empirical’s Senior Research Scientist Valeriy Lazarev and CEO Denis Newman present on Methods of Teacher Evaluation in Concurrent Session 7. Denis will also be the acting discussant and chair on Friday morning at 8am in Session 4.07 titled Preparation/Certification and Evaluation of Leaders/Teachers.


Attendees of this spring’s Society for Research on Effectiveness (SREE) Conference, held in Washington, DC March 5-7, will have the opportunity to discuss instructional strategies and programs to improve mathematics with Empirical Education’s Chief Scientist Andrew P. Jaciw. The presentation, Assessing Impacts of Math in Focus, a ‘Singapore Math’ Program for American Schools, will take place on Friday, March 6 at 1pm in the Park Hyatt Hotel, Ballroom Level Gallery 3.


This year’s 70th annual conference for ASCD will take place in Houston, TX on March 21-23. We invite you to schedule a meeting with CEO Denis Newman while he’s there.


We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in Chicago, Illinois from April 16-20, 2015. Our presentations will cover research under the Division H (Research, Evaluation, and Assessment in Schools) Section 2 symposium: Program Evaluation in Schools.

  1. Formative Evaluation on the Process of Scaling Up Reading Apprenticeship Authors: Jenna Lynn Zacamy, Megan Toby, Andrew P. Jaciw, and Denis Newman
  2. The Evaluation of Internet-based Reading Apprenticeship Improving Science Education (iRAISE) Authors: Megan Toby, Jenna Lynn Zacamy, Andrew P. Jaciw, and Denis Newman

We look forward to seeing you at our sessions to discuss our research. As soon as we have the schedule for these presentations, we will post them here. As has become tradition, we plan to host yet another of our popular AERA receptions. Details about the reception will follow in the months to come.


Getting Different Results from the Same Program in Different Contexts

The spring 2014 conference of the Society for Research in Educational Effectiveness (SREE) gave us much food for thought concerning the role of replication of experimental results in social science research. If two research teams get the same result from experiments on the same program, that gives us confidence that the original result was not a fluke or somehow biased.

But in his keynote, John Ioannidis of Stanford showed that even in medical research, where the context can be more tightly controlled, replication very often fails—researchers get different results. The original finding may have been biased, for example, through the tendency to suppress null findings where no positive effect was found and over-report large, but potentially spurious results. Replication of a result over the long run helps us to get past the biases. Though not as glamorous as discovery, replication is fundamental to science, and educational science is no exception.

In the course of the conference, I was reminded that the challenge to conducting replication work is, in a sense, compounded in social science research. “Effect heterogeneity”—finding different results in different contexts—is common for many legitimate reasons. For instance, experimental controls seldom get placebos. They receive the program already in place, often referred to as “business as usual,” and this can vary across experiments of the same intervention and contribute to different results. Also, experiments of the same program carried out in different contexts are likely to be adapted given demands or affordances of the situation, and flexible implementation may lead to different results. The challenge is to disentangle differences in effects that give insight into how programs are adapted in response to conditions, from bias in results that John Ioannidis considered. In other fields (e.g., the “hard sciences”), less context dependency and more-robust effects may make it easier to diagnose when variation in findings is illegitimate. In education, this may be more challenging and reminds me why educational research is in many ways the ‘hardest science’ of all, as David Berliner has emphasized in the past.

Once separated from distortions of bias and properly differentiated from the usual kind of “noise” or random error, differences in effects can actually be leveraged to better understand how and for whom programs work. Building systematic differences in conditions into our research designs can be revealing. Such efforts should, however, be considered with the role of replication in mind—an approach to research that purposively builds in heterogeneity, in a sense, seeks to find where impacts don’t replicate, but for good reason. Non-reproducibility in this case is not haphazard, it is purposive.

What are some approaches to leveraging and understanding effect heterogeneity? We envision randomized trials where heterogeneity is built into the design by comparing different versions of a program or implementing in diverse settings across which program effects are hypothesized to vary. A planning phase of an RCT would allow discussions with experts and stakeholders about potential drivers of heterogeneity. Pertinent questions to address during this period include: what are the attributes of participants and settings across which we expect effects to vary and why? Under which conditions and how do we expect program implementation to change? Hypothesizing which factors will moderate effects before the experiment is conducted would add credibility to results if they corroborate the theory. A thoughtful approach of this sort can be contrasted with the usual approach whereby differential effects of program are explored as afterthoughts, with the results carrying little weight.

Building in conditions for understanding effect heterogeneity will have implications for experimental design. Increasing variation in outcomes affects statistical power and the sensitivity of designs to detect effects. We will need a better understanding of the parameters affecting precision of estimates. At Empirical, we have started using results from several of our experiments to explore parameters affecting sensitivity of tests for detecting differential impact. For example, we have been documenting the variation across schools in differences in performance depending on student characteristics such as individual SES, gender, and LEP status. This variation determines how precisely we are able to estimate the average difference between student subgroups in the impact of a program.

Some may feel that introducing heterogeneity to better understand conditions for observing program effects is going down a slippery slope. Their thinking is that it is better to focus on program impacts averaged across the study population and to replicate those effects across conditions; and that building sources of variation into the design may lead to loose interpretations and loss of rigor in design and analysis. We appreciate the cautionary element of this position. However, we believe that a systematic study of how a program interacts with conditions can be done in a disciplined way without giving up the usual strategies for ensuring the validity of results.

We are excited about the possibility that education research is entering a period of disciplined scientific inquiry to better understand how differences in students, contexts, and programs interact, with the hope that the resulting work will lead to greater opportunity and better fit of program solutions to individuals.