
Which Came First: The Journal or the Conference?

You may have heard of APPAM, but do you really know what they do? Do they organize an annual conference? Do they publish a journal? Yes, they do all that and more!

APPAM stands for the Association for Public Policy Analysis and Management. APPAM is dedicated to improving public policy and management by fostering excellence in research, analysis, and education. The first APPAM Fall Research Conference occurred in 1979 in Chicago. The first issue of the Journal of Policy Analysis and Management appeared in 1981.

Why are we talking about APPAM now? While we’ve attended the APPAM conference in years past, this year’s conference presents a unique opportunity for us. Our Chief Scientist, Andrew Jaciw, is serving as guest editor of a special issue of Evaluation Review on multi-armed randomized experiments. As part of this effort, and to encourage discussion of the topic, he proposed three panels, all of which were accepted at APPAM.

Andrew will chair the first panel titled Information Benefits and Statistical Challenges of Complex Multi-Armed Trials: Innovative Designs for Nuanced Questions.

In the second panel, Andrew will be presenting a paper that he co-wrote with Senior Research Manager Thanh Nguyen titled Using Multi-Armed Experiments to Test “Improvement Versions” of Programs: When Beneficence Matters. This presentation will take place on Friday, November 9, 2018 at 9:30am (in Marriott Wardman Park, Marriott Balcony B - Mezz Level).

In the third panel he submitted, Larry Orr, Joe Newhouse, and Judith Gueron (with Becca Maynard as discussant) will provide an important retrospective. As pioneers of social science experiments, the panelists will share their experiences and important lessons learned.

Some of these panelists will also be submitting their papers to the special issue of Evaluation Review. We will update this blog with a link to the journal issue once it has been published.

2018-08-21

How Are Edtech Companies Thinking About Data and Research?

Forces of the rebellion were actively at work at SIIA’s Annual Conference last week in San Francisco. Snippets of conversation revealed a common theme of harnessing and leveraging data in order to better understand and serve the needs of schools and districts.

This theme was explored in depth during one panel session, “Efficacy and Research: Why It Matters So Much in the Education Market”, where edtech executives discussed the phases and roles of research as it relates to product improvement and marketing. The session, moderated by Pearson’s Gary Mainor, featured panelists Andrew Coulson of the MIND Research Institute, Kelli Hill of Khan Academy, and Shawn Mahoney of McGraw-Hill Education.

Coulson, who was one of the contributing reviewers of our Research Guidelines, stated that all signs point to an “exponential increase” in school district customers asking for usage data. He advised fellow edtech entrepreneurs to start paying attention to fine-grained usage data, as providing it to customers is becoming a necessity. Panelist Kelli Hill agreed about the importance of making data visible, adding that Khan Academy proactively provides users with monthly usage reports.

In addition to providing helpful advice for edtech sales and marketing teams, the session also addressed a pervasive misconception that all it takes is “one good study” to validate and prove the effectiveness of a program. A company could commission one rigorous randomized trial that reports positive results and earns an endorsement from the What Works Clearinghouse, but that study might be outdated and, more importantly, not relevant to what schools and districts are looking for. Panelist Shawn Mahoney, Chief Academic Officer of McGraw-Hill Education, affirmed that school districts are interested in “super contextualized research” and look for recent and multiple studies when evaluating a product. Q&A discussions with the panelists revealed that school decision makers are quick to claim that “what works for someone else might not work for us”, supporting the notion that multiple research studies, reporting effects for various subgroups and populations of students, are much more useful and reflective of district needs.

SIIA’s gathering proved to be a fruitful event, allowing us to reconnect with old colleagues and meet new ones, and leaving us with a number of useful insights and optimistic possibilities for new directions in research.

2018-06-22

AERA 2018 Recap: The Possibilities and Necessity of a Rigorous Education Research Community

This year’s AERA annual meeting on “The Dreams, Possibilities, and Necessity of Public Education,” was fittingly held in the city with the largest number of public school students in the country—New York. Against this radically diverse backdrop, presenters were encouraged to diversify both the format and topics of presentations in order to inspire thinking and “confront the struggles for public education.”

AERA’s sheer size can overwhelm attendees, but in other ways the meeting came as a relief. At a time when educators and education remain under-resourced, it was heartening to be reminded that a large, vibrant community of dedicated and intelligent people exists to improve educational opportunities for all students.

One theme that particularly stood out is that researchers are finding increasingly creative ways to use existing usage data from education technology products to measure impact and implementation. This is a good thing when it comes to reducing the cost of research and making it more accessible to smaller businesses and nonprofits. For example, in a presentation on a software-based knowledge competition for nursing students, researchers used usage data to identify components of player styles and determine whether these styles had a significant effect on student performance. In our Edtech Research Guidelines, Empirical similarly recommends that edtech companies take advantage of their existing usage data to run impact and implementation analyses, without using more expensive data collection methods. This can help significantly reduce the cost of research studies—rather than one study that costs $3 million, companies can consider multiple lower-cost studies that leverage usage data and give the company a picture of how the product performs in a greater diversity of contexts.
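As a rough illustration of what a low-cost, usage-data-based impact analysis can look like, the sketch below joins product usage logs to student outcomes and fits a simple dosage model. The file names, column names, and model are hypothetical assumptions for illustration, not details of any study mentioned above.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical files: platform usage logs and a student outcome file.
usage = pd.read_csv("usage_logs.csv")    # columns: student_id, minutes_used
outcomes = pd.read_csv("outcomes.csv")   # columns: student_id, pretest, posttest

# Total usage per student, merged onto the outcome file.
per_student = usage.groupby("student_id", as_index=False)["minutes_used"].sum()
data = outcomes.merge(per_student, on="student_id", how="left").fillna({"minutes_used": 0})

# Simple dosage model: posttest score as a function of usage, adjusting for pretest.
model = smf.ols("posttest ~ minutes_used + pretest", data=data).fit()
print(model.summary())
```

Because analyses like this reuse data the product already collects, they can be repeated across many districts at a fraction of the cost of a new data-collection effort.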

Empirical staff themselves presented on a variety of topics, including quasi-experiments on edtech products; teacher recruitment, evaluation, and retention; and long-term impact evaluations. In all cases, Empirical reinforced its commitment to innovative, low-cost, and rigorous research. You can read more about the research projects we presented in our previous AERA post.

photo of Denis Newman presenting at AERA 2018

Finally, Empirical was delighted to co-host the Division H AERA Reception at the Supernova bar at the Novotel Hotel. If you ever wondered whether Empirical knows how to throw a party, wonder no more! A few pictures from the event are below. View all of the pictures from our event on Facebook!


We had a great time and look forward to seeing everyone at the next AERA annual meeting!

2018-05-03

Where's Denis?

It’s been a busy month for Empirical CEO Denis Newman, who has been conspicuously absent from our Palo Alto office as he jet-sets around the country to spread the good word of rigorous evidence in education research.

His first stop was Washington, DC and the conference of the Society for Research on Educational Effectiveness (SREE). This was an opportunity to get together with collaborators, as well as plot proposal writing, blog postings, webinars, and revisions to our research guidelines for edtech impact studies. Andrew Jaciw, Empirical’s Chief Scientist, kept up the company’s methodological reputation with a paper presentation on “Leveraging Fidelity Data to Make Sense of Impact Results.” For Denis, a highlight was dinner with Peg Griffin, a longtime friend and his co-author on The Construction Zone. Then it was on to Austin, TX, for a very different kind of meeting—more of a festival, really.

At this year’s SXSWEDU, Denis was one of three speakers on the panel, “Can Evidence Even Keep Up with Edtech?” The problem presented by the panel was that edtech, as a rapidly moving field, seems to be outpacing the rate of research that stakeholders may want to use to evaluate these products. How, then, could education stakeholders make informed decisions about whether to use edtech products?

According to Denis, the most important thing is for a district to have enough information to know whether a given edtech product may or may not work for that district’s unique population and context. Therefore, researchers may need to adapt their methods, both to differentiate a product’s impact between subgroups and to meet the faster timelines of edtech product development. Empirical’s own solution to this quandary, Evidence as a Service™, offers quick-turnaround research reports that can examine impact and outcomes for specific student subgroups, with methodology that is flexible but rigorous enough to meet ESSA standards.
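To make the idea of differentiating impact between subgroups concrete, here is a minimal sketch of an interaction model. The data file, column names, and the choice of an English-learner indicator are hypothetical, and this is not a description of the Evidence as a Service methodology itself.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis file: one row per student.
# Columns: posttest, pretest, treated (0/1), ell (0/1 English learner flag).
df = pd.read_csv("study_data.csv")

# Interaction model: does the estimated impact differ for English learners?
model = smf.ols("posttest ~ treated * ell + pretest", data=df).fit()

# "treated" is the impact estimate for non-ELL students;
# "treated:ell" is how much the impact differs for ELL students.
print(model.params[["treated", "treated:ell"]])
```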

Denis praised the panel, stating, “In the festival’s spirit of invention, our moderator, Mitch Weisberg, masterfully engaged the audience from the beginning to pose the questions for the panel. Great questions, too. I got to cover all of my prepared talking points!”

You can read more coverage of our SXSWEDU panel on EdSurge.

After the panel, a string of meetings and parties kept the energy high and continued to show the growing interest in efficacy. The ISTE meetup was particularly important in this respect. The concern raised by the ISTE leadership and its members, who are school-based technology users, was that traditional research doesn’t tell practitioners whether a product is likely to work in their school, given its resources and student demographics. Users are faced with hundreds of choices in any product category and have little information for narrowing down the choice to a few that are worth piloting.

Following SXSWEDU, it was back to DC for the Consortium for School Networking (CoSN) conference. Denis participated in the annual Feedback Forum hosted by CoSN and the Software & Information Industry Association (SIIA), where SIIA—representing edtech developers—looked for feedback from the CIOs and other school district leaders. This year, SIIA was looking for feedback that would help the Empirical team improve the edtech research guidelines, which are sponsored by SIIA’s Education Technology Industry Network (ETIN). Linda Winter moderated and ran the session like a focus group, asking questions such as:

  • What data do you need from products to gauge engagement?
  • How can the relationship of engagement and achievement indicate that a product is working?
  • What is the role of pilots in measuring success?
  • And before a pilot decision is made, what do CoSN members need to know about edtech products to decide if they are likely to work?

The CoSN members were brutally honest, pointing out that as the leaders responsible for the infrastructure, they were concerned with implementability, bandwidth requirements, and standards such as single sign-on. Whether the software improved learning was secondary—if teachers couldn’t get the program to work, it hardly mattered how effective it may be in other districts.

Now, Denis is preparing for the rest of the spring conference season. Next stop will be New York City and the American Education Research Association (AERA) conference, which attracts over 20,000 researchers annually. The Empirical team will be presenting four studies, as well as co-hosting a cocktail reception with AERA’s school research division. Then, it’s back on the plane for ASU-GSV in San Diego.

For more information about Evidence as a Service, the edtech research guidelines, or to invite Denis to speak at your event, please email rmeans@empiricaleducation.com.

2018-03-26

Presenting at AERA 2018

We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in New York City from April 13-17, 2018.

Research presentations will include the following.

For Quasi-Experiments on EdTech Products, What Counts as Being Treated?
Authors: Val Lazarev, Denis Newman, & Malvika Bhagwat
In Roundtable Session: Examining the Impact of Accountability Systems on Both Teachers and Students
Friday, April 13 - 2:15 to 3:45pm
New York Marriott Marquis, Fifth Floor, Westside Ballroom Salon 3

Abstract: Edtech products are becoming increasingly prevalent in K-12 schools, and the need for schools to evaluate the value of these products for their students calls for a program of rigorous research, at least at Level 2 of the ESSA evidence standards. This paper draws on our experience conducting a large-scale quasi-experiment in California schools. The product’s wide-ranging intensity of implementation presented a challenge in identifying schools that had used it enough to be considered part of the treatment group.
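As a rough sketch of the kind of decision rule such a quasi-experiment might apply (the usage fields and the threshold below are hypothetical, not the criteria used in the study), schools can be classified as treated only when their aggregate usage clears a pre-specified bar:

```python
import pandas as pd

# Hypothetical student-level usage log: one row per student session.
usage = pd.read_csv("usage_log.csv")  # columns: school_id, student_id, session_minutes

# Aggregate usage to the school level.
school_usage = usage.groupby("school_id").agg(
    n_students=("student_id", "nunique"),
    n_sessions=("session_minutes", "size"),
    total_minutes=("session_minutes", "sum"),
)
school_usage["sessions_per_student"] = school_usage["n_sessions"] / school_usage["n_students"]

# Illustrative pre-specified threshold: a school counts as "treated" only if its
# students averaged at least 20 sessions over the school year.
MIN_SESSIONS_PER_STUDENT = 20
school_usage["treated"] = school_usage["sessions_per_student"] >= MIN_SESSIONS_PER_STUDENT

print(school_usage["treated"].value_counts())
```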


Planning Impact Evaluations Over the Long Term: The Art of Anticipating and Adapting
Authors: Andrew P Jaciw & Thanh Thi Nguyen
In Session: The Challenges and Successes of Conducting Large-Scale Educational Research
Saturday, April 14 - 2:15 to 3:45pm
Sheraton New York Times Square, Second Floor, Central Park East Room

Abstract: Perspective. It is good practice to identify core research questions and important elements of study designs a priori, to prevent post-hoc “fishing” exercises and reduce the risk of drawing false-positive conclusions [16,19]. However, programs in education, and evaluations of them, evolve [6], making it difficult to follow a charted course. For example, in the lifetime of a program and its evaluation, new curricular content or evidence standards for evaluations may be introduced and thus drive changes in program implementation and evaluation.

Objectives. This work presents three cases from program impact evaluations conducted through the Department of Education. In each case, unanticipated results or changes in study context had significant consequences for program recipients, developers and evaluators. We discuss responses, either enacted or envisioned, for addressing these challenges. The work is intended to serve as a practical guide for researchers and evaluators who encounter similar issues.

Methods/Data Sources/Results. The first case concerns the problem of outcome measures keeping pace with evolving content standards. For example, in assessing impacts of science programs, program developers and evaluators are challenged to find assessments that align with the Next Generation Science Standards (NGSS). Existing NGSS-aligned assessments are largely untested or in development, leaving the evaluator to find, adapt, or develop instruments with strong reliability, construct validity, and face validity – ones that will be accepted by independent review and not considered over-aligned to the interventions. We describe a hands-on approach to working with a state testing agency to develop forms to assess impacts on science generally, and on constructs more specifically aligned to the program evaluated.

The second case concerns the problem of reprioritizing research questions mid-study. As noted above, researchers often identify primary (confirmatory) research questions at the outset of a study. Such questions are held to high evidence standards and are differentiated from exploratory questions, which often originate after examining the data and must be replicated to be considered reliable [16]. However, exploratory analyses sometimes produce unanticipated results that may be highly consequential. The evaluator must grapple with the dilemma of whether to re-prioritize the result or attempt to proceed with replication. We discuss this issue with reference to an RCT in which the dilemma arose.

The third case addresses the problem of designing and implementing a study that meets one set of evidence standards when the results will be reviewed according to a later version of those standards. A practical question is what to do when this happens and the study consequently falls under a lower tier of the new evidence standard. With reference to an actual case, we consider several response options, including assessing the consequence of this reclassification for future funding of the program, and augmenting the research design to satisfy the new standards of evidence.

Significance. Responding to demands of changing contexts, programs in the social sciences are moving targets. They demand a flexible but well-reasoned and justified approach to evaluation. This session provides practical examples and is intended to promote discussion for generating solutions to challenges of this kind.


Indicators of Successful Teacher Recruitment and Retention in Oklahoma Rural Schools
Authors: Val Lazarev, Megan Toby, Jenna Lynn Zacamy, Denis Newman, & Li Lin
In Session: Teacher Effectiveness, Retention, and Coaching
Saturday, April 14 - 4:05 to 6:05pm
New York Marriott Marquis, Fifth Floor, Booth

Abstract: The purpose of this study was to identify factors associated with successful recruitment and retention of teachers in Oklahoma rural school districts, in order to highlight potential strategies to address Oklahoma’s teaching shortage. The study was designed to identify teacher-level, district-level, and community characteristics that predict which teachers are most likely to be successfully recruited and retained. A key finding is that for teachers in rural schools, total compensation and increased responsibilities in job assignment are positively associated with successful recruitment and retention. Evidence provided by this study can be used to inform incentive schemes to help retain certain groups of teachers and increase retention rates overall.


Teacher Evaluation Rubric Properties and Associations with School Characteristics: Evidence from the Texas Evaluation System
Authors: Val Lazarev, Thanh Thi Nguyen, Denis Newman, Jenna Lynn Zacamy, & Li Lin
In Session: Teacher Evaluation Under the Microscope
Tuesday, April 17 - 12:25 to 1:55pm
New York Marriott Marquis, Seventh Floor, Astor Ballroom

Abstract: A seminal 2009 report, The Widget Effect, alerted the nation to the tendency of traditional teacher evaluation systems to treat teachers like widgets, undifferentiated in their level of effectiveness. Since then, a growing body of research, coupled with new federal initiatives, has catalyzed the reform of such systems. In 2014-15, Texas piloted its reformed evaluation system, collecting classroom observation rubric ratings from over 8,000 teachers across 51 school districts. This study analyzed that large dataset and found that 26.5 percent of teachers were rated below proficient, compared to 2 percent under previous measures. The study also found a promising indication of low bias in the rubric ratings: school characteristics were only minimally associated with observation ratings.

We look forward to seeing you at our sessions to discuss our research. We’re also co-hosting a cocktail reception with Division H! If you’d like an invite, let us know.

2018-03-06

Spring 2018 Conference Season is Taking Shape


We’ll be on the road again this spring.

SREE

Andrew Jaciw and Denis Newman will be in Washington, DC for the annual spring conference of the Society for Research on Educational Effectiveness (SREE), the premier conference on rigorous research. Andrew Jaciw will present his paper, “Leveraging Fidelity Data to Make Sense of Impact Results: Informing Practice through Research.” His presentation will be part of Session 2I: Research Methods - Post-Random Assignment Models: Fidelity, Attrition, Mediation & More, from 8-10am on Thursday, March 1.

SXSW EDU

In March, Denis Newman will be attending SXSW EDU Conference & Festival in Austin, TX and presenting on a panel along with Malvika Bhagwat, Jason Palmer, and Karen Billings titled Can Evidence Even Keep Up with EdTech? This will address how researchers and companies can produce evidence that products work—in time for educators and administrators to make a knowledgeable buying decision under accelerating timelines.

AERA

Empirical staff will be presenting in 4 different sessions at the annual conference of the American Educational Research Association (AERA) in NYC in April, all under Division H (Research, Evaluation, and Assessment in Schools).

  1. For Quasi-Experiments on EdTech Products, What Counts as Being Treated?
  2. Teacher Evaluation Rubric Properties and Associations with School Characteristics: Evidence from the Texas Evaluation System
  3. Indicators of Successful Teacher Recruitment and Retention in Oklahoma Rural Schools
  4. The Challenges and Successes of Conducting Large-Scale Educational Research

In addition to these presentations, we are planning another of our celebrated receptions in NYC so stay tuned for details.

ISTE

A panel on our Research Guidelines has been accepted at this major convention, considered the epicenter of edtech with thousands of users and hundreds of companies, held this year in Chicago from June 24–27.

2017-12-18

APPAM doesn’t stand for A Pretty Pithy Abbreviated Meeting

APPAM does stand for excellence, critical thinking, and quality research.

The 2017 fall research conference kept reminding me of one recurrent theme: bridging the chasms between researchers, policymakers, and practitioners.

photo of program

Linear processes don’t work. Participatory research is critical!

Another hot topic was generalizability! There is a lot of work to be done here. What works? For whom? Why?

photo of city

Lots of food for thought!

photo of cake

2017-11-06

National Forum to Advance Rural Education 2017


We are participating in 2 discussions at the National Forum to Advance Rural Education, organized by Battelle for Kids on Thursday, October 12, 2017.

THURSDAY, OCTOBER 12 | 1:15–2:15pm
Quality Teachers in Rural Schools: Lessons Learned in Oklahoma
Join a discussion with the Regional Educational Laboratory Southwest (REL Southwest) and practitioners in the Oklahoma Rural Schools Research Alliance about their research focused on two areas of high need in rural schools: teacher recruitment and retention, and professional development. This informal discussion with the researchers and Oklahoma practitioners will focus on how you can use the information from these studies in your own state and school district.
Presenters:
Pia Peltola (REL Southwest, American Institutes for Research)
Susan Pinson (Oklahoma State Department of Education)
Kathren Stehno (Office of Educational Quality & Accountability)
Megan Toby (Empirical Education)
Haidee Williams (REL Southwest, American Institutes for Research)

Rosa Ailbouni Room, Third Floor

THURSDAY, OCTOBER 12 | 2:30–3pm
Recruiting and Retaining Quality Teachers in Oklahoma
Learn about research conducted in partnership with the Regional Educational Laboratory Southwest (REL Southwest) and practitioners in the Oklahoma Rural Schools Research Alliance. The research identified teacher, district, and community characteristics that predict successful teacher recruitment and retention in rural Oklahoma, which can inform future policy and practice. Join the researchers and alliance members who guided the research and discover how you can use the information in your school district.
Presenters:
Kathren Stehno (Office of Educational Quality & Accountability)
Megan Toby (Empirical Education)
Haidee Williams (REL Southwest, American Institutes for Research)

Great Hall Meeting Room 2, First Floor


If you plan to attend the conference and would like to schedule a meeting with Senior Research Manager Megan Toby, send her an email.

2017-10-04

Academic Researchers Struggle with Research that is Too Expensive and Takes Too Long

I was in DC for an interesting meeting a couple weeks ago. The “EdTech Efficacy Research Academic Symposium” was very much an academic symposium.

The get-together was sponsored by the Jefferson Education Accelerator, out of the University of Virginia’s school of education, and Digital Promise, an organization that invents ways for school districts to make interesting use of edtech products and concepts. About 32% of the approximately 260 invited attendees were from universities or research organizations that conduct academic-style research. About 16% represented funding or investment organizations and agencies, and another 20% were from companies that produce edtech (often funded by those funders). About 6% were school practitioners, and, as would be expected at a DC event, about 26% were from associations and the media.

I represented a research organization with a lot of experience evaluating commercial edtech products. While in the midst of writing research guidelines for the software industry’s trade association, the Software & Information Industry Association (SIIA), I felt a bit like an anthropologist among the predominantly academic crowd. I was listening to the language and trying to discern the thinking patterns of professors and researchers, both federally and foundation funded. A fundamental belief is that real efficacy research is expensive (in the millions of dollars) and slow (a minimum of several years for a research report). A few voices said the cost could be lowered, especially for a school-district-initiated pilot, but the going rate for a simple study, according to discussions at the meeting, starts at $250,000. Given a recent estimate of 4,000 edtech products (and assuming that new products and versions of existing products are being released at an accelerating rate), the annual cost of evaluating every product just once would be around $1 billion: roughly 4,000 studies at $250,000 each. That is an amount unlikely to be supported in the current school funding climate.

Does efficacy research need to be that expensive and slow given the widespread data collection by schools, widely available datasets, and powerful computing capabilities? Academic research is expensive for several reasons. There is little incentive for research providers to lower costs. Federal agencies offer large contracts to attract the large research organizations with experience and high overhead rates. Other funders are willing to pay top dollar for the prestige of such organizations. University grant writers aim to support a whole scientific research program and need to support grad students and generally conduct unique studies that will be attractive to journals. In conventional practice, each study is a custom product. Automating repeatable processes is not part of the culture. Actually, there is an odd culture clash between the academic researchers and the edtech companies needing their services.

Empirical Education is now working with Reach Capital and their portfolio companies to develop an approach for edtech companies and their investors to get low-cost evidence of efficacy. We are also writing up our recommendations as a set of guidelines to help edtech companies obtain usable evidence. The document is expected to be released at SIIA’s Education Impact Symposium in July.

2017-05-30

Carnegie Summit 2017 Recap

If you’ve never been to Carnegie Summit, we highly recommend it.

This was our first year attending the Carnegie Foundation’s annual conference in San Francisco, and we only wish we had checked it out sooner. Chief Scientist Andrew Jaciw attended on behalf of Empirical Education, and he took over our Twitter account for the duration of the event. Below is a recap of his live tweeting, interspersed with additional thoughts too verbose for Twitter’s strict character limitations.

Day 1


Curious about what I will learn. On my mind: Tony Bryk’s distinction between evidence-based practice and practice-based evidence. I am also thinking of how the approaches to be discussed connect to ideas of Lee Cronbach - he was very interested in timeliness and relevance of research findings and the limited reach of internal validity.

I enjoyed T. Bryk’s talk. These points resonated.


Improvement Science involves a hands-on approach to identifying systemic sources of predictable failure. This is appealing because it puts problem solving at the core, while realizing the context-specificity of what will actually work!

Day 2

Jared Bolte - Great talk! Improvement Science contrasts with traditional efficacy research by jumping right in to solve problems, instead of waiting. This raises an important question: What is the cost of delaying action to wait for efficacy findings? I am reminded of Lee Cronbach’s point: the half-life of empirical propositions is short!



This was an excellent session with Tony Bryk and John Easton. There were three important questions posed.



Day 3

Excited to learn about PDSA cycles





2017-04-27