blog posts and news stories

Education Innovator Named CEO of Evidentally

Trisha Callella to Drive Growth for Startup Efficacy Analytics Company

Palo Alto, CA (May 30, 2019) - Evidentally, Inc., a San Francisco Bay Area start-up that is giving education research a major makeover, has named education innovator Trisha Callella as its CEO. Callella, a dynamic leader with a history of success in education technology, will drive growth for Evidentally, which provides cost-effective analyses that meet ESSA requirements and reports that pinpoint where edtech products are used, what impact they’re having in schools, and which students and teachers benefit most.

“Trisha brings the know-how, energy, and passion to scale Evidentally to become the leading digital factory for the production of evidence,” said Denis Newman, Co-founder and Board Chair of Evidentally and CEO of Empirical Education Inc., which he founded in 2003. “Her first-hand knowledge of the edtech market and commitment to science were essential. Her drive to make a positive difference aligns with our goal of helping edtech companies improve their products so the schools that need them most will benefit.”

An Apple Distinguished Educator for six years, Callella has a doctorate in Educational Psychology and Learning Sciences from the University of Southern California. She is the Director of Innovation and Instruction at an urban, public TK-12 school district in California and was previously Vice President of Code to the Future (a startup for computer science immersion).

She is a speaker and program developer at numerous conferences and national organizations, including ISTE, SOLVE at MIT, TEDx, and SXSW EDU, where she serves on the advisory board, and has taught and mentored educators since 2005. Her publications include a textbook on algebraic thinking readiness, 47 K-8 teacher resource books, and six children’s books.

“Evidentally’s subscription-based online product captured my interest because of its truly innovative approach and proprietary technology,” said Dr. Callella. “I’m very enthusiastic about how the efficacy analytics can help edtech companies and schools work together in making better products.”

Evidentally is being honored as a participant in the software industry’s Innovation Showcase at SIIA’s Ed Tech Industry Network Conference, June 10 in San Francisco. The company was incubated within Empirical Education and draws on its expertise in US Department of Education policy for rigorous research. It focuses on a collaborative process with edtech developers and school practitioners to help companies improve their products and how they are implemented.

“With her strong background in education, knowledge of research, and entrepreneurial spirit, Trisha Callella is well positioned to take on the senior leadership role at Evidentally,” said Betsy Corcoran, CEO of EdSurge, who served as matchmaker. “She is bright and passionate about education and technology and has a vision of future technologies that will make a positive difference.”

About Evidentally

Evidentally, Inc. was recently formed to help edtech providers benefit from product efficacy studies. Its mission is to improve the efficacy of edtech products while providing K-12 educators with evidence about what will work in their schools. With headquarters being established at GSVlabs in San Mateo, California, the company builds on the research experience, industry savvy, and passion of Empirical Education Inc., which is incubating the new company by providing infrastructure, expertise, and staffing.

2019-05-29

Presenting at AERA 2018

We will again be presenting at the annual meeting of the American Educational Research Association (AERA). Join the Empirical Education team in New York City from April 13-17, 2018.

Research presentations will include the following.

For Quasi-Experiments on EdTech Products, What Counts as Being Treated?
Authors: Val Lazarev, Denis Newman, & Malvika Bhagwat
In Roundtable Session: Examining the Impact of Accountability Systems on Both Teachers and Students
Friday, April 13 - 2:15 to 3:45pm
New York Marriott Marquis, Fifth Floor, Westside Ballroom Salon 3

Abstract: Edtech products are becoming increasingly prevalent in K-12 schools, and the need for schools to evaluate their value for students calls for a program of rigorous research, at least at Level 2 of the ESSA standards for evidence. This paper draws on our experience conducting a large-scale quasi-experiment in California schools. The product’s wide-ranging intensity of implementation presented a challenge in identifying schools that had used it enough to be considered part of the treatment group.


Planning Impact Evaluations Over the Long Term: The Art of Anticipating and Adapting
Authors: Andrew P Jaciw & Thanh Thi Nguyen
In Session: The Challenges and Successes of Conducting Large-Scale Educational Research
Saturday, April 14 - 2:15 to 3:45pm
Sheraton New York Times Square, Second Floor, Central Park East Room

Abstract: Perspective. It is good practice to identify core research questions and important elements of study designs a priori, to prevent post-hoc “fishing” exercises and reduce the risk of drawing false-positive conclusions [16,19]. However, programs in education, and evaluations of them, evolve [6], making it difficult to follow a charted course. For example, in the lifetime of a program and its evaluation, new curricular content or evidence standards for evaluations may be introduced and thus drive changes in program implementation and evaluation.

Objectives. This work presents three cases from program impact evaluations conducted through the Department of Education. In each case, unanticipated results or changes in study context had significant consequences for program recipients, developers and evaluators. We discuss responses, either enacted or envisioned, for addressing these challenges. The work is intended to serve as a practical guide for researchers and evaluators who encounter similar issues.

Methods/Data Sources/Results. The first case concerns the problem of outcome measures keeping pace with evolving content standards. For example, in assessing impacts of science programs, program developers and evaluators are challenged to find assessments that align with the Next Generation Science Standards (NGSS). Existing NGSS-aligned assessments are largely untested or in development, leaving the evaluator to find, adapt, or develop instruments with strong reliability and with construct and face validity, ones that will be accepted by independent review and not considered over-aligned to the interventions. We describe a hands-on approach to working with a state testing agency to develop forms to assess impacts on science generally, and on constructs more specifically aligned to the program evaluated.

The second case concerns the problem of reprioritizing research questions mid-study. As noted above, researchers often identify primary (confirmatory) research questions at the outset of a study. Such questions are held to high evidence standards and are differentiated from exploratory questions, which often originate after examining the data and must be replicated to be considered reliable [16]. Sometimes, however, exploratory analyses produce unanticipated results that may be highly consequential. The evaluator must grapple with the dilemma of whether to re-prioritize the result or attempt to proceed with replication. We discuss this issue with reference to an RCT in which the dilemma arose.

The third case addresses the problem of designing and implementing a study that meets one set of evidence standards when the results will be reviewed according to a later version of those standards. A practical question is what to do when this happens and the study consequently falls under a lower tier of the new evidence standard. With reference to an actual case, we consider several response options, including assessing the consequence of this reclassification for future funding of the program, and augmenting the research design to satisfy the new standards of evidence.

Significance. Responding to demands of changing contexts, programs in the social sciences are moving targets. They demand a flexible but well-reasoned and justified approach to evaluation. This session provides practical examples and is intended to promote discussion for generating solutions to challenges of this kind.


Indicators of Successful Teacher Recruitment and Retention in Oklahoma Rural Schools
Authors: Val Lazarev, Megan Toby, Jenna Lynn Zacamy, Denis Newman, & Li Lin
In Session: Teacher Effectiveness, Retention, and Coaching
Saturday, April 14 - 4:05 to 6:05pm
New York Marriott Marquis, Fifth Floor, Booth

Abstract: The purpose of this study was to identify factors associated with successful recruitment and retention of teachers in Oklahoma rural school districts, in order to highlight potential strategies to address Oklahoma’s teaching shortage. The study was designed to identify teacher-level, district-level, and community characteristics that predict which teachers are most likely to be successfully recruited and retained. A key finding is that for teachers in rural schools, total compensation and increased responsibilities in job assignment are positively associated with successful recruitment and retention. Evidence provided by this study can be used to inform incentive schemes to help retain certain groups of teachers and increase retention rates overall.


Teacher Evaluation Rubric Properties and Associations with School Characteristics: Evidence from the Texas Evaluation System
Authors: Val Lazarev, Thanh Thi Nguyen, Denis Newman, Jenna Lynn Zacamy, Li Lin
In Session: Teacher Evaluation Under the Microscope
Tuesday, April 17 - 12:25 to 1:55pm
New York Marriott Marquis, Seventh Floor, Astor Ballroom

Abstract: A seminal 2009 report, The Widget Effect, alerted the nation to the tendency of traditional teacher evaluation systems to treat teachers like widgets, undifferentiated in their level of effectiveness. Since then, a growing body of research, coupled with new federal initiatives, has catalyzed the reform of such systems. In 2014-15, Texas piloted its reformed evaluation system, collecting classroom observation rubric ratings from over 8,000 teachers across 51 school districts. This study analyzed that large dataset and found that 26.5 percent of teachers were rated below proficient, compared to 2 percent under previous measures. The study also found a promising indication of low bias in the rubric ratings stemming from school characteristics, given that they were minimally associated with observation ratings.

We look forward to seeing you at our sessions to discuss our research. We’re also co-hosting a cocktail reception with Division H! If you’d like an invite, let us know.

2018-03-06

How Efficacy Studies Can Help Decision-makers Decide if a Product is Likely to Work in Their Schools

We and our colleagues have been working on translating the results of rigorous studies of the impact of educational products, programs, and policies for people in school districts who are making the decisions whether to purchase or even just try out—pilot—the product. We are influenced by Stanford University methodologist Lee Cronbach, especially his seminal book (1982) and article (1975), where he concludes, “When we give proper weight to local conditions, any generalization is a working hypothesis, not a conclusion…positive results obtained with a new procedure for early education in one community warrant another community trying it. But instead of trusting that those results generalize, the next community needs its own local evaluation” (p. 125). In other words, we consider even the best-designed experiment to be like a case study, as much about the local and moderating role of context as about the treatment when interpreting the causal effect of the program.

Following the focus on context, we can consider characteristics of the people and of the institution where the experiment was conducted to be co-causes of the result that deserve full attention—even though, technically, only the treatment, which was randomly assigned, was controlled. Here we argue that any generalization from a rigorous study, where the question is whether the product is likely to be worth trying in a new district, must consider the full context of the study.

Technically, in the language of evaluation research, these differences in who or where the product or “treatment” works are called “interaction effects” between the treatment and the characteristic of interest (e.g., subgroups of students by demographic category or achievement level, teachers with different skills, or bandwidth available in the building). The characteristic of interest can be called a “moderator”, since it changes, or moderates, the impact of the treatment. An interaction reveals if there is differential impact and whether a group with a particular characteristic is advantaged, disadvantaged, or unaffected by the product.
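The interaction described above can be sketched as a regression with a treatment-by-moderator term. The following is a minimal, hypothetical illustration in Python; the moderator (English-learner status), sample size, and all effect sizes are invented for demonstration and are not drawn from any study discussed here.

```python
import numpy as np

# Hypothetical illustration: estimate a treatment-by-moderator
# interaction ("differential impact") with ordinary least squares.
rng = np.random.default_rng(0)
n = 10_000
treat = rng.integers(0, 2, n)            # 1 = used the product
english_learner = rng.integers(0, 2, n)  # moderator of interest

# Invented true model: +5 points for English-proficient students but
# only +1 for English learners, i.e., an interaction of -4 points.
outcome = (50 + 5 * treat - 4 * treat * english_learner
           + 2 * english_learner + rng.normal(0, 10, n))

# Design matrix: intercept, treatment, moderator, interaction.
X = np.column_stack([np.ones(n), treat, english_learner,
                     treat * english_learner])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"effect for English-proficient students: {beta[1]:.1f}")
print(f"interaction (differential impact):      {beta[3]:.1f}")
```

A significantly negative interaction term here is exactly the kind of differential impact that an average-effect-only review would hide from a purchasing district.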

The rules set out by the Department of Education’s What Works Clearinghouse (WWC) focus on the validity of the experimental conclusion: Did the program work on average compared to a control group? Whether it works better for poor kids than for middle-class kids, works better for uncertified teachers than for veteran teachers, or increases or closes a gap between English learners and those who are proficient is not part of the information provided in their reviews. But these differences are exactly what buyers need in order to understand whether the product is a good candidate for a population like theirs. If a program works substantially better for English-proficient students than for English learners, and the purchasing school has largely the latter type of student, it is important that the school administrator know the context for the research and the result.

The accuracy of an experimental finding depends on its not being moderated by conditions. Recent methods of generalization (Tipton, 2013) recognize this, essentially applying non-experimental adjustments to experimental results to make them more accurate and more relevant to specific local contexts.

Work by Jaciw (2016a, 2016b) takes this one step further.

First, he confirms the result that if the impact of the program is moderated, and if moderators are distributed differently between sites, then an experimental result from one site will yield a biased inference for another site. This would be the case, for example, if the impact of a program depends on individual socioeconomic status, and there is a difference between the study and inference sites in the proportion of individuals with low socioeconomic status. Conditions for this “external validity bias” are well understood, but the consequences are addressed much less often than the usual selection bias. Experiments can yield accurate results about the efficacy of a program for the sample studied, but that average may not apply either to a subgroup within the sample or to a population outside the study.
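The bias described here can be seen with simple arithmetic: the same two subgroup effects, averaged under different moderator distributions, yield different average impacts. The following sketch uses invented numbers purely for demonstration.

```python
# Hypothetical illustration of "external validity bias": identical
# subgroup impacts, but the moderator (low-SES share) is distributed
# differently at the study site and at the adopting district.
effect_low_ses, effect_high_ses = 2.0, 8.0  # invented subgroup impacts (points)

p_low_study = 0.3   # share of low-SES students at the study site
p_low_target = 0.7  # share at the district considering adoption

study_average = (p_low_study * effect_low_ses
                 + (1 - p_low_study) * effect_high_ses)
target_average = (p_low_target * effect_low_ses
                  + (1 - p_low_target) * effect_high_ses)

print(f"average impact at study site: {study_average:.1f}")   # 6.2
print(f"reweighted for target site:   {target_average:.1f}")  # 3.8
```

Reporting only the study-site average (6.2 points) would overstate the likely impact for a district where most students are low-SES; reweighting by the target district’s moderator distribution is the basic move behind the generalization methods cited above.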

Second, he uses results from a multisite trial to show empirically that there is potential for significant bias when inferring experimental results from one subset of sites to other inference sites within the study; however, moderators can account for much of the variation in impact across sites. Average impact findings from experiments provide a summary of whether a program works but leave the consumer guessing about the boundary conditions for that effect—the limits beyond which the average effect ceases to apply. Cronbach was highly aware of this, titling a chapter in his 1982 book “The Limited Reach of Internal Validity”. Using terms like “unbiased” to describe impact findings from experiments is correct in a technical sense (i.e., the point estimate, on hypothetical repeated sampling, is centered on the true average effect for the sample studied), but it can impart an incorrect sense of the external validity of the result: that it applies beyond the instance of the study.

Implications of the work cited are, first, that it is possible to unpack marginal impact estimates through subgroup and moderator analyses to arrive at more-accurate inferences for individuals. Second, that we should do so—why obscure differences by paying attention only to the grand mean impact estimate for the sample? And third, that we should be planful in deciding which subgroups to assess impacts for in the context of individual experiments.

Local decision-makers’ primary concern should be with whether a program will work with their specific population, and to ask for causal evidence that considers local conditions through the moderating role of student, teacher, and school attributes. Looking at finer differences in impact may elicit criticism that it introduces another type of uncertainty—specifically from random sampling error—which may be minimal with gross impacts and large samples, but influential when looking at differences in impact with more and smaller samples. This is a fair criticism, but differential effects may be less susceptible to random perturbations (low power) than assumed, especially if subgroups are identified at individual levels in the context of cluster randomized trials (e.g., individual student-level SES, as opposed to school average SES) (Bloom, 2005; Jaciw, Lin, & Ma, 2016).

References:
Bloom, H. S. (2005). Randomizing groups to evaluate place-based programs. In H. S. Bloom (Ed.), Learning more from social experiments. New York: Russell Sage Foundation.

Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30(2), 116-127.

Cronbach, L. J. (1982). Designing evaluations of educational and social programs. San Francisco, CA: Jossey-Bass.

Jaciw, A. P. (2016). Applications of a within-study comparison approach for evaluating bias in generalized causal inferences from comparison group studies. Evaluation Review, 40(3), 241-276. Retrieved from http://erx.sagepub.com/content/40/3/241.abstract

Jaciw, A. P. (2016). Assessing the accuracy of generalized inferences from comparison group studies using a within-study comparison approach: The methodology. Evaluation Review, 40(3), 199-240. Retrieved from http://erx.sagepub.com/content/40/3/199.abstract

Jaciw, A., Lin, L., & Ma, B. (2016). An empirical study of design parameters for assessing differential impacts for students in group randomized trials. Evaluation Review. Retrieved from http://erx.sagepub.com/content/early/2016/10/14/0193841X16659600.abstract

Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38, 239-266.

2018-01-16

ETIN Releases Guidelines for Research on Educational Technologies in K-12 Schools

The press release (below) was originally published on the SIIA website. It has since inspired stories in the Huffington Post, edscoop, EdWeek’s Market Brief, and the EdSurge newsletter.



ETIN Releases Guidelines for Research on Educational Technologies in K-12 Schools

Changes in education technology and policy spur updated approach to industry research

Washington, DC (July 25, 2017) - The Education Technology Industry Network, a division of The Software & Information Industry Association, released an important new report today: “Guidelines for Conducting and Reporting EdTech Impact Research in U.S. K-12 Schools.” Authored by Dr. Denis Newman and the research team at Empirical Education Inc., the Guidelines provide 16 best-practice standards of research for publishers and developers of educational technologies.

The Guidelines are a response to changing research methods and policies, driven by the accelerating pace of technology development and by passage of the Every Student Succeeds Act (ESSA), which has challenged the static notion of evidence defined in NCLB. Recognizing the need for consensus among edtech providers, customers in the K-12 school market, and policy makers at all levels, SIIA is making these Guidelines freely available.

“SIIA members recognize that changes in technology and policy have made evidence of impact an increasingly critical differentiator in the marketplace,” said Bridget Foster, senior VP and managing director of ETIN. “The Guidelines show how research can be conducted and reported within a short timeframe and still contribute to continuous product improvement.”

“The Guidelines for research on edtech products are consistent with our approach to efficacy: that evidence of impact can lead to product improvement,” said Amar Kumar, senior vice president of Efficacy & Research at Pearson. “We appreciate ETIN’s leadership and Empirical Education’s efforts in putting together this clear presentation of how to use rigorous and relevant research to drive growth in the market.”

The Guidelines draw on over a decade of experience in conducting research in the context of the U.S. Department of Education’s Institute of Education Sciences, and its Investing in Innovation program.

“The current technology and policy environment provides an opportunity to transform how research is done,” said Dr. Newman, CEO of Empirical Education Inc. and lead author of the Guidelines. “Our goal in developing the new guidelines was to clarify current requirements in a way that will help edtech companies provide school districts with the evidence they need to consistently quantify the value of software tools. My thanks go to SIIA and the highly esteemed panel of reviewers whose contribution helped us provide the roadmap for the change that is needed.”

“In light of the ESSA evidence standards and the larger movement toward evidence-based reform, publishers and software developers are increasingly being called upon to show evidence that their products make a difference with children,” said Guidelines peer reviewer Dr. Robert Slavin, director of the Center for Research and Reform in Education, Johns Hopkins University. “The ETIN Guidelines provide practical, sensible guidance to those who are ready to meet these demands.”

ETIN’s goal is to improve the market for edtech products by advocating for greater transparency in reporting research findings. For that reason, it is actively working with government, policy organizations, foundations, and universities to gain the needed consensus for change.

“As digital instructional materials flood the marketplace, state and local leaders need access to evidence-based research regarding the effectiveness of products and services. This guide is a great step in supporting both the public and private sector to help ensure students and teachers have access to the most effective resources for learning,” stated Christine Fox, Deputy Executive Director, SETDA. The Guidelines can be downloaded here: https://www.empiricaleducation.com/research-guidelines.

2017-07-25

SIIA ETIN EIS Conference Presentations 2017


We are playing a major role in the Education Impact Symposium (EIS), organized by the Education Technology Industry Network (ETIN), a division of The Software & Information Industry Association (SIIA).

  1. ETIN is releasing a set of edtech research guidelines that CEO Denis Newman wrote this year
  2. Denis is speaking on 2 panels this year

The edtech research guidelines that Denis authored and ETIN is releasing on Tuesday, July 25 are called “Guidelines for Conducting and Reporting EdTech Impact Research in U.S. K-12 Schools” and can be downloaded from this webpage. The Guidelines are a much-needed response to a rapidly-changing environment of cloud-based technology and important policy changes brought about by the Every Student Succeeds Act (ESSA).

The panels Denis will be presenting on are both on Tuesday, July 25, 2017.

12:30 - 1:15pm
ETIN’s New Guidelines for Product Research in the ESSA Era
With the recent release of ETIN’s updated Guidelines for EdTech Impact Research, developers and publishers can ride the wave of change from NCLB’s sluggish concept of “scientifically-based” to ESSA’s dynamic view of “evidence” for continuous improvement. The Guidelines are being made publicly available at the Symposium, with a discussion and Q&A led by the lead author and some of the contributing reviewers.
Moderator:
Myron Cizdyn, Chief Executive Officer, The BLPS Group
Panelists:
Malvika Bhagwat, Research & Efficacy, Newsela
Amar Kumar, Sr. Vice President, Pearson
Denis Newman, CEO, Empirical Education Inc.
John Richards, President, Consulting Services for Education

2:30 - 3:30pm
The Many Faces of Impact
Key stakeholders in the edtech community will each review, in TED Talk style, what they are doing to increase the impact of digital products, programs, and services. Our line-up of presenters includes:
- K-12 and HE content providers using impact data to better understand their customers, improve their products, and support their marketing and sales teams
- an investor seeking impact on both disadvantaged populations and their financial return in order to make funding decisions for portfolio companies
- an education organization helping institutions decide what research is useful to them and how to grapple with new ESSA requirements
- a researcher working with product developers to produce evidence of the impact of their digital products

After the presenters have finished, we’ll have time for your questions on these multidimensional aspects of IMPACT and how technology can help.
Moderator:
Karen Billings, Principal, BillingsConnects
Panelists:
Jennifer Carolan, General Partner, Reach Capital
Christopher Cummings, VP, Institutional Product and Solution Design, Cengage
Melissa Greene, Director, Strategic Partnerships, SETDA
Denis Newman, CEO, Empirical Education Inc.
Kari Stubbs, PhD, Vice President, Learning & Innovation, BrainPOP

Jennifer Carolan, Denis Newman, and Chris Cummings on a panel at ETIN EIS

If you get a chance to check out the Guidelines before EIS, Denis would love to hear your thoughts about them at the conference.

2017-07-21