Empirical Education Inc. - education research

Meet Our Newest Researchers

The Empirical Research Team is pleased to announce the addition of 3 new team members. We welcome Rebecca Dowling, Lindsay Maurer, and Mayah Waltower as our newest researchers!

Rebecca Dowling, Research Manager

Rebecca (box 8 in the pet matching game) is taking on the role of project manager for two evaluations. One is the EVERFI WORD Force project, working with Mayah Waltower. The other is the How Are The Children project. Rebecca’s PhD in Applied Developmental Psychology with a specialization in educational contexts of development lends expertise to both of these projects. Her education is complemented by her experience managing evaluations before joining Empirical Education. Rebecca works out of her home office in Utah. Can you guess which pet works at home with her?

Lindsay Maurer, Research Assistant

Lindsay (box 6 in the pet matching game) assists Sze-Shun Lau with the CREATE project, a teacher residency program in Atlanta Public Schools invested in expanding equity in education by developing critically conscious, compassionate, and skilled educators. Lindsay’s experience as a research assistant studying educational excellence and equality at the University of California Davis is an asset to the CREATE project. Lindsay works out of her home office in San Francisco, CA. Can you guess which pet works at home with her?

Mayah Waltower, Research Assistant

Mayah (box 1 in the pet matching game) has taken on assisting Rebecca with the EVERFI WORD Force and the How Are The Children projects. Mayah also assists Sze-Shun Lau with the CREATE project, a teacher residency program in Atlanta Public Schools invested in expanding equity in education by developing critically conscious, compassionate, and skilled educators. Mayah works out of her home office in Atlanta, GA. Can you guess which pet works at home with her?

To get to know them better, we’d like to invite you to play our pet matching game. The goal of the game is to correctly match each new team member with their pet (yes, plants can be pets too). To submit your answers and see if you’re right, post your guesses to twitter and tag us @empiricaled.

2023-03-17

Posted by: Robin Means

Tags: create, education research, everfi and research

Rock Island-Milan School District, Connect with Kids, and Empirical Education Win an EIR Grant

The Rock Island-Milan School District #41 (RIMSD) in partnership with CWK Network, Inc. (Connect with Kids) and Empirical Education just announced that they were awarded an EIR grant to develop and evaluate a program called How Are The Children? (HATC). This project-based social emotional curriculum intends to foster students’ social emotional competence, increase student engagement, and ameliorate the long-term social emotional impacts of the COVID-19 pandemic.

We will be conducting an independent evaluation of the effectiveness of HATC through a randomized control trial and formative evaluation. Our findings will inform the improvement of the program, as well as to foster the expansion of the curriculum into other schools and districts.

For more details on this grant and the project, see the press announcement and our EIR grant proposal.

2023-03-14

Posted by: Robin Means

Tags: covid recovery, education research, eir, evaluation, grant, SEL and social and emotional learning

SREE 2022 Annual Meeting

When I read the theme of the 2022 SREE Conference, “Reckoning to Racial Justice: Centering Underserved Communities in Research on Educational Effectiveness”, I was eager to learn more about the important work happening in our community. The conference made it clear that SREE researchers are becoming increasingly aware of the need to swap individual-level variables for system-level variables that better characterize issues of systematic access and privilege. I was also excited that many SREE researchers are pulling from the fields of mixed methods and critical race theory to foster more equity-aligned study designs, such as those that center participant voice and elevate counter-narratives.

I’m excited to share a few highlights from each day of the conference.

Wednesday, September 21, 2022

Dr. Kamilah B. Legette, University of Denver

Dr. Kamilah B Legette masked and presenting at SREE

Dr. Kamilah B. Legette from the University of Denver discussed their research exploring the relationship between a student’s race and teacher perceptions of the student’s behavior as a) severe, b) inappropriate, and c) indicative of patterned behavior. In their study, 22 teachers were asked to read vignettes describing non-compliant student behaviors (e.g., disrupting storytime) where student identity was varied by using names that are stereotypically gendered and Black (e.g., Jazmine, Darnell) or White (e.g., Katie, Cody).

Multilevel modeling revealed that while student race did not predict teacher perceptions of behavior as severe, inappropriate, or patterned, students’ race was a moderator of the strength of the relationship between teachers’ emotions and perceptions of severe and patterned behavior. Specifically, the relationship between feelings of frustration and severe behavior was stronger for Black children than for White children, and the relationship between feelings of anger and patterned behavior showed the same pattern. Dr. Legette’s work highlighted a need for teachers to engage in reflective practices to unpack these biases.

Dr. Johari Harris, University of Virginia

In the same session, Dr. Johari Harris from the University of Virginia shared their work with the Children’s Defense Fund Freedom Schools. Learning for All (LFA), one Freedom School for students in grades 3-5, offers a five-week virtual summer literacy program with a culturally responsive curriculum based on developmental science. The program aims to create humanizing spaces that (re)define and (re)affirm Black students’ racial-ethnic identities, while also increasing students’ literacy skills, motivation, and engagement.

Dr. Harris’s mixed methods research found that students felt LFA promoted equity and inclusion, and reported greater participation, relevance, and enjoyment within LFA compared to in-person learning environments prior to COVID-19. They also felt their teachers were culturally engaging, and reported a greater sense of belonging, desire to learn, and enjoyment.

While it’s often assumed that young children of color are not fully aware of their racial-ethnic identity or how it is situated within a White supremacist society, Dr. Harris’s work demonstrated the importance of offering culturally affirming spaces to upper-elementary aged students.

Thursday, September 22, 2022

Dr. Krystal Thomas, SRI

Dr. Krystal Thomas presenting at SREE

On Thursday, I attended a talk by Dr. Krystal Thomas from SRI International about the potential of open education resource (OER) programming to further culturally responsive and sustaining practices (CRSP). Their team developed a rubric to analyze OER programming, including materials and professional development (PD) opportunities. The rubric combined principles of OER (free and open access to materials, student-generated knowledge) and CRSP (critical consciousness, student agency, student ownership, inclusive content, classroom culture, and high academic standards).

Findings suggest that while OER offers access to quality instructional materials, it does not necessarily develop teacher capacity to employ CRSP. The team also found that some OER developers charge for CRSP PD, which undermines a primary goal of OER (i.e., open access). One opportunity this talk provided was eventual access to a rubric to analyze critical consciousness in program materials and professional learning (Dr. Thomas said these materials will be posted on the SRI website in upcoming months). I believe this rubric may support equity-driven research and evaluation, including Empirical’s evaluation of the antiracist teacher residency program, CREATE (Collaboration and Reflection to Enhance Atlanta Teacher Effectiveness).

Dr. Rekha Balu, Urban Institute; Dr. Sean Reardon, Stanford University; Dr. Beth Boulay, Abt Associates

left to right: Dr. Beth Boulay, Dr. Rekha Balu, Dr. Sean Reardon, and Titilola Harley on stage at SREE

The plenary talk, featuring discussants Dr. Rekha Balu, Dr. Sean Reardon, and Dr. Beth Boulay, offered suggestions for designing equity- and action-driven effectiveness studies. Dr. Balu urged the SREE community to undertake “projects of a lifetime”. These are long-haul initiatives that push for structural change in search of racial justice. Dr. Balu argued that we could move away from typical thinking about race as a “control variable”, towards thinking about race as an experience, a system, and a structure.

Dr. Balu noted the necessity of mixed methods and participant-driven approaches to serve this goal. Along these same lines, Dr. Reardon felt we need to consider system-level inputs (e.g., school funding) and system-level outputs (e.g., rate of high school graduation) in order to understand disparities in opportunity, rather than just focusing on individual-level factors (e.g., teacher effectiveness, student GPA, parent involvement) that distract from larger forces of inequity. Dr. Boulay noted the importance of causal evidence to persuade key gatekeepers to pursue equity initiatives and called for more high quality measures to serve that goal.

Friday, September 23, 2022

The tone of the conference on Friday was to call people in (a phrase used in opposition to “call people out”, which is often ego-driven, alienating, and counter-productive to motivating change).

Dr. Ivory Toldson, Howard University

Dr. Ivory Toldson at a podium presenting at SREE

In the morning, I attended the Keynote Session by Dr. Ivory Toldson from Howard University. What stuck with me from Dr. Toldson’s talk was their argument that we tend to use numbers as a proxy for people in statistical models, but to avoid some of the racism inherent in our profession as researchers, we must see numbers as people. Dr. Toldson urged the audience to use people to understand numbers, not numbers to understand people. In other words, by deriving a statistical outcome, we do not necessarily know more about the people we study. However, we are equipped with a conversation starter. For example, if Dr. Toldson hadn’t invited Black boys to voice their own experience of why they sometimes struggle in school, they may have never drawn a potential link between sleep deprivation and ADHD diagnosis: a huge departure from the traditional deficit narrative surrounding Black boys in school.

Dr. Toldson also challenged us to consider what our choice in the reference group means in real terms. When we use White students as the reference group, we normalize Whiteness and we normalize groups with the most power. This impacts not only the conclusions we draw, but also the larger framework in which we operate (i.e., White = standard, good, normal).

I also appreciated Dr. Toldson’s commentary on the need for “distributive trust” in schools. They questioned why the people furthest from the students (e.g., superintendents, principals) are given the most power to name best practices, rather than empowering teachers to do what they know works best and to report back. This thought led me to wonder, what can we do as researchers to lend power to teachers and students? Not in a performative way, but in a way that improves our research by honoring their beliefs and first-hand experiences; how can we engage them as knowledgeable partners who should be driving the narrative of effectiveness work?

Dr. Deborah Lindo, Dr. Karin Lange, Adam Smith, EF+Math Program; Jenny Bradbury, Digital Promise; Jeanette Franklin, New York City DOE

Later in the day, I attended a session about building research programs on a foundation of equity. Folks from EF+Math Program (Dr. Deborah Lindo, Dr. Karin Lange, and Dr. Adam Smith), Digital Promise (Jenny Bradbury), and the New York City DOE (Jeanette Franklin) introduced us to some ideas for implementing inclusive research, including a) fostering participant ownership of research initiatives; b) valuing participant expertise in research design; c) co-designing research in partnership with communities and participants; d) elevating participant voice, experiential data, and other non-traditional effectiveness data (e.g., “street data”); and e) putting relationships before research design and outcomes. As the panel noted, racism and inequity are products of design and can be redesigned. More equitable research practices can be one way of doing that.

Saturday, September 24, 2022

Dr. Andrew Jaciw, Empirical Education

Dr. Andrew Jaciw at a podium presenting at SREE

On Saturday, I sat in on a session that included a talk given by my colleague Dr. Andrew Jaciw. Instead of relaying my own interpretation of Andrew’s ideas and the values they bring to the SREE community, I’ll just note that he will summarize the ideas and insights from his talk and subsequent discussion in an upcoming blog. Keep your eyes open for that!

See you next year!

Dr. Chelsey Nardi and Dr. Leanne Doughty

2022-11-29

Posted by: Chelsey Nardi

Tags: conference, education research, equity, racial justice and sree

Introducing SEERNet with the Goal of Replication Research

In 2021, we partnered with Digital Promise on a research proposal for the IES research network: Digital Learning Platforms to Enable Efficient Education Research Network. The project, SEER Research Network for Digital Learning Platforms (SEERNet) was funded through an IES education research grant in fall 2021, and we took off running. Digital Promise launched this SEERNet website to keep the community up to date on our progress. We’ve been meeting with five platform hosts, selected by IES, to develop ideas for replication research, generalizability in research, and rapid research.

The goal of SEERNet is to integrate rigorous education research into existing digital learning platforms (DLPs) in an effort to modernize research. The digital learning platforms have the potential to support education researchers as they study new ideas and seek to replicate those ideas quickly, across many sites, with a wide range of student populations and with a variety of education research topics. Each of the five platforms (listed below) will eventually have over 100,000 users, allowing us to explore ways to increase the efficiency of a replication study.

Kinetic by OpenStax
UpGrade/MATHia by Carnegie Learning
Learning at Scale by Arizona State University
E-Trials by ASSISTments
Terracotta by Canvas

As the network leads, Empirical Education and Digital Promise will work to share best practices among the DLPs and build a community of researchers and practitioners interested in the opportunities afforded by these innovative platforms for impactful research. Stay tuned for more updates on how you can get involved!

^{This project is supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305N210034 to Digital Promise. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.}

2022-01-20

Posted by: Robin Means

Tags: education research, education research grant, education research topics, generalizability, generalizability in research, rapid research, replication, what is a limitation that affects the generalizability of research results, what is a limitation that affects the generalizability of research results? and what is replication of a study

Happy New Year from Empirical Education

We’ve had an interesting year, to say the least. We are grateful for your colleagueship and friendship through it all. To ring in the new year, we want to share this one-hour playlist with you. It comprises songs from each person on our team. We hope you like it. Cheers to a healthy and prosperous 2022!

Happy New Year playlist

2021-12-16

Posted by: Robin Means

Tags: education research, empirical education, ESSA and evaluation

Introducing Our Newest Researchers

The Empirical Research Team is pleased to announce the addition of 2 new team members. We welcome Zahava Heydel and Chelsey Nardi on board as our newest researchers!

Zahava Heydel, Research Assistant

Zahava has taken on assisting Sze-Shun Lau with the CREATE project, a teacher residency program in Atlanta Public Schools invested in expanding equity in education by developing critically conscious, compassionate, and skilled educators. Zahava’s experience as a research assistant at the University of Colorado Anschutz Medical Campus Department of Psychiatry, Colorado Center for Women’s Behavioral Health and Wellness is an asset to the Empirical Education team as we move toward evaluating SEL programs and individual student needs.

Chelsey Nardi, Research Manager

Chelsey is taking on the role of co-project manager for our evaluation of the CREATE project, working with Sze-Shun and Zahava. Chelsey is currently working toward her PhD exploring the application of antiracist theories in science education, which may support the evaluation of CREATE’s mission to develop critically conscious educators. Additionally, her research experience at McREL International and REL Pacific as a Research and Evaluation Associate has prepared her for managing some of our REL Southwest applied research projects. These experiences, coupled with her experience in project management, makes her an ideal fit for our team.

2021-10-04

Posted by: Robin Means

Tags: Chelsey Nardi, create, education research, rel-sw, research and Zahava Heydel

Empirical Education Wraps Up Two Major i3 Research Studies

Empirical Education is excited to share that we recently completed two Investing In Innovation (i3) (now EIR) evaluations for the Making Sense of SCIENCE program and the Collaboration and Reflection to Enhance Atlanta Teacher Effectiveness (CREATE) programs. We thank the staff on both programs for their fantastic partnership. We also acknowledge Anne Wolf, our i3 technical assistance liaison from Abt Associates, as well as our Technical Working Group members on the Making Sense of SCIENCE project (Anne Chamberlain, Angela DeBarger, Heather Hill, Ellen Kisker, James Pellegrino, Rich Shavelson, Guillermo Solano-Flores, Steve Schneider, Jessaca Spybrook, and Fatih Unlu) for their invaluable contributions. Conducting these two large-scale, complex, multi-year evaluations over the last five years has not only given us the opportunity to learn much about both programs, but has also challenged our thinking—allowing us to grow as evaluators and researchers. We now reflect on some of the key lessons we learned, lessons that we hope will contribute to the field’s efforts in moving large-scale evaluations forward.

Background on Both Programs and Study Summaries

Making Sense of SCIENCE (developed by WestEd) is a teacher professional learning model aimed at increasing student achievement through improving instruction and supporting districts, schools, and teachers in their implementation of the Next Generation Science Standards (NGSS). The key components of the model include building leadership capacity and providing teacher professional learning. The program’s theory of action is based on the premise that professional learning that is situated in an environment of collaborative inquiry and supported by school and district leadership produces a cascade of effects on teachers’ content and pedagogical content knowledge, teachers’ attitudes and beliefs, the school climate, and students’ opportunities to learn. These effects, in turn, yield improvements in student achievement and other non-academic outcomes (e.g., enjoyment of science, self-efficacy, and agency in science learning). NGSS had just been introduced two years prior to the study, a study which ran from 2015 through 2018. The infancy of NGSS and the resulting shifting landscape of science education posed a significant challenge to our study, which we discuss below.

Our impact study of Making Sense of SCIENCE was a cluster-randomized, two-year evaluation involving more than 300 teachers and 8,000 students. Confirmatory impact analyses found a positive and statistically significant impact on teacher content knowledge. While impact results on student achievement were mostly all positive, none reached statistical significance. Exploratory analyses found positive impacts on teacher self-reports of time spent on science instruction, shifts in instructional practices, and amount of peer collaboration. Read our final report here.

CREATE is a three-year teacher residency program for students of Georgia State University College of Education and Human Development (GSU CEHD) that begins in their last year at GSU and continues through their first two years of teaching. The program seeks to raise student achievement by increasing teacher effectiveness and retention of both new and veteran educators by developing critically-conscious, compassionate, and skilled educators who are committed to teaching practices that prioritize racial justice and interrupt inequities.

Our impact study of CREATE used a quasi-experimental design to evaluate program effects for two staggered cohorts of study participants (CREATE and comparison early career teachers) from their final year at GSU CEHD through their second year of teaching, starting with the first cohort in 2015–16. Confirmatory impact analyses found no impact on teacher performance or on student achievement. However, exploratory analyses revealed a positive and statistically significant impact on continuous retention over a three-year time period (spanning graduation from GSU CEHD, entering teaching, and retention into the second year of teaching) for the CREATE group, compared to the comparison group. We also observed that higher continuous retention among Black educators in CREATE, relative to those in the comparison group, is the main driver of the favorable impact. The fact that the differential impacts on Black educators were positive and statistically significant for measures of executive functioning (resilience) and self-efficacy—and marginally statistically significant for stress management related to teaching—hints at potential mediators of impact on retention and guides future research.

After the i3 program funded this research, Empirical Education, GSU CEHD, and CREATE received two additional grants from the U.S. Department of Education’s Supporting Educator Effectiveness Development (SEED) program for further study of CREATE. We are currently studying our sixth cohort of CREATE residents and will have studied eight cohorts of CREATE residents, five cohorts of experienced educators, and two cohorts of cooperating teachers by the end of the second SEED grant. We are excited to continue our work with the GSU and CREATE teams and to explore the impact of CREATE, especially for retention of Black educators. Read our final report for the i3 evaluation of CREATE here.

Lessons Learned

While there were many lessons learned over the past five years, we’ll highlight two that were particularly challenging and possibly most pertinent to other evaluators.

The first key challenge that both studies faced was the availability of valid and reliable instruments to measure impact. For Making Sense of SCIENCE, a measure of student science achievement that was aligned with NGSS was difficult to identify because of the relative newness of the standards, which emphasized three-dimensional learning (disciplinary core ideas, science and engineering practices, and cross-cutting concepts). This multi-dimensional learning stood in stark contrast to the existing view of science education at the time, which primarily focused on science content. In 2014, one year prior to the start of our study, the National Research Council pointed out that “the assessments that are now in wide use were not designed to meet this vision of science proficiency and cannot readily be retrofitted to do so” (NRC, 2014, page 12). While state science assessments that existed at the time were valid and reliable, they focused on science content and did not measure the type of three-dimensional learning targeted by NGSS. The NRC also noted that developing new assessments would “present[s] complex conceptual, technical, and practical challenges, including cost and efficiency, obtaining reliable results from new assessment types, and developing complex tasks that are equitable for students across a wide range of demographic characteristics” (NRC, 2014, p.16).

Given this context, despite the research team’s extensive search for assessments from a variety of sources—including reaching out to state departments of education, university-affiliated assessment centers, and test developers—we could not find an appropriate instrument. Using state assessments was not an option. The states in our study were still in the process of either piloting or field testing assessments that were aligned to NGSS or to state standards based on NGSS. This void of assessments left the evaluation team with no choice but to develop one, independently of the program developer, using established items from multiple sources to address general specifications of NGSS, and relying on the deep content expertise of some members of the research team. Of course there were some risks associated with this, especially given the lack of opportunity to comprehensively pilot or field test the items in the context of the study. When used operationally, the researcher-developed assessment turned out to be difficult and was not highly discriminating of ability at the low end of the achievement scale, which may have influenced the small effect size we observed. The circumstances around the assessment and the need to improvise a measure leads us to interpret findings related to science achievement of the Making Sense of SCIENCE program with caution.

The CREATE evaluation also faced a measurement challenge. One of the two confirmatory outcomes in the study was teacher performance, as measured by ratings of teachers by school administrators on two of the state’s Teacher Assessment on Performance Standards (TAPS), which is a component of the state’s evaluation system (Georgia Department of Education, 2021). We could not detect impact on this measure because the variance observed in the ordinal ratings was remarkably low, with ratings overwhelmingly centered on the median value. This was not a complete surprise. The literature documents this lack of variability in teaching performance ratings. A seminal report, The Widget Effect by The New Teacher Project (Weisberg et al., 2009), called attention to this “national crisis”—the inability of schools to effectively differentiate among low- and high-performing teachers. The report showed that in districts that use binary evaluation ratings, as well as those that use a broader range of rating options, less than 1% of teachers received a rating of unsatisfactory. In the CREATE study, the median value was chosen overwhelmingly. In a study examining teacher performance ratings by Kraft and Gilmour (2017), principals in that study explained that they were more reluctant to give new teachers a rating below proficient because they acknowledge that new teachers were still working to improve their teaching, and that “giving a low rating to a potentially good teacher could be counterproductive to a teacher’s development.” These reasons are particularly relevant to the CREATE study given that the teachers in our study are very early in their teaching career (first year teachers), and given the high turnover rate of all teachers in Georgia.

We bring up this point about instruments as a way to share with the evaluation community what we see as a not uncommon challenge. In 2018 (the final year of outcomes data collection for Making Sense of SCIENCE), when we presented about the difficulties of finding a valid and reliable NGSS-aligned instrument at AERA, a handful of researchers approached us to commiserate; they too were experiencing similar challenges with finding an established NGSS-aligned instrument. As we write this, perhaps states and testing centers are further along in their development of NGSS-aligned assessments. However, the challenge of finding valid and reliable instruments, generally speaking, will persist as long as educational standards continue to evolve. (And they will.) Our response to this challenge was to be as transparent as possible about the instruments and the conclusions we can draw from using them. In reporting on Making Sense of SCIENCE, we provided detailed descriptions of our process for developing the instruments and reported item- and form-level statistics, as well as contextual information and rationale for critical decisions. In reporting on CREATE, we provided the distribution of ratings on the relevant dimensions of teacher performance for both the baseline and outcome measures. In being transparent, we allow the readers to draw their own conclusions from the data available, facilitate the review of the quality of the evidence against various sets of research standards, support replication of the study, and provide further context for future study.

A second challenge was maintaining a consistent sample over the course of the implementation, particularly in multi-year studies. For Making Sense of SCIENCE, which was conducted over two years, there was substantial teacher mobility into and out of the study. Given the reality of schools, even with study incentives, nearly half of teachers moved out of study schools or study-eligible grades within schools over the two year period of the study. This obviously presented a challenge to program implementation. WestEd delivered professional learning as intended, and leadership professional learning activities all met fidelity thresholds for attendance, with strong uptake of Making Sense of SCIENCE within each year (over 90% of teachers met fidelity thresholds). Yet, only slightly more than half of study teachers met the fidelity threshold for both years. The percentage of teachers leaving the school was congruous with what we observed at the national level: only 84% of teachers stay as a teacher at the same school year-over-year (McFarland et al., 2019). For assessing impacts, the effects of teacher mobility can be addressed to some extent at the analysis stage; however, the more important goal is to figure out ways to achieve fidelity of implementation and exposure for the full program duration. One option is to increase incentivization and try to get more buy-in, including among administration, to allow more teachers to reach the two-year participation targets by retaining teachers in subjects and grades to preserve their eligibility status in the study. This solution may go part way because teacher mobility is a reality. Another option is to adapt the program to make it shorter and more intensive. However, this option may work against the core model of the program’s implementation, which may require time for teachers to assimilate their learning. Yet another option is to make the program more adaptable; for example, by letting teachers who leave eligible grades and school to continue to participate remotely, allowing impacts to be assessed over more of the initially randomized sample.

For CREATE, sample size was also a challenge, but for slightly different reasons. During study design and recruitment, we had anticipated and factored the estimated level of attrition into the power analysis, and we successfully recruited the targeted number of teachers. However, several unexpected limitations arose during the study that ultimately resulted in small analytic samples. These limitations included challenges in obtaining research permission from districts and schools (which would have allowed participants to remain active in the study), as well as a loss of study participants due to life changes (e.g., obtaining teaching positions in other states, leaving the teaching profession completely, or feeling like they no longer had the time to complete data collection activities). Also, while Georgia administers the Milestones state assessment in grades 4–8, many participating teachers in both conditions taught lower elementary school grades or non-tested subjects. For the analysis phase, many factors resulted in small student samples: reduced teacher samples, the technical requirement of matching students across conditions within each cohort in order to meet WWC evidence standards, and the need to match students within grades, given the lack of vertically scaled scores. While we did achieve baseline equivalence between the CREATE and comparison groups for the analytic samples, the small number of cases greatly reduced the scope and external validity of the conclusions related to student achievement. The most robust samples were for retention outcomes. We have the most confidence in those results.

As a last point of reflection, we greatly enjoyed and benefited from the close collaboration with our partners on these projects. The research and program teams worked together in lockstep at many stages of the study. We also want to acknowledge the role that the i3 grant played in promoting the collaboration. For example, the grant’s requirements around the development and refinement of the logic model was a major driver of many collaborative efforts. Evaluators reminded the team periodically about the “accountability” requirements, such as ensuring consistency in the definition and use of the program components and mediators in the logic model. The program team, on the other hand, contributed contextual knowledge gained through decades of being intimately involved in the program. In the spirit of participatory evaluation, the two teams benefited from the type of organization learning that “occurs when cognitive systems and memories are developed and shared by members of the organizations” (Cousins & Earl, 1992). This type of organic and fluid relationship encouraged the researchers and program teams to embrace uncertainty during the study. While we “pre-registered” confirmatory research questions for both studies by submitting the study plans to NEi3 prior to the start of the studies, we allowed exploratory questions to be guided by conversations with the program developers. In doing so, we were able to address questions that were most useful to the program developers and the districts and schools implementing the programs.

We are thankful that we had the opportunity to conduct these two rigorous evaluations alongside such humble, thoughtful, and intentional (among other things!) program teams over the last five years, and we look forward to future collaborations. These two evaluations have both broadened and deepened our experience with large-scale evaluations, and we hope that our reflections here not only serve as lessons for us, but that they may also be useful to the education evaluation community at large, as we continue our work in the complex and dynamic education landscape.

References

Cousins, J. B., & Earl, L. M. (1992). The case for participatory evaluation. Educational Evaluation and Policy Analysis, 14(4), 397-418.

Georgia Department of Education (2021). Teacher Keys Effectiveness System. https://www.gadoe.org/School-Improvement/Teacher-and-Leader-Effectiveness/Pages/Teacher-Keys-Effectiveness-System.aspx

Kraft, M. A., & Gilmour, A. F. (2017). Revisiting the widget effect: Teacher evaluation reforms and the distribution of teacher effectiveness. Educational Researcher, 46(5), 234-249.

McFarland, J., Hussar, B., Zhang, J., Wang, X., Wang, K., Hein, S., Diliberti, M., Forrest Cataldi, E., Bullock Mann, F., and Barmer, A. (2019). The Condition of Education 2019 (NCES 2019-144). U.S. Department of Education. National Center for Education Statistics. https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2019144

National Research Council (NRC). (2014). Developing Assessments for the Next Generation Science Standards. Committee on Developing Assessments of Science Proficiency in K-12. Board on Testing and Assessment and Board on Science Education, J.W. Pellegrino, M.R. Wilson, J.A. Koenig, and A.S. Beatty, Editors. Division of Behavioral and Social Sciences and Education. The National Academies Press.

Weisberg, D., Sexton, S., Mulhern, J., & Keeling, D. (2009). The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness. The New Teacher Project. https://tntp.org/wp-content/uploads/2023/02/TheWidgetEffect_2nd_ed.pdf

2021-06-23

Posted by: Thanh Nguyen, Jenna Zacamy, & Andrew Jaciw

Tags: CREATE, education research, empirical education, evaluation, Georgia State University, i3, Making Sense of SCIENCE, QE, RCT, research, teacher and WestEd

SREE 2020 Goes Virtual

We, like many of you, were excited to travel to Washington DC in March 2020 to present at the annual conference of the Society for Research on Educational Effectiveness (SREE). This would have been our 15th year attending or presenting at the SREE conference! We had been looking forward to learning from a variety of sessions and to sharing our own work with the SREE community, so imagine our disappointment when the conference was cancelled (rightfully) in response to the pandemic. Thankfully, SREE offered presenters the option to share their work virtually, and we are excited to have taken part in this opportunity!

Among the several accepted conference proposals, we decided to host the symposium on Social and Emotional Learning in Educational Settings & Academic Learning because it incorporated several of our major projects—three evaluations funded by the Department of Education’s i3/EIR program—two of which focus on teacher professional development and one that focuses on content enhancement routines and student content knowledge. We were joined by Katie Lass who presented on another i3/EIR evaluation conducted by the Policy & Research Group and by Anne Wolf, from Abt Associates, who served as the discussant. The presentations focused on unpacking the logic model for each of the respective programs and collectively, we tried to uncover common threads and lessons learned across the four i3/EIR evaluations.

We were happy to have a turnout that was more than we had hoped for and a rich discussion about the topic. The recording of our virtual symposium is now available here. Below are materials from each presentation.

We look forward to next year!

9A. Unpacking the Logic Model: A Discussion of Mediators and Antecedents of Educational Outcomes from the Investing in Innovation (i3) Program

Symposium: September 9, 1:00-2:00 PM EDT

Section: Social and Emotional Learning in Educational Settings & Academic Learning in Education Settings

Abstract

Slides

Organizer: Katie Lass, Policy & Research Group

Impact on Antecedents of Student Dropout in a Cross-Age Peer Mentoring Program

Abstract

Katie Lass, Policy & Research Group*; Sarah Walsh, Policy & Research Group; Eric Jenner, Policy & Research Group; and Sherry Barr, Center for Supportive Schools

Supporting Content-Area Learning in Biology and U.S. History: A Randomized Control Trial of Enhanced Units in California and Virginia

Abstract

Hannah D’Apice, Empirical Education*; Adam Schellinger, Empirical Education; Jenna Zacamy, Empirical Education; Xin Wei, SRI International; and Andrew P. Jaciw, Empirical Education

The Role of Socioemotional Learning in Teacher Induction: A Longitudinal Study of the CREATE Teacher Residency Program

Abstract

Audra Wingard, Empirical Education*; Andrew P. Jaciw, Empirical Education; Jenna Zacamy, Empirical Education

Uncovering the Black Box: Exploratory Mediation Analysis for a Science Teacher Professional Development Program

Abstract

Thanh Nguyen, Empirical Education*; Andrew P. Jaciw, Empirical Education; and Jenna Zacamy, Empirical Education

Discussant: Anne Wolf, Abt Associates

2020-10-24

Posted by: Robin Means & Thanh Nguyen

Tags: conference, education research, EIR, empirical education, evaluation, i3, mediation anaysis, professional development, RCT, research, SEL and sree

Updating Evidence According to ESSA

The U.S. Department of Education (ED) sets validity standards for evidence of what works in schools through The Every Student Succeeds Act (ESSA), which provides usefully defined tiers of evidence.

When we helped develop the research guidelines for the Software & Information Industry Association, we took a close look at ESSA and how it is often interpreted. Now, as research is evolving with cloud-based online tools that automatically report usage data, it is important to review the standards and to clarify both ESSA’s useful advances and how the four tiers fail to address some critical scientific concepts. These concepts are needed for states, districts, and schools to make the best use of research findings.

We will cover the following in subsequent postings on this page.

Evidence According to ESSA: Since the founding of the Institute of Education Sciences and NCLB in 2002, the philosophy of evaluation has been focused on the perfection of one good study. We’ll discuss the cost and technical issues this kind of study raises and how it sometimes reinforces educational inequity.
Who is Served by Measuring Average Impact: The perfect design focused on the average impact of a program across all populations of students, teachers, and schools has value. Yet, school decision makers need to also know about the performance differences between specific groups such as students who are poor or middle class, teachers with one kind of preparation or another, or schools with AP courses vs. those without. Mark Schneider, IES’s director defines the IES mission as “IES is in the business of identifying what works for whom under what conditions.” This framing is a move toward a broader focus with more relevant results.
Differential Impact is Unbiased. According to the ESSA standards, studies must statistically control for selection bias and other sources of bias. But biases that impact the average for the population in the study don’t impact the size of the differential effect between subgroups. The interaction between the program and the population characteristic is unaffected. And that’s what the educators need to know about.
Putting many small studies together. Instead of the One Good Study approach we see the need for multiple studies each collecting data on differential impacts for subgroups. As Andrew Coulson of Mind Research Institute put it, we have to move from the One Good Study approach and on to valuing multiple studies with enough variety to be able to account for commonalities among districts. We add that meta-analysis of interaction effects are entirely feasible.

Our team works closely with the ESSA definitions and has addressed many of these issues. Our design for an RCE for the Dynamic Learning Project shows how Tiers 2 and 3 can be combined to answer questions involving intermediate results (or mediators). If you are interested in more information on the ESSA levels of evidence, the video on this page is a recorded webinar that provides clarification.

2020-05-12

Posted by: Denis Newman

Tags: ED, education research, efficacy, ESSA, evidence, impact and research

The Evaluation of CREATE Continues

Empirical Education began conducting the evaluation of Collaboration and Reflection to Enhance Atlanta Teacher Effectiveness (CREATE) in 2015 under a subcontract with Atlanta Neighborhood Charter Schools (ANCS) as part of their Investing in Innovation (i3) Development grant. Since our last CREATE update, we’ve extended this work through the Supporting Effective Educator Development (SEED) Grant Program. The SEED grant provides continued funding for three more cohorts of participants and expands the research to include experienced educators (those not in the CREATE residency program) in CREATE schools. The grant was awarded to Georgia State University and includes partnerships with ANCS, Empirical Education (as the external evaluator), and local schools and districts.

Similar to the i3 work, we’re following a treatment and comparison group over the course of the three-year CREATE residency program and looking at impacts on teacher effectiveness, teacher retention, and student achievement. With the SEED project, we will also be able to follow Cohort 3 and 4 for an additional 1-2 years following residency. Surveys will measure perceived levels of social capital, school climate and community, collaboration, resilience, and mindfulness, in addition to other topics. Recruitment for Cohort 4 began this past spring and continued through the summer, resulting in approximately 70 new participants.

One of the goals of the expanded CREATE programming is to support the effectiveness and social capital of experienced educators in CREATE schools. Any experienced educator in a CREATE school who attends CREATE professional learning activities will be invited to participate in the research study. Surveys will measure similar topics to those measured in the quasi-experiment and we conduct individual interviews with a sample of participants to gain an in-depth understanding of the participant experience.

We have completed our first year of experienced educator research and continue to recruit participants, on an ongoing basis, into the second year of the study. We currently have 88 participants and counting.

2018-10-03

Posted by: Jacquelyn Tran & Megan Toby

Tags: CREATE, education research, empirical education, evaluation, Georgia State University, QE, research, SEED and teacher

blog posts and news stories

Meet Our Newest Researchers

2023-03-17

Rock Island-Milan School District, Connect with Kids, and Empirical Education Win an EIR Grant

2023-03-14

SREE 2022 Annual Meeting

Wednesday, September 21, 2022

Dr. Kamilah B. Legette, University of Denver

Dr. Johari Harris, University of Virginia

Thursday, September 22, 2022

Dr. Krystal Thomas, SRI

Dr. Rekha Balu, Urban Institute; Dr. Sean Reardon, Stanford University; Dr. Beth Boulay, Abt Associates

Friday, September 23, 2022

Dr. Ivory Toldson, Howard University

Dr. Deborah Lindo, Dr. Karin Lange, Adam Smith, EF+Math Program; Jenny Bradbury, Digital Promise; Jeanette Franklin, New York City DOE

Saturday, September 24, 2022

Dr. Andrew Jaciw, Empirical Education

2022-11-29

Introducing SEERNet with the Goal of Replication Research

2022-01-20

Happy New Year from Empirical Education

Happy New Year playlist

2021-12-16

Introducing Our Newest Researchers

2021-10-04

Empirical Education Wraps Up Two Major i3 Research Studies

Background on Both Programs and Study Summaries

Lessons Learned

2021-06-23

SREE 2020 Goes Virtual

2020-10-24

Updating Evidence According to ESSA

2020-05-12

The Evaluation of CREATE Continues

2018-10-03

Archive