blog posts and news stories

Pittsburgh Public Schools Uses 20 New Content Suite Videos

In June 2015, Pittsburgh Public Schools began using Observation Engine to calibrate and train their teacher evaluators. They were one of our first clients to use the Content Suite to calibrate and certify classroom observers.

The Content Suite contains a collection of master-scored videos along with thoughtful, objective score justifications for all observable elements of teaching called for by the evaluation framework used in the district. It also includes short video clips, each focused on one particular aspect of teaching. The combination of full-length videos and short clips makes it easy and flexible to set up practice exercises, collaborative calibration sessions, and formal certification testing. Recently, we added 20 new videos to the Content Suite collection!

The Content Suite can be used with most frameworks, either as-is or modified to ensure that the scores and justifications are consistent with the local context and observation framework interpretation. Observation Engine support staff will work closely with each client to modify content and design a customized implementation plan that meets the goals of the school system and sets up evaluators for success. For more information about the Content Suite, click here.


Arkansas Implements Observation Engine Statewide

BloomBoard’s observation tool, EdReflect, has been used across the state of Arkansas since fall 2014. Last year, the Arkansas Department of Education piloted Observation Engine, an online observation training and calibration tool from Empirical Education Inc., in four districts under the state’s Equitable Access Plan. Accessible through the BloomBoard platform, Observation Engine allows administrators and other teacher evaluators to improve scoring calibration and reliability through viewing and rating videos of classroom lessons collected in thousands of classrooms across the country.

Paired with BloomBoard resources and training, the results were impressive. In one district, the number of observers scoring above target increased from 43% to 100%. Not only that, but the rate of discrepant scores (those two levels above or below the target) decreased from 9% to 0%. Similar results were found in the other three pilot districts, prompting decision makers to make Observation Engine readily available to districts throughout the state.
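Calibration metrics like these are easy to compute once observer scores can be compared against master scores. The sketch below is a hypothetical illustration, not Observation Engine's actual method: the scores, the 1-5 rubric, and the function name are all invented. An observer's score counts as exact, adjacent, or discrepant depending on how far it falls from the master score.

```python
# Hypothetical sketch of observer calibration metrics. An observer's
# score is "exact" when it matches the master score, "adjacent" when it
# is one level off, and "discrepant" when it is two or more levels off.

def calibration_summary(observer_scores, master_scores):
    """Return (exact, adjacent, discrepant) rates for one observer."""
    diffs = [abs(o - m) for o, m in zip(observer_scores, master_scores)]
    n = len(diffs)
    exact = sum(d == 0 for d in diffs) / n
    adjacent = sum(d == 1 for d in diffs) / n
    discrepant = sum(d >= 2 for d in diffs) / n
    return exact, adjacent, discrepant

# Example: ten master-scored video segments on an invented 1-5 rubric.
master   = [3, 2, 4, 3, 5, 1, 2, 4, 3, 2]
observer = [3, 2, 4, 4, 5, 1, 2, 2, 3, 2]

exact, adjacent, discrepant = calibration_summary(observer, master)
print(f"exact: {exact:.0%}, adjacent: {adjacent:.0%}, discrepant: {discrepant:.0%}")
# → exact: 80%, adjacent: 10%, discrepant: 10%
```

A district could run the same tally for each observer over a certification test and flag anyone whose discrepant rate exceeds a chosen threshold.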

“EdReflect has proven to be a valuable platform for educator observations in Arkansas. The professional conversation, which results from the ability to provide timely feedback and shared understanding of effective practice, has proven to ensure a transparency and collaboration that we have not experienced before. With the addition of Empirical Education’s Observation Engine, credentialed teacher observers have ready access to increase inter-rater reliability and personal skill. For the first time this year, BloomBoard Collections and micro-credentials have begun meeting individualized professional learning needs for educators all over the state.”
– Sandra Hurst, Arkansas Department of Education

In July, the Arkansas Department of Education decided to offer Observation Engine to the entire state. About half of all districts in the state opted in to receive the service, with the implementation spanning three groups of users in Arkansas. The Beginning Administrators group has already started pursuing a micro-credential based on Observation Engine. Micro-credentials are a digital form of certification indicating that a person has demonstrated competency in a specific skill set. The Beginning Administrators group can earn their “Observation Skills for Beginning Administrators” micro-credential by using Observation Engine’s online calibration tool to practice, assess, and demonstrate their observation skills.

Next month, 26 more districts under the Equitable Access Plan, along with the remaining Arkansas districts, will begin using Observation Engine. We look forward to following and reporting on the progress of these districts during the 2016-17 school year.


U.S. Department of Education Could Expand its Concept of Student Growth

The continuing debate about the use of student test scores as part of teacher evaluation misses an essential point. A teacher’s influence on a student’s achievement does not end in spring when the student takes the state test (or is evaluated using any of the Student Learning Objectives methods). An inspiring teacher, or one who makes a student feel recognized, or one who digs a bit deeper into the subject matter, may be part of the reason that the student later graduates high school, gets into college, or pursues a STEM career. These are “student achievements,” but they show up years after the teacher had the student in her class. As a teacher is getting students to grapple with a new concept, the students may not demonstrate improvements on standardized tests that year. But the “value-added” by the teacher may show up in later years.

States and districts implementing educator evaluations as part of their NCLB waivers are very aware of the requirement that they must “use multiple valid measures in determining performance levels, including as a significant factor data on student growth …” Student growth is defined as change between points in time in achievement on assessments. Student growth defined in this way obscures a teacher’s contribution to a student’s later school career.

As a practical matter, it may seem obvious that for this year’s evaluation we can’t use something that happens next year. But recent analyses of longitudinal data, reviewed in an excellent piece by Raudenbush, show that it is possible to identify predictors of later student achievement associated with individual teacher practices and effectiveness. The widespread implementation of multiple-measure teacher evaluations is starting to accumulate just the longitudinal datasets needed for these predictive analyses. On the basis of these analyses, we may be able to validate many of the facets of teaching that we have found, in analyses of the MET data, to be unrelated to student growth as defined in the waiver requirements.

If we can identify, through classroom observations and surveys, practices and dispositions that are predictive of later student achievements such as college going, then we have validated those practices. Ultimately, we may be able to substitute classroom observations and surveys of students, peers, and parents for value-added modeling based on state tests and other ad hoc measures of student growth. We are not yet at that point, but the first step is to recognize that a teacher’s influence on a student’s growth extends beyond the year she has the student in class.


State Reports Show Almost All Teachers Are Effective or Highly So. Is This Good News?

The New York Times recently picked up a story, originally reported in Education Week two months ago, that school systems using formal methods for classroom observation as part of their educator evaluations are giving all but a very small percent of teachers high ratings—a phenomenon commonly known as the “widget effect.” The Times quotes Russ Whitehurst as suggesting that “It would be an unusual profession that at least 5 percent are not deemed ineffective.”

Responding to the story in her blog, Diane Ravitch calls it “unintentionally hilarious,” portraying the so-called reformers as upset that their own expensive evaluation methods are finding that most teachers are good at what they do. In closing, she asks, “Where did all those ineffective teachers go?”

We’re a research company working actively on teacher evaluation, so we’re interested in these kinds of questions. Should state-of-the-art observation protocols have found more teachers in the “needs improvement” category or at least 5% labeled “ineffective”? We present here an informal analysis meant to get an approximate answer, but based on data that was collected in a very rigorous manner.

As one of the partners in the Gates Foundation’s Measures of Effective Teaching (MET) project, Empirical Education has access to a large dataset available for this examination, including videotaped lessons for almost 2,000 teachers coded according to a number of popular observational frameworks. Since the MET raters were trained intensively using methods approved by the protocol developers and had no acquaintance or supervisory relationship with the teachers in the videos, there is reason to think that the results show the kind of distribution intended by the developers of the observation methods. We can then compare the results in this controlled environment to the results referred to in the EdWeek and Times articles, which were based on reporting by state agencies.

We used a simple (but reasonable) way of calculating the distribution of teachers in the MET data according to the categories in one popular protocol and compared it to the results reported by one of the states for a district known to have trained principals and other observers in the same protocol. We show the results here. The light bars show the distribution of the ratings in the MET data. We can see that a small percentage are rated “highly effective” and an equally small percentage “unsatisfactory.” So although the number doesn’t come up to the percent suggested by Russ Whitehurst, this well-developed method finds only 2% of a large sample of teachers to be in the bottom category. About 63% are considered “effective”, while a third are given a “needs improvement” rating.
The dark bars are the ratings given by the school district using the same protocol. This shows a distribution typical of what EdWeek and the Times reported, where 97% are rated as “highly effective” or “effective.” It is interesting that the school district and MET research both found a very small percentage of unsatisfactory teachers.

Where we find a big difference is in the fact that the research program deemed only a small number of teachers to be exceptional while the school system used that category much more liberally. The other major difference is in the “needs improvement” category. When the observational protocol is used as designed, a solid number of teachers are viewed as doing OK but potentially doing much better. Both in research and in practice, the observational protocol divides most teachers between two categories. In the research setting, the distinction is between teachers who are effective and those who need improvement. In practice, users of the same protocol distinguish between effective and highly effective teachers. Both identify a small percent as unsatisfactory.

Our analysis suggests two problems with the use of the protocol in practice: first, the process does not provide feedback to teachers who are developing their skills, and second, it does not distinguish between very good teachers and truly exceptional ones. We can imagine all sorts of practical pressures on the evaluators (principals, coaches, and other administrators) that decrease the value of identifying teachers who are less than fully effective and could benefit from developing specific skills. For example, unless all the evaluators in a district simultaneously agree to implement more stringent evaluations, teachers in the schools where such evaluations are implemented will be disadvantaged. It will also help to have consistent training and calibration for the evaluators, as well as accountability, which can be achieved with a fairly straightforward examination of the distribution of ratings.

Although this was a very informal analysis with a number of areas where we approximated results, we think we can conclude that Russ Whitehurst probably overstated the estimate of ineffective teachers but Diane Ravitch probably understated the estimate of teachers who could use some help and guidance in getting better at what they do.

Postscript. Because we are researchers and not committed to the validity of the observational methods, we need to state that we don’t know the extent to which the teachers labeled ineffective are generally less capable of raising student achievement. But researchers are notorious for ending all our reports with “more research is needed!”


Does 1 teacher = 1 number? Some Questions About the Research on Composite Measures of Teacher Effectiveness

We are all familiar with approaches to combining student growth metrics and other measures to generate a single measure that can be used to rate teachers for the purpose of personnel decisions. For example, as an alternative to using seniority as the basis for reducing the workforce, a school system may want to base such decisions—at least in part—on a ranking derived from a number of measures of teacher effectiveness. One of the reports released January 8 by the Measures of Effective Teaching (MET) project addressed approaches to creating a composite (i.e., a single number that averages various aspects of teacher performance) from multiple measures such as value-added modeling (VAM) scores, student surveys, and classroom observations. Working with the thousands of data points in the MET longitudinal database, the researchers were able to try out multiple statistical approaches to combining measures. The important recommendation from this research for practitioners is that, while there is no single best way to weight the various measures that are combined in the composite, balancing the weights more evenly tends to increase reliability.
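To see why balancing weights can help, consider a small classical-test-theory sketch. Everything here is hypothetical: the component reliabilities (0.5 for VAM, 0.7 for surveys, 0.6 for observations), the 0.3 inter-measure correlation, and the assumption of uncorrelated errors are our illustrative choices, not MET estimates. Under those assumptions, the reliability of a weighted composite is the ratio of true-score variance to observed variance, w'Tw / w'Sw.

```python
import numpy as np

# S: observed covariance matrix of the three standardized measures
# (hypothetical 0.3 correlations). T: the true-score covariance matrix,
# which, with uncorrelated errors, equals S except that the diagonal is
# replaced by each measure's (hypothetical) reliability.
S = np.array([[1.0, 0.3, 0.3],
              [0.3, 1.0, 0.3],
              [0.3, 0.3, 1.0]])
T = S.copy()
np.fill_diagonal(T, [0.5, 0.7, 0.6])  # VAM, survey, observation

def composite_reliability(w, S, T):
    """Reliability of the composite sum(w_i * measure_i)."""
    w = np.asarray(w, dtype=float)
    return (w @ T @ w) / (w @ S @ w)

balanced = composite_reliability([1/3, 1/3, 1/3], S, T)
skewed = composite_reliability([0.8, 0.1, 0.1], S, T)  # VAM-heavy
print(round(balanced, 3), round(skewed, 3))  # → 0.75 0.571
```

Even with these made-up numbers, the evenly weighted composite is noticeably more reliable than the VAM-heavy one, which is the qualitative pattern the MET report describes.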

While acknowledging the value of these analyses, we want to take a step back in this commentary. Here we ask whether agencies may sometimes be jumping to the conclusion that a composite is necessary when the individual measures (and even the components of these measures) may have greater utility than the composite for many purposes.

The basic premise behind creating a composite measure is the idea that there is an underlying characteristic that the composite can more or less accurately reflect. The criterion for a good composite is the extent to which the result accurately identifies a stable characteristic of the teacher’s effectiveness.

A problem with this basic premise is that in focusing on the common factor, the aspects of each measure that are unrelated to the common factor get left out—treated as noise in the statistical equation. But, what if observations and student surveys measure things that are unrelated to what the teacher’s students are able to achieve in a single year under her tutelage (the basis for a VAM score)? What if there are distinct domains of teacher expertise that have little relation to VAM scores? By definition, the multifaceted nature of teaching gets reduced to a single value in the composite.

This single value does have a use in decisions that require an unequivocal ranking of teachers, such as some personnel decisions. For most purposes, however, a multifaceted set of measures would be more useful. The single measure has little value for directing professional development, whereas the detailed output of the observation protocols is designed for just that. Consider a principal deciding which teachers to assign as mentors, or a district administrator deciding which teachers to move toward a principalship. Might it be useful, in such cases, to have several characteristics representing different dimensions of ability relevant to success in those particular roles?

Instead of collapsing the multitude of data points from achievement, surveys, and observations, consider an approach that makes maximum use of the data points to identify several distinct characteristics. In the usual method for constructing a composite (and in the MET research), the results for each measure (e.g., the survey or observation protocol) are first collapsed into a single number, and then these values are combined into the composite. This approach already obscures a large amount of information. The Tripod student survey provides scores on the seven Cs; an observation framework may have a dozen characteristics; and even VAM scores, usually thought of as a summary number, can be broken down (with some statistical limitations) into success with low-scoring vs. with high-scoring students (or any other demographic category of interest). Analyzing dozens of these data points for each teacher can potentially identify several distinct facets of a teacher’s overall ability. Not all facets will be strongly correlated with VAM scores but may be related to the teacher’s ability to inspire students in subsequent years to take more challenging courses, stay in school, and engage parents in ways that show up years later.
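As a rough illustration of this idea, the sketch below simulates two invented latent facets of teaching driving a dozen noisy indicator scores, then shows that a principal component analysis recovers roughly two dominant dimensions. This is a toy demonstration on synthetic data, not an analysis of any real teacher dataset; the facet names are hypothetical.

```python
import numpy as np

# Two invented latent facets for 200 simulated teachers.
rng = np.random.default_rng(0)
n_teachers = 200
f1 = rng.normal(size=n_teachers)   # e.g., a classroom-management facet
f2 = rng.normal(size=n_teachers)   # e.g., an instructional-rigor facet

# Twelve indicators (survey items, rubric components, ...), six loading
# on each facet, each with its own independent noise.
indicators = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n_teachers) for _ in range(6)]
    + [f2 + 0.5 * rng.normal(size=n_teachers) for _ in range(6)]
)

# PCA via SVD of the centered indicator matrix.
X = indicators - indicators.mean(axis=0)
_, s, _ = np.linalg.svd(X, full_matrices=False)
explained = s**2 / np.sum(s**2)  # variance share of each component
print(f"variance explained by top two components: {explained[:2].sum():.2f}")
```

Instead of averaging the twelve columns into one number, an analysis like this keeps the two facets separate, so each teacher gets a small profile rather than a single rank.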

Creating a single composite measure of teaching has value for a range of administrative decisions. However, the mass of teacher data now being collected is only beginning to be tapped for improving teaching and developing schools as learning organizations.


Can We Measure the Measures of Teaching Effectiveness?

Teacher evaluation has become the hot topic in education. State and local agencies are quickly implementing new programs spurred by federal initiatives and evidence that teacher effectiveness is a major contributor to student growth. The Chicago teachers’ strike brought out the deep divisions over the issue of evaluations. There, the focus was on the use of student achievement gains, or value-added. But the other side of evaluation—systematic classroom observations by administrators—is also raising interest. Teaching is a very complex skill, and the development of frameworks for describing and measuring its interlocking elements is an area of active and pressing research. The movement toward using observations as part of teacher evaluation is not without controversy. A recent op-ed in Education Week by Mike Schmoker criticizes the rapid implementation of what he considers overly complex evaluation templates “without any solid evidence that it promotes better teaching.”

There are researchers engaged in the careful study of evaluation systems, including the combination of value-added and observations. The Bill and Melinda Gates Foundation has funded a large team of researchers through its Measures of Effective Teaching (MET) project, which has already produced an array of reports for both academic and practitioner audiences (with more to come). But research can be ponderous, especially when the question is whether such systems can impact teacher effectiveness. A year ago, the Institute of Education Sciences (IES) awarded an $18 million contract to AIR to conduct a randomized experiment to measure the impact of a teacher and leader evaluation system on student achievement, classroom practices, and teacher and principal mobility. The experiment is scheduled to start this school year and results will likely start appearing by 2015. However, at the current rate of implementation by education agencies, most programs will be in full swing by then.

Empirical Education is currently involved in teacher evaluation through Observation Engine, our web-based tool that helps administrators make more reliable observations (see our story about our work with Tulsa Public Schools). This tool, along with our R&D on protocol validation, was initiated as part of the MET project. In our view, the complexity and time-consuming aspects of many of the observation systems that Schmoker criticizes arise from their intended use as supports for professional development. The initial motivation for developing observation frameworks was to provide better feedback and professional development for teachers. Their complexity is driven by the goal of providing detailed, specific feedback. Such systems can become cumbersome when applied to the goal of producing a single score for every teacher that represents teaching quality and can be used administratively, for example, for personnel decisions. We suspect that a more streamlined and less labor-intensive evaluation approach could be used to identify the teachers in need of coaching and professional development. That subset of teachers would then receive the more resource-intensive evaluation and training services such as complex, detailed scales, interviews, and coaching sessions.

The other question Schmoker raises is: do these evaluation systems promote better teaching? While waiting for the IES study to be reported, some things can be done. First, look at correlations of the components of the observation rubrics with other measures of teaching, such as value-added to student achievement (VAM) scores or student surveys. The idea is to see whether the behaviors valued and promoted by the rubrics are associated with improved achievement. The videos and data collected by the MET project are the basis for tools to do this (see our earlier story on the Validation Engine). But school systems can conduct the same analysis using their own student and teacher data. Second, use quasi-experimental methods to look at the changes in achievement related to the local implementation of the evaluation system. In both cases, many school systems are already collecting very detailed data that can be used to test the validity and effectiveness of their locally adopted approaches.
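The first of these checks is straightforward to run on local data. The sketch below simulates it with invented data and invented component names: correlate each observation-rubric component with teachers' VAM scores, and flag components that show little association with achievement gains.

```python
import numpy as np

# Simulated data for 500 teachers; all values and names are invented.
rng = np.random.default_rng(42)
n = 500
vam = rng.normal(size=n)  # standardized value-added scores

# Hypothetical rubric components: two built to correlate with VAM,
# one built to be unrelated to it.
components = {
    "questioning_techniques": 0.6 * vam + 0.8 * rng.normal(size=n),
    "classroom_environment": 0.4 * vam + 0.9 * rng.normal(size=n),
    "board_neatness": rng.normal(size=n),  # unrelated by construction
}

# Pearson correlation of each component with VAM.
for name, scores in components.items():
    r = np.corrcoef(scores, vam)[0, 1]
    print(f"{name}: r = {r:+.2f}")
```

In this toy setup, the first two components show substantial correlations while the third hovers near zero; run on real district data, components that persistently behave like the third would be candidates for revising or dropping from the rubric.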


Oklahoma Implements Empirical’s Observation Engine for Certification of Classroom Observers

Tulsa Public Schools, the Cooperative Council for Oklahoma Administration, and Empirical Education Inc. just announced the launch of Observation Engine to implement the Teacher and Leader Effectiveness program in the state of Oklahoma. Tulsa Public Schools has purchased Empirical Education’s Observation Engine, an online certification and calibration tool for measuring the reliability of administrators assigned to conduct classroom observations. Tulsa Public Schools developed the Tulsa Model for Observation and Evaluation, a framework for evaluating teaching effectiveness that also codifies best practices for creating an environment for successful learning and student achievement. Nearly 500 school districts in the state are piloting the Tulsa Model evaluation system this year.

In order to support the dissemination of the Tulsa Model, the Cooperative Council for Oklahoma Administration (CCOSA) is training and administering calibration tests throughout the state to assess and certify the individuals who evaluate the state’s teachers. The Tulsa Model is embedded in Observation Engine to deliver an efficient online system for state-wide use by Oklahoma certified classroom observers. Observation Engine is allowing CCOSA to test approximately 2,000 observers over a span of two weeks.

Observation Engine was developed as part of The Bill and Melinda Gates Foundation’s Measures of Effective Teaching project in which Empirical Education has participated as a research partner conducting R&D on validity and reliability of observational measures. The web-based software was built by Empirical Education, which hosts and supports it for school systems nationwide.

For more details on these events, see the press announcement and our case study.


Recognizing Success

When the Obama-Duncan administration approaches teacher evaluation, the emphasis is on recognizing success. We heard that clearly in Arne Duncan’s comments on the release of teacher value-added modeling (VAM) data for LA Unified by the LA Times. He’s quoted as saying, “What’s there to hide? In education, we’ve been scared to talk about success.” Since VAM is often thought of as a method for weeding out low performing teachers, Duncan’s statement referencing success casts the use of VAM in a more positive light. Therefore we want to raise the issue here: how do you know when you’ve found success? The general belief is that you’ll recognize it when you see it. But sorting through a multitude of variables is not a straightforward process, and that’s where research methods and statistical techniques can be useful. Below we illustrate how this plays out in teacher and in program evaluation.

As we report in our news story, Empirical is participating in the Gates Foundation project called Measures of Effective Teaching (MET). This project is known for its focus on value-added modeling (VAM) of teacher effectiveness. It is also known for having collected over 10,000 videos from over 2,500 teachers’ classrooms—an astounding accomplishment. Research partners from many top institutions hope to be able to identify the observable correlates for teachers whose students perform at high levels as well as for teachers whose students do not. (The MET project tested all the students with an “alternative assessment” in addition to using the conventional state achievement tests.) With this massive sample that includes both data about the students and videos of teachers, researchers can identify classroom practices that are consistently associated with student success. Empirical’s role in MET is to build a web-based tool that enables school system decision-makers to make use of the data to improve their own teacher evaluation processes. Thus they will be able to build on what’s been learned when conducting their own mini-studies aimed at improving their local observational evaluation methods.

When the MET project recently had its “leads” meeting in Washington DC, the assembled group of researchers, developers, school administrators, and union leaders were treated to an after-dinner speech and Q&A by Joanne Weiss. Joanne is now Arne Duncan’s chief of staff, after having directed the Race to the Top program (and before that was involved in many Silicon Valley educational innovations). The approach of the current administration to teacher evaluation—emphasizing that it is about recognizing success—carries over into program evaluation. This attitude was clear in Joanne’s presentation, in which she declared an intention to “shine a light on what is working.” The approach is part of their thinking about the reauthorization of ESEA, where more flexibility is given to local decision-makers to develop solutions, while the federal legislation is more about establishing achievement goals such as being the leader in college graduation.

Hand in hand with providing flexibility to find solutions, Joanne also spoke of the need to build “local capacity to identify and scale up effective programs.” We welcome the idea that school districts will be free to try out good ideas and identify those that work. This kind of cycle of continuous improvement is very different from the idea, incorporated in NCLB, that researchers will determine what works and disseminate these facts to the practitioners. Joanne spoke about continuous improvement, in the context of teachers and principals, where on a small scale it may be possible to recognize successful teachers and programs without research methodologies. While a teacher’s perception of student progress in the classroom may be aided by regular assessments, the determination of success seldom calls for research design. We advocate for a broader scope, and maintain that a cycle of continuous improvement is just as much needed at the district and state levels. At those levels, we are talking about identifying successful schools or successful programs where research and statistical techniques are needed to direct the light onto what is working. Building research capacity at the district and state level will be a necessary accompaniment to any plan to highlight successes. And, of course, research can’t be motivated purely by the desire to document the success of a program. We have to be equally willing to recognize failure. The administration will have to take seriously the local capacity building to achieve the hoped-for identification and scaling up of successful programs.


Empirical Education Develops Web-Based Tool to Improve Teacher Evaluation

For school districts looking for ways to improve teacher observation methods, Empirical Education has begun development of a web-delivered tool that will provide a convenient way to validate their observational protocols and rubrics against measures of the teacher’s contribution to student academic growth.

Empirical Education is charged with developing a “validation engine” as part of the Measures of Effective Teaching (MET) project, funded by the Bill and Melinda Gates Foundation. As described on the project’s website, the tool will allow users to “view classroom observation videos, rate those videos and then receive a report that evaluates the predictive validity and rater consistency for the protocol.” The MET project has collected thousands of hours of video of classrooms as well as records of the characteristics and academic performance associated with the students in the class.

By watching and coding videos of a range of teachers, users will be able to verify whether or not their current teacher rating systems are identifying teaching behavior associated with higher achievement. The tool will allow users to review their own rating systems against a variety of MET project measures, and will give real-time feedback through an automated report generator.

Development of the validation engine builds on two years of MET project research, which included data from six school districts across the country and over 3,000 teachers. Researchers will now use the data to identify which aspects of teacher practice are leading indicators of student achievement. The engine is expected to undergo beta testing over the next few months, beginning with the National Math and Science Initiative.

Announcement of the new tool comes as interest in alternative ways to measure the effectiveness of teachers is becoming a major issue in education and as federal, state and local officials and teacher organizations look for researched-based ways to identify effective teachers and improve student outcomes.

“At a time when schools are experiencing budget cuts, it is vital that school districts have ready access to research tools, so that they can make the most informed decisions,” says Denis Newman, President of Empirical Education. The validation engine will be part of a suite of web-based technology tools developed by the company, including MeasureResults, an online tool that allows districts to evaluate the effectiveness of the products and programs they use.


2010-2011: The Year of the VAM

If you haven’t heard about Value-Added Modeling (VAM) in relation to the controversial teacher ratings in Los Angeles and subsequent brouhaha in the world of education, chances are that you’ll hear about it in the coming year.

VAM is a family of statistical techniques for estimating the contribution of a teacher or of a school to the academic growth of students. Recently, the LA Times obtained the longitudinal test score records for all the elementary school teachers and students in LA Unified and had a RAND economist (working as an independent consultant) run the calculations. The result was a “score” for all LAUSD elementary school teachers. Note that the economist who did the calculations wrote up a technical report on how it was done and the specific questions his research was aimed at answering.
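For readers curious about the mechanics, here is a deliberately minimal sketch of the kind of calculation involved. Real VAM models are far richer (multiple prior years of scores, student covariates, shrinkage estimators), and this toy version on simulated data is our illustration, not the consultant's actual method: a teacher's "value added" is taken as the mean residual of her students' current scores after regressing them on prior scores.

```python
import numpy as np

# Simulate 20 teachers with 25 students each; all values are invented.
rng = np.random.default_rng(7)
n_teachers, class_size = 20, 25

true_effects = rng.normal(scale=0.3, size=n_teachers)  # latent effects
teacher = np.repeat(np.arange(n_teachers), class_size)  # student -> teacher
prior = rng.normal(size=n_teachers * class_size)        # prior-year scores
current = (0.7 * prior + true_effects[teacher]
           + rng.normal(scale=0.5, size=prior.size))    # current scores

# Step 1: regress current scores on prior scores (simple OLS).
slope, intercept = np.polyfit(prior, current, 1)
residuals = current - (slope * prior + intercept)

# Step 2: a teacher's VAM estimate is her students' mean residual.
vam = np.array([residuals[teacher == t].mean() for t in range(n_teachers)])

# The estimates track the simulated teacher effects, noisily.
print(f"correlation with true effects: {np.corrcoef(vam, true_effects)[0, 1]:.2f}")
```

Even this toy version shows why the technique is appealing (the estimates recover the simulated effects reasonably well) and why it is contested: everything left out of the regression, including the team-teaching arrangements discussed below, lands in the residual and is attributed to the teacher.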

Reactions to the idea that a teacher could be evaluated using a set of test scores—in this case from the California Standards Test—were swift and divisive. The concept was denounced by the teachers’ union, with the local leader calling for a boycott. Meanwhile, the US Secretary of Education, Arne Duncan, made headlines by commenting favorably on the idea. The LA Times quotes him as saying “What’s there to hide? In education, we’ve been scared to talk about success.”

There is a tangle of issues here, along with exaggerations, misunderstandings, and confusion between research techniques and policy decisions. This column will address some of the issues over the coming year. We also plan to announce some of our own contributions to the VAM field in the form of project news.

The major hot-button issues include appropriate usage (e.g., as part or all of the input to merit pay decisions) and technical failings (e.g., biases in the calculations). Of course, these two issues are often linked; for example, many argue that biases may make VAM unfair for individual merit pay. The recent brief from the Economic Policy Institute, authored by an impressive team of researchers (several of them friends and mentors from neighboring Stanford), makes a well-reasoned case for not using VAM as the only input to high-stakes decisions. While their arguments are persuasive with respect to VAM as the lone criterion for awarding merit pay or firing individual teachers, we still see a broad range of uses for the technique, along with considerable challenges.

For today, let’s look at one issue that we find particularly interesting: How to handle teacher collaboration in a VAM framework. In a recent Education Week commentary, Kim Marshall argues that any use of test scores for merit pay is a losing proposition. One of the many reasons he cites is its potentially negative impact on collaboration.

A problem with an exercise like that conducted by the LA Times is that there are organizational arrangements that do not come into the calculations. For example, we find that team teaching within a grade at a school is very common. A teacher with an aptitude for teaching math may take another teacher’s students for a math period, while sending her own kids to the other teacher for reading. These informal arrangements are not part of the official school district roster. They can be recorded (with some effort) during the current year but are lost for prior years. Mentoring is a similar situation, wherein the value provided to the kids is distributed among members of their team of teachers. We don’t know how much difference collaborative or mentoring arrangements make to individual VAM scores, but one fear in using VAM in setting teacher salaries is that it will militate against productive collaborations and reduce overall achievement.

Some argue that, because VAM calculations do not properly measure or include important elements, VAM should be disqualified from playing any role in evaluation. We would argue that, although they are imperfect, VAM calculations can still be used as a component of an evaluation process. Moreover, continued improvements can be made in testing, in professional development, and in the VAM calculations themselves. In the case of collaboration, what is needed are ways that a principal can record and evaluate the collaborations and mentoring so that the information can be worked into the overall evaluation and even into the VAM calculation. In such an instance, it would be the principal at the school, not an administrator at the district central office, who can make the most productive use of the VAM calculations. With knowledge of the local conditions and potential for bias, the building leader may be in the best position to make personnel decisions.

VAM can also be an important research tool—using consistently high and/or low scores as a guide for observing classroom practices that are likely to be worth promoting through professional development or program implementations. We’ve seen VAM used this way, for example, by the research team at Wake County Public Schools in North Carolina in identifying strong and weak practices in several content areas. This is clearly a rich area for continued research.

The LA Times has helped to catapult the issue of VAM onto the national radar. It has also sparked a discussion of how school data can be used to support local decisions, which can’t be a bad thing.