The error of treating statistical significance as ipso facto equivalent to social significance seems to be standard practice in the program evaluation domain.
– David Hunter, Hunter Consulting, LLC
Statistical significance and policy relevance are not the same thing. One has to reach a judgment about any project’s results by separately considering both forms of significance.
– Gordon Berlin, President, MDRC
The July 2015 exchange among ambassadors about statistically significant impact vs. social (or policy) value in many ways reflects the internal debate over the hot-button issue of external evaluation during the development of “The Performance Imperative.” Ultimately, ambassadors robustly endorsed external evaluation as Pillar 7 of the Performance Imperative, while making clear that no one type of evaluation is right for every organization at every stage of organizational development. Similarly, ambassadors both questioned and offered thoughtful explanations of the value of statistical significance, while agreeing that it is not the same as social value and that additional context and considerations are necessary to ensure socially valuable outcomes.
The full context of each ambassador’s contribution to the exchange is listed in the Appendix.
Background/The Issue at Hand
In a community post on July 11, Mario Morino solicited ambassadors’ thoughts on the Social Innovation Research Center’s (SIRC) report, “Social Innovation Fund: Early Results Are Promising,” which described, with reasonable qualification, positive evaluation findings for several SIF-funded programs. David Hunter’s concern with the report centered on the way in which SIRC interpreted the results of MDRC’s evaluation of Reading Partners, a volunteer tutoring program for kids who are 6-30 months behind their age-mates in reading. While taking no issue with the quality of MDRC’s research, which found “statistically significant impacts…equivalent to approximately one-and-a-half to two months of additional progress in reading,” David critiqued SIRC for concluding, “the Reading Partners model is effective.” As David stated, “I want to focus on one issue: the difference between statistical significance and social significance…and how very frequently (including in this instance) the former is treated as if it were inherently the latter.”
David illustrated his point further:
If our child was 30 months behind his or her age cohort in reading, would we be happy if that child caught up 2 months (maximum) after participating in a 28-week reading improvement program? He or she would still be 28 months behind and, I think it is fair to say, still at a great educational disadvantage….In other words, the report does not address whether the statistically significant impacts of Reading Partners has any social value.
Context and Considerations
Three ambassadors (Gordon Berlin, Kris Moore, and Mari Kuraishi) responded to David’s critique by offering additional considerations for weighing statistical versus social significance.
Statistical and social significance must be considered separately
Gordon Berlin: “Statistical significance and policy relevance are not the same thing. One has to reach a judgment about any project’s results by separately considering both forms of significance.”
- Statistical significance represents only an average, and, like any average, there is a lot of dispersion around that mean. “Some children gained more, some children gained nothing; in fact, some might have lost ground. The analysis and the numbers do not really tell us anything about individual children.”
- When the control group also gets extra attention, the social value of statistically significant findings increases. “In analyzing the results of an experiment one always wants to understand the control group context, keeping in mind that the program’s impact is driven by the treatment difference between those in the program group and those in the control group. In this case, both groups of children got extra attention and resources above and beyond what they would have normally gotten. That RP still made a difference despite the fact that control group children also received school-provided special resources adds to the author’s and MDRC’s confidence that RP is policy relevant.”
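Gordon’s caution that an average effect conceals wide individual variation can be sketched with a small, deterministic simulation. All numbers below are hypothetical, standing in for no actual study; a normal quantile grid is used in place of a random sample so the result is reproducible:

```python
from statistics import NormalDist, mean, variance

# Hypothetical reading-gain data in months -- NOT the Reading Partners results.
# A deterministic normal quantile grid stands in for a random sample.
n = 400
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

control = [0.0 + 6.0 * zi for zi in z]    # mean gain 0 months, sd 6
treatment = [2.0 + 6.0 * zi for zi in z]  # mean gain 2 months, sd 6

diff = mean(treatment) - mean(control)

# Welch's t-statistic for the difference in means.
se = (variance(treatment) / n + variance(control) / n) ** 0.5
t = diff / se

# Fraction of tutored children whose reading gain is negative.
lost_ground = sum(g < 0 for g in treatment) / n

print(f"average impact: {diff:.1f} months")
print(f"t-statistic: {t:.1f} (well above the 1.96 threshold at the 5% level)")
print(f"tutored children who still lost ground: {lost_ground:.0%}")
```

With these invented parameters the 2-month average effect is statistically unambiguous, yet roughly a third of the treated children still lose ground, which is exactly Gordon’s point that the analysis “does not really tell us anything about individual children.”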
Statistically significant reading outcomes may signal greater value (in science, math, behavior) than a single study can determine
Kris Moore: “While I agree with these cautions, I want to add another consideration, which is that the ability to read, even a little better, makes it more likely that a child will be able to do science and math problems, and perhaps they are more likely to read on their own. And, hopefully, if school is less frustrating, a child might even behave a little better. A longitudinal follow-up with data on exploratory (as well as confirmatory) outcomes could answer these kinds of questions.”
Statistical significance has value to the individual student making some gains (private good), even if insignificant in the broader scheme of things (social good)
Mari Kuraishi: “This also fits the classic economics divide of private good vs. social good…As a parent with a kid with an IEP, I know that ANY gain is sought after and appreciated. But it is frustrating not to know how to get the gains faster and/or cheaper.”
- Upon review of the exchange, Mari offered this additional explanation:
“Markets work in private goods because ‘in theory’ there are multiple suppliers (in non-monopoly situations) that match up to multiple demanders, each with their own utility function reflected in different willingness to pay, and therefore different clearing prices. So in this example that David sets up, ‘If our child was 30 months behind his or her age cohort in reading, would we be happy if that child caught up 2 months (maximum) after participating in a 28-week reading improvement program?’ Different families might be willing to pay different prices for this intervention that results in a net 10-week gain, or 2+ month gain. Positing that repeating this intervention has cumulative results (completely unrealistic, I know), a parent might be willing to pay the price 14 times over to get their kid caught up because the gain to them is a kid that’s caught up, and hypothetically able to participate in public education with no supports etc.
“This equation becomes a lot harder to balance when we have to think about this in the public good context because we are not asking individual households to define their own utility functions and letting them choose their own price point–the decision has to be made for them all as a group (and frequently the households are not consulted in choice of intervention in any event), and moreover the households aren’t paying directly for this intervention. They may be paying indirectly via taxes etc, but they are not at liberty to set price. And in this example we have no idea of alternative interventions, their cost and efficacy, as Gordon points out below.”
A Point of Contention: The Cost Factor
Gordon and David agreed that whether a program yields statistically significant evaluation results is a different question from whether it has policy relevance. Yet they offered different perspectives once cost considerations were added to the equation.
Gordon Berlin: When cost is low, the policy value of modest statistical significance increases. “In considering policy relevance, cost matters. And because Reading Partners relies on volunteers, the cost is low. If policymakers can get a two month boost on average for all served children for very low cost, in the resource-constrained world we inhabit, policy relevance would rise. Benefits outweigh the modest costs.”
David Hunter: Is scaling inexpensive programs that get only modest results ultimately good for society? “Is it really good for our society and its people to have relatively inexpensive programs that produce weak results at best? Results that are unlikely to exert a meaningful influence over the life prospects of the people who most need a helping hand? Wouldn’t we want our social policy analysts to take this wider view? And might it not well be true that paying for inexpensive but weak programs will, in the end, cost our society more (in lost opportunities for people who needed more robust help than the weak programs provide–and the loss in contributions they might have made had they received such help)?
“Consider the example of home visitation programs for single mothers. If there were someone in your family who qualified for such a program, would you want her to receive services from the Nurse-Family Partnership or Healthy Families America? In the former there are proven and significant (life-changing) impacts both for the mother and sixteen years later for their children. In the latter, there also are proven impacts–but the effects are very modest in comparison and don’t seem to be strong enough to change life trajectories. The Nurse-Family Partnership is more expensive. Is that a good reason for policy analysts and makers to favor Healthy Families America when looking to spend public moneys?”
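The trade-off Gordon and David debate can be made concrete with a toy cost-effectiveness comparison. All figures below are invented for illustration; they are not actual costs or impacts of Reading Partners, the Nurse-Family Partnership, or Healthy Families America:

```python
# Invented figures for illustration only -- not actual program data.
programs = {
    "low-cost, modest-impact": {"cost_per_child": 700, "gain_months": 2.0},
    "high-cost, strong-impact": {"cost_per_child": 4200, "gain_months": 9.0},
}

budget = 1_000_000  # a fixed public budget, in dollars

for name, p in programs.items():
    per_month = p["cost_per_child"] / p["gain_months"]     # cost per month of gain
    served = budget // p["cost_per_child"]                 # children reachable on budget
    total_months = served * p["gain_months"]               # aggregate months of progress
    print(f"{name}: ${per_month:.0f} per month of gain, "
          f"serves {served:,} children, {total_months:,.0f} total months gained")
```

On Gordon’s framing, the cheaper program buys more aggregate months of progress per dollar in a resource-constrained world; on David’s framing, each of those children still ends up far behind, so per-child life trajectories, not just the aggregate, must enter the policy judgment.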
Resources for Understanding Social Value
Dean Fixsen responded by offering several references from social science that go beyond RCTs/statistical significance to help “describe and assess social validity.” As he stated, “Present day views of social sciences have wed themselves to randomized group designs as the only way of knowing and in that mode statistical significance is the only test of what matters. Science is broader than that, and any science aimed at improving human service outcomes definitely goes far beyond that narrow view.” References recommended by Dean:
- Wolf, M. M. (1978). Social validity: The case for subjective measurement, or how applied behavior analysis is finding its heart. Journal of Applied Behavior Analysis, 11, 203-214.
- Schwartz, I. S., & Baer, D. M. (1991). Social validity assessments: Is current practice state of the art? Journal of Applied Behavior Analysis, 24(2), 189-204.
- Wolf, M. M. (1991). Why did social validity become a classic? Current Contents, 23, 8.
- Sudsawad, P. (2005). Concepts in clinical scholarship: A conceptual framework to increase usability of outcome research for evidence-based practice. American Journal of Occupational Therapy, 59(3), 351-355.
- Hurley, J. J. (2012). Social validity assessment in social competence interventions for preschool children: A review. Topics in Early Childhood Special Education. Advance online publication, April 6, 2012.
Our thanks to David Hunter, Gordon Berlin, Dean Fixsen, Kris Moore, and Mari Kuraishi for sharing their insights.