I’ve a dear friend who has been in poor health for many years. What, exactly, is wrong is unclear. Over the decades, diagnoses have come and gone, as have various treatments—and the value of some seems far-fetched. When a new treatment is particularly odd and I raise an eyebrow, she invariably informs me that the research supporting the intervention is quite sound. She’s a bright woman and is well educated—in fact, she’s a registered nurse. Yet I suspect that if I reviewed the “research,” I would find that the study linking, say, concentrated pomegranate tea to the cure of psoriasis relied on only five subjects, or was performed in a research laboratory no one has heard of before or since, or was conducted by the PGA (Pomegranate Growers Association—quit thinking about golf and read my column), or that the finding is based on 54 of 100 psoriasis sufferers having been cured (hardly persuasive), or that the result came from a massive statistical study correlating 250 conditions and 1,235 environmental/lifestyle factors (some of which will appear to be associated just by chance).
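The last point—that correlating hundreds of conditions with over a thousand lifestyle factors will produce chance "findings"—is simple arithmetic. A minimal simulation sketch (the condition and factor counts are the ones mentioned above; everything else is assumed for illustration): if none of the pairings has any real association, p-values are uniform, so a standard 0.05 significance threshold still flags about 5% of all pairs.

```python
import random

random.seed(42)

# Hypothetical simulation: test every condition/factor pair where NO real
# association exists. Under the null hypothesis, p-values are uniform on
# [0, 1], so a 0.05 threshold flags ~5% of the pairs by chance alone.
n_conditions = 250
n_factors = 1235
n_tests = n_conditions * n_factors  # 308,750 pairs

alpha = 0.05
false_positives = sum(1 for _ in range(n_tests) if random.random() < alpha)

print(f"{n_tests} tests, ~{false_positives} 'significant' by chance alone")
# Expectation is n_tests * alpha = 15,437.5 spurious associations
```

With over fifteen thousand spurious "discoveries" expected, any single headline association drawn from such a trawl deserves skepticism unless it survives a correction for multiple comparisons.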
Or consider coffee. Ubiquitous in lists of lifestyle factors, it seems that coffee is found to be alternately bad for you, neutral, or good for you in each successive year. (The latest news being good, I choose to regard the matter as settled.)
How do we judge the quality of research claims? This is a difficult problem. The Food and Drug Administration deemed the research supporting pain reliever Vioxx to be persuasive enough to justify widespread use of the drug. Merck was making $2.5 billion annually off the drug before it became apparent that Vioxx increased the risk of heart attack among those taking it for extended periods. Some question whether Merck falsified results. Even under the rigorous rules of drug testing, however, it is difficult to test for every possible outcome.
The challenges of evaluation can be equally complex in the human services. From a relatively short-term and measurable outcome—e.g. “Did the pain go away?”—we move to a much less tangible and often distant outcome that is hard to measure—e.g. “Is the child better able to cope with bullying at school?”
Why must we evaluate human service programs? Is it not sufficient that we’ve fed the hungry and clothed the poor? When funds are scarce—and when are they not?—we are obligated to concentrate our resources where they will be best used. There are better and worse approaches to feeding the hungry, although we might not assess an impact outcome (“how much less hungry?”) as readily as we assess a process outcome, e.g. persons served per dollar expended.
We are tempted to apply the medical model to the human services. Yet this is not always appropriate, technically achievable, affordable, or, in some cases, even ethical.
CGR constantly struggles with this problem. As an example, consider our ongoing efforts on behalf of the Hillside Family of Agencies to evaluate the Hillside Work-Scholarship Connection (HWSC). The outcome is clear—does the child graduate or not? (Or is it? My colleague Erika Rosenberg discusses the various flavors of graduation rates in our Education Notebook at http://education.cgr.org/). Question: Do we conclude that the HWSC program is successful if students participating in HWSC graduate at a higher rate than the Rochester City School District (RCSD) as a whole? Answer: Only if students participating in the program are similar to the rest of the student body in all important respects. As it happens, they aren’t similar enough to adopt such a simple test of effectiveness. Good evaluation practice requires that we construct a “control group” from among RCSD students who did not participate in the program. We’re nearing the end of a new phase of evaluation that relies on a more robust procedure for selecting a control group than the procedure we’ve used in the past. Even with the more robust procedure, absolute certainty will elude us. When the new results are released later this year, skeptics (regardless of the conclusion) will be able to raise technical questions about our approach. Whether we are working to evaluate housing initiatives, programs targeting homelessness, teen pregnancy or charter school performance, these challenges persist.
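Once a comparable control group exists, the comparison itself is conventional statistics. A minimal sketch of the kind of test involved—a two-proportion z-test—using entirely invented numbers (these are not HWSC results, and the actual evaluation procedure is more elaborate than this):

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z-statistic for the difference between two sample proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical illustration: say 130 of 200 participants graduate,
# versus 110 of 200 matched control-group students.
z = two_proportion_z(130, 200, 110, 200)
print(f"z = {z:.2f}")  # |z| > 1.96 would suggest significance at the 5% level
```

Even when such a test clears the conventional threshold, the "technical questions" remain: the result is only as trustworthy as the matching procedure that built the control group.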
How much certainty should funders require? There has been a drive within major foundations and the federal government to limit funding to programs with ample evidence of effectiveness, often termed “evidence-based” programs or practices. Do such restrictions force out new and innovative ideas as we support only the programs and practices with a body of acceptable evidence supporting their efficacy? Do we drive money away from activities that are nearly impossible to measure rigorously, either because treatment groups are small or because of the nature of the intervention? Remember the story of the drunk searching for his keys under the lights in a parking lot? Asked if that was where he’d lost them, he replied, “No, but the light is better here.” If you are interested in helping us explore the challenge of evidence in the human services, join us on June 15 for a community forum (see www.cgr.org for more information).
We must do a better job measuring outcomes and rigorously assessing the evidence that supports community interventions, be they aimed at reducing violence, challenging the epidemic of childhood obesity, cutting teen pregnancy rates, improving literacy, or enhancing parenting skills among new mothers. Yet we must also recognize the limits of evaluation and accept that certainty of impact can elude us even with the best of procedures.
Kent Gardner, Ph.D. President & Chief Economist
Published in the Rochester (NY) Business Journal May 14, 2010