Two Kinds of Evaluations in Health Management Studies
by Scott MacStravic
There are clearly two different kinds of studies being conducted and reported on proactive health management (PHM) efforts, which aim either to prevent or reduce the incidence and prevalence of disease (wellness and risk reduction) or to reduce the crises, complications and worsening of existing chronic disease (disease management). First are the rigorous randomized controlled trials and analyses conducted by the “scientific community”, which tend to find little support for the economic benefits of PHM efforts. Second are the analyses performed by PHM providers and customers, which tend to show significant economic benefits.
Different Measurement Methods
The two types of studies tend to reflect different approaches to measuring results. Provider and customer studies usually rely on either simple before/after comparisons of the same population, or on side-by-side comparisons of participants in PHM efforts vs. non-participants. Both approaches run serious risks of overstating results, though for different reasons.
Before/after studies are likely to suffer from “regression to the mean” effects that look like PHM effects. When people are targeted for PHM interventions because they had high health care costs or a recent diagnosis of a health problem in one year, there is a natural tendency for their costs of care to go down in the next year. People do not normally repeat “outlier” levels of expense every year; the first year of diagnosis and treatment, or a year at an outlier level, tends to be the highest they will have for a while.
This tendency to move toward the average level of costs is too often used by PHM providers, or believed by PHM customers, to reflect the effects of PHM efforts. In many cases, however, this can overstate the actual effects of such efforts by a factor of two or more. The only way to overcome this exaggeration is to randomly assign people identified as high-cost or high-risk to different “treatments”, i.e. one group to PHM interventions and the other to “usual care” without such interventions.
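The regression to the mean effect described above can be illustrated with a small simulation. This is only a sketch under assumed numbers: the cost distributions, the 10% targeting rule and the function name are hypothetical and not drawn from any PHM study. No intervention is applied at all, yet the targeted group's costs still fall in the second year.

```python
import random
import statistics


def simulate_regression_to_mean(n_people=20000, top_fraction=0.10, seed=0):
    """Return mean year-1 and year-2 costs for people targeted on year-1 cost.

    No intervention of any kind is applied between the two years.
    """
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_people):
        # Stable personal cost level (hypothetical lognormal, in dollars).
        usual_cost = rng.lognormvariate(8.0, 0.5)
        # Each year's cost is the usual level times independent random noise.
        year1.append(usual_cost * rng.lognormvariate(0.0, 0.8))
        year2.append(usual_cost * rng.lognormvariate(0.0, 0.8))

    # Target the highest-cost people in year 1, as a PHM program might.
    n_targeted = int(n_people * top_fraction)
    ranked = sorted(range(n_people), key=lambda i: year1[i], reverse=True)
    targeted = ranked[:n_targeted]

    mean_y1 = statistics.mean([year1[i] for i in targeted])
    mean_y2 = statistics.mean([year2[i] for i in targeted])
    return mean_y1, mean_y2


mean_y1, mean_y2 = simulate_regression_to_mean()
apparent_savings = 1 - mean_y2 / mean_y1
print(f"Targeted group, year 1 mean cost: {mean_y1:,.0f}")
print(f"Targeted group, year 2 mean cost: {mean_y2:,.0f}")
print(f"Apparent 'savings' with no intervention: {apparent_savings:.0%}")
```

Any decline a before/after study reports for such a group has to be read against this built-in drop, which is why a randomly assigned “usual care” group is needed for comparison.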
Side-by-side comparisons of PHM participants vs. non-participants over the same period, e.g. a baseline year followed by an “intervention year” after the PHM intervention is adopted, suffer from a different but equally dangerous tendency. When people can choose for themselves whether or not to participate in a PHM program, those who are more health-conscious, more motivated, etc. consistently tend to “self-select” into participation, while those less so self-select into non-participation. Thus the “results” measured will partly reflect the differences that already existed between participants and non-participants, along with any differences caused by participation per se.
The only way to eliminate this “self-selection bias” is to randomly assign participants, rather than permit self-selection. This can be done by offering PHM to only half the target population, with incentives to achieve very high participation among them, while following the other half, who are not offered PHM, as a control group. This can easily be done with the members of a health insurance plan, but not so easily with an employee population, unless there are multiple locations that can easily be handled separately.
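The difference between the two designs can be sketched in a simulation. Everything here is an illustrative assumption: a hypothetical “motivation” score that both lowers costs and raises the chance of joining, a made-up true effect of 300 dollars in savings per participant, and full participation among those randomly offered the program.

```python
import random
import statistics

TRUE_EFFECT = -300.0  # hypothetical true change in annual cost per participant


def compare_designs(n_people=40000, seed=1):
    """Return (self-selected estimate, randomized estimate) of the PHM effect."""
    rng = random.Random(seed)
    joined, declined = [], []   # self-selection design
    offered, control = [], []   # random-assignment design
    for _ in range(n_people):
        # Hypothetical motivation score in [0, 1]; motivated people cost less anyway.
        motivation = rng.random()
        base_cost = 6000.0 - 2000.0 * motivation + rng.gauss(0.0, 500.0)

        # Design A: motivated people are more likely to choose to participate.
        if rng.random() < motivation:
            joined.append(base_cost + TRUE_EFFECT)
        else:
            declined.append(base_cost)

        # Design B: a coin flip decides who is offered PHM (all offered are
        # assumed to participate, which real programs can only approximate).
        if rng.random() < 0.5:
            offered.append(base_cost + TRUE_EFFECT)
        else:
            control.append(base_cost)

    self_selected = statistics.mean(joined) - statistics.mean(declined)
    randomized = statistics.mean(offered) - statistics.mean(control)
    return self_selected, randomized


self_selected_estimate, randomized_estimate = compare_designs()
print(f"True effect:             {TRUE_EFFECT:,.0f}")
print(f"Self-selected estimate:  {self_selected_estimate:,.0f}")
print(f"Randomized estimate:     {randomized_estimate:,.0f}")
```

Under these assumptions the self-selected comparison credits the program with savings that were really pre-existing differences between joiners and non-joiners, while the randomized comparison stays close to the true effect.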
When both self-selection bias and regression to the mean problems are controlled for, PHM results have a far greater chance of reflecting the effects of the PHM intervention, rather than those of unrelated causes. There are other measurement and attribution issues that may also cloud evaluations, though these two are the most common and usually the most serious. And when providers want their customers to be happy, and customers want to avoid the appearance of having made a bad investment, it is often easier to accept the “unscientific” good results than to use an approach that carries a far greater risk of bad results.
Who and What Are Measured
This “bifurcation” of methods has produced two different kinds of study results, with the “scientific” studies significantly less enthusiastic than their provider/customer counterparts. But there is a second reason for the differences in enthusiasm between these two kinds of results. Most of the scientific studies seem to have been conducted on Medicaid and Medicare beneficiaries, and have been sponsored by government agencies. The differences in results therefore reflect not only different evaluation methods, but different populations.
Provider/customer studies mostly reflect the experiences of commercially insured populations who are covered by their employers’ health insurance benefits, or are self-insured by their employers. These tend to be significantly younger than Medicare populations, and therefore less likely to have multiple chronic conditions that cause extreme levels of health care expenditures. And not only are most Medicaid populations not employed, they also include a large number of women who become eligible specifically because they are pregnant, and who are therefore much more likely to have high healthcare costs for that reason alone, but lower costs after their babies are born.
This means that population differences between commercially insured and governmentally insured populations could easily explain why the results of PHM interventions are less significant in the latter, and why the interventions may even cost more, making net cost savings difficult to find. And there is another major difference between employed and unemployed populations. When examining unemployed populations, the only cost savings considered are reductions in healthcare use and expenditures. When examining employed populations, cost savings can include all the health-related effects on absences, disability and workers compensation costs, productivity, quality, job performance and even revenue impacts, not merely healthcare costs.
Studies of employed populations have tended to show that the “indirect” costs of health risks, and of the acute and chronic conditions that arise from such risks or are already present, add from two to five times as much in additional costs or lost opportunities for gains as healthcare costs alone. This means that evaluations of employed populations that count only healthcare costs could miss as much as 80% of the total benefits of PHM interventions, and reach negative or equivocal conclusions that are “biased” by as serious a problem as self-selection bias or regression to the mean.
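The arithmetic behind the 80% figure can be sketched as follows. The function name is hypothetical, and the calculation assumes that the benefits of PHM are spread across healthcare and indirect costs in the same proportion as the costs themselves, which the studies cited above do not necessarily establish.

```python
def share_missed(indirect_multiple):
    """Fraction of total health-related cost left out when only healthcare
    cost is counted, given indirect cost as a multiple of healthcare cost."""
    if indirect_multiple < 0:
        raise ValueError("indirect_multiple must be non-negative")
    # Total = healthcare (1 part) + indirect (indirect_multiple parts).
    return indirect_multiple / (1.0 + indirect_multiple)


for multiple in (2, 3, 4, 5):
    print(f"indirect = {multiple}x healthcare cost -> "
          f"{share_missed(multiple):.0%} of the total not counted")
```

On this reading, an indirect multiple of four corresponds to the 80% figure, and the two-to-five range corresponds to roughly two-thirds to five-sixths of the total going unmeasured.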
What should occur in future evaluations of PHM interventions is a combination of more scientific and rigorous measurement methods with the inclusion of the full economic effects of such interventions, including those that arise when employed populations are involved. Moreover, rather than combining studies of different PHM approaches, the studies that show with scientific rigor that a particular PHM intervention model works well should then be reported as a success, rather than clouded by inclusion of different “treatments” that were not successful. Then the successful models can be tested with other populations to see if they work repeatedly, as well.
The combination of better measurement methods and selection of PHM interventions proven successful will enable the development of an “evidence-based medicine” foundation for the future development and application of the best methods, rather than muddled mixes of a wide range of different PHM approaches. This can then result in a science-based improvement of PHM methods and results, rather than confusing and unjustified conclusions of either success or failure that add little or nothing to the development of PHM and the improvement of the health of populations and the reduction of sickness burdens.





