Elmar Alizadeh, Stella Arndorfer, Ali Ruderian, Devin Incerti
Simulation models often map hundreds of inputs to outcomes in complex ways, making it challenging to grasp the influence of each input on the results. Model-based decision analysis consequently often faces criticism for its perceived lack of transparency and robustness. Traditionally, one-way sensitivity analysis has been used to assess how variation in individual model parameters impacts model outputs, with results often reported via a tornado diagram. While this simplistic univariate analysis may help identify the most influential inputs, its positive impact on model interpretability is limited because it (1) does not consider interactions between inputs, (2) is often restricted to a small, manually selected subset of inputs, and (3) typically varies inputs in arbitrary ways (e.g., ±20%). Conversely, SHapley Additive exPlanations (SHAP) analysis uses Nobel Prize-winning game-theoretic concepts that are over 70 years old to provide a more nuanced understanding of variable importance. SHAP analysis models the joint distribution of all model inputs to quantify the influence of every input across all of its possible values and all possible interactions with the other inputs.
In this post, we discuss an applied, real-world example in which we used EntityRisk’s PROVEN™ software to conduct a SHAP analysis to enhance our understanding of results from a generalized cost-effectiveness analysis (GCEA).
Analysis Purpose and Hypothesis
The Institute for Clinical and Economic Review (ICER) recently shared a final Evidence Report assessing the value of ensifentrine (Verona Pharma) for the treatment of chronic obstructive pulmonary disease (COPD). In this report, ICER developed a cost-effectiveness model (CEM) to simulate the costs and health outcomes associated with treatment compared to standard of care (SoC). This model served as the foundation for a traditional cost-effectiveness analysis (CEA), which ICER used to estimate the maximum annual value-based price (VBP)¹ of ensifentrine under several willingness to pay per quality-adjusted life-year (QALY) thresholds.
¹ A VBP is the maximum net price that can be charged for a drug to still be cost-effective at a given willingness to pay. Put differently, it is the net price at which the total discounted incremental monetary benefits of treatment are equal to the total discounted incremental costs.
As we discussed in our recent blog post, there has been an increased use of GCEA to offer more holistic modeling strategies in response to the restrictive assumptions of traditional CEA. Utilizing the PROVEN™ platform, EntityRisk implemented a CEM similar to ICER’s while also including novel GCEA value elements.
Ex ante, the joint impact on VBPs of including all GCEA value elements is difficult to predict in both magnitude and direction. However, when certain GCEA elements are added individually, it is possible to reason through the direction of their potential impact on VBPs. For example, dynamic pricing models allow the drug acquisition cost of ensifentrine to drop by a substantial percentage after loss of exclusivity due to patent expiry; the incorporation of this GCEA value element should almost certainly raise ensifentrine’s VBP, but the magnitude of this positive impact potentially depends on multiple other model inputs.
We utilized SHAP analysis to formally investigate the sensitivity of dynamic pricing’s VBP impact to model inputs. We hypothesized that the inputs that impact mortality, such as starting age of the cohort or disease-related mortality, should also influence the impact of including dynamic pricing on VBP; for instance, older cohorts should dampen the effect of dynamic pricing because higher background mortality results in a higher proportion of the cohort entering the death state before they can benefit from the reduced ensifentrine drug acquisition cost. We tested this hypothesis by utilizing our EvidenceML™ software to run a SHAP analysis on a ΔVBP outcome, where Δ represents the absolute difference in VBPs such that:
ΔVBP = VBP2 – VBP1
where VBP1 represents the VBP under a traditional CEA framework and VBP2 represents the VBP under traditional CEA with dynamic pricing (i.e., a limited form of GCEA).
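The hypothesis above can be illustrated with a deliberately simplified sketch. It assumes a two-state (alive/dead) cohort, a constant annual mortality rate, a constant incremental QALY gain per year alive, no incremental non-drug costs, and a one-time price drop at loss of exclusivity. None of the function names or numerical values below come from ICER's or EntityRisk's models; they are hypothetical and chosen only to show the mechanics of solving for a VBP and computing ΔVBP.

```python
def value_based_price(annual_mortality, price_multipliers, wtp=100_000.0,
                      incremental_qalys_per_year=0.05, discount_rate=0.03):
    """Launch price at which discounted benefits equal discounted drug costs.

    All defaults are illustrative assumptions, not values from any report.
    """
    alive = 1.0                # share of the cohort still alive
    benefits = 0.0             # discounted incremental monetary benefit
    cost_per_unit_price = 0.0  # discounted drug cost per $1 of launch price
    for year, multiplier in enumerate(price_multipliers):
        discount = 1.0 / (1.0 + discount_rate) ** year
        benefits += discount * alive * wtp * incremental_qalys_per_year
        cost_per_unit_price += discount * alive * multiplier
        alive *= 1.0 - annual_mortality
    return benefits / cost_per_unit_price


def delta_vbp(annual_mortality, horizon=30, loe_year=10, price_drop=0.8):
    """Delta VBP = VBP2 (with dynamic pricing) - VBP1 (constant price)."""
    constant = [1.0] * horizon
    dynamic = [1.0 if year < loe_year else 1.0 - price_drop
               for year in range(horizon)]
    vbp1 = value_based_price(annual_mortality, constant)
    vbp2 = value_based_price(annual_mortality, dynamic)
    return vbp2 - vbp1


low_mortality_delta = delta_vbp(annual_mortality=0.02)
high_mortality_delta = delta_vbp(annual_mortality=0.10)
print(f"Delta VBP at 2% annual mortality:  {low_mortality_delta:,.0f}")
print(f"Delta VBP at 10% annual mortality: {high_mortality_delta:,.0f}")
```

In this toy setting, higher mortality places less weight on the post-expiry years, so the gain from dynamic pricing is smaller, which is the direction of effect hypothesized above. The real CEM has many more states and inputs, which is why a formal sensitivity analysis is needed.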
Model inputs were related to the ΔVBP by training a machine learning metamodel and evaluating it on a held-out test set to ensure that its predictions were accurate. Once trained and evaluated, the metamodel can predict the change in the VBP associated with any new set of model inputs (i.e., features). Our primary results report the average marginal contributions of the most influential model parameters to the ΔVBP.
Data
The dataset used to train the metamodel was generated from ΔVBPs simulated using EntityRisk’s CEM. The CEM was run for each of 10,000 Monte Carlo draws of the model parameters, which produced a dataset of 10,000 rows. Of these, 8,000 rows were used to train the model and the remaining 2,000 were held out for testing. All model features were included in the analysis, except for those that had deterministic distributions (i.e., their values were assumed constant across simulation draws). Figure 1 displays the distribution of ΔVBPs across all rows of the training data. Variation reflects differences in model output across simulation draws of the CEM parameters.
Figure 1: Distribution of the change in the value-based price for the training data
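The data preparation steps described above can be sketched as follows. The parameter names, their distributions, and the `run_model` function are hypothetical stand-ins (the actual CEM and its inputs are not shown in this post); only the 10,000 draws, the exclusion of deterministic inputs, and the 8,000/2,000 split come from the text.

```python
import random


def run_model(params):
    """Hypothetical stand-in for one run of the CEM returning a Delta VBP."""
    return 1000.0 * params["hrqol_gold3"] - 20.0 * (params["age"] - 60.0)


def draw_parameters(rng):
    """One Monte Carlo draw of the (hypothetical) model parameters."""
    return {
        "hrqol_gold3": rng.betavariate(20, 10),  # uncertain input
        "age": rng.gauss(65, 5),                 # uncertain input
        "discount_rate": 0.03,                   # deterministic input
    }


rng = random.Random(0)
draws = [draw_parameters(rng) for _ in range(10_000)]
outcomes = [run_model(draw) for draw in draws]

# Keep only inputs that actually vary across draws; constant inputs carry
# no information for the metamodel.
feature_names = [name for name in draws[0]
                 if len({draw[name] for draw in draws}) > 1]
rows = [[draw[name] for name in feature_names] for draw in draws]

# Draws are independent, so a simple positional split is sufficient here.
train_X, test_X = rows[:8000], rows[8000:]
train_y, test_y = outcomes[:8000], outcomes[8000:]
print(feature_names, len(train_X), len(test_X))
```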

Model Training and Evaluation
Since simulation models are highly non-linear and involve interactions between the model inputs, we used extreme gradient boosting (XGBoost), which iteratively combines multiple regression trees. Extreme gradient boosting has historically outperformed other algorithms with tabular data, including complex deep learning algorithms. The model was evaluated using predictions on the held-out test set. The primary metric used for evaluation was the R-squared, which was quite high, at 0.9995. This suggests that the metamodel almost perfectly maps the model inputs to the model output of interest (i.e., ΔVBP) and that the SHAP analysis can be reliably used to interpret the CEM. Note that this is not unique to this model: since the data itself is simulated, it can always be made large enough to facilitate training of a machine learning model complex enough to achieve near perfect predictive accuracy.
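To make the train-then-evaluate workflow concrete, here is a minimal sketch that uses only the Python standard library. It is not XGBoost: it swaps in plain gradient boosting with depth-one regression trees (stumps), which cannot capture interactions the way deeper trees can. The simulator, sample sizes, and hyperparameters are all hypothetical; the point is only to show boosting on residuals and an R-squared computed on held-out rows.

```python
import math
import random


def fit_stump(X, residuals, orders):
    """Best single split (feature, threshold, left mean, right mean)."""
    n = len(X)
    total = sum(residuals)
    best_gain, best = -1.0, None
    for j, order in enumerate(orders):
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            threshold = X[order[k - 1]][j]
            if threshold == X[order[k]][j]:
                continue  # cannot split between tied values
            right_sum = total - left_sum
            # Maximizing this quantity minimizes the squared error.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if gain > best_gain:
                best_gain = gain
                best = (j, threshold, left_sum / k, right_sum / (n - k))
    return best


def fit_boosting(X, y, n_rounds=300, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the residuals."""
    orders = [sorted(range(len(X)), key=lambda i, j=j: X[i][j])
              for j in range(len(X[0]))]
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, predictions)]
        j, threshold, left, right = fit_stump(X, residuals, orders)
        left, right = learning_rate * left, learning_rate * right
        stumps.append((j, threshold, left, right))
        predictions = [p + (left if row[j] <= threshold else right)
                       for p, row in zip(predictions, X)]
    return base, stumps


def predict(model, X):
    base, stumps = model
    return [base + sum(left if row[j] <= threshold else right
                       for j, threshold, left, right in stumps)
            for row in X]


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def toy_simulator(row):
    """Hypothetical non-linear simulator output (additive, no interactions)."""
    return 1000.0 * row[0] ** 2 + 300.0 * math.sin(3.0 * row[1])


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [toy_simulator(row) for row in X]
model = fit_boosting(X[:400], y[:400])           # train on 400 rows
test_r2 = r_squared(y[400:], predict(model, X[400:]))  # evaluate on 100
print(f"held-out R-squared: {test_r2:.4f}")
```

Because the training data come from a simulator rather than from noisy observations, fit quality is limited mainly by sample size and model flexibility, which mirrors the point made above about simulated data.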
Model Interpretation
Parameter Importance
A common way to summarize the SHAP values across datasets, and to measure feature importance, is to compute the mean (across the dataset’s rows) of the absolute SHAP values; these values represent the average absolute marginal contribution to the change in the VBP. SHAP values were computed for each of the 8,000 rows in the training dataset, and Figure 2 displays the complete distribution of these values (where each dot represents a single row from the training dataset) for the 10 inputs with the highest mean absolute SHAP values. Features exhibiting greater variation in SHAP values are therefore deemed to have a more significant impact.
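For readers who want to see what a SHAP value is computationally, the sketch below computes exact Shapley values for a tiny hypothetical model by enumerating all feature coalitions, then summarizes them with the mean absolute value per feature. This brute-force approach is only feasible for a handful of features; production SHAP software relies on much faster tree-specific algorithms. The `toy_metamodel` function, the feature ranges, and the choice to average over a background sample for "missing" features are all assumptions made for illustration.

```python
import random
from itertools import combinations
from math import factorial


def toy_metamodel(row):
    """Hypothetical stand-in for a trained metamodel (with an interaction)."""
    hrqol, mortality, age = row
    return 1000.0 * hrqol - 3000.0 * mortality - 10.0 * (age - 65.0) * hrqol


def coalition_value(model, row, background, coalition):
    """Mean prediction with coalition features fixed to `row`; the
    remaining features are taken from each background row in turn."""
    total = 0.0
    for ref in background:
        mixed = [row[j] if j in coalition else ref[j]
                 for j in range(len(row))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, row, background):
    """Exact Shapley values: weighted marginal contributions over all
    subsets S of the other features, weight |S|!(n-|S|-1)!/n!."""
    n = len(row)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(model, row, background,
                                         set(subset) | {i})
                without_i = coalition_value(model, row, background,
                                            set(subset))
                phi += weight * (with_i - without_i)
        values.append(phi)
    return values


rng = random.Random(1)
rows = [[rng.uniform(0.4, 0.9), rng.uniform(0.01, 0.1), rng.uniform(55, 75)]
        for _ in range(60)]
background, explained = rows[:30], rows[30:]

shap_matrix = [shapley_values(toy_metamodel, row, background)
               for row in explained]

# Feature importance: mean absolute SHAP value per feature.
feature_names = ["hrqol", "mortality", "age"]
mean_abs_shap = [sum(abs(values[j]) for values in shap_matrix)
                 / len(shap_matrix) for j in range(len(feature_names))]
for name, importance in zip(feature_names, mean_abs_shap):
    print(f"{name}: {importance:.1f}")
```

A useful property to check is that, for every row, the SHAP values sum to the difference between that row's prediction and the average prediction over the background sample.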
We posited that the parameters impacting drug costs—those linked to mortality—would have the greatest impact on the ΔVBP. Our analysis is somewhat consistent with this hypothesis, as 3 of the top 6 parameters are indeed related to mortality: the probability of death in the GOLD 4 health state (the most severe stage of COPD), the probability of death in the GOLD 3 health state (the second most severe stage of COPD), and age. However, it is noteworthy that 3 of the top 5 parameters relate to health-related quality of life (HRQoL). Notably, the most influential parameter is the HRQoL value for the GOLD 3 health state, with the next closest parameter exhibiting a mean absolute SHAP value that is approximately 50% lower.
Although the finding that HRQoL is the most impactful may appear counterintuitive since HRQoL does not impact drug or non-drug costs, it is, in fact, quite logical. The VBP increases as a function of both the incremental monetary benefit (IMB) and the anticipated price decline due to patent expiration. If a change in an input, such as HRQoL, increases the IMB, then the price decline necessary to maintain the original VBP diminishes. Parameters that either (1) boost the IMB or (2) accelerate the price decline will exert the most significant influence.
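One way to see why the IMB matters is a stylized derivation. It rests on assumptions that are ours and not part of the CEM: let B denote the total discounted incremental monetary benefit net of incremental non-drug costs, let w_t denote the discounted expected share of the cohort on treatment in year t, and let m_t be a price multiplier equal to 1 before patent expiry and less than 1 afterwards. Applying the definition of a VBP (discounted benefits equal discounted costs) under each framework gives:

```latex
\begin{align}
\mathrm{VBP}_1 \sum_t w_t &= B, \\
\mathrm{VBP}_2 \sum_t w_t m_t &= B, \\
\Delta\mathrm{VBP} = \mathrm{VBP}_2 - \mathrm{VBP}_1
  &= B \left( \frac{1}{\sum_t w_t m_t} - \frac{1}{\sum_t w_t} \right).
\end{align}
```

Because 0 < m_t ≤ 1, the term in parentheses is non-negative, so in this simplified setting ΔVBP scales with B and takes its sign. Inputs such as HRQoL act through B, while inputs such as mortality act through the weights w_t placed on the post-expiry years.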
Direction of Association
While variation in SHAP values offers valuable insights into the influence of parameters, it does not reveal the direction of the relationship between parameter values and the ΔVBP. To address this, we color each dot in the beeswarm plot so that the direction of the relationship can be assessed. The colors represent the standardized values of the features: lower values are colored in blue and higher values in red. Moreover, negative SHAP values indicate that a feature’s contribution decreased the ΔVBP, whereas positive values indicate an increase.
To illustrate, consider the most impactful parameter, “HRQoL: GOLD 3”. It has a stronger impact on ΔVBP (i.e., its SHAP values are higher) when its values are higher (indicated by the color red in the figure). As explained above, this suggests that higher values of GOLD 3 HRQoL correspond to a higher IMB. This result likely occurs because ensifentrine keeps patients in GOLD 3 longer than SoC (i.e., SoC patients transition to GOLD 4, the most severe stage of COPD, at a faster rate), and so a higher GOLD 3 HRQoL should result in higher incremental QALYs (and therefore IMB) for ensifentrine relative to SoC. The exact same reasoning applies to the “HRQoL: GOLD 2” parameter; GOLD 2 is a less severe stage of COPD that precedes GOLD 3, and ensifentrine keeps patients in GOLD 2 (i.e., not worsening to GOLD 3) longer than SoC.
Conversely, the opposite reasoning applies to the “HRQoL: GOLD 4” parameter. The beeswarm plot indicates that higher values of GOLD 4 HRQoL result in a lower ΔVBP. This result likely occurs because SoC patients spend more time in GOLD 4 than ensifentrine patients, and so a higher GOLD 4 HRQoL parameter lowers the incremental QALYs of ensifentrine relative to SoC, which lowers the IMB and the ΔVBP.
Figure 2: Beeswarm plot of the distribution of SHAP values for the 10 most influential model parameters

Shape of Association
One limitation of the beeswarm plot is that it does not capture the shape of the relationship between the ΔVBP and model features. In contrast, partial dependence plots (PDPs) show the relationship between a target variable and each feature, conveying both its shape (e.g., linear, non-linear) and direction (e.g., positively related, negatively related). The relationships between the six most influential model features and ΔVBP are displayed in Figure 3. The impacts of the HRQoL variables are consistent with the beeswarm plot above, but we can now observe the relationship between values of the HRQoL parameters and the magnitude of impact on the average ΔVBP prediction. Most notably, certain values of the “HRQoL: GOLD 3” feature flip ΔVBP from positive to negative (e.g., a value of 0.331 results in a ΔVBP of -$4); this implies that at a low enough value of this parameter, the incremental HRQoL (and therefore IMB) of ensifentrine relative to SoC is negative. That said, this sign change only occurs at extremely low values for this parameter, and predicted ΔVBP never crosses 0 for any variation in any of the other model parameters.
Figure 3: Predicted mean change in value-based price as a function of the six most influential model parameters
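A partial dependence curve of the kind shown in Figure 3 can be computed with a few lines of code: for each grid value of one feature, overwrite that feature in every row, predict, and average. The sketch below uses a hypothetical two-feature model and made-up ranges; it is meant to show the mechanics, including how a curve can cross zero at low feature values, and does not reproduce the values reported above.

```python
import random


def toy_metamodel(row):
    """Hypothetical metamodel: output is negative when HRQoL is below 0.3."""
    hrqol, mortality = row
    return 500.0 * (hrqol - 0.3) * (1.0 - mortality)


def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is set to each grid value
    while all other features keep their observed values."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve


rng = random.Random(2)
rows = [[rng.uniform(0.2, 0.8), rng.uniform(0.01, 0.1)] for _ in range(200)]
grid = [0.2 + 0.1 * k for k in range(7)]  # HRQoL values from 0.2 to 0.8

curve = partial_dependence(toy_metamodel, rows, feature_index=0, grid=grid)
for value, mean_prediction in zip(grid, curve):
    print(f"HRQoL = {value:.1f}: mean predicted Delta VBP = {mean_prediction:.1f}")
```

Plotting `curve` against `grid` gives the shape and direction of the relationship for that feature, averaged over the joint distribution of the remaining inputs.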

Conclusion
In this analysis, we utilized EntityRisk’s PROVEN™ software to assess the sensitivity of dynamic pricing’s VBP impact to the model inputs. As hypothesized, starting age and disease-related mortality emerged as some of the most influential features in the model; however, we were surprised to find that HRQoL played an even more crucial role. This unexpected finding prompted us to conduct additional internal validation checks, and after ruling out any errors, we gained greater confidence in our model and an enriched understanding of dynamic pricing. Furthermore, despite wide parameter sampling ranges, the magnitude of their impact was never substantial enough to result in a negative effect of dynamic pricing on the VBP; that is, dynamic pricing always resulted in a higher VBP for ensifentrine. This implies that the positive impact of dynamic pricing on ensifentrine’s VBP is substantial and relatively stable for all realistic values of model parameters for a COPD cohort. This example is not a special case: SHAP offers a powerful framework for interpreting simulation models, supporting internal validation, and assessing robustness. Unconventional outcomes (e.g., VBP differences under alternative CEA frameworks) can be readily explored using PROVEN™. Future analyses can adopt a similar approach to explore other GCEA value elements (e.g., stacked cohorts) and identify the GCEA elements that are most consistently impactful across a range of therapeutic areas.