SHAP Attribution
SHAP (SHapley Additive exPlanations) is the attribution method used to decompose model predictions into per-feature contributions. This page explains how SHAP values are computed, interpreted, and aggregated into the factor metrics displayed in the application.
What is SHAP?
SHAP is a game-theoretic approach to explaining machine learning predictions. For each prediction, SHAP assigns each feature a value representing its contribution to the difference between the prediction and the average prediction (baseline).
Key properties of SHAP values:
- Additivity: SHAP values for all features sum exactly to the difference between the prediction and baseline
- Consistency: If a model changes so that a feature’s marginal contribution increases or stays the same, its SHAP value will not decrease
- Local accuracy: The sum of SHAP values plus baseline equals the prediction
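The additivity and local-accuracy properties can be checked directly: the baseline plus the per-feature SHAP values reconstructs the prediction exactly. A minimal sketch with made-up feature names and numbers (not taken from the application):

```python
# Hypothetical per-feature SHAP values for one prediction, in log-RoL space.
baseline = 2.30            # average model output over the background data
shap_values = {
    "state": 0.10,         # pushes the prediction up
    "year_built": -0.05,   # pushes the prediction down
    "unit_count": 0.02,
}

# Local accuracy / additivity: baseline + sum of SHAP values == prediction.
prediction = baseline + sum(shap_values.values())
print(round(prediction, 2))  # 2.37
```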
TreeExplainer
For tree-based models like XGBoost, SHAP provides TreeExplainer, which exploits the tree structure to compute exact (not approximate) SHAP values in polynomial time, making attribution computationally tractable even for large ensembles.
TreeExplainer traces all possible paths through the tree ensemble, weighting each path by its probability of being traversed for different feature subsets. The result is a precise accounting of each feature’s contribution.
Log-Space Interpretation
Because models predict log(RoL), SHAP values are computed in log space. A SHAP value of 0.05 in log space therefore corresponds to a multiplicative effect of exp(0.05) ≈ 1.051 on RoL, i.e. roughly +5.1%.
Converting to Percentage Impact
To express SHAP values as percentage changes:
% change = (exp(SHAP_value) - 1) × 100

| Log-space SHAP | Percentage Impact |
|---|---|
| +0.10 | +10.5% |
| +0.05 | +5.1% |
| 0.00 | 0% |
| -0.05 | -4.9% |
| -0.10 | -9.5% |
This conversion ensures that reported impacts are intuitive: a factor with +10% impact is associated with RoL approximately 10% higher than baseline.
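The conversion is a one-liner; a small sketch (the function name is ours, not the application's):

```python
import math

def shap_to_pct(shap_value: float) -> float:
    """Convert a log-space SHAP value to a percentage impact on RoL."""
    return (math.exp(shap_value) - 1) * 100

print(round(shap_to_pct(0.10), 1))   # 10.5
print(round(shap_to_pct(-0.10), 1))  # -9.5
```

Note the asymmetry: equal and opposite log-space values do not produce equal and opposite percentages, because the exponential is nonlinear.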
Importance Score Calculation
The importance score for each factor represents its share of total pricing variation explained:
Step 1: Mean Absolute SHAP
For each feature, compute the mean of absolute SHAP values across all policies in the peer group:
mean_abs_shap[feature] = mean(|SHAP_values[feature]|)

Absolute values are used because a feature can be important regardless of whether its effect is positive or negative.
Step 2: Normalization
Normalize scores so they sum to 1.0:
importance[feature] = mean_abs_shap[feature] / sum(mean_abs_shap)

Interpretation
An importance score of 0.15 means the feature explains approximately 15% of the RoL variation within the peer group. Higher importance indicates the factor has more influence on pricing differences.
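Both steps can be sketched with illustrative data (two features, four policies; the numbers are invented):

```python
# Rows = policies in the peer group, keyed by feature (illustrative values).
shap_matrix = {
    "state":      [0.10, -0.20, 0.15, -0.05],
    "year_built": [0.02,  0.01, -0.03, 0.02],
}

# Step 1: mean absolute SHAP per feature.
mean_abs = {f: sum(abs(v) for v in vals) / len(vals)
            for f, vals in shap_matrix.items()}

# Step 2: normalize so importance scores sum to 1.0.
total = sum(mean_abs.values())
importance = {f: m / total for f, m in mean_abs.items()}
```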
Average Impact Calculation
The average impact shows whether a factor tends to increase or decrease RoL on average:
avg_impact[feature] = mean(SHAP_values[feature])

Unlike importance (which uses absolute values), average impact preserves sign:
- Positive average impact: The feature tends to increase RoL in this peer group
- Negative average impact: The feature tends to decrease RoL in this peer group
- Near-zero average impact: Effects may be large but balanced (some increase, some decrease)
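The third case is why importance and average impact are reported separately: a feature can be highly important yet average out to zero. An illustrative sketch (feature names and values are invented):

```python
# Signed SHAP values for two features across a peer group (illustrative).
wind_coverage = [0.12, -0.11, 0.10, -0.09]    # large but balanced effects
year_built    = [-0.04, -0.06, -0.05, -0.05]  # consistently negative

def avg_impact(values):
    return sum(values) / len(values)

print(round(avg_impact(wind_coverage), 3))  # 0.005 (near zero despite large effects)
print(round(avg_impact(year_built), 3))     # -0.05
```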
Value Breakdowns
To show how a factor’s impact varies across different values, the system computes value-level breakdowns.
Categorical Features
For categorical features (state, carrier, etc.), impacts are computed per category:
avg_impact[feature=value] = mean(SHAP_values[feature] where feature == value)

This shows, for example, that policies in Florida have a +15% impact on RoL while policies in Ohio have a -5% impact.
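Per-category averaging is a simple group-by over the feature's SHAP values; a sketch with invented state data:

```python
from collections import defaultdict

# (category value, SHAP value) pairs for the "state" feature (illustrative).
records = [("FL", 0.16), ("FL", 0.12), ("OH", -0.04), ("OH", -0.06)]

by_value = defaultdict(list)
for value, shap in records:
    by_value[value].append(shap)

# Mean SHAP value per category, rounded for display.
avg_impact = {value: round(sum(s) / len(s), 4) for value, s in by_value.items()}
print(avg_impact)  # {'FL': 0.14, 'OH': -0.05}
```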
Numeric Features (Quartile Breakdowns)
For numeric features (unit count, year built, etc.), values are grouped into quartiles:
| Quartile | Range |
|---|---|
| Q1 | 0-25th percentile |
| Q2 | 25-50th percentile |
| Q3 | 50-75th percentile |
| Q4 | 75-100th percentile |
Average impact is computed for each quartile:
avg_impact[feature=Q1] = mean(SHAP_values[feature] where feature in Q1 range)

This reveals patterns such as “newer buildings (Q4 year built) have -8% impact on RoL.”
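A sketch of the quartile breakdown using invented year-built data; the cut-point method (`statistics.quantiles`) is an assumption, not necessarily what the application uses:

```python
import statistics

# (numeric feature value, SHAP value) pairs for "year_built" (illustrative).
pairs = [(1970, 0.06), (1978, 0.05), (1985, 0.03), (1995, -0.01),
         (2000, -0.02), (2005, -0.04), (2010, -0.07), (2018, -0.09)]

values = sorted(v for v, _ in pairs)
# Cut points at the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(values, n=4)

def quartile(v):
    if v <= q1:
        return "Q1"
    if v <= q2:
        return "Q2"
    if v <= q3:
        return "Q3"
    return "Q4"

# Group SHAP values by quartile and average within each group.
buckets = {}
for v, shap in pairs:
    buckets.setdefault(quartile(v), []).append(shap)

avg_impact = {q: round(sum(s) / len(s), 3) for q, s in sorted(buckets.items())}
```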
Boolean Features
For boolean features (coverage flags), impacts are computed for true and false:
avg_impact[feature=true] = mean(SHAP_values[feature] where feature == true)
avg_impact[feature=false] = mean(SHAP_values[feature] where feature == false)

Limitations
Correlation vs. Causation
SHAP values indicate statistical association, not causation. A positive SHAP value for a feature means policies with that characteristic tend to have higher RoL, but other correlated factors may be the actual pricing drivers.
Peer Group Specificity
SHAP values are computed for a specific peer group. Results should not be generalized to different populations without running new analysis.
Sample Size Effects
With small peer groups, SHAP values may be noisy. The system enforces minimum sample sizes and suppresses results when data is insufficient for reliable attribution.
Feature Interactions
Standard SHAP values do not explicitly report feature interactions. When two features interact strongly (e.g., state and wind coverage), the interaction effect is distributed between them according to Shapley value principles.