Model Validation
Our factor analysis models undergo rigorous statistical validation to ensure reliable, unbiased results. This page describes our validation methodology and current model performance.
Validation Framework
We employ a comprehensive validation framework aligned with actuarial and statistical best practices:
Statistical Significance Testing
Every factor included in our models is tested for statistical significance. We compute:
- Pearson correlation with the target variable (log RoL)
- P-values to confirm significance at the 0.05 level
- Effect direction validation to ensure factors behave as expected actuarially
Factors that do not meet statistical significance thresholds across models are removed or flagged.
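These significance checks can be sketched with `scipy.stats.pearsonr` on synthetic data (the variable names and thresholds here are illustrative, not our production code):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-ins: one candidate factor and a log-RoL target that
# partly depends on it.
factor = rng.normal(size=500)
log_rol = 0.4 * factor + rng.normal(scale=1.0, size=500)

r, p_value = pearsonr(factor, log_rol)

significant = p_value < 0.05        # significance at the 0.05 level
expected_direction = r > 0          # direction check (positive effect expected here)

print(f"r={r:.3f}, p={p_value:.4f}, significant={significant}")
```

A factor failing either check (insignificant p-value, or a sign opposite to actuarial expectation) would be flagged for review.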
Multicollinearity Analysis
We compute Variance Inflation Factor (VIF) scores for all features to detect multicollinearity. High multicollinearity can cause unstable attributions, so we:
- Flag features with VIF > 10 (high multicollinearity)
- Monitor features with VIF 5-10 (moderate multicollinearity)
- Document known correlations that reflect business realities
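The VIF for feature j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing feature j on all other features. A minimal NumPy sketch (the data and thresholds are illustrative):

```python
import numpy as np

def vif_scores(X):
    """VIF for each column of X: regress it on the remaining columns."""
    n, k = X.shape
    scores = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other features
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(size=300)
x3 = x1 + 0.1 * rng.normal(size=300)  # nearly collinear with x1
X = np.column_stack([x1, x2, x3])

vifs = vif_scores(X)
flags = ["high" if v > 10 else "moderate" if v > 5 else "ok" for v in vifs]
```

Here `x1` and `x3` are nearly collinear, so both receive high VIF scores, while the independent `x2` stays near 1.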
Cross-Validation
All models use 5-fold cross-validation to assess generalization:
- Data is split into 5 equal parts
- Each fold is used once as validation data
- R² is computed for each fold
- Mean and standard deviation across folds indicate stability
Low standard deviation indicates the model performs consistently across different data subsets.
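The procedure above can be sketched in plain NumPy with a least-squares model standing in for the actual estimator (data, model, and fold seed are all illustrative):

```python
import numpy as np

def kfold_r2(X, y, k=5, seed=0):
    """Per-fold R-squared for a plain least-squares fit (illustrative model)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)          # split data into k equal parts
    scores = []
    for i in range(k):
        test = folds[i]                      # each fold validates exactly once
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.column_stack([np.ones(len(test)), X[test]])
        pred = A_te @ beta
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)  # R-squared on the held-out fold
    return np.array(scores)

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(scale=0.5, size=500)

scores = kfold_r2(X, y)
mean_r2, std_r2 = scores.mean(), scores.std()
```

The reported numbers are exactly these two summaries: the mean R² across folds and its standard deviation.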
Residual Analysis
We analyze model residuals to detect systematic issues:
- Bias check: Mean residual should be near zero
- Heteroscedasticity: Error variance should be roughly constant
- Distribution: Residuals should be approximately symmetric
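All three residual checks reduce to simple summary statistics. A sketch on synthetic residuals (the cutoffs shown are illustrative, not our production tolerances):

```python
import numpy as np

rng = np.random.default_rng(3)
residuals = rng.normal(loc=0.0, scale=0.1, size=1000)  # stand-in model residuals
predictions = rng.uniform(0.0, 1.0, size=1000)         # stand-in predicted RoL

# Bias check: mean residual should be near zero.
bias = residuals.mean()

# Heteroscedasticity: compare error variance in the low vs high half of predictions.
low = residuals[predictions < np.median(predictions)]
high = residuals[predictions >= np.median(predictions)]
variance_ratio = max(low.var(), high.var()) / min(low.var(), high.var())

# Distribution: a simple skewness estimate should be near zero for symmetry.
skew = ((residuals - bias) ** 3).mean() / residuals.std() ** 3
```

A large `variance_ratio` would signal heteroscedasticity; a large `|skew|` would signal an asymmetric error distribution.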
Current Model Performance
Cross-Validation Results
| Model | R² (CV) | CV Std Dev | Interpretation |
|---|---|---|---|
| Package | 0.52 | 0.027 | Moderate-strong explanatory power |
| Property | 0.58 | 0.013 | Strong explanatory power |
| Liability | 0.68 | 0.008 | Strong explanatory power |
What these numbers mean:
- R² of 0.52-0.68 indicates the models explain 52-68% of the variation in log RoL
- Low standard deviation (0.008-0.027) shows stable, consistent performance
- These values are typical for insurance pricing models with diverse portfolios
Bias Assessment
All models pass our bias checks with mean prediction error below 0.2%:
| Model | Mean Error | Status |
|---|---|---|
| Package | +0.08% | Unbiased |
| Property | -0.18% | Unbiased |
| Liability | +0.03% | Unbiased |
This confirms the models do not systematically over- or under-predict RoL.
Error Variance Stability
We verify that prediction errors are consistent across the RoL range:
| Model | Variance Ratio | Status |
|---|---|---|
| Package | 1.13 | Stable |
| Property | 2.11 | Moderate variation |
| Liability | 1.22 | Stable |
The Property model shows somewhat higher error variance for low-RoL predictions. This is documented, and users should interpret low-RoL Property results with appropriate caution.
Early Stopping and Overfitting Prevention
Our models use aggressive regularization and early stopping to prevent overfitting:
- Shallow trees (max depth 3) limit model complexity
- Minimum samples per leaf (10) prevents fitting to noise
- Row and column subsampling (60%) adds randomness
- L1/L2 regularization smooths predictions
- Early stopping halts training when validation performance plateaus
The gap between training and cross-validation R² (7-8 percentage points) indicates good generalization without significant overfitting.
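The safeguards above map naturally onto gradient-boosted-tree hyperparameters. The sketch below uses XGBoost-style parameter names purely as an illustration; the actual library, names, and regularization strengths are assumptions, not our production configuration:

```python
# Hypothetical gradient-boosted-tree settings mirroring the safeguards above
# (XGBoost-style names; library and regularization strengths are assumptions).
params = {
    "max_depth": 3,           # shallow trees limit model complexity
    "min_child_weight": 10,   # rough analogue of minimum samples per leaf
    "subsample": 0.6,         # row subsampling adds randomness
    "colsample_bytree": 0.6,  # column subsampling adds randomness
    "reg_alpha": 1.0,         # L1 regularization (illustrative strength)
    "reg_lambda": 1.0,        # L2 regularization (illustrative strength)
}
early_stopping_rounds = 50    # halt when validation score stops improving
```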
Factor Significance Summary
Package Model (20 factors)
- 19 factors statistically significant (p < 0.05)
- 1 factor borderline (retained for consistency across models)
Property Model (17 factors)
- 13 factors statistically significant (p < 0.05)
- 4 factors borderline or model-specific
Liability Model (15 factors)
- 14 factors statistically significant (p < 0.05)
- 1 factor borderline
Continuous Monitoring
We continuously monitor model performance:
- Retraining as new data becomes available
- Performance tracking across retraining cycles
- Feature distribution monitoring to detect data drift
- Importance stability checks to ensure consistent rankings
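Feature-distribution drift can be screened with a two-sample Kolmogorov-Smirnov test, comparing a feature's current distribution against its training-time reference. A minimal sketch with synthetic data (the drift threshold of 0.01 is illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(4)
reference = rng.normal(loc=0.0, scale=1.0, size=1000)  # feature at training time
current = rng.normal(loc=0.5, scale=1.0, size=1000)    # feature now (shifted mean)

stat, p_value = ks_2samp(reference, current)
drift_detected = p_value < 0.01  # illustrative significance cutoff
```

A detected shift would trigger investigation and, where warranted, retraining.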
Transparency and Limitations
We believe in transparent reporting. Known limitations include:
- Property model shows higher error variance for low-RoL predictions
- Some factors correlate with each other due to business realities (e.g., product types determine coverage inclusions)
- Results reflect statistical associations, not proven causal relationships
These limitations are documented and factored into our guidance for interpreting results.