Statistical Rigor
Our factor analysis methodology is built on rigorous statistical foundations. This page describes the quality assurance processes we apply to ensure our models meet the standards expected of a top-tier analytical platform.
Our Commitment to Quality
Factor analysis results inform important business decisions. We hold ourselves to high standards:
- Every factor is statistically tested before inclusion
- Every model is validated against held-out data
- Every prediction is checked for systematic bias
- All limitations are documented transparently
Factor Selection Process
Inclusion Criteria
Each factor must meet four criteria for inclusion:
| Criterion | Requirement | How We Test |
|---|---|---|
| Statistical Significance | p < 0.05 correlation with target | Pearson correlation with p-value |
| Predictive Value | Meaningful contribution to model | SHAP importance analysis |
| Business Interpretability | Users can understand and act on results | Actuarial review |
| Non-Redundancy | Unique information not captured elsewhere | VIF and correlation analysis |
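The statistical-significance gate in the table above can be sketched as follows. This is a minimal illustration on synthetic data; the factor and target here are made up, and the production pipeline is not shown.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
factor = rng.normal(size=500)                 # hypothetical candidate factor
target = 0.3 * factor + rng.normal(size=500)  # synthetic rate-on-line proxy

# Pearson correlation with p-value, as in the criteria table
r, p_value = pearsonr(factor, target)
include = p_value < 0.05                      # the p < 0.05 inclusion gate
print(f"r={r:.3f}, p={p_value:.2e}, include={include}")
```

A factor passing this screen still has to clear the predictive-value, interpretability, and non-redundancy checks before inclusion.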
Factors We Removed
During validation, we identified and removed factors that failed our statistical-significance criterion or were substantially redundant with other included factors.
Factors We Retained with Documentation
Some factors are significant in most but not all models. We retain these for consistency while documenting their limitations:
- AM Best Rating: Significant in Property and Liability, borderline in Package
- RCV per Door: Significant in Package, borderline in Property and Liability
Multicollinearity Management
What is Multicollinearity?
When two factors are highly correlated, attribution can shift between them unpredictably. We actively monitor for this issue.
Our Approach
We compute Variance Inflation Factor (VIF) for all features:
| VIF Range | Interpretation | Our Action |
|---|---|---|
| < 5 | Acceptable | Include without concern |
| 5-10 | Moderate | Include with monitoring |
| > 10 | High | Investigate and potentially remove |
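The VIF screen above can be computed directly from the definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on the remaining features. The sketch below uses synthetic data with one deliberately induced correlation so the high-VIF case is visible; a library implementation (e.g. statsmodels) gives the same numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=1000)  # induce collinearity

def vif(X, j):
    """VIF of feature j: regress it on the other features."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])      # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print([round(v, 2) for v in vifs])  # features 0 and 2 exceed the VIF > 10 band
```

In this construction features 0 and 2 land in the "investigate" band while the independent feature 1 stays near 1.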
Current Status
Our models show no high multicollinearity (VIF > 10). Some moderate correlations exist between coverage features (e.g., terrorism property and terrorism liability coverages are often purchased together), but these reflect business realities rather than data quality issues.
Validation Methodology
Cross-Validation
We use 5-fold cross-validation to ensure our models generalize to new data:
- Data is randomly split into 5 equal folds
- Each fold takes a turn as the validation set
- Model is trained on remaining 4 folds
- R² is computed on the held-out fold
- Final R² is the average across all 5 folds
This approach prevents overfitting and provides realistic performance estimates.
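The fold logic above can be sketched in a few lines. The data is synthetic and a plain least-squares fit stands in for the production model (whose type is not specified here); the split/train/score loop is what the procedure describes.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=0.5, size=500)

idx = rng.permutation(len(X))          # random split into 5 equal folds
folds = np.array_split(idx, 5)
r2_scores = []
for k in range(5):
    val = folds[k]                     # each fold takes a turn as validation
    train = np.concatenate([folds[i] for i in range(5) if i != k])
    A = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = np.column_stack([np.ones(len(val)), X[val]]) @ coef
    ss_res = ((y[val] - pred) ** 2).sum()
    ss_tot = ((y[val] - y[val].mean()) ** 2).sum()
    r2_scores.append(1 - ss_res / ss_tot)  # R² on the held-out fold

cv_r2 = float(np.mean(r2_scores))      # final R²: average across all 5 folds
print(f"cross-validated R² = {cv_r2:.3f}")
```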
Residual Diagnostics
For each model, we analyze prediction errors (residuals) to detect issues:
Bias Check: Do predictions systematically over- or under-estimate?
- All models show mean error < 0.2%
- Result: No systematic bias detected
Variance Stability: Are errors consistent across the prediction range?
- Package and Liability models show stable variance
- Property model shows moderate variation (documented limitation)
Distribution: Are errors approximately symmetric?
- All models show acceptable residual distributions
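The three residual checks can be sketched as below, on synthetic predictions constructed to be unbiased. The specific statistics (mean error for bias, spread ratio across prediction halves for variance stability, skewness for symmetry) are illustrative choices, not necessarily the exact diagnostics used in production.

```python
import numpy as np

rng = np.random.default_rng(3)
pred = rng.uniform(1, 10, size=2000)               # synthetic predictions
actual = pred + rng.normal(scale=0.4, size=2000)   # errors with no built-in bias
resid = actual - pred

# Bias check: systematic over- or under-estimation
mean_err = resid.mean()

# Variance stability: error spread in low vs high prediction buckets
lo, hi = pred < np.median(pred), pred >= np.median(pred)
spread_ratio = resid[hi].std() / resid[lo].std()

# Distribution: skewness as a symmetry measure
skew = ((resid - mean_err) ** 3).mean() / resid.std() ** 3

print(f"bias={mean_err:.4f}, spread_ratio={spread_ratio:.2f}, skew={skew:.2f}")
```

A bias near zero, a spread ratio near 1, and skewness near zero correspond to the "no issue detected" outcomes described above.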
Actuarial Sense Checks
We validate that factor effects align with actuarial expectations:
| Factor | Expected Effect | Actual Effect | Status |
|---|---|---|---|
| Wind/Hail Coverage | Increases RoL | Positive | Confirmed |
| Named Storm Coverage | Increases RoL | Positive | Confirmed |
| General Liability | Increases RoL | Positive | Confirmed |
| Umbrella Coverage | Increases RoL | Positive | Confirmed |
| SFHA (Flood Zone) | Increases RoL | Positive | Confirmed |
All key factors show expected effect directions.
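A direction check like the table above can be automated by comparing each fitted effect's sign against the expected sign. The sketch below uses a linear fit on synthetic data with two illustrative factor names; the production check would read effect directions from the actual model.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(800, 2))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.3, size=800)

A = np.column_stack([np.ones(800), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Actuarially expected directions (+1 = increases RoL); names are illustrative
expected = {"wind_hail": +1, "named_storm": +1}
actual_signs = np.sign(coef[1:]).astype(int)
confirmed = all(s == e for s, e in zip(actual_signs, expected.values()))
print(f"signs={list(actual_signs)}, all confirmed={confirmed}")
```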
Model Performance Summary
Explanatory Power
| Model | R² (Cross-Validated) | Assessment |
|---|---|---|
| Package | 52% | Moderate-strong |
| Property | 58% | Strong |
| Liability | 68% | Strong |
These values indicate our models capture meaningful pricing relationships while acknowledging that some RoL variation stems from factors not in our data (underwriter judgment, relationship pricing, etc.).
Models are retrained regularly; for the latest figures, reach out to Advocate directly.
Sample Sizes
All models are trained on thousands of policies, supporting statistically reliable estimates.
Continuous Improvement
Our validation is not a one-time event. We continuously:
- Retrain models as new data becomes available
- Monitor performance for degradation
- Update documentation to reflect current state
- Review factor significance with each training cycle
Summary
Our factor analysis methodology combines:
- Rigorous statistical testing for factor selection
- Multiple validation techniques to ensure reliability
- Transparent documentation of limitations
- Continuous monitoring and improvement
This approach ensures that factor analysis results you see in the Advocate Terminal are grounded in sound statistical practice and can be trusted for business decision-making.