Statistical Rigor

Our factor analysis methodology is built on rigorous statistical foundations. This page describes the quality assurance processes we apply to ensure our models meet the standards expected of a top-tier analytical platform.

Our Commitment to Quality

Factor analysis results inform important business decisions. We hold ourselves to high standards:

  • Every factor is statistically tested before inclusion
  • Every model is validated against held-out data
  • Every prediction is checked for systematic bias
  • All limitations are documented transparently

Factor Selection Process

Inclusion Criteria

Each factor must meet four criteria for inclusion:

| Criterion | Requirement | How We Test |
| --- | --- | --- |
| Statistical Significance | p < 0.05 correlation with target | Pearson correlation with p-value |
| Predictive Value | Meaningful contribution to model | SHAP importance analysis |
| Business Interpretability | Users can understand and act on results | Actuarial review |
| Non-Redundancy | Unique information not captured elsewhere | VIF and correlation analysis |
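The significance screen can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: it uses the large-sample normal approximation (|t| > 1.96 ≈ two-sided p < 0.05) rather than an exact p-value, and the factor and RoL values below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def is_significant(x, y, t_crit=1.96):
    """Approximate two-sided test of r != 0 at the 5% level.

    Uses t = r * sqrt((n - 2) / (1 - r^2)); for large n the t
    distribution is close to normal, so |t| > 1.96 ~ p < 0.05.
    """
    r = pearson_r(x, y)
    t = r * math.sqrt((len(x) - 2) / (1 - r * r))
    return abs(t) > t_crit

# Hypothetical factor values vs. target rate-on-line (RoL)
factor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rol    = [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.2, 7.9]
print(is_significant(factor, rol))  # strongly correlated -> True
```

In practice an exact p-value (e.g. from a statistics library) replaces the fixed 1.96 cutoff, which is only accurate for large samples.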

Factors We Removed

During our validation process, we identified and removed factors that did not meet our statistical-significance criterion or showed substantial redundancy with other included factors.

Factors We Retained with Documentation

Some factors are significant in most but not all models. We retain these for consistency while documenting their limitations:

  • AM Best Rating: Significant in Property and Liability, borderline in Package
  • RCV per Door: Significant in Package, borderline in Property and Liability

Multicollinearity Management

What is Multicollinearity?

Multicollinearity occurs when two or more factors are highly correlated with one another. When that happens, attribution can shift between the correlated factors unpredictably, so we actively monitor for this issue.

Our Approach

We compute Variance Inflation Factor (VIF) for all features:

| VIF Range | Interpretation | Our Action |
| --- | --- | --- |
| < 5 | Acceptable | Include without concern |
| 5–10 | Moderate | Include with monitoring |
| > 10 | High | Investigate and potentially remove |
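One common way to compute VIF for every feature at once is to take the diagonal of the inverse of the feature correlation matrix. The sketch below assumes NumPy and uses synthetic data (x1, x2, x3 are hypothetical features) to show how a near-duplicate column is flagged:

```python
import numpy as np

def vif_scores(X):
    """VIF for each column of X: the diagonal of the inverse
    of the feature correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.05, size=500)   # nearly duplicates x1
x3 = rng.normal(size=500)                    # independent feature
X = np.column_stack([x1, x2, x3])

for name, vif in zip(["x1", "x2", "x3"], vif_scores(X)):
    flag = "high" if vif > 10 else ("moderate" if vif > 5 else "acceptable")
    print(f"{name}: VIF={vif:.1f} ({flag})")
```

Here x1 and x2 land well above the VIF > 10 threshold, while the independent x3 stays near 1.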

Current Status

Our models show no high multicollinearity (VIF > 10). Some moderate correlations exist between coverage features (e.g., terrorism property and terrorism liability coverages are often purchased together), but these reflect business realities rather than data quality issues.

Validation Methodology

Cross-Validation

We use 5-fold cross-validation to ensure our models generalize to new data:

  1. Data is randomly split into 5 equal folds
  2. Each fold takes a turn as the validation set
  3. Model is trained on remaining 4 folds
  4. R² is computed on the held-out fold
  5. Final R² is the average across all 5 folds

This approach prevents overfitting and provides realistic performance estimates.
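The five steps above can be sketched end to end in plain Python. This is a simplified illustration (a one-variable least-squares model on hypothetical data), not our production training code:

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def r_squared(ys, preds):
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def cross_validated_r2(xs, ys, k=5, seed=0):
    """k-fold cross-validation: each fold is held out once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)                  # 1. random split
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:                                # 2. each fold validates once
        held = set(fold)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train],
                        [ys[i] for i in train])       # 3. train on remaining folds
        preds = [a + b * xs[i] for i in fold]
        scores.append(r_squared([ys[i] for i in fold], preds))  # 4. held-out R^2
    return sum(scores) / k                            # 5. average across folds

# Hypothetical noisy linear relationship
rnd = random.Random(42)
xs = [rnd.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rnd.gauss(0, 0.5) for x in xs]
print(round(cross_validated_r2(xs, ys), 3))
```

Because every R² is computed on data the model never saw during fitting, the averaged score is a realistic estimate of out-of-sample performance.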

Residual Diagnostics

For each model, we analyze prediction errors (residuals) to detect issues:

Bias Check: Do predictions systematically over- or under-estimate?

  • All models show mean error < 0.2%
  • Result: No systematic bias detected

Variance Stability: Are errors consistent across the prediction range?

  • Package and Liability models show stable variance
  • Property model shows moderate variation (documented limitation)

Distribution: Are errors approximately symmetric?

  • All models show acceptable residual distributions
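The three diagnostics above can each be reduced to a simple statistic. The sketch below is one plausible implementation, with hypothetical thresholds and synthetic data; it measures bias as mean error relative to the mean actual value, variance stability as the ratio of residual spread between the lower and upper halves of the prediction range, and symmetry via sample skewness:

```python
import math
import random

def residual_diagnostics(actuals, preds):
    """Three quick residual checks: bias, variance stability, symmetry."""
    res = [a - p for a, p in zip(actuals, preds)]
    n = len(res)
    mean = sum(res) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in res) / n)

    def _std(v):
        m = sum(v) / len(v)
        return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

    # Bias: mean error as a fraction of the mean actual value
    bias_pct = abs(mean) / (sum(actuals) / n)

    # Variance stability: residual spread in the lower vs. upper
    # half of the prediction range
    order = sorted(range(n), key=lambda i: preds[i])
    lo = [res[i] for i in order[: n // 2]]
    hi = [res[i] for i in order[n // 2 :]]
    variance_ratio = max(_std(lo), _std(hi)) / min(_std(lo), _std(hi))

    # Symmetry: sample skewness of the residuals
    skew = sum((r - mean) ** 3 for r in res) / (n * std ** 3)

    return {"bias_pct": bias_pct, "variance_ratio": variance_ratio, "skew": skew}

# Hypothetical predictions with well-behaved (symmetric, homoscedastic) errors
rnd = random.Random(1)
preds = [5 + 0.1 * i for i in range(400)]
actuals = [p + rnd.gauss(0, 0.5) for p in preds]
print(residual_diagnostics(actuals, preds))
```

A model passing all three checks would show bias_pct well under the 0.2% threshold, a variance ratio near 1, and skewness near 0.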

Actuarial Sense Checks

We validate that factor effects align with actuarial expectations:

| Factor | Expected Effect | Actual Effect | Status |
| --- | --- | --- | --- |
| Wind/Hail Coverage | Increases RoL | Positive | Confirmed |
| Named Storm Coverage | Increases RoL | Positive | Confirmed |
| General Liability | Increases RoL | Positive | Confirmed |
| Umbrella Coverage | Increases RoL | Positive | Confirmed |
| SFHA (Flood Zone) | Increases RoL | Positive | Confirmed |

All key factors show expected effect directions.
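Mechanically, a sense check like this just compares the sign of each fitted effect against its expected direction. The sketch below uses hypothetical coefficient values and feature names; only the sign comparison logic is the point:

```python
# Hypothetical fitted coefficients (sign = direction of effect on RoL)
fitted = {
    "wind_hail": 0.031,
    "named_storm": 0.024,
    "general_liability": 0.012,
    "umbrella": 0.008,
    "sfha_flood_zone": 0.044,
}

# Actuarial expectation: all of these coverages should increase RoL
expected_sign = {name: +1 for name in fitted}

def sense_check(fitted, expected_sign):
    """Return factors whose fitted effect contradicts the expected direction."""
    return [name for name, coef in fitted.items()
            if coef * expected_sign[name] < 0]

violations = sense_check(fitted, expected_sign)
print("all confirmed" if not violations else f"review: {violations}")
```

Any factor returned by the check is flagged for review rather than silently shipped, since a sign flip often indicates confounding or data issues.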

Model Performance Summary

Explanatory Power

| Model | R² (Cross-Validated) | Assessment |
| --- | --- | --- |
| Package | 52% | Moderate-strong |
| Property | 58% | Strong |
| Liability | 68% | Strong |

These values indicate our models capture meaningful pricing relationships while acknowledging that some RoL variation stems from factors not in our data (underwriter judgment, relationship pricing, etc.).

Models are regularly retrained; to be sure you have the latest figures, reach out to Advocate directly.

Sample Sizes

All models are trained on thousands of policies, ensuring statistical reliability.

Continuous Improvement

Our validation is not a one-time event. We continuously:

  • Retrain models as new data becomes available
  • Monitor performance for degradation
  • Update documentation to reflect current state
  • Review factor significance with each training cycle

Summary

Our factor analysis methodology combines:

  • Rigorous statistical testing for factor selection
  • Multiple validation techniques to ensure reliability
  • Transparent documentation of limitations
  • Continuous monitoring and improvement

This approach ensures that factor analysis results you see in the Advocate Terminal are grounded in sound statistical practice and can be trusted for business decision-making.