Statistical Rigor
Our factor analysis methodology is built on rigorous statistical foundations. This page describes the quality assurance processes we apply to ensure our models meet the standards expected of a top-tier analytical platform.
Our Commitment to Quality
Factor analysis results inform important business decisions. We hold ourselves to high standards:
- Every factor is statistically tested before inclusion
- Every model is validated against held-out data
- Every prediction is checked for systematic bias
- All limitations are documented transparently
Factor Selection Process
Inclusion Criteria
Each factor must meet four criteria for inclusion:
| Criterion | Requirement | How We Test |
|---|---|---|
| Statistical Significance | p < 0.05 correlation with target | Pearson correlation with p-value |
| Predictive Value | Meaningful contribution to model | SHAP importance analysis |
| Business Interpretability | Users can understand and act on results | Actuarial review |
| Non-Redundancy | Unique information not captured elsewhere | VIF and correlation analysis |
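The statistical-significance gate in the table above can be sketched as follows. This is a minimal illustration on synthetic data; the factor and target here are made up, and the production pipeline is not shown.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
factor = rng.normal(size=500)                 # hypothetical candidate factor
target = 0.3 * factor + rng.normal(size=500)  # synthetic rate-on-line proxy

# Pearson correlation with p-value, as in the criteria table
r, p_value = pearsonr(factor, target)
include = p_value < 0.05                      # the p < 0.05 inclusion gate
print(f"r={r:.3f}, p={p_value:.2e}, include={include}")
```

A factor passing this screen still has to clear the predictive-value, interpretability, and non-redundancy checks before inclusion.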
Factors We Removed
During validation, we identified and removed factors that failed our statistical-significance criterion or were substantially redundant with other included factors.
Factors We Retained with Documentation
Some factors are significant in most but not all models. We retain these for consistency while documenting their limitations:
- AM Best Rating: Significant in Property and Liability, borderline in Package
- RCV per Door: Significant in Package, borderline in Property and Liability
Multicollinearity Management
What is Multicollinearity?
When two factors are highly correlated, attribution can shift between them unpredictably. We actively monitor for this issue.
Our Approach
We compute Variance Inflation Factor (VIF) for all features:
| VIF Range | Interpretation | Our Action |
|---|---|---|
| < 5 | Acceptable | Include without concern |
| 5-10 | Moderate | Include with monitoring |
| > 10 | High | Investigate and potentially remove |
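The VIF screen above can be computed directly from the definition VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing feature j on the remaining features. The sketch below uses synthetic data with one deliberately induced correlation so the high-VIF case is visible; a library implementation (e.g. statsmodels) gives the same numbers.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=1000)  # induce collinearity

def vif(X, j):
    """VIF of feature j: regress it on the other features."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])      # add intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print([round(v, 2) for v in vifs])  # features 0 and 2 exceed the VIF > 10 band
```

In this construction features 0 and 2 land in the "investigate" band while the independent feature 1 stays near 1.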
Current Status
Our models show no high multicollinearity (VIF > 10). Some moderate correlations exist between coverage features (e.g., terrorism property and terrorism liability coverages are often purchased together), but these reflect business realities rather than data quality issues.
Validation Methodology
Cross-Validation
We use 5-fold cross-validation to ensure our models generalize to new data:
- Data is randomly split into 5 equal folds
- Each fold takes a turn as the validation set
- Model is trained on remaining 4 folds
- R² is computed on the held-out fold
- Final R² is the average across all 5 folds
This approach prevents overfitting and provides realistic performance estimates.
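The fold logic above can be sketched in a few lines. The data is synthetic and a plain least-squares fit stands in for the production model (whose type is not specified here); the split/train/score loop is what the procedure describes.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(scale=0.5, size=500)

idx = rng.permutation(len(X))          # random split into 5 equal folds
folds = np.array_split(idx, 5)
r2_scores = []
for k in range(5):
    val = folds[k]                     # each fold takes a turn as validation
    train = np.concatenate([folds[i] for i in range(5) if i != k])
    A = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = np.column_stack([np.ones(len(val)), X[val]]) @ coef
    ss_res = ((y[val] - pred) ** 2).sum()
    ss_tot = ((y[val] - y[val].mean()) ** 2).sum()
    r2_scores.append(1 - ss_res / ss_tot)  # R² on the held-out fold

cv_r2 = float(np.mean(r2_scores))      # final R²: average across all 5 folds
print(f"cross-validated R² = {cv_r2:.3f}")
```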
Residual Diagnostics
For each model, we analyze prediction errors (residuals) to detect issues:
Bias Check: Do predictions systematically over- or under-estimate?
- All models show mean error < 0.2%
- Result: No systematic bias detected
Variance Stability: Are errors consistent across the prediction range?
- Package and Liability models show stable variance
- Property model shows moderate variation (documented limitation)
Distribution: Are errors approximately symmetric?
- All models show acceptable residual distributions
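The three residual checks can be sketched as below, on synthetic predictions constructed to be unbiased. The specific statistics (mean error for bias, spread ratio across prediction halves for variance stability, skewness for symmetry) are illustrative choices, not necessarily the exact diagnostics used in production.

```python
import numpy as np

rng = np.random.default_rng(3)
pred = rng.uniform(1, 10, size=2000)               # synthetic predictions
actual = pred + rng.normal(scale=0.4, size=2000)   # errors with no built-in bias
resid = actual - pred

# Bias check: systematic over- or under-estimation
mean_err = resid.mean()

# Variance stability: error spread in low vs high prediction buckets
lo, hi = pred < np.median(pred), pred >= np.median(pred)
spread_ratio = resid[hi].std() / resid[lo].std()

# Distribution: skewness as a symmetry measure
skew = ((resid - mean_err) ** 3).mean() / resid.std() ** 3

print(f"bias={mean_err:.4f}, spread_ratio={spread_ratio:.2f}, skew={skew:.2f}")
```

A bias near zero, a spread ratio near 1, and skewness near zero correspond to the "no issue detected" outcomes described above.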
Actuarial Sense Checks
We validate that factor effects align with actuarial expectations:
| Factor | Expected Effect | Actual Effect | Status |
|---|---|---|---|
| Wind/Hail Coverage | Increases RoL | Positive | Confirmed |
| Named Storm Coverage | Increases RoL | Positive | Confirmed |
| General Liability | Increases RoL | Positive | Confirmed |
| Umbrella Coverage | Increases RoL | Positive | Confirmed |
| SFHA (Flood Zone) | Increases RoL | Positive | Confirmed |
All key factors show expected effect directions.
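A direction check like the table above can be automated by comparing each fitted effect's sign against the expected sign. The sketch below uses a linear fit on synthetic data with two illustrative factor names; the production check would read effect directions from the actual model.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(800, 2))
y = 0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.3, size=800)

A = np.column_stack([np.ones(800), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Actuarially expected directions (+1 = increases RoL); names are illustrative
expected = {"wind_hail": +1, "named_storm": +1}
actual_signs = np.sign(coef[1:]).astype(int)
confirmed = all(s == e for s, e in zip(actual_signs, expected.values()))
print(f"signs={list(actual_signs)}, all confirmed={confirmed}")
```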
Model Performance Summary
Explanatory Power
| Model | R² (Cross-Validated) | Assessment |
|---|---|---|
| Package | 52% | Moderate-strong |
| Property | 58% | Strong |
| Liability | 68% | Strong |
These values indicate our models capture meaningful pricing relationships while acknowledging that some RoL variation stems from factors not in our data (underwriter judgment, relationship pricing, etc.).
Models are retrained regularly; for the latest figures, reach out to Advocate directly.
Sample Sizes
All models are trained on thousands of policies, supporting statistically reliable estimates.
Continuous Improvement
Our validation is not a one-time event. We continuously:
- Retrain models as new data becomes available
- Monitor performance for degradation
- Update documentation to reflect current state
- Review factor significance with each training cycle
Summary
Our factor analysis methodology combines:
- Rigorous statistical testing for factor selection
- Multiple validation techniques to ensure reliability
- Transparent documentation of limitations
- Continuous monitoring and improvement
This approach ensures that factor analysis results you see in the Advocate Terminal are grounded in sound statistical practice and can be trusted for business decision-making.