Factor Analysis Methodology
Factor analysis quantifies how specific property characteristics, coverage options, and market factors contribute to observed Rate on Line (RoL) variation. Rather than simply reporting what pricing was, factor analysis explains why pricing varies across comparable policies.
Purpose and Scope
The factor analysis methodology uses machine learning to decompose RoL variation into attributable components. For any peer group of policies, the system identifies which characteristics most strongly influence pricing and quantifies the direction and magnitude of each factor’s impact.
This analysis answers questions such as:
- Which property characteristics drive the largest pricing differences?
- How does my specific factor value compare to the peer group?
- What is the typical RoL impact of having wind/hail coverage vs. not having it?
Methodological Framework
Factor analysis uses XGBoost gradient boosting models combined with SHAP (SHapley Additive exPlanations) values for attribution. This approach:
- Captures non-linear relationships between features and pricing
- Handles mixed feature types including numeric, categorical, and boolean variables
- Computes exact SHAP attributions for tree ensembles efficiently using TreeExplainer
- Produces interpretable results expressed as percentage impacts on RoL
Key Principles
Dynamic Per-Peer-Group Analysis
Factor importance and impact are computed dynamically for each peer group query. A factor that strongly influences pricing for coastal multifamily properties may have different importance for inland office buildings. Results reflect the specific comparison set, not global averages.
Statistical Attribution, Not Causation
Factor analysis identifies statistical associations between characteristics and pricing outcomes. A positive impact for a factor means policies with that characteristic tend to have higher RoL, but does not prove the characteristic directly causes the price difference. Other correlated factors or underwriting considerations may contribute.
Minimum Data Requirements
Reliable factor analysis requires sufficient sample size. The system enforces minimum thresholds for both model training and factor reporting to ensure statistical validity. When sample sizes are insufficient, factor results are suppressed or flagged accordingly.
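A threshold check of this kind might look like the sketch below. The threshold values and the function name reporting_status are hypothetical placeholders chosen for illustration; the actual minimums enforced by the system are not stated here.

```python
from collections import Counter

# Hypothetical thresholds, for illustration only.
MIN_TRAINING_POLICIES = 100   # minimum peer group size to train a model
MIN_POLICIES_PER_VALUE = 10   # minimum policies to report a factor value

def reporting_status(peer_group_size, factor_values):
    """Return a model-level status and a per-value status for one factor."""
    if peer_group_size < MIN_TRAINING_POLICIES:
        # Too few policies to train at all: suppress every factor result.
        return {"model": "suppressed", "values": {}}
    counts = Counter(factor_values)
    return {
        "model": "ok",
        "values": {
            value: ("ok" if count >= MIN_POLICIES_PER_VALUE else "flagged")
            for value, count in counts.items()
        },
    }

small_group = reporting_status(50, ["FL"] * 50)
large_group = reporting_status(123, ["FL"] * 100 + ["TX"] * 20 + ["GA"] * 3)
print(small_group)
print(large_group)
```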
Reporting Standards
Normalized Importance Scores
Importance scores are normalized to sum to 1.0 (100%) across all factors. An importance score of 0.15 indicates the factor accounts for approximately 15% of the total pricing variation the model attributes to factors within the peer group.
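One common way to obtain such scores is to take the mean absolute SHAP value per factor and divide by the total across factors. The sketch below assumes that definition; the exact aggregation used by the system is not specified here, and the factor names and SHAP values are made up for the example.

```python
def normalized_importance(shap_by_factor):
    """Map each factor to its share of total mean |SHAP| (shares sum to 1.0)."""
    mean_abs = {
        factor: sum(abs(v) for v in values) / len(values)
        for factor, values in shap_by_factor.items()
    }
    total = sum(mean_abs.values())
    if total == 0:
        # No factor moves the prediction: report zero importance everywhere.
        return {factor: 0.0 for factor in mean_abs}
    return {factor: value / total for factor, value in mean_abs.items()}

# Hypothetical per-policy SHAP values (log-RoL units) for three factors.
shap_by_factor = {
    "state": [0.10, -0.20, 0.30, -0.10],
    "building_age": [0.05, -0.05, 0.10, 0.00],
    "has_wind_hail": [0.02, 0.03, -0.02, -0.03],
}

importance = normalized_importance(shap_by_factor)
for factor, share in importance.items():
    print(f"{factor}: {share:.2f}")
```

With these inputs the shares come out at roughly 0.70, 0.20, and 0.10. Absolute values are used so that positive and negative contributions do not cancel when measuring how much a factor matters.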
Multiplicative Effects
Because models are trained on log-transformed RoL, SHAP values are additive on the log scale and therefore represent multiplicative effects on RoL itself. An impact of +5% means the factor is associated with RoL approximately 5% higher than the peer group baseline, all else equal.
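The conversion between a log-scale SHAP value and a percentage impact can be sketched as follows, assuming the transform is the natural logarithm (the base is not stated here). A SHAP value s then corresponds to a multiplier of exp(s), that is, a percentage impact of exp(s) - 1. The function names are illustrative.

```python
import math

def shap_to_pct_impact(shap_value):
    """Convert a log-scale SHAP value to a percentage impact on RoL."""
    return math.expm1(shap_value) * 100.0

def pct_impact_to_shap(pct_impact):
    """Inverse conversion: percentage impact back to a log-scale SHAP value."""
    return math.log1p(pct_impact / 100.0)

def combined_pct_impact(shap_values):
    """Factors add in log space, so their percentage impacts compound."""
    return math.expm1(sum(shap_values)) * 100.0

five_pct = shap_to_pct_impact(math.log(1.05))
compounded = combined_pct_impact([math.log(1.05), math.log(1.10)])
print(f"{five_pct:.2f}%")
print(f"{compounded:.2f}%")
```

Note that two factors at +5% and +10% compound to +15.5% rather than adding to +15%, because 1.05 × 1.10 = 1.155. For small SHAP values the log-scale number and the percentage are nearly equal, which is why an impact near 0.05 reads as about 5%.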
Value-Level Breakdowns
For categorical factors, the system reports impact by category value (e.g., by state or carrier). For numeric factors, impacts are reported by quartile to show how the effect varies across the value range.
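A value-level breakdown could be computed along these lines. The sketch assumes that per-policy SHAP values are averaged within each bucket on the log scale and then converted to a percentage via exp(s) - 1 (natural log assumed), and that quartile boundaries come from the peer group's own values. The sample data and function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, quantiles

def to_pct(shap_value):
    """Log-scale SHAP value to percentage impact (natural log assumed)."""
    return math.expm1(shap_value) * 100.0

def impact_by_category(values, shap_values):
    """Average SHAP per category value, reported as a percentage impact."""
    groups = defaultdict(list)
    for value, shap_value in zip(values, shap_values):
        groups[value].append(shap_value)
    return {value: to_pct(mean(group)) for value, group in groups.items()}

def impact_by_quartile(values, shap_values):
    """Average SHAP per quartile of a numeric factor, as a percentage impact."""
    cuts = quantiles(values, n=4)  # three cut points: Q1/Q2, Q2/Q3, Q3/Q4
    groups = defaultdict(list)
    for value, shap_value in zip(values, shap_values):
        quartile = 1 + sum(value > cut for cut in cuts)  # 1 through 4
        groups[f"Q{quartile}"].append(shap_value)
    return {label: to_pct(mean(groups[label])) for label in sorted(groups)}

# Hypothetical peer group data.
states = ["FL", "FL", "TX", "TX"]
state_shap = [0.20, 0.10, -0.05, -0.15]
ages = [1, 2, 3, 4, 5, 6, 7, 8]
age_shap = [-0.08, -0.06, -0.02, -0.01, 0.01, 0.03, 0.06, 0.08]

by_state = impact_by_category(states, state_shap)
by_age_quartile = impact_by_quartile(ages, age_shap)
print(by_state)
print(by_age_quartile)
```

Averaging on the log scale before converting is equivalent to taking the geometric mean of the per-policy multipliers, which is consistent with treating the effects as multiplicative.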