Factor Analysis Methodology

Factor analysis quantifies how specific property characteristics, coverage options, and market factors contribute to observed Rate on Line (RoL) variation. Rather than simply reporting what pricing was, it shows which factors are associated with pricing differences across comparable policies, and by how much.

Purpose and Scope

The factor analysis methodology uses machine learning to decompose RoL variation into attributable components. For any peer group of policies, the system identifies which characteristics most strongly influence pricing and quantifies the direction and magnitude of each factor’s impact.

This analysis answers questions such as:

Which property characteristics drive the largest pricing differences?
How does my specific factor value compare to the peer group?
What is the typical RoL impact of having wind/hail coverage vs. not having it?

Methodological Framework

Factor analysis uses XGBoost gradient boosting models combined with SHAP (SHapley Additive exPlanations) values for attribution. This approach:

Captures non-linear relationships between features and pricing
Handles mixed feature types including numeric, categorical, and boolean variables
Provides exact SHAP attributions for tree models, computed efficiently with TreeExplainer
Produces interpretable results expressed as percentage impacts on RoL
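
The additive attribution idea behind SHAP can be illustrated without any library. The sketch below computes exact Shapley values by brute force for a toy log-RoL function. It is not the XGBoost/TreeExplainer pipeline: the function `predict_log_rol`, its coefficients, and the feature names are hypothetical stand-ins for a trained model, and a single reference policy is used as the baseline where SHAP would use an expectation over the data. TreeExplainer arrives at the same kind of quantity efficiently for tree ensembles.

```python
from itertools import permutations
from math import factorial

# Hypothetical toy "model" predicting log(RoL). It stands in for a trained
# gradient boosting model purely to illustrate attribution.
def predict_log_rol(features):
    score = -3.0
    if features["coastal"]:
        score += 0.4
    if features["wind_hail"]:
        score += 0.2
    # Interaction term: the two characteristics together add a further loading.
    if features["coastal"] and features["wind_hail"]:
        score += 0.1
    return score

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features can be switched from their
    baseline value to the instance's value."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}

baseline = {"coastal": False, "wind_hail": False}  # simplified reference policy
policy = {"coastal": True, "wind_hail": True}
phi = shapley_values(predict_log_rol, policy, baseline)

# Additivity: the attributions sum to prediction minus baseline prediction.
gap = predict_log_rol(policy) - predict_log_rol(baseline)
print(phi, sum(phi.values()), gap)
```

The interaction term is split evenly between the two features, which is why non-linear relationships still yield one attribution per factor.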

Key Principles

Dynamic Per-Peer-Group Analysis

Factor importance and impact are computed dynamically for each peer group query. A factor that strongly influences pricing for coastal multifamily properties may have different importance for inland office buildings. Results reflect the specific comparison set, not global averages.

Statistical Attribution, Not Causation

Factor analysis identifies statistical associations between characteristics and pricing outcomes. A positive impact for a factor means that policies with that characteristic tend to have higher RoL; it does not prove that the characteristic directly causes the price difference. Other correlated factors or underwriting considerations may contribute.

Minimum Data Requirements

Reliable factor analysis requires sufficient sample size. The system enforces minimum thresholds for both model training and factor reporting so that reported results rest on enough observations to be statistically meaningful. When sample sizes are insufficient, factor results are suppressed or flagged accordingly.
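
A two-level threshold check of this kind might look like the sketch below. The threshold values, the names, and the rule for when a result is suppressed rather than flagged are all hypothetical; the actual values and rules are not stated here.

```python
# Hypothetical thresholds, chosen only for illustration.
MIN_POLICIES_FOR_TRAINING = 100      # assumed peer-group minimum to train a model
MIN_POLICIES_PER_FACTOR_VALUE = 10   # assumed minimum to report a factor value

def factor_result_status(peer_group_size, policies_with_value):
    """Return 'suppressed', 'flagged', or 'reported' for one factor value."""
    if peer_group_size < MIN_POLICIES_FOR_TRAINING:
        # Assumption: with too few policies no model is trained at all.
        return "suppressed"
    if policies_with_value < MIN_POLICIES_PER_FACTOR_VALUE:
        # Assumption: the model exists, but this value is thinly supported.
        return "flagged"
    return "reported"

print(factor_result_status(40, 25))    # peer group too small
print(factor_result_status(500, 4))    # factor value too rare
print(factor_result_status(500, 120))  # enough data at both levels
```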

Reporting Standards

Normalized Importance Scores

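Importance scores are normalized to sum to 1.0 (100%) across all factors. An importance score of 0.15 indicates that the factor accounts for approximately 15% of the pricing variation the model attributes to factors within the peer group. It is a relative share among the modeled factors, not a share of total variance explained.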
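
One common way to obtain such scores is to take each factor's mean absolute SHAP value and divide by the total across factors. The sketch below assumes that definition, which is not confirmed here; the factor names and SHAP values are hypothetical.

```python
def normalized_importance(shap_by_factor):
    """Map each factor to its share of total mean |SHAP|; shares sum to 1.0."""
    mean_abs = {
        factor: sum(abs(v) for v in values) / len(values)
        for factor, values in shap_by_factor.items()
    }
    total = sum(mean_abs.values())
    if total == 0:
        # No factor moved any prediction; avoid dividing by zero.
        return {factor: 0.0 for factor in mean_abs}
    return {factor: m / total for factor, m in mean_abs.items()}

# Hypothetical SHAP values (log-RoL units) for four policies in a peer group.
shap_by_factor = {
    "state": [0.30, -0.10, 0.20, -0.20],
    "construction_type": [0.05, -0.05, 0.10, 0.00],
    "wind_hail": [0.10, 0.10, -0.10, -0.10],
}
importance = normalized_importance(shap_by_factor)
for factor, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{factor}: {score:.2f}")
```

Absolute values are used so that positive and negative impacts do not cancel: a factor that raises RoL for some policies and lowers it for others still counts as influential.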

Multiplicative Effects

Because models are trained on log-transformed RoL, SHAP values are additive in log space and therefore correspond to multiplicative effects on RoL itself. An impact of +5% means the factor is associated with RoL approximately 5% higher than the peer group baseline, all else equal.
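
The conversion from a log-space SHAP value to a percentage impact follows from exponentiation: a SHAP value s corresponds to a multiplier of exp(s), or a change of exp(s) - 1. The sketch below illustrates this with hypothetical numbers; the function names are illustrative, not part of the system.

```python
import math

def shap_to_pct_impact(shap_value):
    """Convert a log-space SHAP value to a percentage change in RoL."""
    return (math.exp(shap_value) - 1.0) * 100.0

def predicted_rol(baseline_log_rol, shap_values):
    """Rebuild the prediction: baseline RoL times each factor's multiplier."""
    rol = math.exp(baseline_log_rol)
    for s in shap_values:
        rol *= math.exp(s)
    return rol

# Hypothetical inputs: a 2% baseline RoL and two factor attributions.
baseline_log_rol = math.log(0.02)
shap_values = [0.0488, -0.0200]

impacts = [shap_to_pct_impact(s) for s in shap_values]
rol = predicted_rol(baseline_log_rol, shap_values)
print([f"{p:+.2f}%" for p in impacts], f"RoL = {rol:.5f}")
```

For small values the SHAP value and the percentage impact nearly coincide (0.0488 is roughly +5%), but they diverge as effects grow, so the exponential conversion matters for strong factors.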

Value-Level Breakdowns

For categorical factors, the system reports impact by category value (e.g., by state or carrier). For numeric factors, impacts are reported by quartile to show how the effect varies across the value range.
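
A breakdown of this kind can be sketched by grouping per-policy SHAP values and summarizing each group. The example below assumes the summary is the mean SHAP value in log space converted to a percentage, which is an assumption rather than the confirmed aggregation; the data and function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, quantiles

def _pct(mean_shap):
    # Log-space mean converted to a percentage change in RoL.
    return (math.exp(mean_shap) - 1.0) * 100.0

def impact_by_category(values, shap_values):
    """Mean percentage impact for each category value."""
    groups = defaultdict(list)
    for value, s in zip(values, shap_values):
        groups[value].append(s)
    return {value: _pct(mean(group)) for value, group in groups.items()}

def impact_by_quartile(values, shap_values):
    """Mean percentage impact within each quartile of a numeric factor."""
    q1, q2, q3 = quantiles(values, n=4)  # three cut points
    groups = defaultdict(list)
    for value, s in zip(values, shap_values):
        if value <= q1:
            label = "Q1"
        elif value <= q2:
            label = "Q2"
        elif value <= q3:
            label = "Q3"
        else:
            label = "Q4"
        groups[label].append(s)
    return {label: _pct(mean(group)) for label, group in sorted(groups.items())}

# Hypothetical categorical factor: state.
states = ["FL", "FL", "TX", "TX"]
state_shap = [0.2, 0.4, -0.1, -0.1]
by_state = impact_by_category(states, state_shap)

# Hypothetical numeric factor: building age in years.
ages = [5, 10, 15, 20, 25, 30, 35, 40]
age_shap = [-0.10, -0.06, -0.02, 0.00, 0.02, 0.04, 0.08, 0.12]
by_quartile = impact_by_quartile(ages, age_shap)

print(by_state)
print(by_quartile)
```

Averaging in log space before converting yields the geometric mean of the per-policy multipliers, which keeps the result consistent with the multiplicative interpretation of impacts.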