
Feature Selection

The factors included in the pricing attribution model were selected based on four criteria: data availability, predictive power, business interpretability, and non-redundancy.

Selection Criteria

Data Availability

Only features consistently captured across policies in the terminal data can be included. Features with high missing rates or inconsistent definitions across data sources are excluded to ensure model reliability.
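
As an illustration of the availability screen, the sketch below computes a per-feature missing rate and keeps only the features under a cutoff. The 20% threshold, the field names, and the `roof_type` field are assumptions made for the example; the source does not state the cutoff or the rejected candidates.

```python
from typing import Any

# Hypothetical cutoff: the actual threshold used for the model is not documented here.
MAX_MISSING_RATE = 0.20


def missing_rate(records: list[dict[str, Any]], feature: str) -> float:
    """Share of records where the feature is absent, None, or an empty string."""
    if not records:
        return 1.0
    missing = sum(1 for r in records if r.get(feature) in (None, ""))
    return missing / len(records)


def available_features(
    records: list[dict[str, Any]],
    candidates: list[str],
    max_missing_rate: float = MAX_MISSING_RATE,
) -> list[str]:
    """Keep candidates whose missing rate does not exceed the cutoff."""
    return [f for f in candidates if missing_rate(records, f) <= max_missing_rate]


# Made-up policy records; "roof_type" is a hypothetical sparsely captured field.
policies = [
    {"unit_count": 120, "year_built": 1998, "roof_type": None},
    {"unit_count": 48, "year_built": 2005, "roof_type": "flat"},
    {"unit_count": 300, "year_built": None, "roof_type": None},
    {"unit_count": 75, "year_built": 1987, "roof_type": None},
    {"unit_count": 210, "year_built": 2012, "roof_type": None},
]

kept = available_features(policies, ["unit_count", "year_built", "roof_type"])
```

In this toy data `roof_type` is missing on four of five policies and is dropped, while `year_built` sits exactly at the cutoff and is kept.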

Predictive Power

Features must demonstrate correlation with RoL in exploratory analysis. Including non-predictive features adds noise without improving explanatory power.

Business Interpretability

Users must be able to understand and act on factor results. Abstract or highly technical features that users cannot relate to their own properties or decisions are excluded.

Non-Redundancy

Each feature should capture unique information. When two features are highly correlated (e.g., total RCV and property size), either a derived ratio or a single representative feature is used to avoid double-counting their effects.
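
One way to flag redundant pairs is a pairwise Pearson correlation check, sketched below. The 0.9 threshold, the column names, and the sample values are assumptions for illustration; the source does not specify how correlation was measured or what cutoff was applied.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(
    columns: dict[str, list[float]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return feature pairs whose absolute correlation meets the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Made-up values: total RCV (in $ millions) tracks unit count closely.
columns = {
    "total_rcv": [12.0, 5.1, 30.5, 8.0, 21.0],
    "unit_count": [120, 48, 300, 75, 210],
    "year_built": [1998, 2005, 1975, 1987, 2012],
}

pairs = redundant_pairs(columns)
```

Here only the total RCV and unit count pair is flagged, which is the situation where a ratio such as RCV per door can stand in for raw RCV.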

Common Features (All Coverage Types)

These features are included in all three model variants (package, property, liability).

Numeric Features

| Feature | Description | Why Included |
| --- | --- | --- |
| Unit Count | Number of units in the property | Property size is a primary pricing driver; larger properties have different risk profiles |
| Year Built | Construction year | Building age correlates with construction quality, code compliance, and maintenance |
| Stories | Number of floors | Vertical density affects fire risk, evacuation complexity, and liability exposure |
| Annual Gross Rent | Total annual rental income | Income proxy indicating property quality and tenant profile |
| RCV per Door | Replacement Cost Value divided by unit count | Normalized replacement cost; controls for property value independent of size |
| EGI per Door | Effective Gross Income divided by unit count | Normalized income; indicates revenue intensity per unit |

Categorical Features

| Feature | Description | Why Included |
| --- | --- | --- |
| State | Property state location | Geographic location determines regulatory environment, weather exposure, and litigation climate |
| ZIP Prefix | First 3 digits of ZIP code | Regional grouping (~900 regions) captures local market conditions without overfitting to individual ZIPs |
| Carrier | Insurance carrier name | Different carriers have distinct pricing strategies and risk appetites |
| Brokerage | Placing brokerage | Brokerages negotiate different rates and have varying market access |
| Policy Structure | Policy structure classification | How coverage is structured affects pricing |
| AM Best Rating | Carrier financial strength rating | Carrier financial strength may correlate with pricing discipline |
| Product Type | Insurance product classification | Product type affects base pricing |

Coverage-Specific Features

Property Model Features

In addition to common features, property and package models include:

| Feature | Type | Description |
| --- | --- | --- |
| RCV of Existing Structure | Numeric | Total replacement cost value; fundamental to property pricing |
| Wind/Hail Coverage | Boolean | Indicates coverage for major property peril; CAT exposure marker |
| Named Storm Coverage | Boolean | Coverage for named storms; coastal exposure indicator |
| Terrorism Property Coverage | Boolean | Property terrorism coverage; urban/high-value property indicator |
| Special Flood Hazard Area (SFHA) | Boolean | FEMA flood zone designation; flood risk indicator |

Liability Model Features

In addition to common features, liability and package models include:

| Feature | Type | Description |
| --- | --- | --- |
| General Liability Coverage | Boolean | Core liability coverage presence |
| Umbrella Liability Coverage | Boolean | Excess coverage indicates higher limits or exposure |
| Terrorism Liability Coverage | Boolean | Liability terrorism coverage; additional exposure indicator |
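
The way the common and coverage-specific features combine per model variant can be summarized in a small configuration sketch. The snake_case identifiers and the `features_for` helper are hypothetical names chosen for this example, not the model's actual column names or API.

```python
# Hypothetical identifiers mirroring the feature tables in this section.
COMMON_NUMERIC = [
    "unit_count", "year_built", "stories",
    "annual_gross_rent", "rcv_per_door", "egi_per_door",
]
COMMON_CATEGORICAL = [
    "state", "zip_prefix", "carrier", "brokerage",
    "policy_structure", "am_best_rating", "product_type",
]
PROPERTY_FEATURES = [
    "rcv_existing_structure", "wind_hail_coverage", "named_storm_coverage",
    "terrorism_property_coverage", "sfha",
]
LIABILITY_FEATURES = [
    "general_liability_coverage", "umbrella_liability_coverage",
    "terrorism_liability_coverage",
]


def features_for(model: str) -> list[str]:
    """Feature list for a model variant: 'property', 'liability', or 'package'."""
    common = COMMON_NUMERIC + COMMON_CATEGORICAL
    if model == "property":
        return common + PROPERTY_FEATURES
    if model == "liability":
        return common + LIABILITY_FEATURES
    if model == "package":
        # Package models receive both coverage-specific groups.
        return common + PROPERTY_FEATURES + LIABILITY_FEATURES
    raise ValueError(f"unknown model variant: {model!r}")
```

Under this layout the property variant has 18 features, the liability variant 16, and the package variant 21.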

Derived Features

Several features are derived from raw data fields to improve model performance:

RCV per Door

Calculated as RCV / Unit Count. Using this ratio instead of raw RCV allows the model to assess whether pricing reflects value intensity rather than absolute property size.

EGI per Door

Calculated as EGI / Unit Count. Normalizes income by property size to capture revenue quality independent of scale.
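
Both per-door ratios follow the same pattern, sketched below. The function name, the field names, and the choice to return `None` for a missing or non-positive unit count are assumptions; the source does not describe how such records are handled.

```python
from typing import Optional


def per_door(total: Optional[float], unit_count: Optional[int]) -> Optional[float]:
    """Divide a property-level total by unit count.

    Returns None when either input is missing or the unit count is not
    positive (an assumed convention, to avoid division by zero).
    """
    if total is None or unit_count is None or unit_count <= 0:
        return None
    return total / unit_count


# Made-up example policy.
policy = {"rcv": 18_000_000.0, "egi": 2_160_000.0, "unit_count": 120}

rcv_per_door = per_door(policy["rcv"], policy["unit_count"])  # 150000.0
egi_per_door = per_door(policy["egi"], policy["unit_count"])  # 18000.0
```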

ZIP Prefix

The first 3 digits of the ZIP code. Full 5-digit ZIP codes would create thousands of categories, leading to data sparsity and overfitting. The 3-digit prefix provides approximately 900 geographic regions, sufficient granularity to capture regional variation while maintaining adequate sample sizes per category.
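
A minimal sketch of the prefix derivation follows. The handling of ZIP+4 values, integer inputs that have lost leading zeros, and unparseable values is assumed for illustration and may differ from the actual pipeline.

```python
from typing import Optional, Union


def zip_prefix(zip_code: Union[str, int, None]) -> Optional[str]:
    """Return the 3-digit ZIP prefix, or None if the value cannot be parsed."""
    if zip_code is None:
        return None
    # Drop any ZIP+4 suffix, e.g. "94105-1234" -> "94105".
    text = str(zip_code).strip().split("-")[0]
    if not (text.isascii() and text.isdigit()) or len(text) > 5:
        return None
    # Restore leading zeros lost when ZIPs were stored as integers.
    return text.zfill(5)[:3]


examples = {
    "10001": zip_prefix("10001"),            # "100"
    "2134": zip_prefix(2134),                # "021"
    "94105-1234": zip_prefix("94105-1234"),  # "941"
    "N/A": zip_prefix("N/A"),                # None
}
```

Keeping the prefix as a string rather than an integer preserves prefixes that begin with zero, such as those in the northeastern United States.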

Validation testing confirmed that ZIP prefix pricing patterns correlate with external risk indicators, including local crime data and crime scores. This suggests the feature captures meaningful geographic risk variation rather than spurious patterns.
