Data Quality Controls and Validation
RoL-related fields receive heightened quality scrutiny due to their importance in analytics, benchmarking, and reporting.
Validation and enrichment are supported by a purpose-built AI extraction and validation system that has been iteratively improved through multiple human-in-the-loop (HITL) cycles. Where required, values are re-validated or re-extracted directly from filed source documentation.
The data team performs a manual review when automated validation is insufficient or when the complexity of a policy's structure warrants additional scrutiny.
Outlier Detection Methodology
RoL values are evaluated at the product level: an interquartile range (IQR) method is applied to log-transformed values to establish the expected inlier range for each product.
The methodology operates as follows:
- RoL values are log-transformed to reduce skew and stabilize variance
- Quartiles (Q1 and Q3) are calculated on the transformed values
- The interquartile range (IQR) is defined as Q3 − Q1
- Lower and upper bounds are set at Q1 − k × IQR and Q3 + k × IQR, where k is the IQR multiplier
- Observations are classified as inliers or outliers based on their position relative to these bounds
- Bounds may be back-transformed to the original RoL scale for interpretability
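The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the function names, the natural-log transform, the quartile interpolation method, and the default multiplier k = 1.5 (the conventional Tukey value) are all assumptions, since the methodology does not specify them.

```python
import math
import statistics


def log_iqr_bounds(rol_values, k=1.5):
    """Return (lower, upper) inlier bounds on the original RoL scale.

    Assumptions (not stated in the methodology): natural log,
    'inclusive' quartile interpolation, and k = 1.5 as the multiplier.
    """
    if any(v <= 0 for v in rol_values):
        # The log transform is undefined for non-positive values;
        # rejecting them here is an illustrative choice.
        raise ValueError("RoL values must be positive")

    # Step 1: log-transform to reduce skew.
    logs = [math.log(v) for v in rol_values]

    # Step 2: quartiles on the transformed values.
    q1, _median, q3 = statistics.quantiles(logs, n=4, method="inclusive")

    # Step 3: interquartile range.
    iqr = q3 - q1

    # Step 4: bounds from the IQR multiplier.
    lower_log = q1 - k * iqr
    upper_log = q3 + k * iqr

    # Step 6: back-transform to the original RoL scale.
    return math.exp(lower_log), math.exp(upper_log)


def is_inlier(value, lower, upper):
    """Step 5: classify an observation relative to the bounds."""
    return lower <= value <= upper
```

In practice this would be run once per product, so that each product gets its own bounds. Because the bounds are computed on the log scale, they are asymmetric around the median once back-transformed, which suits right-skewed RoL distributions.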
Outlier Severity Bands
Outliers are categorized into severity bands based on their distance from the nearest bound:
- Inlier — Within expected range
- Mild — Up to 20% beyond the nearest bound
- Moderate — Greater than 20% and up to 50% beyond the nearest bound
- Extreme — Greater than 50% beyond the nearest bound
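The banding logic can be illustrated as follows. This sketch assumes that "beyond the nearest bound" means the relative distance from that bound measured on the original RoL scale; the methodology does not say whether distance is measured on the original or the log scale, so treat that choice, along with the function name, as hypothetical.

```python
def severity_band(value, lower, upper):
    """Assign a severity band given inlier bounds on the original scale.

    Assumption: distance is the fraction by which the value exceeds
    the upper bound or falls short of the lower bound.
    """
    if lower <= value <= upper:
        return "Inlier"

    if value > upper:
        excess = (value - upper) / upper
    else:
        excess = (lower - value) / lower

    if excess <= 0.20:
        return "Mild"
    if excess <= 0.50:
        return "Moderate"
    return "Extreme"
```

For example, with bounds of 0.02 and 0.10, a RoL of 0.11 sits 10% above the upper bound and is banded Mild, while a RoL of 0.14 sits 40% above it and is banded Moderate.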
Review Process
Outliers are not automatically excluded. Instead, they enter a validation workflow that includes:
- Automated re-validation via AI extraction against filed documentation
- Manual review by the data team where required
Only records that pass validation are approved for inclusion in Market Terminal outputs.