The trend in per capita medical costs in the United States is a structurally complex and non-tradeable economic variable. Actuarial benchmarks, such as those reflected in the Milliman Health Trend Guidelines, are built from claims data that reflect heterogeneous drivers, including utilization intensity, reimbursement levels, and drug prices. They are typically reported on a monthly basis.
Much of the forward-looking information about these cost drivers emerges in qualitative form: regulatory draft proposals, clinical trial outcomes, policy commentary, reimbursement negotiations, and industry guidance. Historically, interpreting such information has relied on domain experts who translate dispersed signals into implicit judgments about future cost trends. However, this expert processing is difficult to scale, systematize, or replicate within a quantitative framework. This paper proposes a methodology for formalizing that interpretive layer.
Specifically, we construct a Healthcare Sentiment Index (HSI) that leverages large language models to encode expert-informed taxonomies of healthcare cost drivers and systematically map unstructured textual data (news feeds) into a directional, time-indexed signal. The primary contribution is methodological rather than predictive. The HSI represents an automated translation of domain expertise into a structured quantitative index, enabling qualitative information to be incorporated into formal empirical analysis and, potentially, systematic asset allocation programs.
Key discussion points include the following.
- Natural language processing methodology: A multistage pipeline that prioritizes domain expertise over generic sentiment scoring.
- Sentiment index construction: A multistage aggregation framework culminating in a dynamic Bayesian smoothing approach.
- Time series explainer: A specialized retrieval-augmented generation pipeline designed to interpret why the sentiment index moved.
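
To make the smoothing step concrete, the sketch below shows one simple form of dynamic Bayesian smoothing: a local-level (random-walk) Kalman filter applied to a sequence of aggregated sentiment scores. This is an illustration only and is not taken from the paper, which does not specify its model here. The class name `LocalLevelFilter`, the function `smooth_index`, and the variance values are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocalLevelFilter:
    """Random-walk latent level observed with Gaussian noise (illustrative)."""
    process_var: float = 0.01  # assumed drift variance of the latent sentiment
    obs_var: float = 0.25      # assumed noise variance of each aggregated score
    mean: float = 0.0          # prior mean of the latent sentiment
    var: float = 1.0           # prior variance of the latent sentiment

    def update(self, obs: Optional[float]) -> float:
        # Predict step: the latent level follows a random walk,
        # so uncertainty grows by the process variance.
        self.var += self.process_var
        if obs is not None:
            # Update step: standard Gaussian posterior (Kalman gain).
            gain = self.var / (self.var + self.obs_var)
            self.mean += gain * (obs - self.mean)
            self.var *= 1.0 - gain
        # With no observation, the previous estimate is carried forward.
        return self.mean


def smooth_index(scores: List[Optional[float]]) -> List[float]:
    """Filter a sequence of period scores; None marks a period with no news."""
    f = LocalLevelFilter()
    return [f.update(s) for s in scores]


if __name__ == "__main__":
    # Hypothetical scores in [-1, 1]; the third period has no articles.
    print(smooth_index([1.0, 1.0, None, -1.0]))
```

In this sketch a period with no articles leaves the estimate unchanged while its uncertainty grows, so the next observation receives more weight. The ratio of `process_var` to `obs_var` controls how quickly the index reacts to new information.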