BHI layer v1: docs, schema, Phase A ingestion stubs
docs/scoring.md
@@ -0,0 +1,125 @@
# BHI Composite Scoring Function

## Formula

```
composite_score =
    (demand_severity * 0.25) +
    (supply_shortage * 0.25) +
    (pain_signal_volume * 0.20) +
    (capacity_trend * 0.10) +
    (workforce_shortage * 0.10) +
    (regulatory_tailwind * 0.05) +
    (govt_demand * 0.05)
```

All components are normalized to 0-100 before weighting. Because the weights sum to 1.00, the final `composite_score` is also 0-100.
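
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `WEIGHTS` dict and the `composite_score` function are hypothetical names, and the range check is an assumption based on the 0-100 normalization rule.

```python
# Weights copied from the formula above; they sum to 1.00.
WEIGHTS = {
    "demand_severity": 0.25,
    "supply_shortage": 0.25,
    "pain_signal_volume": 0.20,
    "capacity_trend": 0.10,
    "workforce_shortage": 0.10,
    "regulatory_tailwind": 0.05,
    "govt_demand": 0.05,
}


def composite_score(components: dict) -> float:
    """Weighted sum of the seven component scores (each expected in 0-100)."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        # Assumption: inputs outside 0-100 are treated as an upstream bug.
        if not 0.0 <= components[name] <= 100.0:
            raise ValueError(f"{name} must be within 0-100")
    return sum(components[name] * weight for name, weight in WEIGHTS.items())
```

Rejecting out-of-range inputs, rather than silently clamping them, keeps normalization errors visible in the component that produced them.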

Each component is computed at the **geo x niche x age-bracket** level (state, county, or MSA depending on data).

Thesis this reflects (all-of-the-above): demand is outpacing supply, the delivery model is shifting, and regulation is restructuring the market. We weight demand + supply heaviest (50% combined), then real-time pain signals (20%), then the four remaining lower-weight components (30% combined).

---

## Component definitions

### 1. demand_severity (25%)
Feeder: `bhi_demand_indicators` (CDC WONDER, BRFSS, YRBSS, NSCH).

For a given geo + age bracket, combine:
- Suicide rate per 100k (CDC WONDER, ICD-10 X60-X84)
- Drug overdose death rate per 100k (CDC WONDER, ICD-10 X40-X44 + Y10-Y14, i.e. unintentional and undetermined intent)
- YRBSS "seriously considered suicide" % (adolescent)
- BRFSS "mental health not good 14+ days" % (young adult via 18-24 bracket)
- NSCH unmet mental health treatment need %

Normalize each to 0-100 against the national distribution (percentile rank), then average.
Trend adjustment: add 10 points if the 5-yr CAGR > 5%, capping the result at 100 so the component stays within 0-100.
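
The normalize-average-adjust steps above might look like the following sketch. Several details are assumptions the source does not specify: ties use a mid-rank percentile, `cagr_5yr` is passed in as a fraction (0.05 = 5%), and the function and parameter names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value: float, national: list) -> float:
    """Mid-rank percentile (0-100) of `value` within the national distribution."""
    ordered = sorted(national)
    below = bisect_left(ordered, value)          # values strictly below
    at_or_below = bisect_right(ordered, value)   # values at or below
    return 100.0 * (below + at_or_below) / (2 * len(ordered))


def demand_severity(indicators: dict, national: dict, cagr_5yr: float) -> float:
    """Average the per-indicator percentile ranks, then apply the trend bonus."""
    ranks = [percentile_rank(v, national[name]) for name, v in indicators.items()]
    score = sum(ranks) / len(ranks)
    if cagr_5yr > 0.05:       # +10 when the 5-yr CAGR exceeds 5%
        score += 10.0
    return min(100.0, score)  # keep the component within 0-100
```

For instance, a geo at the 50th percentile on one indicator and the 90th on another averages 70, and a qualifying growth trend lifts that to 80.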

### 2. supply_shortage (25%)
Feeders: `bhi_shortages` (HRSA HPSA) + `bhi_facilities` (SAMHSA + CMS).

For a geo:
- HPSA mental health score (0-25, already normalized; rescale x4 -> 0-100)
- Inverse of facility density: beds per 100k population (percentile-invert)
- Inverse of adolescent/young-adult-specific bed density (if scoring those brackets)

Weighted average: 50% HPSA score, 30% total bed density, 20% age-targeted bed density.
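
As a rough sketch of the 50/30/20 blend, assuming the two bed-density inputs arrive already expressed as national percentile ranks (0-100). The function and parameter names are hypothetical, and the sketch covers only the case where all three inputs are available.

```python
def supply_shortage(hpsa_score: float,
                    bed_density_pctile: float,
                    age_bed_density_pctile: float) -> float:
    """Blend HPSA score with inverted bed-density percentiles (all 0-100)."""
    if not 0.0 <= hpsa_score <= 25.0:
        raise ValueError("HPSA mental health score must be within 0-25")
    for pctile in (bed_density_pctile, age_bed_density_pctile):
        if not 0.0 <= pctile <= 100.0:
            raise ValueError("percentiles must be within 0-100")
    hpsa_component = hpsa_score * 4.0            # rescale 0-25 -> 0-100
    total_beds = 100.0 - bed_density_pctile      # percentile-invert: fewer beds = higher score
    age_beds = 100.0 - age_bed_density_pctile
    return 0.5 * hpsa_component + 0.3 * total_beds + 0.2 * age_beds
```

With an HPSA score of 20, total bed density at the 30th percentile, and age-targeted bed density at the 10th, the blend is 0.5 x 80 + 0.3 x 70 + 0.2 x 90 = 79.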

### 3. pain_signal_volume (20%)
Feeders: base Brain's `reddit_posts`, `app_reviews`, and `risk_factors` tables (already being built).

For a niche (e.g., "adolescent inpatient"):
- Count of posts/reviews/risk-factor hits matching niche keywords in last 90 days
- Z-score against the full base Brain niche distribution
- Clamp to 0-100

Depends on base Brain being live; until then, this component defaults to 50 (neutral).
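
One possible reading of the count, z-score, and clamp steps is sketched below. The source does not say how a z-score is mapped onto 0-100, so the mapping here is purely an assumption: z = 0 maps to the neutral value 50 and each standard deviation moves the score by a hypothetical `points_per_sd`. The function name and the use of population standard deviation are assumptions too.

```python
from statistics import mean, pstdev
from typing import Optional

NEUTRAL_DEFAULT = 50.0  # from the text: used until base Brain is live


def pain_signal_volume(niche_count: int,
                       all_niche_counts: Optional[list],
                       points_per_sd: float = 25.0) -> float:
    """Z-score a niche's 90-day hit count against all niches, mapped to 0-100."""
    if not all_niche_counts:           # base Brain not live yet
        return NEUTRAL_DEFAULT
    sd = pstdev(all_niche_counts)
    if sd == 0:                        # every niche has the same count
        return NEUTRAL_DEFAULT
    z = (niche_count - mean(all_niche_counts)) / sd
    # Assumed mapping: centre on 50, then clamp to 0-100.
    return max(0.0, min(100.0, NEUTRAL_DEFAULT + points_per_sd * z))
```

Centring on 50 keeps the live score continuous with the neutral default, so switching base Brain on does not shift an average niche.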

### 4. capacity_trend (10%)
Feeder: `bhi_facilities` (opened_date, closed_date) + CMS POS termination records.

For the geo x niche:
- Facilities opened in last 24 months minus closed in last 24 months, normalized by baseline facility count
- Negative net = high score (more opportunity), positive net = low score (saturated)
- Formula: `100 * (1 - (net_change + baseline) / (2 * baseline))` clamped 0-100, where `net_change` is opened minus closed and `baseline` is the baseline facility count; no net change scores 50
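
The capacity formula above translates directly into code. In this sketch the function and parameter names are hypothetical, and raising on a non-positive baseline is an assumption, since the source does not say how a geo with no baseline facilities should be scored.

```python
def capacity_trend(opened_24mo: int, closed_24mo: int, baseline: int) -> float:
    """Score net facility change over 24 months; net closures score high."""
    if baseline <= 0:
        # Assumption: the formula is undefined without a positive baseline.
        raise ValueError("baseline facility count must be positive")
    net_change = opened_24mo - closed_24mo
    raw = 100.0 * (1.0 - (net_change + baseline) / (2.0 * baseline))
    return max(0.0, min(100.0, raw))  # clamp to 0-100
```

A geo that lost as many facilities as its baseline scores 100, one that doubled its baseline scores 0, and anything beyond either extreme is clamped.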

### 5. workforce_shortage (10%)
Feeder: `bhi_workforce` (BLS OES).

For the MSA:
- Wage growth YoY for SOC codes 29-1223, 21-1014, 21-1018, 103T (percentile rank)
- Employment per 100k (inverse percentile)
- Average them

High wage growth + low employment density = high shortage score = high opportunity for new supply.
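
A small sketch of the averaging step, assuming both inputs have already been converted to percentile ranks (0-100) across MSAs. The function and parameter names are hypothetical.

```python
def workforce_shortage(wage_growth_pctile: float,
                       employment_density_pctile: float) -> float:
    """Average wage-growth percentile with inverted employment-density percentile."""
    for pctile in (wage_growth_pctile, employment_density_pctile):
        if not 0.0 <= pctile <= 100.0:
            raise ValueError("percentiles must be within 0-100")
    inverse_employment = 100.0 - employment_density_pctile  # low density = high score
    return (wage_growth_pctile + inverse_employment) / 2.0
```

An MSA at the 80th percentile for wage growth and the 20th for employment per 100k scores 80, matching the "high wage growth + low density" reading above.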

### 6. regulatory_tailwind (5%)
Feeder: `bhi_policy_events`.

Count of favorable policy events in the last 18 months for the geo:
- Medicaid rate increases for BH services
- New state mandates for adolescent crisis services
- Expanded provider types (peer support, mobile crisis)
- Federal rules (e.g., Mental Health Parity enforcement)

Score: `count * 20`, clamped to 0-100, so five or more events reach the cap.
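
The event count and clamp can be sketched as follows. This assumes the caller passes only the dates of events already judged favorable for the geo, and it approximates 18 months as 548 days. Both choices, along with the function and parameter names, are assumptions rather than anything the source defines.

```python
from datetime import date, timedelta


def regulatory_tailwind(event_dates: list, as_of: date,
                        window_days: int = 548) -> float:
    """Score favorable policy events inside the lookback window: count * 20, max 100."""
    cutoff = as_of - timedelta(days=window_days)  # roughly 18 months back
    count = sum(1 for d in event_dates if cutoff <= d <= as_of)
    return float(min(100, count * 20))
```

Passing `as_of` explicitly instead of reading the clock keeps the score reproducible when historical runs are recomputed.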

### 7. govt_demand (5%)
Feeder: base Brain's `sam_gov_opportunities` table (if present) + `bhi_policy_events`.

Active + awarded SAM.gov opportunities in NAICS 621112 (Physician offices - mental), 621420 (Outpatient mental health/SUD), 623220 (Residential mental health), 623210 (Residential intellectual/developmental), 624190 (Other individual/family services). Dollar-value-weighted and geo-filtered.

Log-scale: `min(100, 10 * log10(total_dollar_value + 1))`. For example, $1,000,000 in total value scores about 60, and the cap of 100 is reached at roughly $10 billion.

---

## Age bracket handling

Every row in `bhi_demand_indicators` carries an `age_bracket`. When scoring a niche tagged for adolescents (13-17), the `demand_severity` and `pain_signal_volume` components filter to that bracket. Young-adult scores pull 18-25 (the BRFSS input uses its 18-24 bracket). "All" niches average both brackets 50/50.

Young-adult gap note: for young-adult scoring, `supply_shortage` should apply an extra +15 penalty on facility density, since very few inpatient psychiatric facilities (IPFs) have dedicated young-adult units. This is captured via the `young_adult_unit` boolean in `bhi_facilities`.

---

## Output table (to be added)

Scores write to `bhi_scores` (created at runtime, not in `bhi_tables.sql` v1; add once inputs are flowing):

```sql
CREATE TABLE bhi_scores (
    id              SERIAL PRIMARY KEY,
    niche           TEXT,
    geo_type        TEXT,      -- state, county, or MSA
    geo_code        TEXT,
    age_bracket     TEXT,
    composite       NUMERIC,   -- composite_score, 0-100
    demand_severity NUMERIC,
    supply_shortage NUMERIC,
    pain_signal     NUMERIC,   -- pain_signal_volume
    capacity_trend  NUMERIC,
    workforce_short NUMERIC,   -- workforce_shortage
    reg_tailwind    NUMERIC,   -- regulatory_tailwind
    govt_demand     NUMERIC,
    computed_at     TIMESTAMPTZ DEFAULT NOW()
);
```