BHI layer v1: docs, schema, Phase A ingestion stubs

docs/integration_plan.md (new file, 156 lines)

# BHI Layer — Integration Plan

Steps to merge the BHI layer into the base Economic Brain once the base build has finished.

**Prereqs** (verify all three before step 1):

- Base Brain is running: `psql -d brain -c '\dt'` shows the core tables, including `job_runs`.
- `/home/ubuntu/economic-brain/` contains a working `jobs/` directory structure.
- `DATABASE_URL` is exported and points at the `brain` Postgres database.

---

## 1. Apply the BHI schema

```bash
cd /home/ubuntu/economic-brain-bhi
psql "$DATABASE_URL" -f schemas/bhi_tables.sql
psql "$DATABASE_URL" -c "\dt bhi_*"
# Expect 9 tables: bhi_facilities, bhi_facility_quality, bhi_facility_financials,
# bhi_demand_indicators, bhi_workforce, bhi_shortages, bhi_rtf_licensing,
# bhi_policy_events, bhi_crisis_calls
```

## 2. Copy ingestion jobs into the Brain's jobs tree

```bash
mkdir -p /home/ubuntu/economic-brain/jobs/bhi
cp /home/ubuntu/economic-brain-bhi/jobs/ingestion/*.py /home/ubuntu/economic-brain/jobs/bhi/
# _common.py is matched by the glob; it already reads DATABASE_URL from the environment
```

Install the Python dependencies if the base Brain doesn't already have them:

```bash
pip install requests psycopg2-binary
```

## 3. Smoke test every Phase A job (no DB writes)

```bash
cd /home/ubuntu/economic-brain/jobs/bhi
for f in cms_ipfqr.py cms_hospital_compare.py cms_nursing_home.py \
         samhsa_locator.py hrsa_hpsa.py nppes.py cdc_brfss.py \
         cdc_yrbss.py cdc_wonder_mortality.py bls_oes.py cms_pos.py \
         samhsa_nssats_nmhss.py idea_part_b.py nsch.py; do
  echo "=== $f ==="
  python3 "$f" test || echo "FAIL: $f"
done
```

Every job should print `OK:` and exit 0. If a job fails, fix the endpoint/URL in that job file before proceeding.

## 4. Run jobs in dependency order

```bash
# Run from /home/ubuntu/economic-brain/jobs/bhi
# Facilities first: they populate bhi_facilities.facility_id, the FK target for quality/financials
python3 cms_ipfqr.py
python3 cms_hospital_compare.py
python3 cms_nursing_home.py
python3 samhsa_locator.py
python3 cms_pos.py
python3 samhsa_nssats_nmhss.py
python3 nppes.py

# Shortages + demand (independent of each other)
python3 hrsa_hpsa.py
python3 cdc_wonder_mortality.py
python3 cdc_brfss.py
python3 cdc_yrbss.py
python3 idea_part_b.py
python3 nsch.py

# Workforce
python3 bls_oes.py
```

Monitor `job_runs`:

```sql
SELECT job_name, status, started_at, finished_at, error
FROM job_runs WHERE job_name LIKE 'bhi_%' ORDER BY started_at DESC;
```
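A small Python runner can wrap the step 4 sequence so that the whole run produces a single exit status, for example for a scheduler. This is a sketch and is not a file in the repo. `JOB_ORDER`, `run_script`, and `run_in_order` are hypothetical names, and the runner assumes it is started from `jobs/bhi/`. Like the bash block, it keeps going after a failure. The difference is that it also collects every failed job so the caller can act on them.

```python
import subprocess
import sys
from typing import Callable

# Same order as the bash block in step 4 (hypothetical constant).
JOB_ORDER = [
    # Facilities first: they populate bhi_facilities
    "cms_ipfqr.py", "cms_hospital_compare.py", "cms_nursing_home.py",
    "samhsa_locator.py", "cms_pos.py", "samhsa_nssats_nmhss.py", "nppes.py",
    # Shortages + demand (independent of each other)
    "hrsa_hpsa.py", "cdc_wonder_mortality.py", "cdc_brfss.py",
    "cdc_yrbss.py", "idea_part_b.py", "nsch.py",
    # Workforce
    "bls_oes.py",
]


def run_script(script: str) -> int:
    """Run one job with the current Python interpreter and return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_in_order(scripts: list[str], run: Callable[[str], int] = run_script) -> list[str]:
    """Run every script in order; return the names of those that exited non-zero."""
    failed = []
    for script in scripts:
        code = run(script)
        print(f"{script}: exit {code}")
        if code != 0:
            failed.append(script)
    return failed
```

`sys.exit(1 if run_in_order(JOB_ORDER) else 0)` turns the result into a non-zero exit whenever any job failed. The `run` parameter lets the ordering logic be exercised without touching the real jobs.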
## 5. Import n8n workflows (scheduled refresh)

Create these workflows in n8n (or add them to the existing scheduler):

| Workflow | Cron | Script |
|---|---|---|
| BHI: CMS facilities refresh | `0 3 * * 1` (weekly, Mon 3am) | `cms_ipfqr.py`, `cms_hospital_compare.py`, `cms_nursing_home.py` |
| BHI: SAMHSA locator refresh | `0 4 1 * *` (monthly, 1st at 4am) | `samhsa_locator.py` |
| BHI: HRSA HPSA refresh | `0 5 * * 2` (weekly, Tue 5am) | `hrsa_hpsa.py` |
| BHI: CDC demand refresh | `0 6 1 * *` (monthly, 1st at 6am) | `cdc_brfss.py`, `cdc_yrbss.py`, `cdc_wonder_mortality.py` |
| BHI: Workforce refresh | `0 7 1 */3 *` (quarterly: Jan/Apr/Jul/Oct 1st at 7am) | `bls_oes.py` |
| BHI: CMS POS refresh | `0 8 1 */3 *` (quarterly: Jan/Apr/Jul/Oct 1st at 8am) | `cms_pos.py` |

Workflow template: Cron node -> Execute Command (`python3 /home/ubuntu/economic-brain/jobs/bhi/<script>.py`) -> if the exit code is non-zero, send an alert to Slack / email.
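The alert can also live next to the job rather than in n8n. In that case the Execute Command step could call a wrapper like the sketch below, which uses only the standard library. It rests on a few assumptions:

- You would supply the Slack incoming-webhook URL yourself.
- `run_with_alert` and `slack_notify` are hypothetical names.
- Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import subprocess
import sys
import urllib.request
from typing import Callable


def slack_notify(webhook_url: str, text: str) -> None:
    """POST a plain-text message to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


def run_with_alert(cmd: list[str], notify: Callable[[str], None]) -> int:
    """Run a job command; on a non-zero exit, send the last stderr lines as an alert."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        tail = proc.stderr.strip().splitlines()[-5:]
        notify(f"BHI job failed ({proc.returncode}): {' '.join(cmd)}\n" + "\n".join(tail))
    return proc.returncode
```

Passing the notifier in as a function keeps the exit-code logic testable without a network. In production it could be `lambda msg: slack_notify(os.environ["SLACK_WEBHOOK_URL"], msg)`, where `SLACK_WEBHOOK_URL` is a hypothetical variable name.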
## 6. Add command center page

Create `/home/ubuntu/command-center/pages/brain/behavioral-health.html` (or the equivalent in the Brain's command-center framework) with these sections:

1. **Facility map**: Leaflet map of `bhi_facilities` colored by `facility_type`, filterable by `adolescent_unit` / `young_adult_unit`.
2. **HPSA heatmap**: county-level choropleth of `bhi_shortages.score`.
3. **Demand indicators panel**: small multiples of suicide rate, overdose rate, and BRFSS depression by state, split by age bracket.
4. **Composite ranking table**: top 50 opportunities by `composite_score` (see scoring.md).
5. **Recent policy events feed**: last 20 rows from `bhi_policy_events` ordered by `effective_date DESC`.
6. **Job status widget**: last run of each `bhi_*` job from `job_runs`.

Route: `/brain/behavioral-health`.

## 7. Test queries (acceptance smoke tests)

```sql
-- Facility count by type
SELECT facility_type, count(*) FROM bhi_facilities GROUP BY 1 ORDER BY 2 DESC;

-- Top 20 worst MH HPSAs (active designations only)
SELECT state, county_fips, score, population_served
FROM bhi_shortages WHERE withdrawn_date IS NULL
ORDER BY score DESC NULLS LAST LIMIT 20;

-- Adolescent suicide rates, top 20 geographies (suppressed/NULL values sorted last)
SELECT geo_code, value FROM bhi_demand_indicators
WHERE measure = 'suicide_rate' AND age_bracket = '13-17'
ORDER BY value DESC NULLS LAST LIMIT 20;

-- IPFs by state, fewest adolescent units first (cross-check)
SELECT state,
       count(*) FILTER (WHERE adolescent_unit) AS adolescent_units,
       count(*) AS total
FROM bhi_facilities WHERE facility_type = 'IPF'
GROUP BY state ORDER BY 2 ASC;

-- Psychiatrists (SOC 29-1223): top 20 MSAs by median annual wage
-- (wage growth needs two OES years; this single-year query is a proxy)
SELECT msa_name, annual_wage_median
FROM bhi_workforce WHERE occupation_code = '29-1223'
ORDER BY annual_wage_median DESC NULLS LAST LIMIT 20;

-- Job run health
SELECT job_name, status, count(*)
FROM job_runs WHERE job_name LIKE 'bhi_%'
GROUP BY 1, 2;
```

If every query returns rows and no job run shows `status = 'error'`, the BHI layer is live.
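The pass/fail rule above can be automated. The sketch below assumes a DB-API cursor, for example one from `psycopg2`. `check_acceptance` is a hypothetical name, and the two sample queries stand in for the full list of step 7 queries.

```python
# Hypothetical subset of the step 7 queries; each must return at least one row.
ACCEPTANCE_QUERIES = {
    "facilities_by_type": "SELECT facility_type, count(*) FROM bhi_facilities GROUP BY 1",
    "active_hpsas": "SELECT 1 FROM bhi_shortages WHERE withdrawn_date IS NULL LIMIT 1",
}
ERROR_QUERY = "SELECT count(*) FROM job_runs WHERE job_name LIKE 'bhi_%' AND status = 'error'"


def check_acceptance(cur, queries=ACCEPTANCE_QUERIES) -> list[str]:
    """Return human-readable problems; an empty list means the layer looks live."""
    problems = []
    for name, sql in queries.items():
        cur.execute(sql)
        if not cur.fetchall():
            problems.append(f"no rows: {name}")
    cur.execute(ERROR_QUERY)
    (errors,) = cur.fetchone()
    if errors:
        problems.append(f"{errors} bhi_* job run(s) with status='error'")
    return problems
```

With psycopg2 this could be called as `with psycopg2.connect(os.environ["DATABASE_URL"]) as conn, conn.cursor() as cur: print(check_acceptance(cur))`. psycopg2's connection context manager commits or rolls back the transaction but does not close the connection.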
## 8. Git merge to main Brain repo

```bash
cd /home/ubuntu/economic-brain
git checkout -b bhi-layer-merge
mkdir -p schemas jobs/bhi docs/bhi
cp /home/ubuntu/economic-brain-bhi/schemas/bhi_tables.sql schemas/
cp -r /home/ubuntu/economic-brain-bhi/jobs/ingestion/* jobs/bhi/
cp -r /home/ubuntu/economic-brain-bhi/docs/* docs/bhi/
git add schemas/bhi_tables.sql jobs/bhi docs/bhi
git commit -m "Integrate BHI layer"
git push origin bhi-layer-merge
# Open a PR for review on Gitea
```

docs/scoring.md (new file, 125 lines)
# BHI Composite Scoring Function

## Formula

```
composite_score =
    (demand_severity     * 0.25) +
    (supply_shortage     * 0.25) +
    (pain_signal_volume  * 0.20) +
    (capacity_trend      * 0.10) +
    (workforce_shortage  * 0.10) +
    (regulatory_tailwind * 0.05) +
    (govt_demand         * 0.05)
```

All components are normalized to 0-100 before weighting. Because the weights sum to 1.0, the final `composite_score` is also 0-100.
Each component is computed at the **geo x niche x age-bracket** level (state, county, or MSA, depending on the data).

Thesis this reflects (all of the above): demand is outpacing supply, the delivery model is shifting, and regulation is restructuring the market. Demand + supply get the heaviest weight (50% combined), then real-time pain signals (20%), then the four smaller components: capacity trend, workforce, regulatory tailwind, and government demand (30% combined).

---
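Here is a minimal sketch of the composite formula above. It assumes the seven components arrive already normalized to 0-100, in a dict keyed by the names used in the formula. The missing-key and range checks are additions and are not part of the spec.

```python
# Weights copied from the composite_score formula; they sum to 1.0.
WEIGHTS: dict[str, float] = {
    "demand_severity": 0.25,
    "supply_shortage": 0.25,
    "pain_signal_volume": 0.20,
    "capacity_trend": 0.10,
    "workforce_shortage": 0.10,
    "regulatory_tailwind": 0.05,
    "govt_demand": 0.05,
}


def composite_score(components: dict[str, float]) -> float:
    """Weighted sum of 0-100 component scores; rejects missing or out-of-range inputs."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= components[name] <= 100:
            raise ValueError(f"{name}={components[name]} is outside 0-100")
    return sum(components[name] * weight for name, weight in WEIGHTS.items())
```

Until the base Brain is live, callers would pass `pain_signal_volume=50.0`, which is the neutral default from component 3.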
## Component definitions

### 1. demand_severity (25%)
Feeder: `bhi_demand_indicators` (CDC WONDER, BRFSS, YRBSS, NSCH).

For a given geo + age bracket, combine:
- Suicide rate per 100k (CDC WONDER, ICD-10 X60-X84)
- Drug overdose death rate per 100k (CDC WONDER, ICD-10 X40-X44 + Y10-Y14)
- YRBSS "seriously considered suicide" % (adolescent)
- BRFSS "mental health not good 14+ days" % (young adult, via the 18-24 bracket)
- NSCH unmet mental health treatment need %

Normalize each indicator to 0-100 against the national distribution (percentile rank), then average.
Trend bonus: add 10 points if the 5-year CAGR exceeds 5%, capping the result at 100 so the component stays on the 0-100 scale.
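The sketch below shows the percentile-rank normalization and the trend bonus. It makes these assumptions:

- Each indicator's national distribution is available as a list of values, one per geo.
- The 5-year CAGR is expressed as a fraction, so 0.05 means 5%.
- Ties count half (the mid-rank convention). The spec only says "percentile rank", so this tie rule is a choice, not a requirement.

```python
from bisect import bisect_left, bisect_right
from statistics import mean


def percentile_rank(value: float, national: list[float]) -> float:
    """0-100 rank of value within the national distribution; ties count half (mid-rank)."""
    ordered = sorted(national)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def demand_severity(indicators: dict[str, float],
                    national: dict[str, list[float]],
                    cagr_5yr: float) -> float:
    """Average the indicator percentiles, add the +10 trend bonus, cap at 100."""
    score = mean(percentile_rank(v, national[name]) for name, v in indicators.items())
    if cagr_5yr > 0.05:
        score += 10
    return min(score, 100.0)
```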
### 2. supply_shortage (25%)
Feeders: `bhi_shortages` (HRSA HPSA) + `bhi_facilities` (SAMHSA + CMS).

For a geo:
- HPSA mental health score (0-25; multiply by 4 to rescale to 0-100)
- Inverse of facility density: beds per 100k population (inverted percentile rank)
- Inverse of adolescent/young-adult-specific bed density (when scoring those brackets)

Weighted average: 50% HPSA score, 30% total bed density, 20% age-targeted bed density.
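This sketch shows the 50/30/20 blend. It assumes bed densities are per 100k population and that the national distributions are plain lists. `inverse_percentile` uses the mid-rank tie convention, which is an assumption. Clamping the HPSA score to 0-25 guards against bad input and is not in the spec.

```python
def inverse_percentile(value: float, national: list[float]) -> float:
    """100 minus the mid-rank percentile: lower bed density -> higher shortage score."""
    below = sum(1 for x in national if x < value)
    ties = sum(1 for x in national if x == value)
    return 100.0 - 100.0 * (below + 0.5 * ties) / len(national)


def supply_shortage(hpsa_score: float,
                    beds_per_100k: float,
                    age_beds_per_100k: float,
                    national_beds: list[float],
                    national_age_beds: list[float]) -> float:
    """50% HPSA (rescaled 0-25 -> 0-100), 30% total beds, 20% age-targeted beds."""
    hpsa = max(0.0, min(hpsa_score, 25.0)) * 4
    total = inverse_percentile(beds_per_100k, national_beds)
    targeted = inverse_percentile(age_beds_per_100k, national_age_beds)
    return 0.5 * hpsa + 0.3 * total + 0.2 * targeted
```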
### 3. pain_signal_volume (20%)
Feeders: base Brain's `reddit_posts`, `app_reviews`, and `risk_factors` tables (already being built).

For a niche (e.g., "adolescent inpatient"):
- Count posts/reviews/risk-factor hits matching the niche keywords in the last 90 days
- Z-score that count against the full base Brain niche distribution
- Map the z-score onto 0-100 with z = 0 at 50 (neutral), then clamp to 0-100

This component depends on the base Brain being live; until then it defaults to 50 (neutral).
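Here is a sketch of the z-score step. The spec does not say how many points one standard deviation is worth. `POINTS_PER_SD = 25` is therefore an assumption to tune; with it, roughly +/-2 SD spans the full 0-100 range. The sketch uses the population standard deviation across all niches. It falls back to the neutral 50 when there is no distribution yet, or when every niche has the same count.

```python
from statistics import mean, pstdev

NEUTRAL = 50.0
POINTS_PER_SD = 25.0  # assumption: +/-2 SD spans the whole 0-100 range


def pain_signal_score(niche_count: int, all_niche_counts: list[int]) -> float:
    """Z-score a niche's 90-day hit count against all niches and map it onto 0-100."""
    if not all_niche_counts:
        return NEUTRAL  # base Brain not live yet
    sd = pstdev(all_niche_counts)
    if sd == 0:
        return NEUTRAL
    z = (niche_count - mean(all_niche_counts)) / sd
    return max(0.0, min(100.0, NEUTRAL + POINTS_PER_SD * z))
```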
### 4. capacity_trend (10%)
Feeder: `bhi_facilities` (opened_date, closed_date) + CMS POS termination records.

For the geo x niche:
- Net change = facilities opened in the last 24 months minus facilities closed in the last 24 months, normalized by the baseline facility count
- Negative net change = high score (more opportunity); positive net change = low score (saturated)
- Formula: `100 * (1 - (net_change + baseline) / (2 * baseline))`, clamped to 0-100 (no net change scores 50; losing the entire baseline scores 100)
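The sketch below transcribes the formula directly, with one assumption where the spec is silent. A geo with no baseline facilities would divide by zero, so the sketch rejects `baseline <= 0` and leaves that case to the caller.

```python
def capacity_trend(opened_24m: int, closed_24m: int, baseline: int) -> float:
    """0-100 score from net facility change over 24 months; net closures push it up."""
    if baseline <= 0:
        raise ValueError("baseline facility count must be positive")
    net_change = opened_24m - closed_24m
    raw = 100 * (1 - (net_change + baseline) / (2 * baseline))
    return max(0.0, min(100.0, raw))
```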
### 5. workforce_shortage (10%)
Feeder: `bhi_workforce` (BLS OES).

For the MSA:
- YoY wage growth for SOC codes 29-1223, 21-1014, 21-1018, 19-3033 (percentile rank)
- Employment per 100k (inverse percentile rank)
- Average the two

High wage growth + low employment density = high shortage score = high opportunity for new supply.
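This sketch covers one occupation series in one MSA. It assumes wage growth comes from two annual OES medians and that both national distributions are lists. The spec does not say how the listed SOC codes are combined (averaging them is one option), so that step is left to the caller.

```python
def pct_rank(value: float, population: list[float]) -> float:
    """Mid-rank percentile (0-100) of value within population."""
    below = sum(1 for x in population if x < value)
    ties = sum(1 for x in population if x == value)
    return 100.0 * (below + 0.5 * ties) / len(population)


def yoy_growth(current: float, prior: float) -> float:
    """Year-over-year growth as a fraction (0.10 = 10%)."""
    return (current - prior) / prior


def workforce_shortage(wage_growth: float, emp_per_100k: float,
                       all_wage_growth: list[float], all_emp_per_100k: list[float]) -> float:
    wage_score = pct_rank(wage_growth, all_wage_growth)               # fast growth -> high
    density_score = 100.0 - pct_rank(emp_per_100k, all_emp_per_100k)  # thin workforce -> high
    return (wage_score + density_score) / 2
```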
### 6. regulatory_tailwind (5%)
Feeder: `bhi_policy_events`.

Count of favorable policy events for the geo in the last 18 months:
- Medicaid rate increases for BH services
- New state mandates for adolescent crisis services
- Expanded provider types (peer support, mobile crisis)
- Federal rules (e.g., Mental Health Parity enforcement)

Score: `count * 20`, clamped to 0-100 (five or more events score 100).
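The sketch below applies the 18-month window and the scoring rule. It assumes each event is an `(effective_date, is_favorable)` pair. The spec does not say how "favorable" is recorded in `bhi_policy_events`, so the boolean is hypothetical. `months_before` steps back whole calendar months and clamps to the end of the month, so 18 months before Aug 31 lands on the last day of February.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """The same day `months` calendar months earlier, clamped to that month's last day."""
    index = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def regulatory_tailwind(events: list[tuple[date, bool]], today: date) -> int:
    """count * 20 over favorable events in the last 18 months, clamped to 100."""
    cutoff = months_before(today, 18)
    count = sum(1 for when, favorable in events if favorable and cutoff <= when <= today)
    return min(100, count * 20)
```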
### 7. govt_demand (5%)
Feeder: base Brain's `sam_gov_opportunities` table (if present) + `bhi_policy_events`.

Active + awarded SAM.gov opportunities in NAICS 621112 (Offices of Physicians, Mental Health Specialists), 621420 (Outpatient Mental Health and Substance Abuse Centers), 623220 (Residential Mental Health and Substance Abuse Facilities), 623210 (Residential Intellectual and Developmental Disability Facilities), and 624190 (Other Individual and Family Services), weighted by dollar value and filtered to the geo.

Log-scale: `min(100, 10 * log10(total_dollar_value + 1))`, so $1M scores about 60, $1B about 90, and $10B or more caps at 100.

---
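This sketch implements component 7 (govt_demand). It assumes opportunities arrive as dicts with hypothetical `naics`, `status`, and `dollar_value` keys; the real `sam_gov_opportunities` columns may differ. It also assumes the opportunities are already filtered to one geo.

```python
import math

# NAICS codes listed for govt_demand.
BH_NAICS = {"621112", "621420", "623220", "623210", "624190"}


def total_bh_dollars(opportunities: list[dict]) -> float:
    """Sum dollar values of active or awarded behavioral-health opportunities."""
    return sum(o["dollar_value"] for o in opportunities
               if o["naics"] in BH_NAICS and o["status"] in {"active", "awarded"})


def govt_demand(total_dollar_value: float) -> float:
    """min(100, 10 * log10(total + 1)); $0 scores 0."""
    if total_dollar_value < 0:
        raise ValueError("dollar value cannot be negative")
    return min(100.0, 10 * math.log10(total_dollar_value + 1))
```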
## Age bracket handling

Every row in `bhi_demand_indicators` carries an `age_bracket`. When scoring a niche tagged for adolescents (13-17), the demand_severity and pain_signal_volume components filter to that bracket. Young-adult niches use 18-25. "All" niches average the two bracket scores 50/50.

Young-adult gap note: very few IPFs have dedicated young-adult units (captured by the `young_adult_unit` boolean in `bhi_facilities`), so for young-adult scoring supply_shortage should add an extra 15 points to its facility-density term.

---
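The sketch below shows the age-bracket routing described above. It assumes hypothetical niche tags `adolescent`, `young_adult`, and `all`, plus a callback that returns a component score for one bracket. The 15-point young-adult bump is applied to the facility-density term and capped at 100. The cap is an assumption that keeps the term on the 0-100 scale.

```python
from typing import Callable

# Hypothetical niche tags mapped to the brackets they score.
BRACKETS = {"adolescent": ["13-17"], "young_adult": ["18-25"], "all": ["13-17", "18-25"]}
YOUNG_ADULT_DENSITY_BUMP = 15.0


def bracket_score(niche_tag: str, score_for_bracket: Callable[[str], float]) -> float:
    """Score a niche by its age tag; 'all' niches average the two brackets 50/50."""
    brackets = BRACKETS[niche_tag]
    return sum(score_for_bracket(b) for b in brackets) / len(brackets)


def facility_density_term(inverse_density_score: float, niche_tag: str) -> float:
    """Add the young-adult bump to the facility-density term, capped at 100."""
    if niche_tag == "young_adult":
        return min(100.0, inverse_density_score + YOUNG_ADULT_DENSITY_BUMP)
    return inverse_density_score
```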
## Output table (to be added)

Scores write to `bhi_scores`. The table is created at runtime rather than in `bhi_tables.sql` v1; add it once inputs are flowing:

```sql
CREATE TABLE bhi_scores (
    id              SERIAL PRIMARY KEY,
    niche           TEXT,
    geo_type        TEXT,
    geo_code        TEXT,
    age_bracket     TEXT,
    composite       NUMERIC,
    demand_severity NUMERIC,
    supply_shortage NUMERIC,
    pain_signal     NUMERIC,
    capacity_trend  NUMERIC,
    workforce_short NUMERIC,
    reg_tailwind    NUMERIC,
    govt_demand     NUMERIC,
    computed_at     TIMESTAMPTZ DEFAULT NOW()
);
```

docs/sources.md (new file, 273 lines)

# BHI Data Sources

All endpoints tested 2026-04-04 unless noted. "Test (OK)" means a live curl returned valid data.

Scope: behavioral health facilities, demand indicators, workforce, shortages, and policy for all 50 US states, tagged by age bracket (adolescent 13-17, young adult 18-25).

---

## PHASE A — Free, autonomous, ready to ingest

### 1. CMS IPFQR (Inpatient Psychiatric Facility Quality Reporting)
- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/{dataset_id}/0`
- **Dataset IDs:**
  - `q9vs-r7wp`: IPFQR by Facility
  - `dc76-gh7x`: IPFQR by State
  - `s5xg-sys6`: IPFQR National
- **Auth:** None
- **Rate limit:** None documented; be polite (<= 5 req/sec)
- **Update frequency:** Quarterly
- **Record count:** ~1,600 IPFs (facility file); dozens of measures each
- **Key fields:** `facility_id`, `facility_name`, `address`, `state`, `zip`, `countyparish`, HBIPS-2/3 restraint+seclusion, SMD, SUB-2/3, TOB-3, transition record, 30-day readmission
- **Test curl (OK):**
  ```
  curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0?limit=2"
  ```
- **Python snippet:**
  ```python
  import requests

  r = requests.get("https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0",
                   params={"limit": 500, "offset": 0}, timeout=30)
  r.raise_for_status()
  rows = r.json()["results"]
  ```
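The snippet above fetches a single 500-row page. The sketch below pages through the full dataset. It assumes the API keeps honoring `limit`/`offset` and that a short or empty page means there are no more rows. `paginate` and the `fetch_page` callback are hypothetical names. Passing the HTTP call in as a function keeps the loop testable offline.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], list[dict]], limit: int = 500) -> Iterator[dict]:
    """Yield rows page by page; fetch_page(limit, offset) returns one page of rows."""
    offset = 0
    while True:
        batch = fetch_page(limit, offset)
        yield from batch
        if len(batch) < limit:  # short or empty page: no more rows
            return
        offset += limit
```

With `requests`, `fetch_page` could be `lambda limit, offset: requests.get(URL, params={"limit": limit, "offset": offset}, timeout=30).json()["results"]`, where `URL` is the IPFQR endpoint above.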
### 2. CMS Hospital Compare / Care Compare (general hospital info)
- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0`
- **Auth:** None | **Rate limit:** none | **Update:** Monthly
- **Records:** ~5,300 hospitals
- **Key fields:** `facility_id` (CCN), `facility_name`, `hospital_type`, `hospital_ownership`, `hospital_overall_rating`, mortality/safety/readmission group flags
- **Test (OK):**
  ```
  curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0?limit=2"
  ```
- Use to classify which acute hospitals have behavioral health units (cross-join with IPFQR CCNs).

### 3. CMS Provider of Services (POS) file
- **Bulk page:** `https://data.cms.gov/provider-characteristics/hospitals-and-other-facilities/provider-of-services-file-quality-improvement-and-evaluation-system`
- **JSON catalog:** `https://data.cms.gov/data.json` (search `dataset[].title` = "Provider of Services File")
- **Auth:** None | **Update:** Quarterly | **Format:** CSV bulk
- **Records:** ~80,000 Medicare-certified facilities (includes PSY, PRTF, hospitals)
- **Key fields:** CCN, provider category, bed count, certification date, termination date, ownership
- **Test (OK):** `curl -s "https://data.cms.gov/data.json"` returns the dataset list
- Required for bed counts and termination (closure) tracking.

### 4. CMS Nursing Home Compare (Provider Information)
- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0`
- **Auth:** None | **Update:** Monthly
- **Records:** ~15,000 nursing homes
- **Key fields:** CCN, provider_name, ownership, number_of_certified_beds, overall rating, chain info
- **Test (OK):** `curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0?limit=2"`
- Used to capture residential behavioral health (SNFs frequently house psych/BH residents).

### 5. SAMHSA Treatment Locator (findtreatment.gov)
- **Endpoint:** `https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr={zip}`
- **Auth:** None (a browser User-Agent helps but is not required for the JSON export)
- **Rate limit:** None documented. HEAD returns 403 but GET returns 200, so use GET only.
- **Update:** Continuous (SAMHSA-maintained)
- **Records:** ~96,000 BH treatment facilities (all service types)
- **Key fields:** name1/name2, street, city, state, zip, phone, intake, hotline, website, lat, lon, services, typeFacility
- **Test (OK):**
  ```
  curl -s "https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr=10001"
  ```
  Response: `{"page":1,"totalPages":3201,"recordCount":96009,"rows":[...]}`
- **Python snippet:**
  ```python
  import time

  import requests

  def fetch_all(zip_seed="10001"):
      """Yield every row by walking all result pages (30 rows per page)."""
      base = "https://findtreatment.gov/locator/exportsAsJson/v2"
      page = 1
      while True:
          r = requests.get(base, params={"sType": "BH", "sAddr": zip_seed,
                                         "pageSize": 30, "page": page}, timeout=30)
          r.raise_for_status()
          d = r.json()
          yield from d["rows"]
          if page >= d["totalPages"]:
              break
          page += 1
          time.sleep(0.3)  # stay polite between pages
  ```

### 6. SAMHSA N-SSATS + N-MHSS
- **Bulk:** `https://www.samhsa.gov/data/data-we-collect/n-ssats/datafiles` and `/n-mhss/datafiles`
- **Auth:** None | **Update:** Annual | **Format:** SAS / SPSS / CSV
- **Records:** N-SSATS ~16,000 SUD facilities/year; N-MHSS ~12,000 MH facilities/year
- **Key fields:** facility id, services, payment accepted, populations served (including adolescent/young adult flags), bed counts, ownership
- **Note:** Bulk ZIPs; no live API. Staged as a manual-download job.
### 7. CDC WONDER (mortality: suicide and overdose, by county and age)
- **Endpoint:** `https://wonder.cdc.gov/controller/datarequest/D76` (Underlying Cause of Death), POST XML
- **Auth:** None for non-restricted datasets; county-level counts below 10 deaths are suppressed
- **Update:** Annual
- **Records:** All US mortality; we pull ICD-10 X60-X84 (suicide) + X40-X44/Y10-Y14 (overdose) by county, for ages 13-17 and 18-25
- **Test (OK):** landing page returns 200; data requires a POST with an XML request body. See the job stub `cdc_wonder_mortality.py` for the working XML template.
- **Caveat:** CDC WONDER's API documentation states that vital-statistics API queries cannot group or limit results by location (region, state, county). If the stub's county pull is rejected, county-level exports may have to come from the web interface instead.

### 8. CDC BRFSS
- **Endpoint (Socrata):** `https://data.cdc.gov/resource/dttw-5yxu.json`
- **Auth:** None (Socrata app token optional for higher limits) | **Update:** Annual
- **Records:** ~100k rows/year (state x question x breakout)
- **Test (OK):** (single quotes stop the shell from expanding `$limit`)
  ```
  curl -s 'https://data.cdc.gov/resource/dttw-5yxu.json?$limit=2'
  ```
  Returns depression prevalence, mental health days, etc. by state + demographic.
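BRFSS is a Socrata endpoint, as are the YRBSS datasets below, so SoQL parameters such as `$where`, `$order`, `$limit`, and `$offset` work in the query string. The sketch below builds paged request URLs. The column names `year` and `locationabbr` in the usage are assumptions about this dataset's schema; check them against a sample row, for example from the `$limit=2` curl above. Ordering by the Socrata system field `:id` keeps paging stable. In Python, `urlencode` (or `requests`' `params`) percent-encodes the `$`, so the shell-quoting issue does not arise.

```python
from urllib.parse import urlencode

BRFSS_URL = "https://data.cdc.gov/resource/dttw-5yxu.json"


def soql_page_urls(where: str, page_size: int, pages: int, order: str = ":id") -> list[str]:
    """Build SoQL URLs for consecutive pages; a stable $order keeps paging deterministic."""
    urls = []
    for page in range(pages):
        params = {"$where": where, "$order": order,
                  "$limit": page_size, "$offset": page * page_size}
        urls.append(f"{BRFSS_URL}?{urlencode(params)}")
    return urls
```

For example, `soql_page_urls("year = '2022' AND locationabbr = 'NY'", 1000, 3)` yields three URLs with offsets 0, 1000, and 2000.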
### 9. CDC YRBSS (Youth Risk Behavior Survey)
- **Endpoints (Socrata, verified present via catalog):**
  - High school: `https://data.cdc.gov/resource/3qty-g4aq.json`
  - Middle school: `https://data.cdc.gov/resource/uqmk-4y2w.json`
- **Auth:** None | **Update:** Biennial
- **Records:** State + large urban district level; ~50k rows
- **Key fields:** suicidal ideation, attempt, persistent sadness, substance use (exactly the adolescent demand signal we need)

### 10. IDEA Part B data (Emotional Disturbance by district)
- **Landing:** `https://www2.ed.gov/programs/osepidea/618-data/static-tables/index.html`
- **Auth:** None | **Format:** CSV static tables | **Update:** Annual
- **Records:** ~14,000 school districts + state rollups
- **Key fields:** Child count under the ED classification, ages 6-21, by state and LEA
- **Note:** Static CSVs; no API. The download script documents the exact file URLs.

### 11. NSCH (National Survey of Children's Health) via HRSA
- **Landing:** `https://www.childhealthdata.org/browse/survey` and `https://mchb.hrsa.gov/data-research/national-survey-childrens-health`
- **Bulk (HRSA):** `https://mchb.hrsa.gov/sites/default/files/nsch/datafiles/` (year-specific)
- **Auth:** None | **Update:** Annual | **Format:** SAS / Stata / CSV
- **Records:** ~50k surveyed children, weighted to state-level estimates
- **Key fields:** anxiety, depression, behavioral problems, received treatment, unmet need, by state x age
### 12. BLS OES (behavioral health workforce by MSA)
- **API:** `https://api.bls.gov/publicAPI/v2/timeseries/data/` (POST JSON)
- **Auth:** Optional. Without a key: 25 series/query, 10 years/query, 25 queries/day. A free registration key (`https://data.bls.gov/registrationEngine/`) raises this to 50 series/query, 20 years/query, 500 queries/day.
- **Update:** Annual (May reference period)
- **Series ID pattern:** `OEUM{area}{industry}{occupation}{datatype}`
- **Relevant SOC codes:**
  - 29-1223 Psychiatrists
  - 29-1229 Other Physicians (incl. addiction medicine)
  - 21-1014 Mental Health Counselors (2010 SOC; folded into 21-1018 from the 2018 SOC onward)
  - 21-1015 Rehabilitation Counselors
  - 21-1018 Substance Abuse, Behavioral Disorder, and Mental Health Counselors
  - 21-1023 Mental Health and Substance Abuse Social Workers
  - 19-3033 Clinical & Counseling Psychologists
- **Test (OK):** BLS API responds (test hit confirmed the response structure; real series IDs are still required)
- **Bulk alternative:** `https://www.bls.gov/oes/special-requests/oesm{YY}ma.zip` (annual bulk by MSA); no auth, ~50MB zip.
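The sketch below builds OES series IDs and API request bodies. It assumes the usual OES layout: `OE`, then `U` (not seasonally adjusted), then area type `M`, then a 7-digit area code (the MSA code zero-padded), a 6-digit industry code (`000000` = cross-industry), the 6-digit SOC code without the hyphen, and a 2-digit datatype. The datatype codes you pass (for example `01` for employment or `13` for annual median wage) are assumptions to confirm against BLS's OES series documentation. Batches default to the keyless limit of 25 series per request.

```python
import json
from typing import Optional


def oes_series_id(msa_code: str, soc: str, datatype: str, industry: str = "000000") -> str:
    """OE + U + M + area(7) + industry(6) + occupation(6) + datatype(2)."""
    area = msa_code.zfill(7)
    occupation = soc.replace("-", "")
    if (len(area), len(industry), len(occupation), len(datatype)) != (7, 6, 6, 2):
        raise ValueError("unexpected field width in OES series ID")
    return f"OEUM{area}{industry}{occupation}{datatype}"


def batched_payloads(series_ids: list[str], start_year: int, end_year: int,
                     key: Optional[str] = None, batch_size: int = 25) -> list[str]:
    """JSON request bodies for the v2 timeseries endpoint, batch_size series per request."""
    bodies = []
    for i in range(0, len(series_ids), batch_size):
        body = {"seriesid": series_ids[i:i + batch_size],
                "startyear": str(start_year), "endyear": str(end_year)}
        if key:
            body["registrationkey"] = key
        bodies.append(json.dumps(body))
    return bodies
```

Each body would be POSTed with `Content-Type: application/json`. With a registration key, `batch_size=50` matches the higher per-query limit.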
### 13. HRSA Mental Health HPSAs
- **Bulk CSV (verified):** `https://data.hrsa.gov/DataDownload/DD_Files/BCD_HPSA_FCT_DET_MH.csv`
- **Size:** ~23 MB
- **Auth:** None | **Update:** Continuous (weekly snapshots)
- **Records:** ~6,500 active MH HPSAs + historical
- **Key fields:** HPSA ID, designation type, discipline (MH), score (0-25), state, county FIPS (via HPSA Geography ID), population, designation date, withdrawn date, lat/lon
- **Test (OK):** HTTP 200, 23 MB CSV returned.

### 14. CMS NPPES (National Plan & Provider Enumeration System)
- **API:** `https://npiregistry.cms.hhs.gov/api/?version=2.1`
- **Auth:** None | **Rate limit:** ~200 req/sec soft; at most 200 results per query, so paginate with `skip` (the API caps `skip` at 1,000, so split large pulls by state or city)
- **Update:** Daily
- **Records:** ~8 million NPIs; filtering by behavioral health taxonomy leaves ~500k
- **Relevant taxonomy codes:**
  - 2084P0800X Psychiatry & Neurology - Psychiatry
  - 2084P0802X Addiction Psychiatry
  - 2084P0804X Child & Adolescent Psychiatry
  - 103T00000X Psychologist
  - 101YM0800X Mental Health Counselor
  - 103TC2200X Clinical Child & Adolescent Psychologist
  - 1041C0700X Clinical Social Worker
  - 324500000X Substance Abuse Rehabilitation Facility
  - 283Q00000X Psychiatric Hospital
  - 323P00000X Psychiatric Residential Treatment Facility
- **Test (OK):**
  ```
  curl -s "https://npiregistry.cms.hhs.gov/api/?version=2.1&taxonomy_description=psychiatric&state=NY&limit=2"
  ```
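The sketch below pages through NPPES results with `skip`. It assumes the registry's JSON response carries a `results` list, as the v2.1 API does, and it enforces the caps of 200 results per page and a maximum `skip` of 1,000. `fetch` is a hypothetical callback wrapping `requests.get` with `params={"version": "2.1", "taxonomy_description": ..., "state": ..., "limit": limit, "skip": skip}`. Because of the skip cap, a single query yields at most 1,200 records, which is why large pulls need splitting by state or city.

```python
from typing import Callable, Iterator

PAGE_LIMIT = 200  # NPPES max results per request
MAX_SKIP = 1000   # NPPES max skip value


def nppes_pages(fetch: Callable[[int, int], dict]) -> Iterator[dict]:
    """Yield provider records for one query, stopping at a short page or the skip cap."""
    skip = 0
    while skip <= MAX_SKIP:
        results = fetch(PAGE_LIMIT, skip).get("results", [])
        yield from results
        if len(results) < PAGE_LIMIT:
            return
        skip += PAGE_LIMIT
```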
---

## PHASE B — Requires application or registration

### 15. HCUP (AHRQ)
- **Landing:** `https://hcup-us.ahrq.gov/tech_assist/centdist.jsp`
- **Auth:** Data Use Agreement (DUA) required; free for research but application-based (~2-4 weeks)
- **Records:** State inpatient/ED/ASC databases, ~40M discharges/yr nationally
- **Action required:** Submit DUA + Data Use Training certificate. **BLOCKED until user applies.**

### 16. CMS Medicare Cost Reports (MCR)
- **Bulk:** `https://www.cms.gov/data-research/statistics-trends-and-reports/cost-reports` (HOSPITAL2010 format)
- **Auth:** None; just large downloads (~1-3 GB per year)
- **Update:** Quarterly rolling
- **Records:** ~6,000 hospital cost reports/year (CCN-level)
- Staged as a fetch-and-parse job (uses `ccn` to join with `bhi_facilities`).

### 17. NEMSIS state crisis transport data
- **Landing:** `https://nemsis.org/using-ems-data/request-research-data/`
- **Auth:** Research Data Request (application), typically 4-8 weeks
- **BLOCKED until user applies.**

### 18. California HCAI (patient discharge data)
- **Endpoint:** `https://hcai.ca.gov/data-and-reports/cost-transparency/` and `https://data.chhs.ca.gov/dataset?q=pdd`
- **Auth:** Free (some files are direct downloads; the Limited Data Set requires a DUA)
- **Update:** Annual
- **Records:** ~3.5M CA discharges/yr; psych DRGs extractable

### 19. NY SPARCS
- **Landing:** `https://www.health.ny.gov/statistics/sparcs/`
- **Auth:** Application for identified data; deidentified file free via `health.data.ny.gov`
- **Deidentified endpoint:** `https://health.data.ny.gov/resource/u4ud-w55t.json` (Hospital Inpatient Discharges)
- **Records:** ~2.5M NY discharges/yr

### 20. TX DSHS discharge data
- **Landing:** `https://www.dshs.texas.gov/texas-health-care-information-collection/health-data-researcher-information/texas-inpatient-public-use`
- **Auth:** Free (the Public Use File is a direct download after a click-through)
- **Records:** ~3M TX discharges/yr

### 21. FL AHCA discharge data
- **Landing:** `https://ahca.myflorida.com/health-care-policy-and-oversight/bureau-of-central-services/florida-center-for-health-information-and-transparency/data-analytics/order-data`
- **Auth:** Application form + fee for identified data; aggregate data free
- **BLOCKED until user applies for identified data.**

---

## PHASE C — State RTF licensing databases

### 22. State-by-state RTF licensing scrapers
Scope: residential treatment facilities serving adolescents. One scraper per state.

Verified public-search portals (no auth, scrape-friendly HTML/JSON):
- **UT**: `https://hslic.utah.gov/` (Human Services License Information Lookup)
- **CA**: `https://www.ccld.dss.ca.gov/transparencyapi/api/facilities` (Community Care Licensing API)
- **TX**: `https://www.hhs.texas.gov/providers/long-term-care-providers/childrens-residential-facility-reimbursement-methodology` + search portal
- **FL**: `https://apps.myflfamilies.com/provider/` (DCF provider search)
- **NY**: `https://omh.ny.gov/omhweb/resources/providers/` (OMH provider directory)
- **MT**: `https://dphhs.mt.gov/qad/licensure/licensedfacilitieslist` (static list)
- **AZ**: `https://azcarecheck.azdhs.gov/` (public search)
- **CO**: `https://apps.colorado.gov/apps/oapa/licensee.aspx` (Office of Early Childhood)
- **OR**: `https://ccld.oregon.gov/ccld/search/` (Care Provider Directory)
- **WA**: `https://fortress.wa.gov/dshs/adsaapps/lookup/` (LTC lookup)
- **IL**: `https://www2.illinois.gov/dcfs/brighterfutures/Pages/default.aspx`
- **MA**: `https://www.mass.gov/lists/licensed-residential-treatment-programs`
- **PA**: `https://www.dhs.pa.gov/Services/Assistance/Pages/Child-Residential-Facility.aspx`

States requiring FOIA / no public portal (documented as BLOCKED for Phase C v1):
- AL, AK, AR, DE, GA, HI, ID, IN, IA, KS, KY, LA, ME, MD, MI, MN, MS, MO, NE, NV, NH, NJ, NM, NC, ND, OH, OK, RI, SC, SD, TN, VT, VA, WV, WI, WY

The scraper job stub lists URL patterns for the 13 verified states and marks the rest "FOIA required." Connecticut (CT) appears in neither list and still needs to be classified.

---
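This sketch is a consistency check for the Phase C split. It copies the two state lists above into sets and reports any state that falls in neither. The set names are hypothetical. The 50-state set uses standard USPS abbreviations and excludes DC, matching the doc's 50-state scope.

```python
ALL_STATES = set("""AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA
WA WV WI WY""".split())

# Copied from the Phase C lists above.
PORTAL_STATES = {"UT", "CA", "TX", "FL", "NY", "MT", "AZ", "CO", "OR", "WA", "IL", "MA", "PA"}
FOIA_STATES = set("""AL AK AR DE GA HI ID IN IA KS KY LA ME MD MI MN MS MO NE NV
NH NJ NM NC ND OH OK RI SC SD TN VT VA WV WI WY""".split())


def unclassified_states() -> set[str]:
    """States in neither the verified-portal list nor the FOIA list."""
    overlap = PORTAL_STATES & FOIA_STATES
    if overlap:
        raise ValueError(f"states in both lists: {sorted(overlap)}")
    return ALL_STATES - PORTAL_STATES - FOIA_STATES
```

Against the current lists, `unclassified_states()` returns `{'CT'}`.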
## Test results summary (Phase A)

| # | Source | Status | Notes |
|---|--------|--------|-------|
| 1 | CMS IPFQR | OK | q9vs-r7wp returned facility rows |
| 2 | CMS Hospital Compare | OK | xubh-q36u returned |
| 3 | CMS POS | OK | catalog reachable, bulk CSV |
| 4 | CMS Nursing Home | OK | 4pq5-n9py returned |
| 5 | SAMHSA Locator | OK | 96,009 records confirmed |
| 6 | SAMHSA N-SSATS/N-MHSS | OK (bulk) | ZIP download, no API |
| 7 | CDC WONDER | OK | POST XML required, landing 200 |
| 8 | CDC BRFSS | OK | Socrata JSON returned |
| 9 | CDC YRBSS | OK | 3qty-g4aq + uqmk-4y2w |
| 10 | IDEA Part B | OK (static) | Static CSV; no API |
| 11 | NSCH | OK (bulk) | HRSA year files |
| 12 | BLS OES | OK | API responds; needs real series IDs |
| 13 | HRSA HPSA MH | OK | 23 MB CSV download confirmed |
| 14 | NPPES | OK | 2 results returned for NY psych |

Blocked until auth/application:
- HCUP (DUA), NEMSIS (application), FL AHCA identified, NY SPARCS identified.

docs/target_questions.md (new file, 64 lines)
# BHI Layer — Target Opportunity Questions

These are the questions the BHI layer must answer. They double as acceptance criteria: the layer ships when every question can be answered with a SQL query or a short Python notebook against `brain` with the BHI tables populated.

Scope assumptions: all 50 states, facility-level where available, tagged adolescent (13-17) and young adult (18-25).

## 1. Supply / capacity

1. Which US counties have the highest HPSA mental health scores AND the lowest bed density (top 50)?
2. Which counties have ZERO licensed adolescent inpatient psychiatric beds within 60 miles?
3. Which counties have ZERO licensed young-adult residential treatment beds within 60 miles?
4. How many IPFs have closed vs. opened in the last 24 months, by state?
5. Which IPFs have the worst HBIPS restraint+seclusion rates, and are therefore candidates for competitive entry or acquisition?
6. Which nursing homes are disproportionately housing under-65 residents with SMI (SNF-IMD dynamic) and are candidates for conversion/specialty buildout?
7. Where are the biggest drops in psych bed count over the last 5 years (via POS termination data)?
8. Which states have the lowest ratio of PRTF beds per 10k adolescents?
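Questions 2 and 3 depend on a 60-mile radius test. The sketch below uses the haversine great-circle distance. It assumes each county is represented by a single centroid (lat, lon) and each qualifying facility by its lat/lon; SAMHSA locator rows, for instance, carry `lat`/`lon`. Centroid-to-facility straight-line distance is an approximation, and road distances will be longer. The FIPS codes in the usage are made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in miles."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat, dlon = p2 - p1, radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def counties_without_beds(counties: dict[str, tuple[float, float]],
                          facilities: list[tuple[float, float]],
                          radius_miles: float = 60.0) -> list[str]:
    """County codes with no qualifying facility within radius_miles of the centroid."""
    return sorted(
        fips for fips, (lat, lon) in counties.items()
        if not any(haversine_miles(lat, lon, flat, flon) <= radius_miles
                   for flat, flon in facilities)
    )
```

The brute-force scan compares every county with every facility, which is fine at the scale of a few thousand counties and facilities.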
## 2. Demand

9. Which counties have the highest 13-17 suicide rate and the fastest-growing trend (CDC WONDER)?
10. Which counties have the highest 18-25 overdose death rate trend?
11. Which states have the highest YRBSS "considered suicide" % and the highest unmet-treatment need on NSCH, simultaneously?
12. How does the adolescent ED visit rate for self-harm compare across states (cross-joining HCUP when available)?
13. Which school districts have the highest IDEA Part B Emotional Disturbance child count per 1,000 students?
14. Which states are seeing the largest YoY increase in 988 + crisis line volume per capita?

## 3. Workforce

15. Which MSAs have the highest YoY wage growth for psychiatrists (SOC 29-1223), indicating a shortage?
16. Which MSAs have psychiatrist employment per 100k in the bottom quartile AND mental health HPSA coverage in the worst quartile?
17. Where are LCSW/LMHC wages spiking (21-1023 for LCSWs; 21-1018, formerly 21-1014, for LMHCs) while employment is flat?

## 4. Financial / opportunity

18. What is the median psych Medicare margin (revenue - cost) per discharge, by state, from MCR data?
19. Which for-profit IPF chains are expanding fastest (opened_date + chain_id from the nursing home join)?
20. Which counties have the biggest gap between HPSA score and the SAM.gov / state contract dollars flowing in (underinvested vs. need)?
21. What are the median acquisition multiples for BH facilities in each state? (Requires later enrichment.)

## 5. Adolescent transport / crisis (specific focus)

22. Which counties dispatch the most EMS runs coded "behavioral/psych" per 10k adolescents (NEMSIS, once access is granted)?
23. Where do adolescent psychiatric holds most frequently result in out-of-county or out-of-state transport (a sign of no local capacity)?
24. Which states have the longest average ED boarding time for adolescents awaiting inpatient psych admission (via AHRQ + state HAI reports)?
25. Which states have a dedicated secure transport statute/reimbursement (`bhi_policy_events` filtered on "secure transport")? These are bluefields for BH transport vendors.
26. Which counties combine high adolescent suicide rate + no in-county adolescent psych beds + high ED boarding, i.e., the highest-need adolescent transport markets?
27. Which chains/operators already provide adolescent secure transport, and where are their service gaps (via scraping state BHO contract registries)?

## 6. Regulatory / tailwind

28. Which states passed Medicaid rate increases for BH residential in the last 24 months?
29. Which states expanded the definition of "mobile crisis response" to include adolescents in the last 24 months?
30. Where are IMD exclusion waivers (Section 1115 SMI/SED waivers) active or pending?

## 7. Composite / prioritization

31. Top 10 states ranked by composite_score for "adolescent inpatient psychiatric"?
32. Top 50 counties ranked by composite_score for "young adult residential SUD"?
33. Top 20 MSAs ranked by composite_score for "outpatient adolescent therapy (IOP/PHP)"?
34. For each of the top 10 composite-score opportunities, list: (a) top 3 operators already there, (b) workforce wage growth, (c) most recent policy event, (d) closest open SAM.gov opportunity.

---

**Acceptance criteria:** When the BHI layer is live and all Phase A sources are ingested, a user should be able to run SQL or ask the Brain's natural-language interface these 34 questions and get a grounded answer with citations to the underlying `bhi_*` tables.