BHI layer v1: docs, schema, Phase A ingestion stubs

BHI Staging Agent
2026-04-05 20:15:36 +00:00
commit 3dfd9ea3c6
21 changed files with 2399 additions and 0 deletions

156
docs/integration_plan.md Normal file

@@ -0,0 +1,156 @@
# BHI Layer — Integration Plan
Steps to merge the BHI layer into the base Economic Brain after the base build finishes.
**Prereqs** (verified before step 1):
- Base Brain is running: `psql -d brain -c '\dt'` shows core tables including `job_runs`.
- `/home/ubuntu/economic-brain/` contains a working `jobs/` directory structure.
- DATABASE_URL env var exported and pointing at the `brain` Postgres.
---
## 1. Apply the BHI schema
```bash
cd /home/ubuntu/economic-brain-bhi
psql "$DATABASE_URL" -f schemas/bhi_tables.sql
psql "$DATABASE_URL" -c "\dt bhi_*"
# Expect 9 tables: bhi_facilities, bhi_facility_quality, bhi_facility_financials,
# bhi_demand_indicators, bhi_workforce, bhi_shortages, bhi_rtf_licensing,
# bhi_policy_events, bhi_crisis_calls
```
## 2. Copy ingestion jobs into the Brain's jobs tree
```bash
mkdir -p /home/ubuntu/economic-brain/jobs/bhi
cp /home/ubuntu/economic-brain-bhi/jobs/ingestion/*.py /home/ubuntu/economic-brain/jobs/bhi/
# _common.py is included; it reads DATABASE_URL from env already
```
Install Python deps if the base Brain doesn't already have them:
```bash
pip install requests psycopg2-binary
```
## 3. Smoke test every Phase A job (no DB writes)
```bash
cd /home/ubuntu/economic-brain/jobs/bhi
for f in cms_ipfqr.py cms_hospital_compare.py cms_nursing_home.py \
samhsa_locator.py hrsa_hpsa.py nppes.py cdc_brfss.py \
cdc_yrbss.py cdc_wonder_mortality.py bls_oes.py cms_pos.py \
samhsa_nssats_nmhss.py idea_part_b.py nsch.py; do
echo "=== $f ==="
python3 "$f" test || echo "FAIL: $f"
done
```
Every job should print `OK:` and exit 0. If any fail, fix the endpoint/URL in the job file before proceeding.
## 4. Run jobs in dependency order
```bash
# Facilities first (feed bhi_facilities.facility_id FK for quality/financials)
python3 cms_ipfqr.py
python3 cms_hospital_compare.py
python3 cms_nursing_home.py
python3 samhsa_locator.py
python3 cms_pos.py
python3 samhsa_nssats_nmhss.py
python3 nppes.py
# Shortages + demand (independent)
python3 hrsa_hpsa.py
python3 cdc_wonder_mortality.py
python3 cdc_brfss.py
python3 cdc_yrbss.py
python3 idea_part_b.py
python3 nsch.py
# Workforce
python3 bls_oes.py
```
Monitor `job_runs`:
```sql
SELECT job_name, status, started_at, finished_at, error
FROM job_runs WHERE job_name LIKE 'bhi_%' ORDER BY started_at DESC;
```
## 5. Import n8n workflows (scheduled refresh)
Create workflows in n8n (or add to existing scheduler):
| Workflow | Cron | Script |
|---|---|---|
| BHI: CMS facilities refresh | `0 3 * * 1` (weekly Mon 3am) | `cms_ipfqr.py`, `cms_hospital_compare.py`, `cms_nursing_home.py` |
| BHI: SAMHSA locator refresh | `0 4 1 * *` (monthly) | `samhsa_locator.py` |
| BHI: HRSA HPSA refresh | `0 5 * * 2` (weekly Tue 5am) | `hrsa_hpsa.py` |
| BHI: CDC demand refresh | `0 6 1 * *` (monthly) | `cdc_brfss.py`, `cdc_yrbss.py`, `cdc_wonder_mortality.py` |
| BHI: Workforce refresh | `0 7 1 */3 *` (quarterly) | `bls_oes.py` |
| BHI: CMS POS refresh | `0 8 1 */3 *` (quarterly) | `cms_pos.py` |
Workflow template: Cron node -> Execute Command (`python3 /home/ubuntu/economic-brain/jobs/bhi/<script>.py`) -> if non-zero, send alert to Slack / email.
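For a scheduler other than n8n, the same run-then-alert pattern can be sketched in Python. This is a minimal illustration, not part of the commit: `run_job` and `needs_alert` are hypothetical names, and the alert delivery itself (Slack, email) is left out.

```python
import subprocess
import sys


def run_job(command):
    """Run one ingestion command; return (exit_code, tail of combined output)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    # Keep only the last 2000 characters so an alert message stays short.
    tail = (proc.stdout + proc.stderr)[-2000:]
    return proc.returncode, tail


def needs_alert(exit_code):
    """Mirror the workflow rule: any non-zero exit code triggers an alert."""
    return exit_code != 0


if __name__ == "__main__":
    # Example invocation; the child process here just exits cleanly.
    code, output = run_job([sys.executable, "-c", "print('OK: demo')"])
    print(code, needs_alert(code), output.strip())
```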
## 6. Add command center page
Create `/home/ubuntu/command-center/pages/brain/behavioral-health.html` (or equivalent in the Brain's command-center framework) with sections:
1. **Facility map** — Leaflet map of `bhi_facilities` colored by `facility_type`, filterable by `adolescent_unit` / `young_adult_unit`.
2. **HPSA heatmap** — county-level choropleth of `bhi_shortages.score`.
3. **Demand indicators panel** — small multiples of suicide rate, overdose rate, BRFSS depression by state, split by age bracket.
4. **Composite ranking table** — top 50 opportunities by `composite_score` (see scoring.md).
5. **Recent policy events feed** — last 20 rows from `bhi_policy_events` ordered by `effective_date DESC`.
6. **Job status widget** — last run of each `bhi_*` job from `job_runs`.
Route: `/brain/behavioral-health`.
## 7. Test queries (acceptance smoke tests)
```sql
-- Facility count by type
SELECT facility_type, count(*) FROM bhi_facilities GROUP BY 1 ORDER BY 2 DESC;
-- Top 20 worst MH HPSAs
SELECT state, county_fips, score, population_served
FROM bhi_shortages WHERE withdrawn_date IS NULL
ORDER BY score DESC LIMIT 20;
-- Adolescent suicide rates, top states
SELECT geo_code, value FROM bhi_demand_indicators
WHERE measure='suicide_rate' AND age_bracket='13-17'
ORDER BY value DESC LIMIT 20;
-- IPFs per state, with the count that have an adolescent unit (fewest first)
SELECT state, count(*) FILTER (WHERE adolescent_unit) AS adolescent_units,
count(*) AS total
FROM bhi_facilities WHERE facility_type='IPF' GROUP BY state ORDER BY 2 ASC;
-- Workforce: psychiatrists, MSAs with the highest median annual wage (wage growth needs a second year of data)
SELECT msa_name, annual_wage_median
FROM bhi_workforce WHERE occupation_code='29-1223'
ORDER BY annual_wage_median DESC LIMIT 20;
-- job run health
SELECT job_name, status, count(*)
FROM job_runs WHERE job_name LIKE 'bhi_%'
GROUP BY 1, 2;
```
If every query returns rows and no `job_runs` row shows `status='error'`, the BHI layer is live.
## 8. Git merge to main Brain repo
```bash
cd /home/ubuntu/economic-brain
git checkout -b bhi-layer-merge
mkdir -p schemas jobs/bhi docs/bhi  # cp fails if a target directory is missing
cp /home/ubuntu/economic-brain-bhi/schemas/bhi_tables.sql schemas/
cp -r /home/ubuntu/economic-brain-bhi/jobs/ingestion/* jobs/bhi/
cp -r /home/ubuntu/economic-brain-bhi/docs/* docs/bhi/
git add schemas/bhi_tables.sql jobs/bhi docs/bhi
git commit -m "Integrate BHI layer"
git push origin bhi-layer-merge
# Open PR for review on Gitea
```

125
docs/scoring.md Normal file

@@ -0,0 +1,125 @@
# BHI Composite Scoring Function
## Formula
```
composite_score =
(demand_severity * 0.25) +
(supply_shortage * 0.25) +
(pain_signal_volume * 0.20) +
(capacity_trend * 0.10) +
(workforce_shortage * 0.10) +
(regulatory_tailwind * 0.05) +
(govt_demand * 0.05)
```
All components are normalized to 0-100 before weighting. Final `composite_score` is 0-100.
Each component is computed at the **geo x niche x age-bracket** level (state, county, or MSA depending on data).
Thesis (all of the above): demand is outpacing supply, the delivery model is shifting, and regulation is restructuring the market. Demand and supply carry the most weight (50% combined), followed by real-time pain signals (20%), then the four remaining components: capacity trend, workforce shortage, regulatory tailwind, and government demand (30% combined).
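A minimal sketch of the weighted sum, assuming each component has already been normalized to 0-100. The weights are copied from the formula; the function name `composite_score` and the dict-based interface are illustrative choices, not something defined in this commit.

```python
# Weights copied from the formula; they sum to 1.0.
WEIGHTS = {
    "demand_severity": 0.25,
    "supply_shortage": 0.25,
    "pain_signal_volume": 0.20,
    "capacity_trend": 0.10,
    "workforce_shortage": 0.10,
    "regulatory_tailwind": 0.05,
    "govt_demand": 0.05,
}


def composite_score(components):
    """Weighted sum of 0-100 components; rejects missing or out-of-range inputs."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = components[name]  # KeyError if a component is missing
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in 0-100, got {value}")
        total += weight * value
    return total
```

Because the weights sum to 1.0, a geo that scores 50 on every component gets a composite of 50.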
---
## Component definitions
### 1. demand_severity (25%)
Feeder: `bhi_demand_indicators` (CDC WONDER, BRFSS, YRBSS, NSCH).
For a given geo + age bracket, combine:
- Suicide rate per 100k (CDC WONDER, ICD-10 X60-X84)
- Drug overdose death rate per 100k (CDC WONDER, X40-X44 + Y10-Y14)
- YRBSS "seriously considered suicide" % (adolescent)
- BRFSS "mental health not good 14+ days" % (young adult via 18-24 bracket)
- NSCH unmet mental health treatment need %
Normalize each to 0-100 against the national distribution (percentile rank), then average.
Trend bonus: add 10 points if the 5-yr CAGR exceeds 5%, capping the result at 100.
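The percentile-rank, average, and trend steps can be sketched as follows. Several details are assumptions rather than part of the spec: the helper names, the "share of values less than or equal" definition of percentile rank, computing the CAGR on a single headline rate, and capping the result at 100.

```python
from bisect import bisect_right


def percentile_rank(value, national_values):
    """Share of national values <= value, scaled to 0-100."""
    ordered = sorted(national_values)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def cagr(start, end, years=5):
    """Compound annual growth rate as a fraction (0.05 means 5%)."""
    if start <= 0:
        raise ValueError("start value must be positive")
    return (end / start) ** (1.0 / years) - 1.0


def demand_severity(indicators, national, rate_start, rate_end):
    """
    indicators: {measure: local value}
    national:   {measure: list of values across all geos}
    rate_start / rate_end: one headline rate five years apart (assumed input).
    """
    ranks = [percentile_rank(v, national[m]) for m, v in indicators.items()]
    score = sum(ranks) / len(ranks)
    if cagr(rate_start, rate_end) > 0.05:
        score += 10  # trend adjustment from the text
    return min(100.0, score)  # cap is an assumption
```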
### 2. supply_shortage (25%)
Feeders: `bhi_shortages` (HRSA HPSA) + `bhi_facilities` (SAMHSA + CMS).
For a geo:
- HPSA mental health score (0-25, already normalized; rescale x4 -> 0-100)
- Inverse of facility density: beds per 100k population (percentile-invert)
- Inverse of adolescent/young-adult-specific bed density (if scoring those brackets)
Weighted average: 50% HPSA score, 30% total bed density, 20% age-targeted bed density.
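A short sketch of the 50/30/20 blend, assuming the two bed-density inputs arrive as percentile ranks (0-100) that still need inverting. The function and parameter names are hypothetical.

```python
def supply_shortage(hpsa_score, bed_density_pct, age_bed_density_pct):
    """
    hpsa_score: HRSA MH HPSA score on its native 0-25 scale.
    bed_density_pct, age_bed_density_pct: percentile ranks (0-100) of beds
    per 100k; inverted here so that low density gives a high shortage score.
    """
    hpsa_component = hpsa_score * 4             # rescale 0-25 to 0-100
    density_component = 100 - bed_density_pct   # percentile-invert
    age_component = 100 - age_bed_density_pct   # percentile-invert
    return 0.5 * hpsa_component + 0.3 * density_component + 0.2 * age_component
```

For example, an HPSA score of 20 with bed density at the 40th percentile and age-targeted density at the 10th percentile gives 0.5 * 80 + 0.3 * 60 + 0.2 * 90 = 76.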
### 3. pain_signal_volume (20%)
Feeders: base Brain's `reddit_posts`, `app_reviews`, and `risk_factors` tables (already being built).
For a niche (e.g., "adolescent inpatient"):
- Count of posts/reviews/risk-factor hits matching niche keywords in last 90 days
- Z-score against the full base Brain niche distribution
- Clamp to 0-100
Depends on base Brain being live — until then, this component defaults to 50 (neutral).
### 4. capacity_trend (10%)
Feeder: `bhi_facilities` (opened_date, closed_date) + CMS POS termination records.
For the geo x niche:
- Facilities opened in last 24 months minus closed in last 24 months, normalized by baseline facility count
- Negative net = high score (more opportunity), positive net = low score (saturated)
- Formula: `100 * (1 - (net_change + baseline) / (2 * baseline))` clamped 0-100
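The formula can be checked with a few lines of Python. This is a sketch: the function name and the guard against a non-positive baseline are additions, not part of the spec.

```python
def capacity_trend(opened, closed, baseline):
    """Score 0-100: net closures push the score up, net openings push it down."""
    if baseline <= 0:
        raise ValueError("baseline facility count must be positive")
    net_change = opened - closed
    raw = 100 * (1 - (net_change + baseline) / (2 * baseline))
    return max(0.0, min(100.0, raw))  # clamp to 0-100
```

With no net change the score is 50. Losing the entire baseline (net change of `-baseline`) gives 100, and doubling it (net change of `+baseline`) gives 0.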
### 5. workforce_shortage (10%)
Feeder: `bhi_workforce` (BLS OES).
For the MSA:
- Wage growth YoY for SOC codes 29-1223, 21-1014, 21-1018, 19-3033 (percentile rank)
- Employment per 100k (inverse percentile)
- Average them
High wage growth + low employment density = high shortage score = high opportunity for new supply.
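As a sketch, assuming both inputs have already been converted to percentile ranks across MSAs (the function names are hypothetical):

```python
def yoy_growth(current_wage, prior_wage):
    """Year-over-year wage growth as a fraction (0.10 means 10%)."""
    return (current_wage - prior_wage) / prior_wage


def workforce_shortage(wage_growth_pct_rank, employment_pct_rank):
    """
    wage_growth_pct_rank: percentile rank (0-100) of YoY wage growth.
    employment_pct_rank:  percentile rank (0-100) of employment per 100k,
                          inverted here so that low density scores high.
    """
    return (wage_growth_pct_rank + (100 - employment_pct_rank)) / 2
```

An MSA at the 90th percentile for wage growth and the 20th percentile for employment density scores (90 + 80) / 2 = 85.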
### 6. regulatory_tailwind (5%)
Feeder: `bhi_policy_events`.
Count of favorable policy events in the last 18 months for the geo:
- Medicaid rate increases for BH services
- New state mandates for adolescent crisis services
- Expanded provider types (peer support, mobile crisis)
- Federal rules (e.g., Mental Health Parity enforcement)
`count * 20`, clamped to 0-100.
### 7. govt_demand (5%)
Feeder: base Brain's `sam_gov_opportunities` table (if present) + `bhi_policy_events`.
Active + awarded SAM.gov opportunities in NAICS 621112 (Physician offices - mental), 621420 (Outpatient mental health/SUD), 623220 (Residential mental health), 623210 (Residential intellectual/developmental), 624190 (Other individual/family services). Dollar-value-weighted and geo-filtered.
Log-scale: `min(100, 10 * log10(total_dollar_value + 1))`.
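The log-scale expression translates directly to Python. The function name and the negative-input guard are illustrative additions.

```python
import math


def govt_demand(total_dollar_value):
    """Log-scaled 0-100 score from total opportunity dollars."""
    if total_dollar_value < 0:
        raise ValueError("total_dollar_value must be non-negative")
    return min(100.0, 10 * math.log10(total_dollar_value + 1))
```

Each tenfold increase in dollars adds about 10 points, so roughly $1M scores near 60 and the score reaches its cap of 100 at about $10 billion.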
---
## Age bracket handling
Every row in `bhi_demand_indicators` carries an `age_bracket`. When scoring a niche tagged for adolescents (13-17), the demand_severity and pain_signal components filter to that bracket. Young-adult scores pull 18-25. "All" niches average both brackets 50/50.
Young-adult gap note: for young-adult scoring, supply_shortage should apply an extra +15 penalty on facility density since very few IPFs have dedicated young-adult units — this is captured via the `young_adult_unit` boolean in `bhi_facilities`.
---
## Output table (to be added)
Scores write to `bhi_scores` (created at runtime, not in bhi_tables.sql v1 — add once inputs are flowing):
```sql
CREATE TABLE bhi_scores (
id SERIAL PRIMARY KEY,
niche TEXT,
geo_type TEXT,
geo_code TEXT,
age_bracket TEXT,
composite NUMERIC,
demand_severity NUMERIC,
supply_shortage NUMERIC,
pain_signal NUMERIC,
capacity_trend NUMERIC,
workforce_short NUMERIC,
reg_tailwind NUMERIC,
govt_demand NUMERIC,
computed_at TIMESTAMPTZ DEFAULT NOW()
);
```

273
docs/sources.md Normal file

@@ -0,0 +1,273 @@
# BHI Data Sources
All endpoints tested 2026-04-04 unless noted. "Tested: OK" means a live curl returned valid data.
Scope: behavioral health facilities, demand indicators, workforce, shortages, and policy for all 50 US states, tagged by age bracket (adolescent 13-17, young adult 18-25).
---
## PHASE A — Free, autonomous, ready to ingest
### 1. CMS IPFQR (Inpatient Psychiatric Facility Quality Reporting)
- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/{dataset_id}/0`
- **Dataset IDs:**
- `q9vs-r7wp` — IPFQR by Facility
- `dc76-gh7x` — IPFQR by State
- `s5xg-sys6` — IPFQR National
- **Auth:** None
- **Rate limit:** None documented; be polite (<= 5 req/sec)
- **Update frequency:** Quarterly
- **Record count:** ~1,600 IPFs (facility file); dozens of measures each
- **Key fields:** `facility_id`, `facility_name`, `address`, `state`, `zip`, `countyparish`, HBIPS-2/3 restraint+seclusion, SMD, SUB-2/3, TOB-3, transition record, 30-day readmission
- **Test curl (OK):**
```
curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0?limit=2"
```
- **Python snippet:**
```python
import requests
r = requests.get("https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0",
params={"limit": 500, "offset": 0})
rows = r.json()["results"]
```
### 2. CMS Hospital Compare / Care Compare (general hospital info)
- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0`
- **Auth:** None | **Rate limit:** none | **Update:** Monthly
- **Records:** ~5,300 hospitals
- **Key fields:** `facility_id` (CCN), `facility_name`, `hospital_type`, `hospital_ownership`, `hospital_overall_rating`, mortality/safety/readmission group flags
- **Test (OK):**
```
curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0?limit=2"
```
- Use to classify which acute hospitals have behavioral health units (join to IPFQR on CCN).
### 3. CMS Provider of Services (POS) file
- **Bulk page:** `https://data.cms.gov/provider-characteristics/hospitals-and-other-facilities/provider-of-services-file-quality-improvement-and-evaluation-system`
- **JSON catalog:** `https://data.cms.gov/data.json` (search `dataset[].title` = "Provider of Services File")
- **Auth:** None | **Update:** Quarterly | **Format:** CSV bulk
- **Records:** ~80,000 Medicare-certified facilities (includes PSY, PRTF, hospitals)
- **Key fields:** CCN, provider category, bed count, certification date, termination date, ownership
- **Test (OK):** `curl -s "https://data.cms.gov/data.json"` — dataset list
- Required for bed counts and termination (closure) tracking.
### 4. CMS Nursing Home Compare (Provider Information)
- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0`
- **Auth:** None | **Update:** Monthly
- **Records:** ~15,000 nursing homes
- **Key fields:** CCN, provider_name, ownership, number_of_certified_beds, overall rating, chain info
- **Test (OK):** `curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0?limit=2"`
- Used to capture residential behavioral health (SNFs frequently host psych/BH residents).
### 5. SAMHSA Treatment Locator (findtreatment.gov)
- **Endpoint:** `https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr={zip}`
- **Auth:** None (browser UA helps but not required for JSON export)
- **Rate limit:** None documented; HEAD returns 403 but GET returns 200 — use GET only
- **Update:** Continuous (SAMHSA-maintained)
- **Records:** ~96,000 BH treatment facilities (all service types)
- **Key fields:** name1/name2, street, city, state, zip, phone, intake, hotline, website, lat, lon, services, typeFacility
- **Test (OK):**
```
curl -s "https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr=10001"
```
Response: `{"page":1,"totalPages":3201,"recordCount":96009,"rows":[...]}`
- **Python snippet:**
```python
import requests, time
def fetch_all(zip_seed="10001"):
    base = "https://findtreatment.gov/locator/exportsAsJson/v2"
    page = 1
    while True:
        r = requests.get(base, params={"sType": "BH", "sAddr": zip_seed, "pageSize": 30, "page": page})
        d = r.json()
        yield from d["rows"]
        if page >= d["totalPages"]:
            break
        page += 1
        time.sleep(0.3)
```
### 6. SAMHSA N-SSATS + N-MHSS
- **Bulk:** `https://www.samhsa.gov/data/data-we-collect/n-ssats/datafiles` and `/n-mhss/datafiles`
- **Auth:** None | **Update:** Annual | **Format:** SAS / SPSS / CSV
- **Records:** N-SSATS ~16,000 SUD facilities/year; N-MHSS ~12,000 MH facilities/year
- **Key fields:** facility id, services, payment accepted, populations served (including adolescent/young adult flags), bed counts, ownership
- **Note:** Bulk ZIPs; no live API. Staged as manual-download job.
### 7. CDC WONDER (mortality — suicide, overdose, by county, age)
- **Endpoint:** `https://wonder.cdc.gov/controller/datarequest/D76` (Underlying Cause of Death) — POST XML
- **Auth:** None for non-restricted datasets; county-level suppressed for <10 deaths
- **Update:** Annual
- **Records:** All US mortality; we pull ICD-10 X60-X84 (suicide) + X40-X44/Y10-Y14 (overdose) by county, 13-17 and 18-25
- **Test (OK):** landing page returns 200; POST XML required for data. See job stub `cdc_wonder_mortality.py` for the working XML template.
### 8. CDC BRFSS
- **Endpoint (Socrata):** `https://data.cdc.gov/resource/dttw-5yxu.json`
- **Auth:** None (Socrata app token optional for higher limits) | **Update:** Annual
- **Records:** ~100k rows/year (state x question x breakout)
- **Test (OK):**
```
curl -s 'https://data.cdc.gov/resource/dttw-5yxu.json?$limit=2'
```
Returns depression prevalence, mental health days, etc. by state+demographic.
### 9. CDC YRBSS (Youth Risk Behavior Survey)
- **Endpoints (Socrata, verified present via catalog):**
- High school: `https://data.cdc.gov/resource/3qty-g4aq.json`
- Middle school: `https://data.cdc.gov/resource/uqmk-4y2w.json`
- **Auth:** None | **Update:** Biennial
- **Records:** State + large urban district level; ~50k rows
- **Key fields:** suicidal ideation, attempt, persistent sadness, substance use — exactly the adolescent demand signal we need.
### 10. IDEA Part B data (Emotional Disturbance by district)
- **Landing:** `https://www2.ed.gov/programs/osepidea/618-data/static-tables/index.html`
- **Auth:** None | **Format:** CSV static tables | **Update:** Annual
- **Records:** ~14,000 school districts + state rollups
- **Key fields:** Child count under ED classification, ages 6-21, by state and LEA
- **Note:** Static CSVs; no API. Download script documents exact file URLs.
### 11. NSCH (National Survey of Children's Health) via HRSA
- **Landing:** `https://www.childhealthdata.org/browse/survey` and `https://mchb.hrsa.gov/data-research/national-survey-childrens-health`
- **Bulk (HRSA):** `https://mchb.hrsa.gov/sites/default/files/nsch/datafiles/` (year-specific)
- **Auth:** None | **Update:** Annual | **Format:** SAS / Stata / CSV
- **Records:** ~50k surveyed children, weighted to state-level estimates
- **Key fields:** anxiety, depression, behavioral problems, received treatment, unmet need — by state x age.
### 12. BLS OES (behavioral health workforce by MSA)
- **API:** `https://api.bls.gov/publicAPI/v2/timeseries/data/` (POST JSON)
- **Auth:** Optional. A free registration key (`https://data.bls.gov/registrationEngine/`) raises the limits to 500 queries/day, 50 series/query, and 20 years/query. Without a key: 25 queries/day, 25 series/query, 10 years/query.
- **Update:** Annual (May reference period)
- **Series ID pattern:** `OEUM{area}{industry}{occupation}{datatype}`
- **Relevant SOC codes:**
- 29-1223 Psychiatrists
- 29-1229 Other Physicians (incl. addiction medicine)
- 21-1014 Mental Health Counselors
- 21-1015 Rehabilitation Counselors
- 21-1018 Substance Abuse/Behavioral Disorder Counselors
- 21-1023 Mental Health and SUD Social Workers
- 19-3033 Clinical & Counseling Psychologists
- **Test (OK):** BLS API responds (test hit confirmed structure; real series IDs required)
- **Bulk alternative:** `https://www.bls.gov/oes/special-requests/oesm{YY}ma.zip` (annual bulk by MSA) — no auth, ~50MB zip.
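A hedged sketch of assembling a series ID from the pattern above. The field widths (7-digit area, 6-digit industry, 6-digit occupation, 2-digit data type, 25 characters total) reflect the BLS OES series ID layout as understood here and should be checked against the BLS series ID format page. `oes_series_id` is a hypothetical helper, and the area and data type values in the usage note are placeholders.

```python
def oes_series_id(area, occupation_soc, datatype, industry="000000", areatype="M"):
    """
    Build an OES series ID: OE + U (not seasonally adjusted) + area type
    + area + industry + occupation + data type.
    occupation_soc may include the hyphen, e.g. "29-1223".
    """
    occupation = occupation_soc.replace("-", "")
    widths = {"area": (area, 7), "industry": (industry, 6),
              "occupation": (occupation, 6), "datatype": (datatype, 2)}
    for name, (value, width) in widths.items():
        if len(value) != width or not value.isdigit():
            raise ValueError(f"{name} must be {width} digits, got {value!r}")
    return "OE" + "U" + areatype + area + industry + occupation + datatype
```

For instance, `oes_series_id("0035620", "29-1223", "01")` yields a 25-character ID beginning with `OEUM0035620`, which can then be sent in the `seriesid` list of the POST body.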
### 13. HRSA Mental Health HPSAs
- **Bulk CSV (verified):** `https://data.hrsa.gov/DataDownload/DD_Files/BCD_HPSA_FCT_DET_MH.csv`
- **Size:** ~23 MB
- **Auth:** None | **Update:** Continuous (weekly snapshots)
- **Records:** ~6,500 active MH HPSAs + historical
- **Key fields:** HPSA ID, designation type, discipline (MH), score (0-25), state, county FIPS via HPSA Geography ID, population, designation date, withdrawn date, lat/lon
- **Test (OK):** HTTP 200, 23 MB CSV returned.
### 14. CMS NPPES (National Plan & Provider Enumeration System)
- **API:** `https://npiregistry.cms.hhs.gov/api/?version=2.1`
- **Auth:** None | **Rate limit:** ~200 req/sec soft; 200 results max per query — paginate with `skip`
- **Update:** Daily
- **Records:** ~8 million NPIs; filter by taxonomy for behavioral health (~500k)
- **Relevant taxonomy codes:**
- 2084P0800X Psychiatry & Neurology - Psychiatry
- 2084P0802X Addiction Psychiatry
- 2084P0804X Child & Adolescent Psychiatry
- 103T00000X Psychologist
- 101YM0800X Mental Health Counselor
- 103TC2200X Clinical Child & Adolescent Psychologist
- 1041C0700X Clinical Social Worker
- 324500000X Substance Abuse Rehabilitation Facility
- 283Q00000X Psychiatric Hospital
- 323P00000X Psychiatric Residential Treatment Facility
- **Test (OK):**
```
curl -s "https://npiregistry.cms.hhs.gov/api/?version=2.1&taxonomy_description=psychiatric&state=NY&limit=2"
```
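Pagination with `skip` can be sketched as below. This is an illustration under assumptions: `fetch_page` and `iter_providers` are hypothetical names, the standard library's `urllib` stands in for `requests` to keep the example dependency-free, and the loop assumes the response carries a `results` list.

```python
import json
import urllib.parse
import urllib.request

NPPES_URL = "https://npiregistry.cms.hhs.gov/api/"


def fetch_page(params):
    """GET one page from the NPPES API and return the decoded JSON body."""
    url = NPPES_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_providers(taxonomy_description, state, fetch=fetch_page, limit=200):
    """Yield provider records page by page until a short page is returned."""
    skip = 0
    while True:
        body = fetch({
            "version": "2.1",
            "taxonomy_description": taxonomy_description,
            "state": state,
            "limit": limit,
            "skip": skip,
        })
        results = body.get("results", [])
        yield from results
        if len(results) < limit:  # short or empty page means no more data
            break
        skip += limit
```

The `fetch` parameter lets a job swap in a stub for smoke tests. If the API caps `skip` (worth confirming in the NPPES documentation), narrowing each query, for example by state or city, keeps result sets under the cap.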
---
## PHASE B — Requires application or registration
### 15. HCUP (AHRQ)
- **Landing:** `https://hcup-us.ahrq.gov/tech_assist/centdist.jsp`
- **Auth:** Data Use Agreement (DUA) required; free for research but application-based (~2-4 weeks)
- **Records:** State inpatient/ED/ASC databases, ~40M discharges/yr nationally
- **Action required:** Submit DUA + Data Use Training certificate. **BLOCKED until user applies.**
### 16. CMS Medicare Cost Reports (MCR)
- **Bulk:** `https://www.cms.gov/data-research/statistics-trends-and-reports/cost-reports` (HOSPITAL2010 format)
- **Auth:** None; just large downloads (~1-3 GB per year)
- **Update:** Quarterly rolling
- **Records:** ~6,000 hospital cost reports/year (CCN-level)
- Staged as a fetch-and-parse job (uses `ccn` to join with `bhi_facilities`).
### 17. NEMSIS state crisis transport data
- **Landing:** `https://nemsis.org/using-ems-data/request-research-data/`
- **Auth:** Research Data Request (application) — typically 4-8 weeks
- **BLOCKED until user applies.**
### 18. California HCAI (patient discharge data)
- **Endpoint:** `https://hcai.ca.gov/data-and-reports/cost-transparency/` and `https://data.chhs.ca.gov/dataset?q=pdd`
- **Auth:** Free (some files direct download; Limited Data Set requires DUA)
- **Update:** Annual
- **Records:** ~3.5M CA discharges/yr; psych DRGs extractable
### 19. NY SPARCS
- **Landing:** `https://www.health.ny.gov/statistics/sparcs/`
- **Auth:** Application for identified data; deidentified file free via `health.data.ny.gov`
- **Deidentified endpoint:** `https://health.data.ny.gov/resource/u4ud-w55t.json` (Hospital Inpatient Discharges)
- **Records:** ~2.5M NY discharges/yr
### 20. TX DSHS discharge data
- **Landing:** `https://www.dshs.texas.gov/texas-health-care-information-collection/health-data-researcher-information/texas-inpatient-public-use`
- **Auth:** Free (Public Use File is a direct download after click-through)
- **Records:** ~3M TX discharges/yr
### 21. FL AHCA discharge data
- **Landing:** `https://ahca.myflorida.com/health-care-policy-and-oversight/bureau-of-central-services/florida-center-for-health-information-and-transparency/data-analytics/order-data`
- **Auth:** Application form + fee for identified; aggregate free
- **BLOCKED until user applies for identified.**
---
## PHASE C — State RTF licensing databases
### 22. State-by-state RTF licensing scrapers
Scope: residential treatment facilities serving adolescents. One scraper per state.
Verified public-search portals (no auth, scrape-friendly HTML/JSON):
- **UT** — `https://hslic.utah.gov/` (Human Services License Information Lookup)
- **CA** — `https://www.ccld.dss.ca.gov/transparencyapi/api/facilities` (Community Care Licensing API)
- **TX** — `https://www.hhs.texas.gov/providers/long-term-care-providers/childrens-residential-facility-reimbursement-methodology` + search portal
- **FL** — `https://apps.myflfamilies.com/provider/` (DCF provider search)
- **NY** — `https://omh.ny.gov/omhweb/resources/providers/` (OMH provider directory)
- **MT** — `https://dphhs.mt.gov/qad/licensure/licensedfacilitieslist` (static list)
- **AZ** — `https://azcarecheck.azdhs.gov/` (public search)
- **CO** — `https://apps.colorado.gov/apps/oapa/licensee.aspx` (Office of Early Childhood)
- **OR** — `https://ccld.oregon.gov/ccld/search/` (Care Provider Directory)
- **WA** — `https://fortress.wa.gov/dshs/adsaapps/lookup/` (LTC lookup)
- **IL** — `https://www2.illinois.gov/dcfs/brighterfutures/Pages/default.aspx`
- **MA** — `https://www.mass.gov/lists/licensed-residential-treatment-programs`
- **PA** — `https://www.dhs.pa.gov/Services/Assistance/Pages/Child-Residential-Facility.aspx`
States requiring FOIA / no public portal (documented as BLOCKED for Phase C v1):
- AL, AK, AR, DE, GA, HI, ID, IN, IA, KS, KY, LA, ME, MD, MI, MN, MS, MO, NE, NV, NH, NJ, NM, NC, ND, OH, OK, RI, SC, SD, TN, VT, VA, WV, WI, WY
The scraper job stub lists URL patterns for the 13 verified states and marks the rest "FOIA required."
---
## Test results summary (Phase A)
| # | Source | Status | Notes |
|---|--------|--------|-------|
| 1 | CMS IPFQR | OK | q9vs-r7wp returned facility rows |
| 2 | CMS Hospital Compare | OK | xubh-q36u returned |
| 3 | CMS POS | OK | catalog reachable, bulk CSV |
| 4 | CMS Nursing Home | OK | 4pq5-n9py returned |
| 5 | SAMHSA Locator | OK | 96,009 records confirmed |
| 6 | SAMHSA N-SSATS/N-MHSS | OK (bulk) | ZIP download, no API |
| 7 | CDC WONDER | OK | POST XML required, landing 200 |
| 8 | CDC BRFSS | OK | Socrata JSON returned |
| 9 | CDC YRBSS | OK | 3qty-g4aq + uqmk-4y2w |
| 10 | IDEA Part B | OK (static) | Static CSV; no API |
| 11 | NSCH | OK (bulk) | HRSA year files |
| 12 | BLS OES | OK | API responds; needs real series IDs |
| 13 | HRSA HPSA MH | OK | 23 MB CSV download confirmed |
| 14 | NPPES | OK | 2 results returned for NY psych |
Blocked until auth/application:
- HCUP (DUA), NEMSIS (application), FL AHCA identified, NY SPARCS identified.

64
docs/target_questions.md Normal file

@@ -0,0 +1,64 @@
# BHI Layer — Target Opportunity Questions
These are the questions the BHI layer must answer. They double as acceptance criteria: the layer ships when every question can be answered with a SQL query or a short Python notebook against `brain` with BHI tables populated.
Scope assumptions: all 50 states, facility-level where available, tagged adolescent (13-17) and young adult (18-25).
## 1. Supply / capacity
1. Which US counties have the highest HPSA mental health scores AND the lowest bed density (top 50)?
2. Which counties have ZERO licensed adolescent inpatient psychiatric beds within 60 miles?
3. Which counties have ZERO licensed young-adult residential treatment beds within 60 miles?
4. How many IPFs have closed vs opened in the last 24 months, by state?
5. Which IPFs have the worst HBIPS restraint+seclusion rates and are therefore vulnerability candidates for competitive entry or acquisition?
6. Which nursing homes are disproportionately housing under-65 residents with SMI (SNF-IMD dynamic) and are candidates for conversion/specialty buildout?
7. Where are the biggest drops in psych bed count over the last 5 years (via POS termination data)?
8. Which states have the lowest ratio of PRTF beds per 10k adolescents?
## 2. Demand
9. Which counties have the highest 13-17 suicide rate and fastest-growing trend (CDC WONDER)?
10. Which counties have the highest 18-25 overdose death rate trend?
11. Which states have the highest YRBSS "considered suicide" % and highest unmet-treatment need on NSCH, simultaneously?
12. How does adolescent ED visit rate for self-harm compare across states (cross-joining HCUP when available)?
13. Which school districts have the highest IDEA Part B Emotional Disturbance child count per 1,000 students?
14. Which states are seeing the largest YoY increase in 988 + crisis line volume per capita?
## 3. Workforce
15. Which MSAs have the highest YoY wage growth for psychiatrists (SOC 29-1223), a signal of shortage?
16. Which MSAs have psychiatrist employment per 100k in the bottom quartile AND mental health HPSA coverage in the worst quartile?
17. Where are LCSW/LMHC wages spiking (21-1014, 21-1018) while employment is flat?
## 4. Financial / opportunity
18. What is the median psych Medicare margin (revenue - cost) per discharge, by state, from MCR data?
19. Which for-profit IPF chains are expanding fastest (opened_date + chain_id from nursing home join)?
20. Which counties have the biggest gap between HPSA score and SAM.gov / state contract dollars flowing in (underinvested vs need)?
21. What are the median acquisition multiples for BH facilities in each state? (Requires later enrichment.)
## 5. Adolescent transport / crisis (specific focus)
22. Which counties dispatch the most EMS runs coded "behavioral/psych" per 10k adolescents (NEMSIS, when access granted)?
23. Where do adolescent psychiatric holds most frequently result in out-of-county or out-of-state transport (indicates no local capacity)?
24. Which states have the longest average ED boarding time for adolescents awaiting inpatient psych admission (via AHRQ + state HAI reports)?
25. Which states have a dedicated secure-transport statute or reimbursement (`bhi_policy_events` filter on "secure transport")? These are bluefields for BH transport vendors.
26. Which counties combine: high adolescent suicide rate + no in-county adolescent psych beds + high ED boarding = highest-need adolescent transport markets?
27. Which chains/operators already provide adolescent secure transport and where are their service gaps (via scraping state BHO contract registries)?
## 6. Regulatory / tailwind
28. Which states passed Medicaid rate increases for BH residential in the last 24 months?
29. Which states expanded the definition of "mobile crisis response" to include adolescents in the last 24 months?
30. Where are IMD exclusion waivers (Section 1115 SMI/SED waivers) active or pending?
## 7. Composite / prioritization
31. Top 10 states ranked by composite_score for "adolescent inpatient psychiatric"?
32. Top 50 counties ranked by composite_score for "young adult residential SUD"?
33. Top 20 MSAs ranked by composite_score for "outpatient adolescent therapy (IOP/PHP)"?
34. For each of the top 10 composite-score opportunities, list: (a) top 3 operators already there, (b) workforce wage growth, (c) most recent policy event, (d) closest open SAM.gov opportunity.
---
**Acceptance criteria:** When the BHI layer is live and all Phase A sources are ingested, a user should be able to run SQL or ask the Brain's natural-language interface these 34 questions and get a grounded answer with citations to the underlying `bhi_*` tables.