# BHI Data Sources

All endpoints tested 2026-04-04 unless noted. "Tested: OK" means a live curl returned valid data.

Scope: behavioral health facilities, demand indicators, workforce, shortages, and policy for all 50 US states, tagged by age bracket (adolescent 13-17, young adult 18-25).

---

## PHASE A: Free, autonomous, ready to ingest

### 1. CMS IPFQR (Inpatient Psychiatric Facility Quality Reporting)

- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/{dataset_id}/0`
- **Dataset IDs:**
  - `q9vs-r7wp`: IPFQR by Facility
  - `dc76-gh7x`: IPFQR by State
  - `s5xg-sys6`: IPFQR National
- **Auth:** None
- **Rate limit:** None documented; be polite (<= 5 req/sec)
- **Update frequency:** Quarterly
- **Record count:** ~1,600 IPFs (facility file); dozens of measures each
- **Key fields:** `facility_id`, `facility_name`, `address`, `state`, `zip`, `countyparish`, HBIPS-2/3 restraint+seclusion, SMD, SUB-2/3, TOB-3, transition record, 30-day readmission
- **Test curl (OK):**
  ```
  curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0?limit=2"
  ```
- **Python snippet:**
  ```python
  import requests

  url = "https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0"
  r = requests.get(url, params={"limit": 500, "offset": 0}, timeout=30)
  r.raise_for_status()  # fail loudly on HTTP errors
  rows = r.json()["results"]  # list of row dicts
  ```

### 2. CMS Hospital Compare / Care Compare (general hospital info)

- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0`
- **Auth:** None | **Rate limit:** None | **Update:** Monthly
- **Records:** ~5,300 hospitals
- **Key fields:** `facility_id` (CCN), `facility_name`, `hospital_type`, `hospital_ownership`, `hospital_overall_rating`, mortality/safety/readmission group flags
- **Test (OK):**
  ```
  curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0?limit=2"
  ```
- Use to classify which acute hospitals have behavioral health units (join with IPFQR on CCN).

### 3. CMS Provider of Services (POS) file

- **Bulk page:** `https://data.cms.gov/provider-characteristics/hospitals-and-other-facilities/provider-of-services-file-quality-improvement-and-evaluation-system`
- **JSON catalog:** `https://data.cms.gov/data.json` (search `dataset[].title` = "Provider of Services File")
- **Auth:** None | **Update:** Quarterly | **Format:** CSV bulk
- **Records:** ~80,000 Medicare-certified facilities (includes PSY, PRTF, hospitals)
- **Key fields:** CCN, provider category, bed count, certification date, termination date, ownership
- **Test (OK):** `curl -s "https://data.cms.gov/data.json"` (returns the dataset list)
- Required for bed counts and termination (closure) tracking.

### 4. CMS Nursing Home Compare (Provider Information)

- **Endpoint:** `https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0`
- **Auth:** None | **Update:** Monthly
- **Records:** ~15,000 nursing homes
- **Key fields:** CCN, provider_name, ownership, number_of_certified_beds, overall rating, chain info
- **Test (OK):** `curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0?limit=2"`
- Used to capture residential behavioral health (SNFs frequently host psych/BH residents).

### 5. SAMHSA Treatment Locator (findtreatment.gov)

- **Endpoint:** `https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr={zip}`
- **Auth:** None (browser UA helps but not required for JSON export)
- **Rate limit:** None documented; HEAD returns 403 but GET returns 200, so use GET only
- **Update:** Continuous (SAMHSA-maintained)
- **Records:** ~96,000 BH treatment facilities (all service types)
- **Key fields:** name1/name2, street, city, state, zip, phone, intake, hotline, website, lat, lon, services, typeFacility
- **Test (OK):**
  ```
  curl -s "https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr=10001"
  ```
  Response: `{"page":1,"totalPages":3201,"recordCount":96009,"rows":[...]}`
- **Python snippet:**
  ```python
  import time

  import requests

  def fetch_all(zip_seed="10001"):
      """Yield every facility row, paging until totalPages is reached."""
      base = "https://findtreatment.gov/locator/exportsAsJson/v2"
      page = 1
      while True:
          r = requests.get(
              base,
              params={"sType": "BH", "sAddr": zip_seed, "pageSize": 30, "page": page},
              timeout=30,
          )
          r.raise_for_status()
          d = r.json()
          yield from d["rows"]
          if page >= d["totalPages"]:
              break
          page += 1
          time.sleep(0.3)  # stay polite; no rate limit is documented
  ```

### 6. SAMHSA N-SSATS + N-MHSS

- **Bulk:** `https://www.samhsa.gov/data/data-we-collect/n-ssats/datafiles` and `/n-mhss/datafiles`
- **Auth:** None | **Update:** Annual | **Format:** SAS / SPSS / CSV
- **Records:** N-SSATS ~16,000 SUD facilities/year; N-MHSS ~12,000 MH facilities/year
- **Key fields:** facility id, services, payment accepted, populations served (including adolescent/young adult flags), bed counts, ownership
- **Note:** Bulk ZIPs; no live API. Staged as manual-download job.

### 7. CDC WONDER (mortality: suicide, overdose, by county, age)

- **Endpoint:** `https://wonder.cdc.gov/controller/datarequest/D76` (Underlying Cause of Death), POST XML
- **Auth:** None for non-restricted datasets; county-level counts are suppressed for <10 deaths
- **Update:** Annual
- **Records:** All US mortality; we pull ICD-10 X60-X84 (suicide) + X40-X44/Y10-Y14 (overdose) by county, ages 13-17 and 18-25
- **Test (OK):** landing page returns 200; POST XML required for data. See job stub `wonder_mortality.py` for the working XML template.

### 8. CDC BRFSS

- **Endpoint (Socrata):** `https://data.cdc.gov/resource/dttw-5yxu.json`
- **Auth:** None (Socrata app token optional for higher limits) | **Update:** Annual
- **Records:** ~100k rows/year (state x question x breakout)
- **Test (OK):**
  ```
  curl -s 'https://data.cdc.gov/resource/dttw-5yxu.json?$limit=2'
  ```
  Single quotes keep the shell from expanding `$limit`. Returns depression prevalence, mental health days, etc. by state+demographic.

### 9. CDC YRBSS (Youth Risk Behavior Survey)

- **Endpoints (Socrata, verified present via catalog):**
  - High school: `https://data.cdc.gov/resource/3qty-g4aq.json`
  - Middle school: `https://data.cdc.gov/resource/uqmk-4y2w.json`
- **Auth:** None | **Update:** Biennial
- **Records:** State + large urban district level; ~50k rows
- **Key fields:** suicidal ideation, attempt, persistent sadness, substance use. These are exactly the adolescent demand signals we need.

### 10. IDEA Part B data (Emotional Disturbance by district)

- **Landing:** `https://www2.ed.gov/programs/osepidea/618-data/static-tables/index.html`
- **Auth:** None | **Format:** CSV static tables | **Update:** Annual
- **Records:** ~14,000 school districts + state rollups
- **Key fields:** Child count under ED classification, ages 6-21, by state and LEA
- **Note:** Static CSVs; no API. Download script documents exact file URLs.

### 11. NSCH (National Survey of Children's Health) via HRSA

- **Landing:** `https://www.childhealthdata.org/browse/survey` and `https://mchb.hrsa.gov/data-research/national-survey-childrens-health`
- **Bulk (HRSA):** `https://mchb.hrsa.gov/sites/default/files/nsch/datafiles/` (year-specific)
- **Auth:** None | **Update:** Annual | **Format:** SAS / Stata / CSV
- **Records:** ~50k surveyed children, weighted to state-level estimates
- **Key fields:** anxiety, depression, behavioral problems, received treatment, unmet need, by state x age.

### 12. BLS OES (behavioral health workforce by MSA)

- **API:** `https://api.bls.gov/publicAPI/v2/timeseries/data/` (POST JSON)
- **Auth:** Optional free registration key (`https://data.bls.gov/registrationEngine/`), needed beyond 25 queries/day. Without a key: 25 series/query, 10 years/query.
- **Update:** Annual (May reference period)
- **Series ID pattern:** `OEUM{area}{industry}{occupation}{datatype}`
- **Relevant SOC codes:**
  - 29-1223 Psychiatrists
  - 29-1229 Other Physicians (incl. addiction medicine)
  - 21-1014 Mental Health Counselors
  - 21-1015 Rehabilitation Counselors
  - 21-1018 Substance Abuse/Behavioral Disorder Counselors
  - 21-1023 Mental Health and SUD Social Workers
  - 19-3033 Clinical & Counseling Psychologists
- **Test (OK):** BLS API responds (test hit confirmed structure; real series IDs required)
- **Bulk alternative:** `https://www.bls.gov/oes/special-requests/oesm{YY}ma.zip` (annual bulk by MSA); no auth, ~50 MB zip.

### 13. HRSA Mental Health HPSAs

- **Bulk CSV (verified):** `https://data.hrsa.gov/DataDownload/DD_Files/BCD_HPSA_FCT_DET_MH.csv`
- **Size:** ~23 MB
- **Auth:** None | **Update:** Continuous (weekly snapshots)
- **Records:** ~6,500 active MH HPSAs + historical
- **Key fields:** HPSA ID, designation type, discipline (MH), score (0-25), state, county FIPS via HPSA Geography ID, population, designation date, withdrawn date, lat/lon
- **Test (OK):** HTTP 200, 23 MB CSV returned.

### 14. CMS NPPES (National Plan & Provider Enumeration System)

- **API:** `https://npiregistry.cms.hhs.gov/api/?version=2.1`
- **Auth:** None | **Rate limit:** ~200 req/sec soft; 200 results max per query; paginate with `skip`
- **Update:** Daily
- **Records:** ~8 million NPIs; filter by taxonomy for behavioral health (~500k)
- **Relevant taxonomy codes:**
  - 2084P0800X Psychiatry & Neurology - Psychiatry
  - 2084P0802X Addiction Psychiatry
  - 2084P0804X Child & Adolescent Psychiatry
  - 103T00000X Psychologist
  - 101YM0800X Mental Health Counselor
  - 103TC2200X Clinical Child & Adolescent Psychologist
  - 1041C0700X Clinical Social Worker
  - 324500000X Substance Abuse Rehabilitation Facility
  - 283Q00000X Psychiatric Hospital
  - 323P00000X Psychiatric Residential Treatment Facility
- **Test (OK):**
  ```
  curl -s "https://npiregistry.cms.hhs.gov/api/?version=2.1&taxonomy_description=psychiatric&state=NY&limit=2"
  ```

---

## PHASE B: Requires application or registration

### 15. HCUP (AHRQ)

- **Landing:** `https://hcup-us.ahrq.gov/tech_assist/centdist.jsp`
- **Auth:** Data Use Agreement (DUA) required; free for research but application-based (~2-4 weeks)
- **Records:** State inpatient/ED/ASC databases, ~40M discharges/yr nationally
- **Action required:** Submit DUA + Data Use Training certificate. **BLOCKED until user applies.**

### 16. CMS Medicare Cost Reports (MCR)

- **Bulk:** `https://www.cms.gov/data-research/statistics-trends-and-reports/cost-reports` (HOSPITAL2010 format)
- **Auth:** None; just large downloads (~1-3 GB per year)
- **Update:** Quarterly rolling
- **Records:** ~6,000 hospital cost reports/year (CCN-level)
- Staged as a fetch-and-parse job (uses `ccn` to join with `bhi_facilities`).

### 17. NEMSIS state crisis transport data

- **Landing:** `https://nemsis.org/using-ems-data/request-research-data/`
- **Auth:** Research Data Request (application), typically 4-8 weeks
- **BLOCKED until user applies.**

### 18. California HCAI (patient discharge data)

- **Endpoint:** `https://hcai.ca.gov/data-and-reports/cost-transparency/` and `https://data.chhs.ca.gov/dataset?q=pdd`
- **Auth:** Free (some files direct download; Limited Data Set requires DUA)
- **Update:** Annual
- **Records:** ~3.5M CA discharges/yr; psych DRGs extractable

### 19. NY SPARCS

- **Landing:** `https://www.health.ny.gov/statistics/sparcs/`
- **Auth:** Application for identified data; deidentified file free via `health.data.ny.gov`
- **Deidentified endpoint:** `https://health.data.ny.gov/resource/u4ud-w55t.json` (Hospital Inpatient Discharges)
- **Records:** ~2.5M NY discharges/yr

### 20. TX DSHS discharge data

- **Landing:** `https://www.dshs.texas.gov/texas-health-care-information-collection/health-data-researcher-information/texas-inpatient-public-use`
- **Auth:** Free (Public Use File is a direct download after click-through)
- **Records:** ~3M TX discharges/yr

### 21. FL AHCA discharge data

- **Landing:** `https://ahca.myflorida.com/health-care-policy-and-oversight/bureau-of-central-services/florida-center-for-health-information-and-transparency/data-analytics/order-data`
- **Auth:** Application form + fee for identified; aggregate free
- **BLOCKED until user applies for identified.**

---

## PHASE C: State RTF licensing databases

### 22. State-by-state RTF licensing scrapers

Scope: residential treatment facilities serving adolescents. One scraper per state.

Verified public-search portals (no auth, scrape-friendly HTML/JSON):

- **UT**: `https://hslic.utah.gov/` (Human Services License Information Lookup)
- **CA**: `https://www.ccld.dss.ca.gov/transparencyapi/api/facilities` (Community Care Licensing API)
- **TX**: `https://www.hhs.texas.gov/providers/long-term-care-providers/childrens-residential-facility-reimbursement-methodology` + search portal
- **FL**: `https://apps.myflfamilies.com/provider/` (DCF provider search)
- **NY**: `https://omh.ny.gov/omhweb/resources/providers/` (OMH provider directory)
- **MT**: `https://dphhs.mt.gov/qad/licensure/licensedfacilitieslist` (static list)
- **AZ**: `https://azcarecheck.azdhs.gov/` (public search)
- **CO**: `https://apps.colorado.gov/apps/oapa/licensee.aspx` (Office of Early Childhood)
- **OR**: `https://ccld.oregon.gov/ccld/search/` (Care Provider Directory)
- **WA**: `https://fortress.wa.gov/dshs/adsaapps/lookup/` (LTC lookup)
- **IL**: `https://www2.illinois.gov/dcfs/brighterfutures/Pages/default.aspx`
- **MA**: `https://www.mass.gov/lists/licensed-residential-treatment-programs`
- **PA**: `https://www.dhs.pa.gov/Services/Assistance/Pages/Child-Residential-Facility.aspx`

States requiring FOIA / no public portal (documented as BLOCKED for Phase C v1):

- AL, AK, AR, DE, GA, HI, ID, IN, IA, KS, KY, LA, ME, MD, MI, MN, MS, MO, NE, NV, NH, NJ, NM, NC, ND, OH, OK, RI, SC, SD, TN, VT, VA, WV, WI, WY

The scraper job stub lists URL patterns for the 13 verified states and marks the rest "FOIA required."
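
Because the stated scope is all 50 states, it may be worth checking the two Phase C lists against each other before wiring up the scraper stubs. The following is a minimal sketch of such a check; `ALL_STATES`, `VERIFIED`, `FOIA`, and `coverage_gaps` are illustrative names, not part of the existing job stub. Run against the lists as written above, it reports that CT appears in neither list.

```python
# Two-letter codes for the 50 states (DC and territories excluded, per the stated scope).
ALL_STATES = set("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO
MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY
""".split())

# Copied from the "verified public-search portals" list.
VERIFIED = set("UT CA TX FL NY MT AZ CO OR WA IL MA PA".split())

# Copied from the "FOIA / no public portal" list.
FOIA = set("""
AL AK AR DE GA HI ID IN IA KS KY LA ME MD MI MN MS MO NE NV NH NJ NM NC ND
OH OK RI SC SD TN VT VA WV WI WY
""".split())


def coverage_gaps(verified, foia, universe=ALL_STATES):
    """Return states that are unassigned, double-assigned, or not real codes."""
    return {
        "missing": sorted(universe - verified - foia),
        "duplicated": sorted(verified & foia),
        "unknown": sorted((verified | foia) - universe),
    }


gaps = coverage_gaps(VERIFIED, FOIA)
print(gaps)
```

A check like this could run at the top of the scraper job so that a state dropped from both lists fails fast instead of silently going uncollected.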

---

## Test results summary (Phase A)

| # | Source | Status | Notes |
|---|--------|--------|-------|
| 1 | CMS IPFQR | OK | q9vs-r7wp returned facility rows |
| 2 | CMS Hospital Compare | OK | xubh-q36u returned |
| 3 | CMS POS | OK | catalog reachable, bulk CSV |
| 4 | CMS Nursing Home | OK | 4pq5-n9py returned |
| 5 | SAMHSA Locator | OK | 96,009 records confirmed |
| 6 | SAMHSA N-SSATS/N-MHSS | OK (bulk) | ZIP download, no API |
| 7 | CDC WONDER | OK | POST XML required, landing 200 |
| 8 | CDC BRFSS | OK | Socrata JSON returned |
| 9 | CDC YRBSS | OK | 3qty-g4aq + uqmk-4y2w |
| 10 | IDEA Part B | OK (static) | Static CSV; no API |
| 11 | NSCH | OK (bulk) | HRSA year files |
| 12 | BLS OES | OK | API responds; needs real series IDs |
| 13 | HRSA HPSA MH | OK | 23 MB CSV download confirmed |
| 14 | NPPES | OK | 2 results returned for NY psych |

Blocked until auth/application:

- HCUP (DUA), NEMSIS (application), FL AHCA identified, NY SPARCS identified.
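
Row 12 notes that the BLS API still needs real series IDs. As a rough sketch of how the `OEUM{area}{industry}{occupation}{datatype}` pattern from section 12 might be expanded, the helper below assembles IDs and a request body without touching the network. Several things here are assumptions rather than verified values: the field widths (7-digit area, 6-digit industry, 6-digit occupation, 2-digit datatype), `000000` as the cross-industry code, `01` as the employment datatype, the area code `0035620`, and the year. `build_series_id` and `build_payload` are hypothetical names.

```python
import json

# Assumed field widths for the parts that follow the fixed "OEUM" prefix.
FIELD_WIDTHS = {"area": 7, "industry": 6, "occupation": 6, "datatype": 2}
CROSS_INDUSTRY = "000000"  # assumption: code meaning "all industries"


def build_series_id(area, soc, datatype, industry=CROSS_INDUSTRY):
    """Assemble one OE series ID from an area code, a SOC code, and a datatype."""
    parts = {
        "area": area,
        "industry": industry,
        "occupation": soc.replace("-", ""),  # "29-1223" -> "291223"
        "datatype": datatype,
    }
    for name, value in parts.items():
        if len(value) != FIELD_WIDTHS[name]:
            raise ValueError(f"{name} must be {FIELD_WIDTHS[name]} characters, got {value!r}")
    return "OEUM" + "".join(parts[name] for name in FIELD_WIDTHS)


def build_payload(series_ids, year, registration_key=None):
    """Build the JSON body for a POST to the timeseries endpoint."""
    payload = {"seriesid": list(series_ids), "startyear": str(year), "endyear": str(year)}
    if registration_key:
        payload["registrationkey"] = registration_key
    return payload


# Illustrative inputs: three SOC codes from section 12, one assumed metro area code.
SOC_CODES = ["29-1223", "21-1014", "19-3033"]
ids = [build_series_id("0035620", soc, "01") for soc in SOC_CODES]
payload = build_payload(ids, 2024)
print(json.dumps(payload, indent=2))
```

The resulting dictionary could then be sent with `requests.post(url, json=payload)`; the generated IDs should be checked against a known-good series from the BLS data finder before any bulk pull.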