economic-brain-bhi/docs/sources.md
2026-04-05 20:15:36 +00:00

BHI Data Sources

All endpoints tested 2026-04-04 unless noted. "Tested: OK" means a live curl returned valid data.

Scope: behavioral health facilities, demand indicators, workforce, shortages, and policy for all 50 US states, tagged by age bracket (adolescent 13-17, young adult 18-25).


PHASE A — Free, autonomous, ready to ingest

1. CMS IPFQR (Inpatient Psychiatric Facility Quality Reporting)

  • Endpoint: https://data.cms.gov/provider-data/api/1/datastore/query/{dataset_id}/0
  • Dataset IDs:
    • q9vs-r7wp — IPFQR by Facility
    • dc76-gh7x — IPFQR by State
    • s5xg-sys6 — IPFQR National
  • Auth: None
  • Rate limit: None documented; be polite (<= 5 req/sec)
  • Update frequency: Quarterly
  • Record count: ~1,600 IPFs (facility file); dozens of measures each
  • Key fields: facility_id, facility_name, address, state, zip, countyparish, HBIPS-2/3 restraint+seclusion, SMD, SUB-2/3, TOB-3, transition record, 30-day readmission
  • Test curl (OK):
    curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0?limit=2"
    
  • Python snippet:
    import requests
    # Fetch the first 500 facility rows; fail loudly on HTTP errors.
    r = requests.get("https://data.cms.gov/provider-data/api/1/datastore/query/q9vs-r7wp/0",
                     params={"limit": 500, "offset": 0}, timeout=30)
    r.raise_for_status()
    rows = r.json()["results"]
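
The snippet above pulls a single page. A minimal sketch of offset-based paging over the same endpoint follows, assuming the API returns fewer than `limit` rows on the last page. `iter_rows` and `cms_fetcher` are illustrative names, not part of any existing job.

```python
from typing import Callable, Iterator

BASE = "https://data.cms.gov/provider-data/api/1/datastore/query/{dataset_id}/0"


def iter_rows(fetch_page: Callable[[int, int], list], page_size: int = 500) -> Iterator[dict]:
    """Yield rows page by page until a short (or empty) page signals the end."""
    offset = 0
    while True:
        rows = fetch_page(page_size, offset)
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size


def cms_fetcher(dataset_id: str) -> Callable[[int, int], list]:
    """Build a fetch_page callable for one dataset ID (performs live HTTP when called)."""
    import requests  # imported lazily so iter_rows can be exercised offline

    url = BASE.format(dataset_id=dataset_id)

    def fetch_page(limit: int, offset: int) -> list:
        r = requests.get(url, params={"limit": limit, "offset": offset}, timeout=30)
        r.raise_for_status()
        return r.json()["results"]

    return fetch_page
```

Usage: `for row in iter_rows(cms_fetcher("q9vs-r7wp")): ...`. Passing the fetcher in as a callable keeps the paging logic testable without network access.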
    

2. CMS Hospital Compare / Care Compare (general hospital info)

  • Endpoint: https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0
  • Auth: None | Rate limit: None documented | Update: Monthly
  • Records: ~5,300 hospitals
  • Key fields: facility_id (CCN), facility_name, hospital_type, hospital_ownership, hospital_overall_rating, mortality/safety/readmission group flags
  • Test (OK):
    curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/xubh-q36u/0?limit=2"
    
  • Use to classify which acute hospitals have behavioral health units (join on CCN against the IPFQR facility file).
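
A minimal sketch of that classification, assuming `facility_id` holds the same CCN in both files and that CCNs may have lost leading zeros in transit. `flag_bh_units` is a hypothetical helper name.

```python
def normalize_ccn(value) -> str:
    """CCNs are six characters; restore leading zeros dropped by CSV or JSON tooling."""
    return str(value).strip().zfill(6)


def flag_bh_units(hospitals, ipfqr_rows):
    """Return copies of hospital rows with has_bh_unit set when the CCN appears in IPFQR."""
    ipf_ccns = {normalize_ccn(row["facility_id"]) for row in ipfqr_rows}
    return [
        {**h, "has_bh_unit": normalize_ccn(h["facility_id"]) in ipf_ccns}
        for h in hospitals
    ]
```

A set lookup keeps this linear in the number of rows, which is sufficient at roughly 5,300 hospitals and 1,600 IPFs.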

3. CMS Provider of Services (POS) file

  • Bulk page: https://data.cms.gov/provider-characteristics/hospitals-and-other-facilities/provider-of-services-file-quality-improvement-and-evaluation-system
  • JSON catalog: https://data.cms.gov/data.json (search dataset[].title = "Provider of Services File")
  • Auth: None | Update: Quarterly | Format: CSV bulk
  • Records: ~80,000 Medicare-certified facilities (includes PSY, PRTF, hospitals)
  • Key fields: CCN, provider category, bed count, certification date, termination date, ownership
  • Test (OK): curl -s "https://data.cms.gov/data.json" — dataset list
  • Required for bed counts and termination (closure) tracking.
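
Closure tracking from the POS file can be sketched as below. The column names `ccn` and `termination_date` and the `YYYYMMDD` date format are assumptions made for illustration; the real CSV headers need to be mapped onto them first.

```python
import csv
import io
from datetime import date, datetime


def parse_yyyymmdd(value):
    """Parse a YYYYMMDD string; blank means the facility has no termination date."""
    value = (value or "").strip()
    return datetime.strptime(value, "%Y%m%d").date() if value else None


def closures(csv_text: str, as_of: date) -> dict:
    """Return {ccn: termination_date} for facilities terminated on or before as_of."""
    closed = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        terminated = parse_yyyymmdd(row.get("termination_date"))
        if terminated and terminated <= as_of:
            closed[row["ccn"]] = terminated
    return closed
```

Comparing against an explicit `as_of` date keeps future-dated terminations out of the closure count.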

4. CMS Nursing Home Compare (Provider Information)

  • Endpoint: https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0
  • Auth: None | Update: Monthly
  • Records: ~15,000 nursing homes
  • Key fields: CCN, provider_name, ownership, number_of_certified_beds, overall rating, chain info
  • Test (OK): curl -s "https://data.cms.gov/provider-data/api/1/datastore/query/4pq5-n9py/0?limit=2"
  • Used to capture residential behavioral health (SNFs frequently host psych/BH residents).

5. SAMHSA Treatment Locator (findtreatment.gov)

  • Endpoint: https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr={zip}
  • Auth: None (browser UA helps but not required for JSON export)
  • Rate limit: None documented; HEAD returns 403 but GET returns 200 — use GET only
  • Update: Continuous (SAMHSA-maintained)
  • Records: ~96,000 BH treatment facilities (all service types)
  • Key fields: name1/name2, street, city, state, zip, phone, intake, hotline, website, lat, lon, services, typeFacility
  • Test (OK):
    curl -s "https://findtreatment.gov/locator/exportsAsJson/v2?sType=BH&sAddr=10001"
    
    Response: {"page":1,"totalPages":3201,"recordCount":96009,"rows":[...]}
  • Python snippet:
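    import time
    import requests
    def fetch_all(zip_seed="10001"):
        """Yield every facility row, one page (30 rows) at a time."""
        base = "https://findtreatment.gov/locator/exportsAsJson/v2"
        page = 1
        while True:
            r = requests.get(base, params={"sType": "BH", "sAddr": zip_seed, "pageSize": 30, "page": page},
                             timeout=30)
            r.raise_for_status()  # stop on HTTP errors rather than parsing an error page
            d = r.json()
            yield from d["rows"]
            if page >= d["totalPages"]:
                break
            page += 1
            time.sleep(0.3)  # throttle; no rate limit is documented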
    

6. SAMHSA N-SSATS + N-MHSS

  • Bulk: https://www.samhsa.gov/data/data-we-collect/n-ssats/datafiles and /n-mhss/datafiles
  • Auth: None | Update: Annual | Format: SAS / SPSS / CSV
  • Records: N-SSATS ~16,000 SUD facilities/year; N-MHSS ~12,000 MH facilities/year
  • Key fields: facility id, services, payment accepted, populations served (including adolescent/young adult flags), bed counts, ownership
  • Note: Bulk ZIPs; no live API. Staged as manual-download job.

7. CDC WONDER (mortality — suicide, overdose, by county, age)

  • Endpoint: https://wonder.cdc.gov/controller/datarequest/D76 (Underlying Cause of Death) — POST XML
  • Auth: None for non-restricted datasets; county-level cells with fewer than 10 deaths are suppressed
  • Update: Annual
  • Records: All US mortality; we pull ICD-10 X60-X84 (suicide) + X40-X44/Y10-Y14 (overdose) by county, 13-17 and 18-25
  • Test (OK): landing page returns 200; POST XML required for data. See job stub wonder_mortality.py for the working XML template.
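
The ICD-10 ranges and age brackets above can be expressed as a small classifier for post-processing exported rows. This is a sketch only: it matches on the three-character category and treats X60-X64 (intentional self-poisoning) as suicide, following the ranges listed here. The function names are illustrative.

```python
import re

AGE_BRACKETS = {"adolescent": (13, 17), "young_adult": (18, 25)}

# (letter, first category, last category), taken from the ranges in this section.
CAUSE_RANGES = {
    "suicide": [("X", 60, 84)],
    "overdose": [("X", 40, 44), ("Y", 10, 14)],
}


def classify_cause(icd10: str):
    """Map an ICD-10 code such as 'X42.1' to 'suicide', 'overdose', or None."""
    match = re.match(r"^([A-Z])(\d{2})", icd10.strip().upper())
    if not match:
        return None
    letter, number = match.group(1), int(match.group(2))
    for cause, ranges in CAUSE_RANGES.items():
        for range_letter, low, high in ranges:
            if letter == range_letter and low <= number <= high:
                return cause
    return None


def age_bracket(age: int):
    """Return the bracket name for an age in years, or None if out of scope."""
    for name, (low, high) in AGE_BRACKETS.items():
        if low <= age <= high:
            return name
    return None
```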

8. CDC BRFSS

  • Endpoint (Socrata): https://data.cdc.gov/resource/dttw-5yxu.json
  • Auth: None (Socrata app token optional for higher limits) | Update: Annual
  • Records: ~100k rows/year (state x question x breakout)
  • Test (OK):
    curl -s 'https://data.cdc.gov/resource/dttw-5yxu.json?$limit=2'
    
    Returns depression prevalence, mental health days, etc. by state+demographic.
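
Socrata query parameters start with `$`, which a shell expands inside double quotes. Building the URL in Python sidesteps that. A minimal sketch, assuming paging with `$limit` and `$offset` and a stable sort on `:id`; `socrata_url` is an illustrative helper name.

```python
from urllib.parse import urlencode


def socrata_url(resource_id: str, limit: int = 1000, offset: int = 0,
                host: str = "data.cdc.gov") -> str:
    """Build a paged Socrata resource URL; '$' and ':' are percent-encoded."""
    query = urlencode({"$limit": limit, "$offset": offset, "$order": ":id"})
    return f"https://{host}/resource/{resource_id}.json?{query}"
```

Ordering by `:id` keeps pages stable between requests, so rows are neither skipped nor duplicated while `offset` advances.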

9. CDC YRBSS (Youth Risk Behavior Survey)

  • Endpoints (Socrata, verified present via catalog):
    • High school: https://data.cdc.gov/resource/3qty-g4aq.json
    • Middle school: https://data.cdc.gov/resource/uqmk-4y2w.json
  • Auth: None | Update: Biennial
  • Records: State + large urban district level; ~50k rows
  • Key fields: suicidal ideation, attempt, persistent sadness, substance use — exactly the adolescent demand signal we need.

10. IDEA Part B data (Emotional Disturbance by district)

  • Landing: https://www2.ed.gov/programs/osepidea/618-data/static-tables/index.html
  • Auth: None | Format: CSV static tables | Update: Annual
  • Records: ~14,000 school districts + state rollups
  • Key fields: Child count under ED classification, ages 6-21, by state and LEA
  • Note: Static CSVs; no API. Download script documents exact file URLs.

11. NSCH (National Survey of Children's Health) via HRSA

  • Landing: https://www.childhealthdata.org/browse/survey and https://mchb.hrsa.gov/data-research/national-survey-childrens-health
  • Bulk (HRSA): https://mchb.hrsa.gov/sites/default/files/nsch/datafiles/ (year-specific)
  • Auth: None | Update: Annual | Format: SAS / Stata / CSV
  • Records: ~50k surveyed children, weighted to state-level estimates
  • Key fields: anxiety, depression, behavioral problems, received treatment, unmet need — by state x age.

12. BLS OES (behavioral health workforce by MSA)

  • API: https://api.bls.gov/publicAPI/v2/timeseries/data/ (POST JSON)
  • Auth: Optional. A free registration key (https://data.bls.gov/registrationEngine/) raises the limits; without a key the API still responds but is capped at 25 series/query and 10 years/query.
  • Update: Annual (May reference period)
  • Series ID pattern: OEUM{area}{industry}{occupation}{datatype}
  • Relevant SOC codes:
    • 29-1223 Psychiatrists
    • 29-1229 Other Physicians (incl. addiction medicine)
    • 21-1014 Mental Health Counselors
    • 21-1015 Rehabilitation Counselors
    • 21-1018 Substance Abuse, Behavioral Disorder, and Mental Health Counselors (combined OES code)
    • 21-1023 Mental Health and Substance Abuse Social Workers
    • 19-3033 Clinical & Counseling Psychologists
  • Test (OK): BLS API responds (test hit confirmed structure; real series IDs required)
  • Bulk alternative: https://www.bls.gov/oes/special-requests/oesm{YY}ma.zip (annual bulk by MSA) — no auth, ~50MB zip.
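
A sketch of how series IDs and request bodies might be assembled for the POST endpoint. The field widths in `oes_series_id` (7-digit area, 6-digit industry, 6-digit occupation, 2-digit datatype) are an assumption about the pattern above and should be checked against one known-good series ID before use. The payload keys `seriesid`, `startyear`, `endyear`, and `registrationkey` follow the BLS v2 request format.

```python
def oes_series_id(area, soc: str, datatype, industry="000000", area_type="M") -> str:
    """Assemble an OES series ID; widths are assumed, not confirmed against BLS docs."""
    occupation = soc.replace("-", "")
    return (
        f"OEU{area_type}{str(area):0>7}{str(industry):0>6}"
        f"{occupation:0>6}{str(datatype):0>2}"
    )


def bls_payloads(series_ids, start_year, end_year, api_key=None, batch_size=25):
    """Yield one JSON-serializable request body per batch of series IDs."""
    for start in range(0, len(series_ids), batch_size):
        payload = {
            "seriesid": series_ids[start:start + batch_size],
            "startyear": str(start_year),
            "endyear": str(end_year),
        }
        if api_key:
            payload["registrationkey"] = api_key
        yield payload
```

Each payload can be sent with `requests.post(url, json=payload, timeout=30)`. The default batch size of 25 matches the unregistered per-query limit noted above.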

13. HRSA Mental Health HPSAs

  • Bulk CSV (verified): https://data.hrsa.gov/DataDownload/DD_Files/BCD_HPSA_FCT_DET_MH.csv
  • Size: ~23 MB
  • Auth: None | Update: Continuous (weekly snapshots)
  • Records: ~6,500 active MH HPSAs + historical
  • Key fields: HPSA ID, designation type, discipline (MH), score (0-25), state, county FIPS via HPSA Geography ID, population, designation date, withdrawn date, lat/lon
  • Test (OK): HTTP 200, 23 MB CSV returned.

14. CMS NPPES (National Plan & Provider Enumeration System)

  • API: https://npiregistry.cms.hhs.gov/api/?version=2.1
  • Auth: None | Rate limit: ~200 req/sec soft; 200 results max per query — paginate with skip
  • Update: Daily
  • Records: ~8 million NPIs; filter by taxonomy for behavioral health (~500k)
  • Relevant taxonomy codes:
    • 2084P0800X Psychiatry & Neurology - Psychiatry
    • 2084P0802X Addiction Psychiatry
    • 2084P0804X Child & Adolescent Psychiatry
    • 103T00000X Psychologist
    • 101YM0800X Mental Health Counselor
    • 103TC2200X Clinical Child & Adolescent Psychologist
    • 1041C0700X Clinical Social Worker
    • 324500000X Substance Abuse Rehabilitation Facility
    • 283Q00000X Psychiatric Hospital
    • 323P00000X Psychiatric Residential Treatment Facility
  • Test (OK):
    curl -s "https://npiregistry.cms.hhs.gov/api/?version=2.1&taxonomy_description=psychiatric&state=NY&limit=2"
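
Paging with `skip` might look like the sketch below. The `max_skip` ceiling of 1000 is an assumption about the registry's behavior. If it holds, a single filter returns at most 1,200 rows, so large taxonomies need to be split by state or city. `nppes_pages` and the injected `fetch` callable are illustrative.

```python
def nppes_pages(fetch, base_params: dict, limit: int = 200, max_skip: int = 1000):
    """Yield provider records, advancing skip until a short page or the skip ceiling."""
    skip = 0
    while skip <= max_skip:
        params = {"version": "2.1", **base_params, "limit": limit, "skip": skip}
        data = fetch(params)
        results = data.get("results", [])
        yield from results
        if len(results) < limit:
            break
        skip += limit


def live_fetch(params: dict) -> dict:
    """Perform one live GET against the registry (network access required)."""
    import requests  # imported lazily so nppes_pages can be tested offline

    r = requests.get("https://npiregistry.cms.hhs.gov/api/", params=params, timeout=30)
    r.raise_for_status()
    return r.json()
```

Usage: `nppes_pages(live_fetch, {"taxonomy_description": "psychiatric", "state": "NY"})`.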
    

PHASE B — Requires application or registration

15. HCUP (AHRQ)

  • Landing: https://hcup-us.ahrq.gov/tech_assist/centdist.jsp
  • Auth: Data Use Agreement (DUA) required; free for research but application-based (~2-4 weeks)
  • Records: State inpatient/ED/ASC databases, ~40M discharges/yr nationally
  • Action required: Submit DUA + Data Use Training certificate. BLOCKED until user applies.

16. CMS Medicare Cost Reports (MCR)

  • Bulk: https://www.cms.gov/data-research/statistics-trends-and-reports/cost-reports (HOSPITAL2010 format)
  • Auth: None; just large downloads (~1-3 GB per year)
  • Update: Quarterly rolling
  • Records: ~6,000 hospital cost reports/year (CCN-level)
  • Staged as a fetch-and-parse job (uses ccn to join with bhi_facilities).

17. NEMSIS state crisis transport data

  • Landing: https://nemsis.org/using-ems-data/request-research-data/
  • Auth: Research Data Request (application) — typically 4-8 weeks
  • BLOCKED until user applies.

18. California HCAI (patient discharge data)

  • Landing: https://hcai.ca.gov/data-and-reports/cost-transparency/ and https://data.chhs.ca.gov/dataset?q=pdd
  • Auth: Free (some files direct download; Limited Data Set requires DUA)
  • Update: Annual
  • Records: ~3.5M CA discharges/yr; psych DRGs extractable

19. NY SPARCS

  • Landing: https://www.health.ny.gov/statistics/sparcs/
  • Auth: Application for identified data; deidentified file free via health.data.ny.gov
  • Deidentified endpoint: https://health.data.ny.gov/resource/u4ud-w55t.json (Hospital Inpatient Discharges)
  • Records: ~2.5M NY discharges/yr

20. TX DSHS discharge data

  • Landing: https://www.dshs.texas.gov/texas-health-care-information-collection/health-data-researcher-information/texas-inpatient-public-use
  • Auth: Free (Public Use File is a direct download after click-through)
  • Records: ~3M TX discharges/yr

21. FL AHCA discharge data

  • Landing: https://ahca.myflorida.com/health-care-policy-and-oversight/bureau-of-central-services/florida-center-for-health-information-and-transparency/data-analytics/order-data
  • Auth: Application form + fee for identified; aggregate free
  • BLOCKED until user applies for identified.

PHASE C — State RTF licensing databases

22. State-by-state RTF licensing scrapers

Scope: residential treatment facilities serving adolescents. One scraper per state.

Verified public-search portals (no auth, scrape-friendly HTML/JSON):

  • UT: https://hslic.utah.gov/ (Human Services License Information Lookup)
  • CA: https://www.ccld.dss.ca.gov/transparencyapi/api/facilities (Community Care Licensing API)
  • TX: https://www.hhs.texas.gov/providers/long-term-care-providers/childrens-residential-facility-reimbursement-methodology + search portal
  • FL: https://apps.myflfamilies.com/provider/ (DCF provider search)
  • NY: https://omh.ny.gov/omhweb/resources/providers/ (OMH provider directory)
  • MT: https://dphhs.mt.gov/qad/licensure/licensedfacilitieslist (static list)
  • AZ: https://azcarecheck.azdhs.gov/ (public search)
  • CO: https://apps.colorado.gov/apps/oapa/licensee.aspx (Office of Early Childhood)
  • OR: https://ccld.oregon.gov/ccld/search/ (Care Provider Directory)
  • WA: https://fortress.wa.gov/dshs/adsaapps/lookup/ (LTC lookup)
  • IL: https://www2.illinois.gov/dcfs/brighterfutures/Pages/default.aspx
  • MA: https://www.mass.gov/lists/licensed-residential-treatment-programs
  • PA: https://www.dhs.pa.gov/Services/Assistance/Pages/Child-Residential-Facility.aspx

States requiring FOIA / no public portal (documented as BLOCKED for Phase C v1):

  • AL, AK, AR, DE, GA, HI, ID, IN, IA, KS, KY, LA, ME, MD, MI, MN, MS, MO, NE, NV, NH, NJ, NM, NC, ND, OH, OK, RI, SC, SD, TN, VT, VA, WV, WI, WY

The scraper job stub lists URL patterns for the 13 verified states and marks the rest "FOIA required."
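
One way the stub's state registry could be laid out is sketched below. The structure and the `scraper_status` helper are illustrative and not taken from the actual job. States that appear in neither list above resolve to "unlisted".

```python
VERIFIED_PORTALS = {
    "UT": "https://hslic.utah.gov/",
    "CA": "https://www.ccld.dss.ca.gov/transparencyapi/api/facilities",
    "TX": "https://www.hhs.texas.gov/providers/long-term-care-providers/"
          "childrens-residential-facility-reimbursement-methodology",
    "FL": "https://apps.myflfamilies.com/provider/",
    "NY": "https://omh.ny.gov/omhweb/resources/providers/",
    "MT": "https://dphhs.mt.gov/qad/licensure/licensedfacilitieslist",
    "AZ": "https://azcarecheck.azdhs.gov/",
    "CO": "https://apps.colorado.gov/apps/oapa/licensee.aspx",
    "OR": "https://ccld.oregon.gov/ccld/search/",
    "WA": "https://fortress.wa.gov/dshs/adsaapps/lookup/",
    "IL": "https://www2.illinois.gov/dcfs/brighterfutures/Pages/default.aspx",
    "MA": "https://www.mass.gov/lists/licensed-residential-treatment-programs",
    "PA": "https://www.dhs.pa.gov/Services/Assistance/Pages/Child-Residential-Facility.aspx",
}

FOIA_STATES = set(
    "AL AK AR DE GA HI ID IN IA KS KY LA ME MD MI MN MS MO "
    "NE NV NH NJ NM NC ND OH OK RI SC SD TN VT VA WV WI WY".split()
)


def scraper_status(state: str) -> str:
    """Return 'verified', 'FOIA required', or 'unlisted' for a two-letter state code."""
    code = state.strip().upper()
    if code in VERIFIED_PORTALS:
        return "verified"
    if code in FOIA_STATES:
        return "FOIA required"
    return "unlisted"
```

Running `scraper_status` over all 50 state codes gives a quick coverage check before scheduling scrapers.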


Test results summary (Phase A)

| #  | Source                | Status      | Notes                               |
|----|-----------------------|-------------|-------------------------------------|
| 1  | CMS IPFQR             | OK          | q9vs-r7wp returned facility rows    |
| 2  | CMS Hospital Compare  | OK          | xubh-q36u returned                  |
| 3  | CMS POS               | OK          | catalog reachable, bulk CSV         |
| 4  | CMS Nursing Home      | OK          | 4pq5-n9py returned                  |
| 5  | SAMHSA Locator        | OK          | 96,009 records confirmed            |
| 6  | SAMHSA N-SSATS/N-MHSS | OK (bulk)   | ZIP download, no API                |
| 7  | CDC WONDER            | OK          | POST XML required, landing 200      |
| 8  | CDC BRFSS             | OK          | Socrata JSON returned               |
| 9  | CDC YRBSS             | OK          | 3qty-g4aq + uqmk-4y2w               |
| 10 | IDEA Part B           | OK (static) | Static CSV; no API                  |
| 11 | NSCH                  | OK (bulk)   | HRSA year files                     |
| 12 | BLS OES               | OK          | API responds; needs real series IDs |
| 13 | HRSA HPSA MH          | OK          | 23 MB CSV download confirmed        |
| 14 | NPPES                 | OK          | 2 results returned for NY psych     |

Blocked until auth/application:

  • HCUP (DUA), NEMSIS (application), FL AHCA identified, NY SPARCS identified.