Access and Work with HCUP Resources and Datasets • HCUPtools

HCUPtools is a comprehensive R package for accessing and working with publicly available resources from the Agency for Healthcare Research and Quality (AHRQ) Healthcare Cost and Utilization Project (HCUP). The package provides streamlined access to HCUP’s Clinical Classifications Software Refined (CCSR) mapping files and Summary Trend Tables, enabling researchers and analysts to efficiently map ICD-10 codes to CCSR categories and access HCUP statistical reports.

Important Disclaimer: This package is an independent, non-commercial tool developed by a third party. It is not affiliated with, endorsed by, or supported by AHRQ or HCUP in any way. This package is not an official AHRQ or HCUP product.

Overview

HCUPtools facilitates access to free, publicly available HCUP resources, including:

CCSR Mapping Files: Clinical Classifications Software Refined (CCSR) for ICD-10-CM diagnosis codes and ICD-10-PCS procedure codes
HCUP Summary Trend Tables: Aggregated statistical reports on hospital utilization trends

The package does not redistribute HCUP data files but facilitates direct download from the official AHRQ HCUP website, ensuring users always have access to the latest versions and maintain compliance with HCUP data use policies.

Key Features

Direct Download: Automatically download CCSR mapping files and Summary Trend Tables from HCUP
ICD-10 Mapping: Map ICD-10-CM diagnosis codes and ICD-10-PCS procedure codes to CCSR categories
Multiple Formats: Support for long, wide, and default-only output formats
Cross-Classification: Handle one-to-many mappings (multiple CCSR categories per ICD-10 code)
Version Management: Access multiple CCSR versions and change logs
Citation Generation: Automatically generate proper AHRQ/HCUP citations
File Reading: Read downloaded files from disk (ZIP, CSV, Excel, or directories)
Caching: Intelligent caching to avoid redundant downloads

Installation

# Install from CRAN
install.packages("HCUPtools")

# Load the package
library(HCUPtools)

Quick Start

Basic Workflow

# 1. Download the latest diagnosis CCSR mapping file
dx_map <- download_ccsr("diagnosis")

# 2. Create sample patient data with ICD-10 codes
sample_data <- tibble::tibble(
  patient_id = 1:5,
  admission_date = as.Date(c("2024-01-15", "2024-02-20", "2024-03-10", 
                              "2024-04-05", "2024-05-12")),
  icd10_code = c("E11.9", "I10", "M79.3", "E78.5", "K21.9")
)

# 3. Map ICD-10 codes to CCSR categories
mapped_data <- ccsr_map(
  data = sample_data,
  code_col = "icd10_code",
  map_df = dx_map
)

# 4. Get CCSR category descriptions
descriptions <- get_ccsr_description(
  c("ADM010", "NEP003", "CIR019"), 
  map_df = dx_map
)

# 5. Generate proper citation
hcup_citation()

Comprehensive Usage Examples

1. Downloading CCSR Mapping Files

# Download latest version (recommended)
dx_map <- download_ccsr("diagnosis")
pr_map <- download_ccsr("procedure")

# Download specific version
dx_map_v2025 <- download_ccsr("diagnosis", version = "v2025.1")

# List all available versions
all_versions <- list_ccsr_versions()
print(all_versions)

# List versions for specific type
dx_versions <- list_ccsr_versions("diagnosis")
pr_versions <- list_ccsr_versions("procedure")

2. Mapping ICD-10 Codes to CCSR Categories

The ccsr_map() function supports three output formats to accommodate different analytical needs:

Long Format (Default)

Best for cross-classification analysis where you need to count all assigned CCSR categories:

# Long format - duplicates records for each CCSR category
mapped_long <- ccsr_map(
  data = sample_data,
  code_col = "icd10_code",
  map_df = dx_map,
  output_format = "long"  # default
)

# Result: One row per ICD-10 code per CCSR category
# Useful for: Counting occurrences of each CCSR category

Wide Format

Best for patient-level analysis where you want all categories in one row:

# Wide format - multiple CCSR columns
mapped_wide <- ccsr_map(
  data = sample_data,
  code_col = "icd10_code",
  map_df = dx_map,
  output_format = "wide"
)

# Result: One row per ICD-10 code with columns CCSR_1, CCSR_2, etc.
# Useful for: Patient-level analysis, keeping all categories together

Default Category Only

Best for principal diagnosis analysis:

# Default category only (diagnosis codes only)
mapped_default <- ccsr_map(
  data = sample_data,
  code_col = "icd10_code",
  map_df = dx_map,
  default_only = TRUE
)

# Result: One row per ICD-10 code with only the default CCSR category
# Useful for: Principal diagnosis analysis, following HCUP recommendations

3. Working with Procedure Codes

# Download procedure mapping
pr_map <- download_ccsr("procedure")

# Create sample procedure data
procedure_data <- tibble::tibble(
  case_id = 1:3,
  icd10_pcs = c("0DB60ZZ", "0DT70ZZ", "0WQ3XZ")
)

# Map procedure codes
mapped_procedures <- ccsr_map(
  data = procedure_data,
  code_col = "icd10_pcs",
  map_df = pr_map
)

4. Accessing CCSR Descriptions

# Get descriptions for specific CCSR codes
ccsr_codes <- c("ADM010", "NEP003", "CIR019", "END001")
descriptions <- get_ccsr_description(ccsr_codes, map_df = dx_map)
print(descriptions)

# Auto-download if mapping not provided
descriptions_auto <- get_ccsr_description(
  c("ADM010", "NEP003"), 
  type = "diagnosis"
)

5. Downloading HCUP Summary Trend Tables

# List all available tables (interactive menu)
available_tables <- download_trend_tables()

# Download a specific table by ID
table_path <- download_trend_tables("2a")  # All inpatient discharges

# Download all tables as ZIP file (~81 MB)
all_tables <- download_trend_tables("all")

6. Reading Downloaded Files

If you’ve already downloaded files, read them directly:

# Read CCSR file from various formats
dx_map <- read_ccsr("path/to/DXCCSR-v2026-1.zip")
dx_map <- read_ccsr("path/to/DXCCSR_v2026-1.csv")
dx_map <- read_ccsr("path/to/DXCCSR_v2026-1.xlsx")
dx_map <- read_ccsr("path/to/extracted_directory/")

# Read trend table Excel file
national_data <- read_trend_table(
  "path/to/HCUP_SummaryTrendTables_T2a.xlsx",
  sheet = "National"
)

# List available sheets
sheets <- list_trend_table_sheets(
  "path/to/HCUP_SummaryTrendTables_T2a.xlsx"
)

# Read specific state data
state_data <- read_trend_table(
  "path/to/HCUP_SummaryTrendTables_T2a.xlsx",
  sheet = "California"
)

7. Viewing CCSR Change Logs

# Get change log as data table (default)
changelog <- ccsr_changelog(version = "v2026.1")
print(changelog)

# Get change log URL
changelog_url <- ccsr_changelog(version = "v2026.1", format = "url")

# View change log in default PDF viewer
ccsr_changelog(version = "v2026.1", format = "view")

# Download change log file
changelog_file <- ccsr_changelog(version = "v2026.1", format = "download")

8. Generating Citations

Always cite HCUP data properly in publications:

# Text citation for CCSR
cat(hcup_citation())

# Citation for Summary Trend Tables
cat(hcup_citation(resource = "trend_tables"))

# BibTeX format (for LaTeX documents)
cat(hcup_citation(format = "bibtex"))

# R citation object (for R markdown)
citation_obj <- hcup_citation(format = "r")
print(citation_obj)

Complete Function Reference

Function	Description
`download_ccsr()`	Download CCSR mapping files from HCUP website
`read_ccsr()`	Read CCSR mapping files from disk (ZIP, CSV, Excel, or directory)
`ccsr_map()`	Map ICD-10 codes to CCSR categories (long/wide/default formats)
`get_ccsr_description()`	Get clinical descriptions for CCSR codes
`list_ccsr_versions()`	List available CCSR versions
`ccsr_changelog()`	Get CCSR change log for a specific version
`download_trend_tables()`	Download HCUP Summary Trend Tables
`read_trend_table()`	Read HCUP Summary Trend Table Excel files from disk
`list_trend_table_sheets()`	List available sheets in a trend table file
`hcup_citation()`	Generate citations for HCUP resources (CCSR or Trend Tables)

Important Legal and Compliance Information

Data Usage Compliance

This package facilitates access to publicly available and free HCUP resources:

CCSR Mapping Files - Classification software tools (free download from HCUP Tools & Software)
HCUP Summary Trend Tables - Aggregated statistical reports (free download from HCUP Reports)

Critical: This package does NOT access any HCUP databases (NIS, KID, SID, NEDS, etc.) that require purchase through the HCUP Central Distributor. All resources accessed by this package are freely available public tools and reports.

User Responsibilities

Users are responsible for:

Understanding and complying with all applicable AHRQ/HCUP data usage policies and restrictions
Verifying the accuracy of results
Citing the appropriate AHRQ/HCUP sources in publications
Ensuring compliance with all HCUP Data Use Agreements (DUAs)

Essential Resources

Disclaimer

This package is an independent, non-commercial tool developed by a third party. It is not affiliated with, endorsed by, or supported by AHRQ or HCUP in any way. This package is not an official AHRQ or HCUP product. The developers of this package have no relationship with AHRQ or HCUP.

This package is provided “as is” without warranty of any kind. Users are solely responsible for verifying the accuracy of results and ensuring compliance with all applicable HCUP data use policies.

For official HCUP information, visit: https://hcup-us.ahrq.gov/

Technical Details

Dependencies

Core Dependencies: - httr2: HTTP requests for downloading files from HCUP website - readr: Reading CSV files with proper encoding - dplyr: Data manipulation and joins - tidyr: Data reshaping (long/wide formats) - tibble: Modern data frame format - rlang: Advanced R programming utilities - xml2: HTML/XML parsing for web scraping

Optional Dependencies: - readxl: Reading Excel files (for trend tables) - data.table: High-performance data operations (optional) - pdftools: PDF text extraction (for change logs)

Data Handling Features

Encoding: Files are read with UTF-8 encoding to handle special characters correctly
Leading Zeros: ICD-10 codes are preserved as character strings to maintain leading zeros (e.g., “E11.9” not “E11.9”)
Caching: Downloaded files are cached by default to avoid redundant network requests
Cross-Classification: The package handles one-to-many mappings (multiple CCSR categories per ICD-10 code)
Version Management: Automatic detection of latest versions and support for historical versions

Performance Considerations

Caching: First download requires internet connection; subsequent calls use cached files
Large Files: CCSR mapping files are ~75,000 rows; Summary Trend Tables can be large Excel files
Memory: All data is loaded into memory as tibbles; consider data.table option for very large datasets

Citation

If you use this package in your research, please cite both the package and the HCUP data:

# Package citation
citation("HCUPtools")

# CCSR data citation (automatically formatted)
hcup_citation()

# Summary Trend Tables citation
hcup_citation(resource = "trend_tables")

Contributing

Contributions are welcome! Please feel free to:

Submit issues and bug reports
Propose new features
Submit pull requests
Improve documentation

Acknowledgments

AHRQ/HCUP: For providing the CCSR classification tool and making it publicly available
R Community: For the excellent tools and packages that make this work possible

Contact

Package Maintainer: Vikrant Dev Rathore
GitHub: vikrant31/HCUPtools
Issues: GitHub Issues

Note: This package is under active development. Features and function signatures may change in future versions. For the most up-to-date information, please visit the GitHub repository.

HCUPtools: Access and Work with HCUP Resources