# Getting Started

This guide will help you get started with comorbidipy for calculating clinical comorbidity scores from ICD codes.
## Installation

### From PyPI

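Assuming the package is published on PyPI under the name `comorbidipy`, a standard pip install is all that's needed:

```bash
pip install comorbidipy
```
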
### From source

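To install from a development checkout, clone the repository and install it in editable mode. The repository URL below is a placeholder, so substitute the project's actual location:

```bash
# Repository URL is a placeholder; use the project's actual location
git clone https://github.com/<org>/comorbidipy.git
cd comorbidipy
pip install -e .
```
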
## Basic Usage

### Preparing Your Data

comorbidipy expects a DataFrame with at least two columns:

- `id`: Patient or episode identifier
- `code`: ICD-9 or ICD-10 diagnosis codes
```python
import polars as pl

# Example data format
df = pl.DataFrame({
    "id": ["P001", "P001", "P001", "P002", "P002"],
    "code": ["I21", "E112", "I50", "J44", "K703"],
    "age": [65, 65, 65, 72, 72],  # Optional, for age-adjusted scores
})
```
### Calculating Charlson Comorbidity Index

```python
from comorbidipy import comorbidity, ScoreType, MappingVariant, WeightingVariant

result = comorbidity(
    df,
    id_col="id",
    code_col="code",
    age_col="age",  # Optional - enables the age-adjusted score
    score=ScoreType.CHARLSON,
    variant=MappingVariant.QUAN,
    weighting=WeightingVariant.CHARLSON,
)
print(result)
```
### Calculating Elixhauser Comorbidity Index

```python
result = comorbidity(
    df,
    id_col="id",
    code_col="code",
    score=ScoreType.ELIXHAUSER,
    weighting=WeightingVariant.VAN_WALRAVEN,
)
```
### Calculating Hospital Frailty Risk Score

```python
from comorbidipy import hfrs

# HFRS only requires id and code columns
result = hfrs(df, id_col="id", code_col="code")
```
### Identifying Disabilities

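The call pattern below is a sketch by analogy with `hfrs`; the `disability` function name and its signature are assumptions here, so check the API Reference for the exact interface:

```python
# Assumed API, mirroring hfrs; verify the actual name and
# signature in the API Reference.
from comorbidipy import disability

result = disability(df, id_col="id", code_col="code")
```
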
## Command Line Interface

comorbidipy provides a full-featured CLI for processing files directly:
```bash
# View help
comorbidipy --help

# Calculate Charlson score
comorbidipy charlson input.csv output.parquet

# With options
comorbidipy charlson input.csv output.csv \
    --id-col patient_id \
    --code-col diagnosis_code \
    --age-col patient_age \
    --mapping quan \
    --weights charlson

# Calculate HFRS
comorbidipy hfrs-cmd input.parquet output.parquet

# Show available options
comorbidipy info
```
## Supported File Formats

| Format | Read | Write | Streaming |
|---|---|---|---|
| CSV | ✅ | ✅ | ✅ |
| Parquet | ✅ | ✅ | ✅ |
| JSON | ✅ | ✅ | ❌ |
| NDJSON | ✅ | ✅ | ✅ |
| Avro | ✅ | ✅ | ❌ |
## Performance Tips

### Processing Large Files

For files that don't fit in memory, use streaming mode, as in the sketch below.
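This is a minimal sketch of one way to do it with the Polars APIs the library builds on: scan the file lazily and pass the resulting `LazyFrame` to `comorbidity` (see the LazyFrame section below), so rows are pulled through in batches rather than loaded all at once. The file name is illustrative, and whether scoring streams end to end depends on the `comorbidity` implementation:

```python
import polars as pl
from comorbidipy import comorbidity, ScoreType

# Scan lazily: rows are only read when the query actually executes
lazy_df = pl.scan_parquet("very_large_file.parquet")

# Passing a LazyFrame keeps computation deferred; see the
# LazyFrame section below
result = comorbidity(
    lazy_df,
    id_col="id",
    code_col="code",
    score=ScoreType.CHARLSON,
)
```
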
### Use Parquet Format

Parquet files are significantly faster to read and write than CSV:
```python
# Read Parquet
df = pl.read_parquet("data.parquet")
```

Or use the CLI directly:

```bash
comorbidipy charlson input.parquet output.parquet
```
### LazyFrame for Deferred Computation

```python
# Use a LazyFrame for memory efficiency
lazy_df = pl.scan_parquet("large_file.parquet")
result = comorbidity(lazy_df, id_col="id", code_col="code", age_col=None)
```
## Next Steps

- Charlson Comorbidity Index - Detailed documentation
- CLI Reference - Full command-line options
- API Reference - Python API documentation