Getting Started

This guide will help you get started with comorbidipy, a library for calculating clinical comorbidity scores from ICD diagnosis codes.

Installation

From PyPI

pip install comorbidipy

From source

git clone https://github.com/vvcb/comorbidipy.git
cd comorbidipy
pip install -e .
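
To confirm the installation, you can query the installed version; this sketch uses only the Python standard library, not a comorbidipy API:

from importlib.metadata import version

print(version("comorbidipy"))  # prints the installed package version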

Basic Usage

Preparing Your Data

comorbidipy expects a DataFrame with at least two columns:

  • id: Patient or episode identifier
  • code: ICD-9 or ICD-10 diagnosis codes

import polars as pl

# Example data format
df = pl.DataFrame({
    "id": ["P001", "P001", "P001", "P002", "P002"],
    "code": ["I21", "E112", "I50", "J44", "K703"],
    "age": [65, 65, 65, 72, 72],  # Optional for age-adjusted scores
})
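
In practice the DataFrame will usually come from a file. A minimal sketch, assuming a CSV extract whose columns need renaming to the expected names (the file name and source column names below are placeholders):

# Load a diagnosis extract and rename its columns to "id" and "code"
df = pl.read_csv("diagnoses.csv").rename({"patient_id": "id", "icd10": "code"})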

Calculating Charlson Comorbidity Index

from comorbidipy import comorbidity, ScoreType, MappingVariant, WeightingVariant

result = comorbidity(
    df,
    id_col="id",
    code_col="code",
    age_col="age",  # Optional - enables age-adjusted score
    score=ScoreType.CHARLSON,
    variant=MappingVariant.QUAN,
    weighting=WeightingVariant.CHARLSON,
)

print(result)
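
The result is keyed on the id column, so it can be joined back onto other patient-level data with ordinary Polars operations. A sketch, assuming the output contains one row per id (a guess about the output shape):

# Hypothetical follow-up: attach the scores to a separate patient table
patients = pl.DataFrame({"id": ["P001", "P002"], "sex": ["F", "M"]})
merged = patients.join(result, on="id", how="left")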

Calculating Elixhauser Comorbidity Index

result = comorbidity(
    df,
    id_col="id",
    code_col="code",
    score=ScoreType.ELIXHAUSER,
    weighting=WeightingVariant.VAN_WALRAVEN,
)
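
Because both indices share the comorbidity entry point, they can be computed side by side on the same cohort using only the arguments shown above:

charlson = comorbidity(
    df, id_col="id", code_col="code",
    score=ScoreType.CHARLSON,
    variant=MappingVariant.QUAN,
    weighting=WeightingVariant.CHARLSON,
)
elixhauser = comorbidity(
    df, id_col="id", code_col="code",
    score=ScoreType.ELIXHAUSER,
    weighting=WeightingVariant.VAN_WALRAVEN,
)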

Calculating Hospital Frailty Risk Score

from comorbidipy import hfrs

# HFRS only requires id and code
result = hfrs(df, id_col="id", code_col="code")
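
A common next step is to bucket patients into the published HFRS risk bands (low < 5, intermediate 5-15, high > 15). The score column name below ("hfrs") is an assumption about the output schema:

# Label each patient with a risk band based on the HFRS value
banded = result.with_columns(
    pl.when(pl.col("hfrs") > 15)
    .then(pl.lit("high"))
    .when(pl.col("hfrs") >= 5)
    .then(pl.lit("intermediate"))
    .otherwise(pl.lit("low"))
    .alias("risk_band")
)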

Identifying Disabilities

from comorbidipy import disability

result = disability(df, id_col="id", code_col="code")
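
All three helpers share the same id/code signature, so their outputs can be combined with an ordinary join. A sketch, again assuming each output has one row per id:

frailty = hfrs(df, id_col="id", code_col="code")
impairment = disability(df, id_col="id", code_col="code")
combined = frailty.join(impairment, on="id", how="left")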

Command Line Interface

comorbidipy provides a full-featured CLI for processing files directly:

# View help
comorbidipy --help

# Calculate Charlson score
comorbidipy charlson input.csv output.parquet

# With options
comorbidipy charlson input.csv output.csv \
    --id-col patient_id \
    --code-col diagnosis_code \
    --age-col patient_age \
    --mapping quan \
    --weights charlson

# Calculate HFRS
comorbidipy hfrs-cmd input.parquet output.parquet

# Show available options
comorbidipy info

Supported File Formats

Format    Read   Write   Streaming
CSV       ✓      ✓       ✓
Parquet   ✓      ✓       ✓
JSON      ✓      ✓       ✗
NDJSON    ✓      ✓       ✓
Avro      ✓      ✓       ✗

Streaming is available for the formats Polars can scan lazily (CSV, Parquet and NDJSON).

Performance Tips

Processing Large Files

For files that don't fit in memory, use streaming mode:

comorbidipy charlson large_input.parquet output.parquet --streaming

Use Parquet Format

Parquet files are significantly faster to read/write than CSV:

# Read Parquet
df = pl.read_parquet("data.parquet")

# Or use CLI
comorbidipy charlson input.parquet output.parquet
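
A one-off conversion from an existing CSV extract is a single line with Polars:

# Convert a CSV extract to Parquet for faster subsequent runs
pl.read_csv("data.csv").write_parquet("data.parquet")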

LazyFrame for Deferred Computation

# Use LazyFrame for memory efficiency
lazy_df = pl.scan_parquet("large_file.parquet")
result = comorbidity(lazy_df, id_col="id", code_col="code")
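
If comorbidity propagates laziness when given a LazyFrame (an assumption about the API; it may instead collect internally), materialise the result explicitly:

final = result.collect()  # triggers the deferred computation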

Next Steps