Calculates which feature contributes most to each record's anomaly score. This provides a "reason code" explaining why each record was flagged as anomalous.

Usage

calculate_feature_importance(flagged_data, metadata, top_k = 1, max_cols = 10)

Arguments

flagged_data

A data frame with anomaly scores and is_anomaly flags, typically the output of flag_top_anomalies().

metadata

Metadata from prep_for_anomaly(), containing information about numeric and categorical columns.

top_k

Integer indicating how many top contributing features to consider. Default is 1 (returns only the most important feature).

max_cols

Integer indicating maximum number of columns to consider for feature importance. If NULL, uses all columns. Default is 10 for performance.

Value

The input data frame with additional columns:

reason_feature

Name of the feature contributing most to the anomaly

reason_value

The value of that feature for this record

reason_code

A brief description combining feature name and value

reason_deviation

For numeric features, the standardized deviation from the median; for categorical features, a measure based on the value's frequency
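
To make the idea behind reason_feature and reason_deviation concrete, here is a minimal base R sketch. It is not the package's implementation. The helpers robust_deviation() and category_rarity() are hypothetical names, and the choice of the scaled MAD as the spread estimate and of 1 minus relative frequency as the categorical measure are assumptions made for illustration only.

```r
# Hypothetical helper (not part of the package): absolute deviation from the
# median, scaled by the MAD. The MAD scaling is an assumption for this sketch.
robust_deviation <- function(x) {
  med <- stats::median(x, na.rm = TRUE)
  spread <- stats::mad(x, center = med, na.rm = TRUE)
  # A constant column has zero spread; treat it as contributing nothing
  if (is.na(spread) || spread == 0) return(rep(0, length(x)))
  abs(x - med) / spread
}

# Hypothetical helper: rarity of each categorical value, taken here as
# 1 minus its relative frequency (also an assumption).
category_rarity <- function(x) {
  freq <- table(x) / sum(!is.na(x))
  1 - as.numeric(freq[as.character(x)])
}

# Toy data: the last record has an unusual age but an ordinary cost
toy <- data.frame(
  age  = c(48, 52, 50, 49, 51, 95),
  cost = c(9800, 10100, 10050, 9900, 10200, 10000)
)

# Deviation matrix: one row per record, one column per feature
dev <- sapply(toy, robust_deviation)

# For each record, keep the feature with the largest deviation (top_k = 1)
top_idx <- max.col(dev, ties.method = "first")
toy$reason_feature   <- colnames(dev)[top_idx]
toy$reason_deviation <- dev[cbind(seq_len(nrow(dev)), top_idx)]
```

In this toy data the last record is attributed to age, since its age lies far from the median while its cost is close to it. A larger top_k would correspond to keeping several of the highest-ranked columns per row rather than only the first.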

Examples

# \donttest{
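set.seed(42)  # make the simulated data reproducible
# Simulate 50 patient records with two numeric features
data <- data.frame(
  patient_id = 1:50,
  age = rnorm(50, 50, 15),
  cost = rnorm(50, 10000, 5000)
)
# Score each record, treating patient_id as an identifier column
scored_data <- score_anomaly(data, id_cols = "patient_id")
# Flag the top-scoring records as anomalies
flagged_data <- flag_top_anomalies(scored_data)
# Retrieve the column metadata attached to the scored data
metadata <- attr(scored_data, "metadata")
# Append reason_feature, reason_value, reason_code, and reason_deviation
flagged_data <- calculate_feature_importance(flagged_data, metadata)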
# }