Calculate Feature Importance for Anomalies — calculate_feature_importance

Calculates which feature contributes most to each record's anomaly score. This provides a "reason code" explaining why each record was flagged as anomalous.

Usage

calculate_feature_importance(flagged_data, metadata, top_k = 1, max_cols = 10)

Arguments

flagged_data: A data frame with anomaly scores and is_anomaly flags, typically the output of flag_top_anomalies().
metadata: Metadata from prep_for_anomaly(), containing information about numeric and categorical columns.
top_k: Integer giving the number of top contributing features to consider for each record. Default is 1 (returns only the most important feature).
max_cols: Integer giving the maximum number of columns to consider when calculating feature importance. If NULL, all columns are used. Default is 10, which limits computation on wide data.

Value

The input data frame with additional columns:

reason_feature: Name of the feature contributing most to the anomaly
reason_value: The value of that feature for this record
reason_code: A brief description combining feature name and value
reason_deviation: The size of the deviation: the standardized deviation from the median for numeric features, or the frequency for categorical features
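
The numeric case described above can be illustrated with a small base R sketch. This is not the package's implementation: the helper name sketch_reason_codes, the use of the MAD as the scale, and the "feature = value" layout of reason_code are all assumptions made for illustration, and categorical features are left out.

```r
# Hypothetical helper, NOT part of the package: for each row, find the numeric
# column whose value lies furthest from that column's median.
sketch_reason_codes <- function(df, numeric_cols) {
  x <- as.matrix(df[, numeric_cols, drop = FALSE])

  # Robust centre and spread per column (assumption: the MAD is the scale).
  centre <- apply(x, 2, median, na.rm = TRUE)
  spread <- apply(x, 2, mad, na.rm = TRUE)
  spread[spread == 0] <- 1  # avoid dividing by zero for constant columns

  # Absolute standardized deviation of every cell from its column median.
  z <- abs(sweep(sweep(x, 2, centre, "-"), 2, spread, "/"))
  z[is.na(z)] <- 0

  # Column index of the largest deviation in each row.
  top <- max.col(z, ties.method = "first")
  idx <- cbind(seq_len(nrow(x)), top)

  data.frame(
    reason_feature = numeric_cols[top],
    reason_value = x[idx],
    reason_deviation = z[idx],
    stringsAsFactors = FALSE
  )
}

# Toy data: the fourth record has an unusually high cost.
toy <- data.frame(
  age = c(50, 52, 48, 51, 49),
  cost = c(10000, 10500, 9500, 60000, 9800)
)
reasons <- sketch_reason_codes(toy, c("age", "cost"))

# Assumed reason_code layout: "<feature> = <value>".
reasons$reason_code <- paste0(
  reasons$reason_feature, " = ", round(reasons$reason_value, 1)
)
print(reasons)
```

In this toy data the fourth record is attributed to cost, because its cost is many scaled deviations above the column median while its age is unremarkable. Using the median and MAD rather than the mean and standard deviation keeps a single extreme value from inflating the scale it is measured against.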

Examples

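# \donttest{
# Fix the seed so the simulated data are reproducible
set.seed(42)
data <- data.frame(
  patient_id = 1:50,
  age = rnorm(50, 50, 15),
  cost = rnorm(50, 10000, 5000)
)

# Score the records, then flag the most anomalous ones
scored_data <- score_anomaly(data, id_cols = "patient_id")
flagged_data <- flag_top_anomalies(scored_data)

# Read the column metadata from the "metadata" attribute of the scored data
metadata <- attr(scored_data, "metadata")

# Add the reason_* columns to the flagged data
flagged_data <- calculate_feature_importance(flagged_data, metadata)
# }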