
Calculate Feature Importance for Anomalies
Source: R/calculate_feature_importance.R
Calculates which feature contributes most to each record's anomaly score. This provides a "reason code" explaining why each record was flagged as anomalous.
Arguments
- flagged_data
A data frame with anomaly scores and is_anomaly flags, typically the output of flag_top_anomalies().
- metadata
Metadata from prep_for_anomaly(), containing information about numeric and categorical columns.
- top_k
Integer indicating how many top contributing features to consider. Default is 1 (returns only the most important feature).
- max_cols
Integer indicating the maximum number of columns to consider for feature importance. If NULL, all columns are used. Default is 10, chosen for performance.
Value
The input data frame with additional columns:
- reason_feature
Name of the feature contributing most to the anomaly
- reason_value
The value of that feature for this record
- reason_code
A brief description combining feature name and value
- reason_deviation
The standardized deviation from the median (for numeric features) or the frequency of the value (for categorical features)
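The exact formula behind reason_deviation is not given on this page. As a rough illustration only, the sketch below shows one common way to compute a standardized deviation from the median (scaling by the median absolute deviation) and a relative frequency for categorical values. The helper names robust_deviation and category_frequency are hypothetical and are not part of the package; the package's own computation may differ.

```r
# Hypothetical helper (not from the package): robust z-score of each value
# relative to the median, scaled by the median absolute deviation (MAD).
robust_deviation <- function(x) {
  med <- median(x, na.rm = TRUE)
  # stats::mad() multiplies by 1.4826 by default so it is comparable to an SD
  spread <- mad(x, center = med, na.rm = TRUE)
  if (is.na(spread) || spread == 0) {
    return(rep(0, length(x)))  # no spread: treat every value as typical
  }
  (x - med) / spread
}

# Hypothetical helper (not from the package): relative frequency of each
# record's category among the non-missing values.
category_frequency <- function(x) {
  freq <- table(x) / sum(!is.na(x))
  as.numeric(freq[as.character(x)])
}

cost <- c(9000, 10000, 11000, 10500, 9500, 60000)
dev <- robust_deviation(cost)
most_extreme <- which.max(abs(dev))  # index of the record furthest from the median

ward <- c("A", "A", "A", "B")
ward_freq <- category_frequency(ward)  # rare categories get low frequencies
```

Under this sketch, a large absolute deviation marks an unusual numeric value and a low frequency marks a rare category, which is the kind of signal a reason code would report.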
Examples
# \donttest{
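# Simulate a small data set; the seed makes the random draws reproducible
set.seed(42)
data <- data.frame(
  patient_id = 1:50,
  age = rnorm(50, 50, 15),
  cost = rnorm(50, 10000, 5000)
)

# Score each record, then flag the most anomalous ones
scored_data <- score_anomaly(data, id_cols = "patient_id")
flagged_data <- flag_top_anomalies(scored_data)

# Column metadata is stored as an attribute on the scored data
metadata <- attr(scored_data, "metadata")

# Add the reason_* columns explaining each flagged record
flagged_data <- calculate_feature_importance(flagged_data, metadata)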
# }