Usage
score_anomaly(
data,
method = "iforest",
contamination = 0.05,
ground_truth_col = NULL,
id_cols = NULL,
exclude_cols = NULL,
...
)
Arguments
- data
A data frame containing the data to be scored.
- method
Character string indicating the anomaly detection method. Options: "iforest" (Isolation Forest, default) or "lof" (Local Outlier Factor).
- contamination
Numeric value between 0 and 1 indicating the expected proportion of anomalies in the data. Default is 0.05 (5%).
- ground_truth_col
Character string naming a column in data that contains binary ground truth labels (0/1 or FALSE/TRUE) for known anomalies. If provided, benchmarking metrics will be calculated. Default is NULL.
- id_cols
Character vector of column names to exclude from scoring. Passed to prep_for_anomaly().
- exclude_cols
Character vector of additional columns to exclude. Passed to prep_for_anomaly().
- ...
Additional arguments passed to the underlying algorithm. For Isolation Forest: ntrees, sample_size, max_depth. For LOF: minPts (number of neighbors; the deprecated k is converted to minPts).
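The arguments above might be combined as in the following sketch. The toy data frame and its column names (record_id, x1, x2, is_anomaly) are made up for illustration, and the call is guarded so the snippet still runs when the package that provides score_anomaly() is not attached. How the label column is treated during scoring is not stated here, so the sketch does not rely on it.

```r
# Toy data: 200 ordinary rows plus 10 shifted rows labeled as anomalies.
# All column names below are hypothetical.
set.seed(42)
toy <- data.frame(
  record_id  = seq_len(210),                        # identifier, not a feature
  x1         = c(rnorm(200), rnorm(10, mean = 6)),
  x2         = c(rnorm(200), rnorm(10, mean = 6)),
  is_anomaly = rep(c(0L, 1L), times = c(200, 10))   # binary ground truth labels
)

# Guard: only call score_anomaly() if it is available in this session.
if (exists("score_anomaly", mode = "function")) {
  scored <- score_anomaly(
    toy,
    method           = "lof",
    contamination    = 10 / 210,
    ground_truth_col = "is_anomaly",
    id_cols          = "record_id",
    minPts           = 10            # forwarded to LOF through ...
  )
  print(head(scored$anomaly_score))
  print(attr(scored, "benchmark_metrics")$auc_roc)
}
```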
Value
A data frame with the original data plus an anomaly_score column. If ground_truth_col is provided, the result also carries a benchmark_metrics attribute containing:
- auc_roc
Area under the ROC curve.
- auc_pr
Area under the precision-recall curve.
- top_k_recall
List of recall values for the top K records (K = 10, 50, 100, 500).
- contamination_rate
Actual proportion of records flagged as anomalous.
Description
Calculates an anomaly score for each record using the Isolation Forest or Local Outlier Factor algorithm. Optionally evaluates performance against ground truth labels for benchmarking.
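To make the benchmark metrics more concrete, here is a minimal base-R sketch of how top-K recall and AUC-ROC are commonly computed from a score vector and binary labels. It is not taken from the package and its implementation may differ. The helper names top_k_recall() and auc_roc() are hypothetical, and the sketch assumes that a higher anomaly_score means a more anomalous record.

```r
# Assumption: higher score = more anomalous.
# Recall among the k highest-scoring records.
top_k_recall <- function(score, truth, k) {
  truth <- as.logical(truth)
  k <- min(k, length(score))
  top <- order(score, decreasing = TRUE)[seq_len(k)]  # indices of the k highest scores
  sum(truth[top]) / sum(truth)                        # share of true anomalies captured
}

# AUC-ROC via the rank-sum (Mann-Whitney) identity; ties receive average ranks.
auc_roc <- function(score, truth) {
  truth <- as.logical(truth)
  n_pos <- sum(truth)
  n_neg <- sum(!truth)
  (sum(rank(score)[truth]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

score <- c(0.9, 0.1, 0.8, 0.3, 0.2, 0.7)
truth <- c(1,   0,   0,   1,   0,   1)

top_k_recall(score, truth, k = 2)  # 1 of 3 anomalies in the top 2: 1/3
auc_roc(score, truth)              # 7 of 9 positive/negative pairs ranked correctly: 7/9
```

With real output, the same helpers could be applied to the anomaly_score column and the ground truth column of the returned data frame to sanity-check the reported values.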
