Prepare Data for Anomaly Detection — prep_for_anomaly

Preprocesses data for unsupervised anomaly detection by handling identifiers, scaling numerical features, and encoding categorical variables.

Usage

prep_for_anomaly(
  data,
  id_cols = NULL,
  exclude_cols = NULL,
  scale_method = "mad"
)

Arguments

data: A data frame containing the data to be preprocessed.
id_cols: Character vector of identifier column names to exclude from scoring (e.g., patient IDs, encounter IDs). If NULL, the function attempts to auto-detect ID columns from common naming patterns.
exclude_cols: Character vector of additional columns to exclude from scoring. Default is NULL.
scale_method: Character string indicating the scaling method for numerical variables. Options: "mad" (Median Absolute Deviation, default), "minmax" (min-max normalization), or "none" (no scaling).
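
To make the scale_method options concrete, here is a minimal base R sketch of what each option might compute for a single numeric column. The helper name sketch_scale is hypothetical and is not part of the package. Centering on the median, using the default constant of stats::mad (1.4826), and returning zeros for a constant column are all assumptions; the actual implementation may differ.

```r
# Hypothetical helper for illustration only; not the package's code.
sketch_scale <- function(x, method = c("mad", "minmax", "none")) {
  method <- match.arg(method)
  switch(method,
    mad = {
      # Robust z-score: centre on the median, divide by the MAD.
      # stats::mad() multiplies by 1.4826 by default.
      s <- stats::mad(x, na.rm = TRUE)
      if (is.na(s) || s == 0) {
        rep(0, length(x))  # assumed fallback for constant columns
      } else {
        (x - stats::median(x, na.rm = TRUE)) / s
      }
    },
    minmax = {
      # Rescale linearly so the minimum maps to 0 and the maximum to 1.
      r <- range(x, na.rm = TRUE)
      if (r[1] == r[2]) rep(0, length(x)) else (x - r[1]) / (r[2] - r[1])
    },
    none = x
  )
}

x <- c(1, 2, 3, 4, 100)
sketch_scale(x, "mad")     # the outlier 100 stays far from the bulk
sketch_scale(x, "minmax")  # the outlier compresses the other values toward 0
```

The comparison suggests why a median-based default can suit anomaly detection: a single extreme value barely changes the median or the MAD, whereas it determines the entire min-max range.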

Value

A list containing:

prepared_data: A numeric matrix ready for anomaly detection
metadata: A list with mapping information:

  original_data: The original data frame
  id_cols: Column names used as identifiers
  numeric_cols: Column names of numeric variables
  categorical_cols: Column names of categorical variables
  excluded_cols: Column names excluded from scoring
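
The shape of the return value can be illustrated with a rough base R stand-in. Everything below is a sketch under stated assumptions: sketch_prep is a hypothetical name, the ID pattern (names equal to "id" or ending in "_id") is a guess at what "common ID column patterns" might mean, one-hot encoding is assumed for categorical variables, scaling is omitted, and missing values are not handled.

```r
# Hypothetical stand-in that mimics the documented return structure.
# It is NOT the implementation of prep_for_anomaly().
sketch_prep <- function(data, id_cols = NULL, exclude_cols = NULL) {
  # Assumed auto-detection rule: "id" or any name ending in "_id".
  if (is.null(id_cols)) {
    id_cols <- grep("(^|_)id$", names(data), ignore.case = TRUE, value = TRUE)
  }

  # Split the remaining columns into numeric and categorical features.
  feature_cols <- setdiff(names(data), c(id_cols, exclude_cols))
  is_num <- vapply(data[feature_cols], is.numeric, logical(1))
  numeric_cols <- feature_cols[is_num]
  categorical_cols <- feature_cols[!is_num]

  # Numeric block, left unscaled in this sketch.
  num_mat <- as.matrix(data[numeric_cols])

  # Assumed encoding: one 0/1 indicator column per category level.
  cat_mats <- lapply(categorical_cols, function(col) {
    f <- factor(data[[col]])
    m <- outer(as.integer(f), seq_len(nlevels(f)), "==") + 0
    colnames(m) <- paste(col, levels(f), sep = "_")
    m
  })

  list(
    prepared_data = do.call(cbind, c(list(num_mat), cat_mats)),
    metadata = list(
      original_data = data,
      id_cols = id_cols,
      numeric_cols = numeric_cols,
      categorical_cols = categorical_cols,
      excluded_cols = exclude_cols
    )
  )
}

df <- data.frame(
  patient_id = 1:4,
  age = c(30, 40, 50, 60),
  gender = c("M", "F", "F", "M")
)
res <- sketch_prep(df)
colnames(res$prepared_data)
res$metadata$id_cols
```

Because the rows of prepared_data stay in the same order as original_data, scores computed on the matrix can be joined back to the identifier columns by row position.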

Examples

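# Simulated patient data; the seed makes the random draws reproducible
set.seed(42)
data <- data.frame(
  patient_id = 1:20,
  age = rnorm(20, 50, 15),
  cost = rnorm(20, 10000, 5000),
  gender = sample(c("M", "F"), 20, replace = TRUE)
)

# Treat patient_id as an identifier so it is excluded from scoring
prep_result <- prep_for_anomaly(data, id_cols = "patient_id")

# Inspect the numeric matrix that is passed on to anomaly detection
str(prep_result$prepared_data)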