
Runs the complete anomaly detection pipeline (preprocessing, scoring, flagging) and generates a PDF, HTML, or DOCX report with visualizations and a prioritized audit listing.

Usage

generate_audit_report(
  data,
  filename = "dq_audit_report",
  output_dir = NULL,
  output_format = "pdf",
  method = "iforest",
  contamination = 0.05,
  top_n = 100,
  id_cols = NULL,
  exclude_cols = NULL,
  ground_truth_col = NULL,
  ...
)

Arguments

data

A data frame containing the data to be audited.

filename

Character string for the output file (without extension). Default is "dq_audit_report".

output_dir

Character string specifying the directory for the output file. If NULL (default), the report is written to tempdir(). Specify a directory explicitly for production use, because the session temporary directory is removed when the R session ends.

output_format

Character string indicating the output format. Options: "pdf" (default), "html", or "docx" (for editable Word document). Note: PDF format provides the best color rendering for heat map tables. DOCX format is generated by first creating a PDF, then converting to DOCX.

method

Character string indicating the anomaly detection method. Passed to score_anomaly(). Default is "iforest".

contamination

Numeric value between 0 and 1. Passed to score_anomaly(). Default is 0.05.
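As an illustration of how a contamination rate is commonly applied, the base R sketch below flags the records whose score lies above the (1 - contamination) quantile. This is an assumption about the general technique, not a description of what score_anomaly() does internally, and the scores are simulated.

```r
set.seed(42)
contamination <- 0.05

# Simulated anomaly scores in [0, 1]; a hypothetical stand-in for real scores
scores <- runif(200)

# Flag records whose score exceeds the (1 - contamination) quantile
threshold <- quantile(scores, probs = 1 - contamination, names = FALSE)
is_anomaly <- scores > threshold

# Roughly contamination * length(scores) records end up flagged
sum(is_anomaly)
```

Raising contamination lowers the threshold and flags more records; lowering it does the opposite.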

top_n

Integer indicating the number of top anomalous records to display in the prioritized audit listing. Default is 100.
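The pipeline fragment visible in the Examples output sorts records by descending anomaly_score and keeps the first top_n rows. A base R sketch of that selection could look as follows; the data frame and its values are hypothetical, and only the column name anomaly_score is taken from this page.

```r
# Hypothetical scored data; the values are made up for illustration
scored <- data.frame(
  record_id = 1:6,
  anomaly_score = c(0.12, 0.91, 0.45, 0.78, 0.05, 0.66)
)
top_n <- 3

# Sort by descending score, then keep the first top_n rows
ranked <- scored[order(scored$anomaly_score, decreasing = TRUE), ]
audit_listing <- head(ranked, top_n)

audit_listing$record_id
```

If top_n exceeds the number of rows, head() simply returns all rows, so a large default such as 100 is harmless on small data sets.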

id_cols

Character vector of column names to exclude from scoring. Passed to prep_for_anomaly().

exclude_cols

Character vector of additional columns to exclude. Passed to prep_for_anomaly().
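To make the effect of id_cols and exclude_cols concrete, the sketch below drops both sets of columns from a small data frame before scoring. It is a plain base R approximation under the assumption that both arguments simply remove columns; prep_for_anomaly() may do more than this, and the example data are invented.

```r
# Hypothetical input data
data <- data.frame(
  patient_id = 1:5,
  age = c(34, 51, 47, 62, 29),
  cost = c(8200, 15100, 9900, 30500, 7600),
  notes = c("a", "b", "c", "d", "e")
)
id_cols <- "patient_id"
exclude_cols <- "notes"

# Columns left for scoring after removing identifiers and excluded columns
score_cols <- setdiff(names(data), c(id_cols, exclude_cols))
features <- data[, score_cols, drop = FALSE]

names(features)
```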

ground_truth_col

Character string naming a column with ground truth labels. If provided, benchmarking metrics will be included in the report.
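This page does not list which benchmarking metrics the report contains. As a hedged illustration of the kind of comparison that ground truth labels make possible, the sketch below computes precision and recall from hypothetical flags and labels in base R.

```r
# Hypothetical labels: 1 = known anomaly, 0 = normal
truth   <- c(1, 0, 0, 1, 0, 1, 0, 0)
# Hypothetical flags produced by a detector
flagged <- c(1, 0, 1, 1, 0, 0, 0, 0)

tp <- sum(flagged == 1 & truth == 1)  # flagged and truly anomalous
fp <- sum(flagged == 1 & truth == 0)  # flagged but normal
fn <- sum(flagged == 0 & truth == 1)  # anomalous but missed

precision <- tp / (tp + fp)  # share of flagged records that are real anomalies
recall    <- tp / (tp + fn)  # share of real anomalies that were flagged
```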

...

Additional arguments passed to score_anomaly().

Value

Invisibly returns the path to the generated report file.
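Because the path is returned invisibly, it is not printed at the console unless it is assigned or wrapped in print(). The sketch below shows that pattern with write_report_stub(), a hypothetical stand-in that only mimics the documented return behaviour and is not part of the package.

```r
# Hypothetical stand-in: writes a tiny HTML file and returns its path invisibly
write_report_stub <- function(filename, output_dir = tempdir()) {
  path <- file.path(output_dir, paste0(filename, ".html"))
  writeLines("<html><body>stub report</body></html>", path)
  invisible(path)
}

# The call itself prints nothing; assignment captures the path for later use
report_path <- write_report_stub("my_audit")
file.exists(report_path)
```

With the real function the same assignment, report_path <- generate_audit_report(...), makes the file location available for follow-up steps such as copying or opening the report.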

Examples

# \donttest{
set.seed(123)  # make the simulated data reproducible
data <- data.frame(
  patient_id = 1:50,
  age = rnorm(50, 50, 15),
  cost = rnorm(50, 10000, 5000),
  gender = sample(c("M", "F"), 50, replace = TRUE)
)
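# Generate HTML report (fastest, no LaTeX/pandoc required)
generate_audit_report(data, filename = "my_audit", output_format = "html",
                      output_dir = tempdir())
#> Scoring anomalies...
#> Flagging top anomalies...
#> Error in flagged_data %>% dplyr::arrange(dplyr::desc(.data$anomaly_score)) %>%
#>   dplyr::slice_head(n = top_n): could not find function "%>%"
# The call above failed because the pipe operator `%>%` was not on the search
# path. Attaching a package that exports it before the call, for example
# library(magrittr) or library(dplyr), is a likely workaround.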
# }