
Generate Automated Data Quality Audit Report
Source: R/generate_audit_report.R
Executes the complete anomaly detection pipeline (preprocessing, scoring, flagging) and generates a professional PDF, HTML, or DOCX report with visualizations and prioritized audit listings.
Usage
generate_audit_report(
  data,
  filename = "dq_audit_report",
  output_dir = NULL,
  output_format = "pdf",
  method = "iforest",
  contamination = 0.05,
  top_n = 100,
  id_cols = NULL,
  exclude_cols = NULL,
  ground_truth_col = NULL,
  ...
)
Arguments
- data
A data frame containing the data to be audited.
- filename
Character string for the output file (without extension). Default is "dq_audit_report".
- output_dir
Character string specifying the directory for the output file. If NULL (default), uses tempdir(). Users should specify a directory explicitly for production use.
- output_format
Character string indicating the output format. Options: "pdf" (default), "html", or "docx" (for editable Word document). Note: PDF format provides the best color rendering for heat map tables. DOCX format is generated by first creating a PDF, then converting to DOCX.
- method
Character string indicating the anomaly detection method. Passed to score_anomaly(). Default is "iforest".
- contamination
Numeric value between 0 and 1. Passed to score_anomaly(). Default is 0.05.
- top_n
Integer indicating the number of top anomalous records to display in the prioritized audit listing. Default is 100.
- id_cols
Character vector of column names to exclude from scoring. Passed to prep_for_anomaly().
- exclude_cols
Character vector of additional columns to exclude. Passed to prep_for_anomaly().
- ground_truth_col
Character string naming a column with ground truth labels. If provided, benchmarking metrics will be included in the report.
- ...
Additional arguments passed to score_anomaly().
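The interaction between contamination and top_n can be sketched in base R. This is an illustration only, not the package's implementation: the scores are simulated rather than produced by score_anomaly(), the column names anomaly_score and is_anomaly are used for illustration, and the sketch assumes that contamination is the expected share of anomalous rows, so that roughly that fraction of the highest-scoring records is flagged before the listing is cut to top_n.

```r
# Illustrative sketch only: scores are simulated, not output of score_anomaly().
set.seed(42)
scored <- data.frame(
  patient_id = 1:200,
  anomaly_score = runif(200)
)

contamination <- 0.05  # assumed meaning: expected share of anomalous rows
top_n <- 5             # number of rows kept in the audit listing

# Assumption: flag the highest-scoring `contamination` share of rows.
threshold <- quantile(scored$anomaly_score, probs = 1 - contamination)
scored$is_anomaly <- scored$anomaly_score >= threshold

# Prioritized listing: flagged rows, highest score first, truncated to top_n.
flagged <- scored[scored$is_anomaly, ]
audit_listing <- head(
  flagged[order(flagged$anomaly_score, decreasing = TRUE), ],
  top_n
)
print(audit_listing)
```

Under these assumptions, contamination controls how many records are flagged, while top_n only limits how many of the flagged records appear in the report listing.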
Examples
# \donttest{
data <- data.frame(
  patient_id = 1:50,
  age = rnorm(50, 50, 15),
  cost = rnorm(50, 10000, 5000),
  gender = sample(c("M", "F"), 50, replace = TRUE)
)
# Generate HTML report (fastest, no LaTeX/pandoc required)
generate_audit_report(data, filename = "my_audit", output_format = "html",
                      output_dir = tempdir())
#> Scoring anomalies...
#> Flagging top anomalies...
#> Error in flagged_data %>% dplyr::arrange(dplyr::desc(.data$anomaly_score)) %>% dplyr::slice_head(n = top_n): could not find function "%>%"
# }
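The error in the example output indicates that the pipe operator %>% was not visible when the function ran. A possible workaround, sketched here under the assumption that the magrittr package is installed and that no other import is missing, is to attach magrittr before calling the function so that %>% can be found on the search path. The call is guarded so that the snippet does nothing when generate_audit_report() is not available in the session.

```r
# Hypothetical workaround: attach magrittr so that `%>%` is on the search path.
# Assumes magrittr is installed; this does not fix the package itself.
if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)
}

data <- data.frame(
  patient_id = 1:50,
  age = rnorm(50, 50, 15),
  cost = rnorm(50, 10000, 5000),
  gender = sample(c("M", "F"), 50, replace = TRUE)
)

# Run only if the function is available in this session.
if (exists("generate_audit_report", mode = "function")) {
  generate_audit_report(data, filename = "my_audit", output_format = "html",
                        output_dir = tempdir())
}
```

A durable fix would presumably belong in the package itself, for example by importing the pipe in its NAMESPACE. Whether further errors follow once the pipe is found has not been verified here.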