Reads two historical snapshot records (by ID) from the DuckDB database and computes table-level, schema, and per-column statistical drift. Optionally renders an HTML drift report and/or a plain-text report.
Usage
compare_snapshots(
  dataset_name,
  snapshot_id_prev = NULL,
  snapshot_id_curr = NULL,
  db_path = NULL,
  config_dir = ".",
  report = TRUE,
  text_report = FALSE,
  open_report = interactive()
)
Arguments
- dataset_name
Character. Dataset name to compare.
- snapshot_id_prev
Integer or NULL. ID of the earlier snapshot. If NULL, defaults to the second-most-recent snapshot by ID.
- snapshot_id_curr
Integer or NULL. ID of the later snapshot. If NULL, defaults to the most-recent snapshot by ID.
- db_path
Character or NULL. Path to the DuckDB snapshot database. If NULL (the default), the path is read from snapshot_db in dqcheckr.yml.
- config_dir
Character. Path to the directory containing dqcheckr.yml. Used to read thresholds, report_output_dir, and (when db_path is NULL) snapshot_db.
- report
Logical. Whether to render an HTML drift report.
- text_report
Logical. Whether to also write a plain-text report alongside the HTML report. Defaults to FALSE.
- open_report
Logical. Whether to open the HTML report in the browser after rendering (only takes effect in interactive sessions).
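The default selection rule for snapshot_id_prev and snapshot_id_curr can be illustrated with a small base R sketch. This is not the package's implementation: the vector snapshot_ids and its values are hypothetical, and the sketch only assumes that "most recent by ID" means the largest ID.

```r
# Hypothetical snapshot IDs for one dataset (illustrative values only).
snapshot_ids <- c(3L, 1L, 7L, 5L)

# Sort descending so the largest (most recent) ID comes first.
ids_desc <- sort(snapshot_ids, decreasing = TRUE)

# At least two snapshots are needed for a comparison.
stopifnot(length(ids_desc) >= 2)

snapshot_id_curr <- ids_desc[1]  # most recent snapshot by ID
snapshot_id_prev <- ids_desc[2]  # second-most-recent snapshot by ID
```

Passing both IDs explicitly bypasses this default, which may be useful when comparing snapshots that are not adjacent.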
Value
Invisibly, a named list with elements dataset_name,
snap_prev, snap_curr, table_drift,
schema_changes, missing_rate_changes,
non_numeric_changes, mean_shifts, distinct_changes.
Examples
# \donttest{
tmp <- tempdir()
db_path <- file.path(tmp, "snap.duckdb")
cfg_yml <- file.path(tmp, "dqcheckr.yml")
ds_yml <- file.path(tmp, "starwars_csv.yml")
dat <- system.file("demonstrations/data/starwars.csv", package = "dqcheckr")
writeLines(c(
  paste0('snapshot_db: "', db_path, '"'),
  paste0('report_output_dir: "', tmp, '"'),
  'default_rules:',
  ' max_missing_rate: 0.60',
  ' min_row_count: 80'
), cfg_yml)
writeLines(c(
  'dataset_name: "starwars_csv"',
  paste0('current_file: "', dat, '"'),
  'format: csv',
  'encoding: "UTF-8"',
  'delimiter: ","'
), ds_yml)
run_dq_check("starwars_csv", config_dir = tmp, open_report = FALSE)
#> [dqcheckr] starwars_csv: FAIL - 0 warning(s), 2 failure(s). Report: /private/tmp/claude-501/RtmpTXbJKh/starwars_csv_20260601_201401.html
run_dq_check("starwars_csv", config_dir = tmp, open_report = FALSE)
#> [dqcheckr] starwars_csv: FAIL - 0 warning(s), 2 failure(s). Report: /private/tmp/claude-501/RtmpTXbJKh/starwars_csv_20260601_201403.html
drift <- compare_snapshots("starwars_csv", config_dir = tmp, report = FALSE)
#> [dqcheckr] drift: starwars_csv snapshot #1 vs #2
names(drift)
#> [1] "dataset_name" "snap_prev" "snap_curr"
#> [4] "table_drift" "schema_changes" "missing_rate_changes"
#> [7] "non_numeric_changes" "mean_shifts" "distinct_changes"
# }