Benchmark Workflows and Guardrails
Use benchmark workflows to validate throughput/latency and enforce regression guardrails across default and feature-specific configs.
When to use
- You need fast smoke checks before wider experimentation.
- You want standardized performance baselines by preset/suite.
- You need CI gating with warn/fail regression thresholds.
Baseline workflows
Quick smoke and broader standard runs:
dagzoo benchmark --suite smoke --preset cpu --out-dir benchmarks/results/smoke_cpu
dagzoo benchmark --suite standard --preset cpu --out-dir benchmarks/results/standard_cpu
Diagnostics-enabled benchmark:
dagzoo benchmark \
--suite smoke \
--preset cpu \
--diagnostics \
--out-dir benchmarks/results/smoke_cpu_diag
Device override note:
- --device only applies when a benchmark run selects exactly one --preset.
- Multi-preset runs must encode the device choice in each preset/config; the CLI now rejects an ambiguous shared --device override.
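The override rule above can be sketched as a small validation check. This is a hypothetical helper illustrating the stated behavior, not the actual CLI implementation:

```python
def validate_device_override(presets, device_override):
    """Sketch of the rule: a shared --device override is only unambiguous
    when exactly one preset is selected. Hypothetical helper, not the
    actual CLI code."""
    if device_override is None:
        # No shared override: each preset/config encodes its own device.
        return True
    if len(presets) == 1:
        # Single-preset run: the override clearly targets that preset.
        return True
    raise ValueError(
        "--device is ambiguous for multi-preset runs; "
        "encode the device in each preset/config instead"
    )

print(validate_device_override(["cpu"], "cuda"))  # single preset: override accepted
```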
Feature-specific guardrail runs
dagzoo benchmark \
--config configs/preset_filter_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--hardware-policy none \
--no-memory \
--out-dir benchmarks/results/smoke_filter
dagzoo benchmark \
--config configs/preset_missingness_mar.yaml \
--preset custom \
--suite smoke \
--no-memory \
--out-dir benchmarks/results/smoke_missing_mar
dagzoo benchmark \
--config configs/preset_shift_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--no-memory \
--out-dir benchmarks/results/smoke_shift_guardrails
dagzoo benchmark \
--config configs/preset_noise_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--no-memory \
--out-dir benchmarks/results/smoke_noise_guardrails
Filter-enabled benchmark workflow
Use the filter smoke preset when you want one canonical CPU benchmark run that surfaces filter-stage throughput, accepted-corpus throughput, and acceptance yield together:
dagzoo benchmark \
--config configs/preset_filter_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--hardware-policy none \
--no-memory \
--out-dir benchmarks/results/smoke_filter
Inspect these summary.json preset-result fields first:
- filter_datasets_per_minute
- filter_accepted_datasets_per_minute
- filter_accepted_datasets_measured
- filter_rejected_datasets_measured
- filter_acceptance_rate_dataset_level
- filter_rejection_rate_dataset_level
- filter_rejection_rate_attempt_level
- filter_retry_dataset_rate
The CLI preset line prints the same headline values as filter/min,
filter_accepted/min, filter_accept_dataset_pct, and
filter_reject_dataset_pct.
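As a sketch of how the dataset-level rates relate to the measured counts, the snippet below recomputes the headline percentages from a hypothetical preset-result fragment (the exact summary.json layout is an assumption):

```python
import json

# Hypothetical preset-result fragment; the real summary.json layout may differ.
preset_result = json.loads("""
{
  "filter_datasets_per_minute": 40.0,
  "filter_accepted_datasets_per_minute": 36.0,
  "filter_accepted_datasets_measured": 18,
  "filter_rejected_datasets_measured": 2
}
""")

accepted = preset_result["filter_accepted_datasets_measured"]
rejected = preset_result["filter_rejected_datasets_measured"]
total = accepted + rejected

# Dataset-level acceptance/rejection percentages, as echoed on the CLI
# preset line (filter_accept_dataset_pct / filter_reject_dataset_pct).
accept_pct = 100.0 * accepted / total
reject_pct = 100.0 * rejected / total
print(f"filter_accept_dataset_pct={accept_pct:.1f}")
print(f"filter_reject_dataset_pct={reject_pct:.1f}")
```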
Diversity audit workflow
Use diversity-audit when you need a baseline-vs-variant comparison of the
accepted corpus, not just benchmark throughput:
dagzoo diversity-audit \
--baseline-config configs/default.yaml \
--variant-config configs/preset_shift_benchmark_smoke.yaml \
--suite smoke \
--num-datasets 10 \
--warmup 0 \
--device cpu \
--out-dir benchmarks/results/diversity_audit_shift
Inspect these summary.json fields first:
- comparisons[*].diversity_status
- comparisons[*].diversity_composite_shift_pct
- comparisons[*].datasets_per_minute_delta_pct
- comparisons[*].filter_accepted_datasets_per_minute_delta_pct
The rewritten audit persists summary.json and summary.md as the canonical
equivalence/local-overlap and cross-run diversity outputs.
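A minimal sketch of reading those comparison fields, assuming comparisons is a list of objects with the keys shown above (the nesting and example values are assumptions):

```python
import json

# Hypothetical summary.json fragment for one baseline-vs-variant comparison.
summary = json.loads("""
{
  "comparisons": [
    {
      "diversity_status": "pass",
      "diversity_composite_shift_pct": 3.2,
      "datasets_per_minute_delta_pct": -4.5,
      "filter_accepted_datasets_per_minute_delta_pct": -6.1
    }
  ]
}
""")

for comparison in summary["comparisons"]:
    status = comparison["diversity_status"]
    accepted_delta = comparison["filter_accepted_datasets_per_minute_delta_pct"]
    # Surface variants whose diversity passes but whose accepted-corpus
    # throughput regressed relative to the baseline.
    if status == "pass" and accepted_delta < 0:
        print(f"diversity pass, accepted throughput delta {accepted_delta:+.1f}%")
```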
Filter calibration workflow
Use filter-calibration when you want to sweep filter thresholds on one
filter-enabled config and rank accepted-corpus throughput against diversity
shift:
dagzoo filter-calibration \
--config configs/preset_filter_benchmark_smoke.yaml \
--suite smoke \
--device cpu \
--out-dir benchmarks/results/filter_calibration_smoke
Inspect these summary.json fields first:
- summary.best_overall_threshold_requested
- summary.best_passing_threshold_requested
- summary.best_overall_diversity_status
- candidates[*].filter_accepted_datasets_per_minute
- candidates[*].filter_acceptance_rate_dataset_level
- candidates[*].diversity_status
- candidates[*].diversity_composite_shift_pct
Like the rewritten diversity audit, filter calibration persists only
summary.json and summary.md.
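As a sketch of the ranking idea (best passing threshold by accepted-corpus throughput), assuming per-candidate objects carry the fields listed above; the per-candidate threshold_requested key is inferred from the summary.best_*_threshold_requested field names and is an assumption:

```python
import json

# Hypothetical candidate list; values and the "threshold_requested" key
# are illustrative assumptions, not the tool's actual output.
summary = json.loads("""
{
  "candidates": [
    {"threshold_requested": 0.5, "diversity_status": "pass",
     "filter_accepted_datasets_per_minute": 30.0},
    {"threshold_requested": 0.7, "diversity_status": "fail",
     "filter_accepted_datasets_per_minute": 42.0},
    {"threshold_requested": 0.6, "diversity_status": "pass",
     "filter_accepted_datasets_per_minute": 35.0}
  ]
}
""")

# Best passing threshold: highest accepted-corpus throughput among the
# candidates whose diversity status passes.
passing = [c for c in summary["candidates"] if c["diversity_status"] == "pass"]
best_passing = max(passing, key=lambda c: c["filter_accepted_datasets_per_minute"])
print(best_passing["threshold_requested"])
```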
Regression gating
For CI-like checks:
dagzoo benchmark \
--config configs/preset_shift_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--warn-threshold-pct 10 \
--fail-threshold-pct 20 \
--fail-on-regression \
--hardware-policy none \
--no-memory \
--out-dir benchmarks/results/ci_smoke_shift_local
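The warn/fail semantics can be sketched as follows. This is a hypothetical helper mirroring the thresholds in the command above, not the CLI's internals:

```python
def classify_regression(baseline_dpm, current_dpm, warn_pct=10.0, fail_pct=20.0):
    """Sketch of the warn/fail gating rule with the thresholds used above.

    A throughput drop of at least fail_pct is a failure (nonzero exit
    when --fail-on-regression is set), at least warn_pct is a warning,
    and anything smaller passes.
    """
    drop_pct = 100.0 * (baseline_dpm - current_dpm) / baseline_dpm
    if drop_pct >= fail_pct:
        return "fail"
    if drop_pct >= warn_pct:
        return "warn"
    return "pass"

print(classify_regression(100.0, 85.0))  # 15% drop -> warn
```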
What to inspect
When present in a run summary, inspect:
- missingness_guardrails
- lineage_guardrails
- shift_guardrails
- noise_guardrails
Also review throughput/latency aggregates for preset/suite trends.
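Since only the guardrail sections relevant to a run's config appear in its summary, a minimal sketch for collecting whichever are present (the summary fragment and its nesting are assumptions):

```python
import json

# The guardrail sections listed above; only those relevant to the run's
# config appear in its summary.
GUARDRAIL_KEYS = (
    "missingness_guardrails",
    "lineage_guardrails",
    "shift_guardrails",
    "noise_guardrails",
)

# Hypothetical run summary with a single guardrail section present.
summary = json.loads(
    '{"shift_guardrails": {"status": "pass"}, "datasets_per_minute": 40.0}'
)

present = {key: summary[key] for key in GUARDRAIL_KEYS if key in summary}
for key, value in present.items():
    print(key, value)
```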
Related docs
- Workflow hub: usage-guide.md
- Output contract: output-format.md
- Noise workflows: noise.md