# Diagnostics

Runtime observability metrics and diagnostic outputs.
Use diagnostics when you want per-dataset observability artifacts to verify coverage, spot drift, and debug generation behavior.
## When to use
- You need per-dataset records in shard `metadata.ndjson` and summary-level metric coverage.
- You are validating whether presets or CLI overrides hit expected ranges.
- You want benchmark runs to include richer context for guardrail triage.
## Quick start
Enable diagnostics directly:
```shell
dagzoo generate \
  --config configs/default.yaml \
  --num-datasets 50 \
  --diagnostics \
  --out data/run_diag
```
Use the discoverable preset:
```shell
dagzoo generate \
  --config configs/preset_diagnostics_on.yaml \
  --num-datasets 25 \
  --diagnostics \
  --out data/run_diag_preset
```
## Key options
- `--diagnostics`: emit diagnostics artifacts for generated datasets.
- `--out`: output directory containing datasets and diagnostic payloads.
Diagnostics also works with the `benchmark` command:
```shell
dagzoo benchmark \
  --suite smoke \
  --preset cpu \
  --diagnostics \
  --out-dir benchmarks/results/smoke_cpu_diag
```
## What to inspect
- Per-dataset `metadata.ndjson` records for realized generation parameters.
- Coverage summaries for meta-features and enabled observability metrics.
- Benchmark summary guardrail sections that include diagnostics context.
Exact output contracts are documented in output-format.md.
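Because `metadata.ndjson` is newline-delimited JSON (one record per line), it can be inspected with a few lines of Python. Below is a minimal sketch; the field names in the sample (`dataset_id`, `num_nodes`) are illustrative assumptions, not the documented contract — see output-format.md for the actual schema:

```python
import json
from pathlib import Path

def load_records(path):
    """Parse an NDJSON file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Demo with a tiny synthetic file; a real run writes its records under --out.
sample = Path("metadata_sample.ndjson")
sample.write_text('{"dataset_id": 0, "num_nodes": 12}\n'
                  '{"dataset_id": 1, "num_nodes": 7}\n',
                  encoding="utf-8")
records = load_records(sample)
print(len(records))              # 2
print(records[0]["dataset_id"])  # 0
```

From here it is straightforward to histogram any realized parameter across datasets to check coverage.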
## Diagnostics target bands
Diagnostics supports an optional `diagnostics.meta_feature_targets` setting to annotate
coverage summaries with in-band counts/fractions for selected metrics.
Target bands do not alter generation; they are reporting metadata only.
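To make "in-band counts/fractions" concrete, here is a hedged sketch of the computation, assuming a target band is an inclusive `[low, high]` interval over a metric's realized values (the actual `diagnostics.meta_feature_targets` schema is defined by the tool, not by this example):

```python
def in_band_summary(values, low, high):
    """Count how many realized metric values fall inside an inclusive band."""
    count = sum(1 for v in values if low <= v <= high)
    return {
        "in_band_count": count,
        "in_band_fraction": count / len(values) if values else 0.0,
    }

# Example: hypothetical average-degree values across five generated datasets,
# checked against a target band of [2.0, 3.0].
summary = in_band_summary([1.8, 2.1, 2.4, 3.0, 3.7], low=2.0, high=3.0)
print(summary)  # {'in_band_count': 3, 'in_band_fraction': 0.6}
```

Since bands are reporting metadata only, an out-of-band fraction signals that you should adjust the generation config or preset, not that generation itself behaved differently.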
## Related docs
- Workflow hub: usage-guide.md
- System terminology: how-it-works.md