Robustness Stress Profiles
stress.profile selects named harder-generation regimes that resolve onto the
normal generator config. Use it when you want a reproducible stress envelope
without hand-authoring a large custom YAML for each run.
This surface is different from the curated recipe catalog:
- recipe:<name> remains the stable public adoption layer for named reference packs under recipes/.
- stress.profile=<name> is the advanced YAML control for selecting one named stress regime inside a repo-local config.
- Recipe entries labeled "stress profile" are ready-made examples of the same kind of harder-generation workflow.
Use robustness stress profiles when you want reproducible harder-task or anti-memorization slices while keeping the current missingness, shift, noise, and diagnostics surfaces intact.
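Conceptually, a stress profile is a named bundle of overrides layered onto the normal generator config. The sketch below illustrates that idea with a plain recursive dict merge. It is an assumption about the mechanism, not dagzoo's implementation: the resolve_profile and deep_merge helpers, the STRESS_PROFILES table, and every config key and value in it are hypothetical placeholders.

```python
from copy import deepcopy
from typing import Any

# Hypothetical profile table. The profile name mirrors a shipped profile, but
# the override keys and values are illustrative placeholders, not dagzoo's schema.
STRESS_PROFILES: dict[str, dict[str, Any]] = {
    "anti_memorization_piecewise_classification_graph_breadth_slice_v1": {
        "graph": {"n_nodes_max": 64},
        "features": {"n_features_max": 128},
    },
}


def deep_merge(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where overrides win, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_profile(config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical: apply the overrides named by config['stress']['profile']."""
    name = config.get("stress", {}).get("profile")
    if name is None:
        return deepcopy(config)  # no profile selected: config passes through unchanged
    if name not in STRESS_PROFILES:
        raise KeyError(f"unknown stress profile: {name}")
    return deep_merge(config, STRESS_PROFILES[name])


base = {
    "stress": {"profile": "anti_memorization_piecewise_classification_graph_breadth_slice_v1"},
    "graph": {"n_nodes_min": 4, "n_nodes_max": 16},
    "features": {"n_features_max": 32},
}
resolved = resolve_profile(base)
print(resolved["graph"])  # {'n_nodes_min': 4, 'n_nodes_max': 64}
```

Untouched keys (n_nodes_min here) survive from the base config, which is the property that lets one profile stay reproducible without rewriting the whole YAML.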
When to use
Why it matters for your prior
- You want one named harder regime that downstream model comparisons can hold fixed across runs.
- You need stronger relationship-structure or mechanism-composition pressure than the default baseline without opening a parallel config branch.
- You want diagnostics and diversity-audit evidence that the regime differs from baseline in intended directions.
Operational triggers
- You want a reproducible anti-memorization classification slice.
- You need one graph-breadth-heavy slice for relationship-structure audits.
- You want a compositional mechanism slice that pushes family mixing harder than the default baseline.
- You want a categorical/cardinality-heavy lane without hand-authoring the converter and cardinality envelope.
- You want one hybrid slice that combines graph/source-shape reuse with compositional matrix/kernel reuse.
- You want one robustness-composition slice that explicitly couples missingness, shift, and noise with the harder mechanism mix.
Shipped core profiles
anti_memorization_piecewise_classification_slice_v1
- Intended regime: default classification envelope with the anti_memorization_piecewise_v1 steering preset turned on.
- Main lever composition:
  - default classification envelope
  - steering-driven missingness/graph/noise progression
  - no extra graph-breadth or compositional mechanism bias
Generate smoke run:
dagzoo generate \
--config configs/preset_stress_classification_slice_generate_smoke.yaml \
--num-datasets 25 \
--out data/run_stress_classification_slice_smoke
Benchmark smoke run:
dagzoo benchmark \
--config configs/preset_stress_classification_slice_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--diagnostics \
--no-memory \
--out-dir benchmarks/results/smoke_stress_classification_slice
Inspect first:
- coverage_summary.json
- metrics.graph_depth_ratio
- metrics.graph_target_depth_ratio
- metrics.graph_reachability_ratio
- metrics.graph_target_ancestor_fraction
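The metric names above read like dotted paths into coverage_summary.json. Assuming the file nests them under a top-level metrics object (inferred from the names, not a documented schema), a small stdlib helper can pull just those values out for a quick look. The lookup and inspect_first helpers and the file location are hypothetical.

```python
import json
from pathlib import Path
from typing import Any

# Keys listed under "Inspect first" for this profile, read as dotted paths.
INSPECT_FIRST: tuple[str, ...] = (
    "metrics.graph_depth_ratio",
    "metrics.graph_target_depth_ratio",
    "metrics.graph_reachability_ratio",
    "metrics.graph_target_ancestor_fraction",
)


def lookup(doc: Any, dotted: str) -> Any:
    """Follow a dotted path such as 'metrics.graph_depth_ratio'; None if absent."""
    node: Any = doc
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def inspect_first(summary_path: Path, keys: tuple[str, ...] = INSPECT_FIRST) -> dict[str, Any]:
    """Load one coverage summary and return only the keys worth checking first."""
    doc = json.loads(summary_path.read_text())
    return {key: lookup(doc, key) for key in keys}


if __name__ == "__main__":
    # Assumed location: the --out directory of the generate smoke run above.
    path = Path("data/run_stress_classification_slice_smoke/coverage_summary.json")
    if path.exists():
        for key, value in inspect_first(path).items():
            print(f"{key}: {value}")
```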
anti_memorization_piecewise_classification_graph_breadth_slice_v1
- Intended regime: broader graph/topology slice that increases node count and target-ancestor breadth pressure while retaining the anti-memorization steering path.
- Main lever composition:
- larger node envelope
- stricter target relevance/indegree floor
- wider emitted feature envelope
Generate smoke run:
dagzoo generate \
--config configs/preset_stress_graph_breadth_generate_smoke.yaml \
--num-datasets 25 \
--out data/run_stress_graph_breadth_smoke
Benchmark smoke run:
dagzoo benchmark \
--config configs/preset_stress_graph_breadth_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--diagnostics \
--no-memory \
--out-dir benchmarks/results/smoke_stress_graph_breadth
Inspect first:
- coverage_summary.json
- metrics.graph_indegree_std
- metrics.graph_outdegree_std
- metrics.graph_target_depth_ratio
- metrics.graph_ancestor_overlap_mean
- metrics.graph_target_ancestor_fraction
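For intuition about what metrics.graph_indegree_std and metrics.graph_outdegree_std are likely to capture, here is one plausible definition: the population standard deviation of per-node in-degree and out-degree over a DAG's edge list. This is an assumed definition for illustration; dagzoo's exact formula (population vs. sample std, which nodes are counted) is not stated here, and degree_std is a hypothetical helper.

```python
from statistics import pstdev


def degree_std(num_nodes: int, edges: list[tuple[int, int]]) -> tuple[float, float]:
    """Population std of in-degree and out-degree for nodes 0..num_nodes-1.

    Assumed definition for illustration only.
    """
    indeg = [0] * num_nodes
    outdeg = [0] * num_nodes
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return pstdev(indeg), pstdev(outdeg)


# A chain spreads incoming edges evenly; a hub concentrates them on one node.
chain = [(0, 1), (1, 2), (2, 3)]
hub = [(0, 3), (1, 3), (2, 3)]
print(degree_std(4, chain))
print(degree_std(4, hub))
```

Both graphs have the same out-degree spread, but the hub's in-degree std is about three times larger, which is the kind of structural difference these metrics are meant to surface.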
anti_memorization_piecewise_classification_compositional_slice_v1
- Intended regime: compositional mechanism slice that biases toward piecewise, product, gp, and tree uptake while retaining the same anti-memorization steering path.
- Main lever composition:
  - softened but still non-default mechanism family mix centered on piecewise, product, gp, and tree
  - broader feature and categorical-cardinality envelope than the baseline
  - raised graph floor plus a light target relevance floor instead of the stricter structural gating used by the graph-breadth slice
  - tuned grouped batch target of 8_000_000 cells for better CPU throughput on this heavier compositional regime
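This page only states the 8_000_000-cell target. Assuming a cell means one row by one emitted column, and that grouping packs equally shaped datasets up to that budget (both assumptions about internals), the arithmetic works out as follows. datasets_per_group is a hypothetical helper.

```python
GROUPED_BATCH_TARGET_CELLS = 8_000_000  # value stated for the compositional slice


def datasets_per_group(n_rows: int, n_cols: int, target_cells: int = GROUPED_BATCH_TARGET_CELLS) -> int:
    """How many equally shaped datasets fit in one group under the cell budget."""
    cells_per_dataset = n_rows * n_cols
    if cells_per_dataset <= 0:
        raise ValueError("n_rows and n_cols must be positive")
    # Assumption: a group always admits at least one dataset, even an oversized one.
    return max(1, target_cells // cells_per_dataset)


print(datasets_per_group(1_024, 64))   # 65_536 cells each -> 122
print(datasets_per_group(4_096, 256))  # 1_048_576 cells each -> 7
```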
Generate smoke run:
dagzoo generate \
--config configs/preset_stress_compositional_generate_smoke.yaml \
--num-datasets 25 \
--out data/run_stress_compositional_smoke
Benchmark smoke run:
dagzoo benchmark \
--config configs/preset_stress_compositional_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--diagnostics \
--no-memory \
--out-dir benchmarks/results/smoke_stress_compositional
Inspect first:
- coverage_summary.json
- mechanism_family_summary
- metrics.mechanism_family_cooccurrence_ratio
- metrics.graph_ancestor_overlap_mean
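metrics.mechanism_family_cooccurrence_ratio summarizes how often mechanism families appear together. One plausible reading, assumed here for illustration and not taken from dagzoo's source, is the fraction of generated datasets whose mechanisms span at least two distinct families. The per-dataset family lists are made-up inputs, and cooccurrence_ratio and pair_counts are hypothetical helpers.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_ratio(per_dataset_families: list[list[str]]) -> float:
    """Fraction of datasets that use two or more distinct mechanism families."""
    if not per_dataset_families:
        return 0.0
    mixed = sum(1 for fams in per_dataset_families if len(set(fams)) >= 2)
    return mixed / len(per_dataset_families)


def pair_counts(per_dataset_families: list[list[str]]) -> Counter:
    """Count how many datasets contain each unordered pair of families."""
    counts: Counter = Counter()
    for fams in per_dataset_families:
        for pair in combinations(sorted(set(fams)), 2):
            counts[pair] += 1
    return counts


datasets = [
    ["piecewise", "product"],
    ["gp", "gp"],
    ["tree", "piecewise", "gp"],
    ["product"],
]
print(cooccurrence_ratio(datasets))  # 0.5
print(pair_counts(datasets).most_common(2))
```

The pair table is useful alongside the single ratio because it shows which of the targeted families (piecewise, product, gp, tree) actually mix, not just that mixing happens.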
Current internal evaluation profiles
These profiles are available through repo-local configs and stress.profile,
but they are still evaluation lanes rather than promoted public recipes. Keep
them audit-gated, and compare them against the baseline and the current
compositional slice before treating any of them as a broader adoption candidate.
anti_memorization_piecewise_classification_categorical_cardinality_slice_v1
- Intended regime: categorical-heavy harder slice with broader correlated categorical-cardinality regimes than the default baseline.
- Main lever composition:
- raised categorical ratio floor and wider class envelope
- larger categorical cardinality ceiling tied to the existing high-cardinality workflow
- anti-memorization steering retained as the base harder slice control
Generate smoke run:
dagzoo generate \
--config configs/preset_stress_categorical_cardinality_generate_smoke.yaml \
--num-datasets 25 \
--out data/run_stress_categorical_cardinality_smoke
Benchmark smoke run:
dagzoo benchmark \
--config configs/preset_stress_categorical_cardinality_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--diagnostics \
--no-memory \
--out-dir benchmarks/results/smoke_stress_categorical_cardinality
Inspect first:
- coverage_summary.json
- parity_surface_summary.categorical_cardinality
- parity_surface_summary.converter_method_counts
- metrics.cat_cardinality_mean
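metrics.cat_cardinality_mean reads naturally as the average number of distinct levels across categorical columns. A stdlib sketch under that assumption follows; the column data and the cat_cardinality_mean helper are hypothetical, and dagzoo may aggregate differently (for example, per dataset before averaging).

```python
from statistics import mean


def cat_cardinality_mean(columns: dict[str, list[object]]) -> float:
    """Mean count of distinct non-missing values per categorical column."""
    cardinalities = [len({v for v in values if v is not None}) for values in columns.values()]
    return mean(cardinalities) if cardinalities else 0.0


columns = {
    "color": ["red", "blue", "red", None],            # 2 levels (None ignored)
    "zip_prefix": ["941", "100", "606", "941", "021"],  # 4 levels
    "flag": [0, 1, 1, 0],                              # 2 levels
}
print(cat_cardinality_mean(columns))  # (2 + 4 + 2) / 3
```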
anti_memorization_piecewise_classification_hybrid_slice_v1
- Intended regime: hybrid structural+compositional slice that combines the graph/source-shape reuse pressure from the graph-breadth lane with the matrix/kernel/root reuse from the compositional lane.
- Main lever composition:
- broader node and feature envelope with graph gating enabled
- parent-arity/source-shape policy reuse
- correlated matrix/kernel/root-base-kind reuse
Generate smoke run:
dagzoo generate \
--config configs/preset_stress_hybrid_generate_smoke.yaml \
--num-datasets 25 \
--out data/run_stress_hybrid_smoke
Benchmark smoke run:
dagzoo benchmark \
--config configs/preset_stress_hybrid_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--diagnostics \
--no-memory \
--out-dir benchmarks/results/smoke_stress_hybrid
Inspect first:
- coverage_summary.json
- parity_surface_summary.matrix_kind_counts
- parity_surface_summary.root_base_kind_counts
- parity_surface_summary.source_shape_policy_counts
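The parity_surface_summary entries listed above are count tables (matrix kinds, root base kinds, source-shape policies). To check whether the hybrid slice actually moves those distributions relative to baseline, one option is the total variation distance between normalized counts. This is a generic statistic chosen for illustration, not a metric dagzoo reports, and the category names and counts are placeholders.

```python
def normalize(counts: dict[str, int]) -> dict[str, float]:
    """Turn raw counts into proportions; empty or all-zero tables give {}."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: v / total for k, v in counts.items()}


def total_variation(a: dict[str, int], b: dict[str, int]) -> float:
    """0.0 = identical proportions, 1.0 = disjoint support."""
    pa, pb = normalize(a), normalize(b)
    keys = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in keys)


baseline = {"kind_a": 80, "kind_b": 20}
variant = {"kind_a": 50, "kind_b": 30, "kind_c": 20}
print(total_variation(baseline, variant))  # approximately 0.3
```

Because the distance is computed on proportions, it stays comparable even when baseline and variant runs produce different numbers of datasets.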
anti_memorization_piecewise_classification_robustness_composition_slice_v1
- Intended regime: explicit robustness composition that couples missingness, shift, and mixed noise with the anti-memorization harder slice instead of leaving those levers as separate recipes.
- Main lever composition:
- structured MNAR missingness
- mixed graph/variance drift
- explicit Gaussian/Laplace/Student-t residual mixture
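For intuition about two of these levers, the sketch below draws residuals from a Gaussian/Laplace/Student-t mixture and applies a simple MNAR (missing not at random) mask, where a value's chance of going missing depends on the value itself. The mixture weights, scale, degrees of freedom, and logistic missingness rule are illustrative assumptions, not the profile's actual parameters, and the helper names are hypothetical.

```python
import math
import random


def sample_residual(rng: random.Random, weights=(0.5, 0.3, 0.2), scale=1.0, df=3.0) -> float:
    """Draw one residual from a Gaussian/Laplace/Student-t mixture."""
    component = rng.choices(("gaussian", "laplace", "student_t"), weights=weights)[0]
    if component == "gaussian":
        return rng.gauss(0.0, scale)
    if component == "laplace":
        # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1).
        return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    # Student-t: standard normal over sqrt(chi-square / df),
    # where chi-square(df) is Gamma(shape=df/2, scale=2).
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return scale * z / math.sqrt(chi2 / df)


def mnar_mask(values: list[float], rng: random.Random, steepness=2.0, offset=1.0) -> list[bool]:
    """True = missing; larger values are more likely to be masked (self-masking MNAR)."""
    return [rng.random() < 1.0 / (1.0 + math.exp(-steepness * (v - offset))) for v in values]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
y = [xi + sample_residual(rng) for xi in x]  # noisy target with heavy-tailed residuals
mask = mnar_mask(x, rng)
observed = [xi for xi, m in zip(x, mask) if not m]
print(f"missing rate: {sum(mask) / len(mask):.3f}")
print(f"mean of all x: {sum(x) / len(x):.3f}, mean of observed x: {sum(observed) / len(observed):.3f}")
```

The observed mean drops below the full-sample mean because high values are preferentially removed; that bias is what separates MNAR from missing-completely-at-random masking.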
Generate smoke run:
dagzoo generate \
--config configs/preset_stress_robustness_composition_generate_smoke.yaml \
--num-datasets 25 \
--out data/run_stress_robustness_composition_smoke
Benchmark smoke run:
dagzoo benchmark \
--config configs/preset_stress_robustness_composition_benchmark_smoke.yaml \
--preset custom \
--suite smoke \
--diagnostics \
--no-memory \
--out-dir benchmarks/results/smoke_stress_robustness_composition
Inspect first:
- coverage_summary.json
- missingness
- shift
- noise_distribution
- parity_surface_summary.gp_variant_counts
Baseline comparison workflow
To compare a stress profile against the baseline, use dagzoo diversity-audit
with configs/default.yaml as the baseline and one stress benchmark preset as
the variant:
dagzoo diversity-audit \
--baseline-config configs/default.yaml \
--variant-config configs/preset_stress_graph_breadth_benchmark_smoke.yaml \
--suite smoke \
--num-datasets 10 \
--warmup 0 \
--device cpu \
--out-dir benchmarks/results/diversity_audit_stress_graph_breadth
Swap --variant-config to any of the other stress benchmark presets for the
other profiles. Inspect summary.json and summary.md first, then inspect
coverage artifacts from a diagnostics-enabled benchmark run when you need the
relationship-structure or mechanism-family metrics behind the shift.
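Running the audit for every stress benchmark preset can be scripted. The sketch below reuses only the flags shown in the diversity-audit command above and the benchmark smoke config paths listed on this page; the out-dir naming pattern and the audit_command helper are assumptions. It prints the commands by default and executes them with subprocess.run only when passed --run.

```python
import shlex
import subprocess
import sys

# Benchmark smoke presets listed on this page, keyed by a short lane name.
PRESETS = {
    "classification_slice": "configs/preset_stress_classification_slice_benchmark_smoke.yaml",
    "graph_breadth": "configs/preset_stress_graph_breadth_benchmark_smoke.yaml",
    "compositional": "configs/preset_stress_compositional_benchmark_smoke.yaml",
    "categorical_cardinality": "configs/preset_stress_categorical_cardinality_benchmark_smoke.yaml",
    "hybrid": "configs/preset_stress_hybrid_benchmark_smoke.yaml",
    "robustness_composition": "configs/preset_stress_robustness_composition_benchmark_smoke.yaml",
}


def audit_command(lane: str, variant_config: str) -> list[str]:
    """Build one diversity-audit argv against configs/default.yaml."""
    return [
        "dagzoo", "diversity-audit",
        "--baseline-config", "configs/default.yaml",
        "--variant-config", variant_config,
        "--suite", "smoke",
        "--num-datasets", "10",
        "--warmup", "0",
        "--device", "cpu",
        # Assumed naming pattern, modeled on the graph-breadth example.
        "--out-dir", f"benchmarks/results/diversity_audit_stress_{lane}",
    ]


if __name__ == "__main__":
    run = "--run" in sys.argv[1:]
    for lane, config in PRESETS.items():
        cmd = audit_command(lane, config)
        print(shlex.join(cmd))
        if run:
            subprocess.run(cmd, check=True)  # stop on the first failing audit
```

Keeping the baseline, suite, dataset count, and device fixed across lanes means any difference between summaries comes from the variant config alone.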
To turn one diversity-audit run into a parity-focused maintainer report:
./.venv/bin/python scripts/render_tabiclv2_parity_report.py \
--summary-json benchmarks/results/diversity_audit_stress_graph_breadth/summary.json \
--out-dir benchmarks/results/diversity_audit_stress_graph_breadth/parity_report
To run the full internal RD-005 promotion suite across the incumbent, structural control, and current challenger lanes:
./.venv/bin/python scripts/evaluate_rd005_follow_on_suite.py \
--baseline-config configs/default.yaml \
--out-root benchmarks/results/rd005_follow_on \
--suite smoke \
--num-datasets 8 \
--seed 123 \
--device cpu
Read follow_on_promotion_summary.json / follow_on_promotion_summary.md
first when you want the explicit promotion status for each internal lane.
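To pull the per-lane status out of follow_on_promotion_summary.json programmatically, a sketch like the following may help. The JSON layout assumed here (a top-level lanes list whose entries carry name and status) and the file's location under --out-root are hypothetical; only the status values promote, hold_internal, and structural_control_only come from this page.

```python
import json
from pathlib import Path

KNOWN_STATUSES = {"promote", "hold_internal", "structural_control_only"}


def lanes_by_status(summary: dict) -> dict[str, list[str]]:
    """Group lane names by status (assumed schema: {'lanes': [{'name', 'status'}]})."""
    grouped: dict[str, list[str]] = {}
    for lane in summary.get("lanes", []):
        status = lane.get("status", "unknown")
        if status not in KNOWN_STATUSES:
            status = "unknown"  # surface unexpected values instead of dropping them
        grouped.setdefault(status, []).append(lane.get("name", "?"))
    return grouped


if __name__ == "__main__":
    # Assumed location under the --out-root used above.
    path = Path("benchmarks/results/rd005_follow_on/follow_on_promotion_summary.json")
    if path.exists():
        for status, names in sorted(lanes_by_status(json.loads(path.read_text())).items()):
            print(status, names)
```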
Diagnostics and guardrails
- Generate smoke presets enable diagnostics, so they write coverage_summary.json and coverage_summary.md.
- Benchmark smoke presets keep diagnostics off by default. Pass --diagnostics when you want coverage artifacts in addition to the benchmark summary.
- The dagzoo diversity-audit summary.json now carries parity_surface_summary alongside the mechanism-family summary, so the remaining TabICLv2 parity gaps can be measured directly instead of inferred indirectly.
- The full RD-005 suite runner keeps the public/internal boundary explicit: internal lanes can receive promote, hold_internal, or structural_control_only, but public recipe promotion remains a separate follow-up step after a winner clears the gate.
- Benchmark summaries stay on the current contract. Steering and stress-profile evidence lives in diagnostics artifacts and in dagzoo diversity-audit, not in a new benchmark-only field family.
Related docs
- Workflow hub: usage-guide.md
- Benchmark guardrails: benchmark-guardrails.md
- Diagnostics: diagnostics.md
- Steering: steering.md