Reference Packs

Named dagzoo recipe packs, confidence tiers, and citation guidance.

Reference packs are the public, named generation configs for dagzoo.

They let you start from a named recipe instead of authoring a full YAML config. A paper, benchmark, or downstream workflow can point to one of these packs directly, and a new user can run one immediately with:

dagzoo generate --config recipe:<name> --num-datasets 25 --out data/<run_name>

The same YAML files are checked into the repo under recipes/ so you can inspect, pin, and cite the exact config behind a public recipe name.
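
One way to pin a recipe for citation is to record a checksum of its YAML file next to the recipe name. The sketch below is an illustration, not a dagzoo feature: `pin_recipe` is a hypothetical helper, and it assumes a repo checkout plus the `sha256sum` utility from GNU coreutils.

```shell
# Hypothetical helper (not part of dagzoo): print "<yaml path> sha256:<digest>"
# for a recipe file so the exact config can be quoted alongside results.
# Assumes GNU coreutils `sha256sum` is on PATH.
pin_recipe() {
  yaml_path="$1"
  if [ ! -f "$yaml_path" ]; then
    echo "pin_recipe: no such file: $yaml_path" >&2
    return 1
  fi
  # sha256sum prints "<digest>  <path>"; keep only the digest.
  digest="$(sha256sum "$yaml_path" | cut -d ' ' -f 1)"
  printf '%s sha256:%s\n' "$yaml_path" "$digest"
}

# Example, run from the root of a repo checkout:
#   pin_recipe recipes/default-baseline.yaml
```

Quoting the digest together with the recipe name lets a reader confirm they are looking at the same file, even if the recipe is later revised under the same name.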


Stability model

  • Stable adoption layer: recipe:<name> references and documented artifact contracts
  • Advanced authoring layer: repo-local configs/*.yaml
  • Confidence tiers:
    • baseline: maintained default starting point
    • paper-backed approximation: intended to approximate a published prior without overclaiming exact equivalence
    • stress profile: reproducible stress regime rather than a paper-prior claim

Catalog

| Recipe | Confidence | Expected regime | Repo YAML |
| --- | --- | --- | --- |
| default-baseline | baseline | Balanced mixed-type classification with no extra stress regime | recipes/default-baseline.yaml |
| tabpfn-v1-prior-approx | paper-backed approximation | Small-data, numeric-heavy classification | recipes/tabpfn-v1-prior-approx.yaml |
| high-cardinality-stress | stress profile | Categorical-heavy tasks with larger cardinality envelopes | recipes/high-cardinality-stress.yaml |
| missingness-robustness | stress profile | Moderate-to-heavy structured missingness with explicit MNAR controls | recipes/missingness-robustness.yaml |
| shift-stress | stress profile | Mixed graph-and-noise drift for controlled shift experiments | recipes/shift-stress.yaml |
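
The catalog lends itself to scripted sweeps. The following sketch prints one `dagzoo generate` command per recipe, optionally filtered by confidence tier; it only prints, so nothing runs until the output is piped to `sh`. The helper names (`recipe_tier`, `recipe_out_dir`, `print_generate_commands`) and the output-directory naming are assumptions made for illustration, while the recipe names, tiers, and flags are taken from this page.

```shell
# Recipe names copied from the catalog table.
RECIPES="default-baseline tabpfn-v1-prior-approx high-cardinality-stress missingness-robustness shift-stress"

# Hypothetical helper: confidence tier for a recipe, per the catalog table.
recipe_tier() {
  case "$1" in
    default-baseline) echo "baseline" ;;
    tabpfn-v1-prior-approx) echo "paper-backed approximation" ;;
    high-cardinality-stress|missingness-robustness|shift-stress) echo "stress profile" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Assumed naming convention: data/<recipe name with hyphens as underscores>.
recipe_out_dir() {
  printf 'data/%s\n' "$(printf '%s' "$1" | sed 's/-/_/g')"
}

# Print one generate command per recipe. With an argument, keep only
# recipes whose tier matches it exactly.
print_generate_commands() {
  wanted_tier="${1:-}"
  for name in $RECIPES; do
    if [ -n "$wanted_tier" ] && [ "$(recipe_tier "$name")" != "$wanted_tier" ]; then
      continue
    fi
    printf 'dagzoo generate --config recipe:%s --num-datasets 25 --out %s\n' \
      "$name" "$(recipe_out_dir "$name")"
  done
}

print_generate_commands "stress profile"
```

To run the printed commands rather than inspect them, pipe them to a shell, for example `print_generate_commands "stress profile" | sh`.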

Recipe notes and citations

default-baseline

  • Purpose: general-purpose starting point for mixed-type classification studies
  • Prior note: latent DAG in which each emitted feature is assigned to a node and the target is emitted from one selected latent node; optional missingness is applied afterward as an observation model over the emitted features
  • Citation note: cite dagzoo itself plus the specific recipe name

tabpfn-v1-prior-approx

  • Purpose: practical approximation for TabPFN-style small-data classification workflows
  • Prior note: numeric-heavy latent-node prior; the target is derived from a selected latent node, as in the rest of the shipped catalog
  • Citations:
    • Accurate predictions on small data with a tabular foundation model
    • TabICLv2: A better, faster, scalable, and open tabular foundation model

high-cardinality-stress

  • Purpose: stress categorical-heavy workloads that exceed the lighter default envelope
  • Citation:
    • Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks

missingness-robustness

  • Purpose: force structured missingness into the training prior without hand-authoring the config
  • Citations:
    • A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities
    • TabICLv2: A better, faster, scalable, and open tabular foundation model

shift-stress

  • Purpose: reproducible mixed drift for train/test shift stress testing
  • Citation:
    • Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data

Example usage

dagzoo recipe list
dagzoo generate --config recipe:default-baseline --num-datasets 25 --out data/default_baseline
dagzoo generate --config recipe:high-cardinality-stress --num-datasets 25 --out data/high_cardinality

Inside a repo checkout, you can also reference the same configs by path:

dagzoo generate --config recipes/default-baseline.yaml --num-datasets 25 --out data/default_baseline
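
A script that should work both inside and outside a checkout can choose between the two forms at run time. This is a minimal sketch under the assumption that the checkout root is the current directory; `resolve_config` is a hypothetical helper, not a dagzoo command.

```shell
# Hypothetical helper: prefer the checked-in YAML when it is present,
# otherwise fall back to the named recipe reference.
# Assumes the current directory is the root of the repo checkout.
resolve_config() {
  name="$1"
  if [ -f "recipes/$name.yaml" ]; then
    printf 'recipes/%s.yaml\n' "$name"
  else
    printf 'recipe:%s\n' "$name"
  fi
}

# Example:
#   dagzoo generate --config "$(resolve_config default-baseline)" \
#     --num-datasets 25 --out data/default_baseline
```

Preferring the local file means edits made in the checkout are picked up, while runs outside a checkout still resolve through the stable `recipe:<name>` reference.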