Export Contract Fields

Exhaustive field-by-field export contract catalog generated from the checked-in inventory.
  • Inventory schema: dagzoo-export-contract-inventory-v1
  • Machine-readable source of truth: reference/export_contract_inventory.yaml
  • Path patterns use * for dynamic map keys and [] for list item shapes.
  • audit_status is a field-review classification only; it does not change the live export surface.
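To make the path notation concrete, here is a minimal Python sketch that walks a decoded JSON payload and emits inventory-style path patterns. It is not dagzoo's inventory generator, and the helper names (iter_path_patterns, observed_patterns) are hypothetical. A payload alone cannot say which objects are dynamic maps, so the caller supplies those paths through a hypothetical dynamic_maps argument.

```python
from typing import Any, Iterator


def iter_path_patterns(
    payload: Any,
    dynamic_maps: frozenset[str] = frozenset(),
    prefix: str = "",
) -> Iterator[str]:
    """Yield a path pattern for every field and list item in a decoded JSON payload.

    Keys of objects whose pattern is listed in dynamic_maps collapse to "*";
    list items collapse to "[]".
    """
    if isinstance(payload, dict):
        is_dynamic = prefix in dynamic_maps
        for key, value in payload.items():
            segment = "*" if is_dynamic else key
            path = f"{prefix}.{segment}" if prefix else segment
            yield path
            yield from iter_path_patterns(value, dynamic_maps, path)
    elif isinstance(payload, list):
        path = f"{prefix}[]"
        for item in payload:
            yield path
            yield from iter_path_patterns(item, dynamic_maps, path)


def observed_patterns(payload: Any, dynamic_maps: frozenset[str] = frozenset()) -> list[str]:
    """Return the distinct path patterns present in one payload, sorted."""
    return sorted(set(iter_path_patterns(payload, dynamic_maps)))
```

With dynamic_maps={"task_counts"}, every key under a coverage summary's task_counts collapses to task_counts.*, and a quantiles list contributes quantiles[]. An empty list contributes no [] pattern. This matches presence rules such as feature_types[] appearing only when feature count > 0.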

dataset_catalog.parquet record payload

Canonical per-dataset JSON payload stored in each generated shard catalog row.

| Path | Type | Presence | Stability | Producer | Audit | Known Consumer / Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| dataset_id | string | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | stable canonical dataset id |
| dataset_index | int | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | record index within the generated run |
| feature_types | list[string] | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | emitted feature typing |
| feature_types[] | string | when feature count > 0 | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | per-column emitted feature type |
| group_ids | object | when run grouping ids are available | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | stable downstream grouping keys |
| group_ids.cohort | string | when heterogeneous cohort grouping ids are available | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | shared heterogeneous raw-generation cohort key |
| group_ids.layout_plan | string | when fixed-layout grouping ids are available | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | shared layout-plan grouping key |
| group_ids.request_run | string | when group_ids present | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | request-run grouping key |
| intervention | object | when hard-intervention metadata is available | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | summary-only intervention contract for downstream consumers |
| intervention.mode | string | when intervention present | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | emitted intervention regime |
| intervention.signature | string | when intervention present | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | stable intervention identity summary |
| n_classes | `int \| null` | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep |  |
| n_features | int | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | persisted feature count |
| n_test | int | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | persisted test row count |
| n_train | int | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | persisted train row count |
| target_derivation | string | when target derivation metadata is available | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | current target-derivation contract marker |
| target_relevance | object | when target relevance audit metadata is available | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | observed feature relevance summary for the latent target |
| target_relevance.feature_count | int | when target_relevance present | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | count of emitted features with a path into the target node |
| target_relevance.feature_fraction | float | when target_relevance present | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | feature_count normalized by emitted feature count |
| task | string | always | stable | dagzoo.io.shard_contract.build_dataset_catalog_record | keep | dataset task label |
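A downstream consumer could check decoded payloads against the presence and type rules in this table. The sketch below is a hypothetical validator; check_catalog_record is not a dagzoo function. It assumes the payload has already been parsed with json.loads. It also reads the feature_fraction rationale as feature_count / n_features, which is an assumption about what "emitted feature count" refers to.

```python
import math
from typing import Any

# Fields the table marks as "always" present, with the Python type json.loads yields.
ALWAYS_PRESENT = {
    "dataset_id": str,
    "dataset_index": int,
    "feature_types": list,
    "n_features": int,
    "n_test": int,
    "n_train": int,
    "task": str,
}


def _is_int(value: Any) -> bool:
    # bool subclasses int in Python, so JSON true/false is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_catalog_record(record: dict[str, Any]) -> list[str]:
    """Return contract violations for one decoded catalog payload (empty means OK)."""
    problems = []
    for name, kind in ALWAYS_PRESENT.items():
        value = record.get(name)
        if not (_is_int(value) if kind is int else isinstance(value, kind)):
            problems.append(f"{name}: expected {kind.__name__}")

    # n_classes is always present, but its value may be null (None).
    if "n_classes" not in record:
        problems.append("n_classes: missing")
    elif record["n_classes"] is not None and not _is_int(record["n_classes"]):
        problems.append("n_classes: expected int or null")

    types = record.get("feature_types")
    if isinstance(types, list) and not all(isinstance(t, str) for t in types):
        problems.append("feature_types[]: expected strings")

    # Conditional objects: their required children are checked only when the parent exists.
    required_children = {"group_ids": ("request_run",), "intervention": ("mode", "signature")}
    for parent, children in required_children.items():
        if parent not in record:
            continue
        obj = record[parent]
        for child in children:
            if not isinstance(obj, dict) or not isinstance(obj.get(child), str):
                problems.append(f"{parent}.{child}: expected string")
    groups = record.get("group_ids")
    if isinstance(groups, dict):
        for child in ("cohort", "layout_plan"):  # optional grouping keys
            if child in groups and not isinstance(groups[child], str):
                problems.append(f"group_ids.{child}: expected string")

    if "target_derivation" in record and not isinstance(record["target_derivation"], str):
        problems.append("target_derivation: expected string")

    if "target_relevance" in record:
        rel = record["target_relevance"] if isinstance(record["target_relevance"], dict) else {}
        count, frac = rel.get("feature_count"), rel.get("feature_fraction")
        n_features = record.get("n_features")
        if not _is_int(count) or not _is_number(frac):
            problems.append("target_relevance: expected int feature_count and float feature_fraction")
        elif _is_int(n_features) and n_features > 0:
            # Assumption: the "emitted feature count" is n_features.
            if not math.isclose(frac, count / n_features):
                problems.append("target_relevance.feature_fraction: != feature_count / n_features")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a consumer report every contract drift in a shard at once.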

Packed parquet split row

Row schema shared by shard train.parquet and test.parquet.

| Path | Type | Presence | Stability | Producer | Audit | Known Consumer / Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| dataset_index | int64 | always | stable | dagzoo.io.parquet_writer._build_split_table | keep | row-to-dataset join key |
| row_index | int64 | always | stable | dagzoo.io.parquet_writer._build_split_table | keep | row position within one split |
| x | `list[float32 \| float64]` | always | stable | dagzoo.io.parquet_writer._build_split_table | keep |  |
| x[] | `float32 \| float64` | when feature count > 0 | stable | dagzoo.io.parquet_writer._build_split_table | keep |  |
| y | `int64 \| float` | always | stable | dagzoo.io.parquet_writer._build_split_table | keep |  |
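Because every row carries dataset_index and row_index, a consumer can rebuild per-dataset feature matrices from a packed split. This is a hypothetical sketch; unpack_split is not part of dagzoo. It takes rows as plain dicts, such as those returned by pyarrow.parquet.read_table(path).to_pylist(). As a sanity check, it also assumes every row of one dataset has the same x width.

```python
from collections import defaultdict
from typing import Any, Iterable


def unpack_split(
    rows: Iterable[dict[str, Any]],
) -> dict[int, tuple[list[list[float]], list[Any]]]:
    """Regroup packed split rows into {dataset_index: (X, y)}, each ordered by row_index."""
    grouped: dict[int, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        grouped[row["dataset_index"]].append(row)

    datasets: dict[int, tuple[list[list[float]], list[Any]]] = {}
    for dataset_index in sorted(grouped):
        # Sorting by row_index restores split order whether row_index restarts
        # per dataset or runs across the whole split.
        dataset_rows = sorted(grouped[dataset_index], key=lambda r: r["row_index"])
        X = [list(r["x"]) for r in dataset_rows]
        widths = {len(x) for x in X}
        if len(widths) > 1:
            raise ValueError(f"dataset {dataset_index}: inconsistent x widths {sorted(widths)}")
        y = [r["y"] for r in dataset_rows]
        datasets[dataset_index] = (X, y)
    return datasets
```

The same function applies to train.parquet and test.parquet, since both share this row schema. The train and test results for one dataset_index can then be paired.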

Generate handoff manifest

Minimal downstream handoff manifest written by dagzoo generate --handoff-root.

| Path | Type | Presence | Stability | Producer | Audit | Known Consumer / Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| artifacts_relative | object | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | manifest-relative artifact paths |
| artifacts_relative.curated_dir | string | when curated accepted-only shards exist | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | relative curated-dir path |
| artifacts_relative.generated_dir | string | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | relative generated-dir path |
| identity | object | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | stable generated corpus identity payload |
| identity.generate_run_id | string | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | stable generate-run id |
| identity.generated_corpus_id | string | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | stable generated corpus id |
| identity.source_family | string | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | handoff source family tag |
| provenance | object | when generated catalogs expose corpus-level provenance | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | summarized latent-target provenance for downstream corpus consumers |
| provenance.intervention | object | when hard-intervention provenance is present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | summary-only intervention provenance for downstream corpora |
| provenance.intervention.mode | string | when provenance.intervention present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | generated corpus intervention regime |
| provenance.intervention.signature | string | when provenance.intervention present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | stable intervention identity summary for the generated corpus |
| provenance.target_derivation | string | when provenance present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | current target-derivation contract marker |
| provenance.target_relevant_feature_count_range | object | when provenance present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | min/max relevant feature counts across the generated corpus |
| provenance.target_relevant_feature_count_range.max | int | when provenance.target_relevant_feature_count_range present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | upper bound of observed relevant feature counts |
| provenance.target_relevant_feature_count_range.min | int | when provenance.target_relevant_feature_count_range present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | lower bound of observed relevant feature counts |
| provenance.target_relevant_feature_fraction_range | object | when provenance present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | min/max relevant feature fractions across the generated corpus |
| provenance.target_relevant_feature_fraction_range.max | float | when provenance.target_relevant_feature_fraction_range present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | upper bound of observed relevant feature fractions |
| provenance.target_relevant_feature_fraction_range.min | float | when provenance.target_relevant_feature_fraction_range present | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | lower bound of observed relevant feature fractions |
| schema_name | string | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | handoff manifest schema identifier |
| schema_version | int | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | handoff manifest schema version |
| summary | object | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | generated dataset count summary |
| summary.generated_datasets | int | always | stable | dagzoo.core.generate_handoff.write_generate_handoff_manifest | keep | number of generated datasets |
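A consumer of the handoff manifest might resolve the manifest-relative paths. For comparison, it might also recompute the provenance ranges from the per-dataset catalog payloads. Both functions below are hypothetical sketches, not dagzoo APIs.

- resolve_artifacts assumes artifacts_relative paths are relative to the directory containing the manifest. It takes the already-parsed manifest, so it makes no assumption about the file's serialization.
- relevance_ranges assumes the corpus ranges are plain min/max values over the datasets that carry target_relevance. This follows the table's descriptions but has not been confirmed against the producer.

```python
from pathlib import Path
from typing import Any, Iterable, Optional


def resolve_artifacts(manifest: dict[str, Any], manifest_path: Path) -> dict[str, Path]:
    """Join artifacts_relative entries onto the directory holding the manifest."""
    base = Path(manifest_path).parent
    rel = manifest["artifacts_relative"]
    paths = {"generated_dir": base / rel["generated_dir"]}
    if "curated_dir" in rel:  # present only when curated accepted-only shards exist
        paths["curated_dir"] = base / rel["curated_dir"]
    return paths


def relevance_ranges(records: Iterable[dict[str, Any]]) -> Optional[dict[str, dict[str, Any]]]:
    """Min/max target-relevance ranges over per-dataset catalog payloads."""
    relevance = [r["target_relevance"] for r in records if "target_relevance" in r]
    if not relevance:
        return None  # no per-dataset relevance metadata to summarize
    counts = [r["feature_count"] for r in relevance]
    fractions = [r["feature_fraction"] for r in relevance]
    return {
        "target_relevant_feature_count_range": {"min": min(counts), "max": max(counts)},
        "target_relevant_feature_fraction_range": {"min": min(fractions), "max": max(fractions)},
    }
```

Comparing relevance_ranges(records) against manifest["provenance"] is one way to spot a catalog/manifest mismatch. A difference could also mean the producer aggregates differently from this sketch.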

Coverage summary JSON

Stable top-level diagnostics coverage summary fields.

| Path | Type | Presence | Stability | Producer | Audit | Known Consumer / Rationale |
| --- | --- | --- | --- | --- | --- | --- |
| generated_at | string | always | stable | dagzoo.diagnostics.coverage.write_coverage_summary_json | keep | diagnostics summary timestamp |
| histogram_bins | int | always | stable | dagzoo.diagnostics.coverage.write_coverage_summary_json | keep | histogram bin count used for diagnostics |
| num_datasets | int | always | stable | dagzoo.diagnostics.coverage.write_coverage_summary_json | keep | number of aggregated datasets |
| quantiles | list[float] | always | stable | dagzoo.diagnostics.coverage.write_coverage_summary_json | keep | requested quantile levels |
| quantiles[] | float | when quantiles present | stable | dagzoo.diagnostics.coverage.write_coverage_summary_json | keep | one requested quantile level |
| task_counts | object | always | stable | dagzoo.diagnostics.coverage.write_coverage_summary_json | keep | aggregated task counts |
| task_counts.* | int | when task_counts present | stable | dagzoo.diagnostics.coverage.write_coverage_summary_json | keep | one aggregated task count |
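The stable fields above can be checked with a small validator. This is a hypothetical sketch; check_coverage_summary is not a dagzoo function. It runs on the dict produced by json.load and ignores extra keys, because the table lists only the stable top-level fields. It does not assume any particular generated_at timestamp format.

```python
from typing import Any


def _is_int(value: Any) -> bool:
    # bool subclasses int in Python, so JSON true/false is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_coverage_summary(summary: dict[str, Any]) -> list[str]:
    """Return violations of the stable coverage-summary fields (empty means OK)."""
    problems = []
    if not isinstance(summary.get("generated_at"), str):
        problems.append("generated_at: expected string")
    for key in ("histogram_bins", "num_datasets"):
        if not _is_int(summary.get(key)):
            problems.append(f"{key}: expected int")
    quantiles = summary.get("quantiles")
    if not isinstance(quantiles, list) or not all(_is_number(q) for q in quantiles):
        problems.append("quantiles: expected list of floats")
    counts = summary.get("task_counts")
    if not isinstance(counts, dict) or not all(_is_int(c) for c in counts.values()):
        problems.append("task_counts: expected object of int counts")
    return problems
```

Quantile levels are accepted as any JSON number, so a level serialized as an integer does not trip the check. Tighten this if the inventory requires strict floats.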