Inference Contract
Use this contract when you need to know what this repo hands to downstream runtime systems and what must remain stable.
Use these alongside this contract:
- quickstart:
README.md - workflow runbooks:
docs/workflows.md - design decisions and repo structure:
docs/development/design-decisions.md - architecture reference:
docs/development/model-architecture.md - model config reference:
docs/development/model-config.md - canonical roadmap:
docs/development/roadmap.md
Planning and architecture rationale live under docs/development/, while this
page stays focused on the exported bundle and the validation boundary that
downstream consumers rely on.
Overview
The handoff boundary is simple:
- this repo trains models and packages export bundles
- a separate runtime repo or system can then load those bundles
If you want to know what the repo exports, what files are inside the bundle, and what this repo does or does not own at runtime, this is the canonical page.
Schema Version
Current version: tab-foundry-export-v3
The current v3 contract uses a single-manifest layout. Earlier v3 bundles that
used inference_config.json and preprocessor_state.json sidecars are
obsolete and must be regenerated. Earlier single-manifest v3 bundles that do
not include manifest_sha256 are also obsolete and must be regenerated.
Bundle Layout
An export bundle is a directory containing:
manifest.jsonweights.safetensors
manifest.json
Required keys:
schema_versionproducer:{name, version, git_sha}(git_shamay benull)task:classificationmanifest_sha256: SHA-256 of the canonical UTF-8 JSON encoding of the full manifest withmanifest_sha256omitted; validators reject stale or missing valuesmodel:{arch, d_col, d_icl, input_normalization, feature_group_size, many_class_train_mode, max_mixed_radix_digits}stageis an additive optional field and is emitted only forarch=tabfoundry_staged.stage_label,module_overrides,staged_dropout, andpre_encoder_clipare additive optional staged-surface fields. Newly regenerated staged bundles persist them when present so export/load replay matches the training checkpoint surface exactly.- Exporter also emits architecture reconstruction fields:
{tfcol_n_heads, tfcol_n_layers, tfcol_n_inducing, tfrow_n_heads, tfrow_n_layers, tfrow_cls_tokens, tficl_n_heads, tficl_n_layers, tficl_ff_expansion, many_class_base, head_hidden_dim, use_digit_position_embed, sandwich_latents, sandwich_layers, sandwich_heads, sandwich_ff_expansion, sandwich_summary_tokens_per_axis, sandwich_self_attention_per_cross, sandwich_pre_row_attention_layers, sandwich_pre_column_attention_layers, sandwich_pre_column_inducing_tokens}. - Validators accept manifests that omit the optional reconstruction fields and apply the current model defaults.
- Validators also accept older manifests that omit
stageand the additive staged-surface fields, applying current defaults for omitted values. - See
docs/development/model-config.mdfor the meaning of each model field and the current canonical defaults.
inferencetaskmodel_arch(tabfoundry_simple,tabfoundry_staged, ortabfoundry_sandwich)model_stage(optional; emitted only fortabfoundry_staged)group_shifts([0, 1, 3])feature_group_sizemany_class_threshold(10)many_class_inference_mode(full_probs)
preprocessorfeature_order_policy(positional_feature_ids)missing_value_policystrategy(train_mean)all_nan_fill(resolved runtime fill value, default0.0)impute_missing(resolved runtime imputation toggle, defaulttrue)
classification_label_policy{mapping=train_only_remap, unseen_test_label=filter}
dtype_policy(features=float32,classification_labels=int64,regression_targets=float32)
weights:{file, sha256}created_at_utc: ISO8601 UTC timestamp
Preprocessing Semantics
- Bundle metadata now records preprocessing policy only. v3 bundles do not persist dataset-specific fitted preprocessing values.
- Runtime preprocessing is always derived from the incoming support set
(
x_train,y_train) before applying the model. - The bundled
preprocessorsection is the executable runtime policy surface for the reference consumer, not a cache of export-time train statistics. - Task-level sandwich
feature_typesare runtime request metadata, not bundled preprocessor metadata.run_reference_consumer(..., feature_types=[...])requires an explicit per-request list for sandwich bundles. - Bundles with
preprocessor.missing_value_policy.impute_missing=falseremain policy-valid, but the reference consumer only executes them when runtime feature inputs are already finite; otherwise it raises instead of returning non-finite predictions. - Older v3 bundles that omit
preprocessor.missing_value_policy.impute_missingremain readable and default that field totrue. manifest_sha256protects the embeddedmodel,inference,preprocessor,weights, and other top-level v3 manifest metadata against post-export edits.
Many-Class Modes
manifest.model.many_class_train_modeconfigures the training-time branch behavior (path_nllvsfull_probs) and is used when reconstructing the model from an export bundle.manifest.inference.many_class_inference_modeis an inference-runtime contract field and is currently fixed tofull_probs.
Producer Commands
Export from checkpoint:
tab-foundry export bundle \
--checkpoint outputs/cls_smoke/checkpoints/best.pt \
--out-dir outputs/exports/cls_smoke_v3
Validate bundle:
tab-foundry export validate \
--bundle-dir outputs/exports/cls_smoke_v3
Reference Consumer
This repo includes a reference-only executable consumer in
tab_foundry.export.loader_ref.
Scope:
- dense numeric matrices in
- persisted checksum and schema validation
- runtime-derived preprocessing using the support set
- model-native outputs out (
class_probsfor classification) - earlier task rejection for unsupported non-classification bundles
- strict output validation for classification heads: logits must be wide enough
for the declared class count, and emitted
class_probsmust match it exactly
Out of scope here:
- serving APIs
- dataframe adapters
- long-lived runtime ownership
- generalized production inference policy beyond the separate-repo handoff
Compatibility Policy
- Training checkpoints remain unchanged and are still used for resume/training workflows.
- Export bundles are the cross-repo inference handoff contract.
tab-foundry-export-v3is the default export format and the only executable reference-consumer contract in this repo.- Existing v3 bundles produced with sidecar metadata are intentionally obsolete after the single-manifest redesign and must be regenerated.
- Existing single-manifest v3 bundles without
manifest_sha256are intentionally obsolete after the metadata-integrity fix and must be regenerated. - Existing v3 bundles without
preprocessor.missing_value_policy.impute_missingremain validator-readable and default totrueuntil regenerated. - Existing v3 bundles without the additive staged-surface model fields remain validator-readable and default those values until regenerated.
tab-foundry-export-v2andtab-foundry-export-v1bundles are intentionally unsupported and must be regenerated onto the current classification-only export surface.- Checkpoint export/load now treats omitted
feature_group_sizeas1. Legacy grouped-token checkpoints that omitted that field are intentionally rejected and must be regenerated or loaded with an explicitfeature_group_sizeoverride before export.