frontier-transfer-certifier is an alpha research prototype: a Python package
and CLI for deciding whether a small-scale agent intervention has enough
replayable evidence to justify a larger frontier trial.
This package provides full theory-registry coverage for the paper and implements selected conservative executable pathways for frontier-transfer certification. Formal items not backed by dedicated compilers are represented as schema-only artifacts, fail-closed diagnostics, or impossibility diagnostics. The package is not a formal proof assistant and not a complete implementation of every theorem.
The tool is deliberately conservative. It is not a predictor that says "the small model improved, so the frontier model will improve." It is an audit compiler that asks a narrower question:
Given replayable inputs, explicit descriptors, evaluation anchors, provenance tags, and lower-bound assumptions, is there still a positive conservative margin after all transport, protocol, cost, and confidence penalties are subtracted?
Implemented compiler paths return one of two outcomes:
certificate: the lower margin is positive under the supplied finite evidence and required external evidence has replay hashes, so the intervention can be treated as a candidate for a frontier trial.abstention: some required evidence, descriptor coordinate, raw-log synthesis, semantic translation proof, evaluation anchor, or provenance tag is missing or invalid, so the tool refuses to make a positive claim.
This implementation follows:
Takahashi, K. (2026). Small-to-Frontier Transfer Theory for Agentic AI. Zenodo. https://doi.org/10.5281/zenodo.19619480
- Positive certificates are emitted only by implemented compiler paths.
- Registry coverage means every formal paper item is represented. It does not mean every theorem has a dedicated executable compiler.
- Schema-only, diagnostic-only, and impossibility-diagnostic registry entries cannot support positive certification by themselves.
- External or imported evidence consumed by a positive certificate must be linked to field-level replay evidence with valid SHA-256 digests. External evidence without replay hashes weakens replayability and causes abstention.
- Trusted-core and source-translation support is prototype-level unless checker digests, witness hashes, source certificate hashes, and translation witness hashes are supplied.
- Confidence support is limited to fixed-horizon bounded-mean intervals and explicit manifest slack. Anytime-valid auditing and global alpha accounting are represented conservatively, not fully implemented.
- Large portfolio fallback uses deterministic density ordering without claiming an approximation guarantee.
- The package is designed to fail closed when assumptions are absent, unsupported, or under-specified.
This alpha release is not a complete computational reproduction of the paper. The following parts are represented in the registry and documentation, but are not implemented as general positive certifiers:
- mechanically checked proofs of the paper's theorems;
- automatic discovery of hidden causal structure or missing descriptors from arbitrary unstructured logs;
- open-world raw-log synthesis beyond finite manifest-declared event algebras, channels, statistics, and novelty gates;
- replay retrieval or verification of external artifacts beyond user-supplied SHA-256 digests;
- trusted-core kernel verification beyond digest and witness-hash checks;
- source-translation proof checking beyond typed metadata, source-certificate hashes, translation-witness hashes, and positive source margins;
- anytime-valid confidence sequences and global alpha accounting;
- approximation guarantees for large portfolio allocation unless future compilers verify the required monotonicity or submodularity assumptions;
- external predictive validity of semantic units beyond the finite packet, packing, anchor-loss, and translation-loss calculations supplied in the manifest.
These limitations are intentional trust boundaries. When a positive compiler path cannot represent and check the required assumptions, the output is a reason-coded abstention rather than a weakened certificate.
At the core, the package compiles a manifest into a lower-bound ledger:
G_lower - H_upper - C_upper
- transport_penalty
- protocol_penalty
- confidence_slack
> 0
where:
Gis the direct lower-bounded gain.His the upper-bounded harmful or adverse load.Cis the upper-bounded residual cost.- The remaining terms are explicit conservative penalties.
The input is a YAML or JSON manifest. The output is deterministic JSON and a
stable SHA-256 replay hash. The canonical replay hash is computed over the
normalized manifest model using sorted-key JSON, deterministic separators, and
no NaN or infinity. Raw file-byte hashing is available only with
ftc replay --raw.
uv sync
uv run ftc profile
uv run ftc theorems
uv run ftc theory coverage
uv run ftc validate examples/split_inference_minimal.yaml
uv run ftc certify examples/split_inference_minimal.yamlExpected certificate summary:
output_type certificate
certificate_mode layered_validated
margin_lower 0.266
accepted_for_frontier_trial True
Write the certificate to disk:
uv run ftc certify examples/split_inference_minimal.yaml --out out/certificate.jsonUse only the layers you need. The layers are modular and can be adopted incrementally.
uv run ftc profile minimal
uv run ftc profile standard
uv run ftc profile advanced
uv run ftc profile full| Profile | Level | Use when | Main commands |
|---|---|---|---|
minimal |
0 | You only need schema validation, replay hashes, and abstention readiness checks. | ftc validate, ftc replay, ftc schema export |
standard |
1 | You have explicit component bounds or split-inference observables for one intervention. | ftc certify |
advanced |
3 | You need generated raw-log evidence, descriptor repair, or budgeted portfolio allocation. | ftc synthesize, ftc descriptor, ftc allocate |
full |
4 | You need all currently implemented alpha compiler paths: raw-log synthesis, descriptor-language growth, allocation, and semantic translation checks. | ftc validate, ftc synthesize, ftc descriptor, ftc certify, ftc allocate, ftc semantic |
Validate and hash a manifest:
uv run ftc validate examples/split_inference_minimal.yaml
uv run ftc replay examples/split_inference_minimal.yaml
uv run ftc replay examples/split_inference_minimal.yaml --rawList theorem IDs accepted by the implemented compiler scopes:
uv run ftc theorems
uv run ftc theorems transport
uv run ftc theory list
uv run ftc theory show thm:certificate-sound
uv run ftc theory coverage --jsonCompile a single-intervention certificate:
uv run ftc certify examples/split_inference_minimal.yamlCompile stage-one raw-log open-world synthesis:
uv run ftc synthesize examples/raw_log_open_world_minimal.yaml
uv run ftc certify examples/raw_log_open_world_minimal.yamlSelect a descriptor-language scaffold:
uv run ftc descriptor examples/descriptor_growth_minimal.yamlAllocate a certified portfolio:
uv run ftc allocate examples/allocation_minimal.yamlCompile semantic and source-translation checks:
uv run ftc semantic examples/semantic_translation_minimal.yamlExport JSON Schema:
uv run ftc schema export transport_certificate_core --out out/transport.schema.json
uv run ftc schema export raw_log_synthesis --out out/raw_log_synthesis.schema.json
uv run ftc schema export descriptor_language_growth --out out/descriptor_growth.schema.json
uv run ftc schema export audit_policy_and_allocation --out out/allocation.schema.json
uv run ftc schema export semantic_and_translation_manifest --out out/semantic.schema.jsonCreate example manifests:
uv run ftc init split-inference --out scratch/split.yaml
uv run ftc init raw-log-open-world --out scratch/raw.yaml
uv run ftc init descriptor-growth --out scratch/descriptor.yaml
uv run ftc init allocation --out scratch/allocation.yaml
uv run ftc init semantic-translation --out scratch/semantic.yamlfrom frontier_transfer_certifier import (
compile_allocation,
compile_certificate,
compile_descriptor_growth,
compile_raw_synthesis,
compile_semantic_translation,
get_profile,
)
profile = get_profile("standard")
print(profile.commands)
certificate = compile_certificate("examples/split_inference_minimal.yaml")
print(certificate.output_type)
stage_one = compile_raw_synthesis("examples/raw_log_open_world_minimal.yaml")
print(stage_one.output_type)
descriptor = compile_descriptor_growth("examples/descriptor_growth_minimal.yaml")
print(descriptor.selected_scaffold_id)
allocation = compile_allocation("examples/allocation_minimal.yaml")
print(allocation.selected_candidates)
semantic = compile_semantic_translation("examples/semantic_translation_minimal.yaml")
print(semantic.semantic_lower_index)The package never silently invents missing theory objects. A positive certificate requires compatible provenance for every consumed field.
If a positive certificate consumes imported evidence, each consumed imported
field must have a field_evidence_ref pointing to a replay_ref with a valid
64-character lowercase SHA-256 digest. Example manifests use deterministic
example digests; replace them with real evidence hashes before relying on a
certificate outside demonstrations.
All core schemas reject unknown fields by default. Measurement, bound, penalty, probability, budget, weight, cost, margin, overlap, drift, and confidence fields must be finite; NaN and infinity are invalid. Harmful-cost, penalty, overhead, tail, drift, curvature, and slope upper bounds are nonnegative unless a field is explicitly documented as a signed lower-bound quantity.
Supported provenance tags:
raw_generated: produced by a finite compiler in this package, such as raw-log synthesis or descriptor-language growth.imported_and_validated: imported from another source and validated in this framework.declared_and_validated: declared bookkeeping objects such as confidence budgets, split partitions, or deterministic tie rules.blocked: forbidden for downstream positive claims.
Important output modes:
native_raw_log_complete: all substantive consumed evidence is raw-generated by finite subcompilers implemented by this package or validated bookkeeping.layered_validated: at least one substantive consumed evidence object is imported and validated.abstention: no positive claim is made.
The included examples are intentionally conservative. Split-inference and
finite-context manifests are layered_validated because their source-theory
objects are supplied as validated inputs. A field should be tagged
raw_generated only when it is emitted by a finite replayable subcompiler that
this package can identify, such as raw_log_synthesis or
descriptor_language_growth.
The theorem-dependency list is also checked. A manifest cannot name arbitrary
paper theorem IDs and receive a positive result unless the current compiler
scope implements or operationalizes those dependencies. ftc validate performs
this strict check by default; pass --schema-only when you only want Pydantic
schema validation.
examples/split_inference_minimal.yaml: split-inference worked example.examples/finite_context_minimal.yaml: explicit bounded component evidence.examples/raw_log_open_world_minimal.yaml: event-local raw logs compiled into stage-one bounds.examples/descriptor_growth_minimal.yaml: descriptor scaffold selection.examples/allocation_minimal.yaml: singleton, pairwise, coordination, resource, factor, and dense residual allocation packets.examples/semantic_translation_minimal.yaml: semantic packing with trusted source-translation checks.
- Theory To Code: mapping from paper objects to modules, models, and compiler behavior.
- Theory Registry: generated machine-readable coverage of every formal paper item.
- Alpha Implementation Matrix: registry coverage policy and selected implementation status summary.
- Architecture: module responsibilities and extension points.
- Manifest Guide: YAML manifest structure.
- Audit Review: code-review findings, implemented fixes, and remaining conservative boundaries.
- Operations: release, CI, security, and production-use checklist.
- Threat Model: security and misuse boundaries.
- The CLI performs no network communication by default.
- Raw private logs should not be copied into the repository. Store external references and replay hashes in manifests instead.
.env, credentials, private logs, local caches,out/, anddata/private/are ignored by Git.- CLI output avoids printing secret-like values.
- CI is configured for tests, linting, typing, and secret-scan regression checks.
Registry coverage and implementation coverage are different. The registry is pinned to a recorded paper-source snapshot and covers every parsed formal item. Only selected registry entries are implemented by positive certifier code. All other entries are represented by schema-only or diagnostic artifacts that fail closed.
Trusted-core and source-translation claims require concrete witness hashes and checker digests before they can contribute to a positive semantic output. Unsupported theorem dependencies fail validation or produce abstention. The project does not claim production readiness, formal verification, or complete implementation of the paper.
Apache License 2.0. This repository intentionally does not include a NOTICE
file.