Frontier Transfer Certifier

frontier-transfer-certifier is an alpha research prototype: a Python package and CLI for deciding whether a small-scale agent intervention has enough replayable evidence to justify a larger frontier trial.

This package provides full theory-registry coverage for the paper and implements selected conservative executable pathways for frontier-transfer certification. Formal items not backed by dedicated compilers are represented as schema-only artifacts, fail-closed diagnostics, or impossibility diagnostics. The package is not a formal proof assistant and not a complete implementation of every theorem.

The tool is deliberately conservative. It is not a predictor that says "the small model improved, so the frontier model will improve." It is an audit compiler that asks a narrower question:

Given replayable inputs, explicit descriptors, evaluation anchors, provenance tags, and lower-bound assumptions, is there still a positive conservative margin after all transport, protocol, cost, and confidence penalties are subtracted?

Implemented compiler paths return one of two outcomes:

  • certificate: the lower margin is positive under the supplied finite evidence and required external evidence has replay hashes, so the intervention can be treated as a candidate for a frontier trial.
  • abstention: some required evidence, descriptor coordinate, raw-log synthesis, semantic translation proof, evaluation anchor, or provenance tag is missing or invalid, so the tool refuses to make a positive claim.

This implementation follows:

Takahashi, K. (2026). Small-to-Frontier Transfer Theory for Agentic AI. Zenodo. https://doi.org/10.5281/zenodo.19619480

Scope And Limitations

  • Positive certificates are emitted only by implemented compiler paths.
  • Registry coverage means every formal paper item is represented. It does not mean every theorem has a dedicated executable compiler.
  • Schema-only, diagnostic-only, and impossibility-diagnostic registry entries cannot support positive certification by themselves.
  • External or imported evidence consumed by a positive certificate must be linked to field-level replay evidence with valid SHA-256 digests. External evidence without replay hashes weakens replayability and causes abstention.
  • Trusted-core and source-translation support is prototype-level unless checker digests, witness hashes, source certificate hashes, and translation witness hashes are supplied.
  • Confidence support is limited to fixed-horizon bounded-mean intervals and explicit manifest slack. Anytime-valid auditing and global alpha accounting are represented conservatively, not fully implemented.
  • Large portfolio fallback uses deterministic density ordering without claiming an approximation guarantee.
  • The package is designed to fail closed when assumptions are absent, unsupported, or under-specified.

Not Reproduced In Alpha

This alpha release is not a complete computational reproduction of the paper. The following parts are represented in the registry and documentation, but are not implemented as general positive certifiers:

  • mechanically checked proofs of the paper's theorems;
  • automatic discovery of hidden causal structure or missing descriptors from arbitrary unstructured logs;
  • open-world raw-log synthesis beyond finite manifest-declared event algebras, channels, statistics, and novelty gates;
  • replay retrieval or verification of external artifacts beyond user-supplied SHA-256 digests;
  • trusted-core kernel verification beyond digest and witness-hash checks;
  • source-translation proof checking beyond typed metadata, source-certificate hashes, translation-witness hashes, and positive source margins;
  • anytime-valid confidence sequences and global alpha accounting;
  • approximation guarantees for large portfolio allocation unless future compilers verify the required monotonicity or submodularity assumptions;
  • external predictive validity of semantic units beyond the finite packet, packing, anchor-loss, and translation-loss calculations supplied in the manifest.

These limitations are intentional trust boundaries. When a positive compiler path cannot represent and check the required assumptions, the output is a reason-coded abstention rather than a weakened certificate.

What The Tool Does

At the core, the package compiles a manifest into a lower-bound ledger:

G_lower - H_upper - C_upper
  - transport_penalty
  - protocol_penalty
  - confidence_slack
  > 0

where:

  • G is the direct lower-bounded gain.
  • H is the upper-bounded harmful or adverse load.
  • C is the upper-bounded residual cost.
  • The remaining terms are explicit conservative penalties.
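The ledger above can be sketched in a few lines. This is an illustrative reimplementation of the inequality, not the package's internal code; the function names and example values are hypothetical.

```python
# Hypothetical sketch of the lower-bound ledger; names and values are
# illustrative, not the package's actual schema.
def margin_lower(g_lower: float, h_upper: float, c_upper: float,
                 transport_penalty: float, protocol_penalty: float,
                 confidence_slack: float) -> float:
    """Conservative margin: lower-bounded gain minus every upper-bounded
    load, cost, and explicit penalty."""
    return (g_lower - h_upper - c_upper
            - transport_penalty - protocol_penalty - confidence_slack)

def certifies(margin: float) -> bool:
    # Fail closed: only a strictly positive margin supports a certificate.
    return margin > 0

m = margin_lower(1.0, 0.3, 0.2, 0.1, 0.08, 0.05)
print(round(m, 3), certifies(m))  # 0.27 True
```

Because every term is bounded in the unfavorable direction, a positive result is conservative: any slack in the bounds only shrinks the reported margin.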

The input is a YAML or JSON manifest. The output is deterministic JSON and a stable SHA-256 replay hash. The canonical replay hash is computed over the normalized manifest model using sorted-key JSON, deterministic separators, and no NaN or infinity. Raw file-byte hashing is available only with ftc replay --raw.
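The canonical-hash recipe above (sorted keys, deterministic separators, no NaN or infinity) can be illustrated with the standard library. This is a sketch of the hashing step only; the package's actual normalization of the manifest model happens before this point.

```python
# Sketch of canonical replay hashing: sorted-key JSON with deterministic
# separators, NaN/infinity rejected, hashed with SHA-256. Illustrative only.
import hashlib
import json

def canonical_replay_hash(manifest: dict) -> str:
    payload = json.dumps(manifest, sort_keys=True,
                         separators=(",", ":"), allow_nan=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Key order in the source file does not affect the canonical hash.
print(canonical_replay_hash({"b": 2, "a": 1})
      == canonical_replay_hash({"a": 1, "b": 2}))  # True
```

Hashing the normalized model rather than raw file bytes is what lets semantically identical YAML and JSON manifests replay to the same digest; ftc replay --raw opts back into byte-level hashing.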

Five-Minute Quickstart

uv sync
uv run ftc profile
uv run ftc theorems
uv run ftc theory coverage
uv run ftc validate examples/split_inference_minimal.yaml
uv run ftc certify examples/split_inference_minimal.yaml

Expected certificate summary:

output_type                 certificate
certificate_mode            layered_validated
margin_lower                0.266
accepted_for_frontier_trial True

Write the certificate to disk:

uv run ftc certify examples/split_inference_minimal.yaml --out out/certificate.json

Choose A Usage Profile

Use only the layers you need. The layers are modular and can be adopted incrementally.

uv run ftc profile minimal
uv run ftc profile standard
uv run ftc profile advanced
uv run ftc profile full
  • minimal (level 0): you only need schema validation, replay hashes, and abstention readiness checks. Main commands: ftc validate, ftc replay, ftc schema export.
  • standard (level 1): you have explicit component bounds or split-inference observables for one intervention. Main command: ftc certify.
  • advanced (level 3): you need generated raw-log evidence, descriptor repair, or budgeted portfolio allocation. Main commands: ftc synthesize, ftc descriptor, ftc allocate.
  • full (level 4): you need all currently implemented alpha compiler paths: raw-log synthesis, descriptor-language growth, allocation, and semantic translation checks. Main commands: ftc validate, ftc synthesize, ftc descriptor, ftc certify, ftc allocate, ftc semantic.

CLI Examples

Validate and hash a manifest:

uv run ftc validate examples/split_inference_minimal.yaml
uv run ftc replay examples/split_inference_minimal.yaml
uv run ftc replay examples/split_inference_minimal.yaml --raw

List theorem IDs accepted by the implemented compiler scopes:

uv run ftc theorems
uv run ftc theorems transport
uv run ftc theory list
uv run ftc theory show thm:certificate-sound
uv run ftc theory coverage --json

Compile a single-intervention certificate:

uv run ftc certify examples/split_inference_minimal.yaml

Compile stage-one raw-log open-world synthesis:

uv run ftc synthesize examples/raw_log_open_world_minimal.yaml
uv run ftc certify examples/raw_log_open_world_minimal.yaml

Select a descriptor-language scaffold:

uv run ftc descriptor examples/descriptor_growth_minimal.yaml

Allocate a certified portfolio:

uv run ftc allocate examples/allocation_minimal.yaml

Compile semantic and source-translation checks:

uv run ftc semantic examples/semantic_translation_minimal.yaml

Export JSON Schema:

uv run ftc schema export transport_certificate_core --out out/transport.schema.json
uv run ftc schema export raw_log_synthesis --out out/raw_log_synthesis.schema.json
uv run ftc schema export descriptor_language_growth --out out/descriptor_growth.schema.json
uv run ftc schema export audit_policy_and_allocation --out out/allocation.schema.json
uv run ftc schema export semantic_and_translation_manifest --out out/semantic.schema.json

Create example manifests:

uv run ftc init split-inference --out scratch/split.yaml
uv run ftc init raw-log-open-world --out scratch/raw.yaml
uv run ftc init descriptor-growth --out scratch/descriptor.yaml
uv run ftc init allocation --out scratch/allocation.yaml
uv run ftc init semantic-translation --out scratch/semantic.yaml

Python API

from frontier_transfer_certifier import (
    compile_allocation,
    compile_certificate,
    compile_descriptor_growth,
    compile_raw_synthesis,
    compile_semantic_translation,
    get_profile,
)

profile = get_profile("standard")
print(profile.commands)

certificate = compile_certificate("examples/split_inference_minimal.yaml")
print(certificate.output_type)

stage_one = compile_raw_synthesis("examples/raw_log_open_world_minimal.yaml")
print(stage_one.output_type)

descriptor = compile_descriptor_growth("examples/descriptor_growth_minimal.yaml")
print(descriptor.selected_scaffold_id)

allocation = compile_allocation("examples/allocation_minimal.yaml")
print(allocation.selected_candidates)

semantic = compile_semantic_translation("examples/semantic_translation_minimal.yaml")
print(semantic.semantic_lower_index)

Manifest Requirements

The package never silently invents missing theory objects. A positive certificate requires compatible provenance for every consumed field.

If a positive certificate consumes imported evidence, each consumed imported field must have a field_evidence_ref pointing to a replay_ref with a valid 64-character lowercase SHA-256 digest. Example manifests use deterministic example digests; replace them with real evidence hashes before relying on a certificate outside demonstrations.
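The digest shape rule can be checked mechanically. The following is a minimal sketch of that check, not the package's validator; the function name is hypothetical.

```python
# Minimal sketch of the digest rule stated above: a valid replay digest is
# exactly 64 lowercase hexadecimal characters (a SHA-256 hex digest).
import re

_DIGEST_RE = re.compile(r"^[0-9a-f]{64}$")

def is_valid_replay_digest(digest: str) -> bool:
    return _DIGEST_RE.fullmatch(digest) is not None

print(is_valid_replay_digest("a" * 64))  # True
print(is_valid_replay_digest("A" * 64))  # False: uppercase is rejected
```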

All core schemas reject unknown fields by default. Measurement, bound, penalty, probability, budget, weight, cost, margin, overlap, drift, and confidence fields must be finite; NaN and infinity are invalid. Harmful-cost, penalty, overhead, tail, drift, curvature, and slope upper bounds are nonnegative unless a field is explicitly documented as a signed lower-bound quantity.

Supported provenance tags:

  • raw_generated: produced by a finite compiler in this package, such as raw-log synthesis or descriptor-language growth.
  • imported_and_validated: imported from another source and validated in this framework.
  • declared_and_validated: declared bookkeeping objects such as confidence budgets, split partitions, or deterministic tie rules.
  • blocked: forbidden for downstream positive claims.

Important output modes:

  • native_raw_log_complete: all substantive consumed evidence is raw-generated by finite subcompilers implemented by this package or validated bookkeeping.
  • layered_validated: at least one substantive consumed evidence object is imported and validated.
  • abstention: no positive claim is made.

The included examples are intentionally conservative. Split-inference and finite-context manifests are layered_validated because their source-theory objects are supplied as validated inputs. A field should be tagged raw_generated only when it is emitted by a finite replayable subcompiler that this package can identify, such as raw_log_synthesis or descriptor_language_growth.
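The relationship between provenance tags and output modes can be sketched as a simple decision rule. This is a hypothetical illustration of the definitions above, not the package's actual mode-selection logic.

```python
# Hypothetical sketch: derive an output mode from the provenance tags of the
# substantive consumed evidence, following the definitions above.
def output_mode(tags: list[str]) -> str:
    if not tags or "blocked" in tags:
        return "abstention"            # fail closed on missing or forbidden evidence
    if any(t not in ("raw_generated", "imported_and_validated",
                     "declared_and_validated") for t in tags):
        return "abstention"            # unknown tags never support a positive claim
    if "imported_and_validated" in tags:
        return "layered_validated"     # at least one imported, validated object
    return "native_raw_log_complete"   # only raw-generated or validated bookkeeping

print(output_mode(["raw_generated", "declared_and_validated"]))  # native_raw_log_complete
print(output_mode(["raw_generated", "imported_and_validated"]))  # layered_validated
print(output_mode(["blocked"]))                                  # abstention
```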

The theorem-dependency list is also checked. A manifest cannot name arbitrary paper theorem IDs and receive a positive result unless the current compiler scope implements or operationalizes those dependencies. ftc validate performs this strict check by default; pass --schema-only when you only want Pydantic schema validation.
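The strict dependency check amounts to a subset test against the implemented scope. The sketch below is illustrative; the scope contents are made-up placeholders except thm:certificate-sound, which appears in the CLI examples above.

```python
# Sketch of the strict theorem-dependency check: a manifest's declared
# dependencies must be covered by the implemented compiler scope, otherwise
# the result is a reason-coded abstention. Scope contents are illustrative.
IMPLEMENTED_SCOPE = {"thm:certificate-sound", "thm:transport-penalty"}

def check_dependencies(required: set[str]) -> tuple[bool, set[str]]:
    missing = required - IMPLEMENTED_SCOPE
    return (not missing, missing)

ok, missing = check_dependencies({"thm:certificate-sound", "thm:open-world"})
print(ok, sorted(missing))  # False ['thm:open-world']
```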

Included Examples

  • examples/split_inference_minimal.yaml: split-inference worked example.
  • examples/finite_context_minimal.yaml: explicit bounded component evidence.
  • examples/raw_log_open_world_minimal.yaml: event-local raw logs compiled into stage-one bounds.
  • examples/descriptor_growth_minimal.yaml: descriptor scaffold selection.
  • examples/allocation_minimal.yaml: singleton, pairwise, coordination, resource, factor, and dense residual allocation packets.
  • examples/semantic_translation_minimal.yaml: semantic packing with trusted source-translation checks.

Security And Data Handling

  • The CLI performs no network communication by default.
  • Raw private logs should not be copied into the repository. Store external references and replay hashes in manifests instead.
  • .env, credentials, private logs, local caches, out/, and data/private/ are ignored by Git.
  • CLI output avoids printing secret-like values.
  • CI is configured for tests, linting, typing, and secret-scan regression checks.

Alpha Release Trust Boundary

Registry coverage and implementation coverage are different. The registry is pinned to a recorded paper-source snapshot and covers every parsed formal item. Only selected registry entries are implemented by positive certifier code. All other entries are represented by schema-only or diagnostic artifacts that fail closed.

Trusted-core and source-translation claims require concrete witness hashes and checker digests before they can contribute to a positive semantic output. Unsupported theorem dependencies fail validation or produce abstention. The project does not claim production readiness, formal verification, or complete implementation of the paper.

License

Apache License 2.0. This repository intentionally does not include a NOTICE file.

About

Fail-closed certification toolkit for small-to-frontier transfer in agentic AI, with typed manifests, replayable evidence, and theory-registry coverage.
