Skip to content

Add extensible prover implementation #103

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

tothtamas28
Copy link
Contributor

@tothtamas28 tothtamas28 commented May 15, 2025

This PR introduces a module kprovex that defines an extensible prover based on APRPRover. It has the the following submodules:

  • kprovex.api: the plugin API definition. A plugin provides the K definition to the prover, as well as functions for loading and printing proofs of that definition.
  • kprovex._default: semantics-agnostic defaults for loading and printing proofs.
  • kprovex._loader: plugin loader.
  • kprovex._kprovex: prover implementation.

Furthermore, it adds a simple plugin implementation for riscv-semantics.

The advantage of this architecture is that kprovex can be upstreamed to pyk without breaking it or the riscv-semantics prover. (Naturally, imports still need to be adjusted.)

@tothtamas28 tothtamas28 self-assigned this May 15, 2025
Comment on lines +87 to +95
class Plugin(ABC):
@abstractmethod
def dist(self) -> Dist: ...

def inits(self) -> Mapping[str, Init]:
return {}

def shows(self) -> Mapping[str, Show]:
return {}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, a plugin defines three things:

  • K definition to be used.
  • Ways for loading specifications (optional).
  • Ways for printing proof nodes (optional).

from .api import Config, Init, Show


def init_from_claims(config: Config, spec_file: Path, claim_id: str) -> APRProof:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All plugins implicitly support loading claims from K files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that everywhere we have claim_id as a str, but not a str | None. Will this enable the case where a user wants to either (i) run the only claimthat is in th efile, or (ii) run all th eclaims in a given file?

return proof


def show_pretty_term(config: Config, term: KInner) -> str:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, all plugins support printing proof nodes using kore-print.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using kore_print? I think we should use Formatter or PrettyPrinter instead, does kore_print corrcetly handle thinsg like variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does kore_print corrcetly handle thinsg like variables?

Yes, kore_print handles symbolic terms as well.

Why are we using kore_print? I think we should use Formatter or PrettyPrinter instead

We could use Formatter for sure. An advantage of kore_print is that it handles an extra (non-trivial) case for parenthesization. Also, although I do not have measurements, for large terms, kore_print is probably significantly faster as it is implemented in C++.

Should we switch to Formatter as the default Show implementation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No if kore_print works on symbolic values we should be good.

Comment on lines +23 to +31
class KRiscVPlugin(Plugin):
def dist(self) -> Dist:
from pyk.kdist import kdist

return Dist(
haskell_dir=kdist.get('riscv-semantics.haskell'),
llvm_lib_dir=kdist.get('riscv-semantics.llvm-lib'),
source_dirs=(kdist.get('riscv-semantics.source'),),
)
Copy link
Contributor Author

@tothtamas28 tothtamas28 May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minimal plugin for riscv-semantics: it only defines the definitions to be used.

Comment on lines +20 to +21
[tool.poetry.plugins.kprovex]
riscv = "kriscv.symtools:KRiscVPlugin"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prover plugin is registered.

from .api import Init, Plugin, Show


def create_prover(plugin_id: str, proof_dir: str | Path, *, bug_report: BugReport | None = None) -> KProveX:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A prover for a registered plugin can be instantiated by referring to the plugin by name:

prover = create_prover('riscv', proof_dir='.proofs')

return _ID_PATTERN.fullmatch(s) is not None


PLUGINS: Final = _load_plugins()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this module is imported, all plugins are loaded. (Lazy loading could be improved so that each plugin is loaded only when it is first referred to.)


@final
@dataclass
class KProveX:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prover implementation: currently supports a minimal set of useful features including show, view, advance+proof, prune, etc.

@tothtamas28
Copy link
Contributor Author

After finalizing the interfaces I'll add docstrings for the public API.

@tothtamas28 tothtamas28 marked this pull request as ready for review May 19, 2025 14:34
Comment on lines +65 to +79
def init_proof(
self,
spec_file: str | Path,
claim_id: str,
*,
init_id: str | None = None,
exist_ok: bool = False,
) -> str:
spec_file = Path(spec_file)
init = self._load_init(init_id=init_id)
proof = init(config=self.config, spec_file=spec_file, claim_id=claim_id)
if not exist_ok and APRProof.proof_data_exists(proof.id, self.proof_dir):
raise ValueError(f'Proof with id already exists: {proof.id}')

proof.write_proof_data()
return proof.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we makes it more flexible? For example, when I'm working on the zekvm-harness, I starts from a concrete configuration and constructs a claim by myself:

        # concrete config
        elf_file = build_elf(test_id, load_template, build_config)
        symdata = {resolve_symbol(elf_file, f'OP{i}'): (32, f'W{i}') for i in range(arg_count)}
        # make symbolic initial state
        init_config = _init_config(symdata, build_config, elf_file, tools(build_config.target))
        # construct claim
        kclaim = cterm_build_claim(test_id.upper(), init_config, _final_config(symtool))
        proof = APRProof.from_claim(symtool.kprove.definition, kclaim[0], {}, symtool.proof_dir)

BTW, must we have a final state? For my experience, I always start with a totally abstract state to explore the state transition graph because there are always problems at the begining. Additionally, we can eliminate the time to implies the final state if I know the step it might reach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My expectation for this class to to simplify the following test:

    # Given
    symtool = symtools(f'{build_config.target}-haskell', f'{build_config.target}-lib', 'zkevm-semantics.source')
    if APRProof.proof_data_exists(test_id.upper(), symtool.proof_dir):
        proof = APRProof.read_proof_data(proof_dir=symtool.proof_dir, id=test_id.upper())
    else:
        elf_file = build_elf(test_id, load_template, build_config)
        symdata = {resolve_symbol(elf_file, f'OP{i}'): (32, f'W{i}') for i in range(arg_count)}
        init_config = _init_config(symdata, build_config, elf_file, tools(build_config.target))
        kclaim = cterm_build_claim(test_id.upper(), init_config, _final_config(symtool))
        proof = APRProof.from_claim(symtool.kprove.definition, kclaim[0], {}, symtool.proof_dir)

    # When
    with symtool.explore(id=test_id.upper()) as kcfg_explore:
        prover = APRProver(
            kcfg_explore=kcfg_explore,
            execute_depth=DEPTH,
            # optimize_kcfg=True,
        )
        prover.advance_proof(proof, max_iterations=MAX_ITERATIONS)

    proof_show = APRProofShow(symtool.kprove)
    show_result = '\n'.join(proof_show.show(proof, [node.id for node in proof.kcfg.nodes]))
    (symtool.proof_dir / f'{test_id.upper()}-proof-result.txt').write_text(show_result)

into something like

kprovex = create_prover(build_config.target, proof_dir, bug_report = bug_report)

elf_file = build_elf(test_id, load_template, build_config)
symdata = {resolve_symbol(elf_file, f'OP{i}'): (32, f'W{i}') for i in range(arg_count)}
init_config = _init_config(symdata, build_config, elf_file, tools(build_config.target))
# I can change the reinit to True to restart a proof
proof_id = kprovex.init_proof_from_cterm(test_id.upper(), init_config, _final_config(symtool), exist_ok = True, reinit = False)  
# I might prune a node if it is too far away from the error point for easy investigation.
# kprovex.prune_node(proof_id, 12)  # I can change the number to continue execution from the node before it.
# I need optimize_kcfg to make the kcfg small
krpovex.advance_proof(proof_id, max_depth = 100, max_iterations = 100, optimize_kcfg = True) 
# It can be helpful if I can also minimize the proof whenever I want. It can be better if the minimization generate a new proof.
# kprovex.minimize_proof()
# it can be better if I can see the difference between particular nodes and eliminate specific cells that I don't care.
(symtool.proof_dir / f'{test_id.upper()}-proof-result.txt').write_text(kprovex.show(proof_id))

I think this PR is good enough to use , and I can provide further modification if I need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants