RS codes cleanup and refactor #375

Open
katyhr wants to merge 32 commits into main from Katy/ProximityRefactor

Collaborator

@katyhr katyhr commented Feb 25, 2026

Cleanup of some Coding Theory


github-actions bot commented Feb 25, 2026

🤖 Gemini PR Summary

Refactor of the Coding Theory library focusing on Reed-Solomon (RS) codes and proximity gap results.

Refactoring and API Standardization

  • Namespace Migration: Replaced the ReedSolomonCode naming convention with a unified ReedSolomon namespace across BCIKS20, AHIV22, and DG25 modules.
  • Namespace Management: Replaced global namespace openings with scoped or explicit qualifications in modules such as BinaryBasefold/Prelude.lean to prevent symbol collisions.
  • Proof Optimization: Refined existing proofs by replacing generic simp calls with simp only and removing redundant classical tactics.
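
The namespace and `simp only` changes described above can be illustrated with a small sketch. Everything here is a toy stand-in: the `Toy` namespace and its `double` definition are hypothetical and do not appear in ArkLib; they only play the role that `ReedSolomon` plays in the PR.

```lean
import Mathlib

-- Hypothetical toy namespace standing in for `ReedSolomon`; not ArkLib code.
namespace Toy

def double (n : ℕ) : ℕ := n + n

theorem double_eq (n : ℕ) : double n = 2 * n := by
  unfold double
  omega

end Toy

-- Explicit qualification: nothing is opened at file level,
-- so `double` cannot collide with another module's `double`.
example (n : ℕ) : Toy.double n = 2 * n := Toy.double_eq n

-- Scoped opening: `Toy` is open for this single declaration only.
open Toy in
example (n : ℕ) : double n = 2 * n := double_eq n

-- `simp only` with an explicit lemma list instead of a bare `simp`:
-- the proof no longer depends on the contents of the global simp set.
example (a b : ℕ) : a + 0 + b * 1 = a + b := by
  simp only [Nat.add_zero, Nat.mul_one]
```

Restricting `simp` to a named lemma list tends to make proofs more stable when the default simp set changes between library versions, which is presumably the motivation for the replacements listed above.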

Mathematical Formalization

  • Vandermonde Matrices: Added a dedicated module for non-square Vandermonde matrices, including rank characterization and their relationship to polynomial evaluation.
  • Trivariate Polynomials: Introduced Trivariate.lean defining polynomials in three variables as nested univariate polynomials ($F[X][Y][Z]$). Includes definitions for $Y$-degree, $YZ$-degree, and infrastructure for mapping to bivariate polynomials over rational function fields.
  • Typeclass Generalization: Removed unnecessary DecidableEq requirements on field elements and index types in theorems related to proximity gaps and Reed-Solomon rates.
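
As a rough illustration of the two new modules, the sketch below gives one plausible shape for a non-square Vandermonde matrix and for a trivariate polynomial type. The signature of `nonsquareSketch` mirrors the `nonsquare` declaration listed in this summary, but its body is an assumption, as are the names `nonsquareSketch`, `TrivariateSketch`, and `outerDegreeSketch`; the actual definitions are in `ArkLib/Data/Matrix/Vandermonde.lean` and `ArkLib/Data/Polynomial/Trivariate.lean`.

```lean
import Mathlib

variable {F : Type*} {ι : Type*}

/-- Hypothetical stand-in for `nonsquare`: a matrix with rows indexed by `ι`
and `ι'` columns, whose `(i, j)` entry is assumed to be `α i ^ j`. -/
def nonsquareSketch [Semiring F] (ι' : ℕ) (α : ι → F) : Matrix ι (Fin ι') F :=
  Matrix.of fun i j => α i ^ (j : ℕ)

/-- Each entry unfolds to the expected power of the evaluation point. -/
example [Semiring F] (ι' : ℕ) (α : ι → F) (i : ι) (j : Fin ι') :
    nonsquareSketch ι' α i j = α i ^ (j : ℕ) := rfl

/-- Hypothetical alias: three nested univariate polynomial rings over `F`. -/
abbrev TrivariateSketch (F : Type*) [Semiring F] :=
  Polynomial (Polynomial (Polynomial F))

/-- Degree in the outermost variable of the nesting (hypothetical name);
with nested polynomials this is just the ordinary `natDegree`. -/
def outerDegreeSketch [Semiring F] (Q : TrivariateSketch F) : ℕ :=
  Polynomial.natDegree Q
```

With the nested representation, a degree in one variable reduces to a `natDegree` at the matching nesting level, so no separate multivariate degree machinery is needed for that case.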

Incomplete Proofs (Critical)

The following modules contain sorry or admit placeholders:

  • Proximity Gaps: Theorems in BCIKS20/AffineSpaces.lean and BCIKS20/Curves.lean.
  • List Decoding: Claims 5.7 through 5.11 in BCIKS20/ListDecoding/Agreement.lean.
  • Guruswami-Sudan: Existence proofs for solutions in BCIKS20/ListDecoding/Guruswami.lean and non-zero discriminant proofs in BCIKS20/ListDecoding/Extraction.lean.

Documentation and Metadata

  • Reference Alignment: Updated module documentation with formal citations to the [BCIKS20] and [AHIV22] papers.
  • Docstrings: Enhanced documentation for core definitions in CodingTheory/Basic.lean and ReedSolomon.lean.

Statistics

| Metric | Count |
|---|---|
| 📝 Files Changed | 26 |
| Lines Added | 563 |
| Lines Removed | 794 |

Lean Declarations

✏️ **Removed:** 16 declaration(s)
  • def toFinset (domain : ι ↪ F) (deg : ℕ) : Finset (ι → F) in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • lemma subLeftFull_of_vandermonde_is_vandermonde (h : m ≤ n) : in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • abbrev sqrtRate [Fintype ι] (deg : ℕ) (domain : ι ↪ F) : ℝ≥0 in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • instance instFintypeInterleavedModuleCode [Fintype A] : Fintype (MC ^⋈ κ) in ArkLib/Data/CodingTheory/InterleavedCode.lean
  • instance interleavedCodeSet_fintype {A : Type*} {κ ι : Type*} in ArkLib/Data/CodingTheory/InterleavedCode.lean
  • def nonsquare [Semiring F] (ι' : ℕ) (α : ι → F) : Matrix ι (Fin ι') F in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • lemma nonsquare_mulVecLin [CommSemiring F] {ι' : ℕ} {α₁ : ι ↪ F} {α₂ : Fin ι' → F} {i : ι} : in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • lemma subUpFull_of_vandermonde_is_vandermonde (h : n ≤ m) : in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • def D_Y (Q : F[Z][X][Y]) : ℕ in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Guruswami.lean
  • lemma rank_nonsquare_eq_deg_of_deg_le (inj : Function.Injective α) (h : n ≤ m) : in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • def nonsquareTranspose [Field F] (ι' : ℕ) (α : ι ↪ F) : Matrix (Fin ι') ι F in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • theorem mulVecLin_coeff_vandermondens_eq_eval_matrixOfPolynomials in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • lemma rank_nonsquare_eq_deg_of_ι_le (inj : Function.Injective α) (h : m ≤ n) : in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • lemma rank_nonsquare_rows_eq_min (inj : Function.Injective α) : in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • def finCarrier {ι : Type} [Fintype ι] in ArkLib/Data/CodingTheory/ReedSolomon.lean
  • def D_YZ (Q : F[Z][X][Y]) : ℕ in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Guruswami.lean
✏️ **Added:** 11 declaration(s)
  • lemma rank_nonsquare_eq_deg_of_deg_le (inj : Function.Injective α) (h : n ≤ m) : in ArkLib/Data/Matrix/Vandermonde.lean
  • lemma nonsquare_mulVecLin [CommSemiring F] {ι' : ℕ} {α₁ : ι ↪ F} {α₂ : Fin ι' → F} {i : ι} : in ArkLib/Data/Matrix/Vandermonde.lean
  • lemma rank_nonsquare_eq_deg_of_ι_le (inj : Function.Injective α) (h : m ≤ n) : in ArkLib/Data/Matrix/Vandermonde.lean
  • lemma rank_nonsquare_rows_eq_min (inj : Function.Injective α) : in ArkLib/Data/Matrix/Vandermonde.lean
  • lemma subUpFull_of_vandermonde_is_vandermonde (h : n ≤ m) : in ArkLib/Data/Matrix/Vandermonde.lean
  • def nonsquareTranspose [Field F] (ι' : ℕ) (α : ι ↪ F) : Matrix (Fin ι') ι F in ArkLib/Data/Matrix/Vandermonde.lean
  • lemma subLeftFull_of_vandermonde_is_vandermonde (h : m ≤ n) : in ArkLib/Data/Matrix/Vandermonde.lean
  • def D_YZ (Q : F[Z][X][Y]) : ℕ in ArkLib/Data/Polynomial/Trivariate.lean
  • def nonsquare [Semiring F] (ι' : ℕ) (α : ι → F) : Matrix ι (Fin ι') F in ArkLib/Data/Matrix/Vandermonde.lean
  • theorem mulVecLin_coeff_vandermondens_eq_eval_matrixOfPolynomials in ArkLib/Data/Matrix/Vandermonde.lean
  • def D_Y (Q : F[Z][X][Y]) : ℕ in ArkLib/Data/Polynomial/Trivariate.lean
✏️ **Affected:** 8 declaration(s) (line number changed)
  • lemma rateOfLinearCode_eq_div' {ι : Type*} [Fintype ι] {F : Type*} [Field F] in ArkLib/Data/CodingTheory/ReedSolomon.lean moved from L303 to L195
  • def δ_ε_multilinearCorrelatedAgreement [CommRing F] [Module F A] in ArkLib/Data/CodingTheory/ProximityGap/Basic.lean moved from L141 to L119
  • lemma dim_eq_deg_of_le' {ι : Type*} [Fintype ι] {F : Type*} [Field F] {n : ℕ} {α : ι ↪ F} [NeZero n] in ArkLib/Data/CodingTheory/ReedSolomon.lean moved from L257 to L149
  • lemma gamma_eq_P (h_gs : ModifiedGuruswami m n k ωs Q u₀ u₁) : in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Agreement.lean moved from L146 to L114
  • lemma discr_of_irred_components_nonzero (_h_gs : ModifiedGuruswami m n k ωs Q u₀ u₁) : in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Extraction.lean moved from L47 to L42
  • lemma exists_factors_with_large_common_root_set (δ : ℚ) (x₀ : F) in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Agreement.lean moved from L32 to L28
  • lemma irreducible_H (h_gs : ModifiedGuruswami m n k ωs Q u₀ u₁) : Irreducible (H k δ x₀ h_gs) in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Agreement.lean moved from L66 to L51
  • lemma modified_guruswami_has_a_solution {m n k : ℕ} {ωs : Fin n ↪ F} {u₀ u₁ : Fin n → F} : in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Guruswami.lean moved from L177 to L89

sorry Tracking

✏️ **Affected:** 2 `sorry`(s) (line number changed)
  • lemma exists_a_set_and_a_matching_polynomial in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Guruswami.lean moved from L243 to L142
  • lemma modified_guruswami_has_a_solution {m n k : ℕ} {ωs : Fin n ↪ F} {u₀ u₁ : Fin n → F} : in ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Guruswami.lean moved from L182 to L91

🎨 **Style Guide Adherence**

There are more than 20 violations of the style guide. They are grouped by rule below:

  • Documentation Standards (29 violations)

    • Every definition and major theorem should have a docstring.
    • Representative Examples:
      • def relHammingDistRange (ArkLib/Data/CodingTheory/Basic.lean:1369) is missing a docstring.
      • noncomputable def fromRowGenMat (ArkLib/Data/CodingTheory/Basic.lean:1898) is missing a docstring.
      • def evalOnPoints (ArkLib/Data/CodingTheory/ReedSolomon.lean:36) is missing a docstring.
  • Symbol Naming (10 violations)

    • "Avoid ≥ (ge) and > (gt) in theorem statements unless necessary for argument ordering."
    • Representative Examples:
      • lemma dist_pos_of_Nontrivial ... : Code.dist C > 0 := by (ArkLib/Data/CodingTheory/Basic.lean:282) uses >.
      • theorem projection_injective ... : hammingDist u v ≥ ‖C‖₀ := by (ArkLib/Data/CodingTheory/Basic.lean:1775) uses ≥.
      • noncomputable def δ_ε_multilinearCorrelatedAgreement ... Pr_{...} > ε (ArkLib/Data/CodingTheory/ProximityGap/Basic.lean:125) uses >.
  • Syntax and Formatting - Empty Lines (12 violations)

    • "Avoid empty lines inside definitions or proofs."
    • Representative Examples:
      • theorem dist'_eq_dist contains multiple empty lines inside the proof block (ArkLib/Data/CodingTheory/Basic.lean, e.g., lines 681, 685, 691).
      • theorem closeToWord_iff_exists_possibleDisagreeCols contains empty lines inside the proof (ArkLib/Data/CodingTheory/Basic.lean:410, 415).
  • Naming Conventions - Acronyms (5 violations)

    • "Acronyms: Treat as words (e.g., HtmlParser not HTMLParser)."
    • Representative Examples:
      • theorem UDR_close_iff_relURD_close (ArkLib/Data/CodingTheory/Basic.lean:1704) uses all-caps acronyms; should be Udr and Urd.
      • abbrev RScodeSet (ArkLib/Data/CodingTheory/ReedSolomon.lean:76) uses all-caps RS; should be RsCodeSet.
  • Variable Conventions (3 violations)

    • Use specific single-letter naming conventions (e.g., m, n, k for natural numbers).
    • Representative Examples:
      • lemma relDist_floor_bound_iff_complement_bound (n upperBound : ℕ) (ArkLib/Data/CodingTheory/Basic.lean:1340) uses upperBound as a variable name.
      • lemma relDist_floor_bound_iff_complement_bound ... (n : ENNReal) - r ≤ k (ArkLib/Data/CodingTheory/Basic.lean:1350) uses r for a natural number/real, which is reserved for predicates.
  • Syntax and Formatting - Line Length (4 violations)

    • "Keep lines under 100 characters."
    • Representative Examples:
      • theorem poly_eq_zero_of_dist_lt (ArkLib/Data/CodingTheory/Basic.lean:2273) exceeds 100 characters.
      • theorem proximity_gap_affineSubspace (ArkLib/Data/CodingTheory/DivergenceOfSets.lean:181) exceeds 100 characters.
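
Two of the rules quoted above (docstrings on definitions, and avoiding `≥`/`>` in statements) can be shown on toy declarations. The names below are hypothetical and are not taken from ArkLib; this is only a sketch of what a compliant statement might look like.

```lean
import Mathlib

/-- Doubles a natural number. A toy definition carrying the docstring
that the Documentation Standards rule asks for. -/
def doubleSketch (n : ℕ) : ℕ := 2 * n

-- Flagged form: the statement uses `>`.
example (n : ℕ) (h : n ≠ 0) : n > 0 := Nat.pos_of_ne_zero h

-- Preferred form: the same fact stated with `<`.
example (n : ℕ) (h : n ≠ 0) : 0 < n := Nat.pos_of_ne_zero h

-- Likewise, a bound that could be written `a + b ≥ a` is stated with `≤`.
example (a b : ℕ) : a ≤ a + b := Nat.le_add_right a b
```

Both forms of the positivity statement accept the same proof term, since `n > 0` is notation for `0 < n`; the rule affects only how the statement reads and how lemmas are found by search.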

📄 **Per-File Summaries**
  • ArkLib.lean: This update expands the library's accessible modules by importing newly added support for Vandermonde matrices and trivariate polynomials. The changes do not directly introduce new theorems or definitions within this file, nor do they include any sorry placeholders.
  • ArkLib/Data/CodingTheory/Basic.lean: This PR refines the coding theory library by adding comprehensive docstrings and reformatting module documentation for improved readability. It optimizes numerous proofs by replacing generic simp calls with explicit simp only statements and removing redundant classical tactics, while introducing no new theorems or sorry placeholders.
  • ArkLib/Data/CodingTheory/DivergenceOfSets.lean: This file updates references and namespaces by renaming ReedSolomonCode to ReedSolomon across several theorems and proofs related to proximity gaps and error bounds. These changes are strictly refactors to align with library naming conventions and do not introduce new definitions, theorems, or sorry placeholders.
  • ArkLib/Data/CodingTheory/InterleavedCode.lean: The changes streamline the file by removing redundant imports and narrowing the scope of the noncomputable modifier to specific instances.
  • ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean: This update adjusts Reed-Solomon library references and generalizes the combinatorial proximity gap lemma for affine lines by removing an unnecessary DecidableEq typeclass requirement.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/BWMatrix.lean: This PR updates the BWMatrix.lean file to align with a refactoring of the Reed-Solomon code library, specifically updating theorem references from ReedSolomonCode to ReedSolomon. The changes modify existing proofs and clean up imports and namespace openings without introducing new theorems or sorry placeholders.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/GoodCoeffs.lean: This update refactors the file to align with library-wide changes to the Reed-Solomon code implementation, specifically renaming references from ReedSolomonCode to ReedSolomon. It modifies an existing proof to use the updated theorem name and streamlines the imports and namespace declarations.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/JointAgreement.lean: This change refines the file's configuration by adding a dependency on Reed-Solomon codes and streamlining namespace and scoped openings. No new definitions, theorems, or sorry placeholders are introduced in this diff.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/Main.lean: This change refactors the main theorem for correlated agreement over affine lines by updating references to the Reed-Solomon rate calculation and relaxing the DecidableEq requirement for the index type. It primarily serves to align the theorem's statement with recent API changes in the library's Reed-Solomon code definitions.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/UniqueDecoding.lean: This update refines the module's namespace declarations and relaxes the typeclass constraints for the RS_correlatedAgreement_affineLines_uniqueDecodingRegime theorem by omitting an unnecessary DecidableEq requirement. No new theorems or sorry placeholders were introduced.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineSpaces.lean: This file updates imports and renames references from ReedSolomonCode to ReedSolomon to align with a refactored library structure. The changes modify existing theorem statements to use the updated nomenclature, and the correlatedAgreement_affine_spaces theorem continues to include a sorry placeholder.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/Curves.lean: This file updates theorem statements and documentation to align with naming conventions in the ReedSolomon library and formal references to [BCIKS20]. The changes include a renamed function and a new import, though sorry placeholders remain for the primary proximity gap theorems.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ErrorBound.lean: This change updates the file's dependencies by importing Reed-Solomon code definitions and refactors namespace management for improved access to coding theory symbols. It does not introduce new theorems, definitions, or sorry placeholders.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Agreement.lean: This change refines the formalization of Claims 5.7 through 5.11 from the BCIKS20 paper regarding proximity gaps and list decoding solutions. It introduces new dependencies on trivariate polynomials and rational functions, updates lemma statements with Finite F constraints, and provides definitions for extracting polynomials used in the proofs. All major theorems in this file currently contain sorry placeholders.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Extraction.lean: This file refactors imports and formatting for lemmas and definitions related to Guruswami list decoding extraction within the BCIKS20 proximity gap framework. The changes include the definition of the set of irreducible factors pg_Rset and a statement for the non-zero discriminant of these components, though the latter currently contains a sorry placeholder.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ListDecoding/Guruswami.lean: This refactor streamlines the file by consolidating imports and moving definitions for trivariate polynomial degrees (D_Y and D_YZ) to a dedicated module. The changes also update docstrings and lemma structures for consistency with the [BCIKS20] paper, while maintaining sorry placeholders for the existence of Guruswami-Sudan solutions and matching polynomials.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/Prelude.lean: This change adds an import for trivariate polynomials to the BCIKS20 proximity gap prelude to support forthcoming developments in this theory. No new definitions, theorems, or sorry placeholders are introduced in this file.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/ReedSolomonGap.lean: This change refactors the proximity_gap_RSCodes theorem by updating internal references from ReedSolomonCode to ReedSolomon and refining namespace imports. No new theorems are introduced, and no sorry or admit placeholders were added.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/WeightedAgreement.lean: This refactor updates the file to use the ReedSolomon namespace and definitions, such as sqrtRate and finCarrier, replacing the previous ReedSolomonCode names. The changes maintain existing theorems for weighted correlated agreement over curves and affine spaces without introducing new logic or sorry placeholders.
  • ArkLib/Data/CodingTheory/ProximityGap/Basic.lean: This change streamlines the basic definitions for proximity gaps and correlated agreement by removing unused imports and the entire Trivariate namespace. It also refines several definition signatures by removing redundant typeclass constraints (such as DecidableEq F), without introducing any new theorems, proofs, or sorry placeholders.
  • ArkLib/Data/CodingTheory/ProximityGap/DG25.lean: This PR refactors ArkLib/Data/CodingTheory/ProximityGap/DG25.lean by streamlining imports and updating existing proofs to use the ReedSolomon namespace instead of ReedSolomonCode. No new theorems, definitions, or sorry placeholders are introduced.
  • ArkLib/Data/CodingTheory/ReedSolomon.lean: This refactor moves general Vandermonde matrix definitions and rank properties to a dedicated module, updating ReedSolomon.lean to use these external components. The changes standardize the ReedSolomon namespace, update proofs for minimal distance and dimension results to use more efficient library lemmas, and improve documentation consistency. No sorry or admit placeholders were introduced.
  • ArkLib/Data/Matrix/Vandermonde.lean: This new file introduces definitions and theorems for non-square Vandermonde matrices, specifically characterizing their rank and establishing their relationship to polynomial evaluation.
  • ArkLib/Data/Polynomial/RationalFunctions.lean: Streamlined imports and improved code formatting and documentation for better readability.
  • ArkLib/Data/Polynomial/Trivariate.lean: This new file defines trivariate polynomials as nested univariate polynomials and introduces notation for variables $X$, $Y$, and $Z$ alongside operations for evaluation and degree calculation. It provides specific definitions for mapping trivariate polynomials to bivariate polynomials over rational functions and computing the $Y$ and $YZ$ degrees required for formalizing proximity gap results.
  • ArkLib/ProofSystem/Binius/BinaryBasefold/Prelude.lean: This change removes the ReedSolomon namespace from the global scope and modifies a proof by explicitly qualifying references to ReedSolomon.code and ReedSolomon.evalOnPoints.

Last updated: 2026-03-13 15:50 UTC.

@katyhr katyhr changed the title from "Proximity Gaps + Coding Theory Cleanup and Refactor" to "RS codes cleanup and refactor" Mar 9, 2026

github-actions bot commented Mar 13, 2026

Build Timing Report

  • Commit: 348f198
  • Message: Merge 4e7d592 into f6b4144
  • Ref: Katy/ProximityRefactor
  • Comparison baseline: b03737b from the previous successful PR update.
  • Measured on ubuntu-latest with /usr/bin/time -p.
  • Commands: clean build `rm -rf .lake/build && lake build`; warm rebuild `lake build`; validation wrapper `./scripts/validate.sh`.

| Measurement | Baseline (s) | Current (s) | Delta (s) | Status |
|---|---|---|---|---|
| Clean build | 657.23 | 652.08 | -5.15 | ok |
| Warm rebuild | 5.97 | 5.74 | -0.23 | ok |
| Validation wrapper | 4.06 | 4.18 | +0.12 | ok |

Incremental Rebuild Signal

  • Warm rebuild saved 646.34s vs clean (113.60x faster).

This compares a clean project build against an incremental rebuild in the same CI job; it is a lightweight variability signal, not a full cross-run benchmark.

Slowest Current Clean-Build Files

Showing 20 slowest current targets, with comparison against the selected baseline when available.

| Current (s) | Baseline (s) | Delta (s) | Path |
|---|---|---|---|
| 87.00 | 85.00 | +2.00 | ArkLib/Data/CodingTheory/JohnsonBound/Lemmas.lean |
| 69.00 | 69.00 | +0.00 | ArkLib/ProofSystem/Fri/Spec/SingleRound.lean |
| 60.00 | 59.00 | +1.00 | ArkLib/Data/CodingTheory/BerlekampWelch/Condition.lean |
| 58.00 | 57.00 | +1.00 | ArkLib/Data/CodingTheory/GuruswamiSudan/Basic.lean |
| 52.00 | 53.00 | -1.00 | ArkLib/OracleReduction/Security/RoundByRound.lean |
| 52.00 | 52.00 | +0.00 | ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/BWMatrix.lean |
| 47.00 | 47.00 | +0.00 | ArkLib/Data/CodingTheory/JohnsonBound/Basic.lean |
| 45.00 | 45.00 | +0.00 | ArkLib/OracleReduction/LiftContext/Reduction.lean |
| 43.00 | 40.00 | +3.00 | ArkLib/Data/CodingTheory/Basic.lean |
| 33.00 | 34.00 | -1.00 | ArkLib/Data/CodingTheory/DivergenceOfSets.lean |
| 32.00 | 30.00 | +2.00 | ArkLib/ProofSystem/Fri/Domain.lean |
| 30.00 | 28.00 | +2.00 | ArkLib/Data/CodingTheory/PolishchukSpielman/Resultant.lean |
| 30.00 | 29.00 | +1.00 | ArkLib/Data/CodingTheory/ProximityGap/DG25.lean |
| 30.00 | 30.00 | +0.00 | ArkLib/ProofSystem/Binius/BinaryBasefold/Prelude.lean |
| 26.00 | 26.00 | +0.00 | ArkLib/OracleReduction/Composition/Sequential/Append.lean |
| 25.00 | 28.00 | -3.00 | ArkLib/Data/CodingTheory/PolishchukSpielman/Existence.lean |
| 24.00 | 24.00 | +0.00 | ArkLib/Data/CodingTheory/ProximityGap/BCIKS20/AffineLines/GoodCoeffs.lean |
| 23.00 | 22.00 | +1.00 | ArkLib/OracleReduction/ProtocolSpec/SeqCompose.lean |
| 23.00 | 24.00 | -1.00 | ArkLib/ProofSystem/Binius/BinaryBasefold/Steps.lean |
| 22.00 | 20.00 | +2.00 | ArkLib/OracleReduction/OracleInterface.lean |

@katyhr katyhr marked this pull request as ready for review March 13, 2026 16:01