[Submission] LangLint — String-Level Translation and Linting of Code Comments and Docstrings for Research Software #268

Submitting Author: @HzaCode
All current maintainers: @HzaCode
Package Name: LangLint
One-Line Description of Package: A Rust-powered, code-aware toolkit that extracts, validates, and translates multilingual strings (comments, docstrings, string literals) in scientific software.
Repository Link: https://github.com/HzaCode/Langlint
Version submitted: v1.0.0
EiC: @yeelauren
Editor: TBD
Reviewer 1: TBD
Reviewer 2: TBD
Archive: TBD
JOSS DOI: TBD
Version accepted: TBD
Date accepted (month/day/year): TBD


Code of Conduct & Commitment to Maintain Package

  • I agree to abide by [pyOpenSci's Code of Conduct][PyOpenSciCodeOfConduct] during the review process and in future interactions in spaces supported by pyOpenSci should it be accepted.
  • I have read and will commit to package maintenance after the review as per the [pyOpenSci Policies Guidelines][Commitment].

Description

LangLint enforces multilingual consistency at the string level inside code (comments, docstrings, and string literals) rather than across full documents. It extracts human-language units from parsed code, validates language consistency, and optionally translates them while leaving executable syntax untouched.
The core is implemented in Rust (PyO3/maturin) for significant speedups (observed 10–50× vs. a prior Python implementation). The test suite runs offline by default (using a mock translator) for reproducibility. Optional remote providers (e.g., OpenAI, DeepL, Google Cloud Translate, LibreTranslate) are opt-in and require explicit user configuration; usage is documented to comply with each provider’s Terms of Service.
LangLint integrates cleanly with CI/CD (GitHub Actions, pre-commit) so teams can enforce multilingual consistency much like they enforce style with Ruff.
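
To make the extraction step concrete, here is a minimal Python sketch that uses only the standard library (`tokenize` and `ast`). It is not LangLint's Rust implementation or its API: the names `extract_comments`, `extract_docstrings`, and `is_ascii_only` are hypothetical, and the ASCII check is only a crude stand-in for real language detection.

```python
import ast
import io
import tokenize


def extract_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for every comment token in the source."""
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Drop the leading '#' and surrounding whitespace.
            comments.append((tok.start[0], tok.string.lstrip("#").strip()))
    return comments


def extract_docstrings(source: str) -> list[tuple[str, str]]:
    """Return (owner_name, docstring) for the module, classes, and functions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(
            node,
            (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef),
        ):
            doc = ast.get_docstring(node)
            if doc is not None:
                found.append((getattr(node, "name", "<module>"), doc))
    return found


def is_ascii_only(text: str) -> bool:
    """Hypothetical placeholder for language detection: flags non-ASCII text."""
    return text.isascii()


SAMPLE = '''"""Module docstring."""

def area(r):
    """Calcule l'aire d'un cercle."""
    # 计算面积
    return 3.14159 * r * r  # approximate pi
'''

if __name__ == "__main__":
    for line_no, text in extract_comments(SAMPLE):
        print(line_no, repr(text), "ascii" if is_ascii_only(text) else "non-ascii")
    for owner, doc in extract_docstrings(SAMPLE):
        print(owner, repr(doc))
```

Because the comments come from the tokenizer and the docstrings from the parsed tree, a `#` inside a string literal is never mistaken for a comment.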


Scope

  • Data extraction
  • Data processing/munging
  • Data validation and testing
  • Workflow automation

Domain Specific

  • Geospatial
  • Education

Community Partnerships

  • Astropy
  • Pangeo

Why it fits the scope

  • Target audience: Research software engineers and scientists maintaining multilingual codebases where comments/docstrings/strings are not all in English.
  • Scientific applications: Improves readability, reproducibility, and FAIR compliance of research code by standardizing the linguistic layer that explains methods and assumptions.
  • Other packages: Linters (Ruff/Flake8) focus on syntax/style; translation libraries focus on free text. LangLint uniquely bridges both—it is code-aware, extracting only translatable string units and validating/transforming them without altering code semantics.
  • Pre-submission enquiry: N/A (initial submission).
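
The "without altering code semantics" property can be illustrated with a short Python sketch. It rewrites only comment tokens and then checks that the parsed AST is unchanged. This is an assumption about how such a guarantee could be verified, not a description of LangLint's internals; `rewrite_comments` and the `translate` callback are hypothetical names.

```python
import ast
import io
import tokenize
from typing import Callable


def rewrite_comments(source: str, translate: Callable[[str], str]) -> str:
    """Replace the text of each comment and verify the code itself is unchanged."""
    lines = source.splitlines(keepends=True)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start
        line = lines[row - 1]
        # A comment always runs to the end of its line; keep the line ending.
        line_ending = line[len(line.rstrip("\r\n")):]
        text = tok.string.lstrip("#").strip()
        lines[row - 1] = line[:col] + "# " + translate(text) + line_ending
    new_source = "".join(lines)
    # ast.dump omits line/column attributes by default, so equal dumps mean
    # the executable structure is identical.
    if ast.dump(ast.parse(new_source)) != ast.dump(ast.parse(source)):
        raise ValueError("rewrite changed the parsed code")
    return new_source


if __name__ == "__main__":
    src = "x = 1  # uno\ns = '# not a comment'\n"
    # str.upper stands in for a real translation function.
    print(rewrite_comments(src, str.upper))
```

In the usage example, the `#` inside the string literal is left alone because the tokenizer reports it as part of a string token, not a comment.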

Technical checks

This package:

  • does not violate the Terms of Service of any service it interacts with.
    Notes: Tests/CI run offline using a mock translator. Any remote translators are opt-in, require explicit configuration (e.g., API keys), and are documented for use in accordance with each provider’s ToS.
  • uses an [OSI approved license][OsiApprovedLicense] (MIT).
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a tutorial with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration setup (GitHub Actions).
    Benchmarking: We provide (or will provide) a simple, reproducible benchmark script (e.g., bench/) to substantiate performance claims.
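
As a rough illustration of the offline-by-default, opt-in-remote pattern described in the notes above, the Python sketch below selects a deterministic mock translator unless a provider is named and its API key is configured. All names here (`Translator`, `MockTranslator`, `select_translator`, and the `<PROVIDER>_API_KEY` environment variable convention) are hypothetical and are not taken from LangLint's code or documentation.

```python
import os
from typing import Mapping, Optional, Protocol


class Translator(Protocol):
    def translate(self, text: str, target_lang: str) -> str: ...


class MockTranslator:
    """Deterministic, network-free translator suitable for tests and CI."""

    def translate(self, text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"


def select_translator(
    provider: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Translator:
    """Default to the mock; remote providers need an explicit, configured opt-in."""
    if provider is None or provider == "mock":
        return MockTranslator()
    key_var = f"{provider.upper()}_API_KEY"  # hypothetical naming convention
    if not env.get(key_var):
        raise RuntimeError(f"{provider} requested but {key_var} is not set")
    # Real provider clients are intentionally out of scope for this sketch.
    raise NotImplementedError(f"no client implemented for {provider}")


if __name__ == "__main__":
    print(select_translator().translate("hola", "en"))
```

With this shape, a test run that never names a provider cannot reach the network, which is the reproducibility property the notes describe.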

Publication Options

Note: JOSS accepts pyOpenSci’s review. We will link this issue in the JOSS submission and indicate that the package has undergone pyOpenSci review.


Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

  • Yes I am OK with reviewers submitting requested changes as issues/PRs to my repo. Reviewers will then link to the issues in their submitted review.

Confirm each of the following:


Status: pre-review-checks
