
convert.py is non-functional from the CLI #39

@Doomsbay

Summary

The README advertises TSV → PROTAX-GPU conversion via scripts/convert.py, but the script cannot be run from the command line. It crashes on import due to hardcoded paths, and its CLI entry point performs no conversion.

Affected files

  • README.md L29 ("Compatible with TSV and PROTAX input format")
  • README.md L38 (convert.py — Converts .TSV to PROTAX-GPU format)
  • scripts/convert.py

Details

Crashes on import. Lines 398–404 run at module top level with hardcoded developer paths:

test = Path("/home/roy/Downloads/taxonomy.tsv")
test_ref = Path("/home/roy/Downloads/sequences.tsv")
read_jax_model(test_ref)  # crashes here → np.load("8M_tax.npz") at L302

The crash occurs inside read_jax_model at L302, which tries to load "8M_tax.npz", a hardcoded intermediate file that doesn't exist outside the original developer's environment. The test_ref path is passed in, but the function fails before it ever uses that argument.

CLI entry point does nothing. The argparse block (L409–424) only prints "converting taxonomy..." / "converting model..."; it never calls convert_tsv(), assign_tax(), or convert_sequences().

Unfinished pieces.

  • trim_subtaxa(): # TODO doesn't work yet (L106–107)
  • convert_sequences(): # TODO remove hardcoded values (L260)
  • read_jax_model(): # TODO: remove this leftover test code (L343)
  • assign_tax() depends on a hardcoded refs.npz intermediate file (L218)

Net effect: the advertised TSV pipeline is both undocumented and non-functional — there is no working path to convert your own TSV data.

Steps to reproduce

  1. git clone the repo and install per the README.
  2. Run python scripts/convert.py --taxonomy <some.tsv> (or run it with no args).

Expected behavior

The script converts the given TSV taxonomy/sequence files into the .npz format the package consumes.

Actual behavior

FileNotFoundError: [Errno 2] No such file or directory: '8M_tax.npz'

This is raised at import time by the top-level read_jax_model(test_ref) call at L404, which tries to load the hardcoded "8M_tax.npz" intermediate file at L302. Even if that file existed, the --taxonomy/--model flags only print a message and perform no conversion.
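To check which statements run at load time (and later confirm that the short-term fix removed them), a small heuristic sketch using Python's standard ast module can list calls made directly in a module's body while skipping the `if __name__ == "__main__":` block. It is not part of the repo; applied to the excerpt quoted above it would be expected to flag the Path(...) assignments and the read_jax_model(test_ref) call. Calls nested inside other top-level if/for/with blocks are not reported.

```python
import ast


def _is_main_guard(test: ast.expr) -> bool:
    """True for the condition `__name__ == "__main__"`."""
    return (
        isinstance(test, ast.Compare)
        and isinstance(test.left, ast.Name)
        and test.left.id == "__name__"
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.Eq)
        and isinstance(test.comparators[0], ast.Constant)
        and test.comparators[0].value == "__main__"
    )


def top_level_calls(source: str) -> list[tuple[int, str]]:
    """List (line, callee) for calls that run when the module is loaded.

    Only statements directly in the module body are checked (bare calls and
    assignments whose value is a call). The __main__ guard is skipped, and
    function and class bodies are never entered.
    """
    found = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.If) and _is_main_guard(node.test):
            continue
        if isinstance(node, (ast.Expr, ast.Assign)) and isinstance(node.value, ast.Call):
            found.append((node.lineno, ast.unparse(node.value.func)))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python <this file> scripts/convert.py
    with open(sys.argv[1], encoding="utf-8") as fh:
        for line, callee in top_level_calls(fh.read()):
            print(f"line {line}: {callee}()")
```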

Environment

  • Repo commit: 392614a
  • Python: 3.12

Proposed fix

  1. Short term: remove or gate the top-level execution code so the script does not crash on import, and note in the README that convert.py is a work in progress.
  2. Long term: wire the argparse block to the conversion functions, remove the hardcoded paths, and document the expected TSV format.
  3. Either way: clarify whether convert.py is the intended way to produce the taxonomy .npz, or whether it is deprecated.
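A minimal sketch of fixes 1 and 2 together, assuming nothing about the real function signatures: the module does no work when loaded, and the existing --taxonomy/--model flags actually dispatch to a conversion step. convert_taxonomy_tsv(), convert_model_tsv() and the --out flag are hypothetical placeholders, not the repo's API; a real patch would call convert_tsv()/convert_sequences() (and assign_tax() where needed) with whatever arguments they actually take.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


# Hypothetical placeholders: the real conversion functions in scripts/convert.py
# take their own arguments. These only return the path they would write, so the
# dispatch logic can be exercised in isolation.
def convert_taxonomy_tsv(tsv: Path, out_dir: Path) -> Path:
    return out_dir / "taxonomy.npz"


def convert_model_tsv(tsv: Path, out_dir: Path) -> Path:
    return out_dir / "model.npz"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert TSV files to PROTAX-GPU .npz inputs"
    )
    parser.add_argument("--taxonomy", type=Path, help="taxonomy TSV to convert")
    parser.add_argument("--model", type=Path, help="sequence/model TSV to convert")
    # Hypothetical flag, not taken from convert.py.
    parser.add_argument("--out", type=Path, default=Path("."), help="output directory")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> list[Path]:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.taxonomy is None and args.model is None:
        # Exit with a usage error (status 2) instead of silently doing nothing.
        parser.error("pass --taxonomy and/or --model")
    written: list[Path] = []
    if args.taxonomy is not None:
        print("converting taxonomy...")
        written.append(convert_taxonomy_tsv(args.taxonomy, args.out))
    if args.model is not None:
        print("converting model...")
        written.append(convert_model_tsv(args.model, args.out))
    return written


# All work happens here, so importing the module has no side effects.
if __name__ == "__main__":
    main()
```

With this layout the leftover test code at L398–404 either moves into a test file or is deleted, and `python scripts/convert.py` with no arguments exits with a usage message rather than crashing.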
