Skip to content

Conversation

@davidbaines
Copy link
Collaborator

@davidbaines davidbaines commented Nov 13, 2025

Add filtering so that XRI and other files that usually aren't published scriptures are excluded from the results.


This change is Reviewable

Copy link
Collaborator

@benjaminking benjaminking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjaminking reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @davidbaines)


silnlp/common/bulk_extract_local.py line 28 at r1 (raw file):

        project_settings = parser.parse()

        # project_settings.name

Since this is under version control, you should be able to safely remove these commented-out lines and recover them via version control if they're needed later.


silnlp/common/bulk_extract_local.py line 88 at r1 (raw file):

    )
    parser.add_argument(
        "--exclude", metavar="books", nargs="+", default=[], help="The books to exclude; e.g., 'NT', 'OT', 'GEN'"

It might be good to produce an error or warning when a user uses both the --include and --exclude option, since it's not immediately obvious how they might interact with each other.


silnlp/common/find_by_iso.py line 90 at r1 (raw file):

    return {code for iso_code in iso_codes for code in (iso_code, ALT_ISO.get_alternative(iso_code)) if code}

def filter_files(files: List[Path], excluded_patterns:List[str]) -> List[Path]:

Maybe you could have this function accept regular expressions, so that you can pass in the date pattern -- then the function would only filter out the patterns that you pass.


silnlp/common/find_by_iso2.py line 1 at r1 (raw file):

import argparse

Is this intended to replace find_by_iso.py?


silnlp/common/usfm_utils.py line 19 at r1 (raw file):

    book = "PRO"
    fpath = Path(r"M:/Paratext/projects/NIV11/20PRONIV11.SFM")

It's probably better to leave these unchanged.


debug.log line 1 at r1 (raw file):

[0529/225757.623:ERROR:registration_protocol_win.cc(108)] CreateFile: The system cannot find the file specified. (0x2)

This file probably shouldn't be included in the PR.

@davidbaines
Copy link
Collaborator Author

Hi Ben. Thanks for that review. I didn't pay attention to all the files that were added and changed by this PR. I was only intending to change find_by_iso.py and no other file. I hope that the recent push has removed the spurious files from the commit.
If not. I might need to delete this branch and start a new branch from master.

Copy link
Collaborator

@benjaminking benjaminking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benjaminking reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @davidbaines)


silnlp/common/usfm_utils.py line 1 at r1 (raw file):

from pathlib import Path

You accidentally deleted this file from your branch rather than removing it from the PR. No harm has been done to the main branch, but if we did merge this, it would delete the file from the main branch as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants