Merged
23 commits
b0869ac
🪡 feat(download): add download and last update date tracking to schem…
himal2007 Jan 7, 2026
85cd2e6
➕ feat(cli): add connect subcommand for initial database registration
himal2007 Jan 7, 2026
3a351bc
🔌 feat(connect): add connect validation with test API requests
himal2007 Jan 7, 2026
86086e1
♻️ feat(auth): extract session token refresh logic into reusable func…
himal2007 Jan 7, 2026
42cbe99
🪡 feat(update): set default input to curated schemes file and update …
himal2007 Jan 7, 2026
9f534ea
⚠️ feat(fetch): deprecate fetch command in favour of connect and upda…
himal2007 Jan 7, 2026
1761dee
🔼 feat(download): add description field to scheme info JSON
himal2007 Jan 16, 2026
d1e4f21
chore(docs): update messages
himal2007 Jan 16, 2026
84dfc6e
feat(docs): add MkDocs documentation
himal2007 Mar 4, 2026
d5efbf7
✂️ fix(connect): remove automatic fetch prompt from connect command
himal2007 Mar 5, 2026
58ac401
fix(update): refactor authentication error handling and messages in u…
himal2007 Mar 5, 2026
622378e
⏫ feat(update): collect skipped schemes instead of failing on first a…
himal2007 Mar 8, 2026
1f95b44
chore(update): clean update.py
himal2007 Mar 8, 2026
01c0471
⏫ feat(update): add unauthenticated access for mlstdb update command
himal2007 Mar 10, 2026
c20e0e8
⏫ feat(update): add unauthenticated access for mlstdb update command…
himal2007 Mar 10, 2026
dcd187c
⏫ fix(fetch): handle unregistered schemes gracefully
himal2007 Mar 12, 2026
85b31fd
⏫ feat(fetch): add unauthenticated access, session reuse, and parall…
himal2007 Mar 12, 2026
7750c51
⏫ feat(update): add resume and parallelisation options for improved …
himal2007 Mar 12, 2026
8db171d
Merge branch 'mkdocs-setup' into mlstdb_connect_fetch_refactor
himal2007 Mar 13, 2026
e58d035
⏫ docs: update README and add detailed usage documentation for mkdoc…
himal2007 Mar 13, 2026
d4fafcc
🛡 fix(auth): set restrictive permissions for credential files
himal2007 Mar 13, 2026
96dae68
⏫ release: prepare v1.0.0 with changelog updates and version bump
himal2007 Mar 13, 2026
5b4bc0a
Merge branch 'main' into mlstdb_connect_fetch_refactor
himal2007 Mar 13, 2026
25 changes: 25 additions & 0 deletions .github/workflows/mkdocs.yml
@@ -0,0 +1,25 @@
name: mkdocs

on:
  push:
    branches:
      - main
      - master

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - uses: actions/cache@v4
        with:
          key: ${{ github.ref }}
          path: .cache
      - run: pip install "mkdocs_autorefs==1.3.1" "mkdocstrings==0.22.0" "mkdocstrings-python==1.3.*" "mkdocs-material"
      - run: mkdocs gh-deploy --force
42 changes: 34 additions & 8 deletions CHANGELOG.md
@@ -1,14 +1,27 @@
# Changelog

## [0.1.7] - 2025-11-18
## [1.0.0] - 2026-03-13

- **License**: Changed from MIT to GPL v3. Original MIT-licensed code is preserved and attributed according to MIT terms.
- Added `get_db_type_from_url()` helper function to determine database type from URL, eliminating code duplication
- Removed redundant `fetch_resources()` function - now using `fetch_json()` directly
- Added acknowledgements section in README.md crediting BIGSdb_downloader and pyMLST projects
- Added CHANGELOG.md file to document version history
### Added
- New `mlstdb connect` command for streamlined OAuth credential registration ([#25](https://github.com/MDU-PHL/mlstdb/issues/25))
- Curated built-in scheme list (`mlst_schemes_all.tab`) — `mlstdb update` works out of the box without `fetch` ([#10](https://github.com/MDU-PHL/mlstdb/issues/10))
- `--no-auth` flag for unauthenticated access to public APIs on both `fetch` and `update`
- `--resume` flag for `update` to skip already-downloaded schemes
- `--threads` option for parallel downloads on both `fetch` and `update`
- Session reuse and HTTP connection pooling for improved performance
- Restrictive file permissions (`0600`) on stored credential files
- Comprehensive MkDocs documentation site with detailed guides for all commands
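The `0600` entry above means only the owning user can read or write the stored credentials. A minimal Python sketch of writing such a file is below; `write_private_file` and the `client_credentials` file name are hypothetical illustrations, not mlstdb's actual code, and the permission bits are only meaningful on POSIX systems.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private_file(path: Path, text: str) -> None:
    """Write `text` to `path` with owner-only (0600) permissions.

    Hypothetical helper for illustration; not taken from mlstdb.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    # Passing 0o600 to os.open means a newly created file never exists with
    # broader permissions, even briefly (the umask can only remove bits).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    # os.open leaves the mode of an already existing file unchanged,
    # so tighten it explicitly as well.
    os.chmod(path, 0o600)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        creds = Path(tmp) / "mlstdb" / "client_credentials"  # hypothetical name
        write_private_file(creds, "example_key\texample_secret\n")
        print(oct(stat.S_IMODE(creds.stat().st_mode)))  # 0o600 on POSIX
```

Setting the mode at creation time, rather than only calling `chmod` afterwards, avoids a short window in which the secret is readable by other users.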

### Fixed
- Fetch looping error at 76% when processing databases ([#19](https://github.com/MDU-PHL/mlstdb/issues/19))
- Missing scheme URI resolution errors during fetch ([#18](https://github.com/MDU-PHL/mlstdb/issues/18))
- 401 errors on unregistered databases no longer terminate the process — skipped databases are reported at the end ([#17](https://github.com/MDU-PHL/mlstdb/issues/17))
- Automatic token refresh on expired session tokens
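The last two fixes (refreshing an expired session token, and reporting unregistered databases at the end instead of stopping on a 401) fit naturally into one loop. The sketch below is a hypothetical illustration: `Unauthorized`, `download`, and `refresh_token` are placeholder names rather than mlstdb's API, and network calls are replaced by plain callables so the control flow is easy to follow.

```python
from typing import Callable, Iterable


class Unauthorized(Exception):
    """Hypothetical stand-in for an HTTP 401 response."""


def download_all(
    schemes: Iterable[str],
    download: Callable[[str], None],
    refresh_token: Callable[[], None],
) -> tuple[list[str], list[str]]:
    """Download each scheme; refresh once on 401, otherwise skip and continue."""
    done: list[str] = []
    skipped: list[str] = []
    for scheme in schemes:
        try:
            download(scheme)
        except Unauthorized:
            # The session token may simply have expired: refresh, retry once.
            refresh_token()
            try:
                download(scheme)
            except Unauthorized:
                # Still 401 with a fresh token: most likely the user is not
                # registered for this database, so record it and move on.
                skipped.append(scheme)
                continue
        done.append(scheme)
    return done, skipped
```

A caller would print the `skipped` list once every scheme has been attempted. Refreshing on every 401 is a simplification; a real client might refresh only when the server indicates that the token has expired.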

[0.1.7]: https://github.com/himal2007/mlstdb/releases/tag/v0.1.7
### Changed
- `fetch` command deprecated in favour of `connect` + `update` workflow ([#25](https://github.com/MDU-PHL/mlstdb/issues/25))
- `update` now uses the built-in curated scheme list by default (no `--input` required)
- Simplified README focused on the two-command workflow

## [0.2.0] - 2026-01-05

@@ -22,9 +35,22 @@
### Changed
- Installation instructions to recommend conda-forge channel and pip installation method

## [0.1.7] - 2025-11-18

[0.1.7]: https://github.com/MDU-PHL/mlstdb/releases/tag/v0.1.7
### Changed
- **License**: Changed from MIT to GPL v3. Original MIT-licensed code is preserved and attributed according to MIT terms.

### Added
- `get_db_type_from_url()` helper function to determine database type from URL
- Acknowledgements section in README.md crediting BIGSdb_downloader and pyMLST projects
- CHANGELOG.md file

### Improved
- Removed redundant `fetch_resources()` function — now using `fetch_json()` directly

[1.0.0]: https://github.com/MDU-PHL/mlstdb/releases/tag/v1.0.0
[0.2.0]: https://github.com/MDU-PHL/mlstdb/releases/tag/v0.2.0
[0.1.7]: https://github.com/MDU-PHL/mlstdb/releases/tag/v0.1.7
[#11]: https://github.com/MDU-PHL/mlstdb/issues/11
[#16]: https://github.com/MDU-PHL/mlstdb/issues/16
[#20]: https://github.com/MDU-PHL/mlstdb/issues/20
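The 0.1.7 notes mention a `get_db_type_from_url()` helper, but its logic is not part of this diff. A plausible hostname-based version, written here as the hypothetical `db_type_from_url()`, could look like the sketch below; the host names are taken from the network notes in `docs/disclaimer.md` (`rest.pubmlst.org`, `bigsdb.pasteur.fr`), and the real helper may use different rules.

```python
from urllib.parse import urlparse


def db_type_from_url(url: str) -> str:
    """Return "pubmlst" or "pasteur" based on the URL's host name.

    Hypothetical re-implementation for illustration only.
    """
    host = (urlparse(url).hostname or "").lower()
    # Match the registered domain or any of its subdomains.
    if host == "pubmlst.org" or host.endswith(".pubmlst.org"):
        return "pubmlst"
    if host == "pasteur.fr" or host.endswith(".pasteur.fr"):
        return "pasteur"
    raise ValueError(f"Unrecognised BIGSdb host in URL: {url!r}")
```

Checking the parsed host name, rather than searching the whole URL string, avoids misclassifying a URL that merely contains `pubmlst` in its path (PubMLST-style database names such as `pubmlst_borrelia_seqdef` appear in paths).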
148 changes: 28 additions & 120 deletions README.md
@@ -8,165 +8,73 @@
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/mlstdb/badges/downloads.svg)](https://anaconda.org/bioconda/mlstdb)

`mlstdb` is a Python package to update and manage the MLST database for the `mlst` tool using the PubMLST and BIGSdb Pasteur APIs. It is written to handle the OAuth2 authentication process that's required to access up-to-date MLST schemes available on these databases. This tool allows users to fetch MLST schemes, filter them, and update the MLST database for the `mlst` tool.
Keep your [`mlst`](https://github.com/tseemann/mlst) databases up to date. `mlstdb` handles OAuth authentication with [PubMLST](https://pubmlst.org/) and [BIGSdb Pasteur](https://bigsdb.pasteur.fr/) so you can download the latest MLST schemes and build a BLAST database, in two commands.

-----
**[Full Documentation](https://MDU-PHL.github.io/mlstdb)**

## Table of Contents
## Install

- [mlstdb](#mlstdb)
- [Table of Contents](#table-of-contents)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [⚠️ Disclaimer / Caution](#️-disclaimer--caution)
- [Usage](#usage)
- [Final Steps](#final-steps)
- [Acknowledgements](#acknowledgements)
- [License](#license)

## Prerequisites

`mlst` must be installed to use this tool.

## Installation

**Recommended installation method:**

First, create a conda environment with `mlst` installed:
```sh
conda create -n mlst -c bioconda mlst
conda activate mlst
```

Then install `mlstdb` using pip:
```sh
pip install mlstdb
```

<details>
<summary>Other installation methods</summary>

**Alternative installation methods:**

From bioconda (note: include `conda-forge` channel to resolve dependencies):
```sh
# From bioconda (include conda-forge for dependencies)
conda install -c conda-forge -c bioconda mlstdb
```

Or install both tools together:
```sh
# Or install both tools together
conda create -n mlst -c conda-forge -c bioconda mlst mlstdb
```

From PyPI only:
```sh
# From PyPI only
pip install mlstdb
```

> **Note:** If you encounter dependency errors when installing from bioconda (e.g., `nothing provides rauth >=0.7.3`), ensure you include the `-c conda-forge` channel in your installation command, or use the recommended pip installation method instead.

## ⚠️ Disclaimer / Caution
Please read before using `mlstdb`:

* Backup your original MLST databases before running any updates to avoid accidental overwrites or deletions.

* Do not blindly update all the schemes obtained from `mlstdb fetch`. Not all downloaded schemes are suitable or validated for the `mlst` tool.

* Carefully curate your list of schemes before running `mlstdb update`. Overwriting core MLST data with unverified schemes may cause downstream issues with tools like `mlst`.

## Usage
</details>

`mlstdb` uses a simple two step process to update the MLST database for the `mlst` tool. It has two main subcommands: `fetch` and `update`.
## Quick Start

1. **Fetch MLST schemes**
**1. Register with each database (one-time setup):**

```sh
mlstdb fetch --help
```

```console
Usage: mlstdb fetch [OPTIONS]

BIGSdb Scheme Fetcher Tool

This tool downloads MLST scheme information from BIGSdb databases. It will
automatically handle authentication and save the results.

Options:
-h, --help Show this message and exit.
-d, --db [pubmlst|pasteur] Database to use (pubmlst or pasteur)
-e, --exclude TEXT Scheme name must not include provided term
(default: cgMLST)
-m, --match TEXT Scheme name must include provided term (default:
MLST)
-s, --scheme-uris TEXT Optional: Path to custom scheme_uris.tab file
-f, --filter TEXT Filter species or schemes using a wildcard
pattern
-r, --resume Resume processing from where it stopped
-v, --verbose Enable verbose logging for debugging
mlstdb connect --db pubmlst
mlstdb connect --db pasteur
```

Use the `fetch` command to download MLST schemes from the BIGSdb databases. The `--db` argument specifies the database to use, which can be either `pubmlst` or `pasteur`. The `--exclude` and `--match` arguments can be used to filter the schemes based on the scheme name. The `--scheme-uris` argument can be used to provide a custom scheme URIs file. The `--filter` argument can be used to filter species or schemes using a wildcard pattern. The `--resume` flag can be used to resume processing from where it stopped. The `--verbose` flag can be used to enable verbose logging for debugging. This will create a `mlst_schemes_<db>.txt` file with the MLST schemes.
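The `--match` and `--exclude` defaults described above (`MLST` and `cgMLST`) work together: `cgMLST` itself contains the substring `MLST`, so the exclusion is what keeps core-genome schemes out. A minimal sketch, assuming both options are plain case-sensitive substring tests (the real implementation may differ), using made-up scheme names:

```python
def keep_scheme(name: str, match: str = "MLST", exclude: str = "cgMLST") -> bool:
    """Return True if `name` contains `match` but not `exclude`.

    Assumes case-sensitive substring checks; mirrors the documented defaults
    of `mlstdb fetch --match/--exclude`, not necessarily its exact logic.
    """
    return match in name and exclude not in name


# Illustrative scheme names only.
names = ["MLST", "MLST (Achtman)", "cgMLST", "MLST (Pasteur)", "rMLST"]
print([n for n in names if keep_scheme(n)])
# ['MLST', 'MLST (Achtman)', 'MLST (Pasteur)', 'rMLST']
```

Under this assumption `rMLST` also passes the default filter, which is one reason to review the resulting scheme list by hand before running `update`.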

We can simply run `mlstdb fetch` to download the MLST schemes from the BIGSdb databases. The command will prompt for the `db` (either `pubmlst` or `pasteur`) to fetch. If registration has not been completed, it will prompt the user to register the client credentials, which are saved to the `~/.config/mlstdb` directory.

In cases where the tool does not find an appropriate scheme name, it will prompt the user to either set the missing schemes as 'missing' or auto-generate them. The user can choose the appropriate option as they are prompted.

<details>
<summary>Auto extraction of scheme?🤔</summary>

First, the script automatically tries to extract the scheme names from the `dbases.sh` file. If the scheme name is not found, it will prompt the user to either print `missing` in the output file or automatically create a scheme name based on the URL. For example, for the URL `https://rest.pubmlst.org/db/pubmlst_borrelia_seqdef/schemes/1`, the scheme name will be `borrelia`. If a database has multiple schemes, the scheme ID is appended to the name. For example, for the URLs `https://rest.pubmlst.org/db/pubmlst_chlamydiales_seqdef/schemes/38` and `https://rest.pubmlst.org/db/pubmlst_chlamydiales_seqdef/schemes/41`, the scheme names will be `chlamydiales_38` and `chlamydiales_41` respectively.

</details>
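The URL-based naming fallback described above can be sketched as follows. This covers only the fallback (not the `dbases.sh` lookup that is tried first) and assumes URLs follow the `/db/pubmlst_<name>_seqdef/schemes/<id>` shape of the PubMLST examples; `base_name` and `auto_names` are hypothetical names.

```python
from collections import Counter
from urllib.parse import urlparse


def base_name(url: str) -> str:
    """Turn .../db/pubmlst_<name>_seqdef/schemes/<id> into <name>."""
    parts = urlparse(url).path.strip("/").split("/")
    db = parts[parts.index("db") + 1]  # e.g. "pubmlst_borrelia_seqdef"
    if db.startswith("pubmlst_"):
        db = db[len("pubmlst_"):]
    if db.endswith("_seqdef"):
        db = db[: -len("_seqdef")]
    return db


def auto_names(urls: list[str]) -> dict[str, str]:
    """Name each scheme; append the scheme ID when a database has several."""
    counts = Counter(base_name(u) for u in urls)
    names: dict[str, str] = {}
    for url in urls:
        name = base_name(url)
        if counts[name] > 1:
            scheme_id = urlparse(url).path.rstrip("/").split("/")[-1]
            name = f"{name}_{scheme_id}"
        names[url] = name
    return names
```

Counting names first means the suffix is added only when a database really has more than one scheme in the input, matching the `borrelia` versus `chlamydiales_38`/`chlamydiales_41` examples.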
This opens a browser for OAuth registration. Follow the prompts to authorise `mlstdb`.


The script offers a feature to filter for particular species/schemes. It is recommended to run with the filter option and download only the required schemes, so as not to tamper with the existing databases and schemes.

**📝Important**: The `mlst` tool is designed for typing bacterial species only. Please make sure to filter out non-bacterial schemes from your schemes file.


2. **Update MLST database**
**2. Download schemes and build the BLAST database:**

```sh
mlstdb update --help
mlstdb update
```

```console
Usage: mlstdb update [OPTIONS]

Update MLST schemes and create BLAST database.
This downloads the curated MLST schemes from both PubMLST and Pasteur and creates a BLAST database.

Downloads MLST schemes from the specified input file and creates a BLAST
database from the downloaded sequences. Authentication tokens should be set
up using fetch.py.
**3. Use with `mlst`:**

Options:
-h, --help Show this message and exit.
-i, --input TEXT Path to mlst_schemes_<db>.tab containing MLST
scheme URLs [required]
-d, --directory TEXT Directory to save the downloaded MLST schemes
(default: pubmlst)
-b, --blast-directory TEXT Directory for BLAST database (default: blast)
-v, --verbose Enable verbose logging for debugging
```sh
mlst --blastdb blast/mlst.fa --datadir pubmlst your_assembly.fasta
```

Use the `update` command to update the MLST database and create a BLAST database. The `--input` argument specifies the path to the `mlst_schemes_<db>.tab` file containing MLST scheme URLs. The `--directory` argument specifies the directory to save the downloaded MLST schemes. The `--blast-directory` argument specifies the directory for the BLAST database. The `--verbose` flag can be used to enable verbose logging for debugging.

We can prepare a custom `mlst_schemes_<db>.tab` file with headers `database species scheme_description scheme URI`
and use `mlstdb update` to update the MLST database for select species and schemes. This will automatically create a BLAST database from the downloaded sequences.
That's it. For advanced scheme exploration, custom filtering, and detailed option reference, see the [full documentation](https://MDU-PHL.github.io/mlstdb).
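A custom input file for `mlstdb update --input` can be cut down from a larger scheme list. The sketch below assumes a tab-separated file whose header row includes a `species` column (as in the header list above); `subset_schemes` and the file names are hypothetical. All other columns are copied through unchanged and in the same order.

```python
import csv
from pathlib import Path


def subset_schemes(src: Path, dest: Path, species: set[str]) -> int:
    """Copy the header plus rows whose 'species' value is in `species`.

    Returns the number of data rows written.
    """
    with src.open(newline="") as fin, dest.open("w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        # Reuse the source header so column order is preserved exactly.
        writer = csv.DictWriter(
            fout, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        kept = 0
        for row in reader:
            if row["species"] in species:
                writer.writerow(row)
                kept += 1
    return kept
```

The resulting file would then be passed with `--input`, so only the selected species are downloaded and indexed.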

## Final Steps
## Caution

After running all scripts, verify the database setup by running the `mlst` tool with the updated database:
```bash
mlst --blastdb <path_to_blast/mlst.fa> --datadir <path_to_pubmlst_dir>
```
- **Back up** your existing MLST databases before running updates.
- **Curate** your scheme list before updating — not all schemes are validated for the `mlst` tool.
- The `mlst` tool is designed for **bacterial species only**.

## Acknowledgements

This tool was inspired by and builds upon the work of:

- [BIGSdb_downloader](https://github.com/kjolley/BIGSdb_downloader) by Keith Jolley - The original OAuth-based downloader for BIGSdb databases
- [pyMLST](https://github.com/bvalot/pyMLST) - Python implementation for MLST with database management
Built upon the work of:

- [BIGSdb_downloader](https://github.com/kjolley/BIGSdb_downloader) by Keith Jolley
- [pyMLST](https://github.com/bvalot/pyMLST) by Benoit Valot

## License

1 change: 1 addition & 0 deletions docs/changelog.md
41 changes: 41 additions & 0 deletions docs/disclaimer.md
@@ -0,0 +1,41 @@
# Disclaimer

!!! warning "Please read before using mlstdb"

    Please take note of the following before using `mlstdb` to update your MLST databases.

## Back up your databases

Always back up your original MLST databases before running any updates. This protects against accidental overwrites or deletions.

```sh
cp -r /path/to/existing/pubmlst /path/to/backup/pubmlst_backup
cp -r /path/to/existing/blast /path/to/backup/blast_backup
```
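Running the `cp -r` commands above a second time writes into the existing backup location rather than creating a separate snapshot. A small Python alternative that puts each backup in a new timestamped directory is sketched below; `backup_dir` and the naming pattern are hypothetical, and the paths are placeholders.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_dir(src: Path, backup_root: Path) -> Path:
    """Copy `src` to backup_root/<name>_backup_<timestamp> and return it."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"{src.name}_backup_{stamp}"
    # copytree creates missing parent directories and raises
    # FileExistsError if `dest` already exists, so an earlier backup
    # is never silently merged into or overwritten.
    shutil.copytree(src, dest)
    return dest


# Example (placeholder paths):
# backup_dir(Path("/path/to/existing/pubmlst"), Path("/path/to/backup"))
# backup_dir(Path("/path/to/existing/blast"), Path("/path/to/backup"))
```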

## Curate your scheme list

Not all MLST schemes available on PubMLST and Pasteur are validated for use with the `mlst` tool. Before updating:

- Review the scheme list if using a custom input file
- Don't blindly download and apply all available schemes
- Overwriting core MLST data with unverified schemes may cause downstream issues

## Bacterial species only

The `mlst` tool is designed for typing **bacterial species only**. If you use the advanced `fetch` command to explore schemes, make sure to filter out non-bacterial schemes before updating.

## Authentication requirements

Some schemes require registration with the respective database. If you encounter authentication errors during `update`, check that you have:

1. Registered with both PubMLST and Pasteur (via `mlstdb connect`)
2. Enrolled in the specific databases you need within each platform

## Network considerations

`mlstdb` makes API calls to external servers (PubMLST and Pasteur). Be aware of:

- Network interruptions — use `--resume` to recover
- Rate limiting — keep `--threads` at 4 or below
- Firewall restrictions — ensure outbound HTTPS access to `rest.pubmlst.org` and `bigsdb.pasteur.fr`
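The `--resume` and `--threads` advice above can be illustrated with a small thread pool that skips finished schemes. This is a sketch under stated assumptions, not mlstdb's implementation: `update_schemes`, `already_downloaded`, and `download` are hypothetical, and a scheme counts as finished when its output directory exists and is non-empty.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable


def already_downloaded(scheme_dir: Path) -> bool:
    """Assumption: a non-empty scheme directory means an earlier run finished it."""
    return scheme_dir.is_dir() and any(scheme_dir.iterdir())


def update_schemes(
    schemes: list[str],
    out_dir: Path,
    download: Callable[[str, Path], None],
    threads: int = 4,
    resume: bool = False,
) -> tuple[list[str], list[str]]:
    """Download schemes concurrently; return (attempted, failed) scheme names."""
    attempted = [
        s for s in schemes if not (resume and already_downloaded(out_dir / s))
    ]
    failed: list[str] = []
    # A small pool keeps the number of simultaneous requests low.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {pool.submit(download, s, out_dir / s): s for s in attempted}
        for future in as_completed(futures):
            # Collect failures instead of raising, so one bad scheme does not
            # stop the others from finishing.
            if future.exception() is not None:
                failed.append(futures[future])
    return attempted, sorted(failed)
```

With this simple rule a partially written directory would also look finished, so a more careful version might download into a temporary directory and rename it only on success.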