This repository provides Python scripts for augmenting AlphaFold 3 input JSON files with Multiple Sequence Alignments (MSAs) generated by either MMseqs2 or plmMSA. These scripts streamline the process of preparing input data for AlphaFold 3 by automating the integration of crucial MSA information.
To set up the necessary environment, execute the following command to install the required Python packages:

    pip install .

These scripts are designed to process existing AlphaFold 3 input JSON files. An example of a basic input JSON structure is shown below:

    {
      "name": "2PV7",
      "sequences": [
        {
          "protein": {
            "id": ["A", "B"],
            "sequence": "GMRESYANENQFGFKTINSDIHKIVIVGGYGKLGGLFARYLRASGYPISILDREDWAVAESILANADVVIVSVPINLTLETIERLKPYLTENMLLADLTSVKREPLAKMLEVHTGAVLGLHPMFGADIASMAKQVVVRCDGRFPERYEWLLEQIQIWGAKIYQTNATEHDHNMTYIQALRHFSTFANGLHLSKQPINLANLLALSSPIYRLELAMIGRLFAQDAELYADIIMDKSENLAVIETLKQTYDEALTFFENNDRQGFIDAFHKVRDWFGDYSEQFLKESRQLLQQANDLKQG"
          }
        }
      ],
      "modelSeeds": [1],
      "dialect": "alphafold3",
      "version": 1
    }

The mmseqs.py script facilitates the integration of MSAs generated using MMseqs2 into your AlphaFold 3 input JSON.
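
Before sending a file to the MSA server, it can be useful to confirm which protein chains it contains. The following is a minimal sketch, assuming the layout of the 2PV7 example above; the function name `list_protein_chains` is hypothetical and not part of this repository, and the sequence is truncated for brevity.

```python
import json


def list_protein_chains(af3_input: dict) -> list[tuple[str, int]]:
    """Return (chain_id, sequence_length) for every protein chain in the input."""
    chains = []
    for entry in af3_input.get("sequences", []):
        protein = entry.get("protein")
        if protein is None:
            continue  # skip non-protein entries
        ids = protein["id"]
        if isinstance(ids, str):
            ids = [ids]  # "id" may be a single string or a list of strings
        for chain_id in ids:
            chains.append((chain_id, len(protein["sequence"])))
    return chains


# Shortened copy of the 2PV7 example (first 12 residues only).
example = json.loads("""
{
  "name": "2PV7",
  "sequences": [
    {"protein": {"id": ["A", "B"], "sequence": "GMRESYANENQF"}}
  ],
  "modelSeeds": [1],
  "dialect": "alphafold3",
  "version": 1
}
""")

print(list_protein_chains(example))
```

Because chains A and B share one `protein` entry, they share one sequence, so a single MSA covers both.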
Command-line usage:

    af3-mmseqs <input_json> \
        [--output_json <output_json>] \
        [--host_url <host_url>]

Arguments:
- <input_json>: Specifies the path to the AlphaFold 3 input JSON file you wish to modify.
- [--output_json <output_json>]: (Optional) Defines the path for the output JSON file with the added MSA. If omitted, the output is written to standard output (/dev/stdout).
- [--host_url <host_url>]: (Optional) Sets the URL for the MMseqs API server. The default value is https://api.colabfold.com/.
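
To call the command from a Python pipeline, the argument list can be assembled programmatically. This is a rough sketch that uses only the flags documented above; the helper names `build_mmseqs_command` and `run_mmseqs` and the output file name are hypothetical, and it assumes `af3-mmseqs` is on the `PATH` after installation.

```python
import subprocess


def build_mmseqs_command(input_json, output_json=None, host_url=None):
    """Assemble the af3-mmseqs argument list from the documented options."""
    cmd = ["af3-mmseqs", str(input_json)]
    if output_json is not None:
        cmd += ["--output_json", str(output_json)]
    if host_url is not None:
        cmd += ["--host_url", str(host_url)]
    return cmd


def run_mmseqs(input_json, output_json=None, host_url=None):
    """Run af3-mmseqs and raise CalledProcessError if it exits with an error."""
    cmd = build_mmseqs_command(input_json, output_json, host_url)
    subprocess.run(cmd, check=True)


# Show the command that would be executed, without running it.
print(build_mmseqs_command("examples/2PV7.json", output_json="2PV7_msa.json"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.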
The add_plmmsa_msa.py script enables the addition of MSAs generated using plmMSA to your AlphaFold 3 input JSON.
Command-line usage:

    af3-plmmsa --input_json <input_json> \
        [--output_json <output_json>] \
        [--output_a3m <output_a3m>] \
        [--use_pairing]

Arguments:
- --input_json <input_json>: Specifies the path to the AlphaFold 3 input JSON file.
- [--output_json <output_json>]: (Optional) Sets the output path for the modified JSON file. Defaults to standard output (/dev/stdout).
- [--host_url <host_url>]: (Optional) Specifies the URL of the MMseqs API server. The default is https://deepfold.com/api/colab.
- [--use_pairing]: (Optional) A flag to indicate whether paired MSA data should be used.
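
To illustrate what an augmented file might contain, the sketch below attaches an A3M-formatted alignment to a protein entry. The field names `unpairedMsa` and `pairedMsa` come from the AlphaFold 3 input format; that the scripts in this repository fill exactly these fields is an assumption, and the `attach_msa` helper and the toy alignment are hypothetical.

```python
import copy


def attach_msa(af3_input, unpaired_a3m, paired_a3m=""):
    """Return a copy of the input with MSAs set on every protein entry.

    Simplification: the same alignment is applied to all protein entries,
    which is only appropriate when the input holds a single protein sequence.
    """
    result = copy.deepcopy(af3_input)  # leave the caller's dict untouched
    for entry in result.get("sequences", []):
        protein = entry.get("protein")
        if protein is None:
            continue  # skip non-protein entries
        protein["unpairedMsa"] = unpaired_a3m
        protein["pairedMsa"] = paired_a3m
    return result


# Toy data: a truncated query and one made-up hit, for illustration only.
example = {
    "name": "2PV7",
    "sequences": [{"protein": {"id": ["A", "B"], "sequence": "GMRESYANENQF"}}],
    "modelSeeds": [1],
    "dialect": "alphafold3",
    "version": 1,
}
toy_a3m = ">query\nGMRESYANENQF\n>hit_1\nGMKESYANE-QF\n"

augmented = attach_msa(example, toy_a3m)
print(sorted(augmented["sequences"][0]["protein"].keys()))
```

In A3M, the first record is the query sequence itself, so the alignment should begin with the same sequence that appears in the `sequence` field.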
Example:

    af3-plmmsa examples/2PV7.json > input.json

We welcome contributions to enhance this project! If you'd like to contribute, please follow these guidelines:
- Fork the repository: Create your own fork of this repository.
- Create a branch: Make your changes in a dedicated branch (e.g., feature/new-functionality or bugfix/issue-123).
- Follow coding standards: Adhere to the existing Python coding style (PEP 8 is recommended).
- Write tests: If you add new features or fix bugs, please include relevant unit tests to ensure the functionality works as expected.
- Document your changes: Update the README.md and any relevant documentation to reflect your contributions.
- Submit a pull request: Once you've made your changes and are satisfied, submit a pull request to the main repository. Clearly describe the changes you've made and why they are beneficial.
We appreciate your contributions!
MIT License