Map amino acid sites of post-translational modifications from models or data to reference positions.

gyorilab/protmapper

Protmapper

The Protmapper maps protein sites to the human reference sequence based on UniProt, PhosphoSitePlus, and manual curation.

Installation and usage

Python package

Protmapper is a Python package available on PyPI. Install it with:

pip install protmapper

Docker container

Protmapper can be run as a local service via a Docker container that exposes a REST API:

docker run -d -p 8008:8008 gyorilab/protmapper:latest

Example: once the container is running, you can send requests to the REST API:

curl -X POST -H "Content-Type: application/json" -d '{"site_list": [["P28482", "uniprot", "T", "184"]]}' http://localhost:8008/map_sitelist_to_human_ref

This is equivalent to the following Python code using the requests package:

import requests
url = 'http://localhost:8008/map_sitelist_to_human_ref'
data = {'site_list': [['P28482', 'uniprot', 'T', '184']]}
response = requests.post(url, json=data)
print(response.json())
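
When sites come from a table rather than being typed by hand, the request body can be assembled programmatically. The following is a minimal sketch: the helper names `build_payload` and `map_sites` are hypothetical and not part of Protmapper, while the endpoint and the order of the four fields are taken from the example above. The structure of the response is not described here, so the sketch returns the decoded JSON unchanged.

```python
# Endpoint from the example above; assumes the Docker container is running locally.
URL = 'http://localhost:8008/map_sitelist_to_human_ref'


def build_payload(sites):
    """Build the request body from (protein ID, namespace, residue, position) tuples.

    Hypothetical helper for illustration only.
    """
    # Positions are sent as strings, matching the example request above.
    return {'site_list': [[prot_id, namespace, residue, str(position)]
                          for prot_id, namespace, residue, position in sites]}


def map_sites(sites, url=URL):
    """POST the site list to the running service and return the decoded JSON.

    Hypothetical helper; requires the requests package and a running container.
    """
    import requests  # imported here so build_payload works without requests installed
    response = requests.post(url, json=build_payload(sites))
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


payload = build_payload([('P28482', 'uniprot', 'T', 184)])
print(payload)
```

With the container running, `map_sites([('P28482', 'uniprot', 'T', 184)])` would send the same request as the curl command above.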

Command line interface

In addition to supporting usage via a Python API and a REST service, Protmapper also provides a command line interface that can be used as follows.

Run Protmapper on a list of proteins with residues and sites provided in a
text file.

positional arguments:
  input                 Path to an input file. The input file is a text file
                        in which each row consists of four comma separated
                        values, with the first element being a protein ID, the
                        second, the namespace in which that ID is valid
                        (uniprot or hgnc), third, an amino acid represented as
                        a single capital letter, and fourth, a site position
                        on the protein.
  output                Path to the output file to be generated. Each line of
                        the output file corresponds to a line in the input
                        file. Each line represents a mapped site produced by
                        Protmapper.

optional arguments:
  -h, --help            show this help message and exit
  --peptide             If given, the third element of each row of the input
                        file is a peptide (amino acid sequence) rather than a
                        single amino acid residue. In this case, peptide-
                        oriented mappings are applied. In this mode the
                        following boolean arguments are ignored.
  --no_methionine_offset
                        If given, will not check for off-by-one errors in site
                        position (possibly) attributable to site numbering
                        from mature proteins after cleavage of the initial
                        methionine.
  --no_orthology_mapping
                        If given, will not check sequence positions for known
                        modification sites in mouse or rat sequences (based on
                        PhosphoSitePlus data).
  --no_isoform_mapping  If given, will not check sequence positions for known
                        modifications in other human isoforms of the protein
                        (based on PhosphoSitePlus data).
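
An input file in the four-column format described above (without `--peptide`) can be generated from Python. This is a minimal sketch rather than part of Protmapper: the function name `write_input_file` and the file name `sites.csv` are hypothetical, and the checks only mirror the constraints stated in the help text (namespace is uniprot or hgnc, residue is a single capital letter).

```python
import csv

# Namespaces listed in the help text above.
VALID_NAMESPACES = {'uniprot', 'hgnc'}


def write_input_file(path, sites):
    """Write (protein ID, namespace, residue, position) rows as comma-separated values.

    Hypothetical helper for illustration only.
    """
    rows = []
    # Validate everything first so that no partial file is left behind on error.
    for prot_id, namespace, residue, position in sites:
        if namespace not in VALID_NAMESPACES:
            raise ValueError(f'Unsupported namespace: {namespace}')
        if not (len(residue) == 1 and residue.isalpha() and residue.isupper()):
            raise ValueError(f'Residue must be a single capital letter: {residue}')
        rows.append([prot_id, namespace, residue, int(position)])
    with open(path, 'w', newline='') as fh:
        csv.writer(fh, lineterminator='\n').writerows(rows)


write_input_file('sites.csv', [
    ('P28482', 'uniprot', 'T', 184),
    ('P28482', 'uniprot', 'T', 185),
])
```

The resulting file can then be passed as the `input` argument of the command line interface.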

Example: the sample file cli_input.csv has the following content:

MAPK1,hgnc,T,183
MAPK1,hgnc,T,184
MAPK1,hgnc,T,185
MAPK1,hgnc,T,186

Running the following command

protmapper cli_input.csv output.csv

produces output.csv, which has the following content:

up_id,error_code,valid,orig_res,orig_pos,mapped_id,mapped_res,mapped_pos,description,gene_name
P28482,,False,T,183,P28482,T,185,INFERRED_MOUSE_SITE,MAPK1
P28482,,False,T,184,P28482,T,185,INFERRED_METHIONINE_CLEAVAGE,MAPK1
P28482,,True,T,185,,,,VALID,MAPK1
P28482,,False,T,186,,,,NO_MAPPING_FOUND,MAPK1
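
The output file has a header row, so it can be read with Python's standard `csv` module. The sketch below embeds the sample output shown above so that it runs on its own; in practice you would open output.csv instead. The helper name `load_mapped_sites` is hypothetical, and treating an empty `mapped_pos` as "no remapping reported" is an assumption based on the rows above.

```python
import csv
import io
from collections import Counter

# Sample output copied from above; replace with open('output.csv') in practice.
OUTPUT = """\
up_id,error_code,valid,orig_res,orig_pos,mapped_id,mapped_res,mapped_pos,description,gene_name
P28482,,False,T,183,P28482,T,185,INFERRED_MOUSE_SITE,MAPK1
P28482,,False,T,184,P28482,T,185,INFERRED_METHIONINE_CLEAVAGE,MAPK1
P28482,,True,T,185,,,,VALID,MAPK1
P28482,,False,T,186,,,,NO_MAPPING_FOUND,MAPK1
"""


def load_mapped_sites(handle):
    """Read the output into a list of dicts keyed by the header row (hypothetical helper)."""
    return list(csv.DictReader(handle))


rows = load_mapped_sites(io.StringIO(OUTPUT))

# Count how many sites fall under each description.
counts = Counter(row['description'] for row in rows)

# Collect sites for which a mapped position was reported (assumption: empty means none).
remapped = [(row['gene_name'],
             row['orig_res'] + row['orig_pos'],
             row['mapped_res'] + row['mapped_pos'])
            for row in rows if row['mapped_pos']]

print(counts)
print(remapped)
```

Note that all values, including the `valid` column, are read as strings, so compare against `'True'` rather than the boolean `True`.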

Documentation

For detailed documentation of Protmapper, visit http://protmapper.readthedocs.io

Funding

The development of Protmapper is funded under the DARPA grants W911NF018-1-0124 and HR00112220036.

Citation

@article{bachman2022protmapper,
  author = {Bachman, John A and Sorger, Peter K and Gyori, Benjamin M},
  doi = {10.1101/822668},
  journal = {bioRxiv},
  publisher = {Cold Spring Harbor Laboratory},
  title = {{Assembling a corpus of phosphoproteomic annotations using ProtMapper to normalize site information from databases and text mining}},
  url = {https://www.biorxiv.org/content/10.1101/822668v4},
  year = {2022}
}
