Developer Setup
To contribute to HATCHet, we recommend the following steps:
- Clone and check out the develop branch of HATCHet.
- Install the Gurobi solver and academic license (for fastest testing times). Set the GUROBI_HOME and GRB_LICENSE_FILE environment variables. If you cannot install Gurobi, see http://compbio.cs.brown.edu/hatchet/README.html#using-a-solver for alternate approaches. The steps that currently run in our CI to install Gurobi are roughly:
wget https://packages.gurobi.com/9.0/gurobi9.0.2_linux64.tar.gz -O gurobi9.0.2_linux64.tar.gz
tar xvzf gurobi9.0.2_linux64.tar.gz
(cd gurobi902/linux64/src/build && make)
(cd gurobi902/linux64/lib && ln -f -s ../src/build/libgurobi_c++.a libgurobi_c++.a)
export GUROBI_HOME=$(realpath gurobi902)
(cd gurobi902/linux64/bin && ./grbgetkey -q <your_gurobi_key_here> --path ${GUROBI_HOME})
export GRB_LICENSE_FILE=${GUROBI_HOME}/gurobi.lic
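Before moving on, it can help to confirm that both variables point at something real. The following is a minimal sketch, not part of HATCHet; the function name check_gurobi_env is hypothetical, and it only tests that the folder and the license file exist, not that the license is valid.

```shell
# Hypothetical helper (not part of HATCHet): report whether the Gurobi
# environment variables set above point to an existing folder and file.
check_gurobi_env() {
    local status=0
    if [ -z "${GUROBI_HOME:-}" ] || [ ! -d "${GUROBI_HOME}" ]; then
        echo "GUROBI_HOME is unset or not a directory" >&2
        status=1
    fi
    if [ -z "${GRB_LICENSE_FILE:-}" ] || [ ! -f "${GRB_LICENSE_FILE}" ]; then
        echo "GRB_LICENSE_FILE is unset or does not point to a file" >&2
        status=1
    fi
    return "${status}"
}
```

Calling check_gurobi_env in the same shell session returns a non-zero status and prints a message for each variable that looks wrong.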
- Install the commonly used bioinformatics tools that HATCHet relies on. You will have to set certain environment variables to tell HATCHet where it can find these tools. See https://github.com/raphael-group/hatchet/blob/develop/.github/workflows/main.yml to see how we do this in our CI. Instead of setting environment variables, you may choose to modify the paths section of the included hatchet.ini.
This list currently includes:
- SAMtools
- set HATCHET_PATHS_SAMTOOLS to the folder where the samtools executable can be found.
- BCFtools
- set HATCHET_PATHS_BCFTOOLS to the folder where the bcftools executable can be found.
- Tabix
- set HATCHET_PATHS_TABIX and HATCHET_PATHS_BGZIP to the folder where the tabix and bgzip executables can be found.
- Mosdepth
- set HATCHET_PATHS_MOSDEPTH to the folder where the mosdepth executable can be found.
- Picard Tools
- set HATCHET_PATHS_PICARD to the folder where picard.jar (or picard, if you installed Picard Tools from conda) can be found.
- Shapeit 2
- set HATCHET_PATHS_SHAPEIT to the folder where the shapeit executable can be found.
- Phasing reference panel files
- These files can be found at https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz.
Set HATCHET_DOWNLOAD_PANEL_REFPANELDIR to the folder where you decompress and untar this file.
You should see a 1000GP_Phase3 folder inside this folder.
Please note that HATCHet may download additional chain files inside this folder, if needed,
so make sure that this is a writable location.
- Reference human genome
- We recommend the one at http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz.
Set HATCHET_PATHS_REFERENCE to the full path of the decompressed .fa file.
- Testing data for HATCHet
- We provide some testing data for HATCHet at https://zenodo.org/record/4046906.
You will want to set HATCHET_TESTS_BAM_DIRECTORY to the folder where you extract all those files.
One possible way to do this is:
pip3 install zenodo-get
python3 -m zenodo_get 10.5281/zenodo.4046906 --output-dir=testdata
export HATCHET_TESTS_BAM_DIRECTORY=$(realpath testdata)
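With this many variables, a quick sanity check can catch a typo before the longer steps below. The sketch below is an illustration under stated assumptions, not part of HATCHet: the function names check_tool_dir and check_hatchet_paths are hypothetical, and the check only looks for the files described in the list above. Picard is left out because the file name (picard.jar or picard) depends on how it was installed.

```shell
# Hypothetical helper (not part of HATCHet): check that the variable named in $1
# points to a folder containing the executable named in $2.
check_tool_dir() {
    local var_name="$1" exe="$2" dir
    eval "dir=\${${var_name}:-}"   # portable indirect lookup of the variable's value
    if [ -n "${dir}" ] && [ -x "${dir}/${exe}" ]; then
        echo "ok       ${var_name} -> ${dir}/${exe}"
    else
        echo "MISSING  ${var_name} (expected ${exe} in that folder)"
        return 1
    fi
}

# Hypothetical helper: run the check for each tool listed above and count problems.
check_hatchet_paths() {
    local failures=0 pair
    for pair in \
        HATCHET_PATHS_SAMTOOLS:samtools \
        HATCHET_PATHS_BCFTOOLS:bcftools \
        HATCHET_PATHS_TABIX:tabix \
        HATCHET_PATHS_BGZIP:bgzip \
        HATCHET_PATHS_MOSDEPTH:mosdepth \
        HATCHET_PATHS_SHAPEIT:shapeit; do
        check_tool_dir "${pair%%:*}" "${pair##*:}" || failures=$((failures + 1))
    done
    # The reference genome variable holds a full file path rather than a folder.
    if [ ! -f "${HATCHET_PATHS_REFERENCE:-}" ]; then
        echo "MISSING  HATCHET_PATHS_REFERENCE (expected a .fa file)"
        failures=$((failures + 1))
    fi
    # The reference panel folder must exist and be writable.
    if [ ! -d "${HATCHET_DOWNLOAD_PANEL_REFPANELDIR:-}" ] || [ ! -w "${HATCHET_DOWNLOAD_PANEL_REFPANELDIR:-}" ]; then
        echo "MISSING  HATCHET_DOWNLOAD_PANEL_REFPANELDIR (expected a writable folder)"
        failures=$((failures + 1))
    fi
    echo "${failures} problem(s) found"
    [ "${failures}" -eq 0 ]
}
```

Running check_hatchet_paths after exporting the variables prints one line per item and returns a non-zero status if anything is missing. It does not replace hatchet check, which remains the authoritative test.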
- Create and activate a new conda environment with Python 3.8 or 3.9 (preferred).
conda create --name hatchet python=3.9 && conda activate hatchet
- Start a new branch and install HATCHet in developer mode, with the dev extras.
cd <path_to_hatchet_repo>
git checkout -b <your_awesome_branch_name>
pip install -e .[dev]
If the pip install step fails because of an error in C++ compilation, you may need to set the environment variable CXXFLAGS to -pthread.
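The install and the CXXFLAGS fallback can be combined into one step. The following is a minimal sketch, assuming pip is on your PATH inside the activated conda environment; the function name install_hatchet_dev is hypothetical. It retries after any failure, not only C++ compilation errors, so read the pip output if the second attempt also fails.

```shell
# Hypothetical helper (not part of HATCHet): install the checkout at $1 in
# developer mode, retrying once with CXXFLAGS=-pthread if the first attempt fails.
install_hatchet_dev() {
    local repo="$1"
    (
        cd "${repo}" || exit 1
        # Quoting ".[dev]" stops shells such as zsh from treating the brackets as a glob.
        pip install -e ".[dev]" && exit 0
        echo "pip install failed; retrying with CXXFLAGS=-pthread" >&2
        CXXFLAGS="-pthread" pip install -e ".[dev]"
    )
}
```

Pass the path to your HATCHet checkout as the only argument. The subshell keeps the cd and the CXXFLAGS setting from leaking into your session.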
- Install the pre-commit hook. This will allow you to identify style/formatting/coding issues every time you commit your code. Pre-commit automatically formats the files in your repository according to certain standards, and/or warns you if certain best practices are not followed.
pre-commit install
- Run HATCHet Check. This is crucial to (quickly) see if HATCHet is likely to work for your setup or not.
hatchet check
- Run the unit tests. This step may take up to an hour, but it is crucial to see if HATCHet is working correctly.
pytest tests
If any tests fail, do not proceed; carefully go through the above procedure again. Contact us through GitHub issues if you still can't figure it out.
NOTE: some of the steps in test_steps.py will fail with newer samtools/bcftools versions (e.g., 1.9). Try using version 1.7 of each as used in the GitHub Actions YAML file.
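Since the installed version matters here, a small check can save a long test run. The sketch below is not part of HATCHet and the function names are hypothetical. It assumes the first line of the tool's --version output has the form "samtools 1.7" or "bcftools 1.7".

```shell
# Hypothetical helper: read `<tool> --version` output on stdin and print the
# second word of the first line, which is the version number.
parse_tool_version() {
    awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: compare the version of tool $1 in folder $2 with $3.
check_tool_version() {
    local tool="$1" dir="$2" expected="$3" found
    found=$("${dir}/${tool}" --version 2>/dev/null | parse_tool_version)
    if [ "${found}" = "${expected}" ]; then
        echo "${tool} ${found}: matches"
    else
        echo "${tool} ${found:-not found}: expected ${expected}"
        return 1
    fi
}
```

For example, check_tool_version samtools "${HATCHET_PATHS_SAMTOOLS}" 1.7 compares the installed samtools against the version named in the note above; the same call works for bcftools with HATCHET_PATHS_BCFTOOLS.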
- Tweak/modify the code, make HATCHet better!
- Add new tests for any features you add. Re-run the unit tests to make sure you didn't break anything.
pytest tests
- Push your code to GitHub and send a PR towards the develop branch. We intend to follow the Gitflow workflow to accept contributions to HATCHet and release new versions.
Our CI will automatically run the pre-commit and pytest steps for PRs towards the protected branches, so running these steps on your local installation will prevent surprises for you later.
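The two local steps can be run together before pushing. This is a minimal sketch rather than a copy of the CI configuration; the function name run_local_checks is hypothetical, and it assumes you run it from the root of the HATCHet repository with the conda environment active.

```shell
# Hypothetical helper (not part of HATCHet): run the pre-commit hooks on every
# file, then the unit tests, stopping at the first step that fails.
run_local_checks() {
    pre-commit run --all-files || { echo "pre-commit reported issues" >&2; return 1; }
    pytest tests || { echo "tests failed" >&2; return 1; }
    echo "local checks passed"
}
```

Running the hooks with --all-files checks the whole repository, while the installed hook only checks the files staged for a commit.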
Thank you for contributing to HATCHet!