A computational tool for the exploratory analysis of chimeric AAV libraries and identification of novel tissue-specific variants


hafoe

hafoe is a command-line-based tool for the automated exploratory analysis of AAV chimeric libraries and identification of enriched variants in desired tissues using long-read sequencing datasets.

Citation

Jalatyan, T., Aznauryan, E., Hasan, R. et al. hafoe: an interactive tool for the analysis of chimeric AAV libraries after random mutagenesis. Gene Ther (2025). https://doi.org/10.1038/s41434-025-00548-3

Operating Systems

hafoe runs on Unix-like operating systems (tested on Ubuntu Linux).

Installation

Clone the project from GitHub and make the hafoe.sh file executable:

git clone https://github.com/abi-am/hafoe.git
cd hafoe
chmod +x hafoe.sh

Usage

A typical hafoe invocation is:

./hafoe.sh \
    --explore \
    --identify \
    -parentlib <path_to_parental_sequences_fasta_file> \
    -chimericlib <path_to_chimeric_sequences_csv/fastq_file> \
    -enrichedlib1 <path_to_dir_with_enriched_sequences_fastq_file(s)> \
    -o <output_directory> \
    -title_of_the_run <title> 

Required arguments

hafoe requires at least one of the --explore and --identify options; both can be given together.

The --explore option should be specified for exploratory analysis of chimeric sequences. When using this option, the required arguments are:

-parentlib      the full path to the fasta file of parental sequences
-chimericlib    the full path to the csv file of chimeric sequences and their abundances or fastq file of chimeric sequences

The --identify option should be specified for identification of novel tissue-specific variants. When using this option, the required argument is:

-enrichedlib1   the full path to directory containing fastq file(s) of sequences obtained after enrichment in one or more tissue samples

When using the --identify option without --explore, an additional required argument is:

-exploreout     the full path to the output directory generated by running hafoe with only the --explore option
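The rules above allow a staged run: explore once, then identify later against the saved output. The following is a minimal sketch of that workflow, using only the flags documented above. All paths are hypothetical placeholders, and the dry-run switch (the RUN variable) is an illustrative convenience, not a hafoe feature.

```shell
#!/usr/bin/env bash
# Staged hafoe run: explore first, then identify using the explore output.
# Every path below is a hypothetical placeholder; replace with your own full paths.
parentlib="/path/to/parents.fasta"
chimericlib="/path/to/chimeric.csv"
enriched_dir="/path/to/enriched_fastq_dir"
explore_out="/path/to/hafoe_explore_out"
identify_out="/path/to/hafoe_identify_out"

# Stage 1: exploratory analysis only.
explore_cmd=(./hafoe.sh --explore
    -parentlib "$parentlib"
    -chimericlib "$chimericlib"
    -o "$explore_out")

# Stage 2: identification only, pointing -exploreout at the stage 1 output.
identify_cmd=(./hafoe.sh --identify
    -enrichedlib1 "$enriched_dir"
    -exploreout "$explore_out"
    -o "$identify_out")

# Print both commands for review; set RUN=1 to actually execute them.
printf '%s\n' "${explore_cmd[*]}"
printf '%s\n' "${identify_cmd[*]}"
if [ "${RUN:-0}" = "1" ]; then
    "${explore_cmd[@]}" && "${identify_cmd[@]}"
fi
```

Running the script as is only prints the two command lines; running it with RUN=1 from the hafoe/ directory executes them in order and stops if the explore stage fails.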

Additional arguments

Run ./hafoe.sh to see additional arguments.

-o                 output directory (optional: the default is hafoe_out)
-samtools          samtools path (optional: if not supplied, hafoe will use the samtools installed on the system)
-bowal             bowtie2 path (optional: hafoe will use the default installation path in the user's directory)
-bowb              bowtie2-build path (optional: hafoe will use the default installation path in the user's directory)
-cdhitest          cd-hit-est path (optional: hafoe will use the default installation path in the user's directory)
-cdhitest2d        cd-hit-est-2d path (optional: hafoe will use the default installation path in the user's directory)
-clustalo          clustalo path (optional: hafoe will use the default installation path in the user's directory)
-rlib              path to the directory where newly installed R libraries should be stored (optional: hafoe will create rlib directory in the output directory by default)
-readlength        fragment size used for neighbor-aware serotype identification (optional: default is 100)
-stepsize          the distance between consecutive fragments (optional: the default is 10)
-vd_criterion      criterion used to filter out low-quality parental serotypes; values include sum and avg (optional: default is sum)
-title_of_the_run  title of the run (optional: chimericlib filename is used as default)
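To get a feel for how -readlength and -stepsize interact, the sketch below estimates how many fragments a single sequence yields. It assumes a plain sliding window in which only full-length fragments are kept; that windowing rule is an assumption made for illustration and is not confirmed by the documentation above. The function name fragment_count is hypothetical.

```shell
#!/usr/bin/env bash
# Estimate the number of fragments per sequence for given -readlength and
# -stepsize values. Assumes a sliding window that keeps only full-length
# fragments (an illustrative assumption, not hafoe's confirmed behavior).
fragment_count() {
    local seq_len=$1
    local read_len=${2:-100}   # documented default for -readlength
    local step=${3:-10}        # documented default for -stepsize
    if [ "$seq_len" -lt "$read_len" ]; then
        echo 0
        return
    fi
    echo $(( (seq_len - read_len) / step + 1 ))
}

fragment_count 2200          # defaults: prints 211
fragment_count 2200 150 25   # longer fragments, larger step: prints 83
```

Under this assumption, a smaller step size gives denser coverage of each sequence at the cost of more fragments to align.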

Test run

To test hafoe, navigate to the hafoe/ directory and run chmod +x example/run_hafoe.sh followed by example/run_hafoe.sh. All required parameters are already specified; however, you may need to add the -cdhitest, -cdhitest2d, and -rlib options to point to the corresponding tools and R library directory on your system.

A successful test run should generate the example/hafoe_out/ output directory with output files and reports.

Output

hafoe's main outputs are interactive HTML reports with multiple plots describing the diversity of the chimeric library, the prevalence of parental serotypes, the serotype composition of representative variants, and variants enriched in target tissues. HTML reports are located in <output_directory>/reports/main and <output_directory>/reports/supplementary directories.
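After a run, the reports can be located from the shell. This is a small sketch that lists report files in the two directories named above; it assumes the reports carry an .html extension, and the helper name list_reports is hypothetical.

```shell
#!/usr/bin/env bash
# List HTML reports under <output_directory>/reports/main and
# <output_directory>/reports/supplementary.
# Assumes report files end in .html; list_reports is a hypothetical helper.
list_reports() {
    local out_dir=${1:-hafoe_out}   # hafoe_out is the documented default for -o
    local sub
    for sub in main supplementary; do
        if [ -d "$out_dir/reports/$sub" ]; then
            find "$out_dir/reports/$sub" -type f -name '*.html' | sort
        fi
    done
}

# Example: inspect the output of the test run.
list_reports example/hafoe_out
```

If the output directory does not exist yet, the function prints nothing, which can itself serve as a quick check that a run completed.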

System requirements

The following software and packages should be pre-installed in your system:

R (v4.1.3) with the following packages: dplyr (v1.0.9), ORFik (v1.12.13), plotly (v4.10.0), ggplot2 (v3.3.6), gplots (v3.1.3), microseq (v2.1.4), Biostrings (v2.60.2), stringr (v1.4.0), cowplot (v1.1.1), seqinr (v4.2.8);

Python (v3.9.5) with the following packages: numpy (v1.24.3), pandas (v1.3.4), Bio (v1.5.3), bokeh (v2.4.3), seaborn (v0.12.2), selenium (v4.10.0);

Bowtie 2 (v2.4.2), CD-HIT (v4.8.1), Clustal Omega (v1.2.4), and SAMtools (v1.9).
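Before a first run, it can help to confirm that the external programs are reachable. The sketch below checks the PATH for each executable; the names Rscript and python3 are assumptions about how R and Python are installed, and the remaining names follow the tool paths listed under Additional arguments. It does not check versions or R/Python packages, and missing_tools is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Report which of the given executables cannot be found on PATH.
# missing_tools is a hypothetical helper; it checks presence only, not versions.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Rscript and python3 are assumed executable names for R and Python.
missing=$(missing_tools Rscript python3 bowtie2 bowtie2-build \
    cd-hit-est cd-hit-est-2d clustalo samtools)

if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    echo "All required executables found on PATH."
fi
```

A tool reported as missing is not necessarily a problem: its location can be passed explicitly with the matching option, for example -cdhitest or -clustalo.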
