hafoe is a command-line-based tool for the automated exploratory analysis of AAV chimeric libraries and identification of enriched variants in desired tissues using long-read sequencing datasets.
Jalatyan, T., Aznauryan, E., Hasan, R. et al. hafoe: an interactive tool for the analysis of chimeric AAV libraries after random mutagenesis. Gene Ther (2025). https://doi.org/10.1038/s41434-025-00548-3
hafoe runs on Unix-like operating systems (tested on Ubuntu Linux).
Clone the project from GitHub and make the hafoe.sh file executable:
git clone https://github.com/abi-am/hafoe.git
cd hafoe
chmod +x hafoe.sh
A typical hafoe invocation is:
./hafoe.sh \
--explore \
--identify \
-parentlib <path_to_parental_sequences_fasta_file> \
-chimericlib <path_to_chimeric_sequences_csv/fastq_file> \
-enrichedlib1 <path_to_dir_with_enriched_sequences_fastq_file(s)> \
-o <output_directory> \
-title_of_the_run <title>
hafoe must be run with --explore, --identify, or both.
--explore option should be specified for exploratory analysis of chimeric sequences. When using this option, the required arguments are:
-parentlib the full path to the fasta file of parental sequences
-chimericlib the full path to the csv file of chimeric sequences and their abundances or fastq file of chimeric sequences
--identify option should be specified for identification of novel tissue-specific variants. When using this option, the required argument is:
-enrichedlib1 the full path to directory containing fastq file(s) of sequences obtained after enrichment in one or more tissue samples
When using the --identify option without --explore, the following argument is also required:
-exploreout the full path to the output directory generated by a previous _hafoe_ run with only the --explore option
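The two-step workflow described above (exploration first, identification in a later run) could be scripted roughly as follows. This is a minimal sketch: every /path/to/... value is a placeholder rather than a file shipped with hafoe, and the run helper and DRY_RUN variable are hypothetical conveniences, not part of hafoe. Only flags listed in this README are used. By default the script prints the commands without executing them.

```shell
#!/usr/bin/env bash
# Sketch of a two-step hafoe run. All /path/to/... values are placeholders.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    printf '%s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Step 1: exploratory analysis of the chimeric library.
explore_step() {
    run ./hafoe.sh --explore \
        -parentlib /path/to/parents.fasta \
        -chimericlib /path/to/chimeric.csv \
        -o /path/to/explore_out
}

# Step 2: identification of enriched variants, reusing the step 1 output.
identify_step() {
    run ./hafoe.sh --identify \
        -exploreout /path/to/explore_out \
        -enrichedlib1 /path/to/enriched_fastq \
        -o /path/to/identify_out
}

explore_step
identify_step
```

Run it from the hafoe/ directory with DRY_RUN=0 once the placeholder paths have been replaced with real ones.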
Run ./hafoe.sh to see additional arguments.
-o output directory (optional: the default is hafoe_out)
-samtools samtools path (optional: if not supplied, hafoe will use the samtools installed on the system)
-bowal bowtie2 path (optional: hafoe will use the default installation path in the user's directory)
-bowb bowtie2-build path (optional: hafoe will use the default installation path in the user's directory)
-cdhitest cd-hit-est path (optional: hafoe will use the default installation path in the user's directory)
-cdhitest2d cd-hit-est-2d path (optional: hafoe will use the default installation path in the user's directory)
-clustalo clustalo path (optional: hafoe will use the default installation path in the user's directory)
-rlib path to the directory where newly installed R libraries should be stored (optional: hafoe will create rlib directory in the output directory by default)
-readlength fragment size used for neighbor-aware serotype identification (optional: default is 100)
-stepsize the distance between consecutive fragments (optional: the default is 10)
-vd_criterion option used for filtering out parental serotypes of low quality, values include sum, avg (optional: default is sum)
-title_of_the_run title of the run (optional: chimericlib filename is used as default)
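To get a feel for how -readlength and -stepsize interact, the sketch below counts the fragments a single sequence would be cut into. It assumes a plain sliding window that discards any trailing piece shorter than one full fragment; hafoe's actual fragmenting code may treat sequence ends differently. The function names and the sequence lengths are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: assumes a plain sliding window over one sequence,
# dropping any trailing part shorter than a full fragment.

# fragment_count SEQ_LEN READLENGTH STEPSIZE -> number of full fragments
fragment_count() {
    if [ "$1" -lt "$2" ]; then
        echo 0
    else
        echo $(( ($1 - $2) / $3 + 1 ))
    fi
}

# fragment_starts SEQ_LEN READLENGTH STEPSIZE -> 1-based start positions
fragment_starts() {
    start=1
    while [ $(( start + $2 - 1 )) -le "$1" ]; do
        echo "$start"
        start=$(( start + $3 ))
    done
}

# With the defaults (-readlength 100, -stepsize 10), a 2200 nt sequence
# (hypothetical length) gives (2200 - 100) / 10 + 1 = 211 fragments.
fragment_count 2200 100 10

# Start positions for a short 130 nt sequence: 1, 11, 21, 31.
fragment_starts 130 100 10
```

Under this assumption, a smaller step size increases the number of overlapping fragments per sequence, and with it the amount of data to align.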
To test hafoe, navigate to the hafoe/ directory and run chmod +x example/run_hafoe.sh followed by example/run_hafoe.sh. All required parameters are already specified in the script; however, you may need to add the -cdhitest, -cdhitest2d, and -rlib options to point to the corresponding tools and R library directory on your system.
A successful test run should generate the example/hafoe_out/ output directory with output files and reports.
hafoe's main outputs are interactive HTML reports with multiple plots describing the diversity of the chimeric library, the prevalence of parental serotypes, the serotype composition of representative variants, and variants enriched in target tissues. HTML reports are located in <output_directory>/reports/main and <output_directory>/reports/supplementary directories.
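After a run, the report files can be listed with a small helper like the one below. This is a sketch that assumes the reports carry an .html extension; list_reports is a hypothetical function name, and the only paths taken from this README are the reports/main and reports/supplementary directories and the default output directory hafoe_out.

```shell
#!/usr/bin/env bash
# list_reports [OUTPUT_DIR] -> paths of HTML reports, main reports first.
# Assumes report files end in .html; prints nothing if none are found.
list_reports() {
    out_dir="${1:-hafoe_out}"   # hafoe_out is the documented default for -o
    for sub in main supplementary; do
        if [ -d "$out_dir/reports/$sub" ]; then
            find "$out_dir/reports/$sub" -type f -name '*.html' | sort
        fi
    done
}

# Example: list the reports produced by the bundled test run.
list_reports example/hafoe_out
```

Any of the listed files can then be opened in a web browser.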
The following software and packages should be pre-installed on your system:
R (v4.1.3) with the following packages: dplyr (v1.0.9), ORFik (v1.12.13), plotly (v4.10.0), ggplot2 (v3.3.6), gplots (v3.1.3), microseq (v2.1.4), Biostrings (v2.60.2), stringr (v1.4.0), cowplot (v1.1.1), seqinr (v4.2.8),
Python (v3.9.5) with the following packages: numpy (v1.24.3), pandas (v1.3.4), Bio (v1.5.3), bokeh (v2.4.3), seaborn (v0.12.2), selenium (v4.10.0),
Bowtie 2 (v2.4.2), CD-HIT (v4.8.1), Clustal Omega (v1.2.4), and SAMtools (v1.9).
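Before the first run, it can help to check which of these command-line tools are visible on PATH. The snippet below is a minimal sketch: missing_tools is a hypothetical helper, the executable names are the usual ones for these packages (Rscript, python3, bowtie2, bowtie2-build, cd-hit-est, cd-hit-est-2d, clustalo, samtools), and it does not check versions or the R and Python packages.

```shell
#!/usr/bin/env bash
# missing_tools NAME... -> prints each executable that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

missing="$(missing_tools Rscript python3 bowtie2 bowtie2-build \
    cd-hit-est cd-hit-est-2d clustalo samtools)"

if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found on PATH."
fi
```

A tool reported as missing is not necessarily a problem: if it is installed outside PATH, its location can be passed to hafoe with the matching option (-samtools, -bowal, -bowb, -cdhitest, -cdhitest2d, -clustalo).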