Here I collect many R functions that I’ve created over time. The primary purpose of this package is to neatly organise my R code in one place and have it ready to use whenever needed. It also forces me to document my functions and makes it easier to share analysis code. The code is subject to frequent changes and is "always" in development. Feedback is welcome!
If you find an error please open a GitHub issue.
- R >= 4.1.0
- Mac or Linux operating system (not tested on Windows)
devtools::install_github("Ni-Ar/niar")
If you encounter installation issues, try installing the R package dependencies first with:
install.packages(c('devtools', 'matrixStats', 'BiocManager', 'XICOR',
'ggplot2', 'ggrepel', 'scales', 'patchwork',
'MetBrewer', 'ggalluvial', 'ggfittext', 'ggseqlogo', 'seqinr',
'dplyr', 'tidyr', 'tibble', 'forcats', 'stringr'))
That step takes a while.
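To avoid reinstalling packages that are already present, one option is to install only the missing ones. The following is a minimal sketch using base R only; `missing_pkgs()` is a hypothetical helper introduced here for illustration and is not part of niar.

```r
# Hypothetical helper (not part of niar): return the packages in `pkgs`
# that are not found in the current R library paths.
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# The CRAN dependencies listed above.
cran_deps <- c("devtools", "matrixStats", "BiocManager", "XICOR",
               "ggplot2", "ggrepel", "scales", "patchwork",
               "MetBrewer", "ggalluvial", "ggfittext", "ggseqlogo", "seqinr",
               "dplyr", "tidyr", "tibble", "forcats", "stringr")

to_install <- missing_pkgs(cran_deps)
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

Running this first shows how much is left to install before committing to the full step.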
install.packages('Cairo')
If you get an error installing Cairo, you might need to first install the cairographics C library on your operating system.
Then install the following Bioconductor packages:
BiocManager::install("Biostrings")
BiocManager::install("biomaRt")
BiocManager::install("DESeq2")
BiocManager::install("csaw")
BiocManager::install("msa")
To visualise the plots you might need to select the right graphics device, especially if you get an error that says something like:
Error in diff.default(from) :
Shadow graphics device error: r error 4 (R code execution error)
In grDevices:::png("/tmp/Rtmp....", :
unable to open connection to X11 display ''
To solve this in RStudio, go to the Tools menu (at the top of the window) > Global Options > General section > Graphics tab, and select Cairo from the Graphics Device Backend drop-down menu. Then click “Apply”. The plots should now be displayed correctly.
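Before changing the backend, it may help to check whether the R build itself reports cairo support. This is a small sketch using base R only; it checks R's built-in cairo capability, which is an assumption about what the backend relies on, and it does not check the Cairo package.

```r
# capabilities() reports which optional features this R build supports.
has_cairo <- unname(capabilities("cairo"))

if (isTRUE(has_cairo)) {
  message("This R build reports cairo support.")
} else {
  message("No cairo support reported; the cairographics library may be missing.")
}
```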
Currently, this package contains:
- One function to perform Principal Component Analysis (PCA) in 2D with lots of options to enrich visualisation and exploration. See the vignette below for more details.
- Several functions to fetch and parse data analysed with vast-tools for alternatively spliced events and gene expression. There are also plotting functions to quickly glimpse into the data (e.g. plot_corr_gene_expr_psi()).
- Some publicly available datasets have been packaged in ad-hoc functions to quickly plot and explore the data:
  - Mouse Development ENCODE data: plot_mouse_tissue_devel(), which uses data I preprocessed, fetched with get_mouse_tissue_devel_tbl(). See the vignette below for more details.
- Some Biomart handy functions for quick gene ID conversions (e.g. ensembl_id_2_gene_name()).
- Some DESeq2 wrappers.
- Some rMATS wrappers.
- EpiProfile (Histone Mass Spectrometry) post-processing functions.
- Multiple sequence analysis from fasta format to generate PWMs, visualise logos or Jensen-Shannon divergence.
- Generic genomics files handling (e.g. bed, gtf).
More examples grouped by topic are listed below:
The easiest way to make a PCA, assuming mat is your numerical matrix, is:
showme_PCA2D(mat)
To know more you can type:
?showme_PCA2D
The underlying function is prcomp and you can pass extra arguments with ..., for example:
showme_PCA2D(mat, scale. = TRUE, center = FALSE)
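To see what these pass-through arguments do, the same options can be tried on prcomp directly. This is a minimal sketch on a toy matrix; transposing so that samples are the observations is an assumption made here, not a statement about how showme_PCA2D() handles its input.

```r
set.seed(1)

# Toy numerical matrix: 50 features (rows) x 6 samples (columns).
mat <- matrix(rnorm(300), nrow = 50,
              dimnames = list(paste0("feat", 1:50), paste0("s", 1:6)))

# Assumption: samples are treated as observations, hence the transpose.
# scale. and center are regular prcomp arguments.
pca <- prcomp(t(mat), center = FALSE, scale. = TRUE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

Changing center or scale. here and comparing var_explained gives a quick feel for how much these options affect the result.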
Extra info can be passed from a metadata dataframe with mt =. To specify which column of the dataframe contains the colnames of the matrix mat, use mcol. In the following example, mt contains a column called sample_name:
showme_PCA2D(mat = mat, mt = mt, mcol = "sample_name", show_variance = TRUE, show_stats = TRUE)
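The link between the matrix and the metadata can be checked before plotting. The sketch below builds a toy mat and mt (the sample names and the condition column are made up for illustration) and verifies that every column name of mat appears in mt$sample_name. Reordering the metadata is only a precaution; whether showme_PCA2D() requires matching order is not stated here.

```r
set.seed(1)

# Toy matrix: 10 genes (rows) x 4 samples (columns).
mat <- matrix(rnorm(40), nrow = 10,
              dimnames = list(paste0("gene", 1:10),
                              c("ctrl_1", "ctrl_2", "kd_1", "kd_2")))

# Hypothetical metadata: one row per sample, deliberately in a different order.
mt <- data.frame(sample_name = c("kd_2", "ctrl_1", "kd_1", "ctrl_2"),
                 condition   = c("kd", "ctrl", "kd", "ctrl"))

# Every matrix column should have a matching metadata row.
stopifnot(all(colnames(mat) %in% mt$sample_name))

# Reorder the metadata rows to follow the matrix column order.
mt <- mt[match(colnames(mat), mt$sample_name), ]
print(mt)
```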
To show the PCA loadings:
showme_PCA2D(mat = mat, n_loadings = 12)
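For reference, loadings can also be inspected directly from a prcomp result. How showme_PCA2D() chooses which loadings to display is not described here; the sketch below shows one common approach (ranking features by absolute loading on PC1) on a toy matrix, and is not necessarily what the package does.

```r
set.seed(1)

# Toy matrix: 50 features (rows) x 6 samples (columns).
mat <- matrix(rnorm(300), nrow = 50,
              dimnames = list(paste0("feat", 1:50), paste0("s", 1:6)))

# Assumption: samples are the observations, hence the transpose.
pca <- prcomp(t(mat))

# pca$rotation holds the loadings: one row per feature, one column per PC.
pc1 <- pca$rotation[, "PC1"]

# The 12 features with the largest absolute loading on PC1.
top12 <- sort(abs(pc1), decreasing = TRUE)[1:12]
print(names(top12))
```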
More details can be found in the vignette below.
Since I use vast-tools quite often, I made functions to easily import the output tables into R. Namely, grep_psi() and grep_gene_expression() import the PSI of AS events or gene expression levels, respectively, and the accompanying tidy functions tidy_vst_psi() and tidy_vst_expr() parse the data into a long-format dataframe. These functions work well with the magrittr pipe (%>%) or the base R pipe operator (|>), as in:
grep_psi(inclusion_tbl = file.path(dir_location, "INCLUSION_LEVELS_FULL-hg38-n-v251.tab"),
vst_id = c("HsaEX0000001", "HsaEX0000002")) |>
tidy_vst_psi()
These functions are basically “hacks” that call the system grep command and write to a temporary file that is then read into R and removed from the system. A better way would probably be to implement the functions in Rcpp.
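The grep-to-temporary-file idea can be sketched in a few lines of base R. This is not the niar implementation: `grep_rows()` is a hypothetical helper, and the toy table uses a simplified column layout rather than the real vast-tools one. It assumes a Unix-like system with grep on the PATH.

```r
# Hypothetical helper (not the niar implementation): extract the rows of a
# tab-separated file that contain any of `ids`, via the system grep command.
grep_rows <- function(path, ids) {
  tmp <- tempfile(fileext = ".tab")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary file when done

  # grep returns only matching lines, so read the header separately.
  col_names <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]

  # -F: treat patterns as fixed strings; one -e per ID.
  args <- c("-F", as.vector(rbind("-e", shQuote(ids))), shQuote(path))
  system2("grep", args = args, stdout = tmp)

  # No match: return an empty data frame with the expected columns.
  if (file.size(tmp) == 0) {
    empty <- as.data.frame(matrix(nrow = 0, ncol = length(col_names)))
    names(empty) <- col_names
    return(empty)
  }
  read.delim(tmp, header = FALSE, col.names = col_names, check.names = FALSE)
}

# Toy table with a simplified, made-up layout.
demo <- tempfile(fileext = ".tab")
writeLines(c("GENE\tEVENT\tsampleA\tsampleB",
             "G1\tHsaEX0000001\t10.5\t90.1",
             "G2\tHsaEX0000002\t55.0\t60.2",
             "G3\tHsaEX0000003\t1.0\t2.0"), demo)

res <- grep_rows(demo, c("HsaEX0000001", "HsaEX0000002"))
print(res)
```

The appeal of this design is that grep scans a large table without loading it into R; the cost is a dependency on an external command, which is one reason an Rcpp implementation could be preferable.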
Link for mouse ENCODE AS data exploration.
- Make a vignette for biomaRt functions
- Make a vignette for vast-tools utility and plotting functions, especially correlations.
- Maybe add the mouse ENCODE data (fetched with get_mouse_tissue_devel_tbl) to the package?
- Make a vignette for logo analysis