HiCcompare
provides functions for joint normalization and difference detection in multiple Hi-C datasets. HiCcompare
operates on processed Hi-C data in the form of chromosome-specific chromatin interaction matrices. HiCcompare is available on Bioconductor.
If you have more than two Hi-C datasets, please see our other package, multiHiCcompare.
HiCcompare
accepts three-column tab-separated text files storing chromatin interaction matrices in a sparse matrix format which are available from several sources such as the Aiden Lab (.hic
files) and the Mirnylab FTP site (.cool
files). HiCcompare
performs differential chromatin interaction analysis between two biological conditions, one Hi-C matrix per condition.
First, HiCcompare
jointly normalizes two Hi-C datasets to remove biases between them. Then, it can detect significant differences between the datasets using a genomic distance-stratified permutation test. The novel concept of the MD plot, based on the commonly used MA plot or Bland-Altman plot, is the basis for these methods. The log Minus (difference) between chromatin interaction frequencies is plotted on the Y-axis. The genomic Distance is plotted on the X-axis. The MD plot allows for visualization, normalization, and comparing the differences between the Hi-C datasets in a distance-stratified manner.
The main functions are:
hic_loess()
- performs joint loess
normalization, minimizing global and local biases between Hi-C datasetshic_compare()
- performs the difference detection process to detect significant changes between Hi-C datasets and assist in comparative analysisSeveral demo Hi-C datasets are also included in the package. Refer to the HiCcompare
vignette for full usage instructions, vignette("HiCcompare-vignette")
First, make sure you have all dependencies installed in R.
install.packages(c('dplyr', 'data.table', 'ggplot2', 'gridExtra', 'mgcv', 'parallel', 'devtools')) if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install(c("InteractionSet", "GenomicRanges", "IRanges", "BiocParallel", "QDNAseq", "GenomeInfoDbData"))
To install HiCcompare
from Bioconductor, use the following commands.
# Bioconductor development version and GitHub Release contain major changes for difference detection # it is recommended to use the GitHub release until the next Bioconductor update if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("HiCcompare") library(HiCcompare)
Or, install the latest version of HiCcompare
directly from the GitHub.
library(devtools) install_github('dozmorovlab/HiCcompare', build_vignettes = TRUE) library(HiCcompare)
First, you will need to obtain some Hi-C data. Data is available from the sources listed in the overview, along with many others. You will need to extract the data and read it into R as either a 3 column sparse upper triangular matrix or a 7-column BEDPE file. For more details on data extraction, see the HiCcompare
’s vignette.
Below is an example analysis using HiCcompare
. The data in 3-column sparse upper triangular matrix format is loaded, and the first step is to create a hic.table
object using the create.hic.table()
function. Next, the two Hi-C matrices are jointly normalized using the hic_loess()
function. Finally, difference detection can be performed using the hic_compare()
function. The hic_loess()
and hic_compare()
functions will also produce an MD plot for visualizing the differences between the datasets.
# load data library(HiCcompare) data("HMEC.chr22") data("NHEK.chr22") # create the `hic.table` object chr22.table = create.hic.table(HMEC.chr22, NHEK.chr22, chr = 'chr22') head(chr22.table) # Jointly normalize data for a single chromosome hic.table = hic_loess(chr22.table, Plot = TRUE) head(hic.table) # input hic.table object into hic_compare hic.table = hic_compare(hic.table, Plot = TRUE) head(hic.table)
Stansfield, John C., Kellen G. Cresswell, Vladimir I. Vladimirov, and Mikhail G. Dozmorov. HiCcompare: An R-Package for Joint Normalization and Comparison of HI-C Datasets. BMC Bioinformatics 19, no. 1 (December 2018).
Stansfield, John C., Duc Tran, Tin Nguyen, and Mikhail G. Dozmorov. R Tutorial: Detection of Differentially Interacting Chromatin Regions From Multiple Hi-C Datasets. Current Protocols in Bioinformatics, May 2019
HiCcompareWorkshop - “Detection of Differentially Interacting Chromatin Regions From Multiple Hi-C Datasets” workshop presented on Bioconductor 2020 conference
The HiCcompare
paper included several supplemental files that showcase some of the methods’ usage and reasoning. Below are the titles and brief descriptions of each of these vignettes along with links to the compiled .pdf
and the source .Rmd
files.
Normalization method comparison - Comparison of several Hi-C normalization techniques to display the persistence of bias in individually normalized chromatin interaction matrices, and its effect on the detection of differential chromatin interactions. Source
Estimation of the IF power-law depencence - Estimation of the power-law dependence between the log10 − log10 interaction frequencies and distance between interacting regions. This vignette displays the reasoning behind using a power-law function to simulate the signal portion of Hi-C matrices. Source
Estimation of the SD power-law dependence - Estimation of the power-law dependence between the log10 − log10 SD of interaction frequencies and distance between interacting regions. This vignette displays the reasoning behind using a power-law function to simulate the noise component of Hi-C matrices. Source
Estimation of proportion of zeros - Estimation of the dependence between the proportion of zeros and distance between interacting regions. This vignette shows the distribution of zeros in real Hi-C data. The results were used for modeling the proportion of zeros in simulated Hi-C matrices with a linear function. Source
Evaluation of difference detection in simulated data - Extended evaluation of differential chromatin interaction detection analysis using simulated Hi-C data. Many different classifier performance measures are presented. Note: if trying to compile the source .Rmd
this will take a long time to knit. Source
Evaluation of difference detection in real data - Extended evaluation of differential chromatin interaction detection analysis using real Hi-C data. Many different classifier performance measures are presented. Note: if trying to compile the source .Rmd
this will take a long time to knit. Source
loess at varying resolution - Visualization of the loess
joint normalization over varying resolutions. This vignette shows that increasing sparsity of Hi-C matrices with increasing resolution causes loess to become less useful for normalization at high resolutions. Source
Suggestions for new features and bug reports are welcome. Please create a new issue for any of these or contact the author directly: @jstansfield0 (stansfieldjc@vcu.edu)
Authors: @jstansfield0 (stansfieldjc@vcu.edu) & @mdozmorov (mikhail.dozmorov@vcuhealth.org)