Installation

BiocManager::install("TADCompare")

Introduction

Using the output of TADCompare and TimeCompare, we can do a range of analyses. One common one is gene ontology enrichment analysis to determine the pathways in which genes near TAD boundaries occur in. To do this, we use rGREAT an R package for performing gene ontology enrichment analysis.

Performing gene ontology analysis using TADCompare

In the first example, we show how to perform gene ontology enrichment using differential boundaries. Here, we perform the analysis on shifted boundaries detected in matrix 1.

library(rGREAT)
# Reading in data
data("rao_chr22_prim")
data("rao_chr22_rep")

# Performing differential analysis
results <- TADCompare(rao_chr22_prim, rao_chr22_rep, resolution = 50000)

# Saving the results into its own data frame
TAD_Frame <- results$TAD_Frame

# Filter data to only include complex boundaries enriched in the second
# contact matrix
TAD_Frame <- TAD_Frame %>% dplyr::filter((Type == "Shifted") & 
                                         (Enriched_In == "Matrix 2"))

# Assign a chromosome and convert to a bed format
TAD_Frame <- TAD_Frame %>% dplyr::select(Boundary) %>% mutate(chr = "chr22", 
    start = Boundary, end = Boundary) %>% dplyr::select(chr, start, end)

# Set up rGREAT job with default parameters
great_shift <- submitGreatJob(TAD_Frame, request_interval = 1, version = "2.0")

# Submit the job
enrichment_table <- getEnrichmentTables(great_shift)

# Subset to only include vital information
enrichment_table <- bind_rows(enrichment_table, .id = "source") %>% 
  dplyr::select(Ontology = source, Description = name, 
                `P-value` = Hyper_Raw_PValue)

# Print head organizaed by p-values
head(enrichment_table %>% dplyr::arrange(`P-value`))
##                Ontology                                          Description
## 1 GO Molecular Function                   gamma-glutamyltransferase activity
## 2 GO Biological Process                     glutathione biosynthetic process
## 3 GO Cellular Component         anchored to external side of plasma membrane
## 4 GO Cellular Component        intrinsic to external side of plasma membrane
## 5 GO Biological Process                         peptide biosynthetic process
## 6 GO Molecular Function transferase activity, transferring amino-acyl groups
##       P-value
## 1 0.001802360
## 2 0.002927596
## 3 0.003152529
## 4 0.004051881
## 5 0.004725996
## 6 0.004950625

The first column, “Ontology,” is simply the domain from which the corresponding ontology (“Description” column) comes from. Here, we use the default, which is the GO ontologies. For more available ontologies, see the rGREAT vignette. “Description” is the pathway itself. “P-value” is the unadjusted hypergeometric p-value, as output by rGREAT. rGREAT also provides binomial p-values (Binom_Raw_Pvalue, Binom_Adjp_BH) and adjusted hypergeometric p-values (Hyper_Adjp_BH).

Now we demonstrate how to perform the same analysis but for all boundary types simultaneously. In this case, we use time-varying data.

# Read in time course data
data("time_mats")
# Identifying boundaries
results <- TimeCompare(time_mats, resolution = 50000)

# Pulling out the frame of TADs
TAD_Frame <- results$TAD_Bounds

# Getting coordinates for TAD boundaries and converting into bed format
Bound_List <- lapply(unique(TAD_Frame$Category), function(x) {
    TAD_Frame %>% filter((Category == x)) %>% mutate(chr = "chr22") %>% 
        dplyr::select(chr, Coordinate) %>% 
        mutate(start = Coordinate, end = Coordinate) %>% 
        dplyr::select(chr, start, end)
})

# Performing rGREAT analysis for each boundary Category
TAD_Enrich <- lapply(Bound_List, function(x) {
  getEnrichmentTables(submitGreatJob(x, request_interval = 1, version = "2.0"))
})

# Name list of data frames to keep track of which enrichment belongs to which
names(TAD_Enrich) <- unique(TAD_Frame$Category)

# Bind each category of pathway and create new column for each pathway
TAD_Enrich <- lapply(names(TAD_Enrich), function(x) {
  bind_rows(lapply(TAD_Enrich[[x]], function(y) {
    y %>% mutate(Category = x)
  }), .id = "source")
})

# Bind each boundary category together and pull out important variables
enrichment_table <- bind_rows(TAD_Enrich) %>% 
  dplyr::select(Ontology = source, Description = name, 
                `P-value` = Hyper_Raw_PValue, Category)

# Get the top enriched pathways
head(enrichment_table %>% dplyr::arrange(`P-value`))
##                Ontology
## 1 GO Biological Process
## 2 GO Molecular Function
## 3 GO Biological Process
## 4 GO Biological Process
## 5 GO Biological Process
## 6 GO Molecular Function
##                                                Description      P-value
## 1 positive regulation of B cell receptor signaling pathway 0.0002254283
## 2                               lipid transporter activity 0.0003760684
## 3          regulation of B cell receptor signaling pathway 0.0004508185
## 4                               Schwann cell proliferation 0.0006761897
## 5                            lipoprotein metabolic process 0.0007139088
## 6        peptide-methionine (R)-S-oxide reductase activity 0.0009015354
##              Category
## 1         Dynamic TAD
## 2   Highly Common TAD
## 3         Dynamic TAD
## 4 Early Appearing TAD
## 5   Highly Common TAD
## 6   Highly Common TAD

These columns are the same as the differential analysis but with an extra column, “Category,” indicating the type of time-varying TAD boundary.

Session Info

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] rGREAT_1.20.0        GenomicRanges_1.40.0 GenomeInfoDb_1.24.2 
## [4] IRanges_2.22.2       S4Vectors_0.26.1     BiocGenerics_0.34.0 
## [7] TADCompare_1.1.1     dplyr_1.0.2          BiocStyle_2.16.1    
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-0            rjson_0.2.20               
##   [3] ggsignif_0.6.0              ellipsis_0.3.1             
##   [5] rio_0.5.16                  CGHcall_2.50.0             
##   [7] rprojroot_2.0.2             DNAcopy_1.62.0             
##   [9] XVector_0.28.0              GlobalOptions_0.1.2        
##  [11] fs_1.5.0                    rstudioapi_0.13            
##  [13] ggpubr_0.4.0                listenv_0.8.0              
##  [15] HiCcompare_1.10.0           codetools_0.2-16           
##  [17] splines_4.0.2               R.methodsS3_1.8.1          
##  [19] impute_1.62.0               knitr_1.30                 
##  [21] Rsamtools_2.4.0             broom_0.7.2                
##  [23] R.oo_1.24.0                 pheatmap_1.0.12            
##  [25] BiocManager_1.30.10         compiler_4.0.2             
##  [27] backports_1.2.0             assertthat_0.2.1           
##  [29] Matrix_1.2-18               limma_3.44.3               
##  [31] htmltools_0.5.0             tools_4.0.2                
##  [33] gtable_0.3.0                glue_1.4.2                 
##  [35] GenomeInfoDbData_1.2.3      reshape2_1.4.4             
##  [37] Rcpp_1.0.5                  carData_3.0-4              
##  [39] Biobase_2.48.0              cellranger_1.1.0           
##  [41] pkgdown_1.6.1               vctrs_0.3.5                
##  [43] Biostrings_2.56.0           nlme_3.1-150               
##  [45] QDNAseq_1.24.0              xfun_0.19                  
##  [47] stringr_1.4.0               globals_0.13.1             
##  [49] openxlsx_4.2.3              lifecycle_0.2.0            
##  [51] gtools_3.8.2                rstatix_0.6.0              
##  [53] InteractionSet_1.16.0       future_1.20.1              
##  [55] zlibbioc_1.34.0             scales_1.1.1               
##  [57] ragg_0.4.0                  hms_0.5.3                  
##  [59] SummarizedExperiment_1.18.2 rhdf5_2.32.4               
##  [61] RColorBrewer_1.1-2          yaml_2.2.1                 
##  [63] curl_4.3                    memoise_1.1.0              
##  [65] gridExtra_2.3               ggplot2_3.3.2              
##  [67] CGHbase_1.48.0              stringi_1.5.3              
##  [69] desc_1.2.0                  zip_2.1.1                  
##  [71] BiocParallel_1.22.0         rlang_0.4.8                
##  [73] pkgconfig_2.0.3             systemfonts_0.3.2          
##  [75] matrixStats_0.57.0          bitops_1.0-6               
##  [77] evaluate_0.14               lattice_0.20-41            
##  [79] purrr_0.3.4                 Rhdf5lib_1.10.1            
##  [81] cowplot_1.1.0               tidyselect_1.1.0           
##  [83] parallelly_1.21.0           plyr_1.8.6                 
##  [85] magrittr_2.0.1              bookdown_0.21              
##  [87] R6_2.5.0                    generics_0.1.0             
##  [89] DelayedArray_0.14.1         pillar_1.4.7               
##  [91] haven_2.3.1                 foreign_0.8-80             
##  [93] mgcv_1.8-33                 abind_1.4-5                
##  [95] RCurl_1.98-1.2              tibble_3.0.4               
##  [97] future.apply_1.6.0          PRIMME_3.1-3               
##  [99] crayon_1.3.4                car_3.0-10                 
## [101] KernSmooth_2.23-18          rmarkdown_2.5              
## [103] GetoptLong_1.0.4            grid_4.0.2                 
## [105] readxl_1.3.1                data.table_1.13.2          
## [107] marray_1.66.0               forcats_0.5.0              
## [109] digest_0.6.27               tidyr_1.1.2                
## [111] R.utils_2.10.1              textshaping_0.2.1          
## [113] munsell_0.5.0