Make Hi-C experiment object from data

make_hicexp(
  ...,
  data_list = NA,
  groups,
  covariates = NULL,
  remove_zeros = FALSE,
  zero.p = 0.8,
  A.min = 5,
  filter = TRUE,
  remove.regions = hg19_cyto
)

Arguments

...

Hi-C data. Data must be in sparse upper triangular format with 4 columns (chr, region1, region2, IF) or in 7-column BEDPE format with columns chr, start1, end1, chr, start2, end2, IF.
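As a rough illustration of the two layouts, the sketch below builds the same three interactions in both forms using base R only. All values are made-up toy data; the names chr1 and chr2 (used because a data.frame cannot hold two columns called chr) and the end = start + resolution convention are assumptions for illustration, not requirements stated by the package.

```r
# Hypothetical toy data: three interactions on one chromosome at 1 Mb resolution.
# Sparse upper triangular format: chr, region1, region2, IF.
sparse <- data.frame(
  chr     = c(22, 22, 22),
  region1 = c(16000000, 16000000, 17000000),
  region2 = c(16000000, 17000000, 17000000),
  IF      = c(12, 3, 20)
)

# The same interactions in a 7-column BEDPE-style layout.
# Assumption: each bin ends at start + resolution.
resolution <- 1000000
bedpe <- data.frame(
  chr1   = sparse$chr,
  start1 = sparse$region1,
  end1   = sparse$region1 + resolution,
  chr2   = sparse$chr,
  start2 = sparse$region2,
  end2   = sparse$region2 + resolution,
  IF     = sparse$IF
)

# Upper triangular means region1 never exceeds region2.
stopifnot(all(sparse$region1 <= sparse$region2))
```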

data_list

Alternative way to enter data. Use this option if your Hi-C data are already in a list, with each entry of the list representing one sample.
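A minimal sketch of the list-based input, assuming the package is installed and reusing four of the example datasets named in the Examples section. The variable name samples is hypothetical; the call is guarded so the snippet does nothing if multiHiCcompare is unavailable.

```r
# One group label per sample, in the same order as the list entries.
groups <- factor(c(1, 1, 2, 2))

if (requireNamespace("multiHiCcompare", quietly = TRUE)) {
  library(multiHiCcompare)
  data("HCT116_r1", "HCT116_r2", "HCT116_r3", "HCT116_r4")

  # Each list entry is one sample in sparse upper triangular format.
  samples <- list(HCT116_r1, HCT116_r2, HCT116_r3, HCT116_r4)
  stopifnot(length(samples) == length(groups))

  # Pass the list through data_list instead of listing samples in '...'.
  hicexp <- make_hicexp(data_list = samples, groups = groups)
}
```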

groups

A vector of the experimental groups corresponding to each Hi-C data object entered. If it is not in factor form when entered it will be converted to a factor.

covariates

Optional data.frame containing covariate information for your Hi-C experiment. Some examples are enzyme used, batch number, etc. Should have the same number of rows as the number of Hi-C data objects entered and columns corresponding to covariates.
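The row-count requirement can be checked before calling the function. The sketch below is a base R check on illustrative values (six samples, two hypothetical covariates); it is not part of the package.

```r
# Six samples in two groups (illustrative values).
groups <- factor(c(1, 1, 1, 2, 2, 2))

# One row per sample, one column per covariate.
covariates <- data.frame(
  enzyme = factor(c("mboi", "mboi", "mboi", "dpnii", "dpnii", "dpnii")),
  batch  = c(1, 2, 1, 2, 1, 2)
)

# The covariate table must line up with the samples row by row.
stopifnot(nrow(covariates) == length(groups))
```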

remove_zeros

Logical, should rows with 1 or more zero IF values be removed?

zero.p

The proportion of zeros in a row to filter by. If the proportion of zeros in a row is greater than zero.p, the row will be filtered out, i.e. zero.p = 1 means nothing is filtered based on zeros and zero.p = 0 will filter rows that have any zeros.

A.min

The minimum average expression value (row mean) for an interaction pair. If the interaction pair has an average expression value less than A.min the row will be filtered out.
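To make the two thresholds concrete, here is a base R sketch on a toy IF matrix (rows are interaction pairs, columns are samples). It is an illustration, not the package's implementation. It takes the zero.p rule from its two stated endpoints (zero.p = 1 filters nothing, zero.p = 0 filters any row containing a zero), so a row is kept when its proportion of zeros does not exceed zero.p and its row mean is at least A.min. The names IF, prop_zero, row_mean, and keep are hypothetical.

```r
# Toy IF matrix: 4 interaction pairs (rows) across 4 samples (columns).
IF <- matrix(c(
  10, 12,  9, 11,   # no zeros,  mean 10.5
   0,  0,  0,  8,   # 75% zeros, mean 2
   6,  0,  7,  9,   # 25% zeros, mean 5.5
   1,  2,  1,  2    # no zeros,  mean 1.5
), ncol = 4, byrow = TRUE)

zero.p <- 0.8
A.min  <- 5

prop_zero <- rowMeans(IF == 0)  # proportion of zero IFs in each row
row_mean  <- rowMeans(IF)       # average IF in each row

# Assumed rule: keep rows with few enough zeros and a high enough average.
keep <- prop_zero <= zero.p & row_mean >= A.min
filtered <- IF[keep, , drop = FALSE]
```

Here the second and fourth rows pass the zero.p check but are dropped because their averages fall below A.min.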

filter

Logical, should filtering be performed? Defaults to TRUE. If TRUE, interactions that have low average IFs or large numbers of 0 IF values are filtered out. These interactions are not very interesting and are commonly false positives during difference detection, so it is better to remove them from the dataset. Filtering also speeds up the run time of multiHiCcompare. Filtering can be performed before or after normalization; however, the largest speed gain occurs when filtering is done before normalization. Filtering parameters are controlled by the zero.p and A.min options.

remove.regions

A GenomicRanges object indicating specific regions to be filtered out. By default this is the hg19 centromeric, gvar, and stalk regions. Also included in the package is hg38_cyto. If your data is not hg19 you will need to substitute this object. To choose not to filter any regions, set remove.regions = NULL. NOTE: these regions are removed as part of the filtering step, so if you set filter = FALSE they will NOT be removed.

Value

A hicexp object.

Details

Use this function to create a hicexp object for analysis in multiHiCcompare. Filtering can also be performed in this step if the filter option is set to TRUE. Filtering parameters are controlled by the zero.p and A.min options.

Examples

# load data in sparse upper triangular format
data("HCT116_r1", "HCT116_r2", "HCT116_r3", "HCT116_r4",
     "HCT116_r5", "HCT116_r6")

# make groups & covariate input
groups <- factor(c(1, 1, 1, 2, 2, 2))
covariates <- data.frame(
  enzyme = factor(c('mboi', 'mboi', 'mboi', 'dpnii', 'dpnii', 'dpnii')),
  batch = c(1, 2, 1, 2, 1, 2))

# make the hicexp object
hicexp <- make_hicexp(HCT116_r1, HCT116_r2, HCT116_r3, HCT116_r4,
                      HCT116_r5, HCT116_r6,
                      groups = groups, covariates = covariates)