Source: R/TADrfe.R (TADrfe.Rd)
A wrapper function passed to caret::rfe to apply recursive feature elimination (RFE) on binned domain data as a feature reduction technique for random forests. Backward elimination starts from all p features, where p is the number of features in the data, and then proceeds through subset sizes that are powers of 2, down to 2.
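The schedule of subset sizes can be sketched in base R. This is only an illustration of the "powers of 2" rule described above, not the package's actual code; `rfe_sizes` is a hypothetical helper name, and the assumption that the full set of p features is always evaluated first is taken from the example output further down (sizes 26, 16, 8, 4, 2).

```r
# Hypothetical helper (not part of the package): candidate subset sizes
# for p features, listed from largest to smallest.
rfe_sizes <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 2)
  # Largest exponent k such that 2^k is strictly below p
  k <- ceiling(log2(p)) - 1
  # Powers of 2 below p, largest first (empty when p == 2)
  pow2 <- if (k >= 1) 2^(k:1) else numeric(0)
  # Assumption: the full feature set is always evaluated first
  c(p, pow2)
}

rfe_sizes(26)  # 26 16 8 4 2, the sizes seen in the example log
```

With 26 features this yields five model fits per fold instead of 25, which is the practical reason for stepping by powers of 2 rather than removing one feature at a time.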
TADrfe(
  trainData,
  tuneParams = list(ntree = 500, nodesize = 1),
  cvFolds = 5,
  cvMetric = "Accuracy",
  verbose = FALSE
)
trainData
    Data frame, the binned data matrix used to build a random forest classifier (can be obtained using createTADdata). Required.
tuneParams
    List, providing the ntree and nodesize parameters to feed into randomForest. Default is list(ntree = 500, nodesize = 1).
cvFolds
    Numeric, number of folds to use in k-fold cross-validation. Default is 5.
cvMetric
    Character, performance metric used to choose the optimal tuning parameters (one of "Kappa", "Accuracy", "MCC", "ROC", "Sens", "Spec", "Pos Pred Value", "Neg Pred Value"). Default is "Accuracy".
verbose
    Logical, controls whether details regarding modeling should be printed out. Default is FALSE.
A list containing: 1) the performances extracted at each of the k folds, and 2) the variable importances among the top features at each step of RFE.
For 1): `Variables` - the best subset of features to consider at each iteration, `MCC` (Matthews Correlation Coefficient), `ROC` (Area under the receiver operating characteristic curve), `Sens` (Sensitivity), `Spec` (Specificity), `Pos Pred Value` (Positive predictive value), `Neg Pred Value` (Negative predictive value), `Accuracy`, and the corresponding standard deviations across the cross-folds.
For 2): `Overall` - the variable importance, `var` - the feature name, `Variables` - the number of features that were considered at each cross-fold, and `Resample` - the cross-fold.
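Of the metrics reported, MCC is probably the least familiar. The sketch below computes it from the four cells of a binary confusion matrix using the standard definition. The helper name `mcc`, the choice to return 0 when the denominator is zero, and the counts in the usage line are assumptions made for illustration; this is not necessarily how the package computes the value.

```r
# Hypothetical helper, not part of the package: Matthews Correlation
# Coefficient from the cells of a 2x2 confusion matrix.
mcc <- function(tp, tn, fp, fn) {
  # Convert to double so the products below cannot overflow integer range
  tp <- as.numeric(tp); tn <- as.numeric(tn)
  fp <- as.numeric(fp); fn <- as.numeric(fn)
  num <- tp * tn - fp * fn
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  # Assumed convention: return 0 when a row or column total is empty
  # and the coefficient is otherwise undefined
  if (den == 0) return(0)
  num / den
}

# Made-up counts, for illustration only
mcc(tp = 40, tn = 45, fp = 5, fn = 10)
```

MCC ranges from -1 (every prediction wrong) to 1 (every prediction right) and uses all four cells, so it is less sensitive to class imbalance than accuracy alone.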
# Read in ARROWHEAD-called TADs at 5kb
data(arrowhead_gm12878_5kb)

# Extract unique boundaries
bounds.GR <- extractBoundaries(domains.mat = arrowhead_gm12878_5kb,
                               filter = FALSE,
                               CHR = "CHR22",
                               resolution = 5000)

# Read in GRangesList of 26 TFBS
data(tfbsList)

# Create the binned data matrix for CHR22 using:
# 5 kb binning,
# oc-type predictors from 26 different TFBS from the GM12878 cell line, and
# random under-sampling
tadData <- createTADdata(bounds.GR = bounds.GR,
                         resolution = 5000,
                         genomicElements.GR = tfbsList,
                         featureType = "oc",
                         resampling = "rus",
                         trainCHR = "CHR22",
                         predictCHR = NULL)

# Perform RFE for fully grown random forests with 100 trees using 5-fold CV
# Evaluate performances using accuracy
rfe_res <- TADrfe(trainData = tadData[[1]],
                  tuneParams = list(ntree = 100, nodesize = 1),
                  cvFolds = 5,
                  cvMetric = "Accuracy",
                  verbose = TRUE)
#> +(rfe) fit Fold1 size: 26
#> -(rfe) fit Fold1 size: 26
#> +(rfe) imp Fold1
#> -(rfe) imp Fold1
#> +(rfe) fit Fold1 size: 16
#> -(rfe) fit Fold1 size: 16
#> +(rfe) fit Fold1 size: 8
#> -(rfe) fit Fold1 size: 8
#> +(rfe) fit Fold1 size: 4
#> -(rfe) fit Fold1 size: 4
#> +(rfe) fit Fold1 size: 2
#> -(rfe) fit Fold1 size: 2
#> +(rfe) fit Fold2 size: 26
#> -(rfe) fit Fold2 size: 26
#> +(rfe) imp Fold2
#> -(rfe) imp Fold2
#> +(rfe) fit Fold2 size: 16
#> -(rfe) fit Fold2 size: 16
#> +(rfe) fit Fold2 size: 8
#> -(rfe) fit Fold2 size: 8
#> +(rfe) fit Fold2 size: 4
#> -(rfe) fit Fold2 size: 4
#> +(rfe) fit Fold2 size: 2
#> -(rfe) fit Fold2 size: 2
#> +(rfe) fit Fold3 size: 26
#> -(rfe) fit Fold3 size: 26
#> +(rfe) imp Fold3
#> -(rfe) imp Fold3
#> +(rfe) fit Fold3 size: 16
#> -(rfe) fit Fold3 size: 16
#> +(rfe) fit Fold3 size: 8
#> -(rfe) fit Fold3 size: 8
#> +(rfe) fit Fold3 size: 4
#> -(rfe) fit Fold3 size: 4
#> +(rfe) fit Fold3 size: 2
#> -(rfe) fit Fold3 size: 2
#> +(rfe) fit Fold4 size: 26
#> -(rfe) fit Fold4 size: 26
#> +(rfe) imp Fold4
#> -(rfe) imp Fold4
#> +(rfe) fit Fold4 size: 16
#> -(rfe) fit Fold4 size: 16
#> +(rfe) fit Fold4 size: 8
#> -(rfe) fit Fold4 size: 8
#> +(rfe) fit Fold4 size: 4
#> -(rfe) fit Fold4 size: 4
#> +(rfe) fit Fold4 size: 2
#> -(rfe) fit Fold4 size: 2
#> +(rfe) fit Fold5 size: 26
#> -(rfe) fit Fold5 size: 26
#> +(rfe) imp Fold5
#> -(rfe) imp Fold5
#> +(rfe) fit Fold5 size: 16
#> -(rfe) fit Fold5 size: 16
#> +(rfe) fit Fold5 size: 8
#> -(rfe) fit Fold5 size: 8
#> +(rfe) fit Fold5 size: 4
#> -(rfe) fit Fold5 size: 4
#> +(rfe) fit Fold5 size: 2
#> -(rfe) fit Fold5 size: 2