Title: | An Ensemble Method for Interval-Censored Survival Data |
---|---|
Description: | Implements the conditional inference forest approach to modeling interval-censored survival data. It also provides functions to tune the parameters and evaluate the model fit. See Yao et al. (2019) <arXiv:1901.04599>. |
Authors: | Weichi Yao [aut, cre], Halina Frydman [aut], Jeffrey S. Simonoff [aut] |
Maintainer: | Weichi Yao <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.5.1 |
Built: | 2024-11-21 03:34:51 UTC |
Source: | https://github.com/cran/ICcforest |
Construct a conditional inference forest model for interval-censored survival data.
The main function of this package is ICcforest.
In many situations, the survival time cannot be directly observed and it is only
known to have occurred in an interval obtained from a sequence of examination times.
Methods like the Cox proportional hazards model rely on restrictive assumptions such as
proportional hazards and a log-linear relationship between the hazard function and
covariates. Furthermore, because these methods are often parametric, nonlinear effects
of variables must be modeled by transformations or expanding the design matrix to
include specialized basis functions for more complex data structures in real world
applications. The function ICtree in the LTRCtrees package provides a conditional
inference tree method for interval-censored survival data, as an extension of the
conditional inference tree method ctree for right-censored data. Tree estimators are
nonparametric and as such often exhibit
low bias and high variance. Ensemble methods like bagging and random forest can
reduce variance while preserving low bias.
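To make the interval-censored setting described above concrete, the following minimal sketch encodes a few made-up examination intervals as a Surv response of type "interval2", the response form used in the examples of this manual. The exams data frame and its values are hypothetical; replacing Inf by a large finite number follows the advice given in the examples below.

```r
library(survival)

# Hypothetical data: each event is only known to lie between two examination times.
exams <- data.frame(
  l = c(0, 12, 30, 45),   # last examination before the event
  u = c(10, 24, Inf, 60)  # first examination after the event (Inf: event never observed)
)

# The examples in this manual replace Inf by a large finite number before fitting.
exams$u[is.infinite(exams$u)] <- 9999999

# Interval-censored response in the "interval2" format.
resp <- Surv(exams$l, exams$u, type = "interval2")
print(resp)
```

The resulting object can be used on the left-hand side of the model formula, as in Surv(l, u, type = "interval2") ~ grp.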
This package implements ICcforest, which extends the conditional inference forest
(see cforest) to interval-censored data. ICcforest uses conditional inference
survival trees (see ICtree) as base learners.
The main function ICcforest fits a conditional inference forest for interval-censored
survival data, with parameter mtry tuned by tuneICRF; gettree.ICcforest extracts the
i-th individual tree from a fitted ICcforest object; and predict.ICcforest computes
predictions from ICcforest objects.
ICcforest, gettree.ICcforest, predict.ICcforest,
tuneICRF, sbrier_IC
Extract the i-th individual tree from a fitted ICcforest object. The resulting object can be printed or plotted, and predictions can be made using it.
## S3 method for class 'ICcforest'
gettree(object, tree = 1L, ...)
object |
an object as returned by |
tree |
an integer, the number of the tree to extract from the forest. |
... |
additional arguments. |
An object of class party.
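Because the extracted tree is a party object, it can be inspected with the usual partykit tools. The sketch below counts the terminal nodes of the first three trees of a small forest. The use of partykit::nodeids(), the seed, and the small ntree value are choices made for this illustration and are not taken from the package documentation.

```r
library(survival)
library(icenReg)
library(partykit)
library(ICcforest)

data(miceData)
# Replace Inf by a large finite number, as in the other examples of this manual.
miceData$u[is.infinite(miceData$u)] <- 9999999

set.seed(1)
Cforest <- ICcforest(Surv(l, u, type = "interval2") ~ grp,
                     data = miceData, mtry = 1, ntree = 10L)

# Extract the first three trees and count the terminal nodes of each.
n_leaves <- sapply(1:3, function(i) {
  tree_i <- gettree(Cforest, tree = i)
  length(nodeids(tree_i, terminal = TRUE))
})
print(n_leaves)
```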
#### Example with dataset miceData
library(icenReg)
library(ICcforest)
data(miceData)

## For ICcforest to run, Inf should be set to be a large number, for example, 9999999.
idx_inf <- (miceData$u == Inf)
miceData$u[idx_inf] <- 9999999.

## First, fit an interval-censored conditional inference forest
Cforest <- ICcforest(formula = Surv(l, u, type = "interval2") ~ grp, data = miceData, ntree = 50L)

## Extract the 50-th tree from the forest
plot(gettree(Cforest, tree = 50L))
An implementation of the random forest and bagging ensemble algorithms utilizing conditional inference trees as base learners for interval-censored survival data.
ICcforest(
  formula,
  data,
  mtry = NULL,
  ntree = 100L,
  applyfun = NULL,
  cores = NULL,
  na.action = na.pass,
  suppress = TRUE,
  trace = TRUE,
  perturb = list(replace = FALSE, fraction = 0.632),
  control = partykit::ctree_control(teststat = "quad", testtype = "Univ",
    mincriterion = 0, saveinfo = FALSE,
    minsplit = nrow(data) * 0.15, minbucket = nrow(data) * 0.06),
  ...
)
formula |
a formula object, with the response being a
|
data |
a data frame containing the variables named in |
mtry |
number of input variables randomly sampled as candidates at each node for
random forest like algorithms. The default |
ntree |
an integer, the number of the trees to grow for the forest. |
applyfun |
an optional |
cores |
numeric. If set to an integer the |
na.action |
a function which indicates what should happen when the data contain missing values. |
suppress |
a logical specifying whether the messages from |
trace |
whether to print the progress of the search of the optimal value of |
perturb |
a list with arguments |
control |
a list of control parameters, see |
... |
additional arguments. |
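The arguments mtry, perturb and control together determine which ensemble is grown. The following sketch contrasts a bagging-style forest (every predictor is a split candidate at each node, trees grown on bootstrap samples) with a random-forest-style forest (one randomly chosen candidate per node, subsampling without replacement). The artificial predictor new is borrowed from the tuneICRF example of this manual; the seed, the ntree value and the minsplit and minbucket values are arbitrary choices for illustration.

```r
library(survival)
library(icenReg)
library(ICcforest)

data(miceData)
miceData$u[is.infinite(miceData$u)] <- 9999999
# Artificial second predictor, as in the tuneICRF example of this manual.
miceData$new <- rep(1:4, length.out = nrow(miceData))

set.seed(1)

# Bagging-style: both predictors are candidates at every node,
# and each tree is grown on a sample drawn with replacement.
bagged <- ICcforest(
  Surv(l, u, type = "interval2") ~ grp + new,
  data = miceData, mtry = 2, ntree = 20L,
  perturb = list(replace = TRUE, fraction = 0.632)
)

# Random-forest-style: one random candidate per node, subsampling
# without replacement, and illustrative (non-default) stopping rules.
rf <- ICcforest(
  Surv(l, u, type = "interval2") ~ grp + new,
  data = miceData, mtry = 1, ntree = 20L,
  perturb = list(replace = FALSE, fraction = 0.632),
  control = partykit::ctree_control(teststat = "quad", testtype = "Univ",
    mincriterion = 0, saveinfo = FALSE, minsplit = 30, minbucket = 10)
)
```

Supplying mtry explicitly, as done here, fixes the number of candidate variables rather than leaving it at its default.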
ICcforest returns an ICcforest object.
The object belongs to the class ICcforest, as a subclass of cforest.
This function extends the conditional inference survival forest algorithm in
cforest to fit interval-censored survival data.
An object of class ICcforest, as a subclass of cforest.
predict.ICcforest for prediction, gettree.ICcforest for individual tree extraction,
and tuneICRF for mtry tuning.
#### Example with miceData
library(icenReg)
library(ICcforest)
data(miceData)

## For ICcforest to run, Inf should be set to be a large number, for example, 9999999.
miceData$u[miceData$u == Inf] <- 9999999.

## Fit an interval-censored conditional inference forest
Cforest <- ICcforest(Surv(l, u, type = "interval2") ~ grp, data = miceData)
Compute predictions from ICcforest objects.
## S3 method for class 'ICcforest'
predict(
  object,
  newdata = NULL,
  OOB = FALSE,
  suppress = TRUE,
  type = c("response", "prob", "weights", "node"),
  FUN = NULL,
  simplify = TRUE,
  scale = TRUE,
  ...
)
object |
an object as returned by |
newdata |
an optional data frame containing test data. |
OOB |
a logical specifying whether out-of-bag predictions are desired (only if |
suppress |
a logical specifying whether the messages from |
type |
a character string denoting the type of predicted value returned. For |
FUN |
a function to compute summary statistics. Predictions for each node must be
computed based on arguments |
simplify |
a logical indicating whether the resulting list of predictions should be
converted to a suitable vector or matrix (if possible), see |
scale |
a logical indicating scaling of the nearest neighbor weights by the sum of weights
in the corresponding terminal node of each tree, see |
... |
additional arguments. |
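As an illustration of the newdata argument, the sketch below predicts for a small data frame that stands in for new observations. The object newmice is hypothetical (one row per treatment group, taken from miceData itself), and the seed and ntree value are arbitrary; type = "response" is used here in the sense of the example further below, where it yields the predicted median survival time.

```r
library(survival)
library(icenReg)
library(ICcforest)

data(miceData)
miceData$u[is.infinite(miceData$u)] <- 9999999

set.seed(1)
Cforest <- ICcforest(Surv(l, u, type = "interval2") ~ grp,
                     data = miceData, mtry = 1, ntree = 20L)

# One representative row per group, used here as stand-in "new" data.
newmice <- miceData[!duplicated(miceData$grp), ]

# One prediction per row of newmice.
med_new <- predict(Cforest, newdata = newmice, type = "response")
print(med_new)
```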
An object of class ICcforest, as a subclass of cforest.
sbrier_IC for evaluation of model fit for interval-censored data.
library(icenReg)
library(ICcforest)
data(miceData)

## For ICcforest to run, Inf should be set to be a large number, for example, 9999999.
miceData$u[miceData$u == Inf] <- 9999999.

## First, fit an interval-censored conditional inference forest
Cforest <- ICcforest(formula = Surv(l, u, type = "interval2") ~ grp, data = miceData)

## Predict the survival function constructed using the non-parametric maximum likelihood estimator
Pred <- predict(Cforest, type = "prob")

## Out-of-bag prediction of the median survival time
PredOOB <- predict(Cforest, type = "response", OOB = TRUE)
Compute the (integrated) Brier score to evaluate the model fit for interval-censored survival data.
sbrier_IC(
  obj,
  pred,
  btime = range(as.numeric(obj[, 1:2])),
  type = c("IBS", "BS")
)
obj |
an object of class |
pred |
predicted values. This can be a matrix of survival probabilities evaluated
at a sequence of time points for a set of new data, a list of |
btime |
a vector of length two indicating the range of times that the scores are computed on.
The default |
type |
a character string denoting the type of scores returned. For |
If type = "IBS", this returns the integrated Brier score.
If type = "BS", this returns the Brier scores.
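The score can also be used to assess a fitted forest. The following sketch computes an out-of-bag integrated Brier score for an ICcforest fit. It rests on an assumption not stated explicitly in this manual: that the predictions returned by predict() with type = "prob" are an accepted form of the pred argument. The time range c(0, 642) is borrowed from the example below; the seed and ntree value are arbitrary.

```r
library(survival)
library(icenReg)
library(ICcforest)

data(miceData)
miceData$u[is.infinite(miceData$u)] <- 9999999

set.seed(1)
Cforest <- ICcforest(Surv(l, u, type = "interval2") ~ grp,
                     data = miceData, mtry = 1, ntree = 20L)

# Observed interval-censored response.
obj <- Surv(miceData$l, miceData$u, type = "interval2")

# Out-of-bag survival curve predictions (assumed to be a valid `pred`).
pred_oob <- predict(Cforest, type = "prob", OOB = TRUE)

# Integrated Brier score on the time range [0, 642].
ibs_oob <- sbrier_IC(obj, pred_oob, btime = c(0, 642), type = "IBS")
print(ibs_oob)
```

Lower values indicate a better fit, so scores of this kind can be compared across candidate models on the same data and time range.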
S. Tsouprou. Measures of discrimination and predictive accuracy for interval-censored data. Master's thesis, Leiden University. https://www.math.leidenuniv.nl/scripties/MasterTsouprou.pdf.
### Example with dataset miceData
library(survival)
library(icenReg)
library(ICcforest)
data(miceData)

## For proper evaluation, Inf should be set to be a large number, for example, 9999999.
idx_inf <- (miceData$u == Inf)
miceData$u[idx_inf] <- 9999999.

obj <- Surv(miceData$l, miceData$u, type = "interval2")

## Model fit for an NPMLE survival curve with survfit
pred <- survival::survfit(formula = Surv(l, u, type = "interval2") ~ 1, data = miceData)
# Integrated Brier score up to time = 642
sbrier_IC(obj, pred, btime = c(0, 642), type = "IBS")

## Model fit for a semi-parametric model with icenReg::ic_sp()
pred <- icenReg::ic_sp(formula = Surv(l, u, type = "interval2") ~ 1, data = miceData)
# Integrated Brier score up to the largest endpoint of all censoring intervals in the dataset
sbrier_IC(obj, pred, type = "IBS")

## Model fit for an NPMLE survival curve with icenReg::ic_np()
pred <- icenReg::ic_np(miceData[, c("l", "u")])
# Brier score computed at every left and right endpoint of all censoring intervals in the dataset
sbrier_IC(obj, pred, type = "BS")
Starting with the default value of mtry, search for the value of mtry for ICcforest that is optimal with respect to the out-of-bag error estimate.
tuneICRF(
  formula,
  data,
  mtryStart = NULL,
  stepFactor = 1.5,
  ntreeTry = 100L,
  control = partykit::ctree_control(teststat = "quad", testtype = "Univ",
    mincriterion = 0, saveinfo = FALSE,
    minsplit = nrow(data) * 0.15, minbucket = nrow(data) * 0.06),
  suppress = TRUE,
  trace = TRUE,
  plot = FALSE,
  doBest = FALSE
)
formula |
a formula object, with the response being a
|
data |
a data frame containing the variables named in |
mtryStart |
starting value of |
stepFactor |
at each iteration, |
ntreeTry |
number of trees used at the tuning step. |
control |
a list with control parameters, see |
suppress |
a logical specifying whether the messages from |
trace |
whether to print the progress of the search. |
plot |
whether to plot the out-of-bag error as a function of |
doBest |
whether to run an ICcforest using the optimal mtry found. |
If doBest=FALSE (default), this returns the optimal mtry value of those searched.
If doBest=TRUE, this returns the ICcforest object produced with the optimal mtry.
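The two return modes can be sketched as follows: with doBest = TRUE the tuned forest itself is returned, so the search and the final fit happen in one call. The artificial predictor new mirrors the example below; the seed and the values chosen for mtryStart and ntreeTry are arbitrary and serve only to keep the illustration small.

```r
library(survival)
library(icenReg)
library(ICcforest)

data(miceData)
miceData$u[is.infinite(miceData$u)] <- 9999999
# Artificial second predictor so that there is something to tune over.
miceData$new <- rep(1:4, length.out = nrow(miceData))

set.seed(1)

# Search over mtry and return the forest refit with the best value found.
best <- tuneICRF(
  Surv(l, u, type = "interval2") ~ grp + new,
  data = miceData,
  mtryStart = 1, ntreeTry = 20L,
  trace = FALSE, doBest = TRUE
)
class(best)
```

The returned object can then be passed directly to predict() or gettree().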
sbrier_IC for evaluation of model fit for interval-censored data
when searching for the optimal value of mtry.
### Example with dataset miceData
library(icenReg)
library(ICcforest)
data(miceData)

## For ICcforest to run, Inf should be set to be a large number, for example, 9999999.
miceData$u[miceData$u == Inf] <- 9999999.

## Create a new variable to be selected from
miceData$new <- rep(1:4, length.out = nrow(miceData))

## Tune mtry
mtryTune <- tuneICRF(Surv(l, u, type = "interval2") ~ grp + new, data = miceData)