Multiple responses subgroup identification using 'GUIDE' 'Gi' option for tree building
MrSFit(
  dataframe,
  role,
  bestK = 1,
  bootNum = 0L,
  alpha = 0.05,
  maxDepth = 5,
  minTrt = 5,
  minData = max(c(minTrt * maxDepth, NROW(Y)/20)),
  batchNum = 1L,
  CVFolds = 10L,
  CVSE = 0,
  faster = FALSE,
  display = FALSE,
  treeName = paste0("tree_", format(Sys.time(), "%m%d"), ".yaml"),
  nodeName = paste0("node_", format(Sys.time(), "%m%d"), ".txt"),
  bootName = paste0("boot_", format(Sys.time(), "%m%d"), ".txt"),
  impName = paste0("imp_", format(Sys.time(), "%m%d"), ".txt"),
  writeTo = FALSE,
  remove = TRUE
)
dataframe | The data frame used for subgroup identification. |
---|---|
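role | The role of each column, following the 'GUIDE' role conventions. The example on this page uses 'n' for numerical covariates, 'c' for categorical covariates, 'r' for the treatment assignment, and 'd' for outcomes. |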
bestK | number of covariates in the regression model |
bootNum | number of bootstrap replicates |
alpha | desired alpha level for the confidence intervals of the treatment parameters |
maxDepth | maximum tree depth |
minTrt | minimum treatment and placebo sample size in each node |
minData | minimum sample size in each node |
batchNum | related to the exhaustive search for numerical split variables |
CVFolds | number of cross-validation folds |
CVSE | cross-validation SE |
faster | related to the tree split search |
display | whether to display the tree at the end |
treeName | YAML file in which to save the tree |
nodeName | file in which to save the node information |
bootName | file in which to save the bootstrap-calibrated alpha |
impName | file name for the variable importance results |
writeTo | debug option reserved for the author |
remove | whether to remove extra files |
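
As a worked example of the default `minData` expression in the usage above, the sketch below evaluates it for a data set of 200 rows with the default `minTrt` and `maxDepth`. It assumes that `NROW(Y)` is the number of observations in `dataframe`, which the usage does not state explicitly; `n` is a stand-in name introduced here for illustration only.

```r
# Defaults taken from the usage signature.
minTrt <- 5
maxDepth <- 5

# Assumption: NROW(Y) is the number of observations; n is a hypothetical stand-in.
n <- 200

# Same expression as the documented default for minData.
minData <- max(c(minTrt * maxDepth, n / 20))
print(minData)
```

With these values the first term (25) exceeds the second (10), so the default is 25. Under the default `minTrt` and `maxDepth`, the `n / 20` term only takes over once there are more than 500 observations.
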
An object of class "guide"
Tree structure result.
Predicted node of each observation.
A raw importance score; use MrSImp for a more accurate result.
Categorical features level mapping.
Treatment assignment level mapping.
Number of outcomes.
Number of treatment assignment levels.
Role used for data frame.
Variable names.
Numerical variable names.
Categorical variable names.
Treatment assignment variable name.
A map from node id to node information.
Treatment level mapping.
Current tree setting.
Treatment effect summary.
This function uses the 'GUIDE' 'Gi' option for tree building. It provides a subgroup identification tree and confidence intervals for the treatment effects based on bootstrap calibration.
The 'Gi' option tests the interaction between each covariate \(x_i\) and the treatment assignment \(z\). Within each tree node \(t\), if \(x_i\) is a continuous variable, the function discretizes it into four groups, \(h_i\), at the sample quartiles. If \(x_i\) is a categorical variable, the function sets \(h_i = x_i\). If \(x_i\) contains missing values, the function adds "missing" as a new level of \(h_i\). The full model is then tested against the main-effect model.
$$H_0 = \beta_0 + \sum\limits_{i=2}^{H}\beta_{hi}I(h_i = i) + \sum\limits_{j=2}^{G}\beta_{zj}I(Z_j = j)$$
$$H_A = \beta_0 + \sum\limits_{i=2, j=2}\beta_{ij}I(h_i = i, Z_j = j)$$
The function then chooses the most significant \(x_i\). The detailed algorithm can be found in Loh, W.-Y. and Zhou, P. (2020).
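
The interaction test described above can be sketched in base R for a single outcome. This is an illustration under stated assumptions, not the package's internal implementation: `gi_pvalue` is a hypothetical helper name, the test is carried out as an F-test comparing two `lm` fits, and only one response is handled, whereas `MrSFit` works with multiple responses.

```r
# Hypothetical helper: p-value of the h-by-z interaction for one covariate x,
# treatment z, and a single outcome y (illustrative sketch only).
gi_pvalue <- function(x, z, y) {
  if (is.numeric(x)) {
    # Discretize a continuous covariate at its sample quartiles.
    breaks <- unique(quantile(x, probs = seq(0, 1, by = 0.25), na.rm = TRUE))
    h <- as.character(cut(x, breaks = breaks, include.lowest = TRUE))
  } else {
    # A categorical covariate is used as is.
    h <- as.character(x)
  }
  # Missing values become an additional level.
  h[is.na(h)] <- "Missing"
  h <- factor(h)
  z <- factor(z)

  main_fit <- lm(y ~ h + z)  # main-effect model
  full_fit <- lm(y ~ h * z)  # model including the h-by-z interaction
  anova(main_fit, full_fit)[2, "Pr(>F)"]
}

# Simulated data in the spirit of the example on this page.
set.seed(1)
n <- 400
x1 <- rnorm(n)
gender <- sample(c("Male", "Female"), n, replace = TRUE)
z <- sample(c(0, 1), n, replace = TRUE)
y <- x1 + 2 * z * (gender == "Female") + rnorm(n)
x1[sample(n, 20)] <- NA  # inject some missing values after generating y

p_values <- c(x1 = gi_pvalue(x1, z, y), gender = gi_pvalue(gender, z, y))
print(p_values)

# The covariate with the smallest p-value would be the candidate split variable.
selected <- names(which.min(p_values))
print(selected)
```

Because the simulated treatment effect depends only on `gender`, its interaction p-value should be far smaller than that of `x1`, which mirrors the "choose the most significant covariate" step.
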
The bootstrap confidence intervals for the treatment effects are described in Loh et al. (2019).
Loh, W.-Y. and Zhou, P. (2020). The GUIDE approach to subgroup identification. In Design and Analysis of Subgroups with Biopharmaceutical Applications, N. Ting, J. C. Cappelleri, S. Ho, and D.-G. Chen (Eds.) Springer, in press.
Loh, W.-Y., Man, M. and Wang, S. (2019). Subgroups from regression trees with adjustment for prognostic effects and post-selection inference. Statistics in Medicine, vol. 38, 545-557. doi:10.1002/sim.7677 http://pages.stat.wisc.edu/~loh/treeprogs/guide/sm19.pdf
library(MrSGUIDE)
set.seed(1234)

N <- 200
np <- 3

numX <- matrix(rnorm(N * np), N, np) ## numerical features
gender <- sample(c('Male', 'Female'), N, replace = TRUE)
country <- sample(c('US', 'UK', 'China', 'Japan'), N, replace = TRUE)

z <- sample(c(0, 1), N, replace = TRUE) # Binary treatment assignment

y1 <- numX[, 1] + 1 * z * (gender == 'Female') + rnorm(N)
y2 <- numX[, 2] + 2 * z * (gender == 'Female') + rnorm(N)

train <- data.frame(numX, gender, country, z, y1, y2)
role <- c(rep('n', 3), 'c', 'c', 'r', 'd', 'd')

mrsobj <- MrSFit(dataframe = train, role = role)
printTree(mrsobj)
#> ID: 1, gender = { Female, NA }
#> ID: 2, Size: 118 [Terminal]
#> Outcome Models:
#> y1 Est SE
#> X1 1.02
#> z.0 -0.179 0.132
#> z.1 0.996 0.189
#> - - - - - - - - - - - - - -
#> y2 Est SE
#> X2 1.194
#> z.0 -0.062 0.126
#> z.1 2.062 0.181
#> - - - - - - - - - - - - - -
#> ID: 1, gender = { Male }
#> ID: 3, Size: 82 [Terminal]
#> Outcome Models:
#> y1 Est SE
#> X1 0.961
#> z.0 0.198 0.144
#> z.1 -0.227 0.2
#> - - - - - - - - - - - - - -
#> y2 Est SE
#> X2 0.974
#> z.0 0.167 0.149
#> z.1 -0.037 0.209
#> - - - - - - - - - - - - - -