Multiple responses subgroup identification

Multiple responses subgroup identification using the 'GUIDE' 'Gi' option for tree building

MrSFit(
  dataframe,
  role,
  bestK = 1,
  bootNum = 0L,
  alpha = 0.05,
  maxDepth = 5,
  minTrt = 5,
  minData = max(c(minTrt * maxDepth, NROW(Y)/20)),
  batchNum = 1L,
  CVFolds = 10L,
  CVSE = 0,
  faster = FALSE,
  display = FALSE,
  treeName = paste0("tree_", format(Sys.time(), "%m%d"), ".yaml"),
  nodeName = paste0("node_", format(Sys.time(), "%m%d"), ".txt"),
  bootName = paste0("boot_", format(Sys.time(), "%m%d"), ".txt"),
  impName = paste0("imp_", format(Sys.time(), "%m%d"), ".txt"),
  writeTo = FALSE,
  remove = TRUE
)
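With the default settings, the `minData` default can be worked out by hand. The sketch below assumes that `Y` in the default expression holds the outcomes with one row per observation, so that `NROW(Y)` is the sample size (200 in the Examples). `nObs` is an illustrative name, not a package object.

```r
# Default tree settings from the signature above
minTrt   <- 5
maxDepth <- 5
# Assumption: NROW(Y) equals the number of observations (200 in the Examples)
nObs <- 200

# Same expression as the default minData, with NROW(Y) replaced by nObs
minData <- max(c(minTrt * maxDepth, nObs / 20))
minData  # 25, because minTrt * maxDepth = 25 exceeds nObs / 20 = 10
```

Under these defaults the sample-size term only takes over once there are more than 500 observations (500 / 20 = 25).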

Arguments

dataframe	The data frame used for subgroup identification, in `data.frame` format. It should contain the covariates, the treatment assignment and the outcomes. The order of the variables does not matter.
role	Variable roles, following the 'GUIDE' conventions. role should be a character `vector` with one entry per column of `dataframe`, giving the usage of each column. The following roles are currently available.
	Covariate roles
		c	Categorical variable used for splitting only.
		f	Numerical variable used only for fitting the regression models in the nodes of the tree. It is not used for splitting the nodes.
		h	Numerical variable always held (included) in the regression models fitted in the nodes of the tree.
		n	Numerical variable used both for splitting the nodes and for fitting the node regression models.
		s	Numerical variable used only for splitting the nodes. It is not used for fitting the regression models.
		x	Excluded variable. It is not used in tree building.
	Outcome role
		d	Dependent variable. If there is only one d variable, the function performs single-response subgroup identification.
	Treatment role
		r	Categorical treatment variable used only for fitting the linear models in the nodes of the tree. It is not used for splitting the nodes.
bestK	number of covariates in the regression model fitted in each node
bootNum	number of bootstrap replicates
alpha	desired significance level for the confidence intervals of the treatment parameters
maxDepth	maximum tree depth
minTrt	minimum number of observations in each treatment group (treatment and placebo) in each node
minData	minimum number of observations in each node
batchNum	related to the exhaustive search for numerical split variables
CVFolds	number of cross-validation folds
CVSE	standard-error multiplier used in cross-validation pruning
faster	related to the search for tree splits
display	whether to display the tree at the end
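treeName	name of the YAML file used to save the tree
nodeName	name of the file used to save information for each node
bootName	name of the file used to save the bootstrap-calibrated alpha
impName	name of the file used to save the variable importance scores
writeTo	debug option reserved for the author
remove	whether to remove the intermediate files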
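Because `role` must line up column by column with `dataframe`, a quick check before calling `MrSFit()` can catch mistakes early. The sketch below uses a hypothetical helper, `checkRole`, which is not part of MrSGUIDE. It only tests the role codes listed above, and it assumes (based on the single `trtName` returned in the value) that exactly one `r` column is expected.

```r
# Hypothetical helper, not part of MrSGUIDE: checks a role vector against the
# role codes documented above before it is passed to MrSFit().
checkRole <- function(dataframe, role) {
  allowed <- c("c", "f", "h", "n", "s", "x", "d", "r")
  if (length(role) != ncol(dataframe)) {
    stop("role must have one entry per column of dataframe")
  }
  bad <- setdiff(unique(role), allowed)
  if (length(bad) > 0) {
    stop("unknown role code(s): ", paste(bad, collapse = ", "))
  }
  # Assumption: exactly one treatment column is expected
  if (sum(role == "r") != 1) {
    stop("expected exactly one treatment ('r') column")
  }
  if (sum(role == "d") == 0) {
    stop("at least one dependent ('d') column is required")
  }
  # Report which columns are responses and which is the treatment
  list(responses = names(dataframe)[role == "d"],
       treatment = names(dataframe)[role == "r"],
       multiResponse = sum(role == "d") > 1)
}

# Usage on a small made-up data frame
toy <- data.frame(x1 = rnorm(10), sex = rep(c("M", "F"), 5),
                  z = rep(0:1, 5), y1 = rnorm(10), y2 = rnorm(10))
info <- checkRole(toy, c("n", "c", "r", "d", "d"))
info$multiResponse  # TRUE: two 'd' columns, so multiple-response mode
```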

Value

An object of class "guide" with the following components:

treeRes

Tree structure result.

node

Predicted node of each observation.

imp

Raw variable importance scores; use MrSImp for more accurate results.

cLevels

Categorical features level mapping.

tLevels

Treatment assignment level mapping.

Number of outcomes.

Number of treatment assignment levels.

role

Role used for data frame.

varName

Variable names.

numName

Numerical variable names.

catName

Categorical variable names.

trtName

Treatment assignment variable name.

nodeMap

A map from node id to node information.

TrtL

Treatment level mapping.

Settings

Settings used to build the tree.

trtNode

Treatment effect summary.

Details

This function uses the 'GUIDE' Gi option for tree building. It provides a subgroup identification tree and confidence intervals for the treatment effects based on bootstrap calibration.

The 'Gi' option tests for an interaction between a covariate $x_i$ and the treatment assignment $z$. Within each tree node $t$, if $x_i$ is a continuous variable, the function discretizes it into four groups at its sample quartiles to form $h_i$. If $x_i$ is a categorical variable, the function sets $h_i = x_i$. If $x_i$ contains missing values, the function adds missing as a new level of $h_i$. The full model is then tested against the main-effect model:

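$$H_0:\ \mu = \beta_0 + \sum_{k=2}^{H}\beta_{hk}I(h_i = k) + \sum_{l=2}^{G}\beta_{zl}I(z = l)$$
$$H_A:\ \mu = \beta_0 + \sum_{k=2}^{H}\beta_{hk}I(h_i = k) + \sum_{l=2}^{G}\beta_{zl}I(z = l) + \sum_{k=2}^{H}\sum_{l=2}^{G}\beta_{kl}I(h_i = k, z = l)$$

Here $\mu$ is the mean response in node $t$, $H$ is the number of levels of $h_i$ and $G$ is the number of treatment levels.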

The covariate $x_i$ with the most significant interaction is then chosen to split the node. Details of the algorithm are given in Loh and Zhou (2020).
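To make the selection step concrete, here is a minimal base-R sketch that assumes a single response and ordinary least squares. `giSelect` is a hypothetical helper, not a MrSGUIDE function. For each candidate covariate it builds $h_i$ (quartile groups for numerical covariates, the original levels for categorical ones, plus a missing level when needed), fits the main-effect and full models with `lm()`, and compares them with `anova()`. That F test has $(H-1)(G-1)$ numerator degrees of freedom when every $(h_i, z)$ cell is non-empty. The covariate with the smallest p-value is selected. The exact test statistic used by MrSGUIDE, and how it is combined across multiple responses, may differ.

```r
# Hypothetical sketch: pick the split covariate with a Gi-style interaction test.
giSelect <- function(y, z, covariates) {
  z <- factor(z)
  pvals <- sapply(covariates, function(x) {
    # Build h_i: quartile groups for numeric x, levels for categorical x
    h <- if (is.numeric(x)) {
      cut(x, unique(quantile(x, c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)),
          include.lowest = TRUE)
    } else {
      factor(x)
    }
    # Missing values become an extra level of h_i
    if (anyNA(h)) h <- addNA(h)
    main <- lm(y ~ h + z)  # main-effect model (H_0)
    full <- lm(y ~ h * z)  # adds the h:z interaction terms (H_A)
    anova(main, full)[2, "Pr(>F)"]
  })
  list(pvalues = pvals, best = names(which.min(pvals)))
}

# Usage: the treatment effect depends on gender only
set.seed(1234)
N <- 200
X <- data.frame(x1 = rnorm(N), x2 = rnorm(N),
                gender = sample(c("Male", "Female"), N, replace = TRUE))
z <- sample(0:1, N, replace = TRUE)
y <- X$x1 + 2 * z * (X$gender == "Female") + rnorm(N)
res <- giSelect(y, z, X)
res$best  # should pick "gender", the covariate that modifies the treatment effect
```

`x1` has a large main effect but does not interact with `z`, so it should not be selected; this is the property that distinguishes an interaction test from a plain association test.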

Details of the bootstrap confidence intervals for the treatment effects can be found in Loh et al. (2019).
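As background for the bootstrap intervals, the sketch below computes a plain percentile bootstrap interval for a treatment effect (the difference in mean response between two treatment levels) within one node. This is a simpler, swapped-in technique: it does not reproduce the calibration of alpha described in Loh et al. (2019), which accounts for the subgroups having been selected from the same data. `bootEffect` is a hypothetical helper, not a MrSGUIDE function.

```r
# Plain percentile bootstrap of a treatment effect within one node.
# NOT the bootstrap calibration used by MrSFit; illustration only.
bootEffect <- function(y, z, B = 1000, alpha = 0.05) {
  z <- factor(z)
  stopifnot(nlevels(z) == 2)
  # Difference in means: second treatment level minus first
  est <- function(idx) {
    yy <- y[idx]
    zz <- z[idx]
    mean(yy[zz == levels(z)[2]]) - mean(yy[zz == levels(z)[1]])
  }
  # Resample observations with replacement and recompute the effect
  boots <- replicate(B, est(sample.int(length(y), replace = TRUE)))
  ci <- quantile(boots, c(alpha / 2, 1 - alpha / 2), names = FALSE, na.rm = TRUE)
  c(estimate = est(seq_along(y)), lower = ci[1], upper = ci[2])
}

# Usage on simulated data with a true effect of 2
set.seed(1)
z <- rep(0:1, each = 50)
y <- 1 + 2 * z + rnorm(100)
ci <- bootEffect(y, z, B = 500)
ci
```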

References

Loh, W.-Y. and Zhou, P. (2020). The GUIDE approach to subgroup identification. In Design and Analysis of Subgroups with Biopharmaceutical Applications, N. Ting, J. C. Cappelleri, S. Ho, and D.-G. Chen (Eds.) Springer, in press.

Loh, W.-Y., Man, M. and Wang, S. (2019). Subgroups from regression trees with adjustment for prognostic effects and post-selection inference. Statistics in Medicine, vol. 38, 545-557. doi:10.1002/sim.7677 http://pages.stat.wisc.edu/~loh/treeprogs/guide/sm19.pdf

Examples

library(MrSGUIDE)
set.seed(1234)

N = 200
np = 3

numX <- matrix(rnorm(N * np), N, np) ## numerical features
gender <- sample(c('Male', 'Female'), N, replace = TRUE)
country <- sample(c('US', 'UK', 'China', 'Japan'), N, replace = TRUE)

z <- sample(c(0, 1), N, replace = TRUE) # Binary treatment assignment

y1 <- numX[, 1] + 1 * z * (gender == 'Female') + rnorm(N)
y2 <- numX[, 2] + 2 * z * (gender == 'Female') + rnorm(N)

train <- data.frame(numX, gender, country, z, y1, y2)
role <- c(rep('n', 3), 'c', 'c', 'r', 'd', 'd')

mrsobj <- MrSFit(dataframe = train, role = role)
printTree(mrsobj)
#> ID: 1, gender = { Female, NA }
#>     ID: 2, Size: 118 [Terminal]
#>     Outcome Models: 
#>         y1        Est        SE
#>         X1    1.02
#>         z.0    -0.179    0.132
#>         z.1    0.996    0.189
#>     - - - - - - - - - - - - - - 
#>         y2        Est        SE
#>         X2    1.194
#>         z.0    -0.062    0.126
#>         z.1    2.062    0.181
#>     - - - - - - - - - - - - - - 
#> ID: 1, gender = { Male }
#>     ID: 3, Size: 82 [Terminal]
#>     Outcome Models: 
#>         y1        Est        SE
#>         X1    0.961
#>         z.0    0.198    0.144
#>         z.1    -0.227    0.2
#>     - - - - - - - - - - - - - - 
#>         y2        Est        SE
#>         X2    0.974
#>         z.0    0.167    0.149
#>         z.1    -0.037    0.209
#>     - - - - - - - - - - - - - -