Stan is one of my faviorate programming language, there are tones of people using stan as daily research or analysis purpose, Facebook’s new pacakge PROPHET.

# Install R package *rstan*

Rstan is the R interface to stan, which using for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.

This tutorial will introduce how to install R stan into our computer and how to use R stan in real problem.

```
install.packages("rstan")
library(rstan)
```

# Genetic Linkage Model

Suppose that 197 animals (Y) are distributed into four categories as follows: \[Y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)\] with cell probabilities \[(\frac{1}{2} + \frac{\theta}{4}, \frac{1}{4}(1-\theta), \frac{1}{4}(1-\theta), \frac{\theta}{4})\]

The EM algorithm will use augment method to make the likelihood function easy to me maximized. So how to put the question into **stan**?

Model file or code: Stan has 7 main blocks:

- functions
**Define your own function, likelihood, sampling distribution**
- data
**Using for input data**
- transformed data
- parameters
**Parameters need to estimate**
- transformed parameters
- model
**As its name**
- generated quantities
**Some quantities you are interested**

```
linkage_code <- '
data {
int<lower=0> y[4];
}
parameters {
real<lower=0, upper=1> theta;
}
transformed parameters {
vector[4] ytheta;
ytheta[1] = 0.5+theta/4;
ytheta[2] = 0.25-theta/4;
ytheta[3] = 0.25-theta/4;
ytheta[4] = theta/4;
}
model {
y ~ multinomial(ytheta);
}
'
y <- c(125, 18, 20, 34)
dat <- list()
dat$y <- y
```

First we compile our model to a S4 stanmodel object.

`model_obj <- stan_model(model_code = linkage_code, model_name = "GeneticLinkage")`

## Sampling

`fit <- sampling(model_obj, data = dat, iter = 1000, chains = 4) `

you can also save your model_code into a file maybe call “genetic.stan”, then use argument `file = "gentic.stan"`

.

*iter* means length of chains, *chains* means how many chains you want to use.

You can use a list to store all the data, then put them into your stan model.

### Some inference

First you may need to check is whether the Markov chains are converged.

`print(fit)`

```
## Inference for Stan model: GeneticLinkage.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5%
## theta 0.63 0.00 0.05 0.52 0.60 0.63 0.66 0.72
## ytheta[1] 0.66 0.00 0.01 0.63 0.65 0.66 0.66 0.68
## ytheta[2] 0.09 0.00 0.01 0.07 0.09 0.09 0.10 0.12
## ytheta[3] 0.09 0.00 0.01 0.07 0.09 0.09 0.10 0.12
## ytheta[4] 0.16 0.00 0.01 0.13 0.15 0.16 0.16 0.18
## lp__ -207.64 0.02 0.67 -209.58 -207.83 -207.37 -207.20 -207.17
## n_eff Rhat
## theta 1401 1
## ytheta[1] 1401 1
## ytheta[2] 1401 1
## ytheta[3] 1401 1
## ytheta[4] 1401 1
## lp__ 789 1
##
## Samples were drawn using NUTS(diag_e) at Wed Apr 12 13:14:53 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
```

you can check the Rhat.

Also stan use ggplot, so that you can trace plot the theta and see whether multiple chains merge together.

`traceplot(fit, "theta")`

And if you want to get the sample from the model, you can use `extract`

function.

```
theta_p <- extract(fit, "theta")
mean(theta_p$theta)
```

`## [1] 0.6250131`

Extract function will drop 50% of data in the warming up process.

## Opimization

Stan also support optimization for research purpose, so that you can perform lasso or group lasso some penalized regression in stan. Just by adding `log_increment`

to add in likelihood.

`opt_stan <- optimizing(model_obj, data = dat)`

```
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = random
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-12
## tol_grad = 1e-08
## tol_param = 1e-08
## tol_rel_obj = 10000
## tol_rel_grad = 1e+07
## history_size = 5
## seed = 1510119566
## initial log joint probability = -229.801
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
```

`opt_stan$par`

```
## theta ytheta[1] ytheta[2] ytheta[3] ytheta[4]
## 0.62682123 0.65670531 0.09329469 0.09329469 0.15670531
```

# Stan in Lasso

Lasso in stan is also easy to code.

```
lasso_code <- "
data {
int n;
int p;
vector[n] y;
matrix[n, p] x;
real<lower = 0> lambda;
}
parameters {
vector[p] beta;
real<lower = 0> sigma;
}
model{
y ~ normal(x * beta, sigma);
for (k in 1:p)
target += -lambda * fabs(beta[k]);
}
"
p = 8
beta_true <- as.vector(c(3, 1.5, 0, 0, 2, 0, 0, 0))
sigma = matrix(0, p, p)
rho = 0.5
for(i in 1:p)
for(j in 1:p)
sigma[i, j] <- 0.5^(abs(i - j))
n = 20
x <- mvrnorm(n = n, mu = rep(0, p), Sigma = sigma)
y <- as.vector(x %*% beta_true + rnorm(n, 0, 2))
lambda = 3
stan_data <- list(n = n, p = p, y = y, lambda)
```

`model_obj <- stan_model(model_code = lasso_code, model_name = "Lasso")`

`opt_stan <- optimizing(model_obj, data = stan_data)`

```
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = random
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-12
## tol_grad = 1e-08
## tol_param = 1e-08
## tol_rel_obj = 10000
## tol_rel_grad = 1e+07
## history_size = 5
## seed = 800487386
## initial log joint probability = -4482.16
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
```

`opt_stan$par`

```
## beta[1] beta[2] beta[3] beta[4] beta[5]
## 3.030875e+00 1.102667e+00 4.376140e-02 7.237658e-09 8.797249e-01
## beta[6] beta[7] beta[8] sigma
## 5.332066e-09 5.425149e-09 -9.477255e-09 2.480440e+00
```