Mar 18, 2017 - How To Add Rmarkdown

Comments

1 minutes setup

In your Rstudio or your Rmd file, the output option use html_fragment, after knit, add

---
layout: post
---

to the head of your generated html file. Put the html file in _posts folder

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Mar 16, 2017 - Rstan Example

Comments

Stan is one of my faviorate programming language, there are tones of people using stan as daily research or analysis purpose, Facebook’s new pacakge PROPHET.

Install R package rstan

Rstan is the R interface to stan, which using for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.

This tutorial will introduce how to install R stan into our computer and how to use R stan in real problem.

install.packages("rstan")
library(rstan)

Genetic Linkage Model

Suppose that 197 animals (Y) are distributed into four categories as follows: \[Y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)\] with cell probabilities \[(\frac{1}{2} + \frac{\theta}{4}, \frac{1}{4}(1-\theta), \frac{1}{4}(1-\theta), \frac{\theta}{4})\]

The EM algorithm will use augment method to make the likelihood function easy to me maximized. So how to put the question into stan?

Model file or code: Stan has 7 main blocks:

  • functions Define your own function, likelihood, sampling distribution
  • data Using for input data
  • transformed data
  • parameters Parameters need to estimate
  • transformed parameters
  • model As its name
  • generated quantities Some quantities you are interested
linkage_code <- '
data {
  int<lower=0> y[4];
}

parameters {
  real<lower=0, upper=1> theta;
}

transformed parameters {
  vector[4] ytheta;
  ytheta[1] = 0.5+theta/4;
  ytheta[2] = 0.25-theta/4;
  ytheta[3] = 0.25-theta/4;
  ytheta[4] = theta/4;
}

model {
  y ~ multinomial(ytheta);
}
'

y <- c(125, 18, 20, 34)
dat <- list()
dat$y <- y

First we compile our model to a S4 stanmodel object.

model_obj <- stan_model(model_code = linkage_code, model_name = "GeneticLinkage")

Sampling

fit <- sampling(model_obj, data = dat, iter = 1000, chains = 4) 

you can also save your model_code into a file maybe call “genetic.stan”, then use argument file = "gentic.stan".

iter means length of chains, chains means how many chains you want to use.

You can use a list to store all the data, then put them into your stan model.

Some inference

First you may need to check is whether the Markov chains are converged.

print(fit)
## Inference for Stan model: GeneticLinkage.
## 4 chains, each with iter=1000; warmup=500; thin=1; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##              mean se_mean   sd    2.5%     25%     50%     75%   97.5%
## theta        0.63    0.00 0.05    0.52    0.60    0.63    0.66    0.72
## ytheta[1]    0.66    0.00 0.01    0.63    0.65    0.66    0.66    0.68
## ytheta[2]    0.09    0.00 0.01    0.07    0.09    0.09    0.10    0.12
## ytheta[3]    0.09    0.00 0.01    0.07    0.09    0.09    0.10    0.12
## ytheta[4]    0.16    0.00 0.01    0.13    0.15    0.16    0.16    0.18
## lp__      -207.64    0.02 0.67 -209.58 -207.83 -207.37 -207.20 -207.17
##           n_eff Rhat
## theta      1401    1
## ytheta[1]  1401    1
## ytheta[2]  1401    1
## ytheta[3]  1401    1
## ytheta[4]  1401    1
## lp__        789    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Apr 12 13:14:53 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

you can check the Rhat.

Also stan use ggplot, so that you can trace plot the theta and see whether multiple chains merge together.

traceplot(fit, "theta")

And if you want to get the sample from the model, you can use extract function.

theta_p <- extract(fit, "theta")
mean(theta_p$theta)
## [1] 0.6250131

Extract function will drop 50% of data in the warming up process.

Opimization

Stan also support optimization for research purpose, so that you can perform lasso or group lasso some penalized regression in stan. Just by adding log_increment to add in likelihood.

opt_stan <- optimizing(model_obj, data = dat)
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = random
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-12
## tol_grad = 1e-08
## tol_param = 1e-08
## tol_rel_obj = 10000
## tol_rel_grad = 1e+07
## history_size = 5
## seed = 1510119566
## initial log joint probability = -229.801
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
opt_stan$par
##      theta  ytheta[1]  ytheta[2]  ytheta[3]  ytheta[4] 
## 0.62682123 0.65670531 0.09329469 0.09329469 0.15670531

Stan in Lasso

Lasso in stan is also easy to code.

lasso_code <- "
data {
    int n;
    int p;
    vector[n] y;
    matrix[n, p] x;
    real<lower = 0> lambda;
}
parameters {
    vector[p] beta;
    real<lower = 0> sigma;
}
model{
    y ~ normal(x * beta, sigma);
    for (k in 1:p)
        target += -lambda * fabs(beta[k]);
}
"

p = 8
beta_true <- as.vector(c(3, 1.5, 0, 0, 2, 0, 0, 0))
sigma = matrix(0, p, p)
rho = 0.5
for(i in 1:p)
    for(j in 1:p)
        sigma[i, j] <- 0.5^(abs(i - j))
n = 20
x <- mvrnorm(n = n, mu = rep(0, p), Sigma = sigma)
y <- as.vector(x %*% beta_true + rnorm(n, 0, 2))

lambda = 3
stan_data <- list(n = n, p = p, y = y, lambda)
model_obj <- stan_model(model_code = lasso_code, model_name = "Lasso")
opt_stan <- optimizing(model_obj, data = stan_data)
## STAN OPTIMIZATION COMMAND (LBFGS)
## init = random
## save_iterations = 1
## init_alpha = 0.001
## tol_obj = 1e-12
## tol_grad = 1e-08
## tol_param = 1e-08
## tol_rel_obj = 10000
## tol_rel_grad = 1e+07
## history_size = 5
## seed = 800487386
## initial log joint probability = -4482.16
## Optimization terminated normally: 
##   Convergence detected: relative gradient magnitude is below tolerance
opt_stan$par
##       beta[1]       beta[2]       beta[3]       beta[4]       beta[5] 
##  3.030875e+00  1.102667e+00  4.376140e-02  7.237658e-09  8.797249e-01 
##       beta[6]       beta[7]       beta[8]         sigma 
##  5.332066e-09  5.425149e-09 -9.477255e-09  2.480440e+00

Further reading

Stan
Rstan

Mar 14, 2017 - Welcome to Jekyll!

You’ll find this post in your _posts directory. Go ahead and edit it and re-build the site to see your changes. You can rebuild the site in many different ways, but the most common way is to run jekyll serve, which launches a web server and auto-regenerates your site when a file is updated.

To add new posts, simply add a file in the _posts directory that follows the convention YYYY-MM-DD-name-of-post.ext and includes the necessary front matter. Take a look at the source for this post to get an idea about how it works.

Jekyll also offers powerful support for code snippets:

def print_hi(name)
  puts "Hi, #{name}"
end
print_hi('Tom')
#=> prints 'Hi, Tom' to STDOUT.

Check out the Jekyll docs for more info on how to get the most out of Jekyll. File all bugs/feature requests at Jekyll’s GitHub repo. If you have questions, you can ask them on Jekyll Talk.