StanでZero-inflated Poissonモデル [統計]
Stanで、Zero-inflated Poisson (ZIP)モデルのパラメータ推定をしてみる。
※ まちがっていたので修正(2013-12-26 13:20)。
こういうデータを用意する。n区画で、ある生物の数をしらべるが、その生物が存在する確率がtheta、存在するときの数が平均lambdaのポアソン分布にしたがうとする。
set.seed(1234) theta <- 0.7 lambda <- 3.5 n <- 100 # number of data y <- rbinom(n, 1, theta) y <- y * rpois(n, lambda) ## histogram h <- hist(y, breaks = min(y):(max(y) + 1), right = FALSE, plot = FALSE) barplot(h$counts, names = h$breaks[-(length(h$breaks))], las = 1, xlab = "y", ylab = "count")
Stanのコード
// Zero-Inflated Poisson model data { int<lower=1> N; int<lower=0> y[N]; } parameters { real<lower=0, upper=1> theta; real<lower=0> lambda; } model { // priors theta ~ uniform(0, 1); lambda ~ uniform(0, 100); for (i in 1:N) { if (y[i] == 0) { // not present // Bernoulli(0|θ) + Bernoulli(1|θ) * Poisson(0|λ) increment_log_prob(log((1 - theta) + theta * exp(-lambda))); } else { // present // Bernoulli(1|θ) * Poisson(y|λ) increment_log_prob(bernoulli_log(1, theta) + poisson_log(y[i], lambda)); } } }
RStanで実行してみる。
> library(rstan) > model <- stan_model("zip.stan") TRANSLATING MODEL 'zip' FROM Stan CODE TO C++ CODE NOW. COMPILING THE C++ CODE FOR MODEL 'zip' NOW. > fit <- sampling(model, data = list(N = n, y = y), + chains = 4, + iter = 2000, warmup = 500, thin = 1) SAMPLING FOR MODEL 'zip' NOW (CHAIN 1). Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.061198 seconds (Warm-up) 0.14187 seconds (Sampling) 0.203068 seconds (Total) SAMPLING FOR MODEL 'zip' NOW (CHAIN 2). Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.062225 seconds (Warm-up) 0.143415 seconds (Sampling) 0.20564 seconds (Total) SAMPLING FOR MODEL 'zip' NOW (CHAIN 3). Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.061601 seconds (Warm-up) 0.144984 seconds (Sampling) 0.206585 seconds (Total) SAMPLING FOR MODEL 'zip' NOW (CHAIN 4). Iteration: 2000 / 2000 [100%] (Sampling) Elapsed Time: 0.058801 seconds (Warm-up) 0.144183 seconds (Sampling) 0.202984 seconds (Total)
結果
> print(fit, digits = 2) Inference for Stan model: zip. 4 chains, each with iter=2000; warmup=500; thin=1; post-warmup draws per chain=1500, total post-warmup draws=6000. mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat theta 0.79 0.00 0.04 0.71 0.77 0.80 0.82 0.87 5058 1 lambda 3.70 0.00 0.22 3.28 3.55 3.69 3.84 4.15 4675 1 lp__ 45.16 0.02 1.00 42.34 44.81 45.50 45.87 46.09 2471 1 Samples were drawn using NUTS(diag_e) at Thu Dec 26 13:17:17 2013. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1). > traceplot(fit)
以前のBUGSコードよりも素直にかける。
コメント 0