StanでZero-inflated Poissonモデル：Taglibro de H：SSブログ

	ブログをはじめるログイン

PyStanをインストールしてみた｜StanでZero-infrated n.. ブログトップ

StanでZero-inflated Poissonモデル　[統計] [編集]

Stanで、Zero-inflated Poisson (ZIP)モデルのパラメータ推定をしてみる。

※ まちがっていたので修正(2013-12-26 13:20)。

こういうデータを用意する。n区画で、ある生物の数をしらべるが、その生物が存在する確率がtheta、存在するときの数が平均lambdaのポアソン分布にしたがうとする。

set.seed(1234)
theta <- 0.7
lambda <- 3.5
n <- 100  # number of data
y <- rbinom(n, 1, theta)
y <- y * rpois(n, lambda)

## histogram
h <- hist(y, breaks = min(y):(max(y) + 1),
          right = FALSE, plot = FALSE)
barplot(h$counts, names = h$breaks[-(length(h$breaks))],
        las = 1, xlab = "y", ylab = "count")

Stanのコード

// Zero-Inflated Poisson model
data {
  int<lower=1>  N;
  int<lower=0>  y[N];
}
parameters {
  real<lower=0, upper=1> theta;
  real<lower=0>          lambda;
}
model {
  // priors
  theta ~ uniform(0, 1);
  lambda ~ uniform(0, 100);
  for (i in 1:N) {
    if (y[i] == 0) {
      // not present
      // Bernoulli(0|θ) + Bernoulli(1|θ) * Poisson(0|λ)
      increment_log_prob(log((1 - theta) + theta * exp(-lambda)));
    } else {
      // present
      // Bernoulli(1|θ) * Poisson(y|λ)
      increment_log_prob(bernoulli_log(1, theta)
                           + poisson_log(y[i], lambda));
    }
  }
}

RStanで実行してみる。

> library(rstan)
> model <- stan_model("zip.stan")

TRANSLATING MODEL 'zip' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'zip' NOW.
> fit <- sampling(model, data = list(N = n, y = y),
+                 chains = 4,
+                 iter = 2000, warmup = 500, thin = 1)
SAMPLING FOR MODEL 'zip' NOW (CHAIN 1).
Iteration: 2000 / 2000 [100%]  (Sampling)
Elapsed Time: 0.061198 seconds (Warm-up)
              0.14187 seconds (Sampling)
              0.203068 seconds (Total)

SAMPLING FOR MODEL 'zip' NOW (CHAIN 2).
Iteration: 2000 / 2000 [100%]  (Sampling)
Elapsed Time: 0.062225 seconds (Warm-up)
              0.143415 seconds (Sampling)
              0.20564 seconds (Total)

SAMPLING FOR MODEL 'zip' NOW (CHAIN 3).
Iteration: 2000 / 2000 [100%]  (Sampling)
Elapsed Time: 0.061601 seconds (Warm-up)
              0.144984 seconds (Sampling)
              0.206585 seconds (Total)

SAMPLING FOR MODEL 'zip' NOW (CHAIN 4).
Iteration: 2000 / 2000 [100%]  (Sampling)
Elapsed Time: 0.058801 seconds (Warm-up)
              0.144183 seconds (Sampling)
              0.202984 seconds (Total)

結果

>  print(fit, digits = 2)
Inference for Stan model: zip.
4 chains, each with iter=2000; warmup=500; thin=1; 
post-warmup draws per chain=1500, total post-warmup draws=6000.

        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
theta   0.79    0.00 0.04  0.71  0.77  0.80  0.82  0.87  5058    1
lambda  3.70    0.00 0.22  3.28  3.55  3.69  3.84  4.15  4675    1
lp__   45.16    0.02 1.00 42.34 44.81 45.50 45.87 46.09  2471    1

Samples were drawn using NUTS(diag_e) at Thu Dec 26 13:17:17 2013.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
> traceplot(fit)