Stanによる順序ロジット回帰：Taglibro de H：SSブログ

	ブログをはじめるログイン

Stanによる順序ロジット回帰　[統計] [編集]

Stanで順序ロジット回帰をやってみる。

東北地方の2012年のブナ結実データと2013年のブナ結実データをつかう。とりあえず、時系列とか、調査地のランダム効果は考えないことにする。

2012年と2013年の結実状況をそれぞれ、非結実→1、少→2、中→3、多→4、という順序尺度変数にする。

f2012 <- c(rep(1, 93), rep(2, 46), rep(3, 4), rep(4, 2))
f2013 <- c(rep(1, 5), rep(2, 44), rep(3, 33), rep(4, 63))
f <- c(f2012, f2013)

年をダミー変数にする。2012年→0、2013年→1。

y <- c(rep(0, length(f2012)), rep(1, length(f2013)))

Stanのモデルを定義してコンパイル・サンプリング。各ランクに対応した切片のパラメーターをordered型にして、ordered_logistic()分布を使用する。詳細はマニュアルの 33.6. Ordered Logistic Distribution を参照。

library(rstan)

ordered <- "
data {
  int<lower=1>         N;
  int<lower=1>         K;
  int<lower=0,upper=1> y[N];  // year
  int<lower=1,upper=K> f[N];  // fruiting index
}
parameters {
  real         beta;
  ordered[K-1] c;
}
model {
  for (i in 1:N) {
    f[i] ~ ordered_logistic(beta * y[i], c);
  }
  beta ~ normal(0, 100);
  c ~ normal(0, 100);
}
"

model <- stan_model(model_code = ordered)
fit <- sampling(model, data = list(N = length(f), K = max(f), f = f, y = y))

結果

>  print(fit)
Inference for Stan model: ordered.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta    3.94    0.01 0.38    3.23    3.68    3.93    4.18    4.71  1350    1
c[1]    0.58    0.00 0.18    0.24    0.46    0.59    0.70    0.94  1626    1
c[2]    3.24    0.01 0.35    2.61    3.01    3.22    3.46    3.97  1416    1
c[3]    4.21    0.01 0.37    3.54    3.96    4.19    4.45    4.98  1393    1
lp__ -288.88    0.05 1.45 -292.47 -289.61 -288.57 -287.84 -287.10   920    1

Samples were drawn using NUTS(diag_e) at Wed Jul 16 20:59:14 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

2012年と2013年の、結実が「多」となる確率の予測値を求める。

inv_logit <- function(x) {
  1 / (1 + exp(-x))
}

##
beta <- get_posterior_mean(fit, pars = "beta")[, "mean-all chains"]
c <- get_posterior_mean(fit, pars = c("c[1]", "c[2]", "c[3]"))[, "mean-all chains"]

## probability for rank == 4
inv_logit(c(0, 1) * beta - c[3])

結果