MCMC再訪(3): 混合モデル：Taglibro de H：SSブログ

	ブログをはじめるログイン

MCMC再訪(3): 混合モデル　[統計] [編集]

ブロック(1–8)ごとにとられたデータx₁, x₂, yに、y = β + &beta₁x₁ + &beta₂x₂という関係があると想定されるとき、パラメータβ, &beta₁, &beta₂と、ブロックによるランダムエフェクトを推定する。

データ。sample3.csvというファイルに保存しておく。

"block", "x1", "x2", "y"
1, 5.89, 3.09, 5.58
1, 3.96, 2.88, 4.88
1, 4.18, 3.05, 3.88
1, 3.45, 2.89, 1.23
1, 3.07, 2.91, 1.13
2, 5.81, 2.68, 8.28
2, 3.31, 3.00, 3.62
2, 4.53, 3.06, 4.61
2, 6.10, 2.98, 6.74
2, 4.98, 3.39, 5.55
3, 6.09, 2.72, 8.75
3, 6.70, 3.00, 9.29
3, 3.80, 3.21, 3.16
3, 3.72, 3.04, 4.95
3, 3.95, 3.01, 3.62
4, 3.85, 3.16, 4.39
4, 5.36, 3.03, 6.20
4, 4.95, 2.77, 6.15
4, 4.30, 3.02, 5.30
4, 5.24, 3.13, 5.86
5, 3.72, 2.88, 4.51
5, 5.48, 3.24, 4.35
5, 4.96, 2.89, 5.08
5, 3.91, 3.24, 3.51
5, 4.63, 2.74, 4.72
6, 7.20, 3.32, 9.87
6, 4.17, 3.12, 4.45
6, 5.07, 3.35, 4.33
6, 3.11, 3.23, 2.99
6, 5.22, 2.82, 7.38
7, 4.30, 2.73, 5.57
7, 4.91, 3.01, 4.72
7, 6.14, 2.64, 7.54
7, 6.64, 2.90, 8.31
7, 5.46, 3.23, 5.78
8, 2.73, 2.92, 2.81
8, 3.58, 2.86, 2.63
8, 7.04, 3.14, 8.84
8, 4.79, 2.95, 5.28
8, 4.90, 2.77, 5.26

BUGSによるモデル。sample3.bugというファイルに保存しておく。

var
  M,                    # number of blocks
  N,                    # number of data in a block
  X1[N, M], X2[N, M],    # data
  Y[N, M],
  e[M],                 # random effect
  beta, beta.x1, beta.x2,  # parameters
  tau, tau.e,
  sigma, sigma.e;

model {
  for (j in 1:M) {
    for (i in 1:N) {
      Y[i, j] ~ dnorm(mu[i, j], tau);
      mu[i, j] <- beta + beta.x1 * X1[i, j] +
                         beta.x2 * X2[i, j] + e[j];
    }
    e[j] ~ dnorm(0, tau.e);
  }

  ## priors
  beta ~ dnorm(0, 1.0E-6);
  beta.x1 ~ dnorm(0, 1.0E-6);
  beta.x2 ~ dnorm(0, 1.0E-6);
  tau ~ dgamma(1.0E-3, 1.0E-3);
  tau.e ~ dgamma(1.0E-3, 1.0E-3);

  sigma <- 1/sqrt(tau);
  sigma.e <- 1/sqrt(tau.e);
}

行列(2次元配列)を使用するときは変数の宣言が必要である。それ以外の変数の宣言は省略できる。

外側のforブロック(for (j in 1:M) { ... }が各ブロックに相当し、内側のforブロック(for (i in 1:N) { ... }が各ブロック内のデータに相当する。外側のブロックにあるe[j] ~ dnorm(0, tau.e);がランダムエフェクトをモデル化した部分。ランダムエフェクトeは、平均0、分散1/tau.eの正規分布に従うとしている。tau.eが、パラメータeの分布を規定する超パラメータとなる。

以下、上のファイルを使用したRコード。

library(R2WinBUGS)

R2WinBUGSを使用。

data <- read.csv("sample3.csv")
n.block <- 8    # number of blocks
n.data <- 5	    # number of data in a block

データを用意。

x1 <- matrix(data$x1, nrow = n.data, ncol = n.block)
x2 <- matrix(data$x2, nrow = n.data, ncol = n.block)
y <- matrix(data$y, nrow = n.data, ncol = n.block)

データを、BUGSコードで定義したモデルにあうように変換する。ここでは、各列がブロックになるような行列にしている。

platform <- .Platform$OS.type
if (platform == "unix") {
  useWINE = TRUE
  ## check Darwine
  darwine.bin <- "/Applications/Darwine/Wine.bundle/Contents/bin"
  if (file.exists(darwine.bin)) {
    wine <- paste(darwine.bin, "wine", sep = "/")
    winepath <- paste(darwine.bin, "winepath", sep = "/")
  } else {
    ## use `which'
    wine <- system("which wine", intern = TRUE)
    winepath <- system("which winepath", intern = TRUE)
    if (wine == "" || winepath == "") {
      stop("Wine not found.")
    }
  }
  bugs.dir <- paste(Sys.getenv("HOME"),
                    ".wine/drive_c/Program Files/WinBUGS14",
                    sep = "/")
  if (!file.exists(bugs.dir)) {
    stop("bugs directory not found.")
  }
} else if (platform == "windows") {
  ## Windows
  useWINE = FALSE
  bugs.dir <- "C:/Program Files/WinBUGS14"
  if (!file.exists(bugs.dir)) {
    stop("bugs directory not found.")
  }
  wine <- NULL
  winepath <- NULL
}

bugs()関数に渡す変数の設定。

## number of chains
n.chains <- 3

## initial values
init1 <- list(beta =  0, beta.x1 = 10, beta.x2 = 2,
              tau = 0.1, tau.e = 0.1)
init2 <- list(beta =  5, beta.x1 = 5,  beta.x2 = 5,
              tau = 1,   tau.e = 1)
init3 <- list(beta = 10, beta.x1 = 0,  beta.x2 = 0,
              tau = 10,  tau.e = 10)

chainの数と、初期値の設定。

post.samp <- bugs(data = list(X1 = x1, X2 = x2, Y = y,
                              M = n.block, N = n.data),
                  inits = list(init1, init2, init3),
                  parameters.to.save = c("beta", "beta.x1", "beta.x2",
                                         "sigma", "sigma.e", "e"),
                  model = "sample3.bug",
                  n.chains = n.chains,
                  n.iter = 21000,
                  n.burnin = 1000,
                  n.thin = 10,
                  debug = FALSE,
                  bugs.directory = bugs.dir,
                  working.directory = getwd(),
                  clearWD = TRUE,
                  useWINE = useWINE,
                  newWINE = TRUE,
                  WINE = wine,
                  WINEPATH = winepath)

bugs()からWinBUGSを呼び出す。

print(post.samp, digits = 3)

結果の表示。

Inference for Bugs model at "sample3.bug", fit using WinBUGS,
 3 chains, each with 21000 iterations (first 1000 discarded), n.thin = 10
 n.sims = 6000 iterations saved
           mean    sd   2.5%    25%    50%     75%   97.5%  Rhat n.eff
beta      3.072 2.317 -1.527  1.549  3.079   4.607   7.639 1.002  1700
beta.x1   1.608 0.121  1.376  1.528  1.605   1.686   1.848 1.001  6000
beta.x2  -1.826 0.740 -3.325 -2.309 -1.825  -1.345  -0.368 1.002  1600
sigma     0.822 0.112  0.635  0.742  0.812   0.888   1.068 1.001  4500
sigma.e   0.349 0.246  0.034  0.152  0.311   0.486   0.949 1.002  1400
e[1]     -0.394 0.382 -1.271 -0.640 -0.315  -0.080   0.103 1.002  1500
e[2]      0.108 0.273 -0.394 -0.044  0.064   0.246   0.736 1.004  1300
e[3]      0.232 0.301 -0.218  0.019  0.166   0.412   0.919 1.001  6000
e[4]      0.168 0.284 -0.283 -0.014  0.110   0.319   0.821 1.003  1400
e[5]     -0.194 0.294 -0.892 -0.352 -0.133   0.001   0.291 1.001  6000
e[6]      0.244 0.316 -0.226  0.017  0.172   0.430   0.987 1.001  3700
e[7]     -0.088 0.274 -0.724 -0.226 -0.050   0.057   0.424 1.001  6000
e[8]     -0.078 0.272 -0.699 -0.210 -0.040   0.072   0.436 1.002  1800
deviance 97.055 5.763 86.950 92.520 97.075 101.200 108.300 1.001  2800

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = Dbar-Dhat)
pD = 7.8 and DIC = 104.8
DIC is an estimate of expected predictive error (lower deviance is better).