R: Stanをためす (3) 欠測値のあるデータ [統計]
Stanで欠測値のあるデータの解析。
missing.stan
data { int<lower=0> N_obs; int<lower=0> N_miss; real x_obs[N_obs]; } parameters { real mu; real<lower=0> sigma; real x_miss[N_miss]; } model { for (i in 1:N_obs) { x_obs[i] ~ normal(mu, sigma); } for (i in 1:N_miss) { x_miss[i] ~ normal(mu, sigma); } mu ~ normal(0, 1.0e+2); sigma ~ uniform(0, 1.0e+4); }
missing.R
## seed of random number set.seed(1234) ## data N_obs <- 36 N_miss <- 4 mu <- 50 sigma <- 10 x_obs <- rnorm(N_obs, mu, sigma) library(coda) library(parallel) ## Stan program <- "missing" # dump data data.file <- paste(program, "_dump.R", sep = "") dump(c("N_obs", "N_miss", "x_obs"), data.file) # initial values init.file <- paste(program, "_init.R", sep = "") mu <- 0 sigma <- 1 x_miss <- rep(0, N_miss) dump(c("mu", "sigma", "x_miss"), init.file) # compile stan.home <- ("~/Documents/src/stan-src-1.0.0") cur.dir <- getwd() setwd(stan.home) system(paste("make ", cur.dir, "/", program, sep = "")) setwd(cur.dir) warmup <- 1000 iter <- 20000 + warmup thin <- 10 run <- function(chain) { cmd <- paste("./", program, " --data=", data.file, " --init=", init.file, " --iter=", iter, " --warmup=", warmup, " --thin=", thin, " --chain_id=", chain, " --samples=", program, ".chain", chain, ".csv", sep = "") system(cmd) s <- read.csv(paste(program, ".chain", chain, ".csv", sep = ""), comment.char = "#") mcmc(s, start = warmup + 1, thin = thin) } samples <- as.mcmc.list(mclapply(1:3, run))
結果
> summary(samples[, c("mu", "sigma", + "x_miss.1", "x_miss.2", "x_miss.3", "x_miss.4")]) Iterations = 1001:20991 Thinning interval = 10 Number of chains = 3 Sample size per chain = 2000 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE mu 46.570 1.575 0.02034 0.01888 sigma 9.334 1.177 0.01520 0.01550 x_miss.1 46.726 9.658 0.12469 0.10932 x_miss.2 46.610 9.579 0.12366 0.11869 x_miss.3 46.566 9.591 0.12382 0.13662 x_miss.4 46.445 9.620 0.12420 0.12369 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% mu 43.542 45.537 46.565 47.58 49.73 sigma 7.403 8.513 9.204 10.03 11.97 x_miss.1 27.253 40.572 46.922 53.05 65.82 x_miss.2 27.645 40.324 46.619 52.86 65.42 x_miss.3 27.944 40.281 46.573 53.00 65.63 x_miss.4 27.444 40.026 46.583 52.75 65.22
データにNAを入れるとエラーになる
Exception: variable x_obs not found. Diagnostic information: Dynamic exception type: std::runtime_error std::exception::what: variable x_obs not found.
コメント 0