階層化ブートストラップ：Taglibro de H：SSブログ

	ブログをはじめるログイン

階層化ブートストラップ　[統計] [編集]

Rのboot()関数には、strataという引数があって、データが多標本からなるときに対応しているのですが、実際どのようになるのか実験してみました。

bootとggplot2を使用します。

library(boot)
library(ggplot2)

模擬データを生成します。

n1 <- 160
n2 <- 40

set.seed(11)
x1 <- rnorm(n1, -1, 0.5)
x2 <- rnorm(n2, 1, 1)
df <- data.frame(x = c(x1, x2),
                 group = rep(c("1", "2"), c(n1, n2)))
mean(df$x)

平均値は-0.581です。

[1] -0.5811912

グラフにしてみます。

p <- ggplot(df) +
  geom_histogram(aes(x = x, fill = group), position = "dodge",
                 binwidth = 0.2)
print(p)

こういうデータの平均にどういう意味があるのかという問題もあるかもしれませんが、ここではあくまで例題です。

ブートストラップで平均値を推定します。strata引数を指定しないときと、指定したときとを比較します。

set.seed(12)
out1 <- boot(df, function(d, i) mean(df[i, ]$x), 2000)
out2 <- boot(df, function(d, i) mean(df[i, ]$x), 2000,
           strata = df$group)

print(out1)
boot.ci(out1, conf = 0.95, type = "bca")

print(out2)
boot.ci(out2, conf = 0.95, type = "bca")

結果です。

> print(out1)

ORDINARY NONPARAMETRIC BOOTSTRAP


Call:
boot(data = df, statistic = function(d, i) mean(df[i, ]$x), R = 2000)


Bootstrap Statistics :
      original       bias    std. error
t1* -0.5811912 -0.001101127  0.07504282
> boot.ci(out1, conf = 0.95, type = "bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL : 
boot.ci(boot.out = out1, conf = 0.95, type = "bca")

Intervals : 
Level       BCa          
95%   (-0.7195, -0.4240 )  
Calculations and Intervals on Original Scale
> 
> print(out2)

STRATIFIED BOOTSTRAP


Call:
boot(data = df, statistic = function(d, i) mean(df[i, ]$x), R = 2000, 
    strata = df$group)


Bootstrap Statistics :
      original       bias    std. error
t1* -0.5811912 0.0009909305  0.04066183
> boot.ci(out2, conf = 0.95, type = "bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL : 
boot.ci(boot.out = out2, conf = 0.95, type = "bca")

Intervals : 
Level       BCa          
95%   (-0.6590, -0.5013 )  
Calculations and Intervals on Original Scale

strata引数を指定して、各群から別々にリサンプリングした方が、標準誤差が小さく、95%信用区間も狭くなりました。