Exercise 38
Parallel bootstrap
As we saw in Assignment 4, we can use random resampling with replacement to simulate drawing samples from a population, and then quantitatively assess how likely our observed statistic would be if the null hypothesis were true.
Assume we have the following sample of data, \(x\):
3 5 4 3 6 7 -1 2 -3 -4 2 -5 3 2 -1
and a second sample, \(y\):
2 7 4 7 6 9 -1 3 -2 -1 3 -1 2 4 2
The mean of sample \(x\) is 1.533 and the mean of sample \(y\) is 2.933. Let's compute a statistic we'll call \(d_{samp}\), the difference between the means, \(\bar{y}-\bar{x}\), which in this case is \(d_{samp}=1.400\).
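As a quick check of the numbers above, here is a minimal sketch in Python with NumPy (the choice of language and the variable names `mean_x`, `mean_y` and `d_samp` are assumptions for illustration; the exercise does not prescribe them):

```python
import numpy as np

# The two samples given in the exercise
x = np.array([3, 5, 4, 3, 6, 7, -1, 2, -3, -4, 2, -5, 3, 2, -1])
y = np.array([2, 7, 4, 7, 6, 9, -1, 3, -2, -1, 3, -1, 2, 4, 2])

mean_x = x.mean()
mean_y = y.mean()
d_samp = mean_y - mean_x  # difference between means, ybar - xbar

print(f"mean(x) = {mean_x:.3f}")  # 1.533
print(f"mean(y) = {mean_y:.3f}")  # 2.933
print(f"d_samp  = {d_samp:.3f}")  # 1.400
```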
The null hypothesis is that both samples were drawn from a single population, and hence share the same mean, so \(d_{pop}=0.00\); any departure from this in our sample statistic \(d_{samp}\) is due to random sampling error.
Question 1
Use bootstrapping, with one million bootstrap (re)samples, to estimate the probability of obtaining a difference at least as large as \(d_{samp}\) if the null hypothesis is true.
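One possible way to approach this question is sketched below in Python with NumPy. It rests on several assumptions that the exercise does not fix: the null hypothesis is simulated by pooling both samples and resampling from the pool with replacement, the test is one-sided (counting resampled differences at least as large as \(d_{samp}\)), and the function name `bootstrap_p_value` and its `chunk` and `seed` parameters are hypothetical. Treat it as an illustration, not as the intended solution.

```python
import numpy as np

x = np.array([3, 5, 4, 3, 6, 7, -1, 2, -3, -4, 2, -5, 3, 2, -1])
y = np.array([2, 7, 4, 7, 6, 9, -1, 3, -2, -1, 3, -1, 2, 4, 2])


def bootstrap_p_value(x, y, n_boot=1_000_000, chunk=100_000, seed=0):
    """Estimate a one-sided p-value for d_samp = mean(y) - mean(x).

    Assumption: under the null, both samples come from one population,
    so we resample with replacement from the pooled data.
    """
    rng = np.random.default_rng(seed)
    d_samp = y.mean() - x.mean()
    pooled = np.concatenate([x, y])
    n_x, n_total = len(x), len(x) + len(y)

    count = 0
    done = 0
    while done < n_boot:
        # Work in chunks so memory use stays modest
        m = min(chunk, n_boot - done)
        idx = rng.integers(0, n_total, size=(m, n_total))
        res = pooled[idx]
        # First n_x columns play the role of x, the rest the role of y
        d_boot = res[:, n_x:].mean(axis=1) - res[:, :n_x].mean(axis=1)
        # Small tolerance so exact ties are not lost to rounding error
        count += int(np.count_nonzero(d_boot >= d_samp - 1e-12))
        done += m
    return count / n_boot


if __name__ == "__main__":
    print(bootstrap_p_value(x, y))
```

A two-sided version would instead count resamples with `np.abs(d_boot) >= abs(d_samp) - 1e-12`.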
Question 2
Rewrite your solution to Question 1 to parallelize the bootstrap. In other words, execute the one million bootstrap iterations in parallel, with the work spread over your available compute cores.
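Because the bootstrap iterations are independent of one another, one way to parallelize them is to give each worker process its own share of the iterations and its own random stream, then add up the counts. The sketch below uses Python's standard `concurrent.futures.ProcessPoolExecutor` together with NumPy's `SeedSequence.spawn`. The pooled-resampling null, the one-sided count, and the helper names `count_extreme`, `split_work` and `parallel_bootstrap_p_value` are all assumptions made for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor

import numpy as np

X = np.array([3, 5, 4, 3, 6, 7, -1, 2, -3, -4, 2, -5, 3, 2, -1])
Y = np.array([2, 7, 4, 7, 6, 9, -1, 3, -2, -1, 3, -1, 2, 4, 2])


def count_extreme(task):
    """Worker: run n resamples, return how many have d_boot >= d_samp."""
    n, seed_seq = task
    rng = np.random.default_rng(seed_seq)  # independent stream per worker
    pooled = np.concatenate([X, Y])        # assumption: pooled null
    n_x = len(X)
    d_samp = Y.mean() - X.mean()
    count, done, chunk = 0, 0, 50_000
    while done < n:
        m = min(chunk, n - done)
        res = rng.choice(pooled, size=(m, len(pooled)), replace=True)
        d_boot = res[:, n_x:].mean(axis=1) - res[:, :n_x].mean(axis=1)
        # Small tolerance so exact ties are not lost to rounding error
        count += int(np.count_nonzero(d_boot >= d_samp - 1e-12))
        done += m
    return count


def split_work(n_boot, n_workers):
    """Divide n_boot iterations as evenly as possible among workers."""
    base, extra = divmod(n_boot, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def parallel_bootstrap_p_value(n_boot=1_000_000, n_workers=None, seed=0):
    n_workers = n_workers or os.cpu_count() or 1
    sizes = split_work(n_boot, n_workers)
    seeds = np.random.SeedSequence(seed).spawn(n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        counts = list(pool.map(count_extreme, zip(sizes, seeds)))
    return sum(counts) / n_boot


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # this module (for example Windows and macOS).
    print(parallel_bootstrap_p_value())
```

Giving each worker a spawned `SeedSequence` avoids the common mistake of every process reusing the same seed and therefore producing identical resamples.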