Bayesian Statistics and Computing Problem Set 3

Possible points: 100

- Problem 1 [60 pts] Use the same dataset as in the previous homework (Mendenhall et al. 1989):

x = c(21, 24, 25, 26, 28, 31, 33, 34, 35, 37, 43, 49, 51, 55, 25, 29, 43, 44, 46, 46, 51, 55, 56, 58)

y = c(rep(1, 14), rep(0, 10))

where yi represents the response (0 or 1) for each of the 24 patients as a function of the number of days of radiotherapy (xi) that patient received. (Note the above syntax means that the first 14 outcomes were ones and the last 10 were zeros.) We have a Bayesian logistic regression model:

logit(P(Yi = 1)) = α0 + α1Xi

with the following structure:

α0 ∼ N(δ0, 2)
α1 ∼ N(δ1, 2)
δ0 ∼ N(0, 10)
δ1 ∼ N(0, 10)
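As a sanity check on the analytic derivation asked for in part (a), the unnormalized joint log-posterior can also be evaluated numerically. This is a minimal sketch, assuming the second argument of N(·, ·) above denotes the variance:

```r
# Unnormalized joint log-posterior for (alpha0, alpha1, delta0, delta1),
# assuming N(mu, v) in the model statement means mean mu and VARIANCE v.
x <- c(21, 24, 25, 26, 28, 31, 33, 34, 35, 37, 43, 49, 51, 55,
       25, 29, 43, 44, 46, 46, 51, 55, 56, 58)
y <- c(rep(1, 14), rep(0, 10))

log_joint <- function(a0, a1, d0, d1) {
  eta <- a0 + a1 * x
  loglik <- sum(y * eta - log1p(exp(eta)))  # Bernoulli log-likelihood, logit link
  loglik +
    dnorm(a0, d0, sqrt(2),  log = TRUE) +   # alpha0 | delta0
    dnorm(a1, d1, sqrt(2),  log = TRUE) +   # alpha1 | delta1
    dnorm(d0, 0,  sqrt(10), log = TRUE) +   # hyperprior on delta0
    dnorm(d1, 0,  sqrt(10), log = TRUE)     # hyperprior on delta1
}
```

Comparing this function's value at a few points against your analytic expression is a quick way to catch sign or normalization mistakes.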

(a) Find the analytic expression for the 4-dimensional joint posterior density for the 4 parameters of the logistic regression model above (this was done in the previous homework, so just repeat this step).

(b) Based on the joint posterior above, find the individual one-dimensional full-conditional densities for each of the 4 parameters.

(c) Using the above full-conditional densities, implement the Gibbs sampler. Be specific about how to sample from each individual 1-dimensional full-conditional. Provide detailed pseudo code for this implementation.
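Note that the hyperparameters δ0 and δ1 have conjugate normal full conditionals, while the full conditionals for α0 and α1 are not of standard form (so those updates typically need, e.g., a Metropolis-within-Gibbs step or grid sampling). As an illustration only (not a substitute for your own derivation), the δ0 draw follows from normal-normal conjugacy: with α0 ∼ N(δ0, 2) and δ0 ∼ N(0, 10) (second arguments taken as variances), δ0 | α0 ∼ N(m, v) with v = (1/2 + 1/10)⁻¹ and m = v·α0/2:

```r
# Conjugate full-conditional draw for delta0 given the current alpha0,
# assuming variances 2 (level-1) and 10 (hyperprior) as in the model above.
draw_delta0 <- function(alpha0) {
  v <- 1 / (1 / 2 + 1 / 10)  # posterior variance = 1 / (sum of precisions)
  m <- v * alpha0 / 2        # posterior mean = v * (precision-weighted data)
  rnorm(1, m, sqrt(v))
}
```

The δ1 update is identical with α1 in place of α0.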

(d) Run the Gibbs sampler for 10,000 iterations (please write the entire sampler yourself). Present the results and diagnostics (you can use CODA or some other diagnostics library).

(e) Describe what you did to determine the burn-in period for the sampler above.

(f) Show the ACF and describe what you did to choose the thinning parameter.
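One common heuristic for the thinning interval is to take the smallest lag at which the sample autocorrelation falls below a small threshold (e.g., 0.1). A sketch using base R's `acf`, with an autocorrelated stand-in series in place of your own chain:

```r
# Stand-in AR(1) series with strong autocorrelation; substitute one
# parameter's chain from your sampler here.
set.seed(1)
chain <- arima.sim(model = list(ar = 0.9), n = 5000)

a <- acf(chain, lag.max = 100, plot = FALSE)
# a$acf[1] is lag 0, so subtract 1 to convert index to lag.
thin <- which(a$acf < 0.1)[1] - 1
```

Thinning by `thin` then leaves draws that are approximately uncorrelated.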

(g) Would you say your sampler converged after 10,000 iterations? If not, run the sampler until convergence seems to have been achieved.

(h) Using the final thinned set of 4-dimensional samples, plot 4 separate histograms (estimated marginal densities), one for each of the 4 parameters.

(i) Find the marginal posterior modes, means, and 95% central credible intervals for each of the 4 parameters.

(j) For α0 and α1, comment on how similar the results above are to the results you got on the previous homework using the normal and importance sampling approximations.

- Problem 2 [40 pts] Recall the genetic linkage example: animals are distributed into 4 categories, Y = (y1, y2, y3, y4), according to the genetic linkage model, where the probabilities of falling into the 4 categories are given as:

((2 + θ)/4, (1 − θ)/4, (1 − θ)/4, θ/4).

For data Y = (125, 18, 20, 34), do the following:
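Under a flat prior on θ ∈ (0, 1), the unnormalized log-posterior follows directly from the multinomial likelihood with the cell probabilities above; a minimal sketch:

```r
# Unnormalized log-posterior for theta under a flat prior on (0, 1):
# multinomial likelihood with probabilities
# ((2+theta)/4, (1-theta)/4, (1-theta)/4, theta/4); constants dropped.
y <- c(125, 18, 20, 34)
log_post <- function(theta) {
  y[1] * log(2 + theta) + (y[2] + y[3]) * log(1 - theta) + y[4] * log(theta)
}
```

This is the target density that the Metropolis-Hastings sampler in the parts below should explore.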

(a) Design the random-walk Metropolis-Hastings algorithm to obtain a Monte Carlo approximation to the posterior density of θ. Provide detailed pseudo-code for this sampler.
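A single random-walk MH update can be sketched as follows; this is a skeleton of the accept/reject logic only, not a substitute for your own sampler (`log_post` here is a compact version of the unnormalized log-posterior implied by the likelihood above):

```r
# Unnormalized log-posterior for theta (flat prior, data Y = (125, 18, 20, 34)).
log_post <- function(th) 125 * log(2 + th) + 38 * log(1 - th) + 34 * log(th)

# One random-walk Metropolis step: propose theta + N(0, prop_sd^2),
# reject proposals outside (0, 1), otherwise accept with the usual MH ratio.
mh_step <- function(theta, prop_sd) {
  prop <- theta + rnorm(1, 0, prop_sd)
  if (prop <= 0 || prop >= 1) return(theta)  # outside support: reject
  if (log(runif(1)) < log_post(prop) - log_post(theta)) prop else theta
}
```

Iterating `mh_step` and recording each state yields the chain; the proposal standard deviation `prop_sd` is the tuning knob compared in part (b).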

(b) Implement the above sampler, using 2 different MH transition kernels: N(0, 0.1) and N(0, 0.01). (The numbers in parentheses are the mean and standard deviation.) Please write the entire sampler yourself.

(c) Comment on the performance of each: compare trace plots and convergence diagnostics (you can use CODA or some other diagnostics library). What are the acceptance rates like?

(d) Suggest an alternative transition kernel that should perform better than either of the kernels above.

(e) Using the kernel of your choice, run the MH algorithm for 10,000 iterations. Report on the results and diagnostics. Again, please write the sampler yourself, though you can use CODA or some other diagnostics library.

(f) What was the acceptance rate?

(g) Describe what you did to determine the burn-in period for the sampler above.

(h) Show the ACF and describe what you did to choose the thinning parameter.

(i) Would you say your sampler converged after 10,000 iterations? If not, run the sampler until convergence seems to have been achieved.

(j) Using the final thinned set of samples, plot the histogram of the estimated posterior density of θ.

(k) Find the posterior mode, mean and 95% central credible interval for θ.