STAC51H3: Categorical Data Analysis
	
		Assignment 1
	
		Due: Thu Sep 28, 2017 in class
	
		All relevant work must be shown for credit.
	
		Note: In any question where you use R, all R code and output must be
	
		included in your answers. You should assume that the reader is not familiar
	
		with R output, so explain all your findings, quoting the necessary values from
	
		your output.
	
		Whenever you use an R command to generate random numbers, set the
	
		seed to 123. This can be done by simply adding the command set.seed(123)
	
		before your R command for generating the random numbers.
	
		Please note that academic integrity is fundamental to learning and scholarship.
	
		You may discuss questions with other students. However, the work you submit
	
		should be your own. If I feel suspicious of any assignment (e.g. if your work
	
		doesn’t appear to be consistent with what we have discussed in class), I will not
	
		mark the assignment. Instead, I will ask you to present your work in my office
	
		and your grade will be assigned based on your presentation.
	
		Total points for this assignment: 45
	
		1. Let Y ∼ Bin(n,π), where n = 20 and π = 0.8. Y can be interpreted as the number of
	
		successes in a sample of size n = 20 from a Bernoulli distribution with probability of
	
		success π = 0.8.
	
		(a) (12 points) Suppose y = 15 is an observed value of Y ∼ Bin(n,π), where n = 20
	
		and π = 0.8. Calculate the Wald, score (i.e. Wilson’s method), Agresti-Coull and
	
		Clopper-Pearson 95% confidence intervals for π.
	
		In this part (i.e. for all confidence intervals in part (a)), do not use R or any computer
	
		package. Show your work clearly.
	
		(b) (3 points) Calculate a 95% confidence interval for π based on the likelihood ratio test.
	
		(For this part you may use the R code we discussed in class but do not use any R
	
		functions that give the confidence interval directly.)
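
For reference, the intervals named in Question 1 are commonly defined as below. This is a summary of the standard textbook forms, not necessarily the notation used in class: here \hat{\pi} = y/n, z = z_{\alpha/2} (about 1.96 for 95%), and the symbols \tilde{n}, \tilde{\pi}, \pi_L, \pi_U and G^2 are introduced only for this summary.

```latex
\begin{align*}
\text{Wald:} \quad
  & \hat{\pi} \pm z \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} \\[4pt]
\text{Score (Wilson):} \quad
  & \frac{\hat{\pi} + \dfrac{z^2}{2n}
      \pm z \sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n} + \dfrac{z^2}{4n^2}}}
     {1 + \dfrac{z^2}{n}} \\[4pt]
\text{Agresti-Coull:} \quad
  & \tilde{\pi} \pm z \sqrt{\frac{\tilde{\pi}(1-\tilde{\pi})}{\tilde{n}}},
    \qquad \tilde{n} = n + z^2, \quad \tilde{\pi} = \frac{y + z^2/2}{\tilde{n}} \\[4pt]
\text{Clopper-Pearson:} \quad
  & \sum_{k=y}^{n} \binom{n}{k} \pi_L^{k} (1-\pi_L)^{n-k} = \frac{\alpha}{2},
    \qquad
    \sum_{k=0}^{y} \binom{n}{k} \pi_U^{k} (1-\pi_U)^{n-k} = \frac{\alpha}{2} \\[4pt]
\text{Likelihood ratio:} \quad
  & \left\{ \pi_0 : G^2(\pi_0) \le \chi^2_{1}(1-\alpha) \right\}, \quad
    G^2(\pi_0) = 2\left[ y \log\frac{\hat{\pi}}{\pi_0}
      + (n-y) \log\frac{1-\hat{\pi}}{1-\pi_0} \right]
\end{align*}
```

The Clopper-Pearson limits (\pi_L, \pi_U) are the solutions of the two tail equations when 0 < y < n. For the likelihood ratio interval, \chi^2_{1}(0.95) is approximately 3.84.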
	
		2. The observed (or true) and the targeted coverage probabilities of confidence
	
		intervals are not necessarily equal. In this question we will calculate the observed (or
	
		true) coverage probability of Wald confidence intervals using two methods: Monte Carlo
	
		simulation and direct calculation.
	
		(a) (5 points) (Monte Carlo simulation) Generate N = 1000000 observations of Y,
	
		where Y ∼ Bin(n,π) with n = 20 and π = 0.8. From each observation
	
		generated, calculate a Wald 95% confidence interval for the population proportion (π).
	
		(Note: This means you are calculating 1000000 confidence intervals.) Calculate the
	
		proportion of these Wald intervals that contain 0.8 (the value of π). Comment on
	
		your results.
	
		(b) (5 points) (Direct calculation) In order to calculate the coverage probability for a
	
		known value of π, calculate a confidence interval for every possible value of y (y =
	
		0,...,n) and check whether the true value of the parameter is in the confidence interval
	
		calculated. Identify those confidence intervals that contain the true parameter. For
	
		example, if the interval with n = 20, y = 5 contains the true value of π (say
	
		π = 0.8), then the probability for that interval is $P(y = 5) = \binom{20}{5} \times 0.8^{5} \times 0.2^{20-5}$.
	
		The coverage probability is the sum of these probabilities over all intervals that
	
		contain π (in this example 0.8). Use this method to calculate the coverage probability
	
		of 95% Wald confidence intervals based on a sample of size n = 20 if the true value
	
		of π is 0.8.
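
The two approaches described in Question 2 can be sketched in R as follows. This is only an illustrative outline: the function names `wald_interval`, `wald_coverage` and `wald_coverage_mc` are hypothetical, and the example call uses n = 10 and π = 0.5 (values chosen for illustration, not the values required by the question).

```r
# Hypothetical helper: Wald limits for y successes out of n trials.
# Works on a vector of y values and returns a matrix with one row per y.
wald_interval <- function(y, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  p_hat <- y / n
  half <- z * sqrt(p_hat * (1 - p_hat) / n)
  cbind(lower = p_hat - half, upper = p_hat + half)
}

# Direct calculation: sum P(Y = y) over every y whose interval contains pi0.
wald_coverage <- function(n, pi0, conf = 0.95) {
  y <- 0:n
  ci <- wald_interval(y, n, conf)
  covers <- ci[, "lower"] <= pi0 & pi0 <= ci[, "upper"]
  sum(dbinom(y[covers], n, pi0))
}

# Monte Carlo: proportion of N simulated intervals that contain pi0.
wald_coverage_mc <- function(n, pi0, N, conf = 0.95) {
  set.seed(123)  # seed required by the assignment instructions
  y <- rbinom(N, n, pi0)
  ci <- wald_interval(y, n, conf)
  mean(ci[, "lower"] <= pi0 & pi0 <= ci[, "upper"])
}

# Illustrative values only (not the assignment's n and pi).
exact <- wald_coverage(10, 0.5)                # 0.890625
approx <- wald_coverage_mc(10, 0.5, N = 1e5)   # close to the exact value
print(c(exact = exact, monte_carlo = approx))
```

With n = 10 and π = 0.5, only the intervals for y = 3,...,7 contain 0.5, so the exact coverage is P(3 ≤ Y ≤ 7) = 912/1024 = 0.890625, well below the nominal 0.95.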
	
		3. In this question we will again calculate and plot the true coverage probabilities of Wald
	
		confidence intervals for proportions (i.e. the binomial parameter) based on a sample of
	
		given size (n), but this time we calculate the coverage probabilities for many values of
	
		π, making a plot of coverage probability versus π.
	
		(a) (5 points) For a Bernoulli sample of size n = 25, use the method in part (b) of
	
		the previous question (i.e. direct calculation) to calculate the coverage probability
	
		of a 95% confidence interval for π = 0.01,0.02,...,0.99 and plot them against π.
	
		Draw a horizontal line through the target probability 0.95. Comment on what you
	
		learned from your plot.
	
		(b) (5 points) Repeat part (a) above with n = 100 and plot both the curves on the
	
		same plot. Compare and comment on your findings.
	
		(c) (10 points) Repeat part (a) for Wald, Wilson, Agresti-Coull and Clopper-Pearson
	
		confidence intervals and plot the coverage probabilities versus π for all four
	
		confidence intervals on one graph (i.e. all four curves on the same system of axes). Use
	
		four different colours for easy comparison. Compare and comment on your results.
	
		(Note that in this part, we are using the same values as in part (a) above, i.e. n = 25,
	
		95% confidence interval and π = 0.01,0.02,...,0.99.)
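
A coverage plot of the kind asked for in Question 3 might be set up along the following lines. This is a minimal sketch under stated assumptions, not a model answer: the function names (`ci_wald`, `ci_wilson`, `ci_agresti_coull`, `ci_clopper_pearson`, `coverage`) and the colour choices are hypothetical, and the Clopper-Pearson limits are computed through beta quantiles, which is one standard way to obtain them.

```r
# Each ci_* function returns a two-column matrix (lower, upper),
# one row per value of y. All names here are hypothetical.
ci_wald <- function(y, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- y / n
  h <- z * sqrt(p * (1 - p) / n)
  cbind(p - h, p + h)
}

ci_wilson <- function(y, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2)
  p <- y / n
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  h <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  cbind(centre - h, centre + h)
}

ci_agresti_coull <- function(y, n, conf) {
  z <- qnorm(1 - (1 - conf) / 2)
  n_t <- n + z^2
  p_t <- (y + z^2 / 2) / n_t
  h <- z * sqrt(p_t * (1 - p_t) / n_t)
  cbind(p_t - h, p_t + h)
}

ci_clopper_pearson <- function(y, n, conf) {
  alpha <- 1 - conf
  # Boundary cases: lower limit is 0 when y = 0, upper limit is 1 when y = n.
  lower <- ifelse(y == 0, 0, qbeta(alpha / 2, y, n - y + 1))
  upper <- ifelse(y == n, 1, qbeta(1 - alpha / 2, y + 1, n - y))
  cbind(lower, upper)
}

# Exact coverage at each value in pi_grid, by direct calculation.
coverage <- function(ci_fun, n, pi_grid, conf = 0.95) {
  y <- 0:n
  ci <- ci_fun(y, n, conf)
  sapply(pi_grid, function(p0) {
    covers <- ci[, 1] <= p0 & p0 <= ci[, 2]
    sum(dbinom(y[covers], n, p0))
  })
}

n <- 25
pi_grid <- seq(0.01, 0.99, by = 0.01)
methods <- list(Wald = ci_wald, Wilson = ci_wilson,
                `Agresti-Coull` = ci_agresti_coull,
                `Clopper-Pearson` = ci_clopper_pearson)
cols <- c("black", "red", "blue", "darkgreen")

# One column of coverage probabilities per method.
cover <- sapply(methods, coverage, n = n, pi_grid = pi_grid)

matplot(pi_grid, cover, type = "l", lty = 1, col = cols,
        xlab = expression(pi), ylab = "Coverage probability")
abline(h = 0.95, lty = 2)  # target coverage
legend("bottom", legend = names(methods), col = cols, lty = 1)
```

Computing the n + 1 intervals once and reusing them for every value of π keeps the calculation cheap. Changing `n` to 100 gives the comparison in part (b).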