STAC51H3: Categorical Data Analysis

Assignment 1

Due: Thu Sep 28, 2017 in class

All relevant work must be shown for credit.

Note: In any question where you use R, all R code and output must be
included in your answers. You should assume that the reader is not familiar
with R output, so explain all your findings, quoting the necessary values from
your output.

Whenever you use an R command to generate random numbers, set the
seed to 123. This can be done by adding the command set.seed(123)
before the R command that generates the random numbers.
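As a small illustration of why the seed matters, the sketch below draws the same binomial sample twice, resetting the seed before each draw. The variable names (`first_draw`, `second_draw`) and the sample size of 5 are chosen here for illustration only and are not part of the assignment.

```r
# Resetting the seed before each call reproduces the same random draws.
set.seed(123)
first_draw <- rbinom(5, size = 20, prob = 0.8)

set.seed(123)
second_draw <- rbinom(5, size = 20, prob = 0.8)

# Both vectors are the same because the generator started from the same state.
print(identical(first_draw, second_draw))
```

Without the second call to set.seed(123), the two vectors would generally differ.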

Please note that academic integrity is fundamental to learning and scholarship.

You may discuss questions with other students. However, the work you submit

should be your own. If I feel suspicious of any assignment (e.g. if your work

doesn’t appear to be consistent with what we have discussed in class), I will not

mark the assignment. Instead, I will ask you to present your work in my office

and your grade will be assigned based on your presentation.

Total points for this assignment: 45

1. Let Y ∼ Bin(n,π), where n = 20 and π = 0.8. Y can be interpreted as the number of

successes in a sample of size n = 20 from a Bernoulli distribution with probability of

success π = 0.8.

(a) (12 points) Suppose y = 15 is an observed value of Y, where Y ∼ Bin(n,π)
with n = 20 and π = 0.8. Calculate the Wald, score (i.e. Wilson’s method),
Agresti-Coull and Clopper-Pearson 95% confidence intervals for π.
In this part (i.e. for all confidence intervals in part (a)), do not use R or any
other computer package. Show your work clearly.
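For reference, the four intervals named in part (a) are commonly written as below. These are the standard textbook forms and may differ in notation from what was used in class. Here $\hat{\pi} = y/n$, $z = z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution, and the symbols $\tilde{n}$, $\tilde{\pi}$ and $B(q; a, b)$ (the $q$-quantile of a Beta$(a, b)$ distribution) are notation introduced here for convenience.

```latex
\begin{align*}
\text{Wald:} \quad
  & \hat{\pi} \pm z \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} \\[4pt]
\text{Score (Wilson):} \quad
  & \frac{\hat{\pi} + \dfrac{z^{2}}{2n}
      \pm z \sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n} + \dfrac{z^{2}}{4n^{2}}}}
     {1 + \dfrac{z^{2}}{n}} \\[4pt]
\text{Agresti-Coull:} \quad
  & \tilde{\pi} \pm z \sqrt{\frac{\tilde{\pi}(1-\tilde{\pi})}{\tilde{n}}},
  \qquad \tilde{n} = n + z^{2}, \quad
  \tilde{\pi} = \frac{y + z^{2}/2}{\tilde{n}} \\[4pt]
\text{Clopper-Pearson:} \quad
  & \left( B\!\left(\tfrac{\alpha}{2};\, y,\, n-y+1\right),\;
           B\!\left(1-\tfrac{\alpha}{2};\, y+1,\, n-y\right) \right),
  \qquad 0 < y < n
\end{align*}
```

For the Clopper-Pearson interval, the lower limit is taken to be 0 when y = 0 and the upper limit is taken to be 1 when y = n.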

(b) (3 points) Calculate a 95% confidence interval for π based on the likelihood
ratio test. (For this part you may use the R code we discussed in class, but do
not use any R functions that give the confidence interval directly.)
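The idea behind a likelihood-ratio interval can be sketched as follows: the interval is the set of values π0 for which −2[log L(π0) − log L(π̂)] does not exceed the 0.95 quantile of a chi-squared distribution with 1 degree of freedom. The code below is an independent sketch, not the code discussed in class, and it uses hypothetical data (y = 7, n = 10) rather than the values in this question. The helper names `loglik` and `lr_gap` are made up for this example.

```r
# Hypothetical illustrative data (NOT the assignment's values).
y <- 7
n <- 10
pi_hat <- y / n

# Binomial log-likelihood; the constant choose(n, y) cancels in the ratio.
loglik <- function(p) y * log(p) + (n - y) * log(1 - p)

# Likelihood-ratio statistic minus the chi-squared(1) cutoff.
# The interval endpoints are the two roots of this function.
lr_gap <- function(p) {
  -2 * (loglik(p) - loglik(pi_hat)) - qchisq(0.95, df = 1)
}

# Search for one root on each side of the maximum likelihood estimate.
lower <- uniroot(lr_gap, interval = c(1e-6, pi_hat), tol = 1e-8)$root
upper <- uniroot(lr_gap, interval = c(pi_hat, 1 - 1e-6), tol = 1e-8)$root

print(c(lower = lower, upper = upper))
```

The sketch relies on 0 < y < n so that the log-likelihood is finite at π̂ and a root exists on each side of it.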

2. The observed (or true) coverage probability of a confidence interval is not
necessarily equal to its targeted coverage probability. In this question we will
calculate the observed (or true) coverage probability of Wald confidence intervals
using two methods: Monte Carlo simulation and direct calculation.

(a) (5 points) (Monte Carlo simulation) Generate N = 1000000 observations of Y,
where Y ∼ Bin(n,π) with n = 20 and π = 0.8. From each observation generated,
calculate a 95% Wald confidence interval for the population proportion (π).
(Note: This means you are calculating 1000000 confidence intervals.) Calculate the

proportion of these Wald intervals that contain 0.8 (the value of π). Comment on

your results.
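The simulation described in part (a) can be sketched along the following lines. This is a minimal outline under stated assumptions, not a model answer: it uses hypothetical settings (n = 30, π = 0.5, N = 10000) instead of the values in the question, and the variable names are chosen here for illustration.

```r
# Hypothetical settings (NOT the assignment's values).
set.seed(123)
n <- 30
p_true <- 0.5
N <- 10000

# One binomial observation per simulated sample.
y <- rbinom(N, size = n, prob = p_true)
p_hat <- y / n

# 95% Wald interval for each simulated observation.
z <- qnorm(0.975)
half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
covered <- (p_hat - half_width <= p_true) & (p_true <= p_hat + half_width)

# Estimated coverage: the proportion of intervals containing p_true.
mc_coverage <- mean(covered)
print(mc_coverage)
```

Because the estimate is itself random, it varies with the seed and becomes more stable as N grows.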

(b) (5 points) (Direct calculation) To calculate the coverage probability for a
known value of π, calculate a confidence interval for every possible value of y
(y = 0,...,n) and check whether the true value of the parameter is in each
interval. Identify those confidence intervals that contain the true parameter. For
example, if the interval with n = 20, y = 5 contains the true value of π (say
π = 0.8), then the probability for that interval is

$$P(Y = 5) = \binom{20}{5} \times 0.8^{5} \times 0.2^{20-5}.$$

The coverage probability is the sum of these probabilities over all the intervals
that contain π (in this example 0.8). Use this method to calculate the coverage
probability of 95% Wald confidence intervals based on a sample of size n = 20 if
the true value of π is 0.8.
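The direct calculation in part (b) can be sketched as a short function. The function name `wald_coverage` and the demonstration values (n = 30, π = 0.5) are hypothetical and chosen only for illustration. The first lines check that dbinom() matches the binomial probability formula used in the example above.

```r
# dbinom() agrees with the explicit formula for P(Y = 5), n = 20, pi = 0.8.
formula_check <- all.equal(dbinom(5, size = 20, prob = 0.8),
                           choose(20, 5) * 0.8^5 * 0.2^(20 - 5))

# Exact coverage of the Wald interval for a given n and true value p.
wald_coverage <- function(n, p, level = 0.95) {
  y <- 0:n                                  # every possible outcome
  p_hat <- y / n
  z <- qnorm(1 - (1 - level) / 2)
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  # TRUE for the outcomes whose interval contains p
  covers <- (p_hat - half_width <= p) & (p <= p_hat + half_width)
  # Add up the binomial probabilities of those outcomes
  sum(dbinom(y[covers], size = n, prob = p))
}

# Hypothetical demonstration values (NOT the assignment's values).
example_coverage <- wald_coverage(n = 30, p = 0.5)
print(example_coverage)
```

Note that for y = 0 and y = n the Wald interval has zero width, so those outcomes never contribute to the sum when 0 < π < 1.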

3. In this question we will again calculate the true coverage probabilities of Wald
confidence intervals for proportions (i.e. the Binomial parameter) based on a
sample of given size (n), but this time we calculate the coverage probabilities
for many values of π and plot coverage probability versus π.

(a) (5 points) For a Bernoulli sample of size n = 25, use the method in part (b) of

the previous question (i.e. direct calculation) to calculate the coverage probability

of a 95% confidence interval for π = 0.01,0.02,...,0.99 and plot them against π.

Draw a horizontal line through the target probability 0.95. Comment on what you

learned from your plot.
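One way to organise the calculation over a grid of π values is sketched below. It assumes the Wald interval, as in the introduction to this question, and uses a hypothetical sample size (n = 10) rather than the one in part (a). Since the Wald limits depend only on y and n, they are computed once and reused for every value of π.

```r
# Hypothetical sample size (NOT the assignment's value).
n <- 10
p_grid <- (1:99) / 100          # 0.01, 0.02, ..., 0.99

# Wald limits for every possible outcome y = 0, ..., n.
y <- 0:n
p_hat <- y / n
half_width <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)
lower <- p_hat - half_width
upper <- p_hat + half_width

# For each p, sum the probabilities of the outcomes whose interval contains p.
coverage <- sapply(p_grid, function(p) {
  sum(dbinom(y, size = n, prob = p)[lower <= p & p <= upper])
})

# Coverage curve with a dashed reference line at the 0.95 target.
plot(p_grid, coverage, type = "l", ylim = c(0, 1),
     xlab = expression(pi), ylab = "Coverage probability")
abline(h = 0.95, lty = 2)
```

A second curve for another sample size could be added to the same axes with lines().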

(b) (5 points) Repeat part (a) above with n = 100 and plot both curves on the
same graph. Compare and comment on your findings.

confidence intervals and plot the coverage probabilities versus π for all four
confidence intervals on one graph (i.e. all four curves on the same set of axes).
Use four different colours for easy comparison. Compare and comment on your
results. (Note that in this part we are using the same values as in part (a)
above, i.e. n = 25, 95% confidence intervals and π = 0.01,0.02,...,0.99.)