- February 17, 2021

Assignment 1
MAT 5192, Winter 2021
Hand-in on February 15, 2021 (Please work in groups of 2)

1. (10 marks) Let $U$ be a population of size $N$. From $U$, we first select a SRSWOR, $S_1$, of size $n_1$. Then, from $S_1$, we select a SRSWOR, $S_2$, of size $n_2$. Show that $S_2$ is a SRSWOR of size $n_2$ selected from $U$.

2. (20 marks) Let $U$ be a population of size $N$. From $U$, we first select a SRSWOR, $S_1$, of size $n_1$. Then, from $U \setminus S_1$, we select a SRSWOR, $S_2$, of size $n_2$. The final sample is $S = S_1 \cup S_2$, of size $n = n_1 + n_2$. Let

$$\bar{y}_1 = \frac{1}{n_1} \sum_{k \in S_1} y_k \quad \text{and} \quad \bar{y}_2 = \frac{1}{n_2} \sum_{k \in S_2} y_k.$$

(a) (5 marks) Show that $S$ is a SRSWOR of size $n$ selected from $U$. What is interesting about this result?

(b) (10 marks) Show that

$$V_p(\bar{y}_2) = \left(1 - \frac{n_2}{N}\right) \frac{S_y^2}{n_2},$$

where $S_y^2 = \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2$ with $\bar{Y} = \frac{1}{N} \sum_{k \in U} y_k$. Hint: You may want to show first that $S_2$ is a SRSWOR from $U$.

(c) (5 marks) Use (a) and (b) to determine $Cov_p(\bar{y}_1, \bar{y}_2)$.

3. (15 marks) The file Parishes_pop shows data on a population of parishes in Quebec. The file Parishes_sample shows data from a SRSWOR of size 50 selected from the population.

(a) (5 marks) From the sample data, estimate the total number of births, $t_y$, in the population of parishes and construct a 95% confidence interval for $t_y$.

(b) (5 marks) Parish no. 71105 (Saint-Lazare) seems to be particularly large: its population size (in 2001) and its number of births are both larger than the corresponding population means. An analyst thinks that this parish should always be included in the sample and that the sample should be completed with 49 parishes selected from the 209 remaining parishes according to SRSWOR. The analyst thinks that this procedure will lead to a more precise estimator. Use the population data to confirm his belief.

(c) (5 marks) Using the sample data, estimate the proportion $P_y$ of parishes in which there had been 10 births or more and determine a 95% confidence interval for $P_y$.

4. (20 marks) From a population $U$, a sample $S$, of size $n$, is selected according to a given sampling design, $p(S)$. Let $\theta_N$ be a finite population parameter and let $\hat{\theta}$ be an estimator of $\theta_N$. As a measure of influence of a unit, we consider the conditional bias of unit $k$, defined as

$$B_k = E_p(\hat{\theta} - \theta_N \mid I_k = 1).$$

(a) (7 marks) If $\theta_N \equiv t_y$ and $\hat{\theta} \equiv \hat{t}_{y,\pi}$, show that

$$B_k = \sum_{l \in U} \left(\frac{\pi_{kl}}{\pi_k \pi_l} - 1\right) y_l = (d_k - 1)\, y_k + \sum_{l \in U,\; l \neq k} \left(\frac{\pi_{kl}}{\pi_k \pi_l} - 1\right) y_l,$$

where $d_k = 1/\pi_k$. What is the value of $B_k$ when $\pi_k = 1$? Interpret this result.

(b) (3 marks) Express the design variance of $\hat{t}_{y,\pi}$ (for a fixed- or random-size sampling design) as a function of $B_k$, $k = 1, \ldots, N$.

(c) (5 marks) Give the expression of $B_k$ in the cases of simple random sampling without replacement and Bernoulli sampling.

(d) (5 marks) Suggest a conditionally p-unbiased estimator of $B_k$, $\hat{B}_k$, in the sense that

$$E_p(\hat{B}_k \mid I_k = 1) = B_k.$$

5. (10 marks) Let $S$ be a sample selected by a Bernoulli design with probability $\pi$. Let $n_s$ denote the random size of $S$. Show that the conditional probability of obtaining $S$ given $n_s$ is the same as the probability of a SRSWOR of the fixed size $n_s$ from a population of size $N$.

6. (15 marks) Let $S$ be a sample realized by the Bernoulli design with probability $\pi$.

(a) (7 marks) To estimate the population mean $\bar{Y} = N^{-1} \sum_{k \in U} y_k$, we consider the following estimator:

$$\bar{y}^* = \begin{cases} \bar{y} & \text{if } n_s \geq 1, \\ 0 & \text{if } n_s = 0, \end{cases}$$

where $\bar{y} = \frac{1}{n_s} \sum_{k \in S} y_k$ denotes the sample mean of the y-values in the sample. Show that the relative bias of $\bar{y}^*$ is given by

$$RB_p(\bar{y}^*) = \frac{E_p(\bar{y}^*) - \bar{Y}}{\bar{Y}} = -P(n_s = 0) = -(1 - \pi)^N.$$

Note that the relative bias of $\bar{y}^*$ is negligible if the population size $N$ is large.

(b) (8 marks) To estimate the population variability of the y-values, $S_y^2 = \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2$, we consider the following estimator:

$$s_y^{2*} = \begin{cases} s_y^2 & \text{if } n_s \geq 2, \\ 0 & \text{if } n_s \leq 1, \end{cases}$$

where $s_y^2 = \frac{1}{n_s - 1} \sum_{k \in S} (y_k - \bar{y})^2$. Show that the relative bias of $s_y^{2*}$ is given by

$$RB_p(s_y^{2*}) = \frac{E_p(s_y^{2*}) - S_y^2}{S_y^2} = -P(n_s \leq 1) = -(1 - \pi)^{N-1} \{1 + (N - 1)\pi\}.$$

Once again, note that the relative bias of $s_y^{2*}$ is negligible if the population size $N$ is large.

7. (40 marks) Suppose that we want to estimate a population total $t_y = \sum_{k \in U} y_k$. We select a simple random sample with replacement of fixed size $m$ from $U$. We propose the following estimator of $t_y$:

$$\hat{t}_{y,HH} = \frac{N}{m} \sum_{k \in U} Q_k y_k,$$

where $Q_k$ is the number of times that unit $k$ is selected in the sample, $k = 1, \ldots, N$.

(a) (5 marks) Let $\mathbf{Q} = (Q_1, \ldots, Q_N)$. What is the distribution of $\mathbf{Q}$? Deduce $E_p(Q_k)$, $V_p(Q_k)$ and $Cov_p(Q_k, Q_l)$ for $k \neq l$.

(b) (5 marks) Show that $\hat{t}_{y,HH}$ is design-unbiased for $t_y$.

(c) (5 marks) Show that

$$V_p(\hat{t}_{y,HH}) = \frac{N^2}{m} S_y^2,$$

where $S_y^2 = \frac{1}{N} \sum_{k \in U} (y_k - \bar{Y})^2$.

(d) (5 marks) Find a design-unbiased estimator of $V_p(\hat{t}_{y,HH})$.

(e) (20 marks) We consider the following alternative estimator of $t_y$:

$$\hat{t}_{y,alt} = \frac{N}{n_s} \sum_{k \in S} y_k,$$

where $n_s \leq m$ denotes the number of distinct elements and $S$ denotes the sample of distinct elements. Note that $n_s$ is a random variable.

(i) (5 marks) We can show that, given $n_s$, $S$ is a SRSWOR from the population. Use this fact to show that $\hat{t}_{y,alt}$ is design-unbiased for $t_y$.

(ii) (5 marks) Show that the variance of $\hat{t}_{y,alt}$ is given by

$$V_p(\hat{t}_{y,alt}) = N^2 E_{n_s}\!\left(\frac{1}{n_s} - \frac{1}{N}\right) S_y^2.$$

(iii) (10 marks) Consider the following variance estimators:

$$\hat{V}_1 = N^2 \left(\frac{1}{n_s} - \frac{1}{N}\right) s_y^2 \quad \text{and} \quad \hat{V}_2 = N E_{n_s}\!\left(\frac{1}{n_s} - \frac{1}{N}\right) \frac{n_s (n_s - 1)}{(N - 1)\lambda}\, s_y^2,$$

where

$$\lambda = 1 - 2\left(1 - \frac{1}{N}\right)^m + \left(1 - \frac{2}{N}\right)^m$$

and $s_y^2 = \frac{1}{n_s - 1} \sum_{k \in S} (y_k - \bar{y})^2$ with $\bar{y} = \frac{1}{n_s} \sum_{k \in S} y_k$. Show that both $\hat{V}_1$ and $\hat{V}_2$ are design-unbiased for $V_p(\hat{t}_{y,alt})$. Hint: You can use the following identities:

$$E_{n_s}(n_s) = N - N\left(1 - \frac{1}{N}\right)^m$$

and

$$V_{n_s}(n_s) = \frac{(N-1)^m}{N^{m-1}} + \frac{(N-1)(N-2)^m}{N^{m-1}} - \frac{(N-1)^{2m}}{N^{2m-2}}.$$

8. (40 marks) Construct a simulation program for comparing the performance of the Horvitz-Thompson estimator in terms of bias and efficiency under three sampling designs: simple random sampling without replacement, systematic sampling and Bernoulli sampling. Use the three populations (pop1, pop2 and pop3) provided. As a measure of bias, compute the Monte Carlo relative bias of the Horvitz-Thompson estimator and, as a measure of efficiency, compute its Monte Carlo mean square error. Compute the Monte Carlo design effect for each sampling design. Use n = 100. Discuss the Monte Carlo results. Are the results consistent with the theory?
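
The simulation described in question 8 could be organised along the following lines. This is only a minimal sketch under stated assumptions, not a prescribed solution: the files pop1, pop2 and pop3 are not reproduced here, so a synthetic population stands in for them; the function names (`ht_srswor`, `ht_systematic`, `ht_bernoulli`, `monte_carlo`) are hypothetical; the systematic design assumes that N is a multiple of n; and the Monte Carlo design effect is taken to be the ratio of a design's Monte Carlo MSE to that of SRSWOR, which is one common convention.

```python
import random


def ht_srswor(y, n, rng):
    """Horvitz-Thompson estimate of the total under SRSWOR (pi_k = n/N)."""
    N = len(y)
    return (N / n) * sum(rng.sample(y, n))


def ht_systematic(y, n, rng):
    """HT estimate under systematic sampling with a random start.

    Assumption: N is a multiple of n, so the interval a = N/n is an
    integer and every unit has inclusion probability 1/a.
    """
    a = len(y) // n
    r = rng.randrange(a)  # random start in {0, ..., a - 1}
    return a * sum(y[r::a])


def ht_bernoulli(y, n, rng):
    """HT estimate under Bernoulli sampling with pi = n/N (random size)."""
    pi = n / len(y)
    return sum(v for v in y if rng.random() < pi) / pi


def monte_carlo(y, n=100, reps=2000, seed=0):
    """Monte Carlo relative bias (in %), MSE and design effect per design."""
    rng = random.Random(seed)
    t_y = sum(y)
    designs = {
        "srswor": ht_srswor,
        "systematic": ht_systematic,
        "bernoulli": ht_bernoulli,
    }
    results = {}
    for name, draw in designs.items():
        est = [draw(y, n, rng) for _ in range(reps)]
        mean_est = sum(est) / reps
        results[name] = {
            "rb_percent": 100.0 * (mean_est - t_y) / t_y,
            "mse": sum((e - t_y) ** 2 for e in est) / reps,
        }
    # Assumed convention: design effect relative to the SRSWOR MSE.
    base = results["srswor"]["mse"]
    for r in results.values():
        r["deff"] = r["mse"] / base
    return results


if __name__ == "__main__":
    # Synthetic stand-in for pop1, pop2, pop3 (an assumption, not the real data).
    demo_rng = random.Random(1)
    population = [demo_rng.gammavariate(2.0, 50.0) for _ in range(2000)]
    for name, r in monte_carlo(population, n=100, reps=1000).items():
        print(name, r)
```

In practice, each of the three provided populations would be loaded into a list and passed to `monte_carlo` in turn. The ordering of the units matters for systematic sampling, so the file order should be kept as provided.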