May 15, 2020

Data Mining MATH5836 Quiz 1
Statistics and Linear Algebra Revision

1. Suppose that $X, Y, Z$ are random variables such that $Y = \mathbb{E}[X \mid Z]$. Show that
\[ \mathrm{Var}(Y) \le \mathrm{Var}(X). \]

2. The skewness of a random variable $X$ with mean $\mu$ and standard deviation $\sigma$ is defined as $\mathbb{E}(X - \mu)^3/\sigma^3$. Suppose that $X$ has zero skewness.
   (a) Show that
   \[ \mathbb{E}X^2 = \mathbb{E}[(X - \mu)^2 + \mu^2], \]
   so that both $X^2$ and $(X - \mu)^2 + \mu^2$ are unbiased estimators of the same quantity.
   (b) Show that
   \[ \mathrm{Var}(X^2) \ge \mathrm{Var}((X - \mu)^2 + \mu^2). \]
   By what factor is the variance of $(X - \mu)^2 + \mu^2$ smaller than the variance of $X^2$?

3. Show the formula for the expectation:
\[ \mathbb{E}|X| = \int_0^\infty \mathbb{P}[|X| > t]\,\mathrm{d}t. \]
Hint: For a positive random variable $X$ with density $f_X$, use the fact that
\[ \int_0^\infty \mathbb{P}[X > t]\,\mathrm{d}t = \int_0^\infty \int_t^\infty f_X(y)\,\mathrm{d}y\,\mathrm{d}t = \int_0^\infty \int_0^y f_X(y)\,\mathrm{d}t\,\mathrm{d}y. \]

4. Let $X \sim \mathcal{N}(0, A)$ and $Y \sim \mathcal{N}(0, B)$ be two independent multivariate vectors. The Hadamard product of $A$ and $B$ is the matrix $C = A \odot B$ such that $c_{ij} = a_{ij} \times b_{ij}$. Suppose that $Z = X \odot Y$ (so that $Z_i = X_i \times Y_i$). Show that
\[ \mathrm{Cov}(Z) = C. \]

5. Suppose that $A$ and $B$ are positive definite matrices of the same size. Explain why $A \odot B$ is also a positive definite matrix. (This result is known as Schur's Product Theorem.)

6. Let $\Sigma = Q D Q^\top$ be the diagonalization of the covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$ (see Theorem A.8 on page 366), where $D = \mathrm{diag}(\lambda)$ is a diagonal matrix with the eigenvalues of $\Sigma$. Show that
\[ \begin{pmatrix} \Sigma_{11} \\ \vdots \\ \Sigma_{nn} \end{pmatrix} = (Q \odot Q)\,\lambda. \]

7. (Page 360 in book) The Moore-Penrose pseudo-inverse of a real matrix $X \in \mathbb{R}^{n \times p}$ is defined as the matrix $X^+$ satisfying the conditions:
   (a) $X X^+ X = X$
   (b) $X^+ X X^+ = X^+$
   (c) $(X X^+)^\top = X X^+$
   (d) $(X^+ X)^\top = X^+ X$
Note that such a matrix always exists. Suppose that $\hat\beta = X^+ y$. Show that for any $\beta \in \mathbb{R}^p$,
\[ \|X\hat\beta - y\|_2^2 \le \|X\beta - y\|_2^2. \]
In other words, show that $X\hat\beta$ is the projection of $y$ onto the space spanned by the columns of $X$. What is an explicit expression for $X^+$ when $X$ is full rank?

8. (Section A.6.4 on page 375 in book) Let $x_1, \ldots, x_p$ be linearly independent vectors in $\mathbb{R}^n$, with the inner product between any two vectors defined here as $\langle x, y\rangle = x^\top y$ (the norm is defined via $\|x\|^2 = \langle x, x\rangle$).
Consider the following Gram-Schmidt recursive procedure for $k = 1, \ldots, p$:
\[ z_k = x_k - \sum_{j=1}^{k-1} \langle x_k, u_j\rangle\, u_j, \qquad u_k = z_k / \|z_k\|. \]
Show that $u_1, \ldots, u_p$ are orthonormal, that is, $\langle u_i, u_j\rangle = \mathbb{I}\{i = j\}$ (the last being the indicator function of the condition $i = j$). Show that the matrix $X$ with columns $x_1, \ldots, x_p$ can be written as $X = QR$, where
\[ R^\top = \begin{pmatrix}
\langle x_1, u_1\rangle & 0 & 0 & \cdots & 0 \\
\langle x_2, u_1\rangle & \langle x_2, u_2\rangle & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\langle x_p, u_1\rangle & \langle x_p, u_2\rangle & \langle x_p, u_3\rangle & \cdots & \langle x_p, u_p\rangle
\end{pmatrix} \]
and $Q = [u_1, \ldots, u_p]$ is an $n \times p$ matrix. This is called the QR decomposition of the matrix $X$. Show that the least squares solution can be written as
\[ \hat\beta = R^{-1} Q^\top Y. \]

9. Let $A$ be an $n \times n$ matrix and $X$ an $n$-dimensional random vector with expectation $\mu$ and covariance matrix $\Sigma$. Show that $\mathbb{E}[X^\top A X] = \mu^\top A \mu + \mathrm{tr}(A\Sigma)$ (see Theorem C.2 on page 430).

10. Suppose that
\[ X = \begin{pmatrix} X_p \\ X_q \end{pmatrix} \sim \mathcal{N}(\mu, \Sigma), \]
and that we partition $X$, $\mu$ and $\Sigma$ as in (C.29) in Theorem C.7 (page 436 in textbook). Show that
\[ X_q + \Sigma_r^\top \Sigma_p^{-1} (x_p - X_p) \]
has the same distribution as $(X_q \mid X_p = x_p)$ (see Theorem C.8 on page 436).
Hint: Write
\[ X_q - \Sigma_r^\top \Sigma_p^{-1} X_p = \begin{bmatrix} -\Sigma_r^\top \Sigma_p^{-1} & I_q \end{bmatrix} X \]
and calculate the covariance matrix of the right-hand side.
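The Hadamard-product covariance identity of Question 4 can be checked by simulation. A minimal sketch using NumPy; the matrices $A$ and $B$ below are arbitrary illustrative choices, and the tolerance reflects Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(3)
# Arbitrary positive definite covariance matrices (hypothetical example values)
A = np.array([[1.0, 0.3], [0.3, 2.0]])
B = np.array([[1.5, -0.4], [-0.4, 1.0]])

# Independent samples X ~ N(0, A), Y ~ N(0, B)
N = 500_000
X = rng.multivariate_normal([0.0, 0.0], A, size=N)
Y = rng.multivariate_normal([0.0, 0.0], B, size=N)

# Elementwise (Hadamard) product Z_i = X_i * Y_i
Z = X * Y

# Sample covariance of Z should approximate A ⊙ B
assert np.allclose(np.cov(Z.T), A * B, atol=0.05)
```

The key step in the proof mirrors the code: by independence, $\mathrm{Cov}(Z_i, Z_j) = \mathbb{E}[X_i X_j]\,\mathbb{E}[Y_i Y_j] = a_{ij} b_{ij}$.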
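The projection property of Question 7 can also be verified numerically. A sketch assuming NumPy, with a randomly generated full-rank $X$; `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((8, 3))   # full column rank almost surely
y = rng.standard_normal(8)

Xp = np.linalg.pinv(X)            # Moore-Penrose pseudo-inverse X^+
beta_hat = Xp @ y

# For full-rank X, X^+ = (X^T X)^{-1} X^T
assert np.allclose(Xp, np.linalg.solve(X.T @ X, X.T))

# X beta_hat is the projection of y: the residual is orthogonal to col(X)
assert np.allclose(X.T @ (X @ beta_hat - y), 0)

# Any other beta gives a residual at least as large
beta = beta_hat + rng.standard_normal(3)
assert np.linalg.norm(X @ beta_hat - y) <= np.linalg.norm(X @ beta - y)
```

The orthogonality check is the heart of the exercise: once $X^\top(X\hat\beta - y) = 0$, the Pythagorean identity gives the inequality for every $\beta$.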
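The Gram-Schmidt construction and the least squares formula of Question 8 can be sketched directly in NumPy (the dimensions $n = 6$, $p = 3$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.standard_normal((n, p))

# Gram-Schmidt as in Question 8: z_k = x_k - sum_j <x_k, u_j> u_j, u_k = z_k/||z_k||
Q = np.zeros((n, p))
for k in range(p):
    z = X[:, k] - Q[:, :k] @ (Q[:, :k].T @ X[:, k])
    Q[:, k] = z / np.linalg.norm(z)

# R = Q^T X is upper triangular, since <x_j, u_k> = 0 for k > j
R = Q.T @ X
assert np.allclose(Q.T @ Q, np.eye(p))   # columns of Q are orthonormal
assert np.allclose(Q @ R, X)             # X = QR

# Least squares via beta_hat = R^{-1} Q^T y, compared against np.linalg.lstsq
y = rng.standard_normal(n)
beta_hat = np.linalg.solve(R, Q.T @ y)
assert np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0])
```

Classical Gram-Schmidt as written is numerically fragile for ill-conditioned $X$; library routines such as `np.linalg.qr` use Householder reflections instead, but the textbook recursion is the clearest way to see why $R$ is triangular.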
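The quadratic-form identity of Question 9 can be checked by Monte Carlo. A sketch with hypothetical example values for $\mu$, $A$ and $\Sigma$; the tolerance is generous relative to the sampling error at this sample size:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

# Theoretical value from Question 9: mu^T A mu + tr(A Sigma)
theory = mu @ A @ mu + np.trace(A @ Sigma)

# Monte Carlo estimate of E[X^T A X] with X ~ N(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=200_000)
estimate = np.mean(np.einsum('ij,jk,ik->i', X, A, X))

assert abs(estimate - theory) / abs(theory) < 0.05
```

The proof in the exercise follows the same decomposition: write $X = \mu + (X - \mu)$, expand, and use $\mathbb{E}[(X-\mu)^\top A (X-\mu)] = \mathrm{tr}(A\Sigma)$.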
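Finally, the conditioning result of Question 10 can be tested empirically: applying the transform $X_q + \Sigma_r^\top \Sigma_p^{-1}(x_p - X_p)$ to unconditional samples should reproduce the conditional moments of Theorem C.8. A sketch assuming $\mu = 0$ and a hypothetical $2{+}1$ partition:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical covariance with a 2+1 partition: Sigma_p, Sigma_r, Sigma_q
Sigma = np.array([[2.0, 0.6, 0.8],
                  [0.6, 1.5, 0.4],
                  [0.8, 0.4, 1.0]])
mu = np.zeros(3)
Sp = Sigma[:2, :2]   # Sigma_p
Sr = Sigma[:2, 2:]   # Sigma_r
Sq = Sigma[2:, 2:]   # Sigma_q
xp = np.array([1.0, -0.5])

# Apply the transform of Question 10 to samples X ~ N(0, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=400_000)
Xp, Xq = X[:, :2], X[:, 2:]
W = Xq + (Sr.T @ np.linalg.solve(Sp, xp[:, None] - Xp.T)).T

# Conditional moments: mean Sr^T Sp^{-1} xp, covariance Sq - Sr^T Sp^{-1} Sr
cond_mean = Sr.T @ np.linalg.solve(Sp, xp)
cond_cov = Sq - Sr.T @ np.linalg.solve(Sp, Sr)
assert np.allclose(W.mean(axis=0), cond_mean, atol=0.01)
assert np.allclose(np.cov(W.T), cond_cov, atol=0.01)
```

This matches the hint: $X_q - \Sigma_r^\top \Sigma_p^{-1} X_p$ is a linear image of $X$, is independent of $X_p$ (their cross-covariance vanishes), and has covariance $\Sigma_q - \Sigma_r^\top \Sigma_p^{-1} \Sigma_r$.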