
STA261 – Lecture 8
Interval Estimation: Part II
Rob Zimmerman, University of Toronto
July 29, 2020

Example 9.3.1: Optimizing the Length of a Confidence Interval

Let $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma^2$ known. Among all the $1 - \alpha$ confidence intervals for $\mu$, can we find the one with the minimum length?

From last lecture, we know that to find a $1 - \alpha$ confidence interval, we can use the pivot $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ and find constants $a$ and $b$ such that $P(a \le Z \le b) = 1 - \alpha$, which yields the $1 - \alpha$ confidence interval
$$\left[ \bar{X} - b\,\frac{\sigma}{\sqrt{n}},\; \bar{X} - a\,\frac{\sigma}{\sqrt{n}} \right].$$
Absorbing the constant $\frac{\sigma}{\sqrt{n}}$ into $a$ and $b$, this is the same as finding $a$ and $b$ such that $1 - \alpha = P(a \le Z \le b)$ and $(b - a)$ is minimized.

Previously, we chose $a = -z_{\alpha/2}$ and $b = z_{\alpha/2}$ without any particular justification, although the symmetry of the Normal cdf lends some intuition to this choice. It turns out that splitting the probability $\alpha$ equally is optimal in this case, although that is not always so.

Theorem 9.3.2: Shortest Intervals for Unimodal Densities

Let $f(x)$ be a unimodal pdf. Suppose the interval $[a, b]$ satisfies the following three conditions:
1. $\int_a^b f(x)\,dx = 1 - \alpha$;
2. $f(a) = f(b) > 0$;
3. $a \le x^* \le b$, where $x^*$ is a mode of $f(x)$.
Then $[a, b]$ is the shortest among all intervals that satisfy $\int_a^b f(x)\,dx = 1 - \alpha$.

Proof. Let $[a', b']$ be any interval with shorter length (that is, $b' - a' < b - a$). Suppose also that $a' \le a$. Our goal is to show that $\int_{a'}^{b'} f(x)\,dx < 1 - \alpha$. We break the proof into two cases: one for $b' \le a$, and the other for $b' > a$.

For the first case, suppose that $b' \le a$.
Then $a' \le b' \le a \le x^*$, so that
$$\begin{aligned}
\int_{a'}^{b'} f(x)\,dx &\le \int_{a'}^{b'} f(b')\,dx && \text{since } x \le b' \le x^* \implies f(x) \le f(b') \\
&= f(b')(b' - a') \\
&\le f(a)(b' - a') && \text{since } b' \le a \le x^* \implies f(b') \le f(a) \\
&< f(a)(b - a) \\
&= \int_a^b f(a)\,dx \\
&\le \int_a^b f(x)\,dx && \text{since } f(a) \le f(x) \text{ for } x \in [a, b], \text{ by Conditions 2 and 3} \\
&= 1 - \alpha.
\end{aligned}$$

For the second case, now suppose that $b' > a$. Then $a' \le a < b' < b$, so that
$$\int_{a'}^{b'} f(x)\,dx = \underbrace{\int_a^b f(x)\,dx}_{=\,1-\alpha} + \left[ \int_{a'}^{a} f(x)\,dx - \int_{b'}^{b} f(x)\,dx \right].$$

If we can show that the expression in the square brackets above is negative, then we'll be done. To that end, the unimodality of $f$ and $a' \le a < b' < b$ give us
$$\int_{a'}^{a} f(x)\,dx \le \int_{a'}^{a} f(a)\,dx = f(a)(a - a')
\quad \text{and} \quad
\int_{b'}^{b} f(x)\,dx \ge \int_{b'}^{b} f(b)\,dx = f(b)(b - b').$$
Therefore, the expression in the square brackets satisfies
$$\int_{a'}^{a} f(x)\,dx - \int_{b'}^{b} f(x)\,dx \le f(a)(a - a') - f(b)(b - b').$$
Condition 2 forces the term on the right to be equal to
$$\underbrace{f(a)}_{>\,0} \, \underbrace{\left[ (b' - a') - (b - a) \right]}_{<\,0} < 0.$$

All of that was for $a' \le a$. If $a' > a$, the proof proceeds along the same lines (with most of the weak inequalities becoming strict). $\square$

Theorem: Shortest Intervals for Symmetric Unimodal Densities

Suppose that $X \sim f(x)$, where $f(x)$ is a symmetric, unimodal, and continuous pdf. Of all the intervals $[a, b]$ which satisfy $\int_a^b f(x)\,dx = 1 - \alpha$, the shortest is obtained by choosing $a$ and $b$ such that $P(X \le a) = P(X > b) = \frac{\alpha}{2}$.

Proof. Let $x^*$ be the mode of $f(x)$, and suppose that $\int_{-\infty}^{a} f(x)\,dx = \frac{\alpha}{2}$ for some $a$. Since the integral is positive and $f(x)$ is unimodal, we must have that $f(a) > 0$. Next, we show that the only possible value of $b$ that makes $\int_a^b f(x)\,dx = 1 - \alpha$ is $b = 2x^* - a$. To that end, the substitution $y = 2x^* - x$ below gives us
$$\frac{\alpha}{2} = \int_{-\infty}^{a} f(x)\,dx = -\int_{\infty}^{2x^* - a} f(2x^* - y)\,dy = -\int_{\infty}^{b} f(x^* + (x^* - y))\,dy.$$
Now, since $f(x)$ is symmetric around its mode $x^*$, we must have that $f(x^* + z) = f(x^* - z)$ for any $z \in \mathbb{R}$. In particular, taking $z = x^* - y$ gives us $f(x^* + (x^* - y)) = f(x^* - (x^* - y)) = f(y)$, so the integral on the right is equal to
$$-\int_{\infty}^{b} f(y)\,dy = \int_{b}^{\infty} f(x)\,dx.$$

That is, $\frac{\alpha}{2} = \int_b^{\infty} f(x)\,dx$, and $b$ is the unique value for which this is true, since $f(x)$ is unimodal and $f(b) = f(2x^* - a) = f(x^* + (x^* - a)) = f(a) > 0$.

Finally, it is clear that $a \le x^*$, since we have $P(X \le a) = \frac{\alpha}{2} \le \frac{1}{2} = P(X \le x^*)$. Similarly, we must also have $x^* \le b$.

To summarize, choosing $a$ and $b$ such that $P(X \le a) = P(X > b) = \frac{\alpha}{2}$ implies the following:
1. $\int_a^b f(x)\,dx = 1 - \alpha$;
2. $f(a) = f(b) > 0$;
3. $a \le x^* \le b$, where $x^*$ is a mode of $f(x)$.
Thus, the three conditions of Theorem 9.3.2 are satisfied, and it follows that $[a, b]$ is the shortest among all intervals which satisfy $\int_a^b f(x)\,dx = 1 - \alpha$. $\square$

Example 9.3.3: Optimizing Expected Length

Let $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma^2 > 0$. By Theorem 9.3.2, we now know that our observed $1 - \alpha$ confidence interval for $\mu$ given by
$$\left[ \bar{x} - b\,\frac{s}{\sqrt{n}},\; \bar{x} - a\,\frac{s}{\sqrt{n}} \right]$$
has shortest length when $a = -b = -t_{n-1,\alpha/2}$. But this is after seeing the data; what about minimizing the length of the actual $1 - \alpha$ confidence interval
$$\left[ \bar{X} - b\,\frac{S}{\sqrt{n}},\; \bar{X} - a\,\frac{S}{\sqrt{n}} \right]?$$
Its length is $(b - a)\frac{S}{\sqrt{n}}$, which is random. Therefore, we might consider trying to minimize the expected length
$$E_{\sigma^2}\left[ (b - a)\frac{S}{\sqrt{n}} \right] = \frac{b - a}{\sqrt{n}}\, E_{\sigma^2}[S].$$
One can show that
$$E_{\sigma^2}[S] = \sigma \sqrt{\frac{2}{n-1}}\, \frac{\Gamma(n/2)}{\Gamma((n-1)/2)},$$
so the expected length is $(b - a)\, c(n)\, \frac{\sigma}{\sqrt{n}}$ for a function $c(\cdot)$ of $n$ alone. Subject to the $1 - \alpha$ constraint, $c(n)\frac{\sigma}{\sqrt{n}}$ is a constant, so minimizing $(b - a)\, c(n)\, \frac{\sigma}{\sqrt{n}}$ is exactly the same as minimizing $b - a$ itself, and we still get $a = -b = -t_{n-1,\alpha/2}$ from Theorem 9.3.2.
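The formula for $E_{\sigma^2}[S]$ above is easy to sanity-check numerically. The sketch below (illustrative only; the sample size, $\sigma$, replication count, and seed are arbitrary choices) compares the exact expression against a Monte Carlo average of the sample standard deviation $S$:

```python
import math
import random

def expected_S(n, sigma):
    # Exact formula: E[S] = sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
    return sigma * math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def simulate_mean_S(n, sigma, reps, seed=0):
    # Monte Carlo check: average the sample standard deviation S over many
    # iid N(0, sigma^2) samples of size n (the mean mu is irrelevant to S)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # unbiased sample variance
        total += math.sqrt(s2)
    return total / reps
```

For $n = 5$ and $\sigma = 2$ the simulated average should agree closely with the exact value; note also that $E_{\sigma^2}[S] < \sigma$, with the gap shrinking as $n$ grows.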
Example 9.3.4: Shortest Gamma Pivotal Interval

Let $X \sim \Gamma(k, \frac{1}{\beta})$, so $Y = X/\beta \sim \Gamma(k, 1)$ is a pivot. Can we apply Theorem 9.3.2 to $Y$ to get the shortest $1 - \alpha$ confidence interval for $\beta$?

Not directly. The theorem would seem to imply that we want constants $a$ and $b$ to satisfy $P(a \le Y \le b) = 1 - \alpha$ and $f_Y(a) = f_Y(b)$. However, the resulting confidence interval for $\beta$ is of the form $\left[ \frac{X}{b}, \frac{X}{a} \right]$, and its length is proportional to $\frac{1}{a} - \frac{1}{b} = \frac{b - a}{ab}$, which is definitely not the same as $(b - a)$. So blindly applying Theorem 9.3.2 wouldn't give us a minimum-length confidence interval for $\beta$.

However, the general idea can be rescued. Observe that the only unknowns in Condition 1 of Theorem 9.3.2 are $a$ and $b$, so we can formally consider $b = b(a)$ as a smooth function of $a$. Thus, the problem of finding the shortest $1 - \alpha$ confidence interval based on the pivot $Y$ turns into the optimization problem
$$\min_a \left( \frac{1}{a} - \frac{1}{b(a)} \right) \quad \text{subject to} \quad \int_a^{b(a)} f_Y(y)\,dy = 1 - \alpha.$$
A bit of calculus shows that this implies $f_Y(b) \cdot b^2 = f_Y(a) \cdot a^2$. Thus, the problem reduces to minimizing $\frac{b - a}{ab}$ subject to $f_Y(b) \cdot b^2 = f_Y(a) \cdot a^2$, and this can usually be solved using numerical methods (if not analytically).

Definition: Probability of False Coverage

Let $(X_1, X_2, \ldots, X_n) \sim f(\mathbf{x} \mid \theta)$, and suppose that $C(\mathbf{X})$ is a $1 - \alpha$ confidence set for the parameter $\theta$. For a parameter value $\theta' \ne \theta$, the probability of false coverage for $C(\mathbf{X})$ is the function
$$P_\theta\left( \theta' \in C(\mathbf{X}) \right), \quad \text{when } C(\mathbf{X}) = [L(\mathbf{X}), U(\mathbf{X})] \text{ and } \theta' \ne \theta;$$
$$P_\theta\left( \theta' \in C(\mathbf{X}) \right), \quad \text{when } C(\mathbf{X}) = [L(\mathbf{X}), \infty) \text{ and } \theta' < \theta;$$
$$P_\theta\left( \theta' \in C(\mathbf{X}) \right), \quad \text{when } C(\mathbf{X}) = (-\infty, U(\mathbf{X})] \text{ and } \theta' > \theta.$$
That is, the probability of false coverage is the probability that $C(\mathbf{X})$ covers another value $\theta'$ when the true parameter is $\theta$.
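To make the definition concrete, here is a minimal sketch for the familiar known-$\sigma$ normal setting (the parameter values in the usage below are arbitrary): since $\bar{X} \sim N(\mu, \sigma^2/n)$, the probability that the usual two-sided interval covers a false value $\mu'$ has a closed form.

```python
from statistics import NormalDist

def false_coverage_two_sided(mu_true, mu_prime, sigma, n, alpha):
    # P_{mu_true}( mu_prime in [Xbar - z*sigma/sqrt(n), Xbar + z*sigma/sqrt(n)] )
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / n ** 0.5
    # mu_prime is covered iff mu_prime - half <= Xbar <= mu_prime + half,
    # and Xbar ~ N(mu_true, sigma^2 / n)
    xbar = NormalDist(mu_true, sigma / n ** 0.5)
    return xbar.cdf(mu_prime + half) - xbar.cdf(mu_prime - half)
```

At $\mu' = \mu$ this returns the coverage probability $1 - \alpha$, and it decreases as $\mu'$ moves away from the true mean.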
Definition: Uniformly Most Accurate Confidence Set

A $1 - \alpha$ confidence set $C(\mathbf{X})$ is called a uniformly most accurate (UMA) confidence set if it minimizes the probability of false coverage over all $1 - \alpha$ confidence sets. That is, if $P_\theta(\theta' \in C(\mathbf{X}))$ is the probability of false coverage for $C(\mathbf{X})$, then
$$P_\theta\left( \theta' \in C(\mathbf{X}) \right) \le P_\theta\left( \theta' \in C^*(\mathbf{X}) \right) \quad \text{for all } \theta, \theta' \in \Omega,$$
where $C^*(\mathbf{X})$ is any other $1 - \alpha$ confidence set.

Theorem 9.3.5: One-Sided UMP Tests Yield One-Sided UMA Bounds

Let $(X_1, X_2, \ldots, X_n) \sim f(\mathbf{x} \mid \theta)$, where $\theta \in \Omega \subseteq \mathbb{R}$. For each $\theta_0 \in \Omega$, let $A^*(\theta_0)$ be the UMP level-$\alpha$ acceptance region of a test of $H_0: \theta = \theta_0$ versus $H_A: \theta > \theta_0$, and let $C^*(\mathbf{X})$ be the $1 - \alpha$ confidence set formed by inverting the UMP acceptance regions. Then for any other $1 - \alpha$ confidence set $C(\mathbf{X})$, we have
$$P_\theta\left( \theta' \in C^*(\mathbf{X}) \right) \le P_\theta\left( \theta' \in C(\mathbf{X}) \right) \quad \text{for all } \theta' < \theta.$$

Proof. Fix $\theta' < \theta$, and let $A(\theta')$ be the acceptance region of the level-$\alpha$ test of $H_0: \theta = \theta'$ formed by inverting $C(\mathbf{X})$, which exists by Theorem 9.2.2. Since $A^*(\theta')$ is the UMP acceptance region for testing $H_0: \theta = \theta'$ versus $H_A: \theta > \theta'$, we must have that
$$P_\theta\left( \theta' \in C^*(\mathbf{X}) \right) = P_\theta\left( \mathbf{X} \in A^*(\theta') \right) \le P_\theta\left( \mathbf{X} \in A(\theta') \right) = P_\theta\left( \theta' \in C(\mathbf{X}) \right),$$
where the inequality holds because $A^*$ corresponds to a UMP test, $\theta$ lies in the alternative $\{\theta : \theta > \theta'\}$, and
$$P_\theta\left( \mathbf{X} \in A^*(\theta') \right) = 1 - P_\theta\left( \mathbf{X} \in R^*(\theta') \right) = 1 - \beta^*(\theta) \le 1 - \beta(\theta).$$
Here $\beta^*(\cdot)$ and $\beta(\cdot)$ are the power functions of the tests corresponding to $A^*(\theta')$ and $A(\theta')$, respectively. $\square$

Example 9.3.6: Normal UMA Confidence Bounds

Let $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma^2$ known. An analysis similar to that of Example 8.3.18 shows that the UMP level-$\alpha$ test of $H_0: \mu = \mu_0$ versus $H_A: \mu > \mu_0$ has acceptance region
$$A(\mu_0) = \left\{ \mathbf{x} : \bar{x} \le \mu_0 + \frac{\sigma}{\sqrt{n}} z_\alpha \right\},$$
and inverting this leads to the $1 - \alpha$ lower confidence bound
$$C(\mathbf{X}) = \left[ \bar{X} - \frac{\sigma}{\sqrt{n}} z_\alpha,\ \infty \right).$$
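A quick numerical illustration of what the UMA property buys here. The single-observation competitor below is a hypothetical alternative, not from the lecture: $[X_1 - z_\alpha \sigma, \infty)$ is also an exact $1 - \alpha$ lower bound, but it wastes the other $n - 1$ observations, and the inverted-UMP bound achieves strictly smaller false coverage at every $\mu' < \mu$.

```python
from statistics import NormalDist

def false_cov_uma(mu, mu_prime, sigma, n, alpha):
    # UMA bound [Xbar - z_alpha*sigma/sqrt(n), inf):
    # mu_prime is covered iff Xbar <= mu_prime + z_alpha*sigma/sqrt(n),
    # where Xbar ~ N(mu, sigma^2/n) under the true mean mu
    z = NormalDist().inv_cdf(1 - alpha)
    return NormalDist(mu, sigma / n ** 0.5).cdf(mu_prime + z * sigma / n ** 0.5)

def false_cov_single_obs(mu, mu_prime, sigma, alpha):
    # Hypothetical competitor based on X1 alone: [X1 - z_alpha*sigma, inf);
    # it also has exact 1 - alpha coverage at mu_prime = mu
    z = NormalDist().inv_cdf(1 - alpha)
    return NormalDist(mu, sigma).cdf(mu_prime + z * sigma)
```

Both functions return $1 - \alpha$ at $\mu' = \mu$; for $\mu' < \mu$, the UMA bound's false-coverage probability is the smaller of the two.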
By Theorem 9.3.5, this must be a $1 - \alpha$ UMA lower confidence bound for $\mu$, since it was obtained by inverting a UMP level-$\alpha$ acceptance region. On the other hand, the classic two-sided interval that we've already seen many times,
$$C'(\mathbf{X}) = \left[ \bar{X} - \frac{\sigma}{\sqrt{n}} z_{\alpha/2},\ \bar{X} + \frac{\sigma}{\sqrt{n}} z_{\alpha/2} \right],$$
is not a UMA confidence interval for $\mu$, since it was obtained by inverting the two-sided acceptance region of $H_0: \mu = \mu_0$ versus $H_A: \mu \ne \mu_0$, and we showed in Example 8.3.19 that no UMP test exists for that pair of hypotheses.

Theorem 9.3.9: Pratt's Theorem

Let $T \sim g(t \mid \theta)$, $\theta \in \Omega \subseteq \mathbb{R}$, where $g(t \mid \theta)$ is continuous. Let $C(T) = [L(T), U(T)]$ be an interval estimator for $\theta$. If $U(t)$ and $L(t)$ are both increasing in $t$, then for any $\theta^* \in \Omega$,
$$E_{\theta^*}\left[ \mathrm{Length}(C(T)) \right] = \int_{\theta \ne \theta^*} P_{\theta^*}\left( \theta \in C(T) \right) d\theta.$$
That is, the expected length of an interval estimator is the integral of the probabilities of false coverage, taken over all "false" values of the parameter.

Proof. Since $L(\cdot)$ and $U(\cdot)$ are both increasing, we can invert between the parameter region and the sample region:
$$\theta \in C(T) \iff \theta \in [L(T), U(T)] \iff T \in [U^{-1}(\theta), L^{-1}(\theta)].$$
We have that
$$E_{\theta^*}\left[ \mathrm{Length}(C(T)) \right] = \int_{\mathcal{T}} [U(t) - L(t)]\, g(t \mid \theta^*)\,dt = \int_{\mathcal{T}} \left[ \int_{L(t)}^{U(t)} d\theta \right] g(t \mid \theta^*)\,dt.$$
Now, since the integrand is nonnegative, we can swap the order of integration by Fubini's theorem:
$$\int_{\mathcal{T}} \left[ \int_{L(t)}^{U(t)} d\theta \right] g(t \mid \theta^*)\,dt
= \int_{\Omega} \left[ \int_{U^{-1}(\theta)}^{L^{-1}(\theta)} g(t \mid \theta^*)\,dt \right] d\theta
= \int_{\Omega} P_{\theta^*}\left( T \in [U^{-1}(\theta), L^{-1}(\theta)] \right) d\theta
= \int_{\Omega} P_{\theta^*}\left( \theta \in C(T) \right) d\theta
= \int_{\theta \ne \theta^*} P_{\theta^*}\left( \theta \in C(T) \right) d\theta.$$
The last equality holds because leaving out a single point of the region of integration does not change the value of the integral. $\square$
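In the known-$\sigma$ normal case, the two-sided interval has deterministic length $2 z_{\alpha/2}\sigma/\sqrt{n}$, so Pratt's identity can be checked by direct numerical integration of the false-coverage probabilities. A sketch (the grid half-width and step count are arbitrary numerical choices):

```python
from statistics import NormalDist

def pratt_check(sigma=1.0, n=4, alpha=0.05, grid_halfwidth=6.0, steps=20000):
    # Two-sided known-sigma interval [Xbar - z*sigma/sqrt(n), Xbar + z*sigma/sqrt(n)]:
    # its length is deterministic, and by Pratt's theorem it should equal the
    # integral over theta' of the coverage probability P_theta(theta' in C(X)).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma / n ** 0.5
    length = 2 * half
    xbar = NormalDist(0.0, sigma / n ** 0.5)  # sampling distribution of Xbar; true theta = 0
    # Midpoint-rule integral of the coverage probability over a wide grid of theta' values
    lo = -grid_halfwidth
    dt = 2 * grid_halfwidth / steps
    integral = sum(
        xbar.cdf(lo + (i + 0.5) * dt + half) - xbar.cdf(lo + (i + 0.5) * dt - half)
        for i in range(steps)
    ) * dt
    return length, integral
```

The integral of the coverage probabilities over $\theta'$ matches the interval's length to several decimal places (the excluded single point $\theta' = \theta^*$ makes no difference, exactly as the proof notes).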