EC500 Introduction to Online Learning
Boston University, Francesco Orabona, Fall 2019

Homework 3
Due: Thursday, October 31, 3:45pm

Convex Analysis Problems

Problem 3.1 (2 points) Prove that the losses $\ell_t(x) = \frac{1}{2}(\langle z_t, x\rangle - y_t)^2$, where $\|z_t\|_2 \le 1$, $|y_t| \le 1$, and $V = \{x \in \mathbb{R}^d : \|x\|_2 \le 1\}$, are exp-concave, and find the exp-concavity constant.

Hint: A twice continuously differentiable function is convex on a convex set iff its Hessian is positive semidefinite on the interior of the convex set.

Online Convex Optimization Problems

Problem 3.2 (3 points) We saw that in Online Subgradient Descent it is possible to choose the learning rate to obtain an adaptive regret bound:
\[
\text{Regret}_T(u) \le \frac{\sqrt{2}}{2}\, D \sqrt{\sum_{t=1}^T \|g_t\|_2^2}\,. \tag{1}
\]
In this problem, you have to find the exact expression of a sequence of regularizers for FTRL with linearized losses that gives us the regret bound
\[
\text{Regret}_T(u) \le \left(\frac{\|u\|_2^2}{2\alpha} + \alpha\right)\sqrt{L^2 + \sum_{t=1}^T \|g_t\|_2^2}\,,
\]
where $\alpha > 0$ is a free parameter of the algorithm. Here are the precise details:
- The domain $V = \mathbb{R}^d$ is unbounded.
- The losses are $L$-Lipschitz w.r.t. the $L_2$ norm.
- You cannot assume knowledge of the future!

Problem 3.3 In this problem, we want to derive an FTRL-based online convex optimization algorithm for unbounded domains that depends on the $L_1$ norm of the competitor and the infinity norm of the gradients. Assumptions:
- Feasibility set $V = \mathbb{R}^d$, $d \ge 2$.
- $\|g_t\|_\infty \le L_\infty$.
- The number of iterations is unknown and you cannot use the doubling trick.

(a) (2 points) What kind of time-varying regularizer $\psi_t(x)$ gives us the desired regret upper bound?

(b) (1 point) Write the resulting regret upper bound using your chosen regularizer, justifying your answer.

Problem 3.4 (3 points) Assume that $\ell_t$ is $\mu_t$-strongly convex w.r.t. $\|\cdot\|_2$. We have said that the update of FTRL with "quadratized" losses is
\[
x^{\text{FTRL}}_t = \operatorname*{argmin}_{x} \sum_{i=1}^{t-1}\left(\langle g_i, x\rangle + \frac{\mu_i}{2}\|x - x^{\text{FTRL}}_i\|_2^2\right) = \frac{\sum_{i=1}^{t-1}\left(\mu_i x^{\text{FTRL}}_i - g_i\right)}{\sum_{i=1}^{t-1}\mu_i}\,. \tag{2}
\]
On the other hand, OSD with learning rate $\eta_t = \frac{1}{\sum_{i=1}^t \mu_i}$ is
\[
x^{\text{OSD}}_{t+1} = x^{\text{OSD}}_t - \frac{g_t}{\sum_{i=1}^t \mu_i}\,. \tag{3}
\]
Prove that if $x^{\text{OSD}}_1 = 0$, we have that $x^{\text{FTRL}}_t = x^{\text{OSD}}_t$ for $t = 1, \dots, T$.

Problem 3.5 (3 points) In Online Newton Step (ONS), the prediction at round $t$ is given by
\[
x_t = \operatorname*{argmin}_{x \in V}\; \sum_{i=1}^{t-1}\langle g_i, x\rangle + \frac{\lambda}{2}\|x\|_2^2 + \frac{1}{2}\sum_{i=1}^{t-1}\|x - x_i\|_{A_i}^2\,.
\]
Define $S_t = \lambda I + \sum_{i=1}^t A_i$. Prove that the above is equivalent to the following two steps:
\[
\tilde{x}_t = S_{t-1}^{-1}\left(\sum_{i=1}^{t-1} A_i x_i - \sum_{i=1}^{t-1} g_i\right),
\qquad
x_t = \operatorname*{argmin}_{x \in V}\; \|x - \tilde{x}_t\|_{S_{t-1}}\,.
\]
Hint: you might want to take a look at the lectures on OMD.

Problem 3.6 Consider the losses $\ell_t(x) = -\ln(1 + z_t x)$ with $V = \{x \in \mathbb{R} : |x| \le 1/2\}$ and $|z_t| \le 1$, $t = 1, \dots, T$. These kinds of losses appear when you want to solve the online coin-betting game that we mentioned in the lecture on lower bounds.

(a) (2 points) Prove that these losses are exp-concave and find the best exp-concavity index.

(b) (2 points) Write the closed formula for the update rule of ONS over these losses.

Problem 3.7 We saw that it is possible to choose $p = 2\ln d$ to have a regret upper bound of $O(\|u\|_1 L_\infty \sqrt{T \ln d})$ for any $u \in \mathbb{R}^d$. However, we could do even better! In fact, imagine that all the subgradients we receive have only two coordinates different from 0. If we knew that from the beginning, we could have tuned $p$ differently and obtained a regret that does not depend on $d$ but just on 2. Of course, knowing the future is not a possibility, so we will design an adaptive strategy to obtain the same guarantee in an adaptive way. We will assume that the losses are of the form $\ell_t(x) = \ell(\langle z_t, x\rangle, y_t)$, where $\ell$ is convex and 1-Lipschitz in the first argument. Also, assume that we get to see $z_t$ before producing $x_t$ and that $\|z_t\|_\infty \le L_\infty$. Set $q_1 = 2$ and $q_t = \max(q_{t-1}, 2\ln\|z_t\|_0)$ for $t \ge 2$. Consider the regularizer
\[
\psi_t(x) = \frac{L_\infty \sqrt{q_t t}}{2}\,\|x\|_{\frac{q_t}{q_t - 1}}^2\,.
\]

(a) (1 point) Prove that $\psi_t$ is increasing in $t$.

(b) (2 points) Prove that $\|g_t\|_{\frac{q_t}{q_t - 1}}^2 \le O(q_t)$.
(c) (2 points) Prove the following regret upper bound for FTRL with linearized losses and regularizer $\psi_t$:
\[
\sum_{t=1}^T \ell(\langle z_t, x_t\rangle, y_t) - \sum_{t=1}^T \ell(\langle z_t, u\rangle, y_t) \le K\left(\|u\|_1^2 + 1\right) L_\infty \sqrt{T \ln \tilde{d}}\,,
\]
where $K$ is some universal constant, $\tilde{d} = \max_{t=1,\dots,T} \|z_t\|_0$, and $\|\cdot\|_0$ counts the number of non-zero elements in a vector.
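Although the equivalence claimed in Problem 3.4 must be proved analytically (induction on $t$ is a natural route), a quick numerical check can catch indexing mistakes in your argument. The sketch below, assuming NumPy, runs both updates on the same arbitrary sequence of subgradients and strong-convexity constants (the random values are purely illustrative, not part of the assignment) and compares the iterates:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 20, 3
g = rng.normal(size=(T, d))          # subgradients g_1, ..., g_T (illustrative)
mu = rng.uniform(0.5, 2.0, size=T)   # strong-convexity constants mu_1, ..., mu_T

# OSD with x_1 = 0 and learning rate eta_t = 1 / (mu_1 + ... + mu_t), as in (3)
x_osd = [np.zeros(d)]
for t in range(T):
    x_osd.append(x_osd[-1] - g[t] / mu[: t + 1].sum())

# FTRL with quadratized losses, closed form (2):
# x_t = sum_{i<t} (mu_i x_i - g_i) / sum_{i<t} mu_i, with x_1 = 0
x_ftrl = [np.zeros(d)]
for t in range(1, T + 1):
    num = sum(mu[i] * x_ftrl[i] - g[i] for i in range(t))
    x_ftrl.append(num / mu[:t].sum())

# The two sequences of iterates coincide, matching the claim of Problem 3.4
assert all(np.allclose(a, b) for a, b in zip(x_osd, x_ftrl))
```

A check like this does not replace the proof, but it makes the induction step easy to trust: unrolling one FTRL update with the closed form above reproduces exactly the OSD step.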