
Tutoring Case: ECSE4500/6500

May 15, 2020

ECSE4500/6500 Distributed Systems and Sensor Networks
Midterm Exam
Due: Friday, 6 PM, March 6, 2020
Submit via email to Glory Justin ([email protected]) and cc me ([email protected]).

1. Logistic regression (20%)

We consider logistic regression for a 2-class classification setting. Let $\{x_n, y_n\}_{n=1}^N$ denote the training data with $x_n \in \mathbb{R}^d$ and $y_n \in \{0, 1\}$.

(a) (10%) Assume the class-conditional distributions are members of an exponential family, so that

$$p(x_n \mid y_n = 0) = \exp(\eta_0^\top x_n)\, g(\eta_0)\, h(x_n) \quad \text{and} \quad p(x_n \mid y_n = 1) = \exp(\eta_1^\top x_n)\, g(\eta_1)\, h(x_n) \qquad (1)$$

where $\eta_0, \eta_1$ are given vectors and $h, g$ are given functions. Show that the log-odds over the class posterior distributions is affine, which is the assumption logistic regression makes, i.e.,

$$\log\!\left(\frac{p(x_n \mid y_n = 0)}{p(x_n \mid y_n = 1)}\right) = w^\top x_n + w_0 \qquad (2)$$

where $w, w_0$ are the sought parameters.

(b) (10%) The sparse logistic regression formulation considers an $\ell_1$-norm regularized version of standard logistic regression, and the loss function is given by

$$L(w) := \sum_{n=1}^N \left[ -y_n x_n^\top w + \log\left(1 + \exp(x_n^\top w)\right) \right] + \lambda \|w\|_1 \qquad (3)$$

where $\|w\|_1 := \sum_{i=1}^d |w_i|$. Show that $L(w)$ is a convex function of $w$.

2. Programming assignment (30%)

The $\ell_2$-regularized logistic regression is formulated as

$$\min_w \sum_{m=1}^M L_m(w) \quad \text{with} \quad L_m(w) := \sum_{n \in \mathcal{N}_m} \log\left(1 + \exp(-y_n x_n^\top w)\right) + \frac{\lambda}{2} \|w\|_2^2 \qquad (4)$$

where $\{x_n, y_n\}_{n \in \mathcal{N}_m}$ are the training data given at worker $m$ with $x_n \in \mathbb{R}^d$ and $y_n \in \{-1, +1\}$, $M$ is the total number of workers, and $\lambda = 0.01$ is the regularization constant. Assume that $M = 10$, and distribute the entire dataset uniformly over the $M$ workers. Note that the logistic regression objective in (4) differs from that in (3) because of the different labeling convention.

2.1 Dataset

The Adult dataset contains samples for predicting whether a person makes over 50K a year based on features in $x_n$ such as work-class, education, and marital status. Download: https://archive.ics.uci.edu/ml/datasets/Adult

2.2 Algorithms

Implement the following algorithms (illustrative sketches follow the exam text):

1. Distributed implementation of the gradient descent algorithm.
2. Distributed gradient descent with stochastic quantization (2 bits per entry).
3. Distributed gradient descent with deterministic quantization (2 bits per entry).
4. Distributed gradient descent with stochastic sparsification (uniform sparsification probability).
5. Distributed gradient descent with deterministic sparsification (top-2 sparsification).

2.3 Submitted results

1. Well-documented MATLAB/Python code. (10%)
2. Plots of iteration index vs. objective error for all algorithms. (5%)
3. Plots of number of communication bits vs. objective error for all algorithms. (5%)
4. Brief discussion of your findings. (10%)
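A worked-equation hint for Problem 1(a): taking the log of the ratio of the two class-conditional densities in (1), the $h(x_n)$ factors cancel, which is essentially the computation the problem asks you to write out. A minimal sketch:

$$\log\frac{p(x_n \mid y_n = 0)}{p(x_n \mid y_n = 1)} = \log\frac{\exp(\eta_0^\top x_n)\, g(\eta_0)\, h(x_n)}{\exp(\eta_1^\top x_n)\, g(\eta_1)\, h(x_n)} = (\eta_0 - \eta_1)^\top x_n + \log\frac{g(\eta_0)}{g(\eta_1)},$$

so the log-ratio is affine in $x_n$, with $w = \eta_0 - \eta_1$ and $w_0 = \log\big(g(\eta_0)/g(\eta_1)\big)$.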
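For Problem 1(b), one possible (not necessarily the intended) route is to view (3) as a smooth term plus the convex term $\lambda \|w\|_1$ and check the Hessian of the smooth part:

$$\nabla_w^2 \sum_{n=1}^N \left[ -y_n x_n^\top w + \log\left(1 + \exp(x_n^\top w)\right) \right] = \sum_{n=1}^N \sigma(x_n^\top w)\left(1 - \sigma(x_n^\top w)\right) x_n x_n^\top, \qquad \sigma(z) := \frac{1}{1 + e^{-z}}.$$

Each term is a nonnegative multiple of the positive semidefinite matrix $x_n x_n^\top$, so the Hessian is positive semidefinite and the smooth part is convex; adding $\lambda \|w\|_1$, a nonnegative multiple of a norm, preserves convexity.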
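For the programming assignment in Problem 2, the following is a minimal Python/NumPy sketch of the plain distributed gradient descent baseline (algorithm 1 in Section 2.2). It assumes the Adult data have already been loaded into a numeric feature matrix X with labels y in {-1, +1}; the simulated worker loop, the step size alpha, and the helper names (local_gradient, global_objective, run_distributed_gd) are illustrative choices, not part of the exam.

```python
import numpy as np

def local_gradient(w, X_m, y_m, lam):
    """Gradient of L_m(w) = sum_n log(1 + exp(-y_n x_n^T w)) + (lam/2)||w||_2^2."""
    margins = y_m * (X_m @ w)                       # y_n x_n^T w for the local samples
    sigma = 1.0 / (1.0 + np.exp(margins))           # sigmoid(-y_n x_n^T w)
    grad_loss = -(X_m * (y_m * sigma)[:, None]).sum(axis=0)
    return grad_loss + lam * w                      # each L_m carries its own regularizer, as in (4)

def global_objective(w, X, y, lam, M):
    """sum_m L_m(w): full logistic loss plus M copies of the (lam/2)||w||^2 term."""
    return np.logaddexp(0.0, -y * (X @ w)).sum() + M * 0.5 * lam * (w @ w)

def run_distributed_gd(X, y, M=10, lam=0.01, alpha=1e-4, iters=200, seed=0):
    """Simulated server/worker loop: workers send local gradients, the server sums
    them and takes one gradient step per round."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    shards = np.array_split(rng.permutation(N), M)  # uniform split over M workers
    w = np.zeros(d)
    history = []
    for _ in range(iters):
        grads = [local_gradient(w, X[idx], y[idx], lam) for idx in shards]
        w = w - alpha * np.sum(grads, axis=0)       # aggregate and update
        history.append(global_objective(w, X, y, lam, M))
    return w, history
```

One-hot encoding of the categorical Adult features, adding a bias column, and tuning the step size are left open; the compressed variants in Section 2.2 can reuse the same loop with a compression operator applied to each worker's gradient before it is summed.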
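The quantized and sparsified variants only change what each worker transmits. Below is a hedged sketch of the four compression operators under one common reading of the exam's wording: 2-bit quantization onto 4 uniformly spaced levels per entry, an element-wise keep probability p for stochastic sparsification, and top-2 magnitude selection for deterministic sparsification. The level grid, the value of p, and the function names are assumptions; the instructor may intend a specific scheme.

```python
import numpy as np

def quantize_stochastic(g, bits=2, rng=None):
    """Unbiased stochastic quantization: map each entry onto 2^bits uniformly
    spaced levels spanning [min(g), max(g)], rounding up or down at random so
    the expectation equals the original entry."""
    rng = np.random.default_rng() if rng is None else rng
    levels = 2 ** bits
    lo, hi = g.min(), g.max()
    step = (hi - lo) / (levels - 1) + 1e-12
    pos = (g - lo) / step
    floor = np.floor(pos)
    q = floor + (rng.random(g.shape) < (pos - floor))   # round up w.p. (pos - floor)
    return lo + q * step

def quantize_deterministic(g, bits=2):
    """Deterministic quantization: round each entry to the nearest of the
    2^bits uniform levels spanning [min(g), max(g)]."""
    levels = 2 ** bits
    lo, hi = g.min(), g.max()
    step = (hi - lo) / (levels - 1) + 1e-12
    return lo + np.round((g - lo) / step) * step

def sparsify_stochastic(g, p=0.2, rng=None):
    """Stochastic sparsification: keep each entry independently with probability p
    and rescale by 1/p so the result is an unbiased estimate of g."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(g.shape) < p
    return np.where(mask, g / p, 0.0)

def sparsify_top2(g, k=2):
    """Deterministic top-k sparsification: keep the k largest-magnitude entries."""
    out = np.zeros_like(g)
    keep = np.argsort(np.abs(g))[-k:]
    out[keep] = g[keep]
    return out
```

Each operator can be applied to a worker's gradient inside run_distributed_gd before the server sums the gradients; tallying the transmitted bits per round (e.g., 2 bits per entry for the quantizers, full-precision values plus indices for the sparsifiers) is what produces the communication-bits-versus-objective-error plots requested in Section 2.3.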
