辅导案例-CSCI 4622

Machine Learning CSCI 4622 Fall 2019 Prof. Claire Monteleoni Today: Lecture 10 •  Introduction to Ensemble Methods – Random Forests – Voted Perceptron – Boosting (if time) With slide credits: S. Dasgupta, T. Jaakkola, J. Shotton, C. Tan, and A. Torralba Review: (Binary) Decision Trees class c split nodes leaf nodes v ≥ < < ≥ Credit: Shotton, ICCV 09 Tutorial Distribution over class labels Speeding up Decision Tree Learning x y It is cumbersome to test all possible splits Try several random splits Keep the split that best separates data •  Reduces uncertainty Recurse Animation: Shotton, ICCV 09 Tutorial x y Speeding up Decision Tree Learning Shotton, ICCV 09 Tutorial It is cumbersome to test all possible splits Try several random splits Keep the split that best separates data •  Reduces uncertainty Recurse x y Speeding up Decision Tree Learning Shotton, ICCV 09 Tutorial It is cumbersome to test all possible splits Try several random splits Keep the split that best separates data •  Reduces uncertainty Recurse x y Speeding up Decision Tree Learning Shotton, ICCV 09 Tutorial Random Decision Tree It is cumbersome to test all possible splits Try several random splits Keep the split that best separates data •  Reduces uncertainty Recurse Random Decision Trees •  How many features and thresholds to try? –  just one: “extremely randomized” [Geurts et al. 06] –  few: fast training, may under-fit –  many: slower training, may over-fit •  When to stop growing the tree? –  maximum depth –  minimum entropy gain –  threshold changes in class distribution –  pruning •  A forest is an ensemble of several decision trees Classification: Decision Forests …… tree t1 tree tT class c class c split nodes leaf nodes v v Shotton, ICCV 09 Tutorial Learning a Forest •  Divide training examples into T subsets St –  improves generalization –  reduces memory requirements & training time •  Train each decision tree, t, on subset St –  same decision tree learning as before •  Easy to parallelize An ensemble classifier combines a set of weak “base” classifiers into a “strong” ensemble classifier. •  “boosted” performance •  more robust against overfitting •  Decision Forests, Random Forests [Breiman ‘01], Bagging •  Voted-Perceptron •  Boosting •  Learning with expert advice •  …. Ensemble methods Perceptron: nonseparable data What if data is not linearly separable? In this case: almost linearly separable… how will the perceptron perform? Batch perceptron Batch algorithm: w = 0 while some (xi,yi) is misclassified: w = w + yi xi Nonseparable data: this algorithm will never converge. How can this be fixed? Dream: somehow find the separator that misclassifies the fewest points… but this is NP-hard (in fact, even NP- hard to approximately solve). Fixing the batch perceptron Idea 1: only go through the data once, or a fixed number of times, K w = 0 for k = 1 to K for i = 1 to m if (xi,yi) is misclassified: w = w + yi xi At least this stops! Problem: the final w might not be good. Eg. right before terminating, the algorithm might perform an update on an outlier! Voted-perceptron Idea 2: keep around intermediate hypotheses, and have them “vote” [Freund and Schapire, 1998] n = 1 w1 = 0 c1 = 0 for k = 1 to K for i = 1 to m if (xi,yi) is misclassified: wn+1 = wn + yi xi cn+1 = 1 n = n + 1 else cn = cn + 1 At the end, a collection of linear separators w0, w1, w2, …, along with survival times: cn = amount of time that wn survived. Voted-perceptron Idea 2: keep around intermediate hypotheses, and have them “vote” [Freund and Schapire, 1998] At the end, a collection of linear separators w0, w1, w2, …, along with survival times: cn = amount of time that wn survived. This cn is a good measure of the reliability (or confidence) of wn. To classify a test point x, use a weighted majority vote: Voted-perceptron Problem: may need to keep around a lot of wn vectors. Solutions: (i)  Find “representatives” among the w vectors. (ii)  Alternative prediction rule: Just keep track of a running average, wavg 100 rounds, 1595 updates (5 errors) Final hypothesis: makes 5 errors for voting IRIS: features 3 and 4; goal: separate + from o/x Interesting questions Modify the (voted) perceptron algorithm to: [1] Find a linear separator with large margin [2] “Give up” on troublesome points after a while Voting margin and generalization • If we can obtain a large (positive) voting margin across the training examples, we will have better generalization guarantees (discussed later) fraction of points with positive margin fraction of points with margin >0.05 fraction of points with margin >0.1 iteration •  A simple algorithm for learning robust ensemble classifiers –  Freund & Shapire, 1995 –  Friedman, Hastie, Tibshhirani, 1998 •  Easy to implement, no external optimization tools needed. Boosting •  Defines a classifier using an additive model: Boosting Strong classifier Weak classifier Weight Data point •  Defines a classifier using an additive model: •  We need to define a family of weak classifiers –  E.g. linear classifiers, decision trees, or even decision stumps (threshold on one axis-parallel dimension) Boosting Strong classifier Weak classifier Weight Data point Each data point has a class label: – we initialize all wt =1 and a weight, wt. +1 ( ) -1 ( ) yt = Boosting •  Run sequentially on a batch of n data points xt=1 xt=2 xt Example using linear separators Weak learners from the family of lines This linear separator has error rate 50% and a weight: Torralba, ICCV 05 Short Course Each data point has a class label: +1 ( ) -1 ( ) yt = wt =1 Example using linear separators This one seems to be the best, call it f1 and a weight: This is a ‘weak classifier’: Its error rate is slightly less than 50%. Torralba, ICCV 05 Short Course wt =1 Each data point has a class label: +1 ( ) -1 ( ) yt = Example using linear separators – Re-weight the points such that the previous weak classifier now has 50% error – Iterate: find a weak classifier for this new problem We update the weights: Torralba, ICCV 05 Short Course Each data point xi has a class label: +1 ( ) -1 ( ) yi= wt wt exp{-yt fm(xt)} Example using linear separators We update the weights: Torralba, ICCV 05 Short Course – Re-weight the points such that the previous weak classifier now has 50% error – Iterate: find a weak classifier for this new problem Each data point xi has a class label: +1 ( ) -1 ( ) yi= wt wt exp{-yt fm(xt)} Example using linear separators We update the weights: Torralba, ICCV 05 Short Course – Re-weight the points such that the previous weak classifier now has 50% error – Iterate: find a weak classifier for this new problem Each data point xi has a class label: +1 ( ) -1 ( ) yi= wt wt exp{-yt fm(xt)} Example using linear separators We update the weights: Torralba, ICCV 05 Short Course – Re-weight the points such that the previous weak classifier now has 50% error – Iterate: find a weak classifier for this new problem Each data point xi has a class label: +1 ( ) -1 ( ) yi= wt wt exp{-yt fm(xt)} Example using linear separators The strong (non-linear) ensemble classifier is built as a weighted combination of all the weak (linear) classifiers. f1 f2 f3 f4 Torralba, ICCV 05 Short Course Boosting •  AdaBoost (Freund and Shapire, 1995) •  Real AdaBoost (Friedman et al, 1998) •  LogitBoost (Friedman et al, 1998) •  Gentle AdaBoost (Friedman et al, 1998) •  BrownBoosting (Freund, 2000) • 
FloatBoost (Li et al, 2002) •  … Mostly differ in choice of loss function and how it is minimized. Torralba, ICCV 05 Short Course Loss functions: motivation • Want a smooth upper bound on 0-1 training error. training loss 0-1 training error iteration Loss functions -1.5 -1 -0.5 0 0.5 1 1.5 2 0 0.5 1 1.5 2 2.5 3 3.5 4 Squared error Exponential loss yF(x) = margin Misclassification error Loss Squared error Exponential loss Torralba, ICCV 05 Short Course Boosting Sequential procedure. At each step we add For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998) to minimize the residual loss input Desired output Parameters weak classifier How to set the ensemble weights? •  Prediction on a new data point x is typically of the form: •  How to set the values? •  Depends on the algorithm. E.g. in AdaBoost: •  Where is the training error of on the (currently) weighted data set. F (x) = kX m=1 ↵mfm(x) ↵m ↵m = 1 2 ln 1 ✏m ✏m fm✏m Boosting example Boosting example Boosting example Boosting example Boosting example Bias-Variance Error Decomposition Assume the data (x,y) is drawn i.i.d. from D, s.t.: where: Then, for any data point (x0,y0), Irreducible error (can’t be changed by choice of h) where: NOTE: All expectations are w.r.t. the random training set h is learned from, NOT x0. y = f(x) + ✏ E[✏] = 0 Var(✏) = 2 AAACRnicbVBBSxtBGP02ttau1 ab16GVoKESEsBuElkJAKAWPCk0Usmv4dvJtHJydWWZmS8OSX9dL z976E3rpoUW8djZGqNoHA4/33sf3zctKKayLoh9Ba+3J0/VnG8 /DzRdb2y/br16PrK4MpyHXUpuzDC1JoWjohJN0VhrCIpN0ml1+b PzTL2Ss0Oqzm5eUFjhTIhccnZcm7TScD/Lu1z22zxIqrZBasSQJ E12SQaeNwoLqT4vxnZkOIu+z+/4IzaJ7l9gbJFbMCjzvh+Gk3Yl 60RLsMYlXpAMrHE/aV8lU86og5bhEa8dxVLq0RuMEl7QIk8pSif wSZzT2tNlu03pZw4K99cqU5dr4pxxbqv9O1FhYOy8ynyzQXdiHX iP+zxtXLn+f1kKVlSPFbxfllWROs6ZTNhWGuJNzT5Ab4W9l/AIN cuebb0qIH375MRn1e3HUi08OOocfVnVswC68gS7E8A4O4QiOYQ gcvsFP+A1/gu/Br+A6uLmNtoLVzA7cQwv+AjiasHA=AAACRnicbVBBSxtBGP02ttau1 ab16GVoKESEsBuElkJAKAWPCk0Usmv4dvJtHJydWWZmS8OSX9dL z976E3rpoUW8djZGqNoHA4/33sf3zctKKayLoh9Ba+3J0/VnG8 /DzRdb2y/br16PrK4MpyHXUpuzDC1JoWjohJN0VhrCIpN0ml1+b PzTL2Ss0Oqzm5eUFjhTIhccnZcm7TScD/Lu1z22zxIqrZBasSQJ E12SQaeNwoLqT4vxnZkOIu+z+/4IzaJ7l9gbJFbMCjzvh+Gk3Yl 60RLsMYlXpAMrHE/aV8lU86og5bhEa8dxVLq0RuMEl7QIk8pSif wSZzT2tNlu03pZw4K99cqU5dr4pxxbqv9O1FhYOy8ynyzQXdiHX iP+zxtXLn+f1kKVlSPFbxfllWROs6ZTNhWGuJNzT5Ab4W9l/AIN cuebb0qIH375MRn1e3HUi08OOocfVnVswC68gS7E8A4O4QiOYQ gcvsFP+A1/gu/Br+A6uLmNtoLVzA7cQwv+AjiasHA=AAACRnicbVBBSxtBGP02ttau1 ab16GVoKESEsBuElkJAKAWPCk0Usmv4dvJtHJydWWZmS8OSX9dL z976E3rpoUW8djZGqNoHA4/33sf3zctKKayLoh9Ba+3J0/VnG8 /DzRdb2y/br16PrK4MpyHXUpuzDC1JoWjohJN0VhrCIpN0ml1+b PzTL2Ss0Oqzm5eUFjhTIhccnZcm7TScD/Lu1z22zxIqrZBasSQJ E12SQaeNwoLqT4vxnZkOIu+z+/4IzaJ7l9gbJFbMCjzvh+Gk3Yl 60RLsMYlXpAMrHE/aV8lU86og5bhEa8dxVLq0RuMEl7QIk8pSif wSZzT2tNlu03pZw4K99cqU5dr4pxxbqv9O1FhYOy8ynyzQXdiHX iP+zxtXLn+f1kKVlSPFbxfllWROs6ZTNhWGuJNzT5Ab4W9l/AIN cuebb0qIH375MRn1e3HUi08OOocfVnVswC68gS7E8A4O4QiOYQ gcvsFP+A1/gu/Br+A6uLmNtoLVzA7cQwv+AjiasHA=AAACRnicbVBBSxtBGP02ttau1 ab16GVoKESEsBuElkJAKAWPCk0Usmv4dvJtHJydWWZmS8OSX9dL z976E3rpoUW8djZGqNoHA4/33sf3zctKKayLoh9Ba+3J0/VnG8 /DzRdb2y/br16PrK4MpyHXUpuzDC1JoWjohJN0VhrCIpN0ml1+b PzTL2Ss0Oqzm5eUFjhTIhccnZcm7TScD/Lu1z22zxIqrZBasSQJ E12SQaeNwoLqT4vxnZkOIu+z+/4IzaJ7l9gbJFbMCjzvh+Gk3Yl 60RLsMYlXpAMrHE/aV8lU86og5bhEa8dxVLq0RuMEl7QIk8pSif wSZzT2tNlu03pZw4K99cqU5dr4pxxbqv9O1FhYOy8ynyzQXdiHX iP+zxtXLn+f1kKVlSPFbxfllWROs6ZTNhWGuJNzT5Ab4W9l/AIN cuebb0qIH375MRn1e3HUi08OOocfVnVswC68gS7E8A4O4QiOYQ gcvsFP+A1/gu/Br+A6uLmNtoLVzA7cQwv+AjiasHA= Var[h(x0)] = E[h(x0) 2] E[h(x0)]2 AAACPHicdZBL S8NAFIUn9VXrK+rSzWAR6sKSF EERhIIILivaB6RpmEwn7dBJJsx MxBL6w9z4I9y5cuNCEbeuTdoI ttULA4fz3cude9yQUakM41nLLS wuLa/kVwtr6xubW/r2TkPySGB Sx5xx0XKRJIwGpK6oYqQVCoJ8l 5GmO7hIefOOCEl5cKuGIbF91A uoRzFSieXoN20eEoEUFwHySdx AYmT1S/eOcWifT6PLH9Cp2PDoH 2Z3KrDg6EWjbIwLzgszE0WQVc 3Rn9pdjiOfBAozJKVlGqGyYyQU xYyMCu1IkhDhAeoRK5HpSmnH4 +NH8CBxutDjInmBgmP390SMfCm Hvpt0+kj15SxLzb+YFSnv1I5p EEaKBHiyyIsYVBymScIuFQQrN kwEwoImf4W4jwTCKsk7DcGcPXl eNCpl0yib18fF6lkWRx7sgX1Q AiY4AVVwBWqgDjB4AC/gDbxrj9 qr9qF9TlpzWjazC6ZK+/oGQKi t8w==AAACPHicdZBL S8NAFIUn9VXrK+rSzWAR6sKSF EERhIIILivaB6RpmEwn7dBJJsx MxBL6w9z4I9y5cuNCEbeuTdoI ttULA4fz3cude9yQUakM41nLLS wuLa/kVwtr6xubW/r2TkPySGB Sx5xx0XKRJIwGpK6oYqQVCoJ8l 5GmO7hIefOOCEl5cKuGIbF91A uoRzFSieXoN20eEoEUFwHySdx AYmT1S/eOcWifT6PLH9Cp2PDoH 2Z3KrDg6EWjbIwLzgszE0WQVc 3Rn9pdjiOfBAozJKVlGqGyYyQU xYyMCu1IkhDhAeoRK5HpSmnH4 +NH8CBxutDjInmBgmP390SMfCm Hvpt0+kj15SxLzb+YFSnv1I5p EEaKBHiyyIsYVBymScIuFQQrN kwEwoImf4W4jwTCKsk7DcGcPXl eNCpl0yib18fF6lkWRx7sgX1Q AiY4AVVwBWqgDjB4AC/gDbxrj9 qr9qF9TlpzWjazC6ZK+/oGQKi t8w==AAACPHicdZBL S8NAFIUn9VXrK+rSzWAR6sKSF EERhIIILivaB6RpmEwn7dBJJsx MxBL6w9z4I9y5cuNCEbeuTdoI ttULA4fz3cude9yQUakM41nLLS wuLa/kVwtr6xubW/r2TkPySGB Sx5xx0XKRJIwGpK6oYqQVCoJ8l 5GmO7hIefOOCEl5cKuGIbF91A uoRzFSieXoN20eEoEUFwHySdx AYmT1S/eOcWifT6PLH9Cp2PDoH 2Z3KrDg6EWjbIwLzgszE0WQVc 3Rn9pdjiOfBAozJKVlGqGyYyQU xYyMCu1IkhDhAeoRK5HpSmnH4 +NH8CBxutDjInmBgmP390SMfCm Hvpt0+kj15SxLzb+YFSnv1I5p EEaKBHiyyIsYVBymScIuFQQrN kwEwoImf4W4jwTCKsk7DcGcPXl eNCpl0yib18fF6lkWRx7sgX1Q AiY4AVVwBWqgDjB4AC/gDbxrj9 qr9qF9TlpzWjazC6ZK+/oGQKi t8w==AAACPHicdZBL S8NAFIUn9VXrK+rSzWAR6sKSF EERhIIILivaB6RpmEwn7dBJJsx MxBL6w9z4I9y5cuNCEbeuTdoI ttULA4fz3cude9yQUakM41nLLS wuLa/kVwtr6xubW/r2TkPySGB Sx5xx0XKRJIwGpK6oYqQVCoJ8l 5GmO7hIefOOCEl5cKuGIbF91A uoRzFSieXoN20eEoEUFwHySdx AYmT1S/eOcWifT6PLH9Cp2PDoH 2Z3KrDg6EWjbIwLzgszE0WQVc 3Rn9pdjiOfBAozJKVlGqGyYyQU xYyMCu1IkhDhAeoRK5HpSmnH4 +NH8CBxutDjInmBgmP390SMfCm Hvpt0+kj15SxLzb+YFSnv1I5p EEaKBHiyyIsYVBymScIuFQQrN kwEwoImf4W4jwTCKsk7DcGcPXl eNCpl0yib18fF6lkWRx7sgX1Q AiY4AVVwBWqgDjB4AC/gDbxrj9 qr9qF9TlpzWjazC6ZK+/oGQKi t8w== The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again. = 2 +Var[h(x0)] + (Bias[h(x0)]) 2 AAACNHicbZDN SsNAFIUn/lv/oi7dDBahIpSkC IogiG4ENxVsFZoYbqbTdugkE2Y mYgl9KDc+iBsRXCji1mdw0hax 1QsDh/Pdy517woQzpR3nxZqanp mdm19YLCwtr6yu2esbdSVSSWi NCC7kTQiKchbTmmaa05tEUohCT q/D7lnOr++oVEzEV7qXUD+Cds xajIA2VmBfHGNPsXYEtxW8hz2 RUAlayBgimtVB9hud0n3g7PoGl sbpKQP1g3dvK4VCYBedsjMo/F e4I1FEo6oG9pPXFCSNaKwJB6Ua rpNoPwOpGeG0X/BSRRMgXWjTh pH5WuVng6P7eMc4TdwS0rxY44H 7eyKDSKleFJrOCHRHTbLc/I81 Ut069DMWJ6mmMRkuaqUca4HzB HGTSUo07xkBRDLzV0w6IIFok3M egjt58l9Rr5Rdp+xe7hdPjkZx LKAttI1KyEUH6ASdoyqqIYIe0D N6Q+/Wo/VqfVifw9YpazSzicb K+voG24OpEA==AAACNHicbZDN SsNAFIUn/lv/oi7dDBahIpSkC IogiG4ENxVsFZoYbqbTdugkE2Y mYgl9KDc+iBsRXCji1mdw0hax 1QsDh/Pdy517woQzpR3nxZqanp mdm19YLCwtr6yu2esbdSVSSWi NCC7kTQiKchbTmmaa05tEUohCT q/D7lnOr++oVEzEV7qXUD+Cds xajIA2VmBfHGNPsXYEtxW8hz2 RUAlayBgimtVB9hud0n3g7PoGl sbpKQP1g3dvK4VCYBedsjMo/F e4I1FEo6oG9pPXFCSNaKwJB6Ua rpNoPwOpGeG0X/BSRRMgXWjTh pH5WuVng6P7eMc4TdwS0rxY44H 7eyKDSKleFJrOCHRHTbLc/I81 Ut069DMWJ6mmMRkuaqUca4HzB HGTSUo07xkBRDLzV0w6IIFok3M egjt58l9Rr5Rdp+xe7hdPjkZx LKAttI1KyEUH6ASdoyqqIYIe0D N6Q+/Wo/VqfVifw9YpazSzicb K+voG24OpEA==AAACNHicbZDN SsNAFIUn/lv/oi7dDBahIpSkC IogiG4ENxVsFZoYbqbTdugkE2Y mYgl9KDc+iBsRXCji1mdw0hax 1QsDh/Pdy517woQzpR3nxZqanp mdm19YLCwtr6yu2esbdSVSSWi NCC7kTQiKchbTmmaa05tEUohCT q/D7lnOr++oVEzEV7qXUD+Cds xajIA2VmBfHGNPsXYEtxW8hz2 RUAlayBgimtVB9hud0n3g7PoGl sbpKQP1g3dvK4VCYBedsjMo/F e4I1FEo6oG9pPXFCSNaKwJB6Ua rpNoPwOpGeG0X/BSRRMgXWjTh pH5WuVng6P7eMc4TdwS0rxY44H 7eyKDSKleFJrOCHRHTbLc/I81 Ut069DMWJ6mmMRkuaqUca4HzB HGTSUo07xkBRDLzV0w6IIFok3M egjt58l9Rr5Rdp+xe7hdPjkZx LKAttI1KyEUH6ASdoyqqIYIe0D N6Q+/Wo/VqfVifw9YpazSzicb K+voG24OpEA==AAACNHicbZDN SsNAFIUn/lv/oi7dDBahIpSkC IogiG4ENxVsFZoYbqbTdugkE2Y mYgl9KDc+iBsRXCji1mdw0hax 1QsDh/Pdy517woQzpR3nxZqanp mdm19YLCwtr6yu2esbdSVSSWi NCC7kTQiKchbTmmaa05tEUohCT q/D7lnOr++oVEzEV7qXUD+Cds xajIA2VmBfHGNPsXYEtxW8hz2 RUAlayBgimtVB9hud0n3g7PoGl sbpKQP1g3dvK4VCYBedsjMo/F e4I1FEo6oG9pPXFCSNaKwJB6Ua rpNoPwOpGeG0X/BSRRMgXWjTh pH5WuVng6P7eMc4TdwS0rxY44H 7eyKDSKleFJrOCHRHTbLc/I81 Ut069DMWJ6mmMRkuaqUca4HzB HGTSUo07xkBRDLzV0w6IIFok3M egjt58l9Rr5Rdp+xe7hdPjkZx LKAttI1KyE
UH6ASdoyqqIYIe0D N6Q+/Wo/VqfVifw9YpazSzicb K+voG24OpEA== The image cannot be displayed. Your computer may not have enough memory to open the image, or the image may have been corrupted. Restart your computer, and then open the file again. If the red x still appears, you may have to delete the image and then insert it again. Understanding boosting • There are four different kinds of errors in the boosting procedure that we can try to understand -  weighted error that the base learner achieves at each iteration -  weighted error of the base learner relative to just updated weights (i.e., trying the same base learner again) -  training error of the ensemble as a function of the number of boosting iterations -  generalization error of the ensemble

辅导案例-CSCI 4622

Related

Previous Post辅导案例-ECS 171

Next Post辅导案例-QUIZ 3

Author admin