Active Learning and the Importance of Feedback in Sampling - Rui Castro, Rebecca Willett, and Robert Nowak

TRANSCRIPT

Active Learning and the Importance of Feedback in Sampling
Rui Castro, Rebecca Willett, and Robert Nowak

Motivation
The game of twenty questions. Goal: accurately learn a concept, as fast as possible, by strategically focusing on regions of interest. Learning by asking carefully chosen questions, constructed using information gleaned from previous observations.

Active Sampling in Regression
Passive sampling: sample locations are chosen a priori, before any observations are made.
Active sampling: sample locations are chosen as a function of previous observations.

Problem Formulation: Passive vs. Active
Observations: Y_i = f(X_i) + W_i, i = 1, ..., n.
Passive sampling: the locations X_1, ..., X_n are chosen independently of the observations Y_1, ..., Y_n.
Active sampling: each location X_i may depend on all previous samples and observations, X_1, ..., X_{i-1} and Y_1, ..., Y_{i-1}.

Estimation and Sampling Strategies
Goal: make the expected squared error E[ ||f̂_n - f||^2 ] as small as possible. The estimator f̂_n maps the n collected samples to a function estimate; the sampling strategy specifies how each sample location is chosen.

Classical Smoothness Spaces
Functions with homogeneous complexity over the entire domain: the Hölder smooth function class.

Smooth Functions: Minimax Lower Bound
Theorem (Castro, Willett & Nowak '05): for Hölder smooth functions, active sampling cannot beat the passive minimax rate n^(-2α/(2α+d)). The performance one can achieve with active learning is the same achievable with passive learning!

Inhomogeneous Functions
Homogeneous functions: spread-out complexity. Inhomogeneous functions: localized complexity. The relevant features of inhomogeneous functions are very localized in space, making active sampling promising.

Piecewise Constant Functions, d ≥ 2
Piecewise constant functions whose pieces are separated by a (d-1)-dimensional boundary; the best possible rate, even with active sampling, is n^(-1/(d-1)).

Passive Learning in the PC Class
Estimation using Recursive Dyadic Partitions (RDPs):
distribute sample points uniformly over [0,1]^d;
recursively divide the domain into hypercubes;
decorate each partition set with a constant;
prune the partition, adapting to the data.

RDP-based Algorithm
Choose an RDP that fits the data well but is not overly complicated: minimize the empirical risk (which measures fit to the data) plus a complexity penalty. This estimator can be computed efficiently using a tree-pruning algorithm (a code sketch follows the summary below).

Error Bounds
Oracle bounding techniques, akin to the work of Barron '91, can be used to upper bound the performance of our estimator by an approximation error term plus a complexity penalty term; the rate follows by balancing the two terms.

Active Sampling in the PC Class
Key: learn the location of the boundary. Use recursive dyadic partitions to find the boundary.

Active Sampling in the PC Class, Stage 1: oversample at coarse resolution
n/2 samples uniformly distributed. Limit the resolution: with many more samples than cells, the cell estimates are biased but have very low variance (high approximation error, but low estimation error), so the boundary zone is reliably detected.

Active Sampling in the PC Class, Stage 2: critically sample in the boundary zone
n/2 samples uniformly distributed within the boundary zone; construct a fine partition around the boundary; prune the partition according to standard multiscale methods. The result is a high-resolution estimate of the boundary. (Both stages are sketched in code after the summary below.)

Main Theorem
Main Theorem (Castro '05): for piecewise constant functions with cusp-free* boundaries, the two-stage active method attains the rate n^(-1/(d-1)), up to logarithmic factors.
* Cusp-free boundaries cannot behave like the graph of |x|^(1/2) at the origin, but milder kinks like |x| at 0 are allowable.

Sketch of the Proof: Approach
Controlling the bias is not a problem after a shift. Potential problem area: cells intersecting the boundary may be pruned if the boundary is aligned with a cell edge. Solution: repeat Stage 1 d more times, using d slightly offset partitions; small cells remaining in any of the d+1 partitions are passed on to Stage 2.

Multi-Stage Approach
Iterating the approach yields an L-step method; as L grows, its rate approaches the minimax lower bound n^(-1/(d-1)).

Learning PC Functions: Summary
Passive sampling: rate n^(-1/d). Active sampling: rate n^(-1/(d-1)), up to logarithmic factors. These rates are nearly achieved using RDP-based estimators, which are easily implemented and have low computational complexity.
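To make the RDP-based algorithm concrete, the following is a minimal sketch of the penalized tree-pruning estimator in d = 2: fit a constant on every dyadic square and keep a split only when the children's total penalized cost beats the pruned cell. It illustrates the idea and is not the authors' code; the penalty weight lam, the depth cap, and the toy target are assumptions for the demo.

```python
import numpy as np

def fit_rdp(x, y, lam, max_depth, lo=(0.0, 0.0), size=1.0):
    """Penalized RDP fit on the square [lo, lo+size)^2.

    Returns (cost, leaves), where cost = empirical risk + lam * (#leaves)
    and leaves is a list of (lower_corner, side_length, constant)."""
    if len(y) == 0:
        return lam, [(lo, size, 0.0)]            # empty cell stays a leaf
    mean = y.mean()
    leaf_cost = np.sum((y - mean) ** 2) + lam    # empirical risk + penalty
    if max_depth == 0:
        return leaf_cost, [(lo, size, mean)]
    half = size / 2.0
    split_cost, split_leaves = 0.0, []
    for dx in (0.0, half):                       # the four dyadic children
        for dy in (0.0, half):
            clo = (lo[0] + dx, lo[1] + dy)
            m = ((x[:, 0] >= clo[0]) & (x[:, 0] < clo[0] + half) &
                 (x[:, 1] >= clo[1]) & (x[:, 1] < clo[1] + half))
            c, l = fit_rdp(x[m], y[m], lam, max_depth - 1, clo, half)
            split_cost += c
            split_leaves += l
    if split_cost < leaf_cost:                   # prune unless splitting pays off
        return split_cost, split_leaves
    return leaf_cost, [(lo, size, mean)]

def predict(leaves, pt):
    for (lx, ly), s, m in leaves:
        if lx <= pt[0] < lx + s and ly <= pt[1] < ly + s:
            return m
    return 0.0

# toy piecewise constant target: 1 above the line y = 0.3 + 0.4 x, else 0
rng = np.random.default_rng(0)
x = rng.random((4000, 2))
y = (x[:, 1] > 0.3 + 0.4 * x[:, 0]).astype(float) + 0.2 * rng.standard_normal(4000)
cost, leaves = fit_rdp(x, y, lam=0.5, max_depth=5)
print(len(leaves), "leaves; estimate at (0.9, 0.9):", round(predict(leaves, (0.9, 0.9)), 2))
```

The leaf-versus-split comparison is the dynamic program that makes the penalized empirical-risk minimizer computable in a single bottom-up tree-pruning pass.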
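The two-stage active procedure can be sketched in the same style: Stage 1 spends half the budget on a deliberately coarse partition (many samples per cell, so jumps between neighboring cell averages reveal the boundary zone), and Stage 2 spends the other half only inside that zone at finer resolution. Again a minimal sketch rather than the authors' code; the grid size k, the threshold tau, and the refinement factor are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: (x[:, 1] > 0.3 + 0.4 * x[:, 0]).astype(float)   # unknown target
sigma, n, k, refine, tau = 0.2, 20000, 8, 4, 0.5

def cell_means(pts, vals, m):
    """Average of the samples falling in each cell of an m-by-m grid."""
    idx = np.minimum((pts * m).astype(int), m - 1)
    sums, cnts = np.zeros((m, m)), np.zeros((m, m))
    np.add.at(sums, (idx[:, 0], idx[:, 1]), vals)
    np.add.at(cnts, (idx[:, 0], idx[:, 1]), 1)
    return sums / np.maximum(cnts, 1)

# Stage 1: n/2 uniform samples on a deliberately coarse k-by-k partition,
# so every cell average has many samples and hence very low variance.
x1 = rng.random((n // 2, 2))
coarse = cell_means(x1, f(x1) + sigma * rng.standard_normal(n // 2), k)

# Flag the boundary zone: cells whose average disagrees with a neighbour.
flag = np.zeros((k, k), dtype=bool)
jump_i = np.abs(coarse[1:, :] - coarse[:-1, :]) > tau
flag[1:, :] |= jump_i; flag[:-1, :] |= jump_i
jump_j = np.abs(coarse[:, 1:] - coarse[:, :-1]) > tau
flag[:, 1:] |= jump_j; flag[:, :-1] |= jump_j
cells = np.argwhere(flag)
assert len(cells) > 0, "no boundary detected; lower tau or increase n"

# Stage 2: the remaining n/2 samples go uniformly into the flagged cells
# only, and a finer partition is estimated there.
pick = cells[rng.integers(len(cells), size=n // 2)]
x2 = (pick + rng.random((n // 2, 2))) / k
fine = cell_means(x2, f(x2) + sigma * rng.standard_normal(n // 2), k * refine)

def estimate(pt):
    """Coarse average off the boundary zone, fine average inside it."""
    i, j = min(int(pt[0] * k), k - 1), min(int(pt[1] * k), k - 1)
    if not flag[i, j]:
        return coarse[i, j]
    kf = k * refine
    return fine[min(int(pt[0] * kf), kf - 1), min(int(pt[1] * kf), kf - 1)]

print("far from boundary:", round(estimate(np.array([0.9, 0.1])), 2))
print("near the boundary:", round(estimate(np.array([0.5, 0.6])), 2))
```

The design point is feedback: where the n/2 Stage 2 samples land depends on what Stage 1 observed, which is exactly what a passive scheme cannot do.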
Spatial Adaptivity and Active Learning
Spatially adaptive estimators based on sparse model selection (e.g., wavelet thresholding) may provide automatic mechanisms for guiding active learning processes. Instead of choosing where to sample, one can also choose where to compute, to actively reduce computation. Can active learning provably work in even more realistic situations, under little or no prior assumptions?

Piecewise Constant Functions, d = 1
Consider first the simplest non-homogeneous function class: the step function. This is a parametric class.
Passive sampling: distribute sample points uniformly over [0,1] and use a maximum likelihood estimator.
Active sampling: choose each sample location using the previous observations, homing in on the step by bisection.

Learning Rates, d = 1
Passive sampling: error of order 1/n. Active sampling: error decaying exponentially in n (Burnashev & Zigangirov '74). (Code sketches of both strategies follow the proof sketches below.)

Sketch of the Proof, Stage 1
Intuition tells us that the estimation error is the error we should experience away from the boundary; on top of it comes the error due to the approximation of the boundary regions.

Sketch of the Proof, Stage 1
Key: limit the resolution of the RDPs to cells of side length 1/k. The estimation error then matches the performance away from the boundary.

Sketch of the Proof, Stage 1
Are we finding more than the boundary? Lemma: at the least, we are not detecting too many areas outside the boundary.

Sketch of the Proof, Stage 2
n/2 more samples are distributed uniformly over the boundary zone; this bounds the total error contribution from the boundary zone.

Sketch of the Proof: Overall Error
The overall error is the error away from the boundary plus the error in the boundary region; balancing the two terms yields the rate of the main theorem.
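To make the d = 1 rate comparison tangible, here is a passive baseline: sample on a fixed uniform grid and fit the step by least squares, which is the maximum likelihood estimator under Gaussian noise. A minimal sketch assuming a unit-height step and Gaussian noise; sigma and the sample budgets are illustrative choices.

```python
import numpy as np

def passive_step_fit(theta, n, sigma, rng):
    """Least-squares (= Gaussian ML) change-point fit from a fixed design."""
    x = (np.arange(n) + 0.5) / n                 # grid fixed before any data
    y = (x > theta).astype(float) + sigma * rng.standard_normal(n)
    csum = np.concatenate(([0.0], np.cumsum(y)))
    best_s, best_score = 0, -np.inf
    for s in range(n + 1):                       # candidate split after sample s
        nl, nr = s, n - s
        ml = csum[s] / nl if nl else 0.0
        mr = (csum[n] - csum[s]) / nr if nr else 0.0
        score = nl * ml ** 2 + nr * mr ** 2      # maximizing this minimizes the RSS
        if score > best_score:
            best_s, best_score = s, score
    return best_s / n                            # estimated step location

rng = np.random.default_rng(2)
theta = 0.6180339
for n in (100, 1000, 10000):
    errs = [abs(passive_step_fit(theta, n, 0.5, rng) - theta) for _ in range(20)]
    print(n, "mean |error|:", f"{np.mean(errs):.5f}")   # shrinks roughly like 1/n
```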
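For the active strategy in the same d = 1 setting, here is a sketch of probabilistic bisection in the spirit of Burnashev & Zigangirov '74: maintain a posterior over the step location, query at the point where the posterior mass splits in half, and reweight using the known noise level. This is a minimal sketch of the idea, not the procedure analyzed in the talk; the grid discretization, the flip probability p, and the budgets are illustrative assumptions.

```python
import numpy as np

def active_step_fit(theta, n, p, rng, grid=50000):
    """Probabilistic bisection: query at the posterior median each round."""
    centers = (np.arange(grid) + 0.5) / grid
    post = np.full(grid, 1.0 / grid)             # prior over the step location
    for _ in range(n):
        q = centers[np.searchsorted(np.cumsum(post), 0.5)]   # posterior median
        right = q > theta                        # noiseless answer to "is q past the step?"
        if rng.random() < p:
            right = not right                    # answer is flipped with probability p
        like = np.where(centers < q,
                        1 - p if right else p,   # candidates left of q
                        p if right else 1 - p)   # candidates right of (or at) q
        post *= like
        post /= post.sum()                       # exact Bayes update
    return centers[np.argmax(post)]

rng = np.random.default_rng(3)
theta = 0.6180339
for n in (20, 40, 80):
    errs = [abs(active_step_fit(theta, n, 0.2, rng) - theta) for _ in range(10)]
    print(n, "mean |error|:", f"{np.mean(errs):.6f}")
# the error shrinks exponentially in n, down to the grid spacing 1/grid
```

Contrast with the passive baseline above: there the design is frozen before any data arrive, while here every query location is a function of all previous answers, which is the feedback that buys the exponential rate.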