Date:
Wed 11 Jun
Time:
15.00
Place:
room 214
Speaker:
David M. Mount (Univ. of Maryland)
Title:
An Algorithm for Robust Line Fitting
Reference:
Paola Magillo
Abstract.
The problem of fitting a straight line to a finite collection of
points in the plane is an important problem in statistical estimation
and pattern analysis. In many applications a significant fraction of
the data points, called outliers, do no lie on or even near the line
of fit. Traditional line fitting methods such as least squares can be
corrupted by even a small number of outliers. A class of estimators,
called robust estimators, have been developed to address this problem.
The basic measure of the robustness of an estimator is its breakdown
point, which is defined to be the fraction (up to 50 percent) of
outlying data points that can corrupt the estimator. Rousseeuw's
least median-of-squares (LMS) regression (line) estimator is among the
best known 50-percent breakdown-point estimators. The best exact
algorithms known for this problem run in O(n^2) time, where n is the
number of data points. Because of this high running time, many
practitioners prefer to use a simple O(n log n) Monte Carlo randomized
algorithm, which is quite efficient but provides no guarantees of
accuracy (even probabilistic) unless the data set satisfies certain
assumptions.
In an attempt to close the gap between theory and practice, we present
a simple randomized approximation algorithm for the LMS problem.
Given any fixed error factor, in O(n log n) time the algorithm returns
an answer that is within this factor. When the error factor is set to
0, the algorithm computes the exact answer and runs no slower than
O(n^2) time. We present empirical evidence that its running time on
realistic data sets is much better. This algorithm provides an
attractive option for practitioners, combining both the efficiency of
a Monte Carlo algorithm and guarantees on the accuracy of the result.