Detecting and classifying outliers in big functional data
Date
2021-08-30
Abstract
We propose two new outlier detection methods for identifying and classifying different
types of outliers in (big) functional data sets. The proposed methods are based on an
existing method called Massive Unsupervised Outlier Detection (MUOD). MUOD
detects and classifies outliers by computing, for each curve, three indices based on
the concepts of linear regression and correlation, which measure the outlyingness of
the curve in terms of shape, magnitude and amplitude relative to the other curves in the data.
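To make the three indices concrete, here is a minimal NumPy sketch of MUOD-style indices in which every curve is compared with every other curve. The exact definitions are assumptions made for illustration, not quoted from the method: curve i is regressed on each other curve j as y_i ≈ a_ij + b_ij · y_j, and the indices are |1 − mean correlation| (shape), |mean slope − 1| (amplitude) and |mean intercept| (magnitude). The function name `muod_indices` is hypothetical, and curves are assumed to be non-constant rows of an (n, p) array.

```python
import numpy as np


def _mean_off_diagonal(matrix):
    """Row-wise mean of an (n, n) matrix, ignoring the diagonal (pairs i == j)."""
    n = matrix.shape[0]
    off = ~np.eye(n, dtype=bool)
    return np.where(off, matrix, 0.0).sum(axis=1) / (n - 1)


def muod_indices(curves):
    """Hypothetical MUOD-style shape, amplitude and magnitude indices.

    curves: (n, p) array, one discretised, non-constant curve per row.
    Assumed definitions: regress curve i on curve j, y_i ~ a_ij + b_ij * y_j,
    with Pearson correlation r_ij, then average over all j != i.
    """
    p = curves.shape[1]
    means = curves.mean(axis=1)
    centred = curves - means[:, None]
    cov = centred @ centred.T / p                    # (n, n) pairwise covariances
    var = np.diag(cov)                               # variance of each curve
    b = cov / var[None, :]                           # b[i, j]: slope of curve i on curve j
    a = means[:, None] - b * means[None, :]          # a[i, j]: matching intercept
    r = cov / np.sqrt(var[:, None] * var[None, :])   # r[i, j]: Pearson correlation
    shape = np.abs(1.0 - _mean_off_diagonal(r))
    amplitude = np.abs(_mean_off_diagonal(b) - 1.0)
    magnitude = np.abs(_mean_off_diagonal(a))
    return shape, amplitude, magnitude
```

In this sketch a vertically shifted curve gets a large magnitude index, a rescaled curve a large amplitude index, and a curve with a different pattern a large shape index. Note that it builds n × n matrices, so time and memory grow quadratically with the number of curves; that all-pairs cost is the kind of overhead a smaller reference set avoids.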
‘Semifast-MUOD’, the first method, uses a sample of the observations when computing
the indices, while ‘Fast-MUOD’, the second method, uses the point-wise or L1 median
when computing the indices. The classical boxplot is used to separate the indices of
the outliers from those of the typical observations. A performance evaluation of the
proposed methods on simulated data shows significant improvements over MUOD in
both outlier detection and computational time. We show that Fast-MUOD is especially
well suited to handling big and dense functional data sets, requiring very little
computational time compared to other methods. Further comparisons with some recent
outlier detection methods for functional data also show that the proposed methods
achieve superior or comparable outlier detection accuracy. We apply the proposed
methods to weather, population growth and video data.
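The two variants and the boxplot step described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: each curve is regressed on each reference curve (y_i ≈ a + b · ref), and the indices are assumed to be |1 − mean correlation| (shape), |mean slope − 1| (amplitude) and |mean intercept| (magnitude). Fast-MUOD is approximated with the point-wise median only (the L1 median alternative is not shown), Semifast-MUOD with a uniform random sample of curves, and outliers are the index values above the classical upper fence Q3 + 1.5 · IQR. All function names and the `sample_size`/`seed` parameters are hypothetical.

```python
import numpy as np


def indices_vs_references(curves, refs):
    """Assumed shape, amplitude and magnitude indices against reference curves.

    curves: (n, p) array of discretised curves; refs: (m, p) array of references.
    Regress curve i on reference k, y_i ~ a_ik + b_ik * ref_k, with Pearson
    correlation r_ik, then average a, b and r over the m references.
    """
    p = curves.shape[1]
    y_mean = curves.mean(axis=1)
    r_mean = refs.mean(axis=1)
    yc = curves - y_mean[:, None]
    rc = refs - r_mean[:, None]
    cov = yc @ rc.T / p                                   # (n, m) covariances
    var_y = (yc ** 2).mean(axis=1)                        # (n,) curve variances
    var_r = (rc ** 2).mean(axis=1)                        # (m,) reference variances
    b = cov / var_r[None, :]                              # slopes
    a = y_mean[:, None] - b * r_mean[None, :]             # intercepts
    r = cov / np.sqrt(var_y[:, None] * var_r[None, :])    # correlations
    shape = np.abs(1.0 - r.mean(axis=1))
    amplitude = np.abs(b.mean(axis=1) - 1.0)
    magnitude = np.abs(a.mean(axis=1))
    return shape, amplitude, magnitude


def fast_muod_indices(curves):
    """Fast-MUOD-style sketch: one reference, the point-wise median curve."""
    median_curve = np.median(curves, axis=0)
    return indices_vs_references(curves, median_curve[None, :])


def semifast_muod_indices(curves, sample_size, seed=0):
    """Semifast-MUOD-style sketch: references are a random sample of the curves."""
    rng = np.random.default_rng(seed)
    sample = rng.choice(curves.shape[0], size=sample_size, replace=False)
    return indices_vs_references(curves, curves[sample])


def boxplot_outliers(index):
    """Positions whose index value exceeds the upper boxplot fence Q3 + 1.5 * IQR."""
    q1, q3 = np.percentile(index, [25, 75])
    return np.flatnonzero(index > q3 + 1.5 * (q3 - q1))
```

Calling `boxplot_outliers` separately on each of the three index arrays yields the shape, amplitude and magnitude outliers, so one curve can fall into several classes. With a single median reference, the work per curve is proportional to the grid length p, so the total cost grows linearly in the number of curves rather than quadratically as in an all-pairs comparison.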