Improvements to the Massive Unsupervised Outlier Detection (MUOD) Algorithm
Date
2019-05-24
Abstract
We present improvements to the Massive Unsupervised Outlier Detection (MUOD) algorithm, a scalable and unsupervised outlier detection method that is especially useful for identifying outliers in functional data. MUOD identifies different types of outliers in samples of curves, including shape, magnitude and amplitude outliers. To do so, it computes three indices for each curve that measure its outlyingness in terms of shape, magnitude and amplitude relative to the other curves. These indices are then sorted, and observations with extremely high indices are labelled as outliers. To further improve the scalability of MUOD, we introduce "fastMUOD", a fast implementation of MUOD that uses the component-wise median or the $L_1$-median in the computation of the indices instead of all the observations. We also present "semi-fastMUOD", which uses a sample of the observations in the computation of the indices. As a further improvement to MUOD, we discuss a new method for identifying extreme indices based on the classical boxplot or its adjusted version for skewed distributions. We analyse the performance of the proposed improvements using real and simulated data, and show that the gains in scalability do not compromise outlier detection accuracy.
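The index computation and the boxplot cutoff described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the curves are observed on a common grid and stored as rows of an array. It also assumes each index comes from a least-squares fit of a curve against a reference curve, averaged over a set of reference curves: one minus the correlation for shape, the distance of the slope from one for amplitude, and the absolute intercept for magnitude. The exact definitions in the paper may differ. Using the component-wise median as the only reference mimics the fastMUOD idea; using a random subsample of curves mimics semi-fastMUOD. The helper names `linear_fit_stats`, `outlyingness_indices` and `boxplot_outliers` are hypothetical.

```python
import numpy as np


def linear_fit_stats(y, ref):
    """Correlation, slope and intercept of the least-squares fit y ~ a + b * ref.

    Hypothetical helper: assumes both curves share the same evaluation grid.
    """
    y_c = y - y.mean()
    ref_c = ref - ref.mean()
    s_yr = np.dot(y_c, ref_c)
    ss_ref = np.dot(ref_c, ref_c)
    rho = s_yr / np.sqrt(np.dot(y_c, y_c) * ss_ref)  # Pearson correlation
    b = s_yr / ss_ref                                # slope
    a = y.mean() - b * ref.mean()                    # intercept
    return rho, b, a


def outlyingness_indices(curves, refs):
    """Assumed shape, amplitude and magnitude indices for each row of `curves`.

    The fit statistics are averaged over the reference curves (rows of `refs`).
    """
    n = curves.shape[0]
    shape, amp, mag = np.empty(n), np.empty(n), np.empty(n)
    for i, y in enumerate(curves):
        rho, b, a = np.mean([linear_fit_stats(y, r) for r in refs], axis=0)
        shape[i] = 1.0 - rho   # low correlation -> unusual shape
        amp[i] = abs(b - 1.0)  # slope far from 1 -> unusual amplitude
        mag[i] = abs(a)        # large intercept -> unusual level (magnitude)
    return shape, amp, mag


def boxplot_outliers(index):
    """Positions whose index exceeds the classical upper fence Q3 + 1.5 * IQR."""
    q1, q3 = np.percentile(index, [25, 75])
    return np.flatnonzero(index > q3 + 1.5 * (q3 - q1))


# Simulated example: 50 noisy sine curves plus three planted outliers.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
base = np.sin(2 * np.pi * t)
curves = np.vstack([
    base + 0.1 * rng.standard_normal((50, t.size)),
    base + 3.0,              # row 50: magnitude outlier (vertical shift)
    3.0 * base,              # row 51: amplitude outlier (scaled)
    np.cos(2 * np.pi * t),   # row 52: shape outlier (different pattern)
])

# fastMUOD-style: a single reference, the component-wise median curve.
median_curve = np.median(curves, axis=0)
fast = outlyingness_indices(curves, median_curve[np.newaxis, :])

# semi-fastMUOD-style: a random subsample of the curves as references.
sample = curves[rng.choice(len(curves), size=10, replace=False)]
semi_fast = outlyingness_indices(curves, sample)

for label, indices in [("fast", fast), ("semi-fast", semi_fast)]:
    for name, idx in zip(["shape", "amplitude", "magnitude"], indices):
        print(label, name, boxplot_outliers(idx))
```

With a single median reference, each curve is fitted once, so the cost grows linearly with the number of curves. Comparing every curve with every other curve grows quadratically. Only the classical boxplot fence is shown here. The adjusted boxplot for skewed distributions and the $L_1$-median reference are not implemented in this sketch.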