Repository of the paper:
Dimensionality-Aware Outlier Detection
Alastair Anderberg, James Bailey, Ricardo J. G. B. Campello,
Michael E. Houle, Henrique O. Marques, Miloš Radovanović, Arthur Zimek
SDM24
In this paper, we present a nonparametric method for outlier detection that takes full account of local variations in intrinsic dimensionality within the dataset. Using the theory of Local Intrinsic Dimensionality (LID), our 'dimensionality-aware' outlier detection method, DAO, is derived as an estimator of an asymptotic local expected density ratio involving the query point and a close neighbor drawn at random. The dimensionality-aware behavior of DAO comes from its theoretically justified use of local LID estimates.
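To give a feel for the local LID estimates mentioned above, here is a minimal sketch of the widely used maximum-likelihood (Hill-type) LID estimator, computed from the distances between one point and its k nearest neighbors. This is an illustration only: it is not the DAO score, the function name `lid_mle` is hypothetical, and the code in this repository may estimate LID differently.

```python
import numpy as np


def lid_mle(knn_distances):
    """Maximum-likelihood LID estimate from one point's k nearest-neighbor distances.

    knn_distances: strictly positive distances to the k nearest neighbors (any order).
    """
    r = np.sort(np.asarray(knn_distances, dtype=float))
    if r[0] <= 0:
        raise ValueError("distances must be strictly positive")
    r_k = r[-1]  # distance to the k-th (farthest) neighbor
    mean_log_ratio = np.mean(np.log(r / r_k))  # always <= 0
    if mean_log_ratio == 0.0:
        return float("inf")  # all neighbors equidistant: estimate is unbounded
    return -1.0 / mean_log_ratio


if __name__ == "__main__":
    # Toy data (an assumption for illustration): points uniform in a
    # 5-dimensional cube, with the query point at the center.
    rng = np.random.default_rng(0)
    data = rng.uniform(-1.0, 1.0, size=(20000, 5))
    dists = np.linalg.norm(data, axis=1)  # distances from the origin
    knn = np.sort(dists)[:100]            # the 100 nearest neighbors
    print(f"estimated LID: {lid_mle(knn):.2f}")  # expected to be roughly 5
```

On locally uniform data the estimate should be close to the ambient dimension; on data lying near a lower-dimensional manifold it should be closer to the manifold's dimension.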
Through comprehensive experimentation on more than 800 synthetic and real datasets, we show that DAO significantly outperforms three popular and important benchmark outlier detection methods: Local Outlier Factor (LOF), Simplified LOF, and kNN.
Detailed numbers for all experiments are given in tables in the Supplementary Material.
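For orientation, the simplest of the benchmark methods named above can be sketched in a few lines. The snippet assumes that "kNN" refers to the classic score that ranks each point by the distance to its k-th nearest neighbor; the function name `knn_outlier_scores` is hypothetical, and the brute-force distance matrix is only suitable for small datasets. It is not the implementation used in the experiments.

```python
import numpy as np


def knn_outlier_scores(X, k):
    """Score each row of X by the Euclidean distance to its k-th nearest neighbor."""
    X = np.asarray(X, dtype=float)
    if not 1 <= k < len(X):
        raise ValueError("k must satisfy 1 <= k < number of points")
    # Brute-force pairwise distances: O(n^2) memory, fine for a small example.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # k-th smallest distance in each row (index k - 1 after partitioning).
    return np.partition(dist, k - 1, axis=1)[:, k - 1]


if __name__ == "__main__":
    # Four points on a unit square plus one far-away point.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
    scores = knn_outlier_scores(X, k=2)
    print(scores.argmax())  # index of the point ranked as the strongest outlier
```

Higher scores indicate stronger outliers. Unlike DAO, this score takes no account of local intrinsic dimensionality.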
pip install -r requirements.txt
Rscript R/downloadRealDatasets.r
Rscript R/preprocessing.r
Rscript R/compileResults.r 'summaryRealDatasets'
python run_synthetic.py
Rscript R/compileResults.r 'summaryResultsSyntheticDatasets'
Rscript R/compileResults.r 'lrSyntheticDatasets'
python run_real.py
python stats.py
Rscript R/compileResults.r 'lrRealDatasets'
Rscript R/compileResults.r 'plot_R_MoransI'
Rscript R/compileResults.r 'plotCDRealDatasets'
python runtime.py
Rscript R/compileResults.r 'printRuntime'
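If you prefer to run everything in one go, the commands above can be wrapped in a small driver. This is a sketch only: it assumes the commands are meant to be run in the order listed and from the repository root, and the names `STEPS` and `run_pipeline` are hypothetical and not part of this repository. By default it only prints the commands; pass `dry_run=False` to execute them.

```python
import shlex
import subprocess

# Commands copied from this README, in the order they are listed.
STEPS = [
    "pip install -r requirements.txt",
    "Rscript R/downloadRealDatasets.r",
    "Rscript R/preprocessing.r",
    "Rscript R/compileResults.r 'summaryRealDatasets'",
    "python run_synthetic.py",
    "Rscript R/compileResults.r 'summaryResultsSyntheticDatasets'",
    "Rscript R/compileResults.r 'lrSyntheticDatasets'",
    "python run_real.py",
    "python stats.py",
    "Rscript R/compileResults.r 'lrRealDatasets'",
    "Rscript R/compileResults.r 'plot_R_MoransI'",
    "Rscript R/compileResults.r 'plotCDRealDatasets'",
    "python runtime.py",
    "Rscript R/compileResults.r 'printRuntime'",
]


def run_pipeline(steps=STEPS, dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    Execution stops at the first failing command (check=True raises).
    Returns the parsed argument lists, which is handy for inspection.
    """
    parsed = []
    for step in steps:
        argv = shlex.split(step)  # also strips the shell quotes around arguments
        parsed.append(argv)
        print("$ " + step)
        if not dry_run:
            subprocess.run(argv, check=True)
    return parsed


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Stopping at the first failure is deliberate, since later steps appear to compile results produced by earlier ones.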