Advances in Neural Information Processing Systems
Given an unsupervised novelty detection task on a new dataset, how can we automatically select a “best” detection model while simultaneously controlling the error rate of that model? Numerous detectors have been proposed for novelty detection; each flags outliers in a new, unseen dataset using a score function trained on available clean data. However, because no labeled anomalous data are available for model evaluation and comparison, systematic approaches that select the “best” model/detector (i.e., the algorithm as well as its hyperparameters) and simultaneously achieve error rate control are lacking. In this paper, we introduce a unified data-driven procedure to address this issue. The key idea is to maximize the number of detected outliers while controlling the false discovery rate (FDR) with the help of Jackknife prediction. We establish non-asymptotic bounds for the false discovery proportions and show that the proposed procedure yields valid FDR control under some mild conditions. Numerical experiments on both synthetic and real data validate the theoretical results and demonstrate the effectiveness of the proposed AutoMS method. The code is available at: https://github.com/ZhangYifan1996/AutoMS.
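To make the objective in the abstract concrete, the following is a minimal Python sketch of the naive baseline it alludes to: score held-out clean data and test data with each candidate detector, convert scores to conformal p-values, apply the Benjamini-Hochberg step-up rule at level alpha, and keep the detector with the most detections. This is not the AutoMS procedure, and it does not use Jackknife prediction. The function names (`conformal_pvalues`, `benjamini_hochberg`, `naive_select`), the convention that larger scores are more anomalous, and the dictionary inputs are all assumptions made for illustration.

```python
import numpy as np


def conformal_pvalues(calib_scores, test_scores):
    """Conformal p-values, assuming larger scores mean 'more anomalous'.

    p_j = (1 + #{i : calib_i >= test_j}) / (n + 1), with n clean calibration scores.
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # searchsorted(..., side="left") gives the index of the first calib score >= test_j,
    # so n minus that index counts the calibration scores >= test_j.
    n_ge = n - np.searchsorted(calib, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(pvals, alpha):
    """Benjamini-Hochberg step-up rule; returns a boolean mask of rejections."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank whose p-value passes
        reject[order[: k + 1]] = True
    return reject


def naive_select(calib_scores_by_model, test_scores_by_model, alpha=0.1):
    """Hypothetical baseline: pick the model with the most BH detections.

    Assumption: both dicts map a model name to a 1-D array of scores.
    """
    counts = {}
    for name, calib in calib_scores_by_model.items():
        p = conformal_pvalues(calib, test_scores_by_model[name])
        counts[name] = int(benjamini_hochberg(p, alpha).sum())
    best = max(counts, key=counts.get)
    return best, counts


if __name__ == "__main__":
    calib = np.arange(100) / 100.0  # toy clean scores shared by both models
    calib_by_model = {"a": calib, "b": calib}
    test_by_model = {
        "a": np.array([10.0, 10.0, 10.0, 0.5]),  # three clearly extreme scores
        "b": np.array([0.5, 0.5, 0.5, 0.5]),     # nothing stands out
    }
    print(naive_select(calib_by_model, test_by_model, alpha=0.1))
```

Because this baseline uses the same data both to choose the model and to report its detections, the per-model FDR guarantee does not automatically carry over to the selected model. That is the gap the abstract says the proposed procedure is designed to close.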
Error detection, Petroleum reservoir evaluation, Statistics
Y. Zhang et al., "AutoMS: Automatic Model Selection for Novelty Detection with Error Rate Control," in 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Advances in Neural Information Processing Systems, vol. 35, Dec. 2022.