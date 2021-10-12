

Uncertainty-based out-of-distribution detection requires suitable function space priors

By Francesco D'Angelo, Christian Henning
arxiv.org

The need to avoid confident predictions on unfamiliar data has sparked interest in out-of-distribution (OOD) detection. It is widely assumed that Bayesian neural networks (BNNs) are well suited for this task, as the endowed epistemic uncertainty should lead to disagreement in predictions on outliers. In this paper, we question this…
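The premise questioned here is that epistemic uncertainty shows up as disagreement between posterior samples. The sketch below shows one common way of scoring that disagreement, under the assumption that softmax outputs from several posterior samples (or ensemble members) are already available. Predictive entropy splits into the expected entropy (aleatoric part) plus the mutual information (epistemic part, i.e. the disagreement term). The array layout and the function name are illustrative and do not come from the paper.

```python
import numpy as np

def uncertainty_scores(probs: np.ndarray, eps: float = 1e-12):
    """Decompose predictive uncertainty from sampled predictions.

    probs: shape (S, N, C), softmax outputs of S posterior samples for N inputs
    over C classes. Returns (total_entropy, expected_entropy, mutual_info), each (N,).
    """
    mean_p = probs.mean(axis=0)                                          # posterior predictive
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=1)                 # H[E p(y|x,w)]
    expected = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)   # E H[p(y|x,w)]
    mutual_info = total - expected                                       # disagreement term
    return total, expected, mutual_info

# Three samples, two inputs, two classes: samples agree on input 0, disagree on input 1.
probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[0.9, 0.1], [0.5, 0.5]],
])
total, expected, mi = uncertainty_scores(probs)
```

An input is then flagged as OOD by thresholding one of these scores. As the title suggests, whether the epistemic term actually grows on outliers depends on the prior, not on the scoring rule.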


Related
arxiv.org

Tracking the risk of a deployed model and detecting harmful distribution shifts

When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain (but not all) distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model retraining) unnecessary. While several works have developed tests for distribution shifts, these typically either use non-sequential methods, or detect arbitrary shifts (benign or harmful), or both. We argue that a sensible method for raising a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate. In this work, we design simple sequential tools for testing whether the difference between the source (training) and target (test) distributions leads to a significant degradation as measured by a risk function of interest, such as accuracy or calibration. Recent advances in constructing time-uniform confidence sequences allow efficient aggregation of statistical evidence accumulated during the tracking process. The designed framework is applicable in settings where (some) true labels are revealed after the prediction is performed, or when batches of labels become available in a delayed fashion. We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets.
COMPUTERS
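The abstract does not say which confidence sequence is used, so the sketch below substitutes a deliberately simple one. It builds a time-uniform lower bound on the target risk from Hoeffding's inequality with a union bound over time, spending α·6/(π²t²) at step t. An alarm is raised only when that bound exceeds the source risk plus a tolerance ε. The bound is valid for i.i.d. losses in [0, 1] but is much looser than the confidence sequences recent work relies on. `source_risk`, `eps`, and the function name are assumptions.

```python
import math
from typing import Iterable, Optional

def monitor_harmful_shift(losses: Iterable[float], source_risk: float,
                          eps: float = 0.05, alpha: float = 0.05) -> Optional[int]:
    """Return the first step t at which a harmful shift is declared, else None.

    losses: stream of per-sample target losses in [0, 1] (e.g. 0/1 errors).
    Harmful shift means: target risk > source_risk + eps. Since the alpha_t sum
    to alpha over all t, the chance of ever alarming without a harmful shift
    is at most alpha (for i.i.d. bounded losses).
    """
    total = 0.0
    for t, loss in enumerate(losses, start=1):
        total += loss
        alpha_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        radius = math.sqrt(math.log(1.0 / alpha_t) / (2.0 * t))  # one-sided Hoeffding
        lower_bound = total / t - radius   # lower confidence bound on target risk
        if lower_bound > source_risk + eps:
            return t
    return None
```

A benign shift, for example a target error rate equal to the source risk, never triggers the alarm. Because the test is sequential, it can be checked after every newly revealed label without inflating the false alarm rate.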
arxiv.org

Improved Heatmap-based Landmark Detection

Mitral valve repair is a very difficult operation that often requires experienced surgeons. During the procedure, the surgeon inserts a prosthetic ring to help restore heart function, and the location of the prosthesis' sutures is critical. Obtaining and studying them during the procedure is a valuable learning experience for new surgeons. This paper proposes a landmark detection network for detecting sutures in endoscopic images that handles a variable number of suture points per image. Because there are two datasets, one from the simulated domain and the other from real intraoperative data, this work uses CycleGAN to translate images between the two domains, obtaining a larger dataset and a better score on real intraoperative data. The tests were performed on a simulated dataset of 2708 images and a real dataset of 2376 images. On the simulated dataset, the mean sensitivity is about 75.64% and the precision about 73.62%; on the real dataset, the mean sensitivity is about 50.23% and the precision about 62.76%. The data is from the AdaptOR MICCAI Challenge 2021, which can be found at this https URL\#.YO1zLUxCQ2x.
SCIENCE
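With a heatmap-based detector, handling a variable number of sutures comes down to how the predicted heatmap is decoded. The abstract does not describe the paper's decoder, so the sketch below assumes one common choice: keep every pixel that is a 3×3 local maximum above a threshold. It then scores the detections against ground truth with a pixel-distance tolerance to obtain sensitivity and precision. `threshold` and `max_dist` are hypothetical parameters.

```python
import numpy as np

def decode_peaks(heatmap: np.ndarray, threshold: float = 0.5):
    """Return (row, col) of every 3x3 local maximum above threshold (float heatmap)."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Maximum over each pixel's 3x3 neighbourhood, from the 9 shifted views.
    neigh_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0
    )
    peaks = (heatmap >= neigh_max) & (heatmap > threshold)
    return [tuple(p) for p in np.argwhere(peaks)]

def sensitivity_precision(pred, truth, max_dist: float = 3.0):
    """Greedy one-to-one matching of predictions to ground truth within max_dist pixels."""
    unmatched = list(truth)
    tp = 0
    for p in pred:
        dists = [np.hypot(p[0] - t[0], p[1] - t[1]) for t in unmatched]
        if dists and min(dists) <= max_dist:
            unmatched.pop(int(np.argmin(dists)))
            tp += 1
    sensitivity = tp / len(truth) if truth else 1.0
    precision = tp / len(pred) if pred else 1.0
    return sensitivity, precision
```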
arxiv.org

Coupling rare event algorithms with data-based learned committor functions using the analogue Markov chain

Rare events play a crucial role in many physical, chemical, and biological phenomena when they change the structure of the system, for instance in the case of multistability, or when they have a large impact. Rare event algorithms have been devised to simulate them efficiently, avoiding the computation of long periods of typical fluctuations. We consider here the family of splitting or cloning algorithms, which are versatile and specifically suited for far-from-equilibrium dynamics. To be efficient, these algorithms need a smart score function during the selection stage. Committor functions are the optimal score functions. In this work we propose a new approach, based on the analogue Markov chain, for data-based learning of approximate committor functions. We demonstrate that such learned committor functions are extremely efficient score functions when used with the Adaptive Multilevel Splitting algorithm. We illustrate our approach for a gradient dynamics in a three-well potential and for the Charney-DeVore model, which is a paradigmatic toy model of multistability for atmospheric dynamics. For these two dynamics, we show that observing only a few transitions is enough to obtain a very efficient data-based score function for the rare event algorithm. This new approach is promising for complex dynamics: the rare events can be simulated with minimal prior knowledge, and the results are much more precise than those obtained with a user-designed score function.
COMPUTERS
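The committor q is the probability of reaching a target set B before a starting set A. Once the dynamics have been reduced to a Markov chain with transition matrix P, q solves a linear system: q = 0 on A, q = 1 on B, and q = Pq on all other states. The sketch below builds the chain by binning a trajectory into discrete states. That is an assumed simplification, not the paper's analogue construction, and the function names are illustrative.

```python
import numpy as np

def transition_matrix(states: np.ndarray, n_states: int) -> np.ndarray:
    """Row-normalised transition counts from a discretised trajectory."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-1], states[1:]), 1.0)
    empty = np.flatnonzero(counts.sum(axis=1) == 0)
    counts[empty, empty] = 1.0            # states never left become absorbing
    return counts / counts.sum(axis=1, keepdims=True)

def committor(P: np.ndarray, A, B) -> np.ndarray:
    """Solve q = P q on inner states with q = 0 on A and q = 1 on B."""
    n = P.shape[0]
    q = np.zeros(n)
    q[list(B)] = 1.0
    boundary = set(A) | set(B)
    inner = np.array([i for i in range(n) if i not in boundary], dtype=int)
    if inner.size:
        # (I - P_II) q_I = P_IB 1, since q vanishes on A.
        lhs = np.eye(inner.size) - P[np.ix_(inner, inner)]
        rhs = P[np.ix_(inner, list(B))].sum(axis=1)
        q[inner] = np.linalg.solve(lhs, rhs)
    return q
```

In a splitting algorithm such as AMS, q evaluated at a replica's current state would then play the role of the score function in the selection stage.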
arxiv.org

Stability of (eventually) positive semigroups on spaces of continuous functions

We present a new and very short proof of the fact that, for positive $C_0$-semigroups on spaces of continuous functions, the spectral bound and the growth bound coincide. Our argument, inspired by an idea of Vogt, makes the role of the underlying space completely transparent and also works if the space does not contain the constant functions, a situation in which all earlier proofs become technically quite involved.
MATHEMATICS
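For reference, the standard definitions of the two quantities compared above are given below, for a $C_0$-semigroup $(T(t))_{t \ge 0}$ with generator $A$ (textbook definitions, not taken from the paper). The inequality on the last line holds for every $C_0$-semigroup. The content of the result is therefore the reverse inequality in the positive case.

```latex
\begin{aligned}
s(A) &= \sup\{\operatorname{Re}\lambda : \lambda \in \sigma(A)\}, \\
\omega_0(T) &= \inf\{\omega \in \mathbb{R} : \exists M \ge 1 \text{ with } \|T(t)\| \le M e^{\omega t} \text{ for all } t \ge 0\}, \\
-\infty \le s(A) &\le \omega_0(T) < \infty .
\end{aligned}
```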
arxiv.org

Improving the Scaling and Performance of Multiple Time Stepping based Molecular Dynamics with Hybrid Density Functionals

Density functionals at the level of the Generalized Gradient Approximation (GGA), combined with a plane-wave basis set, are widely used today to perform ab initio molecular dynamics (AIMD) simulations. Climbing the ladder of density functional accuracy from GGA (2nd rung) to hybrid density functionals (4th rung) is highly desirable, given the greater accuracy of the latter in describing the structure, dynamics, and energetics of molecular and condensed matter systems. However, hybrid density functional based AIMD simulations are about two orders of magnitude slower than GGA based AIMD for systems containing ~100 atoms using ~100 compute cores. Two methods, MTACE and s-MTACE, based on a multiple time step integrator and the adaptively compressed exchange operator formalism, provide a speed-up of about 7-9 in hybrid density functional based AIMD. In this work, we report an implementation of these methods using task-group based parallelization within the CPMD program package, with the aim of exploiting the large number of compute cores available on modern high-performance computing platforms. We present the performance boost achieved by this algorithm. This work also identifies the computational bottleneck in the s-MTACE method and proposes a way to overcome it.
SCIENCE
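Multiple time stepping works by evaluating the expensive part of the force less often than the cheap part. The sketch below shows a generic reversible RESPA-style integrator on a toy 1D system. A stiff harmonic force stands in for the cheap (GGA-like) force and a weak one for the expensive correction (hybrid minus GGA). Both force models and all parameter values are assumptions for illustration; this is not the CPMD implementation of MTACE/s-MTACE.

```python
from typing import Callable, Tuple

Force = Callable[[float], float]

def respa_step(x: float, v: float, fast: Force, slow: Force, m: float,
               dt_inner: float, n_inner: int) -> Tuple[float, float]:
    """One outer step of a reversible multiple-time-step (RESPA) integrator."""
    dt_outer = dt_inner * n_inner
    v += 0.5 * dt_outer * slow(x) / m          # half kick with the expensive force
    for _ in range(n_inner):                   # velocity Verlet with the cheap force
        v += 0.5 * dt_inner * fast(x) / m
        x += dt_inner * v
        v += 0.5 * dt_inner * fast(x) / m
    v += 0.5 * dt_outer * slow(x) / m          # closing half kick
    return x, v

K_FAST, K_SLOW, MASS = 1.0, 0.01, 1.0

def fast(x: float) -> float:                   # toy stand-in for the cheap force
    return -K_FAST * x

def slow(x: float) -> float:                   # toy stand-in for the expensive correction
    return -K_SLOW * x

def energy(x: float, v: float) -> float:
    return 0.5 * MASS * v * v + 0.5 * (K_FAST + K_SLOW) * x * x

x, v = 1.0, 0.0
e0 = energy(x, v)
max_drift = 0.0
for _ in range(2000):
    x, v = respa_step(x, v, fast, slow, MASS, dt_inner=0.05, n_inner=5)
    max_drift = max(max_drift, abs(energy(x, v) - e0) / e0)
```

Here `slow` is called only at the outer steps, so with `n_inner=5` the expensive force is evaluated five times less often than the cheap one, while the total energy stays close to its initial value.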
arxiv.org

Why Out-of-distribution Detection in CNNs Does Not Like Mahalanobis -- and What to Use Instead

Convolutional neural networks applied to real-world classification tasks need to recognize inputs that are far from, or out-of-distribution (OoD) with respect to, the known or training data. To achieve this, many methods estimate class-conditional posterior probabilities and use confidence scores obtained from the posterior distributions. Recent works propose to use multivariate Gaussian distributions as models of posterior distributions at different layers of the CNN (i.e., for low- and upper-level features), which leads to confidence scores based on the Mahalanobis distance. However, this procedure involves estimating probability density in high-dimensional data from an insufficient number of observations (e.g., the dimensionality of the features at the last two layers of the ResNet-101 model is 2048 and 1024, with ca. 1000 observations per class used to estimate the density). In this work, we address this problem. We show that in many OoD studies in high-dimensional data, LOF-based (Local Outlier Factor) methods outperform the parametric, Mahalanobis distance-based methods. This motivates us to propose a nonparametric, LOF-based method of generating confidence scores for CNNs. We performed several feasibility studies involving ResNet-101 and EfficientNet-B3, based on CIFAR-10 and ImageNet (as known data) and CIFAR-100, SVHN, ImageNet2010, Places365, or ImageNet-O (as outliers). We demonstrated that nonparametric LOF-based confidence estimation can improve on the current Mahalanobis-based SOTA or obtain similar performance in a simpler way.
COMPUTERS
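The sketch below computes the standard Local Outlier Factor (Breunig et al.) of test points relative to a reference set, in plain NumPy. In the CNN setting described above, the rows would be feature vectors of known data; here they are synthetic 2-D points, and the choice of `k` is an assumption. Scores near 1 indicate inliers, and clearly larger scores indicate outliers.

```python
import numpy as np

def _pairwise(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

def lof_scores(train: np.ndarray, test: np.ndarray, k: int = 10) -> np.ndarray:
    """LOF of each test point w.r.t. the training set (higher = more outlying)."""
    d_train = _pairwise(train, train)
    np.fill_diagonal(d_train, np.inf)                # exclude self-distances
    knn = np.argsort(d_train, axis=1)[:, :k]
    rows = np.arange(len(train))[:, None]
    k_dist = d_train[rows, knn][:, -1]               # k-distance of each training point
    # Reachability distance reach(a, b) = max(k_dist(b), d(a, b)).
    reach_train = np.maximum(k_dist[knn], d_train[rows, knn])
    lrd_train = 1.0 / (reach_train.mean(axis=1) + 1e-12)   # local reachability density

    d_test = _pairwise(test, train)
    knn_test = np.argsort(d_test, axis=1)[:, :k]
    rows_test = np.arange(len(test))[:, None]
    reach_test = np.maximum(k_dist[knn_test], d_test[rows_test, knn_test])
    lrd_test = 1.0 / (reach_test.mean(axis=1) + 1e-12)
    return lrd_train[knn_test].mean(axis=1) / lrd_test
```

Unlike a Mahalanobis score, nothing here requires estimating a covariance matrix. This is the property the abstract highlights for high-dimensional features with few samples per class.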
arxiv.org

Out-of-Distribution Robustness in Deep Learning Compression

In recent years, deep neural network (DNN) compression systems have proved highly effective for designing source codes for many natural sources. However, like many other machine learning systems, these compressors are vulnerable to distribution shifts and out-of-distribution (OOD) data, which limits their real-world applicability. In this paper, we initiate the study of OOD robust compression. Considering robustness to two types of ambiguity sets (Wasserstein balls and group shifts), we propose algorithmic and architectural frameworks built on two principled methods: one that trains DNN compressors using distributionally robust optimization (DRO), and another that uses a structured latent code. Our results demonstrate that both methods yield greater robustness than a standard DNN compressor, and that using a structured code can be superior to the DRO compressor. We observe tradeoffs between robustness and distortion and corroborate these findings theoretically for a specific class of sources.
COMPUTERS
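For the group-shift ambiguity set, DRO training replaces the average distortion with an adversarially weighted average over groups. The sketch below uses the online exponentiated-gradient weighting that is common for group DRO, swapped in for illustration, since the abstract does not say which solver the authors use. The step size `eta` and the example losses are assumptions.

```python
import numpy as np

def group_dro_weights(group_losses: np.ndarray, q: np.ndarray, eta: float = 0.1) -> np.ndarray:
    """One exponentiated-gradient ascent step on the group weights q (a distribution)."""
    q = q * np.exp(eta * group_losses)   # upweight groups with high distortion
    return q / q.sum()

def group_dro_objective(group_losses: np.ndarray, q: np.ndarray) -> float:
    """Weighted distortion the compressor would minimise at this step."""
    return float(np.dot(q, group_losses))

# Three groups of sources; group 2 is currently compressed worst.
losses = np.array([0.2, 0.3, 0.9])
q = np.full(3, 1.0 / 3.0)
for _ in range(50):
    q = group_dro_weights(losses, q, eta=0.5)
```

With the losses held fixed, as in this example, the weights concentrate on the worst group and the objective approaches the maximum group distortion. In real training, the group losses would be recomputed after every compressor update.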
arxiv.org

Privacy-Preserving Phishing Email Detection Based on Federated Learning and LSTM

Phishing emails that appear legitimate lure people into clicking on attached malicious links or documents. The increasingly sophisticated phishing campaigns of recent years call for detection systems that are more adaptive than traditional signature-based methods. In this regard, natural language processing (NLP) with deep neural networks (DNNs) has been adopted to acquire knowledge from large numbers of emails. However, because of escalating privacy concerns, such sensitive daily communications containing personal information are difficult to collect on a server for centralized learning in real life. To this end, we propose a decentralized phishing email detection method called the Federated Phish Bowl (FPB), which leverages federated learning and long short-term memory (LSTM). FPB allows common knowledge representation and sharing among different clients through the aggregation of trained models, safeguarding email security and privacy. A recent phishing email dataset was collected from an intergovernmental organization to train the model. Moreover, we evaluated the model performance under various assumptions about the total number of clients and the level of data heterogeneity. The comprehensive experimental results suggest that FPB is robust to a continually increasing client number and various data heterogeneity levels, retaining a detection accuracy of 0.83 and protecting the privacy of sensitive email communications.
INTERNET
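A common choice for the aggregation step in federated learning is FedAvg: the server averages client parameters, weighted by each client's number of local samples. Whether FPB uses exactly this weighting is an assumption. The sketch only illustrates that model parameters leave the clients while the raw emails do not.

```python
from typing import Dict, List

import numpy as np

Params = Dict[str, np.ndarray]

def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Sample-size-weighted average of client model parameters (FedAvg)."""
    total = float(sum(client_sizes))
    return {
        name: sum(p[name] * (n / total) for p, n in zip(client_params, client_sizes))
        for name in client_params[0]
    }
```

In each round, the server would broadcast the averaged parameters back to the clients, which continue local LSTM training on their own mail before the next aggregation.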
YOU MAY ALSO LIKE
arxiv.org

Value-Function-based Sequential Minimization for Bi-level Optimization

Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to solve modern machine learning problems. However, most existing solution strategies rely on restrictive theoretical assumptions (e.g., convexity of the lower-level sub-problem) and are computationally impractical for high-dimensional tasks. Moreover, almost no gradient-based methods can efficiently handle BLO in challenging scenarios such as BLO with functional constraints and pessimistic BLO. In this work, by reformulating BLO as an approximate single-level problem based on the value function, we provide a new method, named Bi-level Value-Function-based Sequential Minimization (BVFSM), that partially addresses these issues. Specifically, BVFSM constructs a series of value-function-based approximations and thus avoids the repeated computation of recurrent gradients and Hessian inverses required by existing approaches, which is time-consuming (especially for high-dimensional tasks). We also extend BVFSM to address BLO with additional upper- and lower-level functional constraints. More importantly, we demonstrate that the algorithmic framework of BVFSM can also be used for the challenging pessimistic BLO, which has never been properly solved by existing gradient-based methods. On the theoretical side, we rigorously prove the convergence of BVFSM on these types of BLO, completely discarding the restrictive lower-level convexity assumption. To the best of our knowledge, this is the first gradient-based algorithm that can solve different kinds of BLO problems (e.g., optimistic, pessimistic, and with constraints), all with solid convergence guarantees. Extensive experiments verify our theoretical investigations and demonstrate the superiority of BVFSM on various real-world applications.
SCIENCE
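The value-function reformulation rewrites $\min_{x,y} F(x,y)$ subject to $y \in \arg\min_y f(x,y)$ as $\min_{x,y} F(x,y)$ subject to $f(x,y) \le v(x)$, with $v(x) = \min_y f(x,y)$. The toy sketch below is not BVFSM itself, which uses its own sequence of regularised approximations. It handles the constraint with a penalty $\mu\,(f(x,y) - v(x))$ under an increasing sequence of $\mu$ and approximates $v(x)$ with a few inner gradient steps. The gradient of $v$ comes from Danskin's theorem, so no Hessian inverse is needed. The instance $F = (x-1)^2 + (y-2)^2$, $f = (y-x)^2$ (bilevel solution $x = y = 1.5$) and all step sizes are assumptions.

```python
from typing import Tuple

def grad_F(x: float, y: float) -> Tuple[float, float]:
    """Upper level F(x, y) = (x - 1)^2 + (y - 2)^2 (toy choice)."""
    return 2.0 * (x - 1.0), 2.0 * (y - 2.0)

def grad_f(x: float, y: float) -> Tuple[float, float]:
    """Lower level f(x, y) = (y - x)^2 (toy choice): argmin_y f = x, so v(x) = 0."""
    return -2.0 * (y - x), 2.0 * (y - x)

def solve_toy_bilevel(mus=(1.0, 10.0, 100.0), outer_iters: int = 3000,
                      inner_iters: int = 20) -> Tuple[float, float]:
    x, y = 0.0, 0.0
    for mu in mus:                              # sequence of increasingly tight penalties
        lr = 0.9 / (2.0 + 4.0 * mu)             # below 2/L for this quadratic problem
        for _ in range(outer_iters):
            z = y                               # inner solve: z approximates argmin_y f(x, y)
            for _ in range(inner_iters):
                z -= 0.25 * grad_f(x, z)[1]
            gFx, gFy = grad_F(x, y)
            gfx, gfy = grad_f(x, y)
            gvx = grad_f(x, z)[0]               # Danskin: dv/dx = df/dx at (x, z)
            # Gradient step on F(x, y) + mu * (f(x, y) - v(x)).
            x -= lr * (gFx + mu * (gfx - gvx))
            y -= lr * (gFy + mu * gfy)
    return x, y

x_opt, y_opt = solve_toy_bilevel()
```

For a finite $\mu$, the penalised solution satisfies $y - x = 1/(1 + 2\mu)$, so the iterates approach the bilevel solution as the penalty sequence tightens.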
Nature.com

A trajectory-based loss function to learn missing terms in bifurcating dynamical systems

Missing terms in dynamical systems are a challenging modeling problem. Recent developments combining machine learning and dynamical systems theory open up possibilities for a solution. We show how physics-informed differential equations and machine learning, combined in the Universal Differential Equation (UDE) framework by Rackauckas et al., can be modified to discover missing terms in systems that undergo sudden fundamental changes in their dynamical behavior, called bifurcations. This enables the application of the UDE approach to a wider class of problems that are common in many real-world applications. The choice of the loss function, which compares the training data trajectory in state space with the currently estimated solution trajectory of the UDE to optimize the solution, plays a crucial role within this approach. Using the Mean Square Error as the loss function carries the risk of a reconstruction that completely misses the dynamical behavior of the training data. By contrast, our suggested trajectory-based loss function, which optimizes two largely independent components, the length and angle of the state space vectors of the training data, performs reliably well on example systems from neuroscience, chemistry, and biology showing Saddle-Node, Pitchfork, Hopf, and Period-doubling bifurcations.
SCIENCE
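The abstract describes the loss only as optimizing the length and angle of state space vectors. The sketch below is one possible reading, offered purely as an assumption. It compares the successive difference vectors of the observed and simulated trajectories. A length term penalizes mismatched step magnitudes, and an angle term penalizes mismatched directions via one minus the cosine similarity. The weight `w_angle` and the function name are hypothetical.

```python
import numpy as np

def trajectory_loss(observed: np.ndarray, predicted: np.ndarray,
                    w_angle: float = 1.0, eps: float = 1e-12) -> float:
    """observed, predicted: arrays of shape (T, d) sampled at the same times."""
    d_obs = np.diff(observed, axis=0)           # state space vectors between samples
    d_pred = np.diff(predicted, axis=0)
    len_obs = np.linalg.norm(d_obs, axis=1)
    len_pred = np.linalg.norm(d_pred, axis=1)
    length_term = np.mean((len_obs - len_pred) ** 2)
    cos = (d_obs * d_pred).sum(axis=1) / (len_obs * len_pred + eps)
    angle_term = np.mean(1.0 - cos)             # 0 when directions agree, 2 when opposite
    return float(length_term + w_angle * angle_term)
```

Because the angle term ignores magnitudes, a reconstruction that traverses the right orbit in the wrong direction is penalized heavily even when its pointwise error is small.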
arxiv.org

Skew-Kappa distribution functions & whistler-heat-flux instability in the solar wind: the core-strahlo model

Electron velocity distributions in the solar wind are known to have field-aligned skewness, which has been characterized by the presence of secondary populations such as the halo and strahl. Skewness may provide energy for the excitation of electromagnetic instabilities, such as the whistler heat-flux instability (WHFI), which may play an important role in regulating the electron heat-flux in the solar wind. Here we use kinetic theory to analyze the stability of the WHFI in a solar-wind-like plasma in which the core, halo, and strahl electrons are described as a superposition of two distributions: a Maxwellian core, and a second population, representing both the halo and the strahl, modeled by a Kappa distribution to which an asymmetry term has been added. Considering distributions with small skewness, we solve the dispersion relation for the parallel-propagating whistler mode and study its linear stability for different plasma parameters. Our results show that the WHFI can develop in this system, and we provide stability thresholds for this instability as a function of the electron beta and the parallel electron heat-flux, to be compared with observational data. However, since different plasma states with different stability levels with respect to the WHFI can have the same heat-flux moment, the skewness (i.e., the asymmetry of the distribution along the magnetic field), and not the heat-flux, is the best indicator of instabilities. Thus, systems with high heat-flux can be sufficiently stable against the WHFI that it is unclear whether the instability can effectively regulate the heat-flux through wave-particle interactions.
SCIENCE
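For reference, the standard isotropic Kappa distribution, before any asymmetry term is added, is $f_\kappa(v) = n\,(\pi\kappa\theta^2)^{-3/2}\,\frac{\Gamma(\kappa+1)}{\Gamma(\kappa-1/2)}\left(1 + \frac{v^2}{\kappa\theta^2}\right)^{-(\kappa+1)}$. It tends to a Maxwellian with thermal speed $\theta$ as $\kappa \to \infty$. The sketch checks both the normalization and this limit numerically. The skew term used in the paper is not reproduced, and the parameter values are arbitrary.

```python
import math

import numpy as np

def kappa_vdf(v: np.ndarray, n: float, theta: float, kappa: float) -> np.ndarray:
    """Isotropic Kappa velocity distribution (kappa > 3/2 for a finite temperature)."""
    log_norm = (math.lgamma(kappa + 1.0) - math.lgamma(kappa - 0.5)
                - 1.5 * math.log(math.pi * kappa * theta ** 2))
    return n * math.exp(log_norm) * (1.0 + v ** 2 / (kappa * theta ** 2)) ** (-(kappa + 1.0))

def maxwellian_vdf(v: np.ndarray, n: float, theta: float) -> np.ndarray:
    return n / (math.pi ** 1.5 * theta ** 3) * np.exp(-(v ** 2) / theta ** 2)

def density(f, v_max: float, num: int = 200_001) -> float:
    """4*pi * integral of v^2 f(v) dv over [0, v_max], by the trapezoidal rule."""
    v = np.linspace(0.0, v_max, num)
    g = 4.0 * math.pi * v ** 2 * f(v)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(v)))
```

Using `lgamma` keeps the normalization finite for very large $\kappa$, where the gamma functions themselves would overflow.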
arxiv.org

Incremental Community Detection in Distributed Dynamic Graph

Community detection is an important research topic in graph analytics with a wide range of applications. A variety of static community detection algorithms and quality metrics have been developed in the past few years. However, most real-world graphs are not static and change over time. With streaming data, communities in the associated graph need to be updated either continuously or whenever new data streams are added to the graph, which makes devising good community detection algorithms for maintaining dynamic graphs over streaming data much more challenging. In this paper, we propose an incremental community detection algorithm for maintaining a dynamic graph over streaming data. The contributions of this study include (a) the implementation of a Distributed Weighted Community Clustering (DWCC) algorithm, (b) the design and implementation of a novel Incremental Distributed Weighted Community Clustering (IDWCC) algorithm, and (c) an experimental study comparing the performance of our IDWCC algorithm with the DWCC algorithm. We validate the functionality and efficiency of our framework in processing streaming data and performing large in-memory distributed dynamic graph analytics. The results demonstrate that our IDWCC algorithm runs up to three times faster than the DWCC algorithm with similar accuracy.
TECHNOLOGY
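Weighted Community Clustering (WCC) is built on triangle counts, so an incremental variant has to keep those counts current as edges stream in. The sketch below shows the basic incremental update, offered as an illustration rather than IDWCC itself. Inserting edge (u, v) closes exactly one new triangle per common neighbour of u and v, so only the two endpoints' adjacency sets need to be touched. The class name is hypothetical, and the graph is held in memory on a single machine rather than distributed.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set

class IncrementalTriangles:
    """Maintains per-vertex and global triangle counts under edge insertions."""

    def __init__(self) -> None:
        self.adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
        self.per_vertex: Dict[Hashable, int] = defaultdict(int)
        self.total = 0

    def add_edge(self, u: Hashable, v: Hashable) -> int:
        """Insert an undirected edge; return the number of triangles it closes."""
        if u == v or v in self.adj[u]:
            return 0                            # ignore self-loops and duplicate edges
        common = self.adj[u] & self.adj[v]      # each common neighbour closes a triangle
        for w in common:
            self.per_vertex[w] += 1
        self.per_vertex[u] += len(common)
        self.per_vertex[v] += len(common)
        self.total += len(common)
        self.adj[u].add(v)
        self.adj[v].add(u)
        return len(common)
```

The cost of each insertion depends only on the degrees of the two endpoints, not on the size of the whole graph. That locality is what makes incremental maintenance cheaper than recomputing from scratch.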
arxiv.org

Understanding of a brain spatial map based on threshold-free function dendrogramization

Linear matrix factorizations (LMFs) such as independent component analysis (ICA), principal component analysis (PCA), and their extensions have been widely used to find relevant spatial maps in brain imaging data. The last step of an LMF before interpretation is usually to extract the activated brain regions from the map by thresholding. However, it is difficult to determine an appropriate threshold level, and thresholding can remove the underlying properties of spatial maps and the features imposed on them by the model. In this study, we propose a threshold-free activated region extraction method that simplifies a brain spatial map to a dendrogram through Morse filtration. Since a dendrogram is related to the change of clustering structure in a Rips filtration, we first show the relationship between the Rips filtration of a graph and the Morse filtration of a function. Then, we dendrogramize a spatial map to visualize the activated brain regions and the range of their importance in the map. The proposed method can be applied to any spatial map that a user wants to threshold and interpret. In experiments, we applied the proposed method to independent component maps (ICMs) obtained from resting-state fMRI data, and to the dominant subnetworks obtained by PCA of a correlation-based functional connectivity of FDG PET data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We found that dendrogramization can help to understand a brain spatial map without thresholding.
SCIENCE
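Dendrogramizing a map through a Morse filtration amounts to tracking the connected components of the superlevel sets $\{f \ge t\}$ as $t$ decreases. A component is born at a local maximum and dies when it merges into a component with a higher peak. The union-find sketch below computes these birth and merge values for a function on the vertices of a graph. It is a generic 0-dimensional persistence computation, assumed for illustration rather than taken from the paper, and the input format (a value list plus an edge list) is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def superlevel_merges(values: Sequence[float],
                      edges: Sequence[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Return (birth, death) pairs of components of {f >= t} as t decreases.

    Components that never merge (one per connected component of the graph)
    are reported with death = -inf.
    """
    n = len(values)
    nbrs: Dict[int, List[int]] = {i: [] for i in range(n)}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    parent = list(range(n))
    peak = list(values)                        # highest value in each root's component

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    added = [False] * n
    pairs: List[Tuple[float, float]] = []
    for i in sorted(range(n), key=lambda j: values[j], reverse=True):
        added[i] = True
        for j in nbrs[i]:
            if not added[j]:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the component with the lower peak dies at values[i].
            young, old = (ri, rj) if peak[ri] < peak[rj] else (rj, ri)
            if peak[young] > values[i]:        # skip zero-persistence merges
                pairs.append((peak[young], values[i]))
            parent[young] = old
    roots = {find(i) for i in range(n)}
    pairs.extend((peak[r], float("-inf")) for r in roots)
    return pairs
```

Each (birth, death) pair corresponds to one branch of the dendrogram, and its length, birth minus death, indicates how important the region is without choosing any threshold.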
astrobiology.com

Climate Uncertainties Caused By Unknown Land Distribution On Habitable M-Earths

Dayside orthographic projections of sample RandCont models with 40-60% dayside land cover (row labels). Left to right: landmap, net precipitation (defined as precipitation minus evaporation), cloud fraction, and surface temperature. Clouds and precipitation are always concentrated near the substellar point, but their exact distribution depends on the shape of the land. Evaporation takes place over ocean elsewhere on the dayside. Surface temperatures are highest on dry land.
ENVIRONMENT
arxiv.org

Span Detection for Aspect-Based Sentiment Analysis in Vietnamese

Aspect-based sentiment analysis plays an essential role in natural language processing and artificial intelligence. Recently, researchers have focused only on aspect detection and sentiment classification, ignoring the sub-task of detecting user opinion spans, which has enormous potential in practical applications. In this paper, we present a new Vietnamese dataset (UIT-ViSD4SA) consisting of 35,396 human-annotated spans on 11,122 feedback comments for evaluating span detection in aspect-based sentiment analysis. We also propose a novel system using a Bidirectional Long Short-Term Memory (BiLSTM) network with a Conditional Random Field (CRF) layer (BiLSTM-CRF) for the span detection task in Vietnamese aspect-based sentiment analysis. The best result is a 62.76% macro F1 score for span detection, using BiLSTM-CRF with a fusion of syllable embeddings, character embeddings, and contextual embeddings from XLM-RoBERTa. In future work, span detection will be extended to many NLP tasks such as constructive detection, emotion recognition, complaint analysis, and opinion mining. Our dataset is freely available at this https URL for research purposes.
TECHNOLOGY
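At inference time, the CRF layer of a BiLSTM-CRF picks the highest-scoring tag sequence from per-token emission scores (produced by the BiLSTM) and tag-to-tag transition scores. That search is the Viterbi algorithm, sketched below in NumPy with made-up scores. The BIO-style tag set used in the example is an assumption about how spans are encoded, not a detail taken from the paper.

```python
import numpy as np

def viterbi(emissions: np.ndarray, transitions: np.ndarray):
    """Best tag path for one sentence.

    emissions: (T, K) score of tag k at position t (e.g. BiLSTM outputs).
    transitions: (K, K) score of moving from tag i to tag j.
    Returns (best_path, best_score).
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in tag i at t-1, then moving to tag j at t.
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score.max())

# Tags: 0 = O, 1 = B (span start), 2 = I (inside span); O -> I is made near-impossible.
transitions = np.zeros((3, 3))
transitions[0, 2] = -1e4
emissions = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0]])
path, best = viterbi(emissions, transitions)
```

Greedy per-token decoding would pick O, I, I here. The transition penalty makes the CRF choose the well-formed span O, B, I instead, which is the main reason for putting a CRF on top of the BiLSTM.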
arxiv.org

SleepPriorCL: Contrastive Representation Learning with Prior Knowledge-based Positive Mining and Adaptive Temperature for Sleep Staging

The objective of this paper is to learn semantic representations for sleep stage classification from raw physiological time series. Although supervised methods have achieved remarkable performance, their use in clinical situations is limited by the requirement of fully labeled data. Self-supervised learning (SSL) based on contrasting semantically similar (positive) and dissimilar (negative) pairs of samples has achieved promising success. However, existing SSL methods suffer from the problem that many semantically similar positives remain undiscovered and are even treated as negatives. In this paper, we propose a novel SSL approach named SleepPriorCL to alleviate this problem. Our approach advances over existing SSL methods in two ways: 1) by incorporating prior domain knowledge into the SSL training regime, it discovers more semantically similar positives without accessing ground-truth labels; 2) based on an investigation of the influence of the temperature in the contrastive loss, it further introduces an adaptive temperature for each sample, set according to prior domain knowledge, which leads to better performance. Extensive experiments demonstrate that our method achieves state-of-the-art performance and consistently outperforms baselines.
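Both ingredients mentioned above, extra positives found via prior knowledge and a per-sample temperature, enter naturally into a multi-positive InfoNCE-style loss. The sketch below assumes the positive mask and the temperatures are already given, since the abstract does not describe how SleepPriorCL derives them. It uses cosine similarity between embeddings, and all names are illustrative.

```python
import numpy as np

def multi_positive_contrastive_loss(z: np.ndarray, pos_mask: np.ndarray,
                                    tau: np.ndarray) -> float:
    """z: (N, d) embeddings; pos_mask: (N, N) bool, True where j is a positive for i;
    tau: (N,) per-sample temperatures. Anchors without positives are skipped."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / tau[:, None]               # row i is scaled by its own temperature
    n = len(z)
    not_self = ~np.eye(n, dtype=bool)
    sim = np.where(not_self, sim, -np.inf)       # an anchor is never its own candidate
    log_den = np.logaddexp.reduce(sim, axis=1)   # log sum over all other samples
    log_prob = sim - log_den[:, None]
    pos = pos_mask & not_self
    losses = [-log_prob[i, pos[i]].mean() for i in range(n) if pos[i].any()]
    return float(np.mean(losses))
```

Marking a semantically similar sample as a positive moves it from the denominator's "push away" role into the numerator. This is why undiscovered positives that are treated as negatives hurt standard SSL objectives.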
arxiv.org

An Artificial Bee Colony Based Algorithm for Continuous Distributed Constraint Optimization Problems

Distributed Constraint Optimization Problems (DCOPs) are a frequently used framework in which a set of independent agents choose values from their respective discrete domains to maximize their utility. Although this formulation is typically appropriate, there are a number of real-world applications in which the decision variables are continuous-valued and the constraints are represented in functional form. To address this, Continuous Distributed Constraint Optimization Problems (C-DCOPs), an extension of the DCOP paradigm, have recently attracted growing interest in the multi-agent systems field. To date, among different approaches, population-based algorithms have been shown to be the most effective for solving C-DCOPs. Considering the potential of population-based approaches, we propose a new C-DCOP solver inspired by a well-known population-based algorithm, Artificial Bee Colony (ABC). Additionally, we provide a new exploration method that further improves the algorithm's solution quality. Finally, we theoretically prove that our approach is an anytime algorithm and empirically show that it produces significantly better results than state-of-the-art C-DCOP algorithms.
COMPUTERS
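The sketch below is the classic centralized Artificial Bee Colony algorithm (employed bees, onlooker bees, scouts), maximizing a single continuous utility. It is not the distributed C-DCOP solver from the paper, which spreads variables across agents and adds its own exploration method. The utility function, bounds, parameters (`n_sources`, `limit`, `iters`), and the shifted-utility onlooker weighting are assumptions.

```python
import random
from typing import Callable, List, Tuple

def abc_maximize(utility: Callable[[List[float]], float], dim: int,
                 bounds: Tuple[float, float], n_sources: int = 20,
                 limit: int = 30, iters: int = 300, seed: int = 0):
    """Return (best_point, best_utility) found by a basic ABC search."""
    rng = random.Random(seed)
    lo, hi = bounds

    def new_source() -> List[float]:
        return [rng.uniform(lo, hi) for _ in range(dim)]

    sources = [new_source() for _ in range(n_sources)]
    values = [utility(s) for s in sources]
    trials = [0] * n_sources

    def try_neighbour(i: int) -> None:
        # Move one coordinate relative to a randomly chosen other food source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (cand[d] - sources[k][d])
        cand[d] = min(max(cand[d], lo), hi)
        val = utility(cand)
        if val > values[i]:                      # greedy selection
            sources[i], values[i], trials[i] = cand, val, 0
        else:
            trials[i] += 1

    best = max(zip(values, sources))
    for _ in range(iters):
        for i in range(n_sources):               # employed bee phase
            try_neighbour(i)
        lo_v = min(values)
        weights = [v - lo_v + 1e-12 for v in values]   # better sources attract more onlookers
        for _ in range(n_sources):               # onlooker bee phase
            try_neighbour(rng.choices(range(n_sources), weights=weights)[0])
        best = max(best, max(zip(values, sources)))
        for i in range(n_sources):               # scout phase: abandon stale sources
            if trials[i] > limit:
                sources[i] = new_source()
                values[i] = utility(sources[i])
                trials[i] = 0
    return best[1], best[0]
```

Because the best solution seen so far is kept and only ever replaced by a better one, stopping the loop at any iteration still returns a valid answer. This mirrors the anytime property the abstract proves for its distributed version.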
arxiv.org

A dimension-oblivious domain decomposition method based on space-filling curves

In this paper we present an algebraic, dimension-oblivious two-level domain decomposition solver for discretizations of elliptic partial differential equations. The proposed parallel solver is based on a space-filling curve partitioning approach that is applicable to any discretization, i.e., it operates directly on the assembled matrix equations. Moreover, it allows the effective use of arbitrary numbers of processors, independent of the dimension of the underlying partial differential equation, while maintaining optimal convergence behavior. This is the core property required to attain a sparse grid based combination method with extreme scalability that can utilize exascale parallel systems efficiently. This approach also provides a basis for the development of a fault-tolerant solver for the numerical treatment of high-dimensional problems. To achieve the required data redundancy, we therefore construct large overlaps in our domain decomposition via space-filling curves. In this paper, we propose our space-filling curve based domain decomposition solver and present its convergence properties and scaling behavior. The results of numerical experiments clearly show that our approach provides optimal convergence and scaling behavior in arbitrary dimension utilizing arbitrary numbers of processors.
MATHEMATICS
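A space-filling curve turns a partition of $d$-dimensional data into a partition of a 1-D ordering. That is why such an approach can be oblivious to both the dimension and the processor count. The sketch below uses the Morton (Z-order) curve, chosen for illustration because the abstract does not say which curve is used. It orders integer grid points in any dimension and cuts the ordering into `n_parts` contiguous, equally sized segments. Each segment is optionally extended by `overlap` points on each side as a crude stand-in for the overlapping subdomains mentioned above. The function names and the overlap scheme are assumptions.

```python
from typing import List, Sequence, Tuple

def morton_key(point: Sequence[int], bits: int = 16) -> int:
    """Interleave the bits of d non-negative integer coordinates (Z-order)."""
    d = len(point)
    key = 0
    for b in range(bits):
        for j, c in enumerate(point):
            key |= ((c >> b) & 1) << (b * d + j)
    return key

def sfc_partition(points: List[Tuple[int, ...]], n_parts: int,
                  overlap: int = 0) -> List[List[Tuple[int, ...]]]:
    """Split points into n_parts contiguous curve segments, each extended by
    `overlap` points on both sides (overlap=0 gives a non-overlapping partition)."""
    order = sorted(points, key=morton_key)
    n = len(order)
    parts = []
    for p in range(n_parts):
        start, stop = p * n // n_parts, (p + 1) * n // n_parts
        parts.append(order[max(0, start - overlap):min(n, stop + overlap)])
    return parts
```

Nothing in either function depends on the dimension beyond the length of each coordinate tuple, and any `n_parts` yields segments whose sizes differ by at most one point.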
