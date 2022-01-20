
Ephemeral data derived potentials for random structure search

By Chris J. Pickard
arxiv.org

Structure prediction has become a key task of the modern atomistic sciences, and depends on the rapid and reliable computation of the energy landscape. First principles density functional based calculations are highly reliable, faithfully describing the entire energy landscape. They are, however, computationally intensive and slow compared to interatomic potentials. Great...

arxiv.org

Training Data Selection for Accuracy and Transferability of Interatomic Potentials

David Montes de Oca Zapiain, Mitchell A. Wood, Nicholas Lubbers, Carlos Z. Pereyra, Aidan P. Thompson, Danny Perez. Advances in machine learning (ML) techniques have enabled the development of interatomic potentials that promise both the accuracy of first principles methods and the low cost, linear scaling, and parallel efficiency of empirical potentials. Despite rapid progress in the last few years, ML-based potentials often struggle to achieve transferability, that is, to provide consistent accuracy across configurations that significantly differ from those used to train the model. In order to truly realize the promise of ML-based interatomic potentials, it is therefore imperative to develop systematic and scalable approaches for the generation of diverse training sets that ensure broad coverage of the space of atomic environments. This work explores a diverse-by-construction approach that leverages the optimization of the entropy of atomic descriptors to create a very large ($>2\cdot10^{5}$ configurations, $>7\cdot10^{6}$ atomic environments) training set for tungsten in an automated manner, i.e., without any human intervention. This dataset is used to train polynomial as well as multiple neural network potentials with different architectures. For comparison, a corresponding family of potentials was also trained on an expert-curated dataset for tungsten. The models trained to entropy-optimized data exhibited vastly superior transferability compared to the expert-curated models. Furthermore, while the models trained with heavy user input (i.e., domain expertise) yield the lowest errors when tested on similar configurations, out-of-sample predictions are dramatically more robust when the models are trained on a deliberately diverse set of training data. Herein we demonstrate the development of both accurate and transferable ML potentials using automated and data-driven approaches for generating large and diverse training sets.
The Independent

Tool use may be socially learned in wild chimpanzees, research suggests

Chimpanzees do not automatically know how to crack nuts with stone tools, but instead learn this behaviour from others, new research suggests. The findings indicate the culture of the animals may be more similar to humans than often assumed. Humans learn to use tools and other skills from watching each other, and through this form of social learning, human culture has become increasingly complex. It has been suggested that while chimpanzees do not learn in this way, they can reinvent cultural behaviours individually. Our findings on wild chimpanzees, our closest living relatives, help to shed light on what it is (and isn’t) that makes...
arxiv.org

Investigating the Potential of Auxiliary-Classifier GANs for Image Classification in Low Data Regimes

Generative Adversarial Networks (GANs) have shown promise in augmenting datasets and boosting convolutional neural networks' (CNN) performance on image classification tasks. But they introduce more hyperparameters to tune as well as the need for additional time and computational power to train supplementary to the CNN. In this work, we examine the potential for Auxiliary-Classifier GANs (AC-GANs) as a 'one-stop-shop' architecture for image classification, particularly in low data regimes. Additionally, we explore modifications to the typical AC-GAN framework, changing the generator's latent space sampling scheme and employing a Wasserstein loss with gradient penalty to stabilize the simultaneous training of image synthesis and classification. Through experiments on images of varying resolutions and complexity, we demonstrate that AC-GANs show promise in image classification, achieving competitive performance with standard CNNs. These methods can be employed as an 'all-in-one' framework with particular utility in the absence of large amounts of training data.
arxiv.org

Improved Random Features for Dot Product Kernels

Dot product kernels, such as polynomial and exponential (softmax) kernels, are among the most widely used kernels in machine learning, as they enable modeling the interactions between input features, which is crucial in applications like computer vision, natural language processing, and recommender systems. We make several novel contributions for improving the efficiency of random feature approximations for dot product kernels, to make these kernels more useful in large scale learning. First, we present a generalization of existing random feature approximations for polynomial kernels, such as Rademacher and Gaussian sketches and TensorSRHT, using complex-valued random features. We show empirically that the use of complex features can significantly reduce the variances of these approximations. Second, we provide a theoretical analysis for understanding the factors affecting the efficiency of various random feature approximations, by deriving closed-form expressions for their variances. These variance formulas elucidate conditions under which certain approximations (e.g., TensorSRHT) achieve lower variances than others (e.g., Rademacher sketch), and conditions under which the use of complex features leads to lower variances than real features. Third, by using these variance formulas, which can be evaluated in practice, we develop a data-driven optimization approach to random feature approximations for general dot product kernels, which is also applicable to the Gaussian kernel. We describe the improvements brought by these contributions with extensive experiments on a variety of tasks and datasets.
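
To make the random-feature idea concrete, here is a minimal Python sketch of the real-valued Rademacher sketch for the polynomial kernel $k(x, y) = (x \cdot y)^p$, i.e. the baseline the abstract generalizes, not the authors' complex-valued or data-driven variants. Each feature multiplies $p$ independent random-sign projections, so the inner product of two feature vectors is an unbiased estimate of the kernel. The helper names and parameter values are illustrative assumptions.

```python
import math
import random


def make_rademacher_projections(dim, degree, num_features, seed=0):
    """Draw num_features groups of `degree` random +/-1 vectors (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(degree)]
        for _ in range(num_features)
    ]


def rademacher_features(x, projections):
    """Map x to random features whose inner products approximate (x . y) ** degree."""
    scale = 1.0 / math.sqrt(len(projections))
    feats = []
    for group in projections:
        value = 1.0
        for w in group:
            # E[(w . x)(w . y)] = x . y for independent +/-1 entries, and the factors in a
            # group are independent, so their product is unbiased for (x . y) ** degree.
            value *= sum(wi * xi for wi, xi in zip(w, x))
        feats.append(scale * value)
    return feats


def approx_kernel(x, y, projections):
    fx = rademacher_features(x, projections)
    fy = rademacher_features(y, projections)
    return sum(a * b for a, b in zip(fx, fy))


x, y = [0.6, 0.8], [0.8, 0.6]
projections = make_rademacher_projections(dim=2, degree=2, num_features=20_000)
exact = sum(a * b for a, b in zip(x, y)) ** 2
estimate = approx_kernel(x, y, projections)
```

The estimate fluctuates around the exact value with a variance that shrinks as `num_features` grows; that variance is precisely the quantity the complex-valued features and the closed-form variance formulas in the abstract are aimed at.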
arxiv.org

alpha-Deep Probabilistic Inference (alpha-DPI): efficient uncertainty quantification from exoplanet astrometry to black hole feature extraction

Inference is crucial in modern astronomical research, where hidden astrophysical features and patterns are often estimated from indirect and noisy measurements. Inferring the posterior of hidden features, conditioned on the observed measurements, is essential for understanding the uncertainty of results and downstream scientific interpretations. Traditional approaches for posterior estimation include sampling-based methods and variational inference. However, sampling-based methods are typically slow for high-dimensional inverse problems, while variational inference often lacks estimation accuracy. In this paper, we propose alpha-DPI, a deep learning framework that first learns an approximate posterior using alpha-divergence variational inference paired with a generative neural network, and then produces more accurate posterior samples through importance re-weighting of the network samples. It inherits strengths from both sampling and variational inference methods: it is fast, accurate, and scalable to high-dimensional problems. We apply our approach to two high-impact astronomical inference problems using real data: exoplanet astrometry and black hole feature extraction.
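
The second stage of alpha-DPI, importance re-weighting of samples drawn from a learned approximate posterior, is ordinary self-normalized importance sampling. The Python sketch below assumes a toy one-dimensional problem: a fixed Gaussian proposal stands in for the trained generative network, and the target is an unnormalized Gaussian log posterior. The alpha-divergence training stage is not shown, and all names are hypothetical.

```python
import math
import random


def log_target(theta):
    # Unnormalized log posterior; a unit-variance Gaussian centred at 2 (toy choice).
    return -0.5 * (theta - 2.0) ** 2


def log_proposal(theta, mu=0.0, sigma=2.0):
    # Normalized log density of the stand-in for the generative network's samples.
    return -0.5 * ((theta - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def importance_weights(samples):
    """Self-normalized importance weights w_i proportional to p(theta_i) / q(theta_i)."""
    log_w = [log_target(t) - log_proposal(t) for t in samples]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]  # subtract the max for numerical stability
    total = sum(w)
    return [wi / total for wi in w]


rng = random.Random(0)
samples = [rng.gauss(0.0, 2.0) for _ in range(50_000)]
weights = importance_weights(samples)
posterior_mean = sum(w * t for w, t in zip(weights, samples))
effective_sample_size = 1.0 / sum(w * w for w in weights)
# Resampling turns the weighted set into (approximately) unweighted posterior samples.
resampled = rng.choices(samples, weights=weights, k=1000)
```

Because the normalizing constant of the target cancels in the self-normalization, only an unnormalized log posterior is needed; the effective sample size is a quick check that the proposal covers the posterior well.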
arxiv.org

Extended Randomized Kaczmarz Method for Sparse Least Squares and Impulsive Noise Problems

The Extended Randomized Kaczmarz method is a well-known iterative scheme that can find the Moore-Penrose inverse solution of a possibly inconsistent linear system and requires only one additional column of the system matrix in each iteration compared with the standard randomized Kaczmarz method. Also, the Sparse Randomized Kaczmarz method has been shown to converge linearly to a sparse solution of a consistent linear system. Here, we combine both ideas and propose an Extended Sparse Randomized Kaczmarz method. We show linear expected convergence to a sparse least squares solution in the sense that an extended variant of the regularized basis pursuit problem is solved. Moreover, we generalize the additional step in the method and prove convergence to a more abstract optimization problem. We demonstrate numerically that our method can find sparse least squares solutions of real and complex systems if the noise is concentrated in the complement of the range of the system matrix and that our generalization can handle impulsive noise.
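
For reference, the baseline Extended Randomized Kaczmarz iteration fits in a few lines of Python. This is the standard, non-sparse method (one column projection that strips from b its component outside the range of A, then one row projection on the corrected system), not the authors' Extended Sparse variant; the function name and iteration count are illustrative assumptions.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def extended_randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Approximate the minimum-norm least-squares solution A^+ b (dense lists, toy sizes)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]
    row_sq = [dot(row, row) for row in A]
    col_sq = [dot(col, col) for col in cols]
    x = [0.0] * n
    z = list(b)  # z converges to the part of b orthogonal to the range of A
    for _ in range(iters):
        # Column step: remove from z its component along a random column of A,
        # sampled with probability proportional to the squared column norm.
        j = rng.choices(range(n), weights=col_sq)[0]
        c = dot(cols[j], z) / col_sq[j]
        z = [zi - c * aij for zi, aij in zip(z, cols[j])]
        # Row step: classic Kaczmarz projection for the corrected system A x = b - z.
        i = rng.choices(range(m), weights=row_sq)[0]
        r = (b[i] - z[i] - dot(A[i], x)) / row_sq[i]
        x = [xk + r * aik for xk, aik in zip(x, A[i])]
    return x


# Inconsistent 3x2 system; its least-squares solution is (1/3, 1/3).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 1.0, 0.0]
x_ls = extended_randomized_kaczmarz(A, b)
```

The plain randomized Kaczmarz method would keep bouncing between the three inconsistent equations; the extra column step is what lets the iterate settle on the least-squares solution.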
arxiv.org

Occupancy Information Ratio: Infinite-Horizon, Information-Directed, Parameterized Policy Search

We develop a new measure of the exploration/exploitation trade-off in infinite-horizon reinforcement learning problems called the occupancy information ratio (OIR), defined as the ratio between the infinite-horizon average cost of a policy and the entropy of its long-term state occupancy measure. The OIR ensures that no matter how many trajectories an RL agent traverses or how well it learns to minimize cost, it maintains a healthy skepticism about its environment, in that it defines an optimal policy which induces a high-entropy occupancy measure. Unlike earlier notions of information ratio, OIR is amenable to direct policy search over parameterized families, and exhibits hidden quasiconcavity through invocation of the perspective transformation. This feature ensures that under appropriate policy parameterizations, the OIR optimization problem has no spurious stationary points, despite the overall problem's nonconvexity. We develop for the first time policy gradient and actor-critic algorithms for OIR optimization based upon a new entropy gradient theorem, and establish both asymptotic and non-asymptotic convergence results with global optimality guarantees. In experiments, these methodologies outperform several deep RL baselines in problems with sparse rewards, where many trajectories may be uninformative and skepticism about the environment is crucial to success.
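
The OIR itself is easy to evaluate for a fixed policy on a small finite Markov chain: compute the stationary (long-run occupancy) distribution, then divide the average cost by its entropy. The Python sketch below does only this evaluation, assuming the policy-induced transition matrix is ergodic and aperiodic so that power iteration converges; the chains and costs are made-up examples, and no policy-gradient or actor-critic step is shown.

```python
import math


def stationary_distribution(P, max_iter=10_000, tol=1e-12):
    """Power iteration mu <- mu P for a row-stochastic matrix P."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, mu)) < tol:
            return new
        mu = new
    return mu


def occupancy_information_ratio(P, cost):
    """Average cost under the occupancy measure divided by that measure's entropy."""
    mu = stationary_distribution(P)
    avg_cost = sum(m * c for m, c in zip(mu, cost))
    entropy = -sum(m * math.log(m) for m in mu if m > 0.0)
    return avg_cost / entropy


cost = [1.0, 3.0]
uniform_chain = [[0.5, 0.5], [0.5, 0.5]]  # visits both states equally often
sticky_chain = [[0.9, 0.1], [0.5, 0.5]]   # spends 5/6 of the time in the cheap state
oir_uniform = occupancy_information_ratio(uniform_chain, cost)
oir_sticky = occupancy_information_ratio(sticky_chain, cost)
```

Assuming, as the cost-over-entropy definition suggests, that a smaller OIR is better: the sticky chain has the lower average cost (4/3 versus 2), yet its OIR is slightly higher (about 2.96 versus about 2.89) because its occupancy entropy is much lower, so the ratio rewards spreading visitation across states.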
arxiv.org

A Matheuristic Approach for Solving a Simultaneous Lot Sizing and Scheduling Problem with Client Prioritization in Tire Industry

Cyril Koch, Taha Arbaoui, Yassine Ouazene, Farouk Yalaoui, Humbert De Brunier, Nicolas Jaunet, Antoine De Wulf. This paper introduces an integrated lot sizing and scheduling problem inspired by a real-world application in the off-the-road tire industry. This problem considers the assignment of different items to parallel machines with complex eligibility constraints within a finite planning horizon. It also considers a wide range of specific constraints, such as backordering, a limited number of setups, upstream resource saturation, and customer prioritization. A novel mixed integer formulation is proposed with the objective of optimizing different normalized criteria related to inventory and service level performance. Based on this mathematical formulation, a problem-based matheuristic that solves the lot sizing and assignment problems separately is proposed to solve the industrial case. A computational study and sensitivity analysis are carried out on real-world data with up to 170 products, 70 unrelated parallel machines, and 42 periods. The results show that the proposed approach improves on the company's solution: the two KPIs most important to management were improved by 32% for backorders and 13% for overstock, respectively. Moreover, the computational time has been reduced significantly.
arxiv.org

Towards energy discretization for muon scattering tomography in GEANT4 simulations: A discrete probabilistic approach

In this study, aiming to eliminate the disadvantageous complexity of existing particle generators, we present a discrete probabilistic scheme adapted to discrete energy spectra in GEANT4 simulations. In our multi-binned approach, we first compute the discrete probability of each energy bin, where the number of bins is flexible depending on the computational goal, and the only imperative condition is that the discrete probabilities sum to unity. For the implementation in GEANT4, we construct a one-dimensional probability grid consisting of as many sub-cells as there are energy bins, with each cell representing the discrete probability of its energy bin so that the unity condition is fulfilled. By uniformly generating random numbers between 0 and 1, we assign to each random number the discrete energy of the probability-grid cell in which it falls. This probabilistic methodology not only permits us to discretize continuous energy spectra based on Monte Carlo generators, but also gives unique access to experimental energy spectra measured at distinct particle flux values. Accordingly, we first perform our simulations by discretizing the muon energy spectrum acquired via the CRY generator over the energy interval between 0 and 8 GeV, along with the measurements from the BESS spectrometer, and we determine the average scattering angle, the root-mean-square of the scattering angle, and the number of absorbed muons using a series of slabs made of aluminum, copper, iron, lead, and uranium. Finally, we present a computational strategy for GEANT4 simulations that allows us to verify as well as modify the energy spectrum depending on the nature of the information source, in addition to offering exceptional tracking speed.
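
The grid lookup described above is inverse-transform sampling for a discrete distribution. The Python illustration below is language-neutral in spirit: it is not GEANT4's C++ API, and the bin energies and probabilities are invented placeholders, not CRY or BESS data. It builds the cumulative grid once, checks the unity condition, and maps each uniform random number to its cell with a binary search.

```python
import bisect
import itertools
import random


def build_probability_grid(probabilities, tol=1e-9):
    """Cumulative cell edges of the 1D probability grid; one cell per energy bin."""
    if abs(sum(probabilities) - 1.0) > tol:
        raise ValueError("bin probabilities must sum to unity")
    return list(itertools.accumulate(probabilities))


def sample_energy(bin_energies, grid, rng):
    """Draw u ~ U[0, 1) and return the energy of the grid cell containing u."""
    u = rng.random()
    index = bisect.bisect_right(grid, u)
    # Guard against the last cumulative edge landing a hair below 1.0 after rounding.
    return bin_energies[min(index, len(bin_energies) - 1)]


# Hypothetical 4-bin spectrum: bin-centre energies in GeV and made-up probabilities.
bin_energies = [1.0, 3.0, 5.0, 7.0]
probabilities = [0.4, 0.3, 0.2, 0.1]
grid = build_probability_grid(probabilities)
rng = random.Random(42)
draws = [sample_energy(bin_energies, grid, rng) for _ in range(100_000)]
frequencies = {e: draws.count(e) / len(draws) for e in bin_energies}
```

Building the grid is a one-time cost and each draw is a binary search, so the per-event cost stays logarithmic in the number of bins however finely the spectrum is discretized.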
SCIENCE
arxiv.org

Bottoms Up: Standard Model Effective Field Theory from a Model Perspective

Experiments in particle physics have hitherto failed to produce any significant evidence for the many explicit models of physics beyond the Standard Model (BSM) that had been proposed over the past decades. As a result, physicists have increasingly turned to model-independent strategies as tools in searching for a wide range of possible BSM effects. In this paper, we describe the Standard Model Effective Field Theory (SM-EFT) and analyse it in the context of the philosophical discussions about models, theories, and (bottom-up) effective field theories. We find that while the SM-EFT is a quantum field theory, assisting experimentalists in searching for deviations from the SM, in its general form it lacks some of the characteristic features of models. Those features only come into play if put in by hand or prompted by empirical evidence for deviations. Employing different philosophical approaches to models, we argue that the case study suggests avoiding an overly permissive view of models, because such a view blurs the lines between the different stages of the SM-EFT research strategies and glosses over particle physicists' motivations for undertaking this bottom-up approach in the first place. Looking at EFTs from the perspective of modelling does not require taking a stance on some specific brand of realism or taking sides in the debate between reduction and emergence into which EFTs have recently been embedded.
SCIENCE
arxiv.org

Phase Diagram of the Contact Process on Barabasi-Albert Networks

We show results for the contact process on Barabasi-Albert networks. The contact process is a model of epidemic spreading without permanent immunity that has an absorbing state. For finite systems, the absorbing state is the true stationary state, so quasi-stationary states must be simulated instead, which we did in two ways: by reactivation, inserting spontaneously infected individuals, or by the quasi-stationary method, in which we store a list of active states and use it to continue the simulation whenever the system visits the absorbing state. The system presents an absorbing phase transition whose critical behavior obeys the mean-field exponents $\beta=1$, $\gamma'=0$, and $\nu=2$. However, the different quasi-stationary states present distinct finite-size logarithmic corrections. We also report the critical thresholds of the model as a linear function of the inverse network connectivity $1/z$, and extrapolating the critical threshold function to $z \to \infty$ yields the basic reproduction number $R_0=1$ of the complete graph, as expected. Decreasing the network connectivity increases the critical basic reproduction number $R_0$ for this model.
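
The quasi-stationary (QS) method mentioned above can be sketched compactly in Python. Assumptions for illustration: a simple preferential-attachment graph builder, the common convention in which an infected node heals at rate 1 and infects one randomly chosen neighbour at rate $\lambda$, a fixed-size list of stored active configurations refreshed at random, and a time-averaged density after a burn-in. All function names and parameter values are hypothetical, not the authors' code.

```python
import random


def barabasi_albert(n, m, rng):
    """Preferential attachment: each new node links to m distinct degree-weighted nodes."""
    adj = [set() for _ in range(n)]
    endpoints = []  # each node appears once per incident edge
    for i in range(m + 1):  # seed graph: complete graph on m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return [sorted(s) for s in adj]


def qs_density(adj, lam, steps=100_000, burn_in=20_000, list_size=50,
               refresh_prob=0.01, seed=0):
    """Time-averaged infected fraction of the contact process using the QS method."""
    rng = random.Random(seed)
    n = len(adj)
    infected = list(range(n))  # start fully infected
    where = list(range(n))     # where[v] = position of v in `infected`, or -1 if healthy
    stored = [list(infected) for _ in range(list_size)]
    total_time = weighted = 0.0
    for step in range(steps):
        if not infected:
            # Absorbing state reached: resume from a randomly chosen stored active state.
            infected = list(rng.choice(stored))
            where = [-1] * n
            for k, v in enumerate(infected):
                where[v] = k
        n_inf = len(infected)
        dt = 1.0 / (n_inf * (1.0 + lam))  # mean waiting time for the total event rate
        if step >= burn_in:
            total_time += dt
            weighted += dt * n_inf / n
        site = infected[rng.randrange(n_inf)]
        if rng.random() < 1.0 / (1.0 + lam):
            # Healing: swap-remove `site` from the infected list in O(1).
            k, last = where[site], infected.pop()
            if last != site:
                infected[k] = last
                where[last] = k
            where[site] = -1
        else:
            # Infection attempt on a random neighbour; succeeds only if it is healthy.
            target = rng.choice(adj[site])
            if where[target] == -1:
                where[target] = len(infected)
                infected.append(target)
        if infected and rng.random() < refresh_prob:
            stored[rng.randrange(list_size)] = list(infected)
    return weighted / total_time


graph = barabasi_albert(300, 2, random.Random(1))
rho = qs_density(graph, lam=3.0)
```

The reactivation alternative described in the abstract would instead add a small spontaneous-infection rate; the QS list avoids introducing that extra parameter at the cost of storing configurations.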
COMPUTERS
arxiv.org

A new functional space related to Riesz fractional gradients in bounded domains

We present a new functional space suitable for nonlocal models in the Calculus of Variations and partial differential equations. We draw inspiration from the Bessel spaces $H^{s,p}(\mathbb{R}^n)$, which can be regarded as the completion of smooth functions under the norm given by the sum of the $L^p$ norms of a function and its Riesz fractional gradient. Having in mind models in which it is essential to work in bounded domains of $\mathbb{R}^n$, we consider a nonlocal gradient similar to the Riesz fractional one, with a variation that makes it defined over bounded domains. The corresponding functional space is defined as the completion of smooth functions under the natural norm: the sum of the $L^p$ norms of a function and its nonlocal gradient. We prove a nonlocal fundamental theorem of Calculus, according to which $u$ can be expressed as a convolution of its nonlocal gradient with a suitable kernel. As a consequence, we show inequalities in the spirit of Poincaré, Morrey, Trudinger and Hardy. Compact embeddings into $L^q$ spaces are also proved. As an application of the direct method of the Calculus of Variations, we show the existence of minimizers of the associated energy functionals under the assumption of convexity of the integrand, as well as the corresponding Euler-Lagrange equation.
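
For readers who want the whole-space objects behind the first sentences written out, the standard definitions are sketched below; the bounded-domain gradient introduced in the paper is not reproduced. For $0 < s < 1$ and $1 < p < \infty$, with $c_{n,s}$ a normalizing constant depending only on $n$ and $s$, the Riesz fractional gradient and the norm whose completion gives $H^{s,p}(\mathbb{R}^n)$ are:

```latex
D^s u(x) = c_{n,s} \int_{\mathbb{R}^n} \frac{u(x) - u(y)}{|x - y|^{n+s}} \, \frac{x - y}{|x - y|} \, dy,
\qquad
\|u\|_{H^{s,p}(\mathbb{R}^n)} = \|u\|_{L^p(\mathbb{R}^n)} + \|D^s u\|_{L^p(\mathbb{R}^n)}.
```

The integral runs over all of $\mathbb{R}^n$, which is why a modification is needed before the construction can be carried out on a bounded domain.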
MATHEMATICS
arxiv.org

ANOVA for Data in Metric Spaces, with Applications to Spatial Point Patterns

We review recent ANOVA-like procedures for testing group differences based on data in a metric space and present a new such procedure. Our statistic is based on the classic Levene's test for detecting differences in dispersion. It uses only pairwise distances of data points and can be computed quickly and precisely in situations where computing barycenters ("generalized means") in the data space is slow, possible only by approximation, or even infeasible. We show the asymptotic normality of our test statistic and present simulation studies for spatial point pattern data, in which we compare the various procedures in a 1-way ANOVA setting. As an application, we perform a 2-way ANOVA on a data set of bubbles in a mineral flotation process.
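
As a rough illustration of how a Levene-style dispersion test can be built from pairwise distances alone, the Python sketch below replaces each point's absolute deviation from its group mean with its mean distance to the other points in its group, then computes the usual one-way ANOVA F ratio on those values. This construction is an assumption made for illustration; it is not the authors' statistic, and their asymptotic normalization is not reproduced.

```python
def levene_type_f(groups, dist):
    """One-way ANOVA F statistic on per-point mean within-group distances (hypothetical)."""
    scores = []
    for points in groups:
        k = len(points)
        if k < 2:
            raise ValueError("each group needs at least two points")
        # A distance-only proxy for each point's deviation from its group's centre.
        scores.append([
            sum(dist(p, q) for j, q in enumerate(points) if j != i) / (k - 1)
            for i, p in enumerate(points)
        ])
    n_total = sum(len(s) for s in scores)
    n_groups = len(scores)
    grand_mean = sum(sum(s) for s in scores) / n_total
    means = [sum(s) / len(s) for s in scores]
    between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(scores, means))
    within = sum(sum((v - m) ** 2 for v in s) for s, m in zip(scores, means))
    return (between / (n_groups - 1)) / (within / (n_total - n_groups))


def abs_dist(a, b):
    return abs(a - b)


tight = [0.0, 1.0, 2.0, 3.0, 4.0]
wide = [0.0, 2.0, 4.0, 6.0, 8.0]          # same shape, twice the spread
shifted = [10.0, 11.0, 12.0, 13.0, 14.0]  # same spread as `tight`, different location

f_spread = levene_type_f([tight, wide], abs_dist)
f_location = levene_type_f([tight, shifted], abs_dist)
```

Only `dist` touches the data, so the same function applies to point patterns under any metric between patterns, with no barycenter ever computed; a pure location shift leaves the statistic at zero, as a dispersion test should. Significance could then be assessed, for example, by permuting group labels.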
SCIENCE
arxiv.org

On the adaptation of recurrent neural networks for system identification

This paper presents a transfer learning approach that enables fast and efficient adaptation of Recurrent Neural Network (RNN) models of dynamical systems. A nominal RNN model is first identified using available measurements. The system dynamics are then assumed to change, leading to an unacceptable degradation of the nominal model's performance on the perturbed system. To cope with the mismatch, the model is augmented with an additive correction term trained on fresh data from the new dynamic regime. The correction term is learned through a Jacobian Feature Regression (JFR) method defined in terms of the features spanned by the model's Jacobian with respect to its nominal parameters. A non-parametric view of the approach is also proposed, which extends recent work on Gaussian Processes (GP) with the Neural Tangent Kernel (NTK-GP) to the RNN case (RNTK-GP). This can be more efficient for very large networks or when only a few data points are available. Implementation aspects for fast and efficient computation of the correction term, as well as the initial state estimation for the RNN model, are described. Numerical examples show the effectiveness of the proposed methodology in the presence of significant system variations.
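
The Jacobian Feature Regression idea (keep the nominal parameters frozen and fit a correction that is linear in the model's parameter Jacobian) can be illustrated on a deliberately tiny static model rather than an RNN. In the Python sketch below, the model $f(x;\theta) = \theta_1 \tanh(\theta_2 x)$, the ridge weight, and the data are all hypothetical; the corrected prediction is $f(x;\theta_0) + J(x)\,\Delta\theta$, with $\Delta\theta$ obtained by ridge regression on fresh data.

```python
import math


def model(x, theta):
    return theta[0] * math.tanh(theta[1] * x)


def jacobian(x, theta):
    """Gradient of the model output with respect to (theta[0], theta[1])."""
    t = math.tanh(theta[1] * x)
    return [t, theta[0] * x * (1.0 - t * t)]


def fit_jfr_correction(xs, ys, theta0, ridge=1e-3):
    """Ridge regression of residuals y - f(x; theta0) on Jacobian features (2 params)."""
    g00 = g01 = g11 = r0 = r1 = 0.0
    for x, y in zip(xs, ys):
        j0, j1 = jacobian(x, theta0)
        resid = y - model(x, theta0)
        g00 += j0 * j0
        g01 += j0 * j1
        g11 += j1 * j1
        r0 += j0 * resid
        r1 += j1 * resid
    g00 += ridge
    g11 += ridge
    det = g00 * g11 - g01 * g01
    # Solve the 2x2 system (J^T J + ridge * I) delta = J^T r by Cramer's rule.
    return [(r0 * g11 - g01 * r1) / det, (g00 * r1 - g01 * r0) / det]


def corrected(x, theta0, delta):
    j0, j1 = jacobian(x, theta0)
    return model(x, theta0) + j0 * delta[0] + j1 * delta[1]


theta_nominal = [1.0, 1.0]
theta_new = [1.3, 0.8]                    # hypothetical perturbed dynamics
xs = [-2.0 + 0.2 * k for k in range(21)]
ys = [model(x, theta_new) for x in xs]    # "fresh data" from the new regime
delta = fit_jfr_correction(xs, ys, theta_nominal)

mse_before = sum((y - model(x, theta_nominal)) ** 2 for x, y in zip(xs, ys)) / len(xs)
mse_after = sum((y - corrected(x, theta_nominal, delta)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Because the correction is linear in $\Delta\theta$, fitting it is a convex least-squares problem even though the underlying model is nonlinear, which is what makes this kind of adaptation fast.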
COMPUTERS
arxiv.org

SparseAlign: A Super-Resolution Algorithm for Automatic Marker Localization and Deformation Estimation in Cryo-Electron Tomography

Tilt-series alignment is crucial to obtaining high-resolution reconstructions in cryo-electron tomography. Beam-induced local deformation of the sample is hard to estimate from the low-contrast sample alone, and often requires fiducial gold bead markers. The state-of-the-art approach for deformation estimation uses (semi-)manually labelled marker locations in projection data to fit the parameters of a polynomial deformation model. Manually labelled marker locations are difficult to obtain when data are noisy or markers overlap in projection data. We propose an alternative mathematical approach for simultaneous marker localization and deformation estimation by extending a grid-free super-resolution algorithm first proposed in the context of single-molecule localization microscopy. Our approach does not require labelled marker locations; instead, we use an image-based loss where we compare the forward projection of markers with the observed data. We equip this marker localization scheme with an additional deformation estimation component and solve for a reduced number of deformation parameters. Using extensive numerical studies on marker-only samples, we show that our approach automatically finds markers and reliably estimates sample deformation without labelled marker data. We further demonstrate the applicability of our approach for a broad range of model mismatch scenarios, including experimental electron tomography data of gold markers on ice.
SCIENCE
arxiv.org

Spatiotemporal Analysis Using Riemannian Composition of Diffusion Operators

Multivariate time-series have become abundant in recent years, as many data-acquisition systems record information through multiple sensors simultaneously. In this paper, we assume the variables pertain to some geometry and present an operator-based approach for spatiotemporal analysis. Our approach combines three components that are often considered separately: (i) manifold learning for building operators representing the geometry of the variables, (ii) Riemannian geometry of symmetric positive-definite matrices for multiscale composition of operators corresponding to different time samples, and (iii) spectral analysis of the composite operators for extracting different dynamic modes. We propose a method that is analogous to the classical wavelet analysis, which we term Riemannian multi-resolution analysis (RMRA). We provide some theoretical results on the spectral analysis of the composite operators, and we demonstrate the proposed method on simulations and on real data.
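
One natural way to compose two symmetric positive-definite operators on the Riemannian manifold of SPD matrices is the geodesic midpoint under the affine-invariant metric, i.e. the matrix geometric mean $A \# B = A^{1/2}(A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}$. Whether RMRA uses exactly this combination is an assumption here; the Python sketch is restricted to 2x2 matrices so that the matrix square root has a closed form.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inverse(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def sqrtm_2x2(M):
    """Principal square root of a 2x2 matrix with positive eigenvalues (Cayley-Hamilton):
    sqrt(M) = (M + s I) / t, where s = sqrt(det M) and t = sqrt(trace M + 2 s)."""
    s = math.sqrt(M[0][0] * M[1][1] - M[0][1] * M[1][0])
    t = math.sqrt(M[0][0] + M[1][1] + 2.0 * s)
    return [[(M[0][0] + s) / t, M[0][1] / t], [M[1][0] / t, (M[1][1] + s) / t]]


def geometric_mean(A, B):
    """Midpoint of the affine-invariant geodesic between SPD matrices A and B."""
    a_half = sqrtm_2x2(A)
    a_neg_half = inverse(a_half)
    middle = matmul(matmul(a_neg_half, B), a_neg_half)
    return matmul(matmul(a_half, sqrtm_2x2(middle)), a_half)


A = [[2.0, 1.0], [1.0, 2.0]]
B = [[3.0, 0.0], [0.0, 1.0]]
G = geometric_mean(A, B)
```

The result is again SPD, so its eigendecomposition stays well defined for the spectral step, and it satisfies the Riccati identity $G A^{-1} G = B$; for commuting (for example diagonal) inputs it reduces to the entrywise geometric mean.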
COMPUTERS
arxiv.org

Representing Long-Range Context for Graph Neural Networks with Global Attention

Graph neural networks are powerful architectures for structured datasets. However, current methods struggle to represent long-range dependencies. Scaling the depth or width of GNNs is insufficient to broaden receptive fields as larger GNNs encounter optimization instabilities such as vanishing gradients and representation oversmoothing, while pooling-based approaches have yet to become as universally useful as in computer vision. In this work, we propose the use of Transformer-based self-attention to learn long-range pairwise relationships, with a novel "readout" mechanism to obtain a global graph embedding. Inspired by recent computer vision results that find position-invariant attention performant in learning long-range relationships, our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module. This simple architecture leads to state-of-the-art results on several graph classification tasks, outperforming methods that explicitly encode graph structure. Our results suggest that purely-learning-based approaches without graph structure may be suitable for learning high-level, long-range relationships on graphs. Code for GraphTrans is available at this https URL.
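
The GraphTrans pipeline (a GNN module followed by a Transformer over the node embeddings, read out through a special token) can be sketched in plain Python to show why it is permutation-invariant: without positional encodings, self-attention treats the node embeddings as a set. The message-passing rule, the weight-free single-head attention (identity query, key, and value projections), and the fixed `cls` vector standing in for a learned readout token are simplifying assumptions, not the published architecture.

```python
import math


def gnn_layer(h, adj):
    """Weight-free message passing: h_v <- tanh(h_v + mean of neighbour states)."""
    out = []
    for v, nbrs in enumerate(adj):
        dim = len(h[v])
        agg = [(sum(h[u][c] for u in nbrs) / len(nbrs)) if nbrs else 0.0 for c in range(dim)]
        out.append([math.tanh(a + b) for a, b in zip(h[v], agg)])
    return out


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]
        z = sum(w)
        out.append([sum(wi * tok[c] for wi, tok in zip(w, tokens)) / z for c in range(d)])
    return out


def graph_readout(features, adj, cls, layers=2):
    h = features
    for _ in range(layers):
        h = gnn_layer(h, adj)
    tokens = [cls] + h  # no positional encodings, so node order does not matter
    return self_attention(tokens)[0]  # the readout token's output is the graph embedding


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]  # undirected graph as adjacency lists
cls = [0.1, 0.1]
embedding = graph_readout(features, adj, cls)

# Relabel the nodes: new node i is old node perm[i].
perm = [2, 0, 3, 1]
new_id = {old: new for new, old in enumerate(perm)}
features_p = [features[old] for old in perm]
adj_p = [[new_id[u] for u in adj[old]] for old in perm]
embedding_p = graph_readout(features_p, adj_p, cls)
```

Relabelling the nodes leaves the readout unchanged up to floating-point rounding, which is the property that lets the Transformer stage ignore explicit graph structure.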
COMPUTERS
arxiv.org

A nonlinear conjugate gradient method with complexity guarantees and its application to nonconvex regression

Nonlinear conjugate gradients are among the most popular techniques for solving continuous optimization problems. Although these schemes have long been studied from a global convergence standpoint, their worst-case complexity properties have yet to be fully understood, especially in the nonconvex setting. In particular, it is unclear whether such methods possess better guarantees than first-order methods such as gradient descent. On the other hand, recent results have shown good performance of standard nonlinear conjugate gradient methods on nonconvex problems, even when compared with methods endowed with the best known complexity guarantees.
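
For orientation, a textbook nonlinear conjugate gradient loop (Polak-Ribière+ update, Armijo backtracking, and a steepest-descent restart whenever the direction fails to descend) looks as follows in Python. This is a generic baseline, not the complexity-guaranteed variant the paper proposes; the nonconvex double-well test function and all constants are illustrative choices.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nonlinear_cg(f, grad, x0, max_iter=1000, tol=1e-8, c1=1e-4):
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        g_sq = dot(g, g)
        if math.sqrt(g_sq) < tol:
            break
        slope = dot(g, d)
        if slope >= 0.0:
            # Not a descent direction: restart from steepest descent.
            d = [-gi for gi in g]
            slope = -g_sq
        # Armijo backtracking line search (with a floor on the step to guarantee exit).
        fx, step = f(x), 1.0
        while f([xi + step * di for xi, di in zip(x, d)]) > fx + c1 * step * slope and step > 1e-16:
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        # Polak-Ribiere+ coefficient, clipped at zero.
        beta = max(0.0, dot(g_new, [gn - go for gn, go in zip(g_new, g)]) / g_sq)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x


def double_well(p):
    x, y = p
    return (x * x - 1.0) ** 2 + y * y


def double_well_grad(p):
    x, y = p
    return [4.0 * x * (x * x - 1.0), 2.0 * y]


x_star = nonlinear_cg(double_well, double_well_grad, [0.5, 1.0])
```

A loop of this kind comes with global convergence arguments but no worst-case iteration bound in the nonconvex setting, which is the gap the paper addresses.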
COMPUTERS
arxiv.org

Minrank of Embedded Index Coding Problems and its Relation to Connectedness of a Bipartite Graph

This paper deals with the embedded index coding problem (EICP), introduced by A. Porter and M. Wootters, which is a decentralized communication problem among users with side information. An alternative definition of the parameter minrank of an EICP, which has lower computational complexity than the existing definition, is presented. A graphical representation of an EICP is given using directed bipartite graphs, called bipartite problem graphs, and the side information alone is represented using an undirected bipartite graph called the side information bipartite graph. Inspired by the well-studied single unicast index coding problem (SUICP), graphical structures similar to cycles and cliques in the side information graph of an SUICP are identified in the side information bipartite graph of a single unicast embedded index coding problem (SUEICP). Transmission schemes based on these graphical structures, called the tree cover scheme and the bi-clique cover scheme, are also presented for an SUEICP. Finally, a relation between the connectedness of the side information bipartite graph and the number of transmissions required in a scalar linear solution of an EICP is established.
COMPUTERS
arxiv.org

How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when the labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the best of our knowledge, this work establishes the first theoretical analysis for the known iterative self-training paradigm and proves the benefits of unlabeled data in both training convergence and generalization ability. To make our theoretical analysis feasible, we focus on the case of one-hidden-layer neural networks. However, theoretical understanding of iterative self-training is non-trivial even for a shallow neural network. One of the key challenges is that existing neural network landscape analysis built upon supervised learning no longer holds in the (semi-supervised) self-training paradigm. We address this challenge and prove that iterative self-training converges linearly with both convergence rate and generalization accuracy improved in the order of $1/\sqrt{M}$, where $M$ is the number of unlabeled samples. Experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
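
The iterative self-training loop analysed in the paper (fit on labeled data, pseudo-label the unlabeled pool, refit on both, repeat) can be illustrated with a deliberately simple model. The Python sketch below swaps the one-hidden-layer network for a one-dimensional nearest-centroid classifier, and the two-Gaussian data are synthetic assumptions; it shows only the mechanics, not the paper's convergence rates.

```python
import random


def fit_centroids(points, labels):
    c0 = [p for p, y in zip(points, labels) if y == 0]
    c1 = [p for p, y in zip(points, labels) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)


def predict(x, centroids):
    m0, m1 = centroids
    return 0 if abs(x - m0) <= abs(x - m1) else 1


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=10):
    centroids = fit_centroids(labeled_x, labeled_y)
    for _ in range(rounds):
        pseudo = [predict(x, centroids) for x in unlabeled_x]  # pseudo-label the pool
        centroids = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo)
    return centroids


rng = random.Random(0)
# Classes are N(-2, 1) and N(+2, 1); the two labeled points are unrepresentative.
labeled_x, labeled_y = [-1.0, 3.0], [0, 1]
unlabeled_x = ([rng.gauss(-2.0, 1.0) for _ in range(1000)]
               + [rng.gauss(2.0, 1.0) for _ in range(1000)])
initial = fit_centroids(labeled_x, labeled_y)
final = self_train(labeled_x, labeled_y, unlabeled_x)
boundary_initial = sum(initial) / 2  # decision boundary from labeled data alone
boundary_final = sum(final) / 2
```

With only the two labeled points the decision boundary sits at 1.0, while the rounds of pseudo-labeling pull it toward the true boundary at 0, a small-scale picture of unlabeled data improving generalization.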
CODING & PROGRAMMING

