Prediction of the electron density of states for crystalline compounds with Atomistic Line Graph Neural Networks (ALIGNN)

By Prathik R Kaundinya, Kamal Choudhary, Surya R. Kalidindi
arxiv.org
 4 days ago

Machine learning (ML) based models have greatly enhanced the traditional materials discovery and design pipeline. Specifically, in recent years, surrogate ML models for material property prediction have demonstrated success in predicting discrete scalar-valued target properties to within reasonable accuracy...

Nature.com

Electronic nature of charge density wave and electron-phonon coupling in kagome superconductor KV3Sb5

The kagome superconductors AV3Sb5 (A = K, Rb, Cs) have received enormous attention due to their nontrivial topological electronic structure, anomalous physical properties and superconductivity. An unconventional charge density wave (CDW) has been detected in AV3Sb5. High-precision determination of the electronic structure is essential to understanding its origin. Here we unveil the electronic nature of the CDW phase through high-resolution angle-resolved photoemission measurements on KV3Sb5. We have observed CDW-induced Fermi surface reconstruction and the associated band folding. The CDW-induced band splitting and the associated gap opening have been revealed at the boundary of the pristine and reconstructed Brillouin zones. The Fermi surface- and momentum-dependent CDW gap is measured, and a strongly anisotropic CDW gap is observed on all the V-derived Fermi surfaces. In particular, we have observed signatures of electron-phonon coupling in KV3Sb5. These results provide key insights into the nature of the CDW state and its interplay with superconductivity in AV3Sb5 superconductors.
PHYSICS
arxiv.org

Emission-line diagnostics of HII regions using conditional Invertible Neural Networks

Da Eun Kang, Eric W. Pellegrini, Lynton Ardizzone, Ralf S. Klessen, Ullrich Koethe, Simon C. O. Glover, Victor F. Ksoll

Young massive stars play an important role in the evolution of the interstellar medium (ISM) and the self-regulation of star formation in giant molecular clouds (GMCs) by injecting energy, momentum, and radiation (stellar feedback) into surrounding environments, disrupting the parental clouds, and regulating further star formation. Information about stellar feedback is encoded in the emission we observe; however, inferring the physical properties from photometric and spectroscopic measurements is difficult because stellar feedback is a highly complex and non-linear process, so the observational data are highly degenerate. We therefore introduce a novel method that couples a conditional invertible neural network (cINN) with the WARPFIELD-emission predictor (WARPFIELD-EMP) to estimate the physical properties of star-forming regions from spectral observations. We present a cINN that predicts the posterior distribution of seven physical parameters: cloud mass, star formation efficiency, cloud density, cloud age (the age of the first-generation stars), age of the youngest cluster, number of clusters, and evolutionary phase of the cloud. The network takes as input the luminosities of 12 optical emission lines, and we test it with synthetic models that were not used during training. Our network is a powerful and time-efficient tool that can accurately predict each parameter, although degeneracy sometimes remains in the posterior estimates of the number of clusters. We validate the posteriors estimated by the network and confirm that they are consistent with the input observations. We also evaluate the influence of observational uncertainties on the network performance.
ASTRONOMY
towardsdatascience.com

Facial Expression Recognition (FER) without Artificial Neural Networks

When it comes to talking about Machine Learning, it’s clear that it is the science (and art) of programming computers that learn from data [1]. However, this definition raises some questions, and the first one is: data? Excel spreadsheets? The first thing people think (or at least that’s the...
COMPUTERS
arxiv.org

How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation

Graph neural networks (GNNs), a group of powerful tools for representation learning on irregular data, have shown superior performance in various downstream tasks. With unstructured texts represented as concept maps, GNNs can be exploited for tasks like document retrieval. Intrigued by how GNNs can help document retrieval, we conduct an empirical study on a large-scale multi-discipline dataset, CORD-19. Results show that, rather than complex structure-oriented GNNs such as GINs and GATs, our proposed semantics-oriented graph functions achieve better and more stable performance based on the BM25-retrieved candidates. Our insights in this case study can serve as a guideline for future work to develop effective GNNs with appropriate semantics-oriented inductive biases for textual reasoning tasks like document retrieval and classification. All code for this case study is available at this https URL.
COMPUTERS
arxiv.org

An Efficient Lorentz Equivariant Graph Neural Network for Jet Tagging

Machine learning methods, especially deep learning, have become popular for jet representation in particle physics. Most of these methods focus on handcrafted feature design or on tuning the structure of existing black-box deep neural networks, while ignoring Lorentz group equivariance, a fundamental space-time symmetry in the laws governing jet production. Inspired by the idea that new physics insights may emerge more easily when the underlying symmetry is built in, we propose in this paper a new symmetry-preserving deep learning model, LorentzNet, for jet tagging. Specifically, LorentzNet updates the geometric tensors via Minkowski dot product attention, which aggregates the tensors using embeddings of the pairwise Minkowski dot products as weights. The construction of LorentzNet is guided by the universal approximation theory for Lorentz-equivariant mappings, which ensures both the equivariance and the universality of LorentzNet. Experiments on two representative jet tagging datasets show that LorentzNet achieves the best tagging performance compared with the baselines (e.g., ParticleNet) on both clean and Lorentz-rotated test data. Even with only $0.5\%$ of the training samples, LorentzNet still achieves competitive performance, which shows the benefit of the inductive bias brought by the Lorentz group symmetry.
SCIENCE
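The LorentzNet abstract builds its attention weights from pairwise Minkowski dot products, which are unchanged by Lorentz transformations. A minimal NumPy sketch of that invariance, assuming the (+, -, -, -) metric signature and four-momenta ordered as (E, px, py, pz); the helper names and the momenta are hypothetical and not taken from the LorentzNet code:

```python
import numpy as np

# Minkowski metric with (+, -, -, -) signature (an assumed convention).
METRIC = np.diag([1.0, -1.0, -1.0, -1.0])

def minkowski_dot(p, q):
    """Minkowski inner product <p, q> = p0*q0 - p1*q1 - p2*q2 - p3*q3."""
    return float(p @ METRIC @ q)

def boost_z(beta):
    """Lorentz boost along the z axis with velocity beta (units with c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    lam = np.eye(4)
    lam[0, 0] = lam[3, 3] = gamma
    lam[0, 3] = lam[3, 0] = -gamma * beta
    return lam

# Two hypothetical particle four-momenta (E, px, py, pz).
p = np.array([10.0, 1.0, 2.0, 3.0])
q = np.array([5.0, -1.0, 0.5, 2.0])

boost = boost_z(0.6)
before = minkowski_dot(p, q)
after = minkowski_dot(boost @ p, boost @ q)
print(before, after)  # equal up to floating-point rounding
```

Because the product survives any boost, a network whose weights depend only on such pairwise products sees the same weights for a jet and its Lorentz-transformed copy.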
towardsdatascience.com

Exploring the LSTM Neural Network Model for Time Series

Practical, straightforward implementation with the scalecast library. One of the most advanced models out there to forecast time series is the Long Short-Term Memory (LSTM) Neural Network. According to Korstanje in his book, Advanced Forecasting with Python: “The LSTM cell adds long-term memory in an even more performant way because...
CODING & PROGRAMMING
Nature.com

Learning self-driven collective dynamics with graph networks

Despite decades of theoretical research, the nature of self-driven collective motion remains poorly understood and controversial, and the phase transition in its dynamics is a major research issue. Recent methods propose to infer the phase transition process from various manually extracted features using machine learning. In this work, we use machine learning to propose a new order parameter that quantifies the degree of synchronization of the self-driven collective system in terms of the number of clusters. Furthermore, we construct a powerful model based on a graph network to determine the long-term evolution of the self-driven collective system from the initial positions of the particles, without any manual features. Results show that this method has strong predictive power and is robust to various types of noise. Our method can serve as a reference for research on other physical systems with local interactions.
COMPUTERS
arxiv.org

Density Functional Theory Transformed into a One-electron Reduced Density Matrix Functional Theory for the Capture of Static Correlation

Density functional theory (DFT), the most widely adopted method in modern computational chemistry, fails to accurately describe the electronic structure of strongly correlated systems. Here we show that DFT can be formally and practically transformed into a one-electron reduced-density-matrix (1-RDM) functional theory, which can address the limitations of DFT while retaining favorable computational scaling compared to wavefunction-based approaches. In addition to relaxing the idempotency restriction on the 1-RDM in the kinetic energy term, we add a quadratic 1-RDM-based term to DFT's density-based exchange-correlation functional. Our approach, which we implement by quadratic semidefinite programming at DFT's computational scaling of $O(r^{3})$, yields substantial improvements over traditional DFT in the description of static correlation in chemical structures and processes such as singlet biradicals and bond dissociations.
MATHEMATICS
arxiv.org

Quantum activation functions for quantum neural networks

The field of artificial neural networks is expected to strongly benefit from recent developments of quantum computers. In particular, quantum machine learning, a class of quantum algorithms which exploit qubits for creating trainable neural networks, will provide more power to solve problems such as pattern recognition, clustering and machine learning in general. The building block of feed-forward neural networks consists of one layer of neurons connected to an output neuron that is activated according to an arbitrary activation function. The corresponding learning algorithm goes under the name of Rosenblatt perceptron. Quantum perceptrons with specific activation functions are known, but a general method for realizing arbitrary activation functions on a quantum computer is still lacking. Here we fill this gap with a quantum algorithm capable of approximating any analytic activation function to any given order of its power series. Unlike previous proposals, which provide irreversible, measurement-based and simplified activation functions, here we show how to approximate any analytic function to any required accuracy without the need to measure the states encoding the information. Thanks to the generality of this construction, any feed-forward neural network may acquire the universal approximation properties according to Hornik's theorem. Our results recast the science of artificial neural networks in the architecture of gate-model quantum computers.
COMPUTERS
arxiv.org

Wind Park Power Prediction: Attention-Based Graph Networks and Deep Learning to Capture Wake Losses

With the increased penetration of wind energy into the power grid, it has become increasingly important to be able to predict the expected power production for larger wind farms. Deep learning (DL) models can learn complex patterns in the data and have found wide success in predicting wake losses and expected power production. This paper proposes a modular framework for attention-based graph neural networks (GNN), where attention can be applied to any desired component of a graph block. The results show that the model significantly outperforms a multilayer perceptron (MLP) and a bidirectional LSTM (BLSTM) model, while delivering performance on par with a vanilla GNN model. Moreover, we argue that the proposed graph attention architecture can easily adapt to different applications by offering flexibility in the choice of attention operations, which might depend on the specific application. Through analysis of the attention weights, it was shown that employing attention-based GNNs can provide insights into what the models learn. In particular, the attention networks seemed to realise turbine dependencies that aligned with some physical intuition about wake losses.
ENERGY INDUSTRY
arxiv.org

Training Fair Deep Neural Networks by Balancing Influence

Most fair machine learning methods either rely heavily on the sensitive information of the training samples or require large modifications to the target models, which hinders their practical application. To address this issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set (second stage), where the sample weights are computed to balance the model performance across different demographic groups (first stage). FAIRIF can be applied to a wide range of models trained by stochastic gradient descent without changing the model, while only requiring group annotations on a small validation set to compute sample weights. Theoretically, we show that, in the classification setting, three notions of disparity among different groups can be mitigated by training with the weights. Experiments on synthetic data sets demonstrate that FAIRIF yields models with better fairness-utility trade-offs against various types of bias; on real-world data sets, we show the effectiveness and scalability of FAIRIF. Moreover, as evidenced by the experiments with pretrained models, FAIRIF is able to alleviate the unfairness issue of pretrained models without hurting their performance.
CODING & PROGRAMMING
arxiv.org

Galaxy Correlation Function and Local Density from Photometric Redshifts Using the Stochastic Order Redshift Technique (SORT)

James Kakos, Joel R. Primack, Aldo Rodriguez-Puebla, Nicolas Tejos, L. Y. Aaron Yung, Rachel S. Somerville

The stochastic order redshift technique (SORT) is a simple, efficient, and robust method to improve cosmological redshift measurements. The method relies upon having a small ($\sim 10$ per cent) reference sample of high-quality redshifts. Within pencil-beam-like sub-volumes surrounding each galaxy, we use the precise $\mathrm{d}N/\mathrm{d}z$ distribution of the reference sample to recover new redshifts and assign them one-to-one to galaxies such that the original rank order of redshifts is preserved. Preserving the rank order is motivated by the fact that random variables drawn from Gaussian probability density functions with different means but equal standard deviations satisfy stochastic ordering. The process is repeated for sub-volumes surrounding each galaxy in the survey. This results in every galaxy with an uncertain redshift being assigned multiple "recovered" redshifts from which a new redshift estimate can be determined. An earlier paper applied SORT to a mock Sloan Digital Sky Survey at $z \lesssim 0.2$ and accurately recovered the two-point correlation function on scales $\gtrsim 4\,h^{-1}$ Mpc. In this paper, we test the performance of SORT in surveys spanning the redshift range $0.75 < z < 2.25$. We used two mock surveys extracted from the Small MultiDark-Planck and Bolshoi-Planck N-body simulations with dark matter haloes that were populated by the Santa Cruz semi-analytic model. We find that SORT is able to improve redshift estimates and recover distinctive large-scale features of the cosmic web. Further, it provides unbiased estimates of the redshift-space two-point correlation function $\xi(s)$ on scales $\gtrsim 2.5\,h^{-1}$ Mpc, as well as local densities in regions of average or higher density. This may allow improved understanding of how galaxy properties relate to their local environments.
ASTRONOMY
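The rank-order-preserving assignment at the core of SORT can be illustrated for a single sub-volume. A minimal NumPy sketch, assuming the reference $\mathrm{d}N/\mathrm{d}z$ is represented simply by the reference galaxies' own redshifts resampled with replacement; the function name, the resampling choice and the redshift values are hypothetical simplifications, not the authors' implementation:

```python
import numpy as np

def sort_assign(uncertain_z, reference_z, rng):
    """Assign recovered redshifts to the galaxies of one sub-volume.

    New redshifts are drawn from the reference distribution, sorted, and
    matched one-to-one so that the rank order of the original (uncertain)
    redshifts is preserved.
    """
    n = len(uncertain_z)
    # Draw n new redshifts from the reference sample and sort them.
    drawn = np.sort(rng.choice(reference_z, size=n, replace=True))
    # Rank of each galaxy's uncertain redshift (0 = smallest).
    ranks = np.argsort(np.argsort(uncertain_z))
    # The k-th smallest uncertain redshift receives the k-th smallest draw.
    return drawn[ranks]

rng = np.random.default_rng(0)
reference_z = np.array([0.80, 0.95, 1.10, 1.30, 1.60, 2.00])  # hypothetical
uncertain_z = np.array([1.50, 0.90, 1.20])                    # hypothetical
recovered = sort_assign(uncertain_z, reference_z, rng)
```

In the full method this step is repeated for the sub-volume around every galaxy, so each galaxy collects several recovered redshifts from which a final estimate is formed.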
arxiv.org

Diffusion Tensor Estimation with Transformer Neural Networks

Diffusion tensor imaging (DTI) is the most widely used tool for studying brain white matter development and degeneration. However, standard DTI estimation methods depend on a large number of high-quality measurements. Acquiring them requires long scan times, which can be particularly difficult to achieve with certain patient populations such as neonates. Here, we propose a method that can accurately estimate the diffusion tensor from only six diffusion-weighted measurements. Our method achieves this by learning to exploit the relationships between the diffusion signals and tensors in neighboring voxels. Our model is based on transformer networks, which represent the state of the art in modeling the relationship between signals in a sequence. In particular, our model consists of two such networks. The first network estimates the diffusion tensor based on the diffusion signals in a neighborhood of voxels. The second network provides more accurate tensor estimations by learning the relationships between the diffusion signals as well as the tensors estimated by the first network in neighboring voxels. Our experiments with three datasets show that our proposed method achieves highly accurate estimations of the diffusion tensor and is significantly superior to three competing methods. Estimations produced by our method with six measurements are comparable with those of standard estimation methods with 30-88 measurements. Hence, our method promises shorter scan times and more reliable assessment of brain white matter, particularly in non-cooperative patients such as neonates and infants.
SCIENCE
arxiv.org

Automatic Sparse Connectivity Learning for Neural Networks

Since sparse neural networks usually contain many zero weights, these unnecessary network connections can potentially be eliminated without degrading network performance. Therefore, well-designed sparse neural networks have the potential to significantly reduce FLOPs and computational resources. In this work, we propose a new automatic pruning method, Sparse Connectivity Learning (SCL). Specifically, a weight is re-parameterized as an element-wise multiplication of a trainable weight variable and a binary mask. Thus, network connectivity is fully described by the binary mask, which is modulated by a unit step function. We theoretically prove the fundamental principle of using a straight-through estimator (STE) for network pruning. This principle is that the proxy gradients of the STE should be positive, ensuring that mask variables converge at their minima. After finding that Leaky ReLU, Softplus, and Identity STEs satisfy this principle, we propose to adopt the Identity STE in SCL for discrete mask relaxation. We find that the mask gradients of different features are highly unbalanced; hence, we propose to normalize the mask gradients of each feature to optimize mask variable training. In order to automatically train sparse masks, we include the total number of network connections as a regularization term in our objective function. As SCL does not require pruning criteria or hyper-parameters defined by designers for network layers, the network is explored in a larger hypothesis space to achieve optimized sparse connectivity for the best performance. SCL overcomes the limitations of existing automatic pruning methods. Experimental results demonstrate that SCL can automatically learn and select important network connections for various baseline network structures. Deep learning models trained by SCL outperform the SOTA human-designed and automatic pruning methods in sparsity, accuracy, and FLOPs reduction.
CODING & PROGRAMMING
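The SCL re-parameterization (a trainable weight times a binary mask produced by a unit step, an Identity STE for the mask, and the connection count as a regularizer) can be sketched for a single linear layer with a squared-error loss. This is a minimal NumPy illustration under those assumptions only; the per-feature mask-gradient normalization is omitted, and the function names and data are hypothetical:

```python
import numpy as np

def step(m):
    """Unit step: 1.0 where m > 0, else 0.0 (the binary connectivity mask)."""
    return (m > 0).astype(m.dtype)

def forward_backward(x, y, w, m, lam):
    """Masked linear layer y_hat = x @ (w * step(m)) with an MSE loss.

    The loss also counts active connections, lam * sum(step(m)).
    Mask gradients use an Identity STE: d step(m)/dm is replaced by 1.
    """
    mask = step(m)
    w_eff = w * mask
    y_hat = x @ w_eff
    loss = np.mean((y_hat - y) ** 2) + lam * mask.sum()
    # Gradient of the MSE term with respect to the effective weights.
    g_eff = 2.0 * x.T @ (y_hat - y) / y.size
    grad_w = g_eff * mask      # chain rule through the element-wise product
    grad_m = g_eff * w + lam   # Identity STE on the step, plus regularizer
    return loss, grad_w, grad_m

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))
y = x @ np.array([[1.0], [-2.0], [0.5]])
w = rng.normal(size=(3, 1))
m = np.array([[0.3], [-0.1], [0.2]])  # second connection is currently pruned
loss, grad_w, grad_m = forward_backward(x, y, w, m, lam=1e-3)
```

Because the Identity STE passes the upstream gradient straight through the step function, a pruned connection (mask 0) gets no weight gradient but still gets a mask gradient, so training can re-enable it.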
IBM - United States

Digit recognition neural networks in R

Interpreting images has been a popular use case in the field of artificial intelligence (AI), and identification of handwritten digits using neural networks is commonly used in mobile applications. In this tutorial, learn how to create a web application to recognize handwritten digits using neural networks on R in Watson...
CODING & PROGRAMMING
arxiv.org

Disentangled Graph Neural Networks for Session-based Recommendation

Session-based recommendation (SBR) has drawn increasing research attention in recent years owing to its great practical value, since it exploits only the limited user behavior history in the current session. Existing methods typically learn the session embedding at the item level, namely by aggregating the embeddings of items with or without attention weights assigned to the items. However, they ignore the fact that a user's intent in adopting an item is driven by certain factors of the item (e.g., the leading actors of a movie). In other words, they have not explored users' finer-grained interests at the factor level when generating the session embedding, leading to sub-optimal performance. To address this problem, we propose a novel method called Disentangled Graph Neural Network (Disen-GNN) to capture the session purpose while considering factor-level attention on each item. Specifically, we first employ the disentangled learning technique to cast item embeddings into the embeddings of multiple factors, and then use the gated graph neural network (GGNN) to learn the embedding factor-wise based on the item adjacent similarity matrix computed for each factor. Moreover, distance correlation is adopted to enhance the independence between each pair of factors. After representing each item with independent factors, an attention mechanism is designed to learn the user's intent toward different factors of each item in the session. The session embedding is then generated by aggregating the item embeddings with the attention weights of each item's factors. In this way, our model takes user intents at the factor level into account to infer the user's purpose in a session. Extensive experiments on three benchmark datasets demonstrate the superiority of our method over existing methods.
COMPUTERS
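Disen-GNN uses distance correlation to push pairs of factor embeddings toward independence. The sketch below computes the standard (V-statistic) sample distance correlation with NumPy; how Disen-GNN combines it into its training loss is not stated in the abstract, so using it as a penalty summed over factor pairs would be an assumption, and the embeddings here are random stand-ins:

```python
import numpy as np

def _centered_distances(z):
    """Double-centred pairwise Euclidean distance matrix of samples z (n, d)."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation between x (n, p) and y (n, q), in [0, 1]."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2_xy = (a * b).mean()  # squared distance covariance
    dcov2_xx = (a * a).mean()  # squared distance variance of x
    dcov2_yy = (b * b).mean()  # squared distance variance of y
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    return 0.0 if denom == 0 else float(np.sqrt(max(dcov2_xy, 0.0) / denom))

rng = np.random.default_rng(0)
f1 = rng.normal(size=(200, 8))  # hypothetical factor embeddings
f2 = rng.normal(size=(200, 8))  # generated independently of f1
f3 = 2.0 * f1 + 1.0             # an affine function of f1
print(distance_correlation(f1, f2))  # below 1; the sample estimate is biased upward for finite n
print(distance_correlation(f1, f3))  # 1 up to rounding
```

Unlike Pearson correlation, distance correlation is zero in the population only under full independence, which is why it suits a loss that encourages disentangled factors.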
arxiv.org

Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics

Efficient model selection for identifying a suitable pre-trained neural network for a downstream task is a fundamental yet challenging task in deep learning. Current practice incurs expensive computational costs in model training for performance prediction. In this paper, we propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections. Therefore, a converged neural network is associated with an equilibrium state of a networked system composed of those edges. To this end, we construct a network mapping $\phi$, converting a neural network $G_A$ to a directed line graph $G_B$ that is defined on those edges in $G_A$. Next, we derive a neural capacitance metric $\beta_{\rm eff}$ as a predictive measure universally capturing the generalization capability of $G_A$ on the downstream task using only a handful of early training results. We carried out extensive experiments using 17 popular pre-trained ImageNet models and five benchmark datasets, including CIFAR10, CIFAR100, SVHN, Fashion MNIST and Birds, to evaluate the fine-tuning performance of our framework. Our neural capacitance metric is shown to be a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods.
SCIENCE
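The mapping $\phi$ in the neural capacitance paper turns a network $G_A$ into a directed line graph $G_B$ whose nodes are the edges of $G_A$. The standard directed-line-graph construction is sketched below in plain Python; the paper's $\phi$ may attach further information (for example weights or edge dynamics) that is not shown, and the function name and toy network are hypothetical:

```python
from collections import defaultdict

def directed_line_graph(edges):
    """Return (nodes, arcs) of the directed line graph of a directed graph.

    Each input edge (u, v) becomes a node; there is an arc from (u, v)
    to (v, w) whenever the head of the first edge is the tail of the second.
    """
    out_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append((u, v))
    arcs = []
    for u, v in edges:
        for nxt in out_edges.get(v, []):
            arcs.append(((u, v), nxt))
    return list(edges), arcs

# A tiny feed-forward network: inputs x1, x2 -> hidden h -> output y (hypothetical).
g_a = [("x1", "h"), ("x2", "h"), ("h", "y")]
nodes_b, arcs_b = directed_line_graph(g_a)
print(arcs_b)  # [(('x1', 'h'), ('h', 'y')), (('x2', 'h'), ('h', 'y'))]
```

In this view each synaptic weight is a node of $G_B$, and arcs connect weights that are chained through a shared neuron, matching the abstract's description of a networked system composed of edges.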
arxiv.org

Metallic and semimetallic states of molecular crystalline hydrogen at high pressures

The ab initio molecular dynamics method within the framework of density functional theory is applied to analyze the structural and electronic properties of crystalline molecular hydrogen at a temperature of 100 K. The pressure, pair correlation function and band structure are calculated. A crossover of molecular crystalline hydrogen from a semiconducting state to a semimetallic and a metallic state is observed upon compression in the pressure range of 302-626 GPa. At pressures below 361 GPa, the molecular crystal with the C2/c structure is a semiconductor with an indirect gap. In the pressure range 361-527 GPa, the band structure of the monoclinic C2/c lattice has a characteristic semimetallic profile with a partially unoccupied valence band and a partially occupied conduction band. When compressed to pressures above 544 GPa, the structure changes from monoclinic C2/c to orthorhombic Cmca, accompanied by a sharp decrease (by more than two orders of magnitude) in the value of the direct gap, which indicates metallic conductivity of the resulting structure. The metallic state is metastable and exists up to a pressure of 626 GPa.
CHEMISTRY
techxplore.com

Researchers demonstrate multimodal transistor in artificial neural networks

Researchers at the University of Surrey report a proof-of-concept demonstration of a multimodal transistor (MMT) in artificial neural networks, which mimic the human brain. This is an important step towards using thin-film transistors as artificial intelligence hardware and moves edge computing forward, with the prospect of reducing power needs and improving efficiency, rather than relying solely on computer chips.
ENGINEERING
arxiv.org

Active Predictive Coding Networks: A Neural Solution to the Problem of Learning Reference Frames and Part-Whole Hierarchies

We introduce Active Predictive Coding Networks (APCNs), a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling: how can neural networks learn intrinsic reference frames for objects and parse visual scenes into part-whole hierarchies by dynamically allocating nodes in a parse tree? APCNs address this problem by using a novel combination of ideas: (1) hypernetworks are used for dynamically generating recurrent neural networks that predict parts and their locations within intrinsic reference frames conditioned on higher object-level embedding vectors, and (2) reinforcement learning is used in conjunction with backpropagation for end-to-end learning of model parameters. The APCN architecture lends itself naturally to multi-level hierarchical learning and is closely related to predictive coding models of cortical function. Using the MNIST, Fashion-MNIST and Omniglot datasets, we demonstrate that APCNs can (a) learn to parse images into part-whole hierarchies, (b) learn compositional representations, and (c) transfer their knowledge to unseen classes of objects. With their ability to dynamically generate parse trees with part locations for objects, APCNs offer a new framework for explainable AI that leverages advances in deep learning while retaining interpretability and compositionality.
CODING & PROGRAMMING

