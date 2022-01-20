
Science

An Efficient Lorentz Equivariant Graph Neural Network for Jet Tagging

By Shiqi Gong, Qi Meng, Jue Zhang, Huilin Qu, Congqiao Li, Sitian Qian, Weitao Du, Zhi-Ming Ma, Tie-Yan Liu
arxiv.org
 4 days ago

Machine learning methods, especially deep learning, have become popular for jet representation in particle physics. Most of these methods focus on handcrafted feature design or on tuning the structure of existing black-box deep neural networks, while they ignore the Lorentz group equivariance, a fundamental space-time...


Related
arxiv.org

Artificial Neural Networks Modelling of Wall Pressure Spectra Beneath Turbulent Boundary Layers

We analyse and compare various empirical models of wall pressure spectra beneath turbulent boundary layers and propose an alternative machine learning approach using Artificial Neural Networks (ANN). The analysis and the training of the ANN are performed on data from experiments and high-fidelity simulations by various authors, covering a wide range of flow conditions. We present a methodology to extract all the turbulent boundary layer parameters required by these models, also considering flows experiencing strong adverse pressure gradients. Moreover, the database is explored to unveil important dependencies within the boundary layer parameters and to propose a possible set of features from which the ANN should predict the wall pressure spectra. The results show that the ANN outperforms traditional models in adverse pressure gradients, and its predictive capabilities generalise better over the range of investigated conditions. The analysis is completed with a deep ensemble approach for quantifying the uncertainties in the model prediction and integrated gradient analysis of the model sensitivity to its inputs. Uncertainties and sensitivities allow for identifying the regions where new training data would be most beneficial to the model's accuracy, thus opening the path towards a self-calibrating modelling approach.
COMPUTERS
arxiv.org

Assessing the persistence of chalcogen bonds in solution with neural network potentials

Non-covalent bonding patterns are commonly harvested as a design principle in fields such as catalysis, supramolecular chemistry and functional materials. Yet, their computational description generally neglects finite temperature and environment effects, which promote competing interactions and alter their static gas-phase properties. Recently, neural network potentials (NNPs) trained on Density Functional Theory (DFT) data have become increasingly popular to simulate molecular phenomena in condensed phase with an accuracy comparable to ab initio methods. To date, most applications have centered on solid-state materials or fairly simple molecules made of a limited number of elements. Herein, we focus on the persistence and strength of chalcogen bonds involving a benzotelluradiazole in condensed phase. While the tellurium-containing heteroaromatic molecules are known to exhibit pronounced interactions with anions and lone pairs of different atoms, the relevance of competing intermolecular interactions, notably with the solvent, is complicated to monitor experimentally but also challenging to model at an accurate electronic structure level. Here, we train direct and baselined NNPs to reproduce hybrid DFT energies and forces in order to identify the most prevalent non-covalent interactions occurring in a solute-Cl$^-$-THF mixture. The simulations in explicit solvent highlight the clear competition with chalcogen bonds formed with the solvent and the short-range directionality of the interaction with direct consequences for the molecular properties in the solution. The comparison with other potentials (e.g., AMOEBA, direct NNP and continuum solvent model) also demonstrates that baselined NNPs offer a reliable picture of the non-covalent interaction interplay occurring in solution.
MATHEMATICS
towardsdatascience.com

Facial Expression Recognition (FER) without Artificial Neural Networks

When it comes to talking about Machine Learning, it’s clear that it is the science (and art) of programming computers that learn from data [1]. However, this definition raises some questions, and the first one is: data? Excel spreadsheets? The first thing people think (or at least that’s the...
COMPUTERS
arxiv.org

Eigenvalue Distribution of Large Random Matrices Arising in Deep Neural Networks: Orthogonal Case

The paper deals with the distribution of singular values of the input-output Jacobian of deep untrained neural networks in the limit of their infinite width. The Jacobian is the product of random matrices where the independent rectangular weight matrices alternate with diagonal matrices whose entries depend on the corresponding column of the nearest neighbor weight matrix. The problem was considered in \cite{Pe-Co:18} for the Gaussian weights and biases and also for the weights that are Haar distributed orthogonal matrices and Gaussian biases. Based on a free probability argument, it was claimed that in these cases the singular value distribution of the Jacobian in the limit of infinite width (matrix size) coincides with that of the analog of the Jacobian with special random but weight-independent diagonal matrices, a case well known in random matrix theory. The claim was rigorously proved in \cite{Pa-Sl:21} for a quite general class of weights and biases with i.i.d. (including Gaussian) entries by using a version of the techniques of random matrix theory. In this paper we use another version of the techniques to justify the claim for random Haar distributed weight matrices and Gaussian biases.
COMPUTERS
IN THIS ARTICLE
#Tagging #Neural Network #Design #Graph #Lorentznet #Particlenet
arxiv.org

Enhancement of Healthcare Data Performance Metrics using Neural Network Machine Learning Algorithms

Patients are often encouraged to make use of wearable devices for remote collection and monitoring of health data. This adoption of wearables results in a significant increase in the volume of data collected and transmitted. The battery life of the devices is then quickly diminished due to the high processing requirements of the devices. Given the importance attached to medical data, it is imperative that all transmitted data adhere to strict integrity and availability requirements. Reducing the volume of healthcare data for network transmission may improve sensor battery life without compromising accuracy. There is a trade-off between efficiency and accuracy which can be controlled by adjusting the sampling and transmission rates. This paper demonstrates that machine learning can be used to analyse complex health data metrics such as the accuracy and efficiency of data transmission to overcome the trade-off problem. The study uses time series nonlinear autoregressive neural network algorithms to enhance both data metrics by taking fewer samples to transmit. The algorithms were tested with a standard heart rate dataset to compare their accuracy and efficiency. The results showed that the Levenberg-Marquardt algorithm was the best performer, with an efficiency of 3.33 and an accuracy of 79.17%, which is similar to the other algorithms' accuracy but demonstrates improved efficiency. This shows that machine learning can improve one metric without sacrificing the other, while remaining highly efficient compared with existing methods.
HEALTH
arxiv.org

Observing how deep neural networks understand physics through the energy spectrum of one-dimensional quantum mechanics

We investigated how neural networks (NNs) understand physics using one-dimensional quantum mechanics. After training an NN to accurately predict energy eigenvalues from potentials, we used it to confirm the NN's understanding of physics from four different aspects. The trained NN could predict energy eigenvalues of a different potential than the one learned, focus on minima and maxima of a potential, predict the probability distribution of the existence of particles not used during training, and reproduce untrained physical phenomena. These results show that NNs can learn the laws of physics from only a limited set of data, predict the results of experiments under conditions different from those used for training, and predict physical quantities of types not provided during training. Since NNs understand physics through a different path than humans take, and by complementing the human way of understanding, they will be a powerful tool for advancing physics.
SCIENCE
Nature.com

An optical neural network using less than 1 photon per multiplication

Deep learning has become a widespread tool in both science and industry. However, continued progress is hampered by the rapid growth in energy costs of ever-larger deep neural networks. Optical neural networks provide a potential means to solve the energy-cost problem faced by deep learning. Here, we experimentally demonstrate an optical neural network based on optical dot products that achieves 99% accuracy on handwritten-digit classification using ~3.1 detected photons per weight multiplication and ~90% accuracy using ~0.66 photons (~2.5 × 10^-19 J of optical energy) per weight multiplication. The fundamental principle enabling our sub-photon-per-multiplication demonstration (noise reduction from the accumulation of scalar multiplications in dot-product sums) is applicable to many different optical-neural-network architectures. Our work shows that optical neural networks can achieve accurate results using extremely low optical energies.
COMPUTERS
arxiv.org

3D extinction mapping of the Milky Way using Convolutional Neural Networks: Presentation of the method and demonstration in the Carina Arm region

Context. Several methods have been proposed to build 3D extinction maps of the Milky Way (MW), most often based on Bayesian approaches. Although some studies employed machine learning (ML) methods in part of their procedure, or to specific targets, no 3D extinction map of a large volume of the MW solely based on a Neural Network method has been reported so far. Aims. We aim to apply deep learning as a solution to build 3D extinction maps of the MW. Methods. We built a convolutional neural network (CNN) using the CIANNA framework, and trained it with synthetic 2MASS data. We used the Besançon Galaxy model to generate mock star catalogs, and 1D Gaussian random fields to simulate the extinction profiles. From these data we computed color-magnitude diagrams (CMDs) to train the network, using the corresponding extinction profiles as targets. A forward pass with observed 2MASS CMDs provided extinction profile estimates for a grid of lines of sight. Results. We trained our network with data simulating lines of sight in the area of the Carina spiral arm tangent and obtained a 3D extinction map for a large sector in this region ($l = 257 - 303$ deg, $|b| \le 5$ deg), with distance and angular resolutions of $100$ pc and $30$ arcmin, respectively, and reaching up to $\sim 10$ kpc. Although each sightline is computed independently in the forward phase, the so-called fingers-of-God artifacts are weaker than in many other 3D extinction maps. We found that our CNN was efficient in taking advantage of redundancy across lines of sight, enabling us to train it with only 9 sightlines simultaneously to build the whole map. Conclusions. We found deep learning to be a reliable approach to produce 3D extinction maps from large surveys. With this methodology, we expect to easily combine heterogeneous surveys without cross-matching, and therefore to exploit several surveys in a complementary fashion.
ASTRONOMY
YOU MAY ALSO LIKE
arxiv.org

Efficient Modular Graph Transformation Rule Application

Jakob L. Andersen, Rolf Fagerberg, Juraj Kolčák, Christophe V.F.P. Laurent, Daniel Merkle, Nikolai Nøjgaard. Graph transformation formalisms have proven to be suitable tools for the modelling of chemical reactions. They are well established in theoretical studies and increasingly also in practical applications in chemistry. The latter is made feasible via the development of programming frameworks which make the formalisms executable.
MATHEMATICS
arxiv.org

Disentangled Graph Neural Networks for Session-based Recommendation

Session-based recommendation (SBR) has drawn increasing research attention in recent years, due to its great practical value in exploiting only the limited user behavior history in the current session. Existing methods typically learn the session embedding at the item level, namely, aggregating the embeddings of items with or without the attention weights assigned to items. However, they ignore the fact that a user's intent on adopting an item is driven by certain factors of the item (e.g., the leading actors of a movie). In other words, they have not explored finer-granularity interests of users at the factor level to generate the session embedding, leading to sub-optimal performance. To address the problem, we propose a novel method called Disentangled Graph Neural Network (Disen-GNN) to capture the session purpose with the consideration of factor-level attention on each item. Specifically, we first employ the disentangled learning technique to cast item embeddings into the embedding of multiple factors, and then use the gated graph neural network (GGNN) to learn the embedding factor-wise based on the item adjacent similarity matrix computed for each factor. Moreover, the distance correlation is adopted to enhance the independence between each pair of factors. After representing each item with independent factors, an attention mechanism is designed to learn user intent toward different factors of each item in the session. The session embedding is then generated by aggregating the item embeddings with attention weights of each item's factors. In this way, our model takes user intents at the factor level into account to infer the user purpose in a session. Extensive experiments on three benchmark datasets demonstrate the superiority of our method over existing methods.
COMPUTERS
arxiv.org

Neuroplastic graph attention networks for nuclei segmentation in histopathology images

Modern histopathological image analysis relies on the segmentation of cell structures to derive quantitative metrics required in biomedical research and clinical diagnostics. State-of-the-art deep learning approaches predominantly apply convolutional layers in segmentation and are typically highly customized for a specific experimental configuration, and are often unable to generalize to unknown data. As the model capacity of classical convolutional layers is limited by a finite set of learned kernels, our approach uses a graph representation of the image and focuses on the node transitions in multiple magnifications. We propose a novel architecture for semantic segmentation of cell nuclei robust to differences in experimental configuration such as staining and variation of cell types. The architecture comprises a novel neuroplastic graph attention network based on residual graph attention layers and concurrent optimization of the graph structure representing multiple magnification levels of the histopathological image. The modification of graph structure, which generates the node features by projection, is as important to the architecture as the graph neural network itself. It determines the possible message flow and critical properties to optimize attention, graph structure, and node updates in a balanced magnification loss. In experimental evaluation, our framework outperforms ensembles of state-of-the-art neural networks, with a fraction of the neurons typically required, and sets new standards for the segmentation of new nuclei datasets.
SCIENCE
arxiv.org

Development of a resource-efficient FPGA-based neural network regression model for the ATLAS muon trigger upgrades

In this paper, a resource-efficient FPGA-based neural network regression model is developed for potential applications in the future hardware muon trigger system of the ATLAS experiment at the Large Hadron Collider (LHC). Effective real-time selection of muon candidates is the cornerstone of the ATLAS physics programme. With the planned upgrades, an entirely new FPGA-based hardware muon trigger system will be installed in 2025-2026 that will process full muon detector data within a 10 $\mu$s latency window. The planned large FPGA devices should have sufficient spare resources to allow deployment of machine learning methods for improving identification of muon candidates and searching for new exotic particles. Our model promises to improve the rejection of the dominant source of background events in the central detector region, which are due to muon candidates with low transverse momenta. This neural network was implemented in a hardware description language using 65 digital signal processors and about 10,000 lookup tables. The simulated network latency and deadtime are 245 and 60 ns, respectively, when implemented in the FPGA device using a 400 MHz clock frequency. These results are well within the requirements of the future ATLAS muon trigger system, therefore opening a possibility for deploying machine learning methods for data taking by the ATLAS experiment at the High Luminosity LHC.
SCIENCE
arxiv.org

Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics

Efficient model selection for identifying a suitable pre-trained neural network for a downstream task is a fundamental yet challenging task in deep learning. Current practice incurs high computational costs in model training for performance prediction. In this paper, we propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections. Therefore, a converged neural network is associated with an equilibrium state of a networked system composed of those edges. To this end, we construct a network mapping $\phi$, converting a neural network $G_A$ to a directed line graph $G_B$ that is defined on those edges in $G_A$. Next, we derive a neural capacitance metric $\beta_{\rm eff}$ as a predictive measure universally capturing the generalization capability of $G_A$ on the downstream task using only a handful of early training results. We carried out extensive experiments using 17 popular pre-trained ImageNet models and five benchmark datasets, including CIFAR10, CIFAR100, SVHN, Fashion MNIST and Birds, to evaluate the fine-tuning performance of our framework. Our neural capacitance metric is shown to be a powerful indicator for model selection based only on early training results and is more efficient than state-of-the-art methods.
SCIENCE
arxiv.org

Formula graph self-attention network for representation-domain independent materials discovery

The success of machine learning (ML) in materials property prediction depends heavily on how the materials are represented for learning. Two dominant families of material descriptors exist, one that encodes crystal structure in the representation and the other that only uses stoichiometric information with the hope of discovering new materials. Graph neural networks (GNNs) in particular have excelled in predicting material properties within chemical accuracy. However, current GNNs are limited to only one of the above two avenues owing to the limited overlap between the respective material representations. Here, we introduce a new concept of formula graph which unifies both stoichiometry-only and structure-based material descriptors. We further develop a self-attention integrated GNN that assimilates a formula graph and show that the proposed architecture produces material embeddings transferable between the two domains. Our model substantially outperforms previous structure-based GNNs as well as structure-agnostic counterparts while exhibiting better sample efficiency and faster convergence. Finally, the model is applied in a challenging exemplar to predict the complex dielectric function of materials and nominate new substances that potentially exhibit epsilon-near-zero phenomena.
CHEMISTRY
arxiv.org

Edge-based Tensor prediction via graph neural networks

Message-passing neural networks (MPNN) have shown extremely high efficiency and accuracy in predicting the physical properties of molecules and crystals, and are expected to become the next-generation material simulation tool after density functional theory (DFT). However, there is currently no general MPNN framework for directly predicting the tensor properties of crystals. In this work, a general framework for the prediction of tensor properties was proposed: the tensor property of a crystal can be decomposed into the average of the tensor contributions of all the atoms in the crystal, and the tensor contribution of each atom can be expanded as the sum of the tensor projections in the directions of the edges connecting the atoms. On this basis, the edge-based expansions of force vectors, Born effective charges (BECs), dielectric (DL) and piezoelectric (PZ) tensors were proposed. These expansions are rotationally equivariant, while the coefficients in these tensor expansions are rotationally invariant scalars which are similar to physical quantities such as formation energy and band gap. The advantage of this tensor prediction framework is that it does not require the network itself to be equivariant. Therefore, in this work, we directly designed the edge-based tensor prediction graph neural network (ETGNN) model on the basis of the invariant graph neural network to predict tensors. The validity and high precision of this tensor prediction framework were shown by the tests of ETGNN on the extended systems, randomly perturbed structures and JARVIS-DFT datasets. This tensor prediction framework is general for nearly all GNNs and can achieve higher accuracy with more advanced GNNs in the future.
COMPUTERS
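
The edge-based expansion described in the abstract above can be illustrated with a minimal sketch. It is not the ETGNN model: the function names, the toy coordinates, and the fixed scalar coefficients (which a trained invariant network would supply) are all assumptions made for illustration. The sketch only checks the stated property that a vector built as a sum of invariant scalars times edge directions rotates together with the structure.

```python
import numpy as np

def edge_expanded_vector(positions, center, coeffs):
    """Sum of scalar coefficients times unit vectors along edges center -> neighbor.

    `coeffs` maps neighbor index -> scalar; here the scalars are hypothetical
    stand-ins for the rotationally invariant outputs of a network.
    """
    out = np.zeros(3)
    for j, c in coeffs.items():
        edge = positions[j] - positions[center]
        out += c * edge / np.linalg.norm(edge)
    return out

def random_rotation(rng):
    """Draw a proper rotation matrix (orthogonal, determinant +1) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # rescale columns by +/-1, stays orthogonal
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]           # flip one column to get determinant +1
    return q

rng = np.random.default_rng(0)
positions = rng.normal(size=(4, 3))   # toy structure: atom 0 plus three neighbors
coeffs = {1: 0.7, 2: -1.2, 3: 0.4}    # assumed invariant scalars, one per edge
R = random_rotation(rng)

v = edge_expanded_vector(positions, 0, coeffs)
# Rotate every atom (rows of positions become R @ p) and recompute.
v_rot = edge_expanded_vector(positions @ R.T, 0, coeffs)

# Equivariance: rotating the input rotates the output by the same matrix.
equivariant = np.allclose(v_rot, R @ v)
print(equivariant)
```

Because the coefficients are held fixed under rotation, all of the directional information comes from the edge vectors, which is why the network producing the coefficients does not itself need to be equivariant.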
arxiv.org

Label Relation Graphs Enhanced Hierarchical Residual Network for Hierarchical Multi-Granularity Classification

Hierarchical multi-granularity classification (HMC) assigns hierarchical multi-granularity labels to each object and focuses on encoding the label hierarchy, e.g., ["Albatross", "Laysan Albatross"] from coarse-to-fine levels. However, the definition of what is fine-grained is subjective, and the image quality may affect the identification. Thus, samples could be observed at any level of the hierarchy, e.g., ["Albatross"] or ["Albatross", "Laysan Albatross"], and examples discerned at coarse categories are often neglected in the conventional setting of HMC. In this paper, we study the HMC problem in which objects are labeled at any level of the hierarchy. The essential designs of the proposed method are derived from two motivations: (1) learning with objects labeled at various levels should transfer hierarchical knowledge between levels; (2) lower-level classes should inherit attributes related to upper-level superclasses. The proposed combinatorial loss maximizes the marginal probability of the observed ground truth label by aggregating information from related labels defined in the tree hierarchy. If the observed label is at the leaf level, the combinatorial loss further imposes the multi-class cross-entropy loss to increase the weight of fine-grained classification loss. Considering the hierarchical feature interaction, we propose a hierarchical residual network (HRN), in which granularity-specific features from parent levels acting as residual connections are added to features of children levels. Experiments on three commonly used datasets demonstrate the effectiveness of our approach compared to the state-of-the-art HMC approaches and fine-grained visual classification (FGVC) methods exploiting the label hierarchy.
SCIENCE
arxiv.org

Prediction of the electron density of states for crystalline compounds with Atomistic Line Graph Neural Networks (ALIGNN)

Machine learning (ML) based models have greatly enhanced the traditional materials discovery and design pipeline. Specifically, in recent years, surrogate ML models for material property prediction have demonstrated success in predicting discrete scalar-valued target properties to within reasonable accuracy of their DFT-computed values. However, accurate prediction of spectral targets such as the electron Density of States (DOS) poses a much more challenging problem due to the complexity of the target, and the limited amount of available training data. In this study, we present an extension of the recently developed Atomistic Line Graph Neural Network (ALIGNN) to accurately predict the DOS of a large set of material unit cell structures, trained on the publicly available JARVIS-DFT dataset. Furthermore, we evaluate two methods of representation of the target quantity: a direct discretized spectrum, and a compressed low-dimensional representation obtained using an autoencoder. Through this work, we demonstrate the utility of graph-based featurization and modeling methods in the prediction of complex targets that depend on both chemistry and directional characteristics of material structures.
MATHEMATICS
arxiv.org

Training Fair Deep Neural Networks by Balancing Influence

Most fair machine learning methods either rely heavily on the sensitive information of the training samples or require large modifications to the target models, which hinders their practical application. To address this issue, we propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set (second stage) where the sample weights are computed to balance the model performance across different demographic groups (first stage). FAIRIF can be applied to a wide range of models trained by stochastic gradient descent without changing the model, while only requiring group annotations on a small validation set to compute sample weights. Theoretically, we show that, in the classification setting, three notions of disparity among different groups can be mitigated by training with the weights. Experiments on synthetic data sets demonstrate that FAIRIF yields models with better fairness-utility trade-offs against various types of bias; and on real-world data sets, we show the effectiveness and scalability of FAIRIF. Moreover, as evidenced by the experiments with pretrained models, FAIRIF is able to alleviate the unfairness issue of pretrained models without hurting their performance.
CODING & PROGRAMMING
arxiv.org

Coherence resonance and stochastic synchronization in a small-world neural network: An interplay in the presence of spike-timing-dependent plasticity

Coherence resonance (CR), stochastic synchronization (SS), and spike-timing-dependent plasticity (STDP) are ubiquitous dynamical processes in biological neural networks. Whether enhancing CR can be associated with improving SS and vice versa is a fundamental question of interest. The effects of STDP and different network connectivity on this enhancement interplay are still elusive. In this paper, we consider a small-world network of excitable Hodgkin-Huxley neurons driven by channel noise and excitatory STDP with a Hebbian time window. Numerical simulations indicate that there exist intervals of parameter values of the network topology and the STDP learning rule in which an enhanced SS (CR) would improve CR (SS). In particular, it is found that at certain intermediate values of the average degree of the network, higher values of the potentiation adjusting rate, and lower values of the depression temporal window, an enhanced SS (CR) would improve CR (SS). Our results could shed some light on the efficient coding mechanisms based on the spatiotemporal coherence of the spiking activity in neural networks.
SCIENCE
arxiv.org

Invariant Representation Driven Neural Classifier for Anti-QCD Jet Tagging

We leverage representation learning and the inductive bias in neural-net-based Standard Model jet classification tasks, to detect non-QCD signal jets. In establishing the framework for classification-based anomaly detection in jet physics, we demonstrate that with a well-calibrated and powerful enough feature extractor, a well-trained mass-decorrelated supervised neural jet tagger can serve as a strong generic anti-QCD jet tagger for effectively reducing the QCD background. Imposing data-augmented mass-invariance (decoupling the dominant factor) not only facilitates background estimation, but also induces more substructure-aware representation learning. We are able to reach excellent tagging efficiencies for all the test signals considered. In the best case, we reach a background rejection rate around 50 and a significance improvement factor of 3.6 at 50% signal acceptance, with jet mass decorrelated. This study indicates that supervised Standard Model jet classifiers have great potential in general new physics searches.
SCIENCE
