Fast Axiomatic Attribution for Neural Networks

By Robin Hesse, Simone Schaub-Meyer, Stefan Roth
arxiv.org
 5 days ago

Mitigating the dependence on spurious correlations present in the training dataset is a quickly emerging and important topic in deep learning. Recent approaches incorporate priors on the feature attribution of a deep neural network (DNN) into the training process to reduce...

Related
towardsdatascience.com

Modeling uncertainty in neural networks with TensorFlow Probability

This series is a brief introduction to modeling uncertainty using the TensorFlow Probability library. I wrote it as supplementary material to my PyData Global 2021 talk on uncertainty estimation in neural networks. Part 1 can be found here. Introduction. In the first part of the series we talked about motivations...
CODING & PROGRAMMING
arxiv.org

On Representation Knowledge Distillation for Graph Neural Networks

Knowledge distillation is a promising learning paradigm for boosting the performance and reliability of resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships across the student and teacher's node embedding spaces. In this paper, we make two key contributions:
COMPUTERS
arxiv.org

Learning a compass spin model with neural network quantum states

Neural network quantum states provide a novel representation of the many-body states of interacting quantum systems and open up a promising route to solve frustrated quantum spin models that evade other numerical approaches. Yet their capacity to describe complex magnetic orders with large unit cells has not been demonstrated, and their performance in a rugged energy landscape has been questioned. Here we apply restricted Boltzmann machines and stochastic gradient descent to seek the ground states of a compass spin model on the honeycomb lattice, which unifies the Kitaev model, Ising model and the quantum 120° model with a single tuning parameter. We report calculation results on the variational energy, order parameters and correlation functions. The phase diagram obtained is in good agreement with the predictions of tensor network ansatz, demonstrating the capacity of restricted Boltzmann machines in learning the ground states of frustrated quantum spin Hamiltonians. The limitations of the calculation are discussed. A few strategies are outlined to address some of the challenges in machine learning frustrated quantum magnets.
COMPUTERS
arxiv.org

LSP: Acceleration and Regularization of Graph Neural Networks via Locality Sensitive Pruning of Graphs

Graph Neural Networks (GNNs) have emerged as highly successful tools for graph-related tasks. However, real-world problems involve very large graphs, and the compute resources needed to fit GNNs to those problems grow rapidly. Moreover, the noisy nature and size of real-world graphs cause GNNs to over-fit if not regularized properly. Surprisingly, recent works show that large graphs often involve many redundant components that can be removed without compromising the performance too much. This includes node or edge removals during inference through GNN layers or as a pre-processing step that sparsifies the input graph. This intriguing phenomenon enables the development of state-of-the-art GNNs that are both efficient and accurate. In this paper, we take a further step towards demystifying this phenomenon and propose a systematic method called Locality-Sensitive Pruning (LSP) for graph pruning based on Locality-Sensitive Hashing. We aim to sparsify a graph so that similar local environments of the original graph result in similar environments in the resulting sparsified graph, which is an essential feature for graph-related tasks. To justify the application of pruning based on local graph properties, we exemplify the advantage of applying pruning based on locality properties over other pruning strategies in various scenarios. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of LSP, which removes a significant number of edges from large graphs without compromising the performance, accompanied by a considerable acceleration.
CODING & PROGRAMMING
arxiv.org

Convolutional Neural Network Dynamics: A Graph Perspective

The success of neural networks (NNs) in a wide range of applications has led to increased interest in understanding the underlying learning dynamics of these models. In this paper, we go beyond mere descriptions of the learning dynamics by taking a graph perspective and investigating the relationship between the graph structure of NNs and their performance. Specifically, we propose (1) representing the neural network learning process as a time-evolving graph (i.e., a series of static graph snapshots over epochs), (2) capturing the structural changes of the NN during the training phase in a simple temporal summary, and (3) leveraging the structural summary to predict the accuracy of the underlying NN in a classification or regression task. For the dynamic graph representation of NNs, we explore structural representations for fully-connected and convolutional layers, which are key components of powerful NN models. Our analysis shows that a simple summary of graph statistics, such as weighted degree and eigenvector centrality, over just a few epochs can be used to accurately predict the performance of NNs. For example, a weighted degree-based summary of the time-evolving graph that is constructed based on 5 training epochs of the LeNet architecture achieves classification accuracy of over 93%. Our findings are consistent for different NN architectures, including LeNet, VGG, AlexNet and ResNet.
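As a rough illustration of the weighted-degree summary mentioned in this abstract, the sketch below treats one fully-connected layer as a bipartite graph and averages the weighted degree of its nodes at each epoch. The function names, the use of absolute weight values, and the choice of the mean as the summary statistic are assumptions made for illustration, not details confirmed by the paper.

```python
def weighted_degrees(weights):
    """Weighted degree of every unit in one dense layer viewed as a bipartite graph.

    weights[i][j] is the weight on the edge between input unit i and output unit j.
    Taking absolute values is an assumption of this sketch.
    """
    input_unit_deg = [sum(abs(w) for w in row) for row in weights]
    output_unit_deg = [
        sum(abs(row[j]) for row in weights) for j in range(len(weights[0]))
    ]
    return input_unit_deg, output_unit_deg


def degree_summary(snapshots):
    """Mean weighted degree per snapshot (one snapshot per training epoch)."""
    summary = []
    for weights in snapshots:
        input_unit_deg, output_unit_deg = weighted_degrees(weights)
        degrees = input_unit_deg + output_unit_deg
        summary.append(sum(degrees) / len(degrees))
    return summary
```

A sequence such as `degree_summary([w_epoch1, w_epoch2, ...])` could then serve as the input feature vector for a separate accuracy predictor; that predictor is not shown here.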
COMPUTERS
arxiv.org

Parallel Physics-Informed Neural Networks with Bidirectional Balance

As an emerging technology in deep learning, physics-informed neural networks (PINNs) have been widely used to solve various partial differential equations (PDEs) in engineering. However, PDEs based on practical considerations contain multiple physical quantities and complex initial boundary conditions, so PINNs often return incorrect results. Here we take the heat transfer problem in multilayer fabrics as a typical example. It couples multiple strongly correlated temperature fields, and the values of variables are extremely unbalanced among different dimensions. We clarify the potential difficulties of solving such problems with classic PINNs, and propose parallel physics-informed neural networks with bidirectional balance. In detail, our parallel solving framework synchronously fits the coupled equations through several multilayer perceptrons. Moreover, we design two modules to balance the forward process of data and the back-propagation process of the loss gradient. This bidirectional balance not only enables the whole network to converge stably, but also helps it fully learn the various physical conditions in the PDEs. We provide a series of ablation experiments to verify the effectiveness of the proposed methods. The results show that our approach makes problems that PINNs could not solve solvable, and achieves excellent solving accuracy.
ENGINEERING
arxiv.org

Unsupervised Learning for Identifying High Eigenvector Centrality Nodes: A Graph Neural Network Approach

Existing methods for calculating Eigenvector Centrality (EC) tend either to be insufficiently robust when run in low time complexity or to scale poorly to large networks, rendering them practically unreliable or computationally expensive. It is therefore essential to develop a method that scales with low computational time. Hence, we propose a deep learning model for the identification of nodes with high Eigenvector Centrality. There have been a few previous works on identifying the high-ranked nodes with supervised learning methods, but in real-world cases the graphs are not labelled, so deploying supervised learning methods is risky and impractical. So, we devise the CUL (Centrality with Unsupervised Learning) method to learn the relative EC scores in a network in an unsupervised manner. To achieve this, we develop an Encoder-Decoder based framework that maps the nodes to their respective estimated EC scores. Extensive experiments were conducted on different synthetic and real-world networks. We compared CUL against a baseline supervised method for EC estimation similar to some of the past works. It was observed that even with training on a minuscule number of training datasets, CUL delivers a relatively better accuracy score when identifying the higher-ranked nodes than its supervised counterpart. We also show that CUL is much faster and has a smaller runtime than the conventional baseline method for EC computation. The code is available at this https URL.
CODING & PROGRAMMING
arxiv.org

DropGNN: Random Dropouts Increase the Expressiveness of Graph Neural Networks

This paper studies Dropout Graph Neural Networks (DropGNNs), a new approach that aims to overcome the limitations of standard GNN frameworks. In DropGNNs, we execute multiple runs of a GNN on the input graph, with some of the nodes randomly and independently dropped in each of these runs. Then, we combine the results of these runs to obtain the final result. We prove that DropGNNs can distinguish various graph neighborhoods that cannot be separated by message passing GNNs. We derive theoretical bounds for the number of runs required to ensure a reliable distribution of dropouts, and we prove several properties regarding the expressive capabilities and limits of DropGNNs. We experimentally validate our theoretical findings on expressiveness. Furthermore, we show that DropGNNs perform competitively on established GNN benchmarks.
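The run-and-combine procedure described in this abstract can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: the message-passing rule (`toy_gnn`, which adds the mean of the active neighbours' states), scalar node features, and averaging as the way to combine runs are all hypothetical choices made for illustration.

```python
import random


def toy_gnn(adj, feats, active, layers=2):
    """Toy message passing restricted to the nodes in `active`.

    Each active node adds the mean state of its active neighbours to its own
    state; dropped nodes stay at 0.0 and send no messages.
    """
    h = {v: (feats[v] if v in active else 0.0) for v in adj}
    for _ in range(layers):
        new_h = {}
        for v in adj:
            if v not in active:
                new_h[v] = 0.0
                continue
            nbrs = [u for u in adj[v] if u in active]
            msg = sum(h[u] for u in nbrs) / len(nbrs) if nbrs else 0.0
            new_h[v] = h[v] + msg
        h = new_h
    return h


def drop_gnn(adj, feats, runs=20, p=0.1, seed=0):
    """Average node states over `runs` runs, dropping each node independently with probability p."""
    rng = random.Random(seed)
    totals = {v: 0.0 for v in adj}
    for _ in range(runs):
        active = {v for v in adj if rng.random() >= p}
        h = toy_gnn(adj, feats, active)
        for v in adj:
            totals[v] += h[v]
    return {v: totals[v] / runs for v in adj}
```

With `p=0.0` every run sees the full graph, so the result matches a single ordinary run. With `p > 0`, each run sees a slightly different neighbourhood, which is the source of the additional information the abstract refers to.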
CODING & PROGRAMMING
arxiv.org

DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution

The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup is challenging because the search space grows combinatorially, and debugging and testing on clusters is expensive. In this work we propose DistIR, an expressive intermediate representation for distributed DNN computation that is tailored for efficient analyses, such as simulation. This enables automatically identifying the top-performing strategies without having to execute on physical hardware. Unlike prior work, DistIR can naturally express many distribution strategies including pipeline parallelism with arbitrary schedules. Our evaluation on MLP training and GPT-2 inference models demonstrates how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations, reducing optimization time by an order of magnitude for certain regimes.
CODING & PROGRAMMING
arxiv.org

Predicting Lattice Phonon Vibrational Frequencies Using Deep Graph Neural Networks

Lattice vibration frequencies are related to many important materials properties such as thermal and electrical conductivity as well as superconductivity. However, calculating vibration frequencies with density functional theory (DFT) methods is too computationally demanding for the large number of samples involved in materials screening. Here we propose a deep graph neural network-based algorithm for predicting crystal vibration frequencies from crystal structures with high accuracy. Our algorithm addresses the variable dimension of the vibration frequency spectrum using a zero-padding scheme. Benchmark studies on two data sets with 15,000 and 35,552 samples show that the aggregated $R^2$ scores of the predictions reach 0.554 and 0.724, respectively. Our work demonstrates the capability of deep graph neural networks to learn to predict phonon spectrum properties of crystal structures in addition to phonon density of states (DOS) and electronic DOS, in which the output dimension is constant.
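A zero-padding scheme of the kind this abstract mentions can be sketched in a few lines. The helper name `pad_spectra` and the choice to pad at the end of each spectrum are assumptions for illustration; the abstract does not say where the padding is placed.

```python
def pad_spectra(spectra, fill=0.0):
    """Pad variable-length frequency lists to a common length with `fill`.

    Padding on the right is an assumption of this sketch.
    """
    width = max(len(s) for s in spectra)
    return [list(s) + [fill] * (width - len(s)) for s in spectra]
```

After padding, every target vector has the same dimension, so a network with a fixed-size output layer can be trained on crystals with different numbers of vibrational modes.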
SCIENCE
arxiv.org

Simplifying approach to Node Classification in Graph Neural Networks

Graph Neural Networks have become one of the indispensable tools to learn from graph-structured data, and their usefulness has been shown in a wide variety of tasks. In recent years, there have been tremendous improvements in architecture design, resulting in better performance on various prediction tasks. In general, these neural architectures combine node feature aggregation and feature transformation using a learnable weight matrix in the same layer. This makes it challenging to analyze the importance of node features aggregated from various hops and the expressiveness of the neural network layers. As different graph datasets show varying levels of homophily and heterophily in features and class label distribution, it becomes essential to understand which features are important for the prediction tasks without any prior information. In this work, we decouple the node feature aggregation step and depth of graph neural network, and empirically analyze how different aggregated features play a role in prediction performance. We show that not all features generated via aggregation steps are useful, and often using these less informative features can be detrimental to the performance of the GNN model. Through our experiments, we show that learning certain subsets of these features can lead to better performance on a wide variety of datasets. We propose to use softmax as a regularizer and "soft-selector" of features aggregated from neighbors at different hop distances, and L2-Normalization over GNN layers. Combining these techniques, we present a simple and shallow model, Feature Selection Graph Neural Network (FSGNN), and show empirically that the proposed model achieves comparable or even higher accuracy than state-of-the-art GNN models in nine benchmark datasets for the node classification task, with remarkable improvements up to 51.1%.
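The softmax "soft-selector" idea in this abstract can be sketched for a single node as follows. This is a simplified reading, not the FSGNN implementation: the function names, one scalar score per hop, the absence of per-hop linear transforms, and concatenation as the combining step are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def soft_select(hop_features, scores):
    """Weight each hop's normalized feature vector by softmax(scores) and concatenate.

    hop_features[k] is one node's feature vector aggregated from hop k;
    scores[k] is a scalar score for that hop (learnable in a real model).
    """
    weights = softmax(scores)
    out = []
    for w, feat in zip(weights, hop_features):
        out.extend(w * x for x in l2_normalize(feat))
    return out
```

Because the weights sum to one, raising the score of one hop necessarily lowers the relative contribution of the others, which is what lets the softmax act as a soft feature selector.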
CODING & PROGRAMMING
arxiv.org

Physics-informed neural networks for understanding shear migration of particles in viscous flow

We harness the physics-informed neural network (PINN) approach to extend the utility of phenomenological models for particle migration in shear flow. Specifically, we propose to constrain the neural network training via a model for the physics of shear-induced particle migration in suspensions. Then, we train the PINN against experimental data from the literature, showing that this approach provides both better fidelity to the experiments, and novel understanding of the relative roles of the hypothesized migration fluxes. We first verify the PINN approach for solving the inverse problem of radial particle migration in a non-Brownian suspension in an annular Couette flow. In this classical case, the PINN yields the same value (as reported in the literature) for the ratio of the two parameters of the empirical model. Next, we apply the PINN approach to analyze experiments on particle migration in both non-Brownian and Brownian suspensions in Poiseuille slot flow, for which a definitive calibration of the phenomenological migration model has been lacking. Using the PINN approach, we identify the unknown/empirical parameters in the physical model through the inverse solver capability of PINNs. Specifically, the values are significantly different from those for the Couette cell, highlighting an inconsistency in the literature that uses the latter value for Poiseuille flow. Importantly, the PINN results also show that the inferred values of the empirical model's parameters vary with the shear Péclet number and the particle bulk volume fraction of the suspension, instead of being constant as assumed in previous literature.
SCIENCE
arxiv.org

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Sparse training is a natural idea for accelerating the training of deep neural networks and reducing memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods cannot achieve this goal in practice because the chain rule based gradient (w.r.t. structure parameters) estimators adopted by previous methods require dense computation at least in the backward propagation step. This paper solves this problem by proposing an efficient sparse training method with completely sparse forward and backward passes. We first formulate the training process as a continuous minimization problem under a global sparsity constraint. We then separate the optimization process into two steps, corresponding to weight update and structure parameter update. For the former step, we use the conventional chain rule, which can be sparse via exploiting the sparse structure. For the latter step, instead of using the chain rule based gradient estimators as in existing methods, we propose a variance reduced policy gradient estimator, which only requires two forward passes without backward propagation, thus achieving completely sparse training. We prove that the variance of our gradient estimator is bounded. Extensive experimental results on real-world datasets demonstrate that compared to previous methods, our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.
CODING & PROGRAMMING
arxiv.org

Deep ReLU neural network approximation of parametric and stochastic elliptic PDEs with lognormal inputs

We investigate non-adaptive methods of deep ReLU neural network approximation of the solution $u$ to parametric and stochastic elliptic PDEs with lognormal inputs on non-compact set $\mathbb{R}^\infty$. The approximation error is measured in the norm of the Bochner space $L_2(\mathbb{R}^\infty, V, \gamma)$, where $\gamma$ is the tensor product standard Gaussian probability on $\mathbb{R}^\infty$ and $V$ is the energy space. The approximation is based on an $m$-term truncation of the Hermite generalized polynomial chaos expansion (gpc) of $u$. Under a certain assumption on $\ell_q$-summability condition for lognormal inputs ($0< q <\infty$), we proved that for every integer $n > 1$, one can construct a non-adaptive compactly supported deep ReLU neural network $\boldsymbol{\phi}_n$ of size not greater than $n$ on $\mathbb{R}^m$ with $m = \mathcal{O} (n/\log n)$, having $m$ outputs so that the summation constituted by replacing polynomials in the $m$-term truncation of Hermite gpc expansion by these $m$ outputs approximates $u$ with an error bound $\mathcal{O}\left(\left(n/\log n\right)^{-1/q}\right)$. This error bound is comparable to the error bound of the best approximation of $u$ by $n$-term truncations of Hermite gpc expansion which is $\mathcal{O}(n^{-1/q})$. We also obtained some results on similar problems for parametric and stochastic elliptic PDEs with affine inputs, based on the Jacobi and Taylor gpc expansions.
CODING & PROGRAMMING
arxiv.org

Smart Data Representations: Impact on the Accuracy of Deep Neural Networks

Deep Neural Networks are able to solve many complex tasks with less engineering effort and better performance. However, these networks often use data for training and evaluation without investigating its representation, i.e., the form of the used data. In the present paper, we analyze the impact of data representations on the performance of Deep Neural Networks using energy time series forecasting. Based on an overview of exemplary data representations, we select four exemplary data representations and evaluate them using two different Deep Neural Network architectures and three forecasting horizons on real-world energy time series. The results show that, depending on the forecast horizon, the same data representations can have a positive or negative impact on the accuracy of Deep Neural Networks.
SCIENCE
arxiv.org

Learned Dynamics of Electrothermally-Actuated Soft Robot Limbs Using LSTM Neural Networks

Modeling the dynamics of soft robot limbs with electrothermal actuators is generally challenging due to thermal and mechanical hysteresis and the complex physical interactions that can arise during robot operation. This article proposes a neural network based on long short-term memory (LSTM) to address these challenges in actuator modeling. A planar soft limb, actuated by a pair of shape memory alloy (SMA) coils and containing embedded sensors for temperature and angular deflection, is used as a test platform. Data from this robot are used to train LSTM neural networks, using different combinations of sensor data, to model both unidirectional (one SMA) and bidirectional (both SMAs) motion. Open-loop rollout results show that the learned model is able to predict motions over extraordinarily long open-loop timescales (10 minutes) with little drift. Prediction errors are on the order of the soft deflection sensor's accuracy, even when using only the actuator's pulse width modulation inputs for learning. These LSTM models can be used in-situ, without extensive sensing, helping to bring soft electrothermally-actuated robots into practical application.
ENGINEERING
aithority.com

VeriSilicon Neural Network Processor IP Embedded In Over 100 AI Chips

50 customers licensed the technology for more than 100 AI chips in 10 major market segments. VeriSilicon, a leading Silicon Platform as a Service company, announced that its neural network processor IP designed for artificial intelligence (AI) applications now features in more than 100 AI chips supplied by 50 licensees. These chips with built-in VeriSilicon Vivante NPUs are in 10 major market segments, including Internet of Things (IoT), wearables, smart TVs, smart home, security monitoring, servers, automotive electronics, smartphones, tablets and smart healthcare.
COMPUTERS
arxiv.org

An Application of Quantum Machine Learning on Quantum Correlated Systems: Quantum Convolutional Neural Network as a Classifier for Many-Body Wavefunctions from the Quantum Variational Eigensolver

Machine learning has been applied to a wide variety of models, from classical statistical mechanics to quantum strongly correlated systems, for the identification of phase transitions. The recently proposed quantum convolutional neural network (QCNN) provides a new framework for using quantum circuits instead of classical neural networks as the backbone of classification methods. We present here the results from training the QCNN on the wavefunctions of the variational quantum eigensolver for the one-dimensional transverse field Ising model (TFIM). We demonstrate that the QCNN identifies wavefunctions which correspond to the paramagnetic phase and the ferromagnetic phase of the TFIM with good accuracy. The QCNN can be trained to predict the corresponding phase of wavefunctions around the putative quantum critical point, even though it is trained on wavefunctions far away from it. This provides a basis for exploiting the QCNN to identify the quantum critical point.
COMPUTERS
techxplore.com

A neural network-based optimization technique inspired by the principle of annealing

Optimization problems involve the identification of the best possible solution among several possibilities. These problems can be encountered in real-world settings, as well as in most scientific research fields. In recent years, computer scientists have developed increasingly advanced computational methods for solving optimization problems. Some of the most promising techniques...
CODING & PROGRAMMING
arxiv.org

Observation Error Covariance Specification in Dynamical Systems for Data assimilation using Recurrent Neural Networks

Data assimilation techniques are widely used to predict complex dynamical systems with uncertainties, based on time-series observation data. Modelling error covariance matrices is an important element of data assimilation algorithms and can considerably impact the forecasting accuracy. The estimation of these covariances, which usually relies on empirical assumptions and physical constraints, is often imprecise and computationally expensive, especially for systems of large dimension. In this work, we propose a data-driven approach based on long short-term memory (LSTM) recurrent neural networks (RNN) to improve both the accuracy and the efficiency of observation covariance specification in data assimilation for dynamical systems. Learning the covariance matrix from observed/simulated time-series data, the proposed approach does not require any knowledge or assumption about the prior error distribution, unlike classical posterior tuning methods. We have compared the novel approach with two state-of-the-art covariance tuning algorithms, namely DI01 and D05, first in a Lorenz dynamical system and then in a 2D shallow water twin experiments framework with different covariance parameterization using ensemble assimilation. This novel method shows significant advantages in observation covariance specification, assimilation accuracy and computational efficiency.
COMPUTERS
