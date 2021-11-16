
Neuron-based Pruning of Deep Neural Networks with Better Generalization using Kronecker Factored Curvature Approximation

By Abdolghani Ebrahimi, Diego Klabjan
arxiv.org
 8 days ago

Existing methods of pruning deep neural networks focus on removing unnecessary parameters of the trained network and fine-tuning the model afterwards to find a good solution that recovers the initial performance...

arxiv.org

Predicting Lattice Phonon Vibrational Frequencies Using Deep Graph Neural Networks

Lattice vibration frequencies are related to many important materials properties such as thermal and electrical conductivity as well as superconductivity. However, calculating vibration frequencies with density functional theory (DFT) methods is too computationally demanding for the large number of samples involved in materials screening. Here we propose a deep graph neural network-based algorithm for predicting crystal vibration frequencies from crystal structures with high accuracy. Our algorithm addresses the variable dimension of the vibration frequency spectrum using a zero-padding scheme. Benchmark studies on two data sets with 15,000 and 35,552 samples show that the aggregated $R^2$ scores of the prediction reach 0.554 and 0.724, respectively. Our work demonstrates the capability of deep graph neural networks to learn to predict phonon spectrum properties of crystal structures, in addition to phonon density of states (DOS) and electronic DOS, in which the output dimension is constant.
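
The zero-padding idea mentioned above can be illustrated with a minimal sketch. This is not the authors' code: the function name `pad_spectra` and the sample frequency values are hypothetical, and the sketch only assumes that each crystal yields a frequency list of a different length that must be brought to a common output size.

```python
def pad_spectra(spectra, pad_value=0.0):
    """Right-pad every frequency list with pad_value up to the longest length."""
    target = max(len(s) for s in spectra)
    return [list(s) + [pad_value] * (target - len(s)) for s in spectra]


# Hypothetical frequency lists for two crystals with different numbers of modes.
spectra = [[1.2, 3.4, 5.6], [0.7, 2.1, 2.9, 4.4, 6.0, 7.3]]
padded = pad_spectra(spectra)  # both rows now have six entries
```

With every target vector the same length, a network with a fixed-size output layer can be trained on crystals whose spectra differ in dimension.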
SCIENCE
arxiv.org

Automated Pulmonary Embolism Detection from CTPA Images Using an End-to-End Convolutional Neural Network

Automated methods for detecting pulmonary embolisms (PEs) on CT pulmonary angiography (CTPA) images are in high demand. Existing methods typically employ separate steps for PE candidate detection and false positive removal, without considering the ability of the other step. As a result, most existing methods suffer from a high false positive rate in order to achieve an acceptable sensitivity. This study presents an end-to-end trainable convolutional neural network (CNN) where the two steps are optimized jointly. The proposed CNN consists of three concatenated subnets: 1) a novel 3D candidate proposal network for detecting cubes containing suspected PEs, 2) a 3D spatial transformation subnet for generating fixed-sized vessel-aligned image representation for candidates, and 3) a 2D classification network which takes the three cross-sections of the transformed cubes as input and eliminates false positives. We have evaluated our approach using the 20 CTPA test dataset from the PE challenge, achieving a sensitivity of 78.9%, 80.7% and 80.7% at 2 false positives per volume at 0mm, 2mm and 5mm localization error, which is superior to the state-of-the-art methods. We have further evaluated our system on our own dataset consisting of 129 CTPA data with a total of 269 emboli. Our system achieves a sensitivity of 63.2%, 78.9% and 86.8% at 2 false positives per volume at 0mm, 2mm and 5mm localization error.
HEALTH
arxiv.org

VeSoNet: Traffic-Aware Content Caching for Vehicular Social Networks based on Path Planning and Deep Reinforcement Learning

Vehicular social networking is an emerging application of the promising Internet of Vehicles (IoV) which aims to achieve the seamless integration of vehicular networks and social networks. However, the unique characteristics of vehicular networks such as high mobility and frequent communication interruptions make content delivery to end-users under strict delay constraints an extremely challenging task. In this paper, we propose a social-aware vehicular edge computing architecture that solves the content delivery problem by using some of the vehicles in the network as edge servers that can store and stream popular content to close-by end-users. The proposed architecture includes three components. First, we propose a social-aware graph pruning search algorithm that computes and assigns the vehicles to the shortest path with the most relevant vehicular content providers. Second, we use a traffic-aware content recommendation scheme to recommend relevant content according to their social context. This scheme uses graph embeddings in which the vehicles are represented by a set of low-dimension vectors (vehicle2vec) to store information about previously consumed content. Finally, we propose a Deep Reinforcement Learning (DRL) method to optimize the distribution of content provider vehicles across the network. The results obtained from a realistic traffic simulation show the effectiveness and robustness of the proposed system when compared to the state-of-the-art baselines.
COMPUTERS
arxiv.org

Pairwise interactions for Potential energy surfaces and Atomic forces with Deep Neural network

Molecular dynamics (MD) simulation, which is considered an important tool for studying physical and chemical processes at the atomic scale, requires accurate calculations of energies and forces. Although reliable energies and forces can be obtained by electronic structure calculations such as those based on density functional theory (DFT), this approach is computationally expensive. In this work, we propose a full-stack model using a deep neural network (NN) to enhance the calculation of force and energy, in which the NN is designed to extract the embedding features of pairwise interactions between an atom and its neighbors, which are aggregated to obtain its feature vector for predicting atomic force and potential energy. By designing the features of the pairwise interactions, we can control the performance of models and take into account the many-body effects and other physics of the atomic interactions. Moreover, we demonstrate that using the Coulomb matrix of the local structures to complement the pairwise information improves the prediction of force and energy for silicon systems, and we confirm that our models transfer to larger systems with high accuracy.
SCIENCE
arxiv.org

LSP : Acceleration and Regularization of Graph Neural Networks via Locality Sensitive Pruning of Graphs

Graph Neural Networks (GNNs) have emerged as highly successful tools for graph-related tasks. However, real-world problems involve very large graphs, and the compute resources needed to fit GNNs to those problems grow rapidly. Moreover, the noisy nature and size of real-world graphs cause GNNs to over-fit if not regularized properly. Surprisingly, recent works show that large graphs often involve many redundant components that can be removed without compromising the performance too much. This includes node or edge removals during inference through GNN layers or as a pre-processing step that sparsifies the input graph. This intriguing phenomenon enables the development of state-of-the-art GNNs that are both efficient and accurate. In this paper, we take a further step towards demystifying this phenomenon and propose a systematic method called Locality-Sensitive Pruning (LSP) for graph pruning based on Locality-Sensitive Hashing. We aim to sparsify a graph so that similar local environments of the original graph result in similar environments in the resulting sparsified graph, which is an essential feature for graph-related tasks. To justify the application of pruning based on local graph properties, we exemplify the advantage of applying pruning based on locality properties over other pruning strategies in various scenarios. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of LSP, which removes a significant number of edges from large graphs without compromising the performance, accompanied by a considerable acceleration.
CODING & PROGRAMMING
arxiv.org

Graph Neural Network Training with Data Tiering

Graph Neural Networks (GNNs) have shown success in learning from graph-structured data, with applications to fraud detection, recommendation, and knowledge graph reasoning. However, training GNNs efficiently is challenging because: 1) GPU memory capacity is limited and can be insufficient for large datasets, and 2) the graph-based data structure causes irregular data access patterns. In this work, we provide a method to statistically analyze and identify more frequently accessed data ahead of GNN training. Our data tiering method utilizes not only the structure of the input graph, but also an insight gained from the actual GNN training process to achieve a higher prediction result. With our data tiering method, we additionally provide a new data placement and access strategy to further minimize the CPU-GPU communication overhead. We also take multi-GPU GNN training into account and demonstrate the effectiveness of our strategy in a multi-GPU system. The evaluation results show that our work reduces CPU-GPU traffic by 87-95% and improves the training speed of GNNs over the existing solutions by 1.6-2.1x on graphs with hundreds of millions of nodes and billions of edges.
CODING & PROGRAMMING
towardsdatascience.com

Radial Basis Function Neural Network Simplified

A short introduction to radial basis function neural networks. Radial basis function (RBF) networks have a fundamentally different architecture than most neural networks. Most neural network architectures consist of many layers and introduce nonlinearity by repeatedly applying nonlinear activation functions. An RBF network, on the other hand, consists only of an input layer, a single hidden layer, and an output layer.
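
The three-layer structure described above can be sketched in a few lines. The sketch assumes Gaussian basis functions in the hidden layer, which is a common choice but is not stated in the excerpt; the function name `rbf_forward` and all parameter values are illustrative.

```python
import math


def rbf_forward(x, centers, widths, weights, bias=0.0):
    """Forward pass: input -> Gaussian hidden units -> linear output."""
    hidden = []
    for center, width in zip(centers, widths):
        # Each hidden unit responds to the distance between x and its center.
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        hidden.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    # The output layer is a plain weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(weights, hidden)) + bias


# Illustrative parameters: two hidden units with opposite output weights.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0]
weights = [1.0, -1.0]
y = rbf_forward([0.0, 0.0], centers, widths, weights)
```

The only nonlinearity sits in the single hidden layer, so once the centers and widths are fixed, fitting the output weights is a linear problem.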
CODING & PROGRAMMING
Nature.com

Human stem cell-derived GABAergic neurons functionally integrate into human neuronal networks

Gamma-aminobutyric acid (GABA)-releasing interneurons modulate neuronal network activity in the brain by inhibiting other neurons. The alteration or absence of these cells disrupts the balance between excitatory and inhibitory processes, leading to neurological disorders such as epilepsy. In this regard, cell-based therapy may be an alternative therapeutic approach. We generated light-sensitive human embryonic stem cell (hESC)-derived GABAergic interneurons (hdIN) and tested their functionality. After 35 days in vitro (DIV), hdINs showed electrophysiological properties and spontaneous synaptic currents comparable to mature neurons. In co-culture with human cortical neurons and after transplantation (AT) into human brain tissue resected from patients with drug-resistant epilepsy, light-activated channelrhodopsin-2 (ChR2) expressing hdINs induced postsynaptic currents in human neurons, strongly suggesting functional efferent synapse formation. These results provide a proof-of-concept that hESC-derived neurons can integrate and modulate the activity of a human host neuronal network. Therefore, this study supports the possibility of precise temporal control of network excitability by transplantation of light-sensitive interneurons.
SCIENCE
arxiv.org

Training neural networks with synthetic electrocardiograms

We present a method for training neural networks with synthetic electrocardiograms that mimic signals produced by a wearable single-lead electrocardiogram monitor. We use domain randomization, where synthetic signal properties such as the waveform shape, RR-intervals and noise are varied for every training example. Models trained with synthetic data are compared to their counterparts trained with real data. Detection of R-waves in electrocardiograms recorded during different physical activities and in atrial fibrillation is used to compare the models. By allowing the randomization to increase beyond what is typically observed in real-world data, the performance is on par with or exceeds that of networks trained with real data. Experiments show robust performance with different seeds and training examples on different test sets without any test-set-specific tuning. The method makes it possible to train neural networks using practically free-to-collect data with accurate labels and without the need for manual annotations. It also opens up the possibility of extending the use of synthetic data to cardiac disease classification when disease-specific a priori information is used in the electrocardiogram generation. Additionally, the distribution of data can be controlled, eliminating the class imbalances that are typically observed in health-related data, and the generated data is inherently private.
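
A rough sketch of the domain-randomization idea follows. It is not the authors' generator: the function `synthetic_ecg`, the Gaussian-bump model of an R-wave, and every parameter range are assumptions chosen only to show how waveform shape, RR-intervals and noise can be redrawn for each training example while the labels come for free.

```python
import math
import random


def synthetic_ecg(n_samples=1000, fs=250.0, rng=None):
    """Return (signal, r_peak_indices) with randomized shape, rhythm and noise."""
    rng = rng or random.Random()
    # Per-example properties; all ranges are illustrative assumptions.
    mean_rr = rng.uniform(0.4, 1.5)    # mean seconds between beats
    rr_jitter = rng.uniform(0.0, 0.3)  # relative RR variability
    width = rng.uniform(0.01, 0.04)    # R-wave width in seconds
    noise_std = rng.uniform(0.0, 0.3)  # additive Gaussian noise level

    # Place R-peaks using randomized RR intervals; these are the labels.
    peaks = []
    t = rng.uniform(0.0, mean_rr)
    duration = n_samples / fs
    while t < duration:
        peaks.append(int(t * fs))
        t += max(0.2, mean_rr * (1.0 + rng.uniform(-rr_jitter, rr_jitter)))

    # Render each R-wave as a Gaussian bump and add noise.
    signal = []
    for i in range(n_samples):
        ti = i / fs
        value = sum(math.exp(-((ti - p / fs) ** 2) / (2.0 * width ** 2)) for p in peaks)
        signal.append(value + rng.gauss(0.0, noise_std))
    return signal, peaks


signal, peaks = synthetic_ecg(rng=random.Random(0))
```

Because the peak positions are generated before the signal is rendered, every example carries exact annotations, which is the property the abstract highlights.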
SCIENCE
arxiv.org

Diffraction integral computation using sinc approximation

We propose a method based on sinc series approximations for computing the Rayleigh-Sommerfeld and Fresnel diffraction integrals of optics. The diffraction integrals are given in terms of a convolution, and our proposed numerical approach is not only super-algebraically convergent, but it also satisfies an important property of the convolution -- namely, the preservation of bandwidth. Furthermore, the accuracy of the proposed method depends only on how well the source field is approximated; it is independent of wavelength, propagation distance, and observation plane discretization. In contrast, methods based on the fast Fourier transform (FFT), such as the angular spectrum method (ASM) and its variants, approximate the optical fields in the source and observation planes using Fourier series. We will show that the ASM introduces artificial periodic boundary conditions and violates the preservation of bandwidth property, resulting in limited accuracy which decreases for longer propagation distances. The sinc-based approach avoids both of these problems. Numerical results are presented for Gaussian beam propagation and circular aperture diffraction to demonstrate the high-order accuracy of the sinc method for both short-range and long-range propagation. For comparison, we also present numerical results obtained with the angular spectrum method.
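
The building block named above, a sinc series approximation of a source field, can be sketched as follows. This is only the textbook sinc (Whittaker-Shannon) series applied to a Gaussian profile, not the paper's diffraction integral method; the names `sinc_approx` and `source`, the grid spacings and the term counts are illustrative assumptions.

```python
import math


def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def sinc_approx(f, x, h, n_terms):
    """Approximate f(x) from samples f(k*h), k = -n_terms..n_terms."""
    return sum(f(k * h) * sinc((x - k * h) / h) for k in range(-n_terms, n_terms + 1))


def source(x):
    """Illustrative Gaussian source field."""
    return math.exp(-x * x)


x = 0.3
coarse_err = abs(sinc_approx(source, x, 0.5, 40) - source(x))
fine_err = abs(sinc_approx(source, x, 0.25, 80) - source(x))
```

Comparing `coarse_err` and `fine_err` gives a feel for how quickly the approximation improves as the source field is sampled more finely, which is the quantity the abstract says controls the accuracy.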
MATHEMATICS
arxiv.org

Silicon photonic subspace neural chip for hardware-efficient deep learning

As deep learning has shown revolutionary performance in many artificial intelligence applications, its escalating computation demand requires hardware accelerators for massive parallelism and improved throughput. The optical neural network (ONN) is a promising candidate for next-generation neurocomputing due to its high parallelism, low latency, and low energy consumption. Here, we devise a hardware-efficient photonic subspace neural network (PSNN) architecture, which targets lower optical component usage, area cost, and energy consumption than previous ONN architectures with comparable task performance. Additionally, a hardware-aware training framework is provided to minimize the required device programming precision, lessen the chip area, and boost the noise robustness. We experimentally demonstrate our PSNN on a butterfly-style programmable silicon photonic integrated circuit and show its utility in practical image recognition tasks.
ENGINEERING
arxiv.org

Skillful Twelve Hour Precipitation Forecasts using Large Context Neural Networks

By Lasse Espeholt, Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Jason Hickey, Aaron Bell, Nal Kalchbrenner

The problem of forecasting weather has been scientifically studied for centuries due to its high impact on human lives, transportation, food production and energy management, among others. Current operational forecasting models are based on physics and use supercomputers to simulate the atmosphere to make forecasts hours and days in advance. Better physics-based forecasts require improvements in the models themselves, which can be a substantial scientific challenge, as well as improvements in the underlying resolution, which can be computationally prohibitive. An emerging class of weather models based on neural networks represents a paradigm shift in weather forecasting: the models learn the required transformations from data instead of relying on hand-coded physics and are computationally efficient. For neural models, however, each additional hour of lead time poses a substantial challenge as it requires capturing ever larger spatial contexts and increases the uncertainty of the prediction. In this work, we present a neural network that is capable of large-scale precipitation forecasting up to twelve hours ahead and, starting from the same atmospheric state, the model achieves greater skill than the state-of-the-art physics-based models HRRR and HREF that currently operate in the Continental United States. Interpretability analyses reinforce the observation that the model learns to emulate advanced physics principles. These results represent a substantial step towards establishing a new paradigm of efficient forecasting with neural networks.
ENVIRONMENT
arxiv.org

Verifying Controllers with Convolutional Neural Network-based Perception: A Case for Intelligible, Safe, and Precise Abstractions

By Chiao Hsieh, Keyur Joshi, Sasa Misailovic, Sayan Mitra (University of Illinois at Urbana-Champaign)

Convolutional Neural Networks (CNN) for object detection, lane detection, and segmentation now sit at the head of most autonomy pipelines, and yet, their safety analysis remains an important challenge. Formal analysis of perception models is fundamentally difficult because their correctness is hard if not impossible to specify. We present a technique for inferring intelligible and safe abstractions for perception models from system-level safety requirements, data, and program analysis of the modules that are downstream from perception. The technique can help trade off safety, size, and precision in creating abstractions and in the subsequent verification. We apply the method to two significant case studies based on high-fidelity simulations: (a) a vision-based lane keeping controller for an autonomous vehicle and (b) a controller for an agricultural robot. We show how the generated abstractions can be composed with the downstream modules and then the resulting abstract system can be verified using program analysis tools like CBMC. Detailed evaluations of the impacts of size, safety requirements, and the environmental parameters (e.g., lighting, road surface, plant type) on the precision of the generated abstractions suggest that the approach can help guide the search for corner cases and safe operating envelopes.
ENGINEERING
arxiv.org

Kronecker Factorization for Preventing Catastrophic Forgetting in Large-scale Medical Entity Linking

Multi-task learning is useful in NLP because it is often practically desirable to have a single model that works across a range of tasks. In the medical domain, sequential training on tasks may sometimes be the only way to train models, either because access to the original (potentially sensitive) data is no longer available, or simply owing to the computational costs inherent to joint retraining. A major issue inherent to sequential learning, however, is catastrophic forgetting, i.e., a substantial drop in accuracy on prior tasks when a model is updated for a new task. Elastic Weight Consolidation is a recently proposed method to address this issue, but scaling this approach to the modern large models used in practice requires making strong independence assumptions about model parameters, limiting its effectiveness. In this work, we apply Kronecker Factorization--a recent approach that relaxes independence assumptions--to prevent catastrophic forgetting in convolutional and Transformer-based neural networks at scale. We show the effectiveness of this technique on the important and illustrative task of medical entity linking across three datasets, demonstrating the capability of the technique to be used to make efficient updates to existing methods as new medical data becomes available. On average, the proposed method reduces catastrophic forgetting by 51% when using a BERT-based model, compared to a 27% reduction using standard Elastic Weight Consolidation, while maintaining spatial complexity proportional to the number of model parameters.
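
For context, the standard Elastic Weight Consolidation baseline mentioned above adds a quadratic penalty that uses a diagonal Fisher information estimate, which is where the independence assumption comes from. The sketch below shows that standard penalty only, not the Kronecker-factored variant the paper applies; the function name `ewc_penalty` and the numbers are illustrative.

```python
def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """Diagonal EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for t, s, f in zip(theta, theta_star, fisher_diag)
    )


# Illustrative values: the first parameter moved away from its old-task optimum.
theta = [1.0, 2.0]        # current parameters while training on the new task
theta_star = [0.0, 2.0]   # parameters after the previous task
fisher_diag = [4.0, 10.0] # per-parameter importance for the previous task
penalty = ewc_penalty(theta, theta_star, fisher_diag, lam=2.0)
```

Each parameter is penalized on its own, so interactions between parameters are ignored; a Kronecker-factored approximation relaxes this by modeling dependencies within each layer.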
HEALTH
arxiv.org

Can neural networks predict dynamics they have never seen?

Neural networks have proven to be remarkably successful for a wide range of complicated tasks, from image recognition and object detection to speech recognition and machine translation. One of their successes is the skill in prediction of future dynamics given a suitable training set of data. Previous studies have shown how Echo State Networks (ESNs), a subset of Recurrent Neural Networks, can successfully predict even chaotic systems for times longer than the Lyapunov time. This study shows that, remarkably, ESNs can successfully predict dynamical behavior that is qualitatively different from any behavior contained in the training set. Evidence is provided for a fluid dynamics problem where the flow can transition between laminar (ordered) and turbulent (disordered) regimes. Despite being trained on the turbulent regime only, ESNs are found to predict laminar behavior. Moreover, the statistics of turbulent-to-laminar and laminar-to-turbulent transitions are also predicted successfully, and the utility of ESNs in acting as an early-warning system for transition is discussed. These results are expected to be widely applicable to data-driven modelling of temporal behaviour in a range of physical, climate, biological, ecological and finance models characterized by the presence of tipping points and sudden transitions between several competing states.
COMPUTERS
arxiv.org

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Sparse training is a natural way to accelerate the training of deep neural networks and reduce memory usage, especially since large modern neural networks are significantly over-parameterized. However, most existing methods cannot achieve this goal in practice because the chain-rule-based gradient estimators (w.r.t. structure parameters) they adopt require dense computation, at least in the backward propagation step. This paper solves this problem by proposing an efficient sparse training method with completely sparse forward and backward passes. We first formulate the training process as a continuous minimization problem under a global sparsity constraint. We then separate the optimization process into two steps, corresponding to the weight update and the structure parameter update. For the former step, we use the conventional chain rule, which can be made sparse by exploiting the sparse structure. For the latter step, instead of using the chain-rule-based gradient estimators as in existing methods, we propose a variance-reduced policy gradient estimator, which only requires two forward passes without backward propagation, thus achieving completely sparse training. We prove that the variance of our gradient estimator is bounded. Extensive experimental results on real-world datasets demonstrate that, compared to previous methods, our algorithm is much more effective in accelerating the training process, up to an order of magnitude faster.
CODING & PROGRAMMING
arxiv.org

Expert Human-Level Driving in Gran Turismo Sport Using Deep Reinforcement Learning with Image-based Representation

When humans play virtual racing games, they use visual environmental information on the game screen to understand the rules within the environments. In contrast, a state-of-the-art realistic racing game AI agent that outperforms human players does not use image-based environmental information but the compact and precise measurements provided by the environment. In this paper, a vision-based control algorithm is proposed and compared with human player performances under the same conditions in realistic racing scenarios using Gran Turismo Sport (GTS), which is known as a high-fidelity realistic racing simulator. In the proposed method, the environmental information that constitutes part of the observations in conventional state-of-the-art methods is replaced with feature representations extracted from game screen images. We demonstrate that the proposed method performs expert human-level vehicle control under high-speed driving scenarios even with game screen images as high-dimensional inputs. Additionally, it outperforms the built-in AI in GTS in a time trial task, and its score places it among the top 10% of approximately 28,000 human players.
VIDEO GAMES
arxiv.org

Piezoelectric modulus prediction using machine learning and graph neural networks

Piezoelectric materials are widely used in products such as electric cigarette lighters, diesel engines and x-ray shutters. However, discovering high-performance and environmentally friendly (e.g. lead-free) piezoelectric materials is a difficult problem due to the sophisticated relationships between materials' composition/structures and the piezoelectric effect. Compared to other material properties such as formation energy, band gap, and bulk modulus, it is much more challenging to predict piezoelectric coefficients. Here, we propose a comprehensive study on designing and evaluating advanced machine learning models for predicting the piezoelectric modulus from materials' composition and/or structures. We train the prediction models based on extensive feature engineering combined with machine learning models (Random Forest and Support Vector Machines) and automated feature learning based on deep graph neural networks. Our SVM model with crystal structure features outperforms other methods. We also use this model to predict the piezoelectric coefficients for 12,680 materials from the Materials Project database and report the top 20 potential high-performance piezoelectric materials.
CHEMISTRY
arxiv.org

Parallel Physics-Informed Neural Networks with Bidirectional Balance

As an emerging technology in deep learning, physics-informed neural networks (PINNs) have been widely used to solve various partial differential equations (PDEs) in engineering. However, PDEs based on practical considerations contain multiple physical quantities and complex initial boundary conditions, so PINNs often return incorrect results. Here we take the heat transfer problem in multilayer fabrics as a typical example. It couples multiple strongly correlated temperature fields, and the values of variables are extremely unbalanced among different dimensions. We clarify the potential difficulties of solving such problems with classic PINNs, and propose parallel physics-informed neural networks with bidirectional balance. In detail, our parallel solving framework synchronously fits coupled equations through several multilayer perceptrons. Moreover, we design two modules to balance the forward process of data and the back-propagation process of the loss gradient. This bidirectional balance not only enables the whole network to converge stably, but also helps to fully learn various physical conditions in PDEs. We provide a series of ablation experiments to verify the effectiveness of the proposed methods. The results show that our approach makes problems that PINNs could not solve solvable, and achieves excellent solving accuracy.
ENGINEERING
bitcoinmagazine.com

A Neural Network Is Developing Between Bitcoin Lightning Network Nodes

The below is a direct excerpt of Marty's Bent Issue #1109: "A neural network is developing between Lightning Nodes." Above is a visualization of the current Lightning Network topography made up of ~16,000 Lightning Nodes with ~140,000 payment channels opened between them. I don't know if I'm simply being duped by some visualization magic, but I can't help but think that we are all witnessing the emergence of something massive. Something that will have a profound effect on humanity that we can't quite comprehend yet. The topography that is emerging on the Lightning Network seems to be mimicking many things we find in nature, as longtime Bitcoin Core maintainer Wladimir van der Laan points out below.
ECONOMY

