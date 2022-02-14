
Benign Overfitting in Two-layer Convolutional Neural Networks

By Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu
arxiv.org
 2 days ago

Modern neural networks often have great expressive power and can be trained to overfit the training data while still achieving good test performance. This phenomenon is referred to as "benign overfitting". Recently, a line of work has emerged studying "benign overfitting" from...

arxiv.org

A rigorous stochastic theory for spike pattern formation in recurrent neural networks with arbitrary connection topologies

Cortical networks exhibit synchronized activity which often occurs in spontaneous events in the form of spike avalanches. Since synchronization has been causally linked to central aspects of brain function such as selective signal processing and integration of stimulus information, participating in an avalanche is a form of a transient synchrony which temporarily creates neural assemblies and hence might especially be useful for implementing flexible information processing. For understanding how assembly formation supports neural computation, it is therefore essential to establish a comprehensive theory of how network structure and dynamics interact to generate specific avalanche patterns and sequences. Here we derive exact avalanche distributions for a finite network of recurrently coupled spiking neurons with arbitrary non-negative interaction weights, which is made possible by formally mapping the model dynamics to a linear, random dynamical system on the $N$-torus and by exploiting self-similarities inherent in the phase space. We introduce the notion of relative unique ergodicity and show that this property is guaranteed if the system is driven by a time-invariant Bernoulli process. This approach allows us not only to provide closed-form analytical expressions for avalanche size, but also to determine the detailed set(s) of units firing in an avalanche (i.e., the avalanche assembly). The underlying dependence between network structure and dynamics is made transparent by expressing the distribution of avalanche assemblies in terms of the induced graph Laplacian. We explore analytical consequences of this dependence and provide illustrating examples.
SCIENCE
Nature.com

A reusable neural network pipeline for unidirectional fiber segmentation

Fiber-reinforced ceramic-matrix composites are advanced, temperature resistant materials with applications in aerospace engineering. Their analysis involves the detection and separation of fibers, embedded in a fiber bed, from an imaged sample. Currently, this is mostly done using semi-supervised techniques. Here, we present an open, automated computational pipeline to detect fibers from a tomographically reconstructed X-ray volume. We apply our pipeline to a non-trivial dataset by Larson et al. To separate the fibers in these samples, we tested four different architectures of convolutional neural networks. When comparing our neural network approach to a semi-supervised one, we obtained Dice and Matthews coefficients reaching up to 98%, showing that these automated approaches can match human-supervised methods, in some cases separating fibers that human-curated algorithms could not find. The software written for this project is open source, released under a permissive license, and can be freely adapted and re-used in other domains.
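For readers unfamiliar with the two scores named in this abstract, here is a minimal sketch of how Dice and Matthews coefficients can be computed for a binary segmentation. It is not taken from the paper's pipeline: the function names `confusion_counts`, `dice`, and `matthews`, the flat 0/1 list representation of a mask, and the conventions chosen for empty masks are all assumptions made for illustration.

```python
import math

def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for two equal-length sequences of 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn

def dice(pred, truth):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom

def matthews(pred, truth):
    """Matthews correlation coefficient, in the range [-1, 1]."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0.0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical four-pixel example: one true fiber pixel, two predicted.
pred = [1, 1, 0, 0]
truth = [1, 0, 0, 0]
print(dice(pred, truth), matthews(pred, truth))
```

Unlike Dice, the Matthews coefficient also takes true negatives into account, which is one reason the two scores are often reported together.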
INDUSTRY
arxiv.org

Removing Distortion Effects in Music Using Deep Neural Networks

Johannes Imort, Giorgio Fabbro, Marco A. Martínez Ramírez, Stefan Uhlich, Yuichiro Koyama, Yuki Mitsufuji. Audio effects are an essential element in the context of music production, and therefore, modeling analog audio effects has been extensively researched for decades using system-identification methods, circuit simulation, and recently, deep learning. However, only a few works have tackled the reconstruction of signals that were processed using an audio effect unit. Given the recent advances in music source separation and automatic mixing, the removal of audio effects could facilitate an automatic remixing system. This paper focuses on removing distortion and clipping applied to guitar tracks for music production while presenting a comparative investigation of different deep neural network (DNN) architectures on this task. We achieve exceptionally good results in distortion removal using DNNs for effects that superimpose the clean signal onto the distorted signal, while the task is more challenging if the clean signal is not superimposed. Nevertheless, in the latter case, the neural models under evaluation surpass one state-of-the-art declipping system in terms of source-to-distortion ratio, leading to better quality and faster inference.
COMPUTERS
arxiv.org

Large-scale Personalized Video Game Recommendation via Social-aware Contextualized Graph Neural Network

Because of the large number of online games available nowadays, online game recommender systems are necessary for users and online game platforms. The former can discover more online games that match their interests, and the latter can attract users to dwell longer on the platform. This paper investigates the characteristics of user behaviors with respect to the online games on the Steam platform. Based on the observations, we argue that a satisfying recommender system for online games should be able to characterize three aspects: personalization, game contextualization, and social connection. However, addressing all three simultaneously is rather challenging for game recommendation. Firstly, personalization for game recommendation requires the incorporation of the dwelling time of engaged games, which is ignored in existing methods. Secondly, game contextualization should reflect the complex and high-order properties of those relations. Last but not least, it is problematic to use social connections directly for game recommendations due to the massive noise within social connections. To this end, we propose a Social-aware Contextualized Graph Neural Recommender System (SCGRec), which harnesses three perspectives to improve game recommendation. We conduct a comprehensive analysis of users' online game behaviors, which motivates the necessity of handling those three characteristics in the online game recommendation.
VIDEO GAMES
arxiv.org

Fortnet, a software package for training Behler-Parrinello neural networks

A new, open source, parallel, stand-alone software package (Fortnet) has been developed, which implements Behler-Parrinello neural networks. It covers the entire workflow from feature generation to the evaluation of generated potentials, coupled with higher-level analysis such as the analytic calculation of atomic forces. The functionality of the software package is demonstrated by driving the training for the fitted correction functions of the density functional tight binding (DFTB) method, which are commonly used to compensate for the inaccuracies resulting from the DFTB approximations to the Kohn-Sham Hamiltonian. The usual two-body form of those correction functions limits the transferability of the parameterizations between very different structural environments. The recently introduced DFTB+ANN approach strives to lift these limitations by combining DFTB with a near-sighted artificial neural network (ANN). After investigating various approaches, we have found the combination of DFTB with an ANN acting on top of some baseline correction functions (delta learning) to be the most promising one. It allowed us to introduce many-body corrections on top of two-body parametrizations, while excellent transferability to chemical environments with deviating energetics could be demonstrated.
SOFTWARE
arxiv.org

Improving Fraud detection via Hierarchical Attention-based Graph Neural Network

Graph neural networks (GNN) have emerged as a powerful tool for fraud detection tasks, where fraudulent nodes are identified by aggregating neighbor information via different relations. To get around such detection, crafty fraudsters resort to camouflage by connecting to legitimate users (i.e., relation camouflage) or providing seemingly legitimate feedback (i.e., feature camouflage). A widespread solution reinforces the GNN aggregation process with neighbor selectors according to original node features. This method may be limited when fraudsters use not only relation camouflage but also feature camouflage, which makes them hard to distinguish from their legitimate neighbors. In this paper, we propose a Hierarchical Attention-based Graph Neural Network (HA-GNN) for fraud detection, which incorporates weighted adjacency matrices across different relations against camouflage. This is motivated by the Relational Density Theory and is exploited for forming a hierarchical attention-based graph neural network. Specifically, we design a relation attention module to reflect the tie strength between two nodes, and a neighborhood attention module to capture the long-range structural affinity associated with the graph. We generate node embeddings by aggregating information from local/long-range structures and original node features. Experiments on three real-world datasets demonstrate the effectiveness of our model over the state of the art.
arxiv.org

An ASP approach for reasoning on neural networks under a finitely many-valued semantics for weighted conditional knowledge bases

Weighted knowledge bases for description logics with typicality have been recently considered under a "concept-wise" multipreference semantics (in both the two-valued and fuzzy case), as the basis of a logical semantics of MultiLayer Perceptrons (MLPs). In this paper we consider weighted conditional ALC knowledge bases with typicality in the finitely many-valued case, through three different semantic constructions, based on coherent, faithful and phi-coherent interpretations. For the boolean fragment LC of ALC we exploit ASP and "asprin" for reasoning with the concept-wise multipreference entailment under a phi-coherent semantics, suitable to characterize the stationary states of MLPs. As a proof of concept, we experiment with the proposed approach for checking properties of trained MLPs.
COMPUTERS
arxiv.org

Enhancing Organ at Risk Segmentation with Improved Deep Neural Networks

Organ at risk (OAR) segmentation is a crucial step for treatment planning and outcome determination in radiotherapy treatments of cancer patients. Several deep learning based segmentation algorithms have been developed in recent years; however, U-Net remains the de facto algorithm designed specifically for biomedical image segmentation and has spawned many variants with known weaknesses. In this study, our goal is to present simple architectural changes in U-Net to improve its accuracy and generalization properties. Unlike many other available studies evaluating their algorithms on single center data, we thoroughly evaluate several variations of U-Net as well as our proposed enhanced architecture on multiple data sets for an extensive and reliable study of the OAR segmentation problem. Our enhanced segmentation model includes (a) architectural changes in the loss function, (b) optimization framework, and (c) convolution type. Testing on three publicly available multi-object segmentation data sets, we achieved an average Dice score of 80%, compared to the baseline U-Net performance of 63%.
HEALTH
arxiv.org

Predicting Fuel Consumption in Power Generation Plants using Machine Learning and Neural Networks

Gabin Maxime Nguegnang, Marcellin Atemkeng, Theophilus Ansah-Narh, Rockefeller Rockefeller, Gabin Maxime Nguegnang, Marco Andrea Garuti. The instability of power generation from national grids has led industries (e.g., telecommunication) to rely on plant generators to run their businesses. However, these secondary generators create additional challenges such as fuel leakages in and out of the system and perturbations in the fuel level gauges. Consequently, telecommunication operators have a constant need for fuel to supply diesel generators. With the increase in fuel prices due to socio-economic factors, excessive fuel consumption and fuel pilferage become a problem, and this affects the smooth running of the network companies. In this work, we compared four machine learning algorithms (i.e., Gradient Boosting, Random Forest, Neural Network, and Lasso) to predict the amount of fuel consumed by a power generation plant. After evaluating the predictive accuracy of these models, the Gradient Boosting model outperforms the other three regressor models with the highest Nash efficiency value of 99.1%.
ENERGY INDUSTRY
arxiv.org

Maximum likelihood reconstruction of water Cherenkov events with deep generative neural networks

Mo Jia, Karan Kumar, Liam S. Mackey, Alexander Putra, Cristovao Vilela, Michael J. Wilking, Junjie Xia, Chiaki Yanagisawa, Karan Yang. Large water Cherenkov detectors have shaped our current knowledge of neutrino physics and nucleon decay, and will continue to do so in the foreseeable future. These highly capable detectors allow for directional and topological, as well as calorimetric information to be extracted from signals on their photosensors. The current state-of-the-art approach to water Cherenkov reconstruction relies on maximum-likelihood estimation, with several simplifying assumptions employed to make the problem tractable. In this paper, we describe neural networks that produce probability density functions for the signals at each photosensor, given a set of inputs that characterizes a particle in the detector. The neural networks we propose allow for likelihood-based approaches to event reconstruction with significantly fewer assumptions compared to traditional methods, and are thus expected to improve on the current performance of water Cherenkov detectors.
PHYSICS
arxiv.org

Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression

We study the benign overfitting theory in the prediction of the conditional average treatment effect (CATE), with linear regression models. With the development of machine learning for causal inference, a wide range of large-scale models for causality are gaining attention. One problem is that suspicions have been raised that large-scale models are prone to overfitting to observations with sample selection, and hence that large models may not be suitable for causal prediction. In this study, to resolve this suspicion, we investigate the validity of causal inference methods for overparameterized models by applying the recent theory of benign overfitting (Bartlett et al., 2020). Specifically, we consider samples whose distribution switches depending on an assignment rule, and study the prediction of CATE with linear models whose dimension diverges to infinity. We focus on two methods: the T-learner, which is based on a difference between estimators constructed separately for each treatment group, and the inverse probability weight (IPW)-learner, which solves another regression problem approximated by a propensity score. In both methods, the estimator consists of interpolators that fit the samples perfectly. As a result, we show that the T-learner fails to achieve consistency except under random assignment, while the IPW-learner drives the risk to zero if the propensity score is known. This difference stems from the fact that the T-learner is unable to preserve eigenspaces of the covariances, which is necessary for benign overfitting in the overparameterized setting. Our result provides new insights into the usage of causal inference methods in the overparameterized setting, in particular, doubly robust estimators.
SCIENCE
arxiv.org

Distributed State Estimation with Deep Neural Networks for Uncertain Nonlinear Systems under Event-Triggered Communication

Distributed state estimation is examined for a sensor network tasked with reconstructing a system's state through the use of a distributed and event-triggered observer. Each agent in the sensor network employs a deep neural network (DNN) to approximate the uncertain nonlinear dynamics of the system, which is trained using a multiple timescale approach. Specifically, the outer weights of each DNN are updated online using a Lyapunov-based gradient descent update law, while the inner weights and biases are trained offline using a supervised learning method and collected input-output data. The observer utilizes event-triggered communication to promote the efficient use of network resources. A nonsmooth Lyapunov analysis shows the distributed event-triggered observer has a uniformly ultimately bounded state reconstruction error. A simulation study is provided to validate the result and demonstrate the performance improvements afforded by the DNNs.
COMPUTERS
arxiv.org

Spectroscopic Studies of Type Ia Supernovae Using LSTM Neural Networks

We present a data-driven method based on Long Short-Term Memory (LSTM) neural networks to analyze spectral time series of Type Ia Supernovae (SNe Ia). The dataset includes 3091 spectra from 361 individual SNe Ia. The method allows for accurate reconstruction of the spectral sequence of an SN Ia based on a single observed spectrum around maximum light. The precision of the spectral reconstruction increases with more spectral time coverage, but a significant benefit of multiple-epoch data at around optical maximum is only evident for observations separated by more than a week. The method shows great power in extracting the spectral information of SNe Ia, and suggests that the most critical information of an SN Ia can be derived from a single spectrum around the optical maximum. The algorithm we have developed is important for the planning of spectroscopic follow-up observations of future supernova surveys with the LSST/Rubin and the WFIRST/Roman telescopes.
ASTRONOMY
arxiv.org

Two-Step Spike Encoding Scheme and Architecture for Highly Sparse Spiking-Neural-Network

This paper proposes a two-step spike encoding scheme, which consists of the source encoding and the process encoding, for highly energy-efficient spiking-neural-network (SNN) acceleration. The eigen-train generation and its superposition generate spike trains which show high accuracy with a low spike ratio. Sparsity boosting (SB) and spike generation skipping (SGS) reduce the number of operations for the SNN. Time shrinking multi-level encoding (TS-MLE) compresses the number of spikes in a train along the time axis, and spike-level clock skipping (SLCS) decreases the processing time. Eigen-train generation achieves 90.3% accuracy, the same accuracy as a CNN, under the condition of 4.18% spike ratio for CIFAR-10 classification. SB reduces spike ratio by 0.49x with only 0.1% accuracy loss, and the SGS reduces the spike ratio by 20.9% with 0.5% accuracy loss. TS-MLE and SLCS increase the throughput of the SNN by 2.8x while decreasing the hardware resource for the spike generator by 75% compared with previous generators.
ARCHITECTURE
arxiv.org

Optimising hadronic collider simulations using amplitude neural networks

Precision phenomenological studies of high-multiplicity scattering processes at collider experiments present a substantial theoretical challenge and are vitally important ingredients in experimental measurements. Machine learning technology has the potential to dramatically optimise simulations for complicated final states. We investigate the use of neural networks to approximate matrix elements, studying the case of loop-induced diphoton production through gluon fusion. We train neural network models on one-loop amplitudes from the NJet C++ library and interface them with the Sherpa Monte Carlo event generator to provide the matrix element within a realistic hadronic collider simulation. Computing some standard observables with the models and comparing to conventional techniques, we find excellent agreement in the distributions and a total simulation time reduced by a factor of thirty.
SCIENCE
arxiv.org

Diffractive deep neural network based adaptive optics scheme for vortex beam in oceanic turbulence

A vortex beam carrying orbital angular momentum (OAM) is disturbed by oceanic turbulence (OT) when propagating in an underwater wireless optical communication (UWOC) system. Adaptive optics (AO) is used to compensate for distortion and improve the performance of the UWOC system. In this work, we propose a diffractive deep neural network (DDNN) based AO scheme to compensate for the distortion caused by OT, where the DDNN is trained to obtain the mapping between the distorted intensity distribution of the vortex beam and its corresponding phase screen representing OT. The intensity pattern of the distorted vortex beam obtained in the experiment is input to the DDNN model, and the predicted phase screen can be used to compensate for the distortion in real time. The experimental results show that the proposed scheme can quickly extract the characteristics of the intensity pattern of the distorted vortex beam and accurately output the predicted phase screen. The mode purity of the compensated vortex beam is significantly improved, even with a strong OT. Our scheme may provide a new avenue for AO techniques, and is expected to improve the communication quality of UWOC systems.
SCIENCE
arxiv.org

SUMO: Advanced sleep spindle identification with neural networks

Sleep spindles are neurophysiological phenomena that appear to be linked to memory formation and other functions of the central nervous system, and that can be observed in electroencephalographic recordings (EEG) during sleep. Manually identified spindle annotations in EEG recordings suffer from substantial intra- and inter-rater variability, even if raters have been highly trained, which reduces the reliability of spindle measures as a research and diagnostic tool. The Massive Online Data Annotation (MODA) project has recently addressed this problem by forming a consensus from multiple such rating experts, thus providing a corpus of spindle annotations of enhanced quality. Based on this dataset, we present a U-Net-type deep neural network model to automatically detect sleep spindles. Our model's performance exceeds that of the state-of-the-art detector and of most experts in the MODA dataset. We observed improved detection accuracy in subjects of all ages, including older individuals whose spindles are particularly challenging to detect reliably. Our results underline the potential of automated methods to do repetitive cumbersome tasks with super-human performance.
SCIENCE
arxiv.org

Fixed-Point Code Synthesis For Neural Networks

Over the last few years, neural networks have started penetrating safety-critical systems to make decisions in robots, rockets, autonomous cars, etc. A problem is that these critical systems often have limited computing resources. Often, they use fixed-point arithmetic for its many advantages (speed, compatibility with small memory devices). In this article, a new technique is introduced to tune the formats (precision) of already trained neural networks using fixed-point arithmetic, which can be implemented using integer operations only. The new optimized neural network computes the output with fixed-point numbers without modifying the accuracy up to a threshold fixed by the user. Fixed-point code is synthesized for the new optimized neural network, ensuring that the threshold is respected for any input vector belonging to the range [xmin, xmax] determined during the analysis. From a technical point of view, we do a preliminary analysis of our floating-point neural network to determine the worst cases, then we generate a system of linear constraints among integer variables that we can solve by linear programming. The solution of this system is the new fixed-point format of each neuron. The experimental results obtained show the efficiency of our method, which can ensure that the new fixed-point neural network has the same behavior as the initial floating-point neural network.
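As background for the phrase "fixed-point arithmetic, which can be implemented using integer operations only", the sketch below shows the basic idea for a single neuron. It is not the paper's synthesis technique (which chooses a format per neuron via linear programming); the helper names `to_fixed`, `from_fixed`, `fixed_mul`, and `fixed_neuron`, and the choice of 8 fractional bits shared by every value, are assumptions made for illustration.

```python
FRAC_BITS = 8  # assumed format: 8 fractional bits for every value

def to_fixed(x, frac_bits=FRAC_BITS):
    """Encode a float as an integer scaled by 2**frac_bits."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits=FRAC_BITS):
    """Decode a fixed-point integer back to a float (for inspection only)."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values using integer operations only.

    The raw product carries 2*frac_bits fractional bits, so it is
    shifted right to return to the common format (truncating low bits).
    """
    return (a * b) >> frac_bits

def fixed_neuron(weights, inputs, bias, frac_bits=FRAC_BITS):
    """Weighted sum plus bias followed by ReLU, all on integers."""
    acc = bias
    for w, x in zip(weights, inputs):
        acc += fixed_mul(w, x, frac_bits)
    return max(acc, 0)

# Hypothetical neuron: 1.5 * 2.0 + (-0.5) * 1.0 + 0.25
w = [to_fixed(1.5), to_fixed(-0.5)]
x = [to_fixed(2.0), to_fixed(1.0)]
b = to_fixed(0.25)
print(from_fixed(fixed_neuron(w, x, b)))
```

With fewer fractional bits the result drifts further from the floating-point value, which is the kind of accuracy-versus-format trade-off the abstract describes tuning against a user-fixed threshold.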
CODING & PROGRAMMING
arxiv.org

Robust Training of Neural Networks using Scale Invariant Architectures

In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models. However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture and make it scale invariant, i.e. the scale of the parameters doesn't affect the output of the network, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm proportional to weight norm multiplied by $\sqrt{\tfrac{2\lambda}{\eta}}$, where $\eta$ is learning rate and $\lambda$ is weight decay. We show that this general approach is robust to rescaling of the parameters and loss by proving that its convergence only depends logarithmically on the scale of initialization and loss, whereas the standard SGD might not even converge for many initializations. Following our recipe, we design a scale invariant version of BERT, called SIBERT, which when trained simply by vanilla SGD achieves performance comparable to BERT trained by adaptive methods like Adam on downstream tasks.
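Step (3) of the recipe can be illustrated with a small sketch. The abstract only says the clipping threshold is proportional to the weight norm times $\sqrt{2\lambda/\eta}$; the proportionality constant `c` (defaulting to 1), the function names, and the plain-list representation of parameters are assumptions here, not details from the paper.

```python
import math

def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))

def clip_threshold(weights, lr, weight_decay, c=1.0):
    """Threshold c * ||w|| * sqrt(2 * weight_decay / lr); c is assumed."""
    return c * l2_norm(weights) * math.sqrt(2.0 * weight_decay / lr)

def clip_global_grad(grad, weights, lr, weight_decay, c=1.0):
    """Rescale grad so its global norm does not exceed the threshold."""
    limit = clip_threshold(weights, lr, weight_decay, c)
    norm = l2_norm(grad)
    if norm <= limit or norm == 0.0:
        return list(grad)  # already within the limit, leave unchanged
    scale = limit / norm
    return [g * scale for g in grad]

def sgd_step(weights, grad, lr, weight_decay, c=1.0):
    """One SGD step with weight decay, using the clipped gradient."""
    clipped = clip_global_grad(grad, weights, lr, weight_decay, c)
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, clipped)]

# Hypothetical values: ||w|| = 5 and sqrt(2 * 0.01 / 0.5) = 0.2,
# so the threshold is about 1 and the gradient of norm 10 is scaled down.
weights = [3.0, 4.0]
grad = [6.0, 8.0]
print(clip_global_grad(grad, weights, lr=0.5, weight_decay=0.01))
```

Because the threshold scales with the weight norm, multiplying all weights by a constant multiplies the allowed gradient norm by the same constant, in keeping with the scale-invariance theme of the abstract.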
CODING & PROGRAMMING
arxiv.org

Dynamic Virtual Network Embedding Algorithm based on Graph Convolution Neural Network and Reinforcement Learning

Network virtualization (NV) is a technology with broad application prospects. Virtual network embedding (VNE) is the core orientation of VN, which aims to provide more flexible underlying physical resource allocation for user function requests. The classical VNE problem is usually solved by heuristic methods, but this approach often limits the flexibility of the algorithm and ignores the time limit. In addition, the partition autonomy of the physical domain and the dynamic characteristics of virtual network requests (VNRs) also increase the difficulty of VNE. This paper proposes a new type of VNE algorithm, which applies reinforcement learning (RL) and graph neural network (GNN) theory to the algorithm, especially the combination of graph convolutional neural network (GCNN) and RL algorithm. Based on a self-defined fitness matrix and fitness value, we set up the objective function of the algorithm implementation, realized an efficient dynamic VNE algorithm, and effectively reduced the degree of resource fragmentation. Finally, we used comparison algorithms to evaluate the proposed method. Simulation experiments verified that the dynamic VNE algorithm based on RL and GCNN has good basic VNE characteristics. By changing the resource attributes of the physical network and the virtual network, it can be shown that the algorithm has good flexibility.
COMPUTERS

