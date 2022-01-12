
Computers

On generalization bounds for deep networks based on loss surface implicit regularization

By Masaaki Imaizumi, Johannes Schmidt-Hieber
arxiv.org
 3 days ago

The classical statistical learning theory says that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. The implicit regularization induced by...

Related
The Independent

China builds ‘artificial moon’ to simulate low gravity inspired by a levitating frog

China has built an artificial moon research facility that simulates low-gravity environments, which will help it explore the satellite further. The facility, which will be officially launched in the coming months, can apparently make gravity “disappear” in an effect that can “last as long as you want” according to Li Ruilin, from the China University of Mining and Technology. The artificial moon itself is in a vacuum chamber, although it is only 60 centimetres in diameter compared to the 3,474.8 kilometres of the actual moon. The landscape is made up of rocks and dust like that on the Moon and is supported by...
ASTRONOMY
IN THIS ARTICLE
#Deep Learning #Deep Neural Networks #Regularization #Generalization #Sgd #Machine Learning #Lg
The Associated Press

Third-Party Analysis Illustrates Dramatic Capture and Cleaning of Exhaled Air with New Air-Clenz™ Computer Monitor To Help Curtail Spread of Airborne Respiratory Particles, including COVID

Air-Clenz Systems™ (Air-Clenz™) today announced the results of an independent third-party analysis of the effectiveness of the Air-Clenz Computer Monitor. These simulations showed that the Air-Clenz monitor quickly captured and cleaned 95+% of the user’s exhaled air. And when compared side-by-side to that of a conventional computer monitor, identically positioned, the difference was visibly dramatic.
PUBLIC HEALTH
arxiv.org

On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations

Text classification is a fundamental Natural Language Processing task that has a wide variety of applications, where deep learning approaches have produced state-of-the-art results. While these models have been heavily criticized for their black-box nature, their robustness to slight perturbations in input text has been a matter of concern. In this work, we carry out a data-focused study evaluating the impact of systematic practical perturbations on the performance of the deep learning based text classification models like CNN, LSTM, and BERT-based algorithms. The perturbations are induced by the addition and removal of unwanted tokens like punctuation and stop-words that are minimally associated with the final performance of the model. We show that these deep learning approaches including BERT are sensitive to such legitimate input perturbations on four standard benchmark datasets SST2, TREC-6, BBC News, and tweet_eval. We observe that BERT is more susceptible to the removal of tokens as compared to the addition of tokens. Moreover, LSTM is slightly more sensitive to input perturbations as compared to CNN based model. The work also serves as a practical guide to assessing the impact of discrepancies in train-test conditions on the final performance of models.
COMPUTERS
arxiv.org

A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi. Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference and serves as on-chip memory storage for DNN weights. However, IMC's functional flexibility limitations and their impact on performance, energy, and area efficiency are not yet fully understood at the system level. To target practical end-to-end IoT applications, IMC arrays must be enclosed in heterogeneous programmable systems, introducing new system-level challenges which we aim at addressing in this work. We present a heterogeneous tightly-coupled clustered architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators. We benchmark the system on a highly heterogeneous workload such as the Bottleneck layer from a MobileNetV2, showing 11.5x performance and 9.5x energy efficiency improvements, compared to highly optimized parallel execution on the cores. Furthermore, we explore the requirements for end-to-end inference of a full mobile-grade DNN (MobileNetV2) in terms of IMC array resources, by scaling up our heterogeneous architecture to a multi-array accelerator. Our results show that our solution, on the end-to-end inference of the MobileNetV2, is one order of magnitude better in terms of execution latency than existing programmable architectures and two orders of magnitude better than state-of-the-art heterogeneous solutions integrating in-memory computing analog cores.
COMPUTERS
arxiv.org

Deep neural networks for smooth approximation of physics with higher order and continuity B-spline base functions

This paper deals with the following important research question. Traditionally, the neural network employs non-linear activation functions concatenated with linear operators to approximate a given physical phenomenon. They "fill the space" with the concatenations of the activation functions and linear operators and adjust their coefficients to approximate the physical phenomena. We claim that it is better to "fill the space" with linear combinations of smooth higher-order B-splines base functions as employed by isogeometric analysis and utilize the neural networks to adjust the coefficients of linear combinations. In other words, the possibilities of using neural networks for approximating the B-spline base functions' coefficients and by approximating the solution directly are evaluated. Solving differential equations with neural networks has been proposed by Maziar Raissi et al. in 2017 by introducing Physics-informed Neural Networks (PINN), which naturally encode underlying physical laws as prior information. Approximation of coefficients using a function as an input leverages the well-known capability of neural networks being universal function approximators. In essence, in the PINN approach the network approximates the value of the given field at a given point. We present an alternative approach, where the physical quantity is approximated as a linear combination of smooth B-spline basis functions, and the neural network approximates the coefficients of B-splines. This research compares results from the DNN approximating the coefficients of the linear combination of B-spline basis functions, with the DNN approximating the solution directly. We show that our approach is cheaper and more accurate when approximating smooth physical fields.
SCIENCE
arxiv.org

Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping

We propose a novel optimization framework that crops a given image based on user description and aesthetics. Unlike existing image cropping methods, where one typically trains a deep network to regress to crop parameters or cropping actions, we propose to directly optimize for the cropping parameters by repurposing pre-trained networks on image captioning and aesthetic tasks, without any fine-tuning, thereby avoiding training a separate network. Specifically, we search for the best crop parameters that minimize a combined loss of the initial objectives of these networks. To make the optimization table, we propose three strategies: (i) multi-scale bilinear sampling, (ii) annealing the scale of the crop region, therefore effectively reducing the parameter space, (iii) aggregation of multiple optimization results. Through various quantitative and qualitative evaluations, we show that our framework can produce crops that are well-aligned to intended user descriptions and aesthetically pleasing.
AGRICULTURE
arxiv.org

Deep Domain Adversarial Adaptation for Photon-efficient Imaging Based on Spatiotemporal Inception Network

In single-photon LiDAR, photon-efficient imaging captures the 3D structure of a scene by only several detected signal photons per pixel. The existing deep learning models for this task are trained on simulated datasets, which poses the domain shift challenge when applied to realistic scenarios. In this paper, we propose a spatiotemporal inception network (STIN) for photon-efficient imaging, which is able to precisely predict the depth from a sparse and high-noise photon counting histogram by fully exploiting spatial and temporal information. Then the domain adversarial adaptation frameworks, including domain-adversarial neural network and adversarial discriminative domain adaptation, are effectively applied to STIN to alleviate the domain shift problem for realistic applications. Comprehensive experiments on the simulated data generated from the NYU v2 and the Middlebury datasets demonstrate that STIN outperforms the state-of-the-art models at low signal-to-background ratios from 2:10 to 2:100. Moreover, experimental results on the real-world dataset captured by the single-photon imaging prototype show that the STIN with domain adversarial training achieves better generalization performance compared with the state-of-the-arts as well as the baseline STIN trained by simulated data.
SCIENCE
arxiv.org

Nonlocal Kernel Network (NKN): a Stable and Resolution-Independent Deep Neural Network

Neural operators have recently become popular tools for designing solution maps between function spaces in the form of neural networks. Differently from classical scientific machine learning approaches that learn parameters of a known partial differential equation (PDE) for a single instance of the input parameters at a fixed resolution, neural operators approximate the solution map of a family of PDEs. Despite their success, the uses of neural operators are so far restricted to relatively shallow neural networks and confined to learning hidden governing laws. In this work, we propose a novel nonlocal neural operator, which we refer to as nonlocal kernel network (NKN), that is resolution independent, characterized by deep neural networks, and capable of handling a variety of tasks such as learning governing equations and classifying images. Our NKN stems from the interpretation of the neural network as a discrete nonlocal diffusion reaction equation that, in the limit of infinite layers, is equivalent to a parabolic nonlocal equation, whose stability is analyzed via nonlocal vector calculus. The resemblance with integral forms of neural operators allows NKNs to capture long-range dependencies in the feature space, while the continuous treatment of node-to-node interactions makes NKNs resolution independent. The resemblance with neural ODEs, reinterpreted in a nonlocal sense, and the stable network dynamics between layers allow for generalization of NKN's optimal parameters from shallow to deep networks. This fact enables the use of shallow-to-deep initialization techniques. Our tests show that NKNs outperform baseline methods in both learning governing equations and image classification tasks and generalize well to different resolutions and depths.
CODING & PROGRAMMING
arxiv.org

Effect of Prior-based Losses on Segmentation Performance: A Benchmark

Today, deep convolutional neural networks (CNNs) have demonstrated state-of-the-art performance for medical image segmentation, on various imaging modalities and tasks. Despite early success, segmentation networks may still generate anatomically aberrant segmentations, with holes or inaccuracies near the object boundaries. To enforce anatomical plausibility, recent research studies have focused on incorporating prior knowledge such as object shape or boundary, as constraints in the loss function. Prior integrated could be low-level referring to reformulated representations extracted from the ground-truth segmentations, or high-level representing external medical information such as the organ's shape or size. Over the past few years, prior-based losses exhibited a rising interest in the research field since they allow integration of expert knowledge while still being architecture-agnostic. However, given the diversity of prior-based losses on different medical imaging challenges and tasks, it has become hard to identify what loss works best for which dataset. In this paper, we establish a benchmark of recent prior-based losses for medical image segmentation. The main objective is to provide intuition onto which losses to choose given a particular task or dataset. To this end, four low-level and high-level prior-based losses are selected. The considered losses are validated on 8 different datasets from a variety of medical image segmentation challenges including the Decathlon, the ISLES and the WMH challenge. Results show that whereas low-level prior-based losses can guarantee an increase in performance over the Dice loss baseline regardless of the dataset characteristics, high-level prior-based losses can increase anatomical plausibility as per data characteristics.
SCIENCE
arxiv.org

Distinction and quadratic base change for regular supercuspidal representations

In this article, we study Prasad's conjecture for regular supercuspidal representations based on the machinery developed by Hakim and Murnaghan to study distinguished representations, and the fundamental work of Kaletha on parameterization of regular supercuspidal representations. For regular supercuspidal representations, we give some new interpretations of the numerical quantities appearing in Prasad's formula, and reduce the proof to the case of tori. The proof of Prasad's conjecture then reduces to a comparison of various quadratic characters appearing naturally in the above process. We also have some new observations on these characters and study the relation between them in detail. For some particular examples, we show the coincidence of these characters, which gives a new purely local proof of Prasad's conjecture for regular supercuspidal representations of these groups. We also prove Prasad's conjecture for regular supercuspidal representations of G(E), when E/F is unramified and G is a general quasi-split reductive group.
MATHEMATICS
arxiv.org

Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network

Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider downlink spatial multiplexing enabled by the RIS for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to apply a fully convolutional network (FCN) to solve this problem, which was originally designed for semantic segmentation of images. The rectangular shape of the RIS and the spatial correlation of channels with adjacent RIS antennas due to the short distance between them encourage us to apply it for the RIS configuration. We design a set of channel features that includes both cascaded channels via the RIS and the direct channel. In the base station (BS), the differentiable minimum mean squared error (MMSE) precoder is used for pretraining and the weighted minimum mean squared error (WMMSE) precoder is then applied for fine-tuning, which is nondifferentiable, more complex, but achieves a better performance. Evaluation results show that the proposed solution has higher performance and allows for a faster evaluation than the baselines. Hence it scales better to a large number of antennas, advancing the RIS one step closer to practical deployment.
COMPUTERS
arxiv.org

VGAER: graph neural network reconstruction based community detection

Community detection is a fundamental and important issue in network science, but there are only a few community detection algorithms based on graph neural networks, among which unsupervised algorithms are almost blank. By fusing the high-order modularity information with network features, this paper proposes a Variational Graph AutoEncoder Reconstruction based community detection VGAER for the first time, and gives its non-probabilistic version. They do not need any prior information. We have carefully designed corresponding input features, decoder, and downstream tasks based on the community detection task and these designs are concise, natural, and perform well (NMI values under our design are improved by 59.1% - 565.9%). Based on a series of experiments with a wide range of datasets and advanced methods, VGAER has achieved superior performance and shows strong competitiveness and potential with a simpler design. Finally, we report the results of algorithm convergence analysis and t-SNE visualization, which clearly depicted the stable performance and powerful network modularity ability of VGAER. Our codes are available at this https URL.
CODING & PROGRAMMING
arxiv.org

Deep Learning-based Predictive Control of Battery Management for Frequency Regulation

This paper proposes a deep learning-based optimal battery management scheme for frequency regulation (FR) by integrating model predictive control (MPC), supervised learning (SL), reinforcement learning (RL), and high-fidelity battery models. By taking advantage of deep neural networks (DNNs), the derived DNN-approximated policy is computationally efficient in online implementation. The design procedure of the proposed scheme consists of two sequential processes: (1) the SL process, in which we first run a simulation with an MPC embedding a low-fidelity battery model to generate a training data set, and then, based on the generated data set, we optimize a DNN-approximated policy using SL algorithms; and (2) the RL process, in which we utilize RL algorithms to improve the performance of the DNN-approximated policy by balancing short-term economic incentives and long-term battery degradation. The SL process speeds up the subsequent RL process by providing a good initialization. By utilizing RL algorithms, one prominent property of the proposed scheme is that it can learn from the data generated by simulating the FR policy on the high-fidelity battery simulator to adjust the DNN-approximated policy, which is originally based on low-fidelity battery model. A case study using real-world data of FR signals and prices is performed. Simulation results show that, compared to conventional MPC schemes, the proposed deep learning-based scheme can effectively achieve higher economic benefits of FR participation while maintaining lower online computational cost.
ENGINEERING
arxiv.org

Compression-Resistant Backdoor Attack against Deep Neural Networks

In recent years, many backdoor attacks based on training data poisoning have been proposed. However, in practice, those backdoor attacks are vulnerable to image compressions. When backdoor instances are compressed, the feature of specific backdoor trigger will be destroyed, which could result in the backdoor attack performance deteriorating. In this paper, we propose a compression-resistant backdoor attack based on feature consistency training. To the best of our knowledge, this is the first backdoor attack that is robust to image compressions. First, both backdoor images and their compressed versions are input into the deep neural network (DNN) for training. Then, the feature of each image is extracted by internal layers of the DNN. Next, the feature difference between backdoor images and their compressed versions is minimized. As a result, the DNN treats the feature of compressed images as the feature of backdoor images in feature space. After training, the backdoor attack against DNN is robust to image compression. Furthermore, we consider three different image compressions (i.e., JPEG, JPEG2000, WEBP) in feature consistency training, so that the backdoor attack is robust to multiple image compression algorithms. Experimental results demonstrate the effectiveness and robustness of the proposed backdoor attack. When the backdoor instances are compressed, the attack success rate of common backdoor attack is lower than 10%, while the attack success rate of our compression-resistant backdoor is greater than 97%. The compression-resistant attack is still robust even when the backdoor images are compressed with low compression quality. In addition, extensive experiments have demonstrated that our compression-resistant backdoor attack has the generalization ability to resist image compression which is not used in the training process.
COMPUTERS
arxiv.org

Formant Tracking Using Quasi-Closed Phase Forward-Backward Linear Prediction Analysis and Deep Neural Networks

Formant tracking is investigated in this study by using trackers based on dynamic programming (DP) and deep neural nets (DNNs). Using the DP approach, six formant estimation methods were first compared. The six methods include linear prediction (LP) algorithms, weighted LP algorithms and the recently developed quasi-closed phase forward-backward (QCP-FB) method. QCP-FB gave the best performance in the comparison. Therefore, a novel formant tracking approach, which combines benefits of deep learning and signal processing based on QCP-FB, was proposed. In this approach, the formants predicted by a DNN-based tracker from a speech frame are refined using the peaks of the all-pole spectrum computed by QCP-FB from the same frame. Results show that the proposed DNN-based tracker performed better both in detection rate and estimation error for the lowest three formants compared to reference formant trackers. Compared to the popular Wavesurfer, for example, the proposed tracker gave a reduction of 29%, 48% and 35% in the estimation error for the lowest three formants, respectively.
COMPUTERS
arxiv.org

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

In this paper, we follow Eftekhari's work to give a non-local convergence analysis of deep linear networks. Specifically, we consider optimizing deep linear networks which have a layer with one neuron under quadratic loss. We describe the convergent point of trajectories with arbitrary starting point under gradient flow, including the paths which converge to one of the saddle points or the original point. We also show specific convergence rates of trajectories that converge to the global minimizer by stages. To achieve these results, this paper mainly extends the machinery in Eftekhari's work to provably identify the rank-stable set and the global minimizer convergent set. We also give specific examples to show the necessity of our definitions. Crucially, as far as we know, our results appear to be the first to give a non-local global analysis of linear neural networks from arbitrary initialized points, rather than the lazy training regime which has dominated the literature of neural networks, and restricted benign initialization in Eftekhari's work. We also note that extending our results to general linear networks without one hidden neuron assumption remains a challenging open problem.
COMPUTERS
