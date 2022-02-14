
KNIFE: Kernelized-Neural Differential Entropy Estimation

By Georg Pichler, Pierre Colombo, Malik Boudiaf, Gunther Koliander, Pablo Piantanida
 2 days ago

Mutual Information (MI) has been widely used as a loss regularizer for training neural networks. This has been particularly effective when learning disentangled or compressed representations of high dimensional data. However, differential entropy (DE), another fundamental measure of information, has not found widespread...

arxiv.org

Online Deep Neural Network for Optimization in Wireless Communications

Recently, deep neural networks (DNNs) have been widely adopted in the design of intelligent communication systems thanks to their strong learning ability and low testing complexity. However, most current offline DNN-based methods still suffer from unsatisfactory performance, limited generalization ability, and poor interpretability. In this article, we propose an online DNN-based approach to solve general optimization problems in wireless communications, where a dedicated DNN is trained for each data sample. By treating the optimization variables and the objective function as network parameters and loss function, respectively, the optimization problem can be solved equivalently through network training. Thanks to the online optimization nature and meaningful network parameters, the proposed approach offers strong generalization ability and interpretability, while its superior performance is demonstrated through a practical example of joint beamforming in intelligent reflecting surface (IRS)-aided multi-user multiple-input multiple-output (MIMO) systems. Simulation results show that the proposed online DNN outperforms conventional offline DNNs and a state-of-the-art iterative optimization algorithm, while keeping complexity low.
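The idea of treating optimization variables as trainable parameters and the objective as the loss can be illustrated with a minimal sketch. This is not the paper's beamforming model: the quadratic objective, the function names (`solve_by_training`, `objective`, `objective_grad`), and the step size are all assumptions chosen only to show the mechanism in plain Python.

```python
def solve_by_training(loss_grad, init, lr=0.1, steps=200):
    """Run plain gradient descent, treating the variables as 'network parameters'."""
    params = list(init)
    for _ in range(steps):
        grads = loss_grad(params)
        # One "training" update: move each parameter against its gradient.
        params = [p - lr * g for p, g in zip(params, grads)]
    return params


def objective(v):
    """Hypothetical objective playing the role of the loss function."""
    x, y = v
    return (x - 3.0) ** 2 + (y + 1.0) ** 2


def objective_grad(v):
    """Analytic gradient of the hypothetical objective."""
    x, y = v
    return [2.0 * (x - 3.0), 2.0 * (y + 1.0)]


# "Training" from a zero initialization drives the parameters to the minimizer.
solution = solve_by_training(objective_grad, [0.0, 0.0])
```

In this toy case the trained parameters are themselves the solution of the optimization problem, which is the sense in which the parameters stay meaningful; a real system would presumably use an automatic-differentiation framework rather than a hand-written gradient.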
SOFTWARE
arxiv.org

Entanglement entropy and flow in two dimensional QCD: parton and string duality

We discuss quantum entanglement between fast and slow degrees of freedom in a two dimensional (2D) large $N_c$ gauge theory with Dirac quarks, quantized on the light front. Using the 't Hooft wave functions, we construct the reduced density matrix for an interval in the momentum fraction $x$-space, and calculate its von Neumann entropy in terms of structure functions that are measured by DIS on mesons (hadrons in general). We find that the entropy is bounded by an area law with logarithmic divergences, proportional to the rapidity of the meson. The evolution of the entanglement entropy with rapidity is fixed by the cumulative singlet PDF, and bounded from above by a Kolmogorov-Sinai entropy of 1. At low-$x$, the entanglement exhibits an asymptotic expansion, similar to the forward meson-meson scattering amplitude in the Regge limit. The evolution of the entanglement entropy in parton-$x$ per unit rapidity measures the meson singlet PDF. The re-summed entanglement entropy along the single meson Regge trajectory is string-like. We suggest that its extension to multi-meson states models DIS scattering on a large 2D 'nucleus'. The result is a large rate of change of the entanglement entropy with rapidity that matches the current Bekenstein-Bremermann bound for maximum quantum information flow. This mechanism may be at the origin of the large entropy deposition and rapid thermalization reported in current heavy ion colliders, and may extend to future electron-ion colliders.
PHYSICS
arxiv.org

On entropy, entropy-like quantities, and applications

This is a review of entropy in various fields of mathematics and science. Its scope is to convey a unified vision of the classical as well as some newer entropy notions to a broad audience with an intermediate background in dynamical systems and ergodic theory. Due to the breadth and depth of the subject, we have opted for a compact exposition whose contents are a compromise between conceptual import and instrumental relevance. The intended technical level and the space limitation further bore upon the final selection of the topics, which cover the three items named in the title. Specifically, the first part is devoted to the avatars of entropy in the traditional contexts: many particle physics, information theory, and dynamical systems. This chronological order helps to present the materials in a didactic manner. The axiomatic approach will also be considered at this stage to show that, quite remarkably, the essence of entropy can be encapsulated in a few basic properties. Inspired by the classical entropies, further akin quantities have been proposed in the course of time, mostly aimed at specific needs. A common denominator of those addressed in the second part of this review is their major impact on research. The final part shows that, along with its profound role in the theory, entropy has interesting practical applications beyond information theory and communications technology. To this end we preferred examples from applied mathematics, although there are certainly nice applications in, say, physics, computer science and even social sciences. This review concludes with a representative list of references.
MATHEMATICS
arxiv.org

Robust Dynamic State Estimator of Integrated Energy Systems based on Natural Gas Partial Differential Equations

The reliability and precision of the dynamic database are vital for the optimal operation and global control of integrated energy systems. One effective way to obtain accurate states is state estimation. A novel robust dynamic state estimation methodology for integrated natural gas and electric power systems is proposed based on the Kalman filter. To take full advantage of measurement redundancies and predictions for enhancing the estimation accuracy, a dynamic state estimation model coupling gas and power systems by gas turbine units is established. The exponential smoothing technique and the gas physical model are integrated in the Kalman filter. Additionally, a time-varying scalar matrix is proposed to handle bad data in the Kalman filter algorithm. The proposed method is applied to an integrated gas and power system formed by GasLib-40 and the IEEE 39-bus system with five gas turbine units. The simulation results show that the method can obtain accurate dynamic states under three different measurement error conditions, and that the filtering performance is better than that of separate estimation methods. Additionally, the proposed method is robust when the measurements contain bad data.
ENERGY INDUSTRY
towardsdatascience.com

How to Automatically Design an Efficient Neural Network

A Gentle Introduction to Neural Architecture Search. If you have ever used Deep Learning methods, you may already know that whenever you consider a new dataset, your model’s performance is crucially dependent on the network’s architecture. That’s why while fine-tuning pre-trained networks can assist you in your journey,...
CODING & PROGRAMMING
arxiv.org

Studying the neural representations of uncertainty

Edgar Y Walker, Stephan Pohl, Rachel N Denison, David L Barack, Jennifer Lee, Ned Block, Wei Ji Ma, Florent Meyniel. The study of the brain's representations of uncertainty is a central topic in neuroscience. Unlike other cases of representation, uncertainty is a property of an observer's representation of the world, posing specific methodological challenges. We analyze how the literature on uncertainty addresses those challenges and distinguish between "descriptive" and "process" approaches. Descriptive approaches treat uncertainty reported by subjects or inferred from stimuli as an independent variable used to test for a relationship to neural responses. By contrast, process approaches treat uncertainty derived from models of neural responses as a dependent variable used to test for a relationship to subjects' reports or stimuli. To compare those two approaches, we apply four criteria for neural representations: sensitivity, specificity, invariance, functionality. Experiments can be cataloged by their approach and whether they test for each criterion. Our analysis rigorously characterizes the study of neural representations of uncertainty, shaping research questions and guiding future experiments.
SCIENCE
arxiv.org

The Ecological Footprint of Neural Machine Translation Systems

Over the past decade, deep learning (DL) has led to significant advancements in various fields of artificial intelligence, including machine translation (MT). These advancements would not be possible without the ever-growing volumes of data and the hardware that allows large DL models to be trained efficiently. Due to the large number of computing cores as well as dedicated memory, graphics processing units (GPUs) are a more effective hardware solution for training and inference with DL models than central processing units (CPUs). However, the former is very power demanding. The electrical power consumption has economic as well as ecological implications.
SCIENCE
arxiv.org

Distributed State Estimation with Deep Neural Networks for Uncertain Nonlinear Systems under Event-Triggered Communication

Distributed state estimation is examined for a sensor network tasked with reconstructing a system's state through the use of a distributed and event-triggered observer. Each agent in the sensor network employs a deep neural network (DNN) to approximate the uncertain nonlinear dynamics of the system, which is trained using a multiple timescale approach. Specifically, the outer weights of each DNN are updated online using a Lyapunov-based gradient descent update law, while the inner weights and biases are trained offline using a supervised learning method and collected input-output data. The observer utilizes event-triggered communication to promote the efficient use of network resources. A nonsmooth Lyapunov analysis shows the distributed event-triggered observer has a uniformly ultimately bounded state reconstruction error. A simulation study is provided to validate the result and demonstrate the performance improvements afforded by the DNNs.
COMPUTERS
CSO

Google adds Python to its differential privacy repertoire

Google has announced it's adding Python to the languages supported by one of its open-source projects designed to bolster privacy on the internet. The project includes a library and tools for using differential privacy, a technology designed to preserve an individual's privacy in large data sets. "Previously, our differential privacy...
SOFTWARE
arxiv.org

Energy awareness in low precision neural networks

Power consumption is a major obstacle in the deployment of deep neural networks (DNNs) on end devices. Existing approaches for reducing power consumption rely on quite general principles, including avoidance of multiplication operations and aggressive quantization of weights and activations. However, these methods do not take into account the precise power consumed by each module in the network, and are therefore not optimal. In this paper we develop accurate power consumption models for all arithmetic operations in the DNN, under various working conditions. We reveal several important factors that have been overlooked to date. Based on our analysis, we present PANN (power-aware neural network), a simple approach for approximating any full-precision network by a low-power fixed-precision variant. Our method can be applied to a pre-trained network, and can also be used during training to achieve improved performance. In contrast to previous methods, PANN incurs only a minor degradation in accuracy w.r.t. the full-precision version of the network, even when working at the power-budget of a 2-bit quantized variant. In addition, our scheme makes it possible to seamlessly traverse the power-accuracy trade-off at deployment time, which is a major advantage over existing quantization methods that are constrained to specific bit widths.
arxiv.org

Sparse Polynomial Optimisation for Neural Network Verification

The prevalence of neural networks in society is expanding at an increasing rate. It is becoming clear that providing robust guarantees on systems that use neural networks is very important, especially in safety-critical applications. A trained neural network's sensitivity to adversarial attacks is one of its greatest shortcomings. To provide robust guarantees, one popular method that has seen success is to bound the activation functions using equality and inequality constraints. However, there are numerous ways to form these bounds, providing a trade-off between conservativeness and complexity. Depending on the complexity of these bounds, the computational time of the optimisation problem varies, with longer solve times often leading to tighter bounds. We approach the problem from a different perspective, using sparse polynomial optimisation theory and the Positivstellensatz, which derives from the field of real algebraic geometry. The former exploits the natural cascading structure of the neural network using ideas from chordal sparsity while the latter asserts the emptiness of a semi-algebraic set with a nested family of tests of non-decreasing accuracy to provide tight bounds. We show that bounds can be tightened significantly, whilst the computational time remains reasonable. We compare the solve times of different solvers and show how the accuracy can be improved at the expense of increased computation time. We show that by using this sparse polynomial framework the solve time and accuracy can be improved over other methods for neural network verification with ReLU, sigmoid and tanh activation functions.
CODING & PROGRAMMING
arxiv.org

Empirical Risk Minimization with Relative Entropy Regularization: Optimality and Sensitivity Analysis

The optimality and sensitivity of the empirical risk minimization problem with relative entropy regularization (ERM-RER) are investigated for the case in which the reference is a sigma-finite measure instead of a probability measure. This generalization allows for a larger degree of flexibility in the incorporation of prior knowledge over the set of models. In this setting, the interplay of the regularization parameter, the reference measure, the risk function, and the empirical risk induced by the solution of the ERM-RER problem is characterized. This characterization yields necessary and sufficient conditions for the existence of a regularization parameter that achieves an arbitrarily small empirical risk with arbitrarily high probability. The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is studied. The sensitivity is then used to provide upper and lower bounds on the expected empirical risk. Moreover, it is shown that the expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.
SCIENCE
arxiv.org

Verifying Inverse Model Neural Networks

Inverse problems exist in a wide variety of physical domains from aerospace engineering to medical imaging. The goal is to infer the underlying state from a set of observations. When the forward model that produced the observations is nonlinear and stochastic, solving the inverse problem is very challenging. Neural networks are an appealing solution for solving inverse problems as they can be trained from noisy data and once trained are computationally efficient to run. However, inverse model neural networks do not have guarantees of correctness built-in, which makes them unreliable for use in safety and accuracy-critical contexts. In this work we introduce a method for verifying the correctness of inverse model neural networks. Our approach is to overapproximate a nonlinear, stochastic forward model with piecewise linear constraints and encode both the overapproximate forward model and the neural network inverse model as a mixed-integer program. We demonstrate this verification procedure on a real-world airplane fuel gauge case study. The ability to verify and consequently trust inverse model neural networks allows their use in a wide variety of contexts, from aerospace to medicine.
SCIENCE
arxiv.org

Lochs-type theorems beyond positive entropy

Lochs' theorem and its generalizations are conversion theorems that relate the number of digits determined in one expansion of a real number as a function of the number of digits given in some other expansion. In its original version, Lochs' theorem related decimal expansions with continued fraction expansions. Such conversion results can also be stated for sequences of interval partitions under suitable assumptions, with results holding almost everywhere, or in measure, involving the entropy. This is the viewpoint we develop here. In order to deal with sequences of partitions beyond positive entropy, this paper introduces the notion of log-balanced sequences of partitions, together with their weight functions. These are sequences of interval partitions such that the logarithms of the measures of their intervals at each depth are roughly the same. We then state Lochs-type theorems which work even in the case of zero entropy, in particular for several important log-balanced sequences of partitions of a number-theoretic nature.
MATHEMATICS
arxiv.org

Isothermal Limit of Entropy Solutions of the Euler Equations for Isentropic Gas Dynamics

We are concerned with the isothermal limit of entropy solutions in $L^\infty$, containing the vacuum states, of the Euler equations for isentropic gas dynamics. We prove that the entropy solutions in $L^\infty$ of the isentropic Euler equations converge strongly to the corresponding entropy solutions of the isothermal Euler equations, when the adiabatic exponent $\gamma \rightarrow 1$. This is achieved by combining careful entropy analysis and refined kinetic formulation with compensated compactness argument to obtain the required uniform estimates for the limit. The entropy analysis involves careful estimates for the relation between the corresponding entropy pairs for the isentropic and isothermal Euler equations when the adiabatic exponent $\gamma\to 1$. The kinetic formulation for the entropy solutions of the isentropic Euler equations with the uniformly bounded initial data is refined, so that the total variation of the dissipation measures in the formulation is locally uniformly bounded with respect to $\gamma>1$.
SCIENCE
arxiv.org

Differentially Private Graph Classification with GNNs

Tamara T. Mueller, Johannes C. Paetzold, Chinmay Prabhakar, Dmitrii Usynin, Daniel Rueckert, Georgios Kaissis. Graph Neural Networks (GNNs) have established themselves as the state-of-the-art models for many machine learning applications such as the analysis of social networks, protein interactions and molecules. Several among these datasets contain privacy-sensitive data. Machine learning with differential privacy is a promising technique to allow deriving insight from sensitive data while offering formal guarantees of privacy protection. However, the differentially private training of GNNs has so far remained under-explored due to the challenges presented by the intrinsic structural connectivity of graphs. In this work, we introduce differential privacy for graph-level classification, one of the key applications of machine learning on graphs. Our method is applicable to deep learning on multi-graph datasets and relies on differentially private stochastic gradient descent (DP-SGD). We show results on a variety of synthetic and public datasets and evaluate the impact of different GNN architectures and training hyperparameters on model performance for differentially private graph classification. Finally, we apply explainability techniques to assess whether similar representations are learned in the private and non-private settings and establish robust baselines for future work in this area.
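For readers unfamiliar with DP-SGD, the core update can be sketched in a few lines. This is the generic textbook step (clip each example's gradient, sum, add Gaussian noise, average), not the authors' implementation; in graph-level classification each example would presumably be one whole graph. The names `clip_gradient` and `dp_sgd_step` and all parameter names are hypothetical, and the sketch does no privacy accounting.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    batch_size = len(clipped)
    # Noise standard deviation is tied to the clipping bound.
    sigma = noise_multiplier * max_norm
    new_params = []
    for i, p in enumerate(params):
        noisy_sum = sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)
        new_params.append(p - lr * noisy_sum / batch_size)
    return new_params


# Illustrative call with made-up gradients for a batch of two examples.
rng = random.Random(0)
updated = dp_sgd_step([0.0, 0.0], [[3.0, 4.0], [0.0, 0.5]],
                      lr=1.0, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single example can move the parameters, which is what lets the added noise mask an individual's contribution; the actual privacy guarantee depends on an accounting procedure that is omitted here.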
COMPUTERS
arxiv.org

Tailoring Gradient Methods for Differentially-Private Distributed Optimization

Decentralized optimization is gaining increased traction due to its widespread applications in large-scale machine learning and multi-agent systems. The same mechanism that enables its success, i.e., information sharing among participating agents, however, also leads to the disclosure of individual agents' private information, which is unacceptable when sensitive data are involved. As differential privacy is becoming a de facto standard for privacy preservation, results have recently emerged integrating differential privacy with distributed optimization. Although such differential-privacy based approaches for distributed optimization are efficient in both computation and communication, directly incorporating differential privacy design in existing distributed optimization approaches significantly compromises optimization accuracy. In this paper, we propose to redesign and tailor gradient methods for differentially-private distributed optimization, and propose two differential-privacy oriented gradient methods that can ensure both privacy and optimality. We prove that the proposed distributed algorithms can ensure almost sure convergence to an optimal solution under any persistent and variance-bounded differential-privacy noise, which, to the best of our knowledge, has not been reported before. The first algorithm is based on static-consensus based gradient methods and only shares one variable in each iteration. The second algorithm is based on dynamic-consensus (gradient-tracking) based distributed optimization methods and, hence, it is applicable to general directed interaction graph topologies. Numerical comparisons with existing counterparts confirm the effectiveness of the proposed approaches.
CODING & PROGRAMMING
arxiv.org

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical analysis has proved FQE to be minimax-optimal with tabular, linear and several nonparametric function families, its practical performance with more general function approximator is less theoretically understood. We focus on FQE with general differentiable function approximators, making our theory applicable to neural function approximations. We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator. In addition, we study bootstrapping FQE estimators for error distribution inference and estimating confidence intervals, accompanied by a Cramer-Rao lower bound that matches our upper bounds. The Z-estimation analysis provides a generalizable theoretical framework for studying off-policy estimation in RL and provides sharp statistical theory for FQE with differentiable function approximators.
SCIENCE
arxiv.org

ASR-Aware End-to-end Neural Diarization

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model. Two categories of features are explored: features derived directly from ASR output (phones, position-in-word and word boundaries) and features derived from a lexical speaker change detection model, trained by fine-tuning a pretrained BERT model on the ASR output. Three modifications to the Conformer-based EEND architecture are proposed to incorporate the features. First, ASR features are concatenated with acoustic features. Second, we propose a new attention mechanism called contextualized self-attention that utilizes ASR features to build robust speaker representations. Finally, multi-task learning is used to train the model to minimize classification loss for the ASR features along with diarization loss. Experiments on the two-speaker English conversations of Switchboard+SRE data sets show that multi-task learning with position-in-word information is the most effective way of utilizing ASR features, reducing the diarization error rate (DER) by 20% relative to the baseline.
COMPUTERS
arxiv.org

A Neural Network Model of Continual Learning with Cognitive Control

Neural networks suffer from catastrophic forgetting in continual learning settings: when trials are blocked, new learning can overwrite the learning from previous blocks. Humans learn effectively in these settings, in some cases even showing an advantage of blocking, suggesting the brain contains mechanisms to overcome this problem. Here, we build on previous work and show that neural networks equipped with a mechanism for cognitive control do not exhibit catastrophic forgetting when trials are blocked. We further show an advantage of blocking over interleaving when there is a bias for active maintenance in the control signal, implying a tradeoff between maintenance and the strength of control. Analyses of map-like representations learned by the networks provided additional insights into these mechanisms. Our work highlights the potential of cognitive control to aid continual learning in neural networks, and offers an explanation for the advantage of blocking that has been observed in humans.
SCIENCE


