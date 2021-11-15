
Distributionally Robust Expected Residual Minimization for Stochastic Variational Inequality Problems

By Atsushi Hori, Yuya Yamakawa, Nobuo Yamashita
The stochastic variational inequality problem (SVIP) is an equilibrium model that includes random variables and has been widely applied in various fields such as economics and engineering. Expected residual minimization (ERM) is an established model...

Distributed stochastic proximal algorithm with random reshuffling for non-smooth finite-sum optimization

The non-smooth finite-sum minimization is a fundamental problem in machine learning. This paper develops a distributed stochastic proximal-gradient algorithm with random reshuffling to solve the finite-sum minimization over time-varying multi-agent networks. The objective function is a sum of differentiable convex functions and non-smooth regularization. Each agent in the network updates local variables with a constant step-size by local information and cooperates to seek an optimal solution. We prove that local variable estimates generated by the proposed algorithm achieve consensus and are attracted to a neighborhood of the optimal solution in expectation with an $\mathcal{O}(\frac{1}{T}+\frac{1}{\sqrt{T}})$ convergence rate. In addition, this paper shows that the steady-state error of the objective function can be arbitrarily small by choosing small enough step-sizes. Finally, some comparative simulations are provided to verify the convergence performance of the proposed algorithm.
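As a rough illustration of the proximal-gradient-with-random-reshuffling idea described above, the following sketch handles a single agent and a scalar variable only; it is not the distributed algorithm of the paper. The objective (least squares plus an $\ell_1$ term), the function names, and the choice to apply the proximal step once per epoch are all assumptions made for this example.

```python
import random


def soft_threshold(v, tau):
    """Proximal operator of tau * |.| evaluated at the scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def prox_grad_random_reshuffling(a, b, lam, step, epochs, seed=0):
    """Approximately minimize sum_i 0.5*(a[i]*x - b[i])**2 + lam*|x| over a scalar x.

    Hypothetical single-agent sketch: each epoch draws a fresh permutation,
    takes one gradient step per smooth component with a constant step-size,
    and then applies the proximal operator of the non-smooth term once.
    """
    rng = random.Random(seed)
    order = list(range(len(a)))
    x = 0.0
    for _ in range(epochs):
        rng.shuffle(order)                      # new random order every epoch
        for i in order:                         # every component is used exactly once
            grad_i = a[i] * (a[i] * x - b[i])   # gradient of 0.5*(a_i*x - b_i)^2
            x -= step * grad_i
        x = soft_threshold(x, step * lam)       # proximal step for lam*|x|
    return x


x_hat = prox_grad_random_reshuffling(
    a=[1.0, 1.0, 1.0, 1.0], b=[1.0, 2.0, 3.0, 4.0], lam=2.0, step=0.01, epochs=500
)
```

For this toy problem the exact minimizer is $x = 2$ (from $4x - 10 + 2 = 0$). With a constant step-size the iterate settles in a small neighborhood of that value rather than at it, which mirrors the steady-state error mentioned in the abstract.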
A Multirate Variational Approach to Nonlinear MPC

A nonlinear model predictive control (NMPC) approach is proposed based on a variational representation of the system model and the receding horizon optimal control problem. The proposed tube-based convex MPC approach provides improvements in model accuracy and computational efficiency, and allows for alternative means of computing linearization error bounds. To this end we investigate the use of single rate and multirate system representations derived from a discrete variational principle to obtain structure-preserving time-stepping schemes. We show empirically that the desirable conservation properties of the discrete time model are inherited by the optimal control problem. Model linearization is achieved either by direct Jacobian Linearization or by quadratic and linear Taylor series approximations of the Lagrangian and generalized forces respectively. These two linearization schemes are proved to be equivalent for a specific choice of approximation points. Using the multirate variational formulation we derive a novel multirate NMPC approach, and show that it can provide large computational savings for systems with dynamics or control inputs evolving on different time scales.
Model-Based Reinforcement Learning for Stochastic Hybrid Systems

Optimal control of general nonlinear systems is a central challenge in automation. Data-driven approaches to control, enabled by powerful function approximators, have recently had great success in tackling challenging robotic applications. However, such methods often obscure the structure of dynamics and control behind black-box over-parameterized representations, thus limiting our ability to understand the closed-loop behavior. This paper adopts a hybrid-system view of nonlinear modeling and control that lends an explicit hierarchical structure to the problem and breaks down complex dynamics into simpler localized units. Therefore, we consider a sequence modeling paradigm that captures the temporal structure of the data and derive an expectation-maximization (EM) algorithm that automatically decomposes nonlinear dynamics into stochastic piecewise affine dynamical systems with nonlinear boundaries. Furthermore, we show that these time-series models naturally admit a closed-loop extension that we use to extract locally linear or polynomial feedback controllers from nonlinear experts via imitation learning. Finally, we introduce a novel hybrid relative entropy policy search (Hb-REPS) technique that incorporates the hierarchical nature of hybrid systems and optimizes a set of time-invariant local feedback controllers derived from a locally polynomial approximation of a global value function.
Robust Eigenvectors of Symmetric Tensors

The {\em tensor power method} generalizes the matrix power method to higher order arrays, or tensors. As in the matrix case, the fixed points of the tensor power method are the eigenvectors of the tensor. While every real symmetric matrix has an eigendecomposition, the vectors generating a symmetric decomposition of a real symmetric tensor are not always eigenvectors of the tensor.
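A minimal sketch of the tensor power iteration for a symmetric third-order tensor may help make the fixed-point statement concrete. It assumes the usual eigenpair convention $T(I, x, x) = \lambda x$ with $\|x\| = 1$; the function names and the small example tensor are illustrative and not taken from the paper.

```python
import math


def tensor_apply(T, x):
    """Return the vector T(I, x, x), i.e. entries sum_{j,k} T[i][j][k] * x[j] * x[k]."""
    n = len(x)
    return [
        sum(T[i][j][k] * x[j] * x[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def normalize(v):
    """Scale v to unit Euclidean norm (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def tensor_power_method(T, x0, iters=50):
    """Iterate x <- T(I, x, x) / ||T(I, x, x)|| and return (eigenvalue estimate, x)."""
    x = normalize(x0)
    for _ in range(iters):
        x = normalize(tensor_apply(T, x))
    # At a fixed point T(I, x, x) = lam * x, so lam = <x, T(I, x, x)>.
    lam = sum(xi * yi for xi, yi in zip(x, tensor_apply(T, x)))
    return lam, x


# Example tensor 2 * e1 (x) e1 (x) e1 + 1 * e2 (x) e2 (x) e2 in dimension 2.
T = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
T[0][0][0] = 2.0
T[1][1][1] = 1.0
lam, x = tensor_power_method(T, [0.8, 0.6])
```

The example tensor is orthogonally decomposable, so its decomposition vectors are eigenvectors and the iteration started at `[0.8, 0.6]` converges to the first coordinate vector with eigenvalue 2. For general symmetric tensors this need not hold, which is the distinction the abstract draws.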
Noise-Assisted Variational Quantum Thermalization

Preparing thermal states on a quantum computer can have a variety of applications, from simulating many-body quantum systems to training machine learning models. Variational circuits have been proposed for this task on near-term quantum computers, but several challenges remain, such as finding a scalable cost function, avoiding the need for purification, and mitigating noise effects. We propose a new algorithm for thermal state preparation that tackles those three challenges by exploiting the noise of quantum circuits. We consider a variational architecture containing a depolarizing channel after each unitary layer, with the ability to directly control the level of noise. We derive a closed-form approximation for the free energy of such a circuit and use it as a cost function for our variational algorithm. By evaluating our method on a variety of Hamiltonians and system sizes, we find several systems for which the thermal state can be approximated with a high fidelity. However, we also show that the ability of our algorithm to learn the thermal state strongly depends on the temperature: while a high fidelity can be obtained for high and low temperatures, we identify a specific range for which the problem becomes more challenging. We hope that this first study on noise-assisted thermal state preparation will inspire future research on exploiting noise in variational algorithms.
Spatial statistics and stochastic partial differential equations: a mechanistic viewpoint

The Stochastic Partial Differential Equation (SPDE) approach, now commonly used in spatial statistics to construct Gaussian random fields, is revisited from a mechanistic perspective based on the movement of microscopic particles, thereby relating pseudo-differential operators to dispersal kernels. We first establish a connection between Lévy flights and PDEs involving the Fractional Laplacian (FL) operator. The corresponding Fokker-Planck PDEs will serve as a basis to propose new generalisations by considering a general form of SPDE with terms accounting for dispersal, drift and reaction. We detail the difference between the FL operator (with or without linear reaction term) associated with a fat-tailed dispersal kernel and therefore describing long-distance dependencies, and the damped FL operator associated with a thin-tailed kernel, thus corresponding to short-distance dependencies. Then, SPDE-based random fields with a non-stationary external force varying in space and time are illustrated, and nonlinear bistable reaction terms are introduced. The physical meaning of the latter and possible applications are discussed. Returning to the particulate interpretation of the above-mentioned equations, we describe in a relatively simple case their links with point processes. We unravel the nature of the point processes they generate and show how such mechanistic models, associated with a probabilistic observation model, can be used in a hierarchical setting to estimate the parameters of the particle dynamics.
Residual-Guided Learning Representation for Self-Supervised Monocular Depth Estimation

Photometric consistency loss is one of the representative objective functions commonly used for self-supervised monocular depth estimation. However, this loss often causes unstable depth predictions in textureless or occluded regions due to incorrect guidance. Recent self-supervised learning approaches tackle this issue by utilizing feature representations explicitly learned from auto-encoders, expecting better discriminability than the input image. Despite the use of auto-encoded features, we observe that the method does not embed features as discriminative as auto-encoded features. In this paper, we propose a residual guidance loss that enables the depth estimation network to embed the discriminative feature by transferring the discriminability of auto-encoded features. We conducted experiments on the KITTI benchmark and verified our method's superiority over, and orthogonality to, other state-of-the-art methods.
Non-Adaptive Stochastic Score Classification and Explainable Halfspace Evaluation

We consider the stochastic score classification problem. There are several binary tests, where each test $i$ is associated with a probability $p_i$ of being positive and a cost $c_i$. The score of an outcome is a weighted sum of all positive tests, and the range of possible scores is partitioned into intervals corresponding to different classes. The goal is to perform tests sequentially (and possibly adaptively) so as to identify the class at the minimum expected cost. We provide the first constant-factor approximation algorithm for this problem, which improves over the previously-known logarithmic approximation ratio. Moreover, our algorithm is $non$ $adaptive$: it just involves performing tests in a $fixed$ order until the class is identified. Our approach also extends to the $d$-dimensional score classification problem and the "explainable" stochastic halfspace evaluation problem (where we want to evaluate some function on $d$ halfspaces). We obtain an $O(d^2\log d)$-approximation algorithm for both these extensions. Finally, we perform computational experiments that demonstrate the practical performance of our algorithm for score classification. We observe that, for most instances, the cost of our algorithm is within $50\%$ of an information-theoretic lower bound on the optimal value.
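The non-adaptive evaluation loop described above (run tests in a fixed order and stop once the class is determined) can be sketched as follows. The order is taken as an input here, since the abstract does not say how the paper constructs it. The function name, the non-negative-weight assumption, and the convention that a score `s` falls in class `bisect_right(boundaries, s)` are choices made for this example.

```python
from bisect import bisect_right


def classify_non_adaptive(order, weights, costs, outcomes, boundaries):
    """Run tests in the fixed `order` until the class of the final score is known.

    boundaries: sorted cut points; score s belongs to class bisect_right(boundaries, s).
    Weights are assumed non-negative, so the final score always lies in
    [score so far, score so far + total weight of the tests not yet performed].
    Returns (class index, total cost paid).
    """
    low = 0                    # score contributed by positive tests seen so far
    remaining = sum(weights)   # weight of tests not yet performed
    spent = 0
    for i in order:
        # Stop as soon as the lowest and highest reachable scores share a class.
        if bisect_right(boundaries, low) == bisect_right(boundaries, low + remaining):
            break
        spent += costs[i]
        remaining -= weights[i]
        if outcomes[i]:
            low += weights[i]
    return bisect_right(boundaries, low), spent


weights, costs, boundaries = [3, 2, 1], [1, 1, 1], [3]   # class 1 means score >= 3
early_stop = classify_non_adaptive([0, 1, 2], weights, costs, [True, False, False], boundaries)
all_tests = classify_non_adaptive([0, 1, 2], weights, costs, [False, True, True], boundaries)
```

In the first call the heaviest test is positive, so one test settles the class. In the second call all three tests are needed. The expected cost in the abstract is the average of `spent` over the random outcomes.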
An Algebraic and Microlocal Approach to the Stochastic Non-linear Schrodinger Equation

In a recent work [DDRZ20], a novel framework was developed for studying, at a perturbative level, a large class of non-linear, scalar, real, stochastic PDEs, inspired by the algebraic approach to quantum field theory. The main advantage is the possibility of computing the expectation value and the correlation functions of the underlying solutions accounting for renormalization intrinsically and without resorting to any specific regularization scheme. In this work we prove that it is possible to extend the range of applicability of this framework to cover also the stochastic non-linear Schroedinger equation in which randomness is codified by an additive, Gaussian, complex white noise.
Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

By Joel Dapello, Jenelle Feather, Hang Le, Tiago Marques, David D. Cox, Josh H. McDermott, James J. DiCarlo, SueYeon Chung
Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response stochasticity, like that exhibited by biological neurons. Here, using recently developed geometrical techniques from computational neuroscience, we investigate how adversarial perturbations influence the internal representations of standard, adversarially trained, and biologically-inspired stochastic networks. We find distinct geometric signatures for each type of network, revealing different mechanisms for achieving robust representations. Next, we generalize these results to the auditory domain, showing that neural stochasticity also makes auditory models more robust to adversarial perturbations. Geometric analysis of the stochastic networks reveals overlap between representations of clean and adversarially perturbed stimuli, and quantitatively demonstrates that competing geometric effects of stochasticity mediate a tradeoff between adversarial and clean performance. Our results shed light on the strategies of robust perception utilized by adversarially trained and stochastic networks, and help explain how stochasticity may be beneficial to machine and biological computation.
Orthounimodal Distributionally Robust Optimization: Representation, Computation and Multivariate Extreme Event Applications

This paper studies a basic notion of distributional shape known as orthounimodality (OU) and its use in shape-constrained distributionally robust optimization (DRO). As a key motivation, we argue that this type of DRO is well-suited to tackle multivariate extreme event estimation by giving statistically valid confidence bounds on target extremal probabilities. In particular, we explain how DRO can be used as a nonparametric alternative to conventional extreme value theory that extrapolates tails based on theoretical limiting distributions, which could face challenges in bias-variance control and other technical complications. We also explain how OU resolves the challenges in interpretability and robustness faced by existing distributional shape notions used in the DRO literature. Methodologically, we characterize the extreme points of the OU distribution class in terms of what we call OU sets and build a corresponding Choquet representation, which subsequently allows us to reduce OU-DRO into moment problems over infinite-dimensional random variables. We then develop, in the bivariate setting, a geometric approach to reduce such moment problems into finite dimension via a specially constructed variational problem designed to eliminate suboptimal solutions. Numerical results illustrate how our approach gives rise to valid and competitive confidence bounds for extremal probabilities.
Hybrid Acceleration Scheme for Variance Reduced Stochastic Optimization Algorithms

Stochastic variance reduced optimization methods are known to be globally convergent while they suffer from slow local convergence, especially when moderate or high accuracy is needed. To alleviate this problem, we propose an optimization algorithm (which we refer to as a hybrid acceleration scheme) for a class of proximal variance reduced stochastic optimization algorithms. The proposed optimization scheme combines a fast locally convergent algorithm, such as a quasi-Newton method, with a globally convergent variance reduced stochastic algorithm, for instance SAGA or L-SVRG. Our global convergence result of the hybrid acceleration method is based on specific safeguard conditions that need to be satisfied for a step of the locally fast convergent method to be accepted.
Stochastic Rotating Waves

Stochastic dynamics has emerged as one of the key themes ranging from models in applications to theoretical foundations in mathematics. One class of stochastic dynamics problems that has received considerable attention recently is travelling wave patterns occurring in stochastic partial differential equations (SPDEs), i.e., how deterministic travelling waves behave under stochastic perturbations. In this paper, we start the mathematical study of a related class of problems: stochastic rotating waves generated by SPDEs. We combine deterministic dynamics PDE techniques with methods from stochastic analysis. We establish two different approaches, the variational phase and the approximated variational phase, for defining stochastic phase variables along the rotating wave, which track the effect of noise on neutral spectral modes associated to the special Euclidean symmetry group of rotating waves. Furthermore, we prove transverse stability results for rotating waves showing that over certain time scales and for small noise, the stochastic rotating wave stays close to its deterministic counterpart.
Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination

This paper investigates the problem of best arm identification in $\textit{contaminated}$ stochastic multi-arm bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability $\varepsilon$. A fixed confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits. These algorithms involve mean estimates that achieve the optimal error guarantee on the deviation of the true mean from the estimate asymptotically. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for the gap-based algorithm, the sample complexity is asymptotically optimal up to constant factors, while for the successive elimination-based algorithm, it is optimal up to logarithmic factors. Finally, numerical experiments are provided to illustrate the gains of the algorithms compared to the existing baselines.
Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often requiring specific step size and momentum choices in order to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and elasticity against imperfect tuning. Their stochastic accelerated variants, though, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that SPPAM allows a faster linear convergence rate compared to the stochastic proximal point algorithm (SPPA) with a better contraction factor, under proper hyperparameter tuning. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step size and momentum that lead to convergence.
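To make the idea of an implicit (proximal point) step combined with momentum concrete, here is a small sketch on scalar least-squares components. The update form $x_{t+1} = x_t - \eta \nabla f_i(x_{t+1}) + \beta (x_t - x_{t-1})$ is an assumption for this example, since the abstract does not state the recursion. For $f_i(x) = \tfrac{1}{2}(a_i x - b_i)^2$ that equation can be solved for $x_{t+1}$ in closed form, which is what the code does.

```python
import random


def sppam_least_squares(a, b, step, momentum, iters, seed=0):
    """Stochastic proximal point iteration with a heavy-ball style momentum term.

    Assumed update (not quoted from the paper):
        x_next = x - step * grad f_i(x_next) + momentum * (x - x_prev)
    with f_i(x) = 0.5 * (a[i] * x - b[i])**2. Solving for x_next gives
        x_next = (x + momentum * (x - x_prev) + step * a[i] * b[i]) / (1 + step * a[i]**2).
    """
    rng = random.Random(seed)
    x_prev = 0.0
    x = 0.0
    for _ in range(iters):
        i = rng.randrange(len(a))                      # sample one component uniformly
        numerator = x + momentum * (x - x_prev) + step * a[i] * b[i]
        x_prev, x = x, numerator / (1.0 + step * a[i] ** 2)
    return x


# Consistent data: every component is minimized at x = 2.
x_hat = sppam_least_squares(a=[1.0, 2.0, 3.0], b=[2.0, 4.0, 6.0],
                            step=1.0, momentum=0.1, iters=200)
```

Because the gradient is evaluated at the new point, the division by `1 + step * a[i]**2` damps each step, so a large step-size such as `step=1.0` does not cause divergence here. An explicit gradient step with the same step-size on the component with `a[i] = 3.0` would overshoot.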
Weiss variation for general boundaries

The Weiss variation of the Einstein-Hilbert action with an appropriate boundary term has been studied for general boundary surfaces; the boundary surfaces can be spacelike, timelike, or null. To achieve this we introduce an auxiliary reference connection and find that the resulting Weiss variation yields the Einstein equations as expected, with additional boundary contributions. Among these boundary contributions, we obtain the dynamical variable and the associated conjugate momentum, irrespective of the spacelike, timelike, or null nature of the boundary surface. We also arrive at the generally non-vanishing covariant generalization of the Einstein energy-momentum pseudotensor. We study this tensor in the Schwarzschild geometry and find that the pseudotensorial ambiguities translate into ambiguities in the choice of coordinates on the reference geometry. Moreover, we show that from the Weiss variation, one can formally derive a gravitational Schrödinger equation, which may, despite ambiguities in the definition of the Hamiltonian, be useful as a tool for studying the problem of time in quantum general relativity. Implications have been discussed.
Keys to Accurate Feature Extraction Using Residual Spiking Neural Networks

By Alex Vicente-Sola (1), Davide L. Manna (1), Paul Kirkland (1), Gaetano Di Caterina (1), Trevor Bihl (2) ((1) University of Strathclyde, (2) Air Force Research Laboratory)
Spiking neural networks (SNNs) have become an interesting alternative to conventional artificial neural networks (ANNs) thanks to their temporal processing capabilities and their low-SWaP (Size, Weight, and Power) and energy-efficient implementations in neuromorphic hardware. However, the challenges involved in training SNNs have limited their performance in terms of accuracy and thus their applications. Improving learning algorithms and neural architectures for more accurate feature extraction is therefore one of the current priorities in SNN research. In this paper we present a study on the key components of modern spiking architectures. We empirically compare different techniques in image classification datasets taken from the best performing networks. We design a spiking version of the successful residual network (ResNet) architecture and test different components and training strategies on it. Our results provide a state-of-the-art guide to SNN design, which allows informed choices to be made when trying to build the optimal visual feature extractor. Finally, our network outperforms previous SNN architectures on the CIFAR-10 (94.1%) and CIFAR-100 (74.5%) datasets and matches the state of the art on DVS-CIFAR10 (71.3%), with fewer parameters than the previous state of the art and without the need for ANN-SNN conversion. Code available at this https URL.
Instability and Turbulent Relaxation in a Stochastic Magnetic Field

An analysis of instability dynamics in a stochastic magnetic field is presented for the tractable case of the resistive interchange. Externally prescribed static magnetic perturbations convert the eigenmode problem to a stochastic differential equation, which is solved by the method of averaging. The dynamics are rendered multi-scale, due to the size disparity between the test mode and magnetic perturbations. Maintaining quasi-neutrality at all orders requires that small-scale convective cell turbulence be driven by disparate scale interaction. The cells in turn produce turbulent mixing of vorticity and pressure, which is calculated by fluctuation-dissipation type analyses, and are relevant to pump-out phenomena. The development of correlation between the ambient magnetic perturbations and the cells is demonstrated, showing that turbulence will 'lock on' to ambient stochasticity. Magnetic perturbations are shown to produce a magnetic braking effect on vorticity generation at large scale. Detailed testable predictions are presented. The relations of these findings to the results of available simulations and recent experiments are discussed.
Adversarial Tradeoffs in Linear Inverse Problems and Robust State Estimation

Adversarially robust training has been shown to reduce the susceptibility of learned models to targeted input data perturbations. However, it has also been observed that such adversarially robust models suffer a degradation in accuracy when applied to unperturbed data sets, leading to a robustness-accuracy tradeoff. In this paper, we provide sharp and interpretable characterizations of such robustness-accuracy tradeoffs for linear inverse problems. In particular, we provide an algorithm to find the optimal adversarial perturbation given data, and develop tight upper and lower bounds on the adversarial loss in terms of the standard (non-adversarial) loss and the spectral properties of the resulting estimator. Further, motivated by the use of adversarial training in reinforcement learning, we define and analyze the \emph{adversarially robust Kalman Filtering problem.} We apply a refined version of our general theory to this problem, and provide the first characterization of robustness-accuracy tradeoffs in a setting where the data is generated by a dynamical system. In doing so, we show a natural connection between a filter's robustness to adversarial perturbation and underlying control theoretic properties of the system being observed, namely the spectral properties of its observability gramian.
Stochastic and Worst-Case Generalized Sorting Revisited

The \emph{generalized sorting problem} is a restricted version of standard comparison sorting where we wish to sort $n$ elements but only a subset of pairs are allowed to be compared. Formally, there is some known graph $G = (V, E)$ on the $n$ elements $v_1, \dots, v_n$, and the goal is to determine the true order of the elements using as few comparisons as possible, where all comparisons $(v_i, v_j)$ must be edges in $E$. We are promised that if the true ordering is $x_1 < x_2 < \cdots < x_n$ for $\{x_i\}$ an unknown permutation of the vertices $\{v_i\}$, then $(x_i, x_{i+1}) \in E$ for all $i$: this Hamiltonian path ensures that sorting is actually possible.
