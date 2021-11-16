
Science

Asymptotics of solutions of the sample average approximation method to solve risk averse stochastic programs

By Volker Kratschmer
arxiv.org
 8 days ago

The paper studies the Sample Average Approximation method to solve risk averse stochastic programs expressed in terms of divergence risk measures. It continues the recent contribution [18] on the first order asymptotics of the optimal values. We...

arxiv.org
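
As a rough illustration of the Sample Average Approximation idea named in the teaser above, the sketch below replaces the unknown distribution by a fixed sample and minimizes an empirical risk functional over a grid of decisions. Everything specific here is an assumption made for illustration and is not taken from the paper: the quadratic loss, the sample, the decision grid, and the use of Average Value at Risk (computed with the Rockafellar-Uryasev formula) as a stand-in risk measure.

```python
from typing import Callable, Sequence, Tuple


def empirical_avar(losses: Sequence[float], alpha: float) -> float:
    """Empirical Average Value at Risk at level alpha.

    Uses min over t of  t + mean((L - t)^+) / (1 - alpha).
    The objective is piecewise linear and convex in t, so it is enough
    to evaluate it at the sample points themselves.
    """
    n = len(losses)
    return min(
        t + sum(max(l - t, 0.0) for l in losses) / (n * (1.0 - alpha))
        for t in losses
    )


def saa_minimize(
    loss: Callable[[float, float], float],  # hypothetical loss G(x, z)
    sample: Sequence[float],                # draws standing in for the true law
    grid: Sequence[float],                  # candidate decisions x
    alpha: float,
) -> Tuple[float, float]:
    """Return (x, value) minimizing the empirical risk over the grid."""
    value, x = min(
        (empirical_avar([loss(x, z) for z in sample], alpha), x) for x in grid
    )
    return x, value


# Illustrative data only: ten fixed "observations" and a coarse grid.
sample = [float(z) for z in range(1, 11)]
grid = [0.5 * k for k in range(23)]
best_x, best_value = saa_minimize(lambda x, z: (x - z) ** 2, sample, grid, 0.8)
print(best_x, best_value)
```

With alpha = 0.8 and ten sample points, the empirical risk is the mean of the two largest losses, so the minimizer balances the two extreme observations.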


Related
arxiv.org

GCGE: A Package for Solving Large Scale Eigenvalue Problems by Parallel Block Damping Inverse Power Method

We propose an eigensolver and the corresponding package, GCGE, for solving large scale eigenvalue problems. The method combines the damping idea, the subspace projection method, and the inverse power method with dynamic shifts. To reduce the dimensions of the projection subspaces, a moving mechanism is developed for the case when the number of desired eigenpairs is large. The numerical methods, implementation techniques, and the structure of the package are presented. Numerous numerical results are provided to demonstrate the efficiency, stability, and scalability of the eigensolver and the package GCGE for computing many eigenpairs of large symmetric matrices arising from applications.
MATHEMATICS
arxiv.org

Long time asymptotics for the nonlocal mKdV equation with finite density initial data

In this paper, we consider the Cauchy problem for an integrable real nonlocal (also called reverse-space-time) mKdV equation with nonzero boundary conditions \begin{align*} &q_t(x,t)-6\sigma q(x,t)q(-x,-t)q_{x}(x,t)+q_{xxx}(x,t)=0, \\ &q(x,0)=q_{0}(x),\lim_{x\to \pm\infty} q_{0}(x)=q_{\pm}, \end{align*} where $|q_{\pm}|=1$ and $q_{+}=\delta q_{-}$, $\sigma\delta=-1$. Based on the spectral analysis of the Lax pair, we express the solution of the Cauchy problem of the nonlocal mKdV equation in terms of a Riemann-Hilbert problem. In a fixed space-time solitonic region $-6<x/t<6$, we apply the $\bar{\partial}$-steepest descent method to analyze the long-time asymptotic behavior of the solution $q(x,t)$. We find that the long-time asymptotic behavior of $q(x,t)$ can be characterized by an $N(\Lambda)$-soliton on the discrete spectrum and a leading order term $\mathcal{O}(t^{-1/2})$ on the continuous spectrum, up to a residual error of order $\mathcal{O}(t^{-1})$.
MATHEMATICS
arxiv.org

The Augmented Lagrangian Method Can Approximately Solve Convex Optimization with Least Constraint Violation

There are many important practical optimization problems whose feasible regions are not known to be nonempty, and for which optimizers of the objective function with the least constraint violation are sought. A natural way of dealing with these problems is to extend the nonlinear optimization problem to one that optimizes the objective function over the set of points with the least constraint violation. This leads to the study of the shifted problem. This paper focuses on the constrained convex optimization problem. A sufficient condition for the closedness of the set of feasible shifts is presented, and the continuity properties of the optimal value function and the solution mapping for the shifted problem are studied. Properties of the conjugate dual of the shifted problem are discussed through the relations between the dual function and the optimal value function. The solvability of the dual of the optimization problem with the least constraint violation is investigated. It is shown that, if the least violated shift is in the domain of the subdifferential of the optimal value function, then this dual problem has an unbounded solution set. Under this condition, the optimality conditions for the problem with the least constraint violation are established in terms of the augmented Lagrangian. It is shown that, for the augmented Lagrangian method, the sequence of shifts converges to the least violated shift and the sequence of multipliers is unbounded. Moreover, it is proved that the augmented Lagrangian method is able to find an approximate solution to the problem with the least constraint violation.
COMPUTERS
arxiv.org

Asymptotic distribution for pairs of linear and quadratic forms at integral vectors

We study the joint distribution of values of a pair consisting of a quadratic form $q$ and a linear form $\mathbf l$ over the set of integral vectors, a problem initiated by Dani-Margulis (1989). In the spirit of the celebrated theorem of Eskin, Margulis and Mozes on the quantitative version of the Oppenheim conjecture, we show that if $n \ge 5$ then under the assumptions that for every $(\alpha, \beta ) \in \mathbb R^2 \setminus \{ (0,0) \}$, the form $\alpha q + \beta \mathbf l^2$ is irrational and that the signature of the restriction of $q$ to the kernel of $\mathbf l$ is $(p, n-1-p)$, where $3\le p \le n-2$, the number of vectors $v \in \mathbb Z^n$ for which $\|v\| < T$, $a < q(v) < b$ and $c< \mathbf l(v) < d$ is asymptotically.
MATHEMATICS
IN THIS ARTICLE
#Approximation#Risk Averse#Stochastic
arxiv.org

Model-Based Reinforcement Learning for Stochastic Hybrid Systems

Optimal control of general nonlinear systems is a central challenge in automation. Data-driven approaches to control, enabled by powerful function approximators, have recently had great success in tackling challenging robotic applications. However, such methods often obscure the structure of dynamics and control behind black-box over-parameterized representations, thus limiting our ability to understand the closed-loop behavior. This paper adopts a hybrid-system view of nonlinear modeling and control that lends an explicit hierarchical structure to the problem and breaks down complex dynamics into simpler localized units. Therefore, we consider a sequence modeling paradigm that captures the temporal structure of the data and derive an expectation-maximization (EM) algorithm that automatically decomposes nonlinear dynamics into stochastic piecewise affine dynamical systems with nonlinear boundaries. Furthermore, we show that these time-series models naturally admit a closed-loop extension that we use to extract locally linear or polynomial feedback controllers from nonlinear experts via imitation learning. Finally, we introduce a novel hybrid relative entropy policy search (Hb-REPS) technique that incorporates the hierarchical nature of hybrid systems and optimizes a set of time-invariant local feedback controllers derived from a locally polynomial approximation of a global value function.
CODING & PROGRAMMING
arxiv.org

Deep ReLU neural network approximation of parametric and stochastic elliptic PDEs with lognormal inputs

We investigate non-adaptive methods of deep ReLU neural network approximation of the solution $u$ to parametric and stochastic elliptic PDEs with lognormal inputs on non-compact set $\mathbb{R}^\infty$. The approximation error is measured in the norm of the Bochner space $L_2(\mathbb{R}^\infty, V, \gamma)$, where $\gamma$ is the tensor product standard Gaussian probability on $\mathbb{R}^\infty$ and $V$ is the energy space. The approximation is based on an $m$-term truncation of the Hermite generalized polynomial chaos expansion (gpc) of $u$. Under a certain assumption on $\ell_q$-summability condition for lognormal inputs ($0< q <\infty$), we proved that for every integer $n > 1$, one can construct a non-adaptive compactly supported deep ReLU neural network $\boldsymbol{\phi}_n$ of size not greater than $n$ on $\mathbb{R}^m$ with $m = \mathcal{O} (n/\log n)$, having $m$ outputs so that the summation constituted by replacing polynomials in the $m$-term truncation of Hermite gpc expansion by these $m$ outputs approximates $u$ with an error bound $\mathcal{O}\left(\left(n/\log n\right)^{-1/q}\right)$. This error bound is comparable to the error bound of the best approximation of $u$ by $n$-term truncations of Hermite gpc expansion which is $\mathcal{O}(n^{-1/q})$. We also obtained some results on similar problems for parametric and stochastic elliptic PDEs with affine inputs, based on the Jacobi and Taylor gpc expansions.
CODING & PROGRAMMING
arxiv.org

Convergence and Stability of the Stochastic Proximal Point Algorithm with Momentum

Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, often requiring specific step size and momentum choices to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and elasticity against imperfect tuning. Their stochastic accelerated variants, though, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM), and show that SPPAM allows a faster linear convergence rate compared to the stochastic proximal point algorithm (SPPA), with a better contraction factor, under proper hyperparameter tuning. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step sizes and momentum values that lead to convergence.
CODING & PROGRAMMING
arxiv.org

Optimal bounds for numerical approximations of infinite horizon problems based on dynamic programming approach

In this paper we obtain error bounds for fully discrete approximations of infinite horizon problems via the dynamic programming approach. It is well known that, for a time discretization with a positive step size $h$, an error bound of size $h$ can be proved for the difference between the value function (the viscosity solution of the Hamilton-Jacobi-Bellman equation corresponding to the infinite horizon) and the value function of the discrete time problem. However, when a spatial discretization based on elements of size $k$ is also included, an error bound of size $O(k/h)$ can be found in the literature for the error between the value functions of the continuous problem and the fully discrete problem. In this paper we revise the error bound of the fully discrete method and prove, under assumptions similar to those of the time discrete case, that the error of the fully discrete case is in fact $O(h+k)$, which gives first order in time and space for the method. This error bound matches the numerical experiments of many papers in the literature, in which the behaviour $1/h$ from the bound $O(k/h)$ has not been observed.
COMPUTERS
arxiv.org

Bolstering Stochastic Gradient Descent with Model Building

Stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates for solving machine learning problems. These rates are obtained especially when these algorithms are fine-tuned for the application at hand. Although this tuning process can require large computational costs, recent work has shown that these costs can be reduced by line search methods that iteratively adjust the stepsize. We propose an alternative approach to stochastic line search by using a new algorithm based on forward step model building. This model building step incorporates a second-order information that allows adjusting not only the stepsize but also the search direction. Noting that deep learning model parameters come in groups (layers of tensors), our method builds its model and calculates a new step for each parameter group. This novel diagonalization approach makes the selected step lengths adaptive. We provide convergence rate analysis, and experimentally show that the proposed algorithm achieves faster convergence and better generalization in most problems. Moreover, our experiments show that the proposed method is quite robust as it converges for a wide range of initial stepsizes.
CODING & PROGRAMMING
arxiv.org

CSG: A stochastic gradient method for a wide class of optimization problems appearing in a machine learning or data-driven context

A recent article introduced the continuous stochastic gradient method (CSG) for the efficient solution of a class of stochastic optimization problems. While the applicability of known stochastic gradient type methods is typically limited to expected risk functions, no such limitation exists for CSG. This advantage stems from the computation of design dependent integration weights, allowing for optimal usage of available information and therefore stronger convergence properties. However, the nature of the formula used for these integration weights essentially limited the practical applicability of this method to problems in which stochasticity enters via a low-dimensional and sufficiently simple probability distribution. In this paper we significantly extend the scope of the CSG method by presenting alternative ways to calculate the integration weights. A full convergence analysis for this new variant of the CSG method is presented and its efficiency is demonstrated in comparison to more classical stochastic gradient methods by means of a number of problem classes relevant to stochastic optimization and machine learning.
CODING & PROGRAMMING
arxiv.org

Stochastic and Worst-Case Generalized Sorting Revisited

The \emph{generalized sorting problem} is a restricted version of standard comparison sorting where we wish to sort $n$ elements but only a subset of pairs are allowed to be compared. Formally, there is some known graph $G = (V, E)$ on the $n$ elements $v_1, \dots, v_n$, and the goal is to determine the true order of the elements using as few comparisons as possible, where all comparisons $(v_i, v_j)$ must be edges in $E$. We are promised that if the true ordering is $x_1 < x_2 < \cdots < x_n$ for $\{x_i\}$ an unknown permutation of the vertices $\{v_i\}$, then $(x_i, x_{i+1}) \in E$ for all $i$: this Hamiltonian path ensures that sorting is actually possible.
MATHEMATICS
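
To make the problem setting above concrete, the following sketch implements only the naive baseline, not the algorithm of the paper: it spends one comparison on every edge of $G$, orients the edge from the smaller to the larger element, and reads the sorted order off a topological sort. Because the promised Hamiltonian path links every pair of consecutive elements, the topological order is unique. The function name, the `less_than` oracle, and the example data are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Callable, Hashable, Iterable, List, Tuple


def sort_with_allowed_comparisons(
    vertices: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    less_than: Callable[[Hashable, Hashable], bool],
) -> List[Hashable]:
    """Sort vertices using only comparisons along the given edges.

    Naive baseline: one comparison per edge (|E| comparisons in total).
    """
    sorter = TopologicalSorter()
    for v in vertices:
        sorter.add(v)  # register every vertex, even if it has no edges
    for u, v in edges:
        if less_than(u, v):
            sorter.add(v, u)  # u must come before v
        else:
            sorter.add(u, v)  # v must come before u
    return list(sorter.static_order())


# Illustrative instance: the hidden order is b < c < a < d, and the
# allowed pairs contain the path (b, c), (c, a), (a, d).
hidden = {"a": 3, "b": 1, "c": 2, "d": 4}
allowed = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d")]
order = sort_with_allowed_comparisons(
    hidden, allowed, lambda u, v: hidden[u] < hidden[v]
)
print(order)
```

The interest of the generalized sorting problem lies in using far fewer than $|E|$ comparisons; this baseline only fixes the input and output conventions.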
arxiv.org

A robust collision source method for rank adaptive dynamical low-rank approximation in radiation therapy

Deterministic models for radiation transport describe the density of radiation particles moving through a background material. In radiation therapy applications, the phase space of this density is composed of energy, spatial position and direction of flight. The resulting six-dimensional phase space prohibits fine numerical discretizations, which are essential for the construction of accurate and reliable treatment plans. In this work, we tackle the high dimensional phase space through a dynamical low-rank approximation of the particle density. Dynamical low-rank approximation (DLRA) evolves the solution on a low-rank manifold in time. Interpreting the energy variable as a pseudo-time lets us employ the DLRA framework to represent the solution of the radiation transport equation on a low-rank manifold for every energy. Stiff scattering terms are treated through an efficient implicit energy discretization and a rank adaptive integrator is chosen to dynamically adapt the rank in energy. To facilitate the use of boundary conditions and reduce the overall rank, the radiation transport equation is split into collided and uncollided particles through a collision source method. Uncollided particles are described by a directed quadrature set guaranteeing low computational costs, whereas collided particles are represented by a low-rank solution. It can be shown that the presented method is L$^2$-stable under a time step restriction which does not depend on stiff scattering terms. Moreover, the implicit treatment of scattering does not require numerical inversions of matrices. Numerical results for radiation therapy configurations as well as the line source benchmark underline the efficiency of the proposed method.
SCIENCE
arxiv.org

Robust recovery for stochastic block models

We develop an efficient algorithm for weak recovery in a robust version of the stochastic block model. The algorithm matches the statistical guarantees of the best known algorithms for the vanilla version of the stochastic block model. In this sense, our results show that there is no price of robustness in the stochastic block model. Our work is heavily inspired by recent work of Banks, Mohanty, and Raghavendra (SODA 2021) that provided an efficient algorithm for the corresponding distinguishing problem. Our algorithm and its analysis significantly depart from previous ones for robust recovery. A key challenge is the peculiar optimization landscape underlying our algorithm: The planted partition may be far from optimal in the sense that completely unrelated solutions could achieve the same objective value. This phenomenon is related to the push-out effect at the BBP phase transition for PCA. To the best of our knowledge, our algorithm is the first to achieve robust recovery in the presence of such a push-out effect in a non-asymptotic setting. Our algorithm is an instantiation of a framework based on convex optimization (related to but distinct from sum-of-squares), which may be useful for other robust matrix estimation problems. A by-product of our analysis is a general technique that boosts the probability of success (over the randomness of the input) of an arbitrary robust weak-recovery algorithm from constant (or slowly vanishing) probability to exponentially high probability.
COMPUTERS
arxiv.org

Solving Probability and Statistics Problems by Program Synthesis

We solve university level probability and statistics questions by program synthesis using OpenAI's Codex, a Transformer trained on text and fine-tuned on code. We transform course problems from MIT's 18.05 Introduction to Probability and Statistics and Harvard's STAT110 Probability into programming tasks. We then execute the generated code to get a solution. Since these course questions are grounded in probability, we often aim to have Codex generate probabilistic programs that simulate a large number of probabilistic dependencies to compute its solution. Our approach requires prompt engineering to transform the question from its original form to an explicit, tractable form that results in a correct program and solution. To estimate the amount of work needed to translate an original question into its tractable form, we measure the similarity between original and transformed questions. Our work is the first to introduce a new dataset of university-level probability and statistics problems and solve these problems in a scalable fashion using the program synthesis capabilities of large language models.
CODING & PROGRAMMING
arxiv.org

Stochastic Rotating Waves

Stochastic dynamics has emerged as one of the key themes ranging from models in applications to theoretical foundations in mathematics. One class of stochastic dynamics problems that has received considerable attention recently is travelling wave patterns occurring in stochastic partial differential equations (SPDEs), i.e., how deterministic travelling waves behave under stochastic perturbations. In this paper, we start the mathematical study of a related class of problems: stochastic rotating waves generated by SPDEs. We combine deterministic dynamics PDE techniques with methods from stochastic analysis. We establish two different approaches, the variational phase and the approximated variational phase, for defining stochastic phase variables along the rotating wave, which track the effect of noise on neutral spectral modes associated to the special Euclidean symmetry group of rotating waves. Furthermore, we prove transverse stability results for rotating waves showing that over certain time scales and for small noise, the stochastic rotating wave stays close to its deterministic counterpart.
SCIENCE
arxiv.org

Stochastic Extragradient: General Analysis and Improved Rates

The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving min-max optimization and variational inequalities problems (VIP) appearing in various machine learning tasks. However, several important questions regarding the convergence properties of SEG are still open, including the sampling of stochastic gradients, mini-batching, convergence guarantees for the monotone finite-sum variational inequalities with possibly non-monotone terms, and others. To address these questions, in this paper, we develop a novel theoretical framework that allows us to analyze several variants of SEG in a unified manner. Besides standard setups, like Same-Sample SEG under Lipschitzness and monotonicity or Independent-Samples SEG under uniformly bounded variance, our approach allows us to analyze variants of SEG that were never explicitly considered in the literature before. Notably, we analyze SEG with arbitrary sampling which includes importance sampling and various mini-batching strategies as special cases. Our rates for the new variants of SEG outperform the current state-of-the-art convergence guarantees and rely on less restrictive assumptions.
CODING & PROGRAMMING
arxiv.org

Indefinite linear-quadratic optimal control of mean-field stochastic differential equation with jump diffusion: an equivalent cost functional method

In this paper, we consider a linear-quadratic optimal control problem for a mean-field stochastic differential equation with jump diffusion, also called an MF-LQJ problem. Here, the cost functional is allowed to be indefinite. We use an equivalent cost functional method to deal with the MF-LQJ problem with indefinite weighting matrices. Some equivalent cost functionals enable us to establish a bridge between indefinite and positive-definite MF-LQJ problems. With such a bridge, the solvability of the stochastic Hamiltonian system and of the Riccati equations is further characterized. The optimal control of the indefinite MF-LQJ problem is represented as a state feedback via solutions of the Riccati equations. As a by-product, the method provides a new way to prove the existence and uniqueness of the solution to the mean-field forward-backward stochastic differential equation with jump diffusion (MF-FBSDEJ, for short), where existing methods in the literature do not work. Some examples are provided to illustrate our results.
MATHEMATICS
arxiv.org

Asymptotic behavior of a doubly haptotactic cross-diffusion model for oncolytic virotherapy

V_t=- (\alpha_u u+\alpha_w w)v,\\ w_t=D_w\Delta w-\xi_w\nabla\cdot(w\nabla v)- w+\rho uz,\\ z_t=D_z\Delta z-\delta_z z- \rho uz+\beta w,. \end{array}\right. \end{equation*} with positive parameters $D_u,D_w,D_z,\xi_u,\xi_w,\delta_z,\rho$, $\alpha_u,\alpha_w,\mu_u,\beta$. When posed under no-flux boundary conditions in a smoothly bounded domain $\Omega\subset {\mathbb{R}}^2$, and along with initial conditions involving suitably regular data, the global existence of classical solution...
SCIENCE
arxiv.org

Learning Optimal Control with Stochastic Models of Hamiltonian Dynamics

Optimal control problems can be solved by first applying the Pontryagin maximum principle, followed by computing a solution of the corresponding unconstrained Hamiltonian dynamical system. In this paper, to achieve a balance between robustness and efficiency, we learn a reduced Hamiltonian of the unconstrained Hamiltonian. This reduced Hamiltonian is learned by going backward in time and by minimizing the loss function resulting from application of the Pontryagin maximum principle conditions. The robustness of our learning process is then further improved by progressively learning a posterior distribution of reduced Hamiltonians. This leads to a more efficient sampling of the generalized coordinates (position, velocity) of our phase space. Our solution framework applies not only to optimal control problems with finite-dimensional phase (state) spaces but also to the infinite-dimensional case.
MATHEMATICS
arxiv.org

A Simple Approximation Algorithm for Vector Scheduling and Applications to Stochastic Min-Norm Load Balancing

We consider the Vector Scheduling problem on identical machines: we have m machines, and a set J of n jobs, where each job j has a processing-time vector $p_j\in \mathbb{R}^d_{\geq 0}$. The goal is to find an assignment $\sigma:J\to [m]$ of jobs to machines so as to minimize the makespan $\max_{i\in [m]}\max_{r\in [d]}( \sum_{j:\sigma(j)=i}p_{j,r})$. A natural lower bound on the optimal makespan is lb $:=\max\{\max_{j\in J,r\in [d]}p_{j,r},\max_{r\in [d]}(\sum_{j\in J}p_{j,r}/m)\}$. Our main result is a very simple O(log d)-approximation algorithm for vector scheduling with respect to the lower bound lb: we devise an algorithm that returns an assignment whose makespan is at most O(log d)*lb.
COMPUTERS
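
The two quantities defined in the abstract above, the makespan of an assignment and the lower bound lb, can be computed directly from their formulas. The sketch below does only that; it is not the paper's approximation algorithm, and the function names and the small instance are made up for illustration. Machines are indexed from 0 rather than from 1.

```python
from typing import List, Sequence


def makespan(jobs: Sequence[Sequence[float]], assignment: Sequence[int], m: int) -> float:
    """Largest load over all machines i and all dimensions r.

    jobs[j] is the d-dimensional processing-time vector p_j and
    assignment[j] is the machine sigma(j) in {0, ..., m-1}.
    """
    d = len(jobs[0])
    loads: List[List[float]] = [[0.0] * d for _ in range(m)]
    for p, machine in zip(jobs, assignment):
        for r in range(d):
            loads[machine][r] += p[r]
    return max(max(row) for row in loads)


def lower_bound(jobs: Sequence[Sequence[float]], m: int) -> float:
    """lb = max(largest single entry, largest per-dimension average load)."""
    d = len(jobs[0])
    largest_entry = max(max(p) for p in jobs)
    largest_average = max(sum(p[r] for p in jobs) / m for r in range(d))
    return max(largest_entry, largest_average)


# Illustrative instance: 4 jobs, d = 2 dimensions, m = 2 machines.
jobs = [(2.0, 0.0), (0.0, 2.0), (1.0, 1.0), (1.0, 1.0)]
balanced = [0, 0, 1, 1]    # both machines end with load vector (2, 2)
unbalanced = [0, 1, 0, 1]  # machine 0 ends with (3, 1), machine 1 with (1, 3)
print(lower_bound(jobs, 2), makespan(jobs, balanced, 2), makespan(jobs, unbalanced, 2))
```

On this instance the balanced assignment meets the lower bound exactly, while the unbalanced one exceeds it by a factor of 1.5.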
