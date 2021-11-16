
Off-Policy Actor-Critic with Emphatic Weightings

By Eric Graves, Ehsan Imani, Raksha Kumaraswamy, Martha White

A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due to the policy gradient theorem, which provides a simplified form for the gradient. The off-policy setting, however, has been less clear due to the existence of multiple objectives and...

arxiv.org

Policy Gradient and Actor-Critic Learning in Continuous Time and Space: Theory and Algorithms

We study policy gradient (PG) for reinforcement learning in continuous time and space under the regularized exploratory formulation developed by Wang et al. (2020). We represent the gradient of the value function with respect to a given parameterized stochastic policy as the expected integration of an auxiliary running reward function that can be evaluated using samples and the current value function. This effectively turns PG into a policy evaluation (PE) problem, enabling us to apply the martingale approach recently developed by Jia and Zhou (2021) for PE to solve our PG problem. Based on this analysis, we propose two types of actor-critic algorithms for RL, in which we learn and update value functions and policies simultaneously and alternatingly. The first type is based directly on the aforementioned representation, which involves future trajectories and is hence offline. The second type, designed for online learning, employs the first-order condition of the policy gradient and turns it into martingale orthogonality conditions. These conditions are then incorporated using stochastic approximation when updating policies. Finally, we demonstrate the algorithms through simulations on two concrete examples.
CODING & PROGRAMMING
arxiv.org

Quantum process tomography of adiabatic and superadiabatic stimulated Raman passage

Quantum control methods for three-level systems have recently become an important research direction in quantum information science and technology. Here we present numerical simulations using realistic experimental parameters for quantum process tomography in STIRAP (stimulated Raman adiabatic passage) and saSTIRAP (superadiabatic STIRAP). Specifically, we choose as a suitable basis for the operator space the identity operator together with the 8 Gell-Mann operators, and we calculate the corresponding process matrices, which have $9\times 9=81$ elements. We discuss these results for the ideal decoherence-free case, as well as for the experimentally relevant case with decoherence included.
SCIENCE
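The basis choice described above can be made concrete. The sketch below (Python with NumPy; all function names are illustrative assumptions) builds the identity plus the eight Gell-Mann matrices and computes the $9\times 9$ process matrix $\chi$ of an ideal, decoherence-free unitary: expanding $U=\sum_m c_m E_m$ in this orthogonal basis gives $U\rho U^\dagger=\sum_{m,n}\chi_{mn}E_m\rho E_n^\dagger$ with $\chi_{mn}=c_m c_n^*$. The swap unitary is a placeholder, not the STIRAP or saSTIRAP evolution simulated in the paper, and reconstructing $\chi$ from noisy tomography data would need a fitting step this sketch omits.

```python
import numpy as np


def gell_mann_basis():
    """Return [I, lambda_1, ..., lambda_8]: the 3x3 identity and the 8 Gell-Mann matrices."""
    lam = np.zeros((8, 3, 3), dtype=complex)
    # Symmetric off-diagonal generators: lambda_1, lambda_4, lambda_6
    lam[0][0, 1] = lam[0][1, 0] = 1
    lam[3][0, 2] = lam[3][2, 0] = 1
    lam[5][1, 2] = lam[5][2, 1] = 1
    # Antisymmetric off-diagonal generators: lambda_2, lambda_5, lambda_7
    lam[1][0, 1], lam[1][1, 0] = -1j, 1j
    lam[4][0, 2], lam[4][2, 0] = -1j, 1j
    lam[6][1, 2], lam[6][2, 1] = -1j, 1j
    # Diagonal generators: lambda_3, lambda_8
    lam[2] = np.diag([1.0, -1.0, 0.0])
    lam[7] = np.diag([1.0, 1.0, -2.0]) / np.sqrt(3.0)
    return [np.eye(3, dtype=complex)] + [lam[k] for k in range(8)]


def process_matrix_for_unitary(U, basis):
    """chi such that U rho U^dagger = sum_{m,n} chi[m, n] E_m rho E_n^dagger."""
    # Expansion coefficients in the orthogonal basis: c_m = Tr(E_m^dag U) / Tr(E_m^dag E_m)
    c = np.array([np.trace(E.conj().T @ U) / np.trace(E.conj().T @ E) for E in basis])
    return np.outer(c, c.conj())


def apply_process(chi, basis, rho):
    """Apply the channel described by chi (in the given operator basis) to rho."""
    out = np.zeros(rho.shape, dtype=complex)
    for m, E_m in enumerate(basis):
        for n, E_n in enumerate(basis):
            out += chi[m, n] * (E_m @ rho @ E_n.conj().T)
    return out


basis = gell_mann_basis()
# Hypothetical target: a unitary that swaps levels |0> and |2> and leaves |1> alone
U = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=complex)
chi = process_matrix_for_unitary(U, basis)
print(chi.shape, chi.size)  # (9, 9) 81
```

Because the nine operators are mutually orthogonal under the Hilbert-Schmidt inner product and span all $3\times 3$ matrices, the expansion of $U$ is exact and no least-squares step is needed in the ideal case.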
arxiv.org

Semantic-Aware Collaborative Deep Reinforcement Learning Over Wireless Cellular Networks

Collaborative deep reinforcement learning (CDRL) algorithms, in which multiple agents coordinate over a wireless network, are a promising approach to enabling future intelligent and autonomous systems that rely on real-time decision-making in complex dynamic environments. Nonetheless, in practical scenarios, CDRL faces many challenges due to the heterogeneity of agents and their learning tasks, different environments, time constraints on learning, and the resource limitations of wireless networks. To address these challenges, this paper proposes a novel semantic-aware CDRL method that enables a group of heterogeneous untrained agents with semantically linked DRL tasks to collaborate efficiently across a resource-constrained wireless cellular network. To this end, a new heterogeneous federated DRL (HFDRL) algorithm is proposed to select the best subset of semantically relevant DRL agents for collaboration. The proposed approach then jointly optimizes the training loss and wireless bandwidth allocation for the selected cooperating agents in order to train each agent within the time limit of its real-time task. Simulation results show the superior performance of the proposed algorithm compared to state-of-the-art baselines.
COMPUTERS
arxiv.org

State Estimation of the Stefan PDE: A Tutorial on Design and Applications to Polar Ice and Batteries

The Stefan PDE system is a representative model for thermal phase change phenomena, such as melting and solidification, arising in numerous science and engineering processes. The mathematical description is given by a Partial Differential Equation (PDE) of the temperature distribution defined on a spatial interval with a moving boundary, where the boundary represents the liquid-solid interface and its dynamics are governed by an Ordinary Differential Equation (ODE). The PDE-ODE coupling at the boundary is nonlinear and creates a significant challenge for state estimation with provable convergence and robustness. This tutorial article presents a state estimation method based on PDE backstepping for the Stefan system, using measurements only at the moving boundary. PDE backstepping observer design generates an observer gain by employing a Volterra transformation of the observer error state into a desirable target system, solving a Goursat-form PDE for the transformation's kernel, and performing a Lyapunov analysis of the target observer error system. The observer is applied to models of problems motivated by climate change and the need for renewable energy storage: a model of polar ice dynamics and a model of charging and discharging in lithium-ion batteries. The numerical results for polar ice demonstrate robust performance of the designed estimator with respect to the unmodeled salinity effect in sea ice. The results for an electrochemical PDE model of a lithium-ion battery with a phase transition material show the elimination of more than 15% error in the State-of-Charge estimate within 5 minutes, even in the presence of sensor noise.
SCIENCE
arxiv.org

Uncertainty estimation under model misspecification in neural network regression

By Maria R. Cervera, Rafael Dätwyler, Francesco D'Angelo, Hamza Keurti, Benjamin F. Grewe, Christian Henning

Although neural networks are powerful function approximators, the underlying modelling assumptions ultimately define the likelihood and thus the hypothesis class they are parameterizing. In classification, these assumptions are minimal, as the commonly employed softmax is capable of representing any categorical distribution. In regression, however, restrictive assumptions are typically placed on the type of continuous distribution to be realized, such as the dominant choice of training via mean-squared error and its underlying Gaussianity assumption. Recently, modelling advances have made it possible to be agnostic to the type of continuous distribution to be modelled, granting regression the flexibility of classification models. While past studies stress the benefit of such flexible regression models in terms of performance, here we study the effect of the model choice on uncertainty estimation. We highlight that under model misspecification, aleatoric uncertainty is not properly captured, and that a Bayesian treatment of a misspecified model leads to unreliable epistemic uncertainty estimates. Overall, our study provides an overview of how modelling choices in regression may influence uncertainty estimation and thus any downstream decision-making process.
SCIENCE
arxiv.org

Topology optimization for the design of porous electrodes

Porous electrodes are an integral part of many electrochemical devices, since they have high porosity to maximize electrochemical transport and high surface area to maximize activity. Traditional porous electrode materials are typically homogeneous, stochastic collections of small-scale particles and offer few opportunities to engineer higher performance. Fortunately, recent breakthroughs in advanced and additive manufacturing are yielding new methods to structure and pattern porous electrodes across length scales. These architected electrodes are emerging as a promising new technology to continue to drive improvement; however, it is still unclear which structures to employ, and few tools are available to guide their design. In this work we address this gap by applying topology optimization to the design of porous electrodes. We demonstrate our framework on two applications: a porous electrode driving a steady Faradaic reaction and a transiently operated electrode in a supercapacitor. We present computationally designed electrodes that minimize energy losses in a half-cell. For low-conductivity materials, the optimization algorithm creates electrode designs with a hierarchy of length scales. Further, the designed electrodes are found to outperform undesigned, homogeneous electrodes. Finally, we present three-dimensional porous electrode designs. We thus establish a topology optimization framework for designing porous electrodes.
MATHEMATICS
arxiv.org

Multi-task manifold learning for small sample size datasets

In this study, we develop a method for multi-task manifold learning. The method aims to improve the performance of manifold learning for multiple tasks, particularly when each task has a small number of samples. Furthermore, the method also aims to generate new samples for new tasks, in addition to new samples for existing tasks. In the proposed method, we use two different types of information transfer: instance transfer and model transfer. For instance transfer, datasets are merged among similar tasks, whereas for model transfer, the manifold models are averaged among similar tasks. For this purpose, the proposed method consists of a set of generative manifold models corresponding to the tasks, which are integrated into a general model of a fiber bundle. We applied the proposed method to artificial datasets and face image sets, and the results showed that the method was able to estimate the manifolds, even for a tiny number of samples.
COMPUTERS
arxiv.org

Stochastic Processes Under Linear Differential Constraints : Application to Gaussian Process Regression for the 3 Dimensional Free Space Wave Equation

Let $P$ be a linear differential operator over $\mathcal{D} \subset \mathbb{R}^d$ and $U = (U_x)_{x \in \mathcal{D}}$ a second-order stochastic process. In the first part of this article, we prove a new simple necessary and sufficient condition for all the trajectories of $U$ to satisfy the partial differential equation (PDE) $P(U) = 0$. This condition is formulated in terms of the covariance kernel of $U$. The novelty of this result is that the equality $P(U) = 0$ is understood in the sense of distributions, a functional analysis framework particularly well suited to the study of PDEs. This theorem provides valuable insights for the second part of this article, which is dedicated to performing "physically informed" machine learning on data that are solutions to the homogeneous 3-dimensional free space wave equation. We perform Gaussian Process Regression (GPR), a kernel-based machine learning technique, on these data. To do so, we model the solution of this PDE as a trajectory drawn from a well-chosen Gaussian process (GP). We obtain explicit formulas for the covariance kernel of the corresponding stochastic process; this kernel can then be used for GPR. We explore two particular cases: radial symmetry and the point source. In the case of radial symmetry, we derive "fast to compute" GPR formulas; in the case of the point source, we show a direct link between GPR and the classical triangulation method for point source localization used, e.g., in GPS systems. We also show that this use of GPR can be interpreted as a new answer to the ill-posed inverse problem of reconstructing initial conditions for the wave equation from finite-dimensional data, and that it also provides a way of estimating physical parameters from these data, as in [Raissi et al., 2017]. We finish by showcasing this physically informed GPR on a number of practical examples.
MATHEMATICS
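Once a covariance kernel is fixed, GPR itself reduces to standard linear algebra. The sketch below (Python with NumPy; all names are illustrative) computes the posterior mean and variance of a zero-mean GP using a Cholesky factorization. It uses a generic squared-exponential kernel purely as a placeholder: the paper's contribution is the wave-equation covariance kernels, which are not reproduced here and would be passed in through the `kernel` argument.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel; a placeholder for a physics-informed kernel."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(X_train, y_train, X_test, kernel, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP evaluated at X_test."""
    K = kernel(X_train, X_train) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    K_s = kernel(X_train, X_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # var = k(x*, x*) - k_s^T K^{-1} k_s
    var = np.diag(kernel(X_test, X_test)) - np.sum(v**2, axis=0)
    return mean, var


# Usage: interpolate samples of sin(x) on [0, 2*pi] at x = pi / 2
X = np.linspace(0.0, 2.0 * np.pi, 10)[:, None]
y = np.sin(X[:, 0])
X_new = np.array([[np.pi / 2]])
mean, var = gp_posterior(X, y, X_new, rbf_kernel)
```

The small `noise` term acts as jitter that keeps the Cholesky factorization stable; for genuinely noisy measurements it would be set to the observation-noise variance.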
arxiv.org

Effective acetylene length dependence of the elastic properties of different kinds of graphynes

Graphyne is a planar network of connected carbon chains, each formed by $n$ acetylene linkages. The countless ways of making these connections lead to countless structural graphyne families (GFs). As the synthesis of graphynes with $n > 1$ has been reported in the literature, it is of interest to find out how their physical properties depend on $n$ for each possible GF. Although the literature already presents specific models describing the $n$-dependence of the elastic properties of specific GFs, there are not yet enough data on the physical properties of different graphynes with different values of $n$. Based on fully atomistic molecular dynamics simulations, we calculate the Young's modulus, shear modulus, linear compressibility and Poisson's ratio of 10 graphyne members of 7 different GFs. A simple elastic model consisting of a serial combination of $n$ springs is proposed to describe the $n$-dependence of the elastic properties of these 7 GFs. We show that, except for the Poisson's ratio, this single simple elastic model is able to numerically describe, with good precision, the Young's modulus, shear modulus and linear compressibility of all the different graphynes, including the anisotropy and the negative linear compressibility of some GFs.
CHEMISTRY
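For springs in series, compliances add: $1/k_{\text{eff}} = \sum_i 1/k_i$. One hypothetical way to turn this into an $n$-dependence is to assume that each of the $n$ acetylene linkages contributes a compliance $b$ and the rest of the repeating unit a compliance $a$, so that a modulus behaves as $M(n) = 1/(a + b\,n)$. The sketch below (Python with NumPy; the parametrization and all names are assumptions, not necessarily the paper's exact model) fits $a$ and $b$ by a linear least-squares fit of $1/M$ against $n$.

```python
import numpy as np


def series_stiffness(stiffnesses):
    """Effective stiffness of springs in series: 1/k_eff = sum_i 1/k_i."""
    k = np.asarray(stiffnesses, dtype=float)
    return 1.0 / np.sum(1.0 / k)


def fit_series_model(n_values, moduli):
    """Least-squares fit of the hypothetical model 1/M(n) = a + b*n; returns (a, b)."""
    n = np.asarray(n_values, dtype=float)
    compliance = 1.0 / np.asarray(moduli, dtype=float)
    b, a = np.polyfit(n, compliance, 1)  # polyfit returns [slope, intercept]
    return a, b


def predict_modulus(n, a, b):
    """Modulus predicted by the series model for n acetylene linkages."""
    return 1.0 / (a + b * np.asarray(n, dtype=float))


print(series_stiffness([2.0, 2.0]))  # 1.0
```

Fitting in compliance space keeps the problem linear; it says nothing about the Poisson's ratio, which the abstract notes this kind of model does not capture.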
arxiv.org

Trimming Stability Selection increases variable selection robustness

Contamination can severely distort an estimator unless the estimation procedure is suitably robust. This is a well-known issue that has been addressed in Robust Statistics; however, the relation between contamination and distorted variable selection has rarely been considered in the literature. For variable selection, many methods for sparse model selection have been proposed, including Stability Selection, a meta-algorithm built on top of some variable selection algorithm in order to immunize against particular data configurations. We introduce the variable selection breakdown point, which quantifies the number of cases (or, in the cell-wise setting, cells) that have to be contaminated so that no relevant variable is detected. We show that particular outlier configurations can completely mislead model selection and argue why even cell-wise robust methods cannot fix this problem. We combine the variable selection breakdown point with resampling, resulting in the Stability Selection breakdown point, which quantifies the robustness of Stability Selection. We propose a trimmed Stability Selection, which aggregates only the models with the lowest in-sample losses so that, heuristically, models computed on heavily contaminated resamples are trimmed away. We provide a short simulation study that reveals both the potential of our approach and the fragility of variable selection, even for an extremely small cell-wise contamination rate.
SCIENCE
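The trimming idea can be sketched as follows: run a base selector on many half-subsamples, record each model's in-sample loss, keep only the lowest-loss fraction of models, and select the variables that appear in at least a threshold fraction of the kept models. In this Python/NumPy sketch the base selector (top-$q$ absolute marginal correlation) and the OLS in-sample loss are hypothetical stand-ins for whatever selector and loss the authors use, and the parameter defaults are illustrative only.

```python
import numpy as np


def top_q_selector(X, y, q):
    """Hypothetical base selector: the q columns with the largest |correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(scores)[::-1][:q]


def in_sample_loss(X, y, cols):
    """Mean squared residual of an OLS fit (with intercept) on the selected columns."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ coef) ** 2))


def trimmed_stability_selection(X, y, q=3, n_resamples=100, keep_fraction=0.8,
                                threshold=0.6, seed=0):
    """Stability Selection on half-subsamples, aggregating only the lowest-loss models."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    models, losses = [], []
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        cols = top_q_selector(X[idx], y[idx], q)
        models.append(cols)
        losses.append(in_sample_loss(X[idx], y[idx], cols))
    # Trimming step: keep only the models with the smallest in-sample losses
    n_keep = max(1, int(keep_fraction * n_resamples))
    kept = np.argsort(losses)[:n_keep]
    freq = np.zeros(p)
    for k in kept:
        freq[models[k]] += 1.0
    freq /= n_keep
    return np.flatnonzero(freq >= threshold), freq


# Usage on clean synthetic data where variables 0, 1 and 2 are relevant
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = 3 * X[:, 0] + 3 * X[:, 1] + 3 * X[:, 2] + 0.5 * rng.standard_normal(200)
selected, freq = trimmed_stability_selection(X, y)
```

With `keep_fraction=1.0` this reduces to ordinary (untrimmed) Stability Selection, which makes the effect of trimming easy to compare on contaminated data.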
arxiv.org

Nucleation in Sessile Saline Microdroplets: Induction Time Measurement via Deliquescence-Recrystallization Cycling

By Ruel Cedeno, Romain Grossier, Mehdi Lagaize (CINaM), David Nerini (MIO), Nadine Candoni (AMU), A. E. Flood, Stéphane Veesler (CINaM)

Induction time, a measure of how long one must wait for nucleation to occur, is an important parameter in quantifying nucleation kinetics and its underlying mechanisms. Due to the stochastic nature of nucleation, efficient methods for measuring a large number of independent induction times are needed to ensure statistical reproducibility. In this work, we present a novel approach for measuring and analyzing induction times in sessile arrays of microdroplets via deliquescence/recrystallization cycling. With the help of a recently developed image analysis protocol, we show that the interfering diffusion-mediated interactions between microdroplets can be eliminated by controlling the relative humidity, thereby ensuring independent nucleation events. Moreover, the possible influences of heterogeneities, impurities, and memory effects appear negligible, as suggested by our 2-cycle experiment. Further statistical analysis (k-sample Anderson-Darling test) reveals that, once possible outliers are identified, the dimensionless induction times obtained from different datasets (microdroplet lines) obey the same distribution and can thus be pooled together to form a much larger dataset. The pooled dataset showed an excellent fit with the Weibull function, giving a mean supersaturation at nucleation of 1.61 and 1.85 for the 60 pL and 4 pL microdroplets, respectively. This confirms the effect of confinement, whereby smaller systems require higher supersaturations to nucleate. Both the experimental method and the data-treatment procedure presented herein offer promising routes for studying fundamental aspects of nucleation kinetics, particularly confinement effects, and are adaptable to other salts, pharmaceuticals, or biological crystals of interest.
SCIENCE
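The pool-then-fit step can be illustrated with a minimal sketch, assuming the pooled induction times follow a Weibull law $F(t) = 1 - \exp[-(t/\lambda)^k]$. Here the parameters are estimated by the method of moments on log-times: if $T$ is Weibull, then $\operatorname{Var}(\ln T) = \pi^2/(6k^2)$ and $E[\ln T] = \ln\lambda - \gamma/k$, with $\gamma$ the Euler-Mascheroni constant. This estimator is our choice for brevity (the abstract does not specify the authors' fitting procedure), the data are synthetic placeholders, and the paper's conversion to supersaturation at nucleation is not reproduced. The k-sample Anderson-Darling test that justifies pooling is available separately, for example as `scipy.stats.anderson_ksamp`.

```python
import math

import numpy as np


def fit_weibull_log_moments(times):
    """Estimate Weibull shape k and scale lam from positive samples.

    If T ~ Weibull(k, lam), then ln T has variance pi^2 / (6 k^2) and mean
    ln(lam) - gamma / k, where gamma is the Euler-Mascheroni constant.
    """
    log_t = np.log(np.asarray(times, dtype=float))
    k = math.pi / (math.sqrt(6.0) * log_t.std(ddof=1))
    lam = math.exp(log_t.mean() + np.euler_gamma / k)
    return k, lam


def weibull_cdf(t, k, lam):
    """Cumulative probability that nucleation has occurred by time t."""
    return 1.0 - np.exp(-(np.asarray(t, dtype=float) / lam) ** k)


def weibull_mean(k, lam):
    """Mean of a Weibull(k, lam) distribution: lam * Gamma(1 + 1/k)."""
    return lam * math.gamma(1.0 + 1.0 / k)


# Usage: pool datasets (e.g. one per microdroplet line) before fitting.
rng = np.random.default_rng(1)
lines = [3.0 * rng.weibull(2.5, size=5000) for _ in range(4)]  # synthetic placeholder data
k, lam = fit_weibull_log_moments(np.concatenate(lines))
```

Pooling only makes sense after a test such as Anderson-Darling fails to reject a common distribution across lines, which is the order of operations the abstract describes.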
arxiv.org

A Geometric Approach to Optimal Control of Hybrid and Impulsive Systems

Hybrid dynamical systems are systems that undergo both continuous and discrete transitions. The Bolza problem from optimal control theory is applied to these systems, and a hybrid version of Pontryagin's maximum principle is presented. This hybrid maximum principle is presented so as to emphasize its geometric nature, which makes its study amenable to the tools of geometric mechanics and symplectic geometry. One explicit benefit of this geometric approach is that Zeno behavior can be strongly controlled for "generic" control problems. Moreover, when the underlying control system is a mechanical impact system, additional structure is present, which can be exploited and is thus explored. Multiple examples are presented for both mechanical and non-mechanical systems.
MATHEMATICS
arxiv.org

Empirically estimating the distribution of the loudest candidate from a gravitational-wave search

Searches for gravitational-wave signals are often based on maximizing a detection statistic over a bank of waveform templates, covering a given parameter space with a variable level of correlation. Results are often evaluated using a noise-hypothesis test, where the background is characterized by the sampling distribution of the loudest template. In the context of continuous gravitational-wave searches, properly describing said distribution is an open problem: current approaches focus on a particular detection statistic and neglect template-bank correlations. We introduce a new approach using extreme value theory to describe the distribution of the loudest template's detection statistic in an arbitrary template bank. Our new proposal automatically generalizes to a wider class of detection statistics, including (but not limited to) line-robust statistics and transient continuous-wave signal hypotheses, and improves the estimation of the expected maximum detection statistic at a negligible computing cost. The performance of our proposal is demonstrated on simulated data as well as by applying it to different kinds of (transient) continuous-wave searches using O2 Advanced LIGO data. We release an accompanying Python software package, distromax, implementing our new developments.
PHYSICS
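A minimal sketch of the extreme-value idea, assuming a "batch maximum" construction: shuffle the per-template detection statistics, split them into batches, fit a Gumbel law to the batch maxima, and use the fact that the maximum of $N_b$ independent Gumbel$(\mu, \beta)$ variables is Gumbel$(\mu + \beta \ln N_b, \beta)$. This Python/NumPy code, its method-of-moments fit, and all names are illustrative assumptions; it is not the interface of the authors' distromax package, whose API should be taken from its own documentation.

```python
import numpy as np


def fit_gumbel_moments(samples):
    """Method-of-moments Gumbel fit: var = pi^2 beta^2 / 6, mean = mu + gamma * beta."""
    x = np.asarray(samples, dtype=float)
    beta = x.std(ddof=1) * np.sqrt(6.0) / np.pi
    mu = x.mean() - np.euler_gamma * beta
    return mu, beta


def loudest_candidate_distribution(statistics, n_batches, seed=0):
    """Gumbel (mu, beta) for the maximum over all templates, via batch maxima.

    The statistics are shuffled and split into n_batches batches; the batch maxima
    are fitted with a Gumbel law, and the maximum of n_batches i.i.d. Gumbel(mu, beta)
    variables is Gumbel(mu + beta * ln(n_batches), beta).
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(statistics, dtype=float))
    batch_max = np.array([b.max() for b in np.array_split(shuffled, n_batches)])
    mu, beta = fit_gumbel_moments(batch_max)
    return mu + beta * np.log(n_batches), beta


def gumbel_cdf(x, mu, beta):
    """Probability that the loudest candidate is below x under the fitted Gumbel law."""
    return np.exp(-np.exp(-(np.asarray(x, dtype=float) - mu) / beta))


# Usage with a synthetic stand-in for a template bank's detection statistics
noise_stats = np.random.default_rng(2).exponential(size=10**6)
mu, beta = loudest_candidate_distribution(noise_stats, n_batches=10_000)
expected_loudest = mu + np.euler_gamma * beta
```

The fit never assumes a particular detection statistic, only that batch maxima are approximately Gumbel, which is the sense in which an extreme-value approach generalizes across statistics.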
arxiv.org

Hyperfine Interaction in a MoS$_2$ Quantum Dot: Decoherence of a Spin-Valley Qubit

A successful and promising device for the physical implementation of electron spin-valley based qubits is the Transition Metal Dichalcogenide monolayer (TMD-ML) semiconductor quantum dot. The electron spin in TMD-ML semiconductor quantum dots can be isolated and controlled with high accuracy, but it still suffers from decoherence due to unavoidable coupling with the surrounding environment, such as nuclear spin environments. A common tool for investigating systems like the one considered in this work is the density matrix formalism; here we present an exact master equation for a time-dependent central spin (spin-qubit) system coupled to a nuclear spin bath through the hyperfine interaction. The master equation provides a unified description of the dynamics of the central spin. Analyzing it in more detail, we calculate the fidelity loss due to the Overhauser field from the hyperfine interaction over a wide range of numbers of nuclear spins $\mathcal{N}$.
PHYSICS
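The exact master equation described above is not reproduced here. As a much simpler point of comparison, a common textbook model treats the Overhauser field as quasi-static and Gaussian, assuming a strong external field along $z$ so that only the longitudinal component dephases the qubit. With $\mathcal{N}$ nuclear spins of spin $I$ coupled uniformly ($A_k = A/\mathcal{N}$, a box-model assumption), the field spread is $\sigma = A\sqrt{I(I+1)/(3\mathcal{N})}$, so fidelity loss slows as $\mathcal{N}$ grows. For $|+\rangle$, a precession angle $\phi$ gives fidelity $\cos^2(\phi/2)$, and averaging over $\omega \sim \mathcal{N}(0,\sigma^2)$ yields $F(t) = \tfrac12\left(1 + e^{-\sigma^2 t^2/2}\right)$. The Python/NumPy sketch below checks this formula by Monte Carlo; the model, the units with $\hbar = 1$, and all names are assumptions.

```python
import numpy as np


def overhauser_sigma(A, n_spins, nuclear_spin=0.5):
    """Std of the longitudinal Overhauser field (angular-frequency units).

    Box-model assumption: n_spins nuclei coupled uniformly with A_k = A / n_spins,
    giving sigma^2 = sum_k A_k^2 I(I+1) / 3 = A^2 I(I+1) / (3 n_spins).
    """
    I = nuclear_spin
    return A * np.sqrt(I * (I + 1) / (3.0 * n_spins))


def fidelity_quasistatic(t, sigma):
    """Analytic fidelity of |+> under a frozen Gaussian field along z."""
    t = np.asarray(t, dtype=float)
    return 0.5 * (1.0 + np.exp(-0.5 * (sigma * t) ** 2))


def fidelity_monte_carlo(t, sigma, n_samples=20000, seed=0):
    """Average of cos^2(omega t / 2) over random frequencies omega ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(0.0, sigma, size=n_samples)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.mean(np.cos(np.outer(t, omega) / 2.0) ** 2, axis=1)


sigma = overhauser_sigma(A=1.0, n_spins=100)  # hypothetical coupling, A = 1 in chosen units
times = np.array([0.0, 10.0, 20.0, 40.0])
print(fidelity_quasistatic(times, sigma))
print(fidelity_monte_carlo(times, sigma))
```

In this model the inhomogeneous dephasing time is $T_2^* = \sqrt{2}/\sigma$, which grows as $\sqrt{\mathcal{N}}$; a master-equation treatment like the paper's is needed to go beyond the frozen-bath picture.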
arxiv.org

A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence

Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences, a capability that plays an important part in steering generation interactively. However, editing subsequences mainly involves randomly resampling subsequences from a possible generation space. We propose a contextual latent space model (CLSM) that lets users explore subsequence generation with a sense of direction in the generation space, e.g., via interpolation, as well as explore variations, i.e., semantically similar possible subsequences. A context-informed prior and decoder constitute the generative model of CLSM, and a context position-informed encoder is the inference model. In experiments on a monophonic symbolic music dataset, we demonstrate that our contextual latent space is smoother in interpolation than baselines and that the quality of generated samples is superior to that of baseline models. The generation examples are available online.
COMPUTERS

