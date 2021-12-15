
Finite-Sample Analysis of Decentralized Q-Learning for Stochastic Games

By Zuguang Gao, Qianqian Ma, Tamer Başar, John R. Birge
arxiv.org
 4 days ago

Learning in stochastic games is arguably the most standard and fundamental setting in multi-agent reinforcement learning (MARL). In this paper, we consider decentralized MARL in stochastic games in the non-asymptotic regime. In particular, we establish the finite-sample complexity of fully decentralized Q-learning algorithms in a significant class...

arxiv.org

Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework

As a seminal tool in self-supervised representation learning, contrastive learning has gained unprecedented attention in recent years. In essence, contrastive learning aims to leverage pairs of positive and negative samples for representation learning, which relates to exploiting neighborhood information in a feature space. By investigating the connection between contrastive learning and neighborhood component analysis (NCA), we provide a novel stochastic nearest neighbor viewpoint of contrastive learning and subsequently propose a series of contrastive losses that outperform the existing ones. Under our proposed framework, we show a new methodology to design integrated contrastive losses that could simultaneously achieve good accuracy and robustness on downstream tasks. With the integrated framework, we achieve up to 6% improvement on the standard accuracy and 17% improvement on the adversarial accuracy.
arxiv.org

Robust Active Learning: Sample-Efficient Training of Robust Deep Learning Models

Active learning is an established technique to reduce the labeling cost of building high-quality machine learning models. A core component of active learning is the acquisition function that determines which data should be selected for annotation. State-of-the-art acquisition functions -- and more broadly, active learning techniques -- have been designed to maximize the clean performance (e.g. accuracy) and have disregarded robustness, an important quality property that has received increasing attention. Active learning, therefore, produces models that are accurate but not robust.
COMPUTERS
arxiv.org

A Risk-Averse Preview-based $Q$-Learning Algorithm: Application to Highway Driving of Autonomous Vehicles

A risk-averse preview-based $Q$-learning planner is presented for navigation of autonomous vehicles. To this end, the multi-lane road ahead of a vehicle is represented by a finite-state non-stationary Markov decision process (MDP). A risk assessment unit module is then presented that leverages the preview information provided by sensors along with a stochastic reachability module to assign reward values to the MDP states and update them as scenarios develop. A sampling-based risk-averse preview-based $Q$-learning algorithm is finally developed that generates samples using the preview information and reward function to learn risk-averse optimal planning strategies without actual interaction with the environment. The risk factor is imposed on the objective function to avoid fluctuation of the $Q$ values, which can jeopardize the vehicle's safety and/or performance. The overall hybrid automaton model of the system is leveraged to develop a feasibility check unit module that detects infeasible plans and enables the planner system to proactively react to changes in the environment. Theoretical results are provided to bound the number of samples required to guarantee $\epsilon$-optimal planning with high probability. Finally, to verify the efficiency of the presented algorithm, its implementation on highway driving of an autonomous vehicle in varying traffic density is considered.
arxiv.org

Hybrid Data-driven Framework for Shale Gas Production Performance Analysis via Game Theory, Machine Learning and Optimization Approaches

A comprehensive and precise analysis of shale gas production performance is crucial for evaluating resource potential, designing field development plans, and making investment decisions. However, quantitative analysis can be challenging because production performance is dominated by a complex interaction among a series of geological and engineering factors. In this study, we propose a hybrid data-driven procedure for analyzing shale gas production performance, which consists of a complete workflow for dominant factor analysis, production forecast, and development optimization. More specifically, game theory and machine learning models are coupled to determine the dominating geological and engineering factors. The Shapley value with definite physical meanings is employed to quantitatively measure the effects of individual factors. A multi-model-fused stacked model is trained for production forecast, on the basis of which derivative-free optimization algorithms are introduced to optimize the development plan. The complete workflow is validated with actual production data collected from the Fuling shale gas field, Sichuan Basin, China. The validation results show that the proposed procedure can draw rigorous conclusions with quantified evidence and thereby provide specific and reliable suggestions for development plan optimization. Compared with traditional and experience-based approaches, the hybrid data-driven procedure is advanced in terms of both efficiency and accuracy.
ENERGY INDUSTRY
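
The Shapley value mentioned in the shale gas abstract above can be illustrated on a toy example. The sketch below is not the authors' workflow: the factor names x1, x2, x3 and the value function toy_value are hypothetical, and the exact enumeration over orderings is only practical for a handful of factors.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    `value` maps a frozenset of players to a number (the worth of that coalition).
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when it joins the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical worth: x1 adds 10, x2 adds 4, and x1 with x2 together add 6 more."""
    worth = 0.0
    if "x1" in coalition:
        worth += 10.0
    if "x2" in coalition:
        worth += 4.0
    if "x1" in coalition and "x2" in coalition:
        worth += 6.0
    return worth


phi = shapley_values(["x1", "x2", "x3"], toy_value)
print(phi)
```

In this toy game the interaction term is split equally between x1 and x2, x3 receives nothing, and the values sum to the worth of the full coalition.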
arxiv.org

JueWu-MC: Playing Minecraft with Sample-efficient Hierarchical Reinforcement Learning

Learning rational behaviors in open-world games like Minecraft remains challenging for Reinforcement Learning (RL) research due to the compound challenge of partial observability, high-dimensional visual perception and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning which captures underlying relations between action and representation, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score ever.
VIDEO GAMES
arxiv.org

Adaptive smoothing mini-batch stochastic accelerated gradient method for nonsmooth convex stochastic composite optimization

This paper considers a class of convex constrained nonsmooth convex stochastic composite optimization problems whose objective function is given by the summation of a differentiable convex component, together with a general nonsmooth but convex component. The nonsmooth component is not required to have an easily obtainable proximal operator, or to have the max structure to which the smoothing technique in [Nesterov, 2005] applies. In order to solve problems of this type, we propose an adaptive smoothing mini-batch stochastic accelerated gradient (AdaSMSAG) method, which combines the stochastic approximation method, Nesterov's accelerated gradient method, and smoothing methods that allow general smoothing approximations. Convergence of the method is established. Moreover, the order of the worst-case iteration complexity is better than that of the state-of-the-art stochastic approximation methods. Numerical results are provided to illustrate the efficiency of the proposed AdaSMSAG method for a risk management problem in portfolio optimization and a family of Wasserstein distributionally robust support vector machine problems with real data.
COMPUTERS
arxiv.org

Fixed point ratios for finite primitive groups and applications

Let $G$ be a finite primitive permutation group on a set $\Omega$ and recall that the fixed point ratio of an element $x \in G$, denoted ${\rm fpr}(x)$, is the proportion of points in $\Omega$ fixed by $x$. Fixed point ratios in this setting have been studied for many decades, finding a wide range of applications. In this paper, we are interested in comparing ${\rm fpr}(x)$ with the order of $x$. Our main theorem classifies the triples $(G,\Omega,x)$ as above with the property that $x$ has prime order $r$ and ${\rm fpr}(x) > 1/(r+1)$. There are several applications. Firstly, we extend earlier work of Guralnick and Magaard by determining the primitive permutation groups of degree $m$ with minimal degree at most $2m/3$. Secondly, our main result plays a key role in recent work of the authors (together with Moretó and Navarro) on the commuting probability of $p$-elements in finite groups. Finally, we use our main theorem to investigate the minimal index of a primitive permutation group, which allows us to answer a question of Bhargava.
MATHEMATICS
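
The fixed point ratio defined in the abstract above is easy to compute for a single permutation. The sketch below is only an illustration of the definition: it represents a permutation of {0, ..., n-1} as a tuple of images (an assumed encoding) and does not check whether any group containing it is primitive.

```python
from fractions import Fraction


def fixed_point_ratio(perm):
    """Proportion of points fixed by `perm`, where perm[j] is the image of point j."""
    fixed = sum(1 for point, image in enumerate(perm) if point == image)
    return Fraction(fixed, len(perm))


def order(perm):
    """Smallest k >= 1 such that applying `perm` k times gives the identity."""
    identity = tuple(range(len(perm)))
    current, k = tuple(perm), 1
    while current != identity:
        # Compose once more with perm: new image of j is perm[current[j]].
        current = tuple(perm[image] for image in current)
        k += 1
    return k


# A transposition swapping points 0 and 1 on a set of 5 points.
x = (1, 0, 2, 3, 4)
r = order(x)
fpr = fixed_point_ratio(x)
print(r, fpr, fpr > Fraction(1, r + 1))
```

For this element the order is the prime 2 and the ratio 3/5 exceeds 1/(r+1) = 1/3, which is the inequality the main theorem of the paper is concerned with.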
arxiv.org

Data-driven stochastic model predictive control

We propose a novel data-driven stochastic model predictive control (MPC) algorithm to control linear time-invariant systems with additive stochastic disturbances in the dynamics. The scheme centers around repeated predictions and computations of optimal control inputs based on a non-parametric representation of the space of all possible trajectories, using the fundamental lemma from behavioral systems theory. This representation is based on a single measured input-state-disturbance trajectory generated by persistently exciting inputs and does not require any further identification step. Based on stochastic MPC ideas, we enforce the satisfaction of state constraints with a pre-specified probability level, allowing for a systematic trade-off between control performance and constraint satisfaction. The proposed data-driven stochastic MPC algorithm enables efficient control where robust methods are too conservative, which we demonstrate in a simulation example.
COMPUTERS
arxiv.org

Deblurring via Stochastic Refinement

Image deblurring is an ill-posed problem with multiple plausible solutions for a given input image. However, most existing methods produce a deterministic estimate of the clean image and are trained to minimize pixel-level distortion. These metrics are known to be poorly correlated with human perception, and often lead to unrealistic reconstructions. We present an alternative framework for blind deblurring based on conditional diffusion models. Unlike existing techniques, we train a stochastic sampler that refines the output of a deterministic predictor and is capable of producing a diverse set of plausible reconstructions for a given input. This leads to a significant improvement in perceptual quality over existing state-of-the-art methods across multiple standard benchmarks. Our predict-and-refine approach also enables much more efficient sampling compared to typical diffusion models. Combined with a carefully tuned network architecture and inference procedure, our method is competitive in terms of distortion metrics such as PSNR. These results show clear benefits of our diffusion-based method for deblurring and challenge the widely used strategy of producing a single, deterministic reconstruction.
COMPUTERS
arxiv.org

Gillespie algorithms for stochastic multiagent dynamics in populations and networks

Many multiagent dynamics, including various collective dynamics occurring on networks, can be modeled as a stochastic process in which the agents in the system change their state over time in interaction with each other. The Gillespie algorithms are popular algorithms that exactly simulate such stochastic multiagent dynamics when each state change is driven by a discrete event, the dynamics is defined in continuous time, and the stochastic law of event occurrence is governed by independent Poisson processes. In the first main part of this volume, we provide a tutorial on the Gillespie algorithms focusing on simulation of social multiagent dynamics occurring in populations and networks. We do not assume advanced knowledge of mathematics (or computer science or physics). We clarify why one should use the continuous-time models and the Gillespie algorithms in many cases, instead of easier-to-understand discrete-time models. In the remainder of this volume, we review recent extensions of the Gillespie algorithms aiming to add more reality to the model (i.e., non-Poissonian cases) or to speed up the simulations.
CODING & PROGRAMMING
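
As a rough illustration of the event-driven, continuous-time simulation described in the Gillespie abstract above, here is a minimal sketch of the direct method for a well-mixed SIR epidemic. The model choice, the function name gillespie_sir, and the parameter names beta and mu are assumptions made for this example and are not taken from the tutorial itself.

```python
import random


def gillespie_sir(n, i0, beta, mu, t_max, seed=0):
    """Direct-method Gillespie simulation of a well-mixed SIR model.

    n: population size, i0: initially infected, beta: infection rate,
    mu: recovery rate (assumed > 0), t_max: time horizon.
    Returns a list of (time, S, I, R) tuples, one per event.
    """
    rng = random.Random(seed)
    s, i, r = n - i0, i0, 0
    t = 0.0
    history = [(t, s, i, r)]
    while i > 0:
        rate_infect = beta * s * i / n
        rate_recover = mu * i
        total = rate_infect + rate_recover
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which event fires with probability proportional to its rate.
        if rng.random() * total < rate_infect:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
        history.append((t, s, i, r))
    return history


trajectory = gillespie_sir(n=100, i0=5, beta=0.3, mu=0.1, t_max=50.0, seed=1)
print(len(trajectory), trajectory[-1])
```

Each iteration draws one waiting time and one event, so time advances in irregular steps rather than on a fixed grid, which is the main contrast with discrete-time models.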
arxiv.org

Automated Side Channel Analysis of Media Software with Manifold Learning

The prosperous development of cloud computing and machine learning as a service has led to the widespread use of media software to process confidential media data. This paper explores an adversary's ability to launch side channel analyses (SCA) against media software to reconstruct confidential media inputs. Recent advances in representation learning and perceptual learning inspired us to consider the reconstruction of media inputs from side channel traces as a cross-modality manifold learning task that can be addressed in a unified manner with an autoencoder framework trained to learn the mapping between media inputs and side channel observations. We further enhance the autoencoder with attention to localize the program points that make the primary contribution to SCA, thus automatically pinpointing information-leakage points in media software. We also propose a novel and highly effective defensive technique called perception blinding that can perturb media inputs with perception masks and mitigate manifold learning-based SCA.
SOFTWARE
arxiv.org

SASG: Sparsification with Adaptive Stochastic Gradients for Communication-efficient Distributed Learning

Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the communication overhead for exchanging information such as stochastic gradients between different workers. Sparse communication with memory and the adaptive aggregation methodology are two successful frameworks among the various techniques proposed to address this issue. In this paper, we exploit the advantages of Sparse communication and Adaptive aggregated Stochastic Gradients to design a communication-efficient distributed algorithm named SASG. Specifically, we first determine the workers that need to communicate based on the adaptive aggregation rule and then sparsify the transmitted information. Therefore, our algorithm reduces both the overhead of communication rounds and the number of communication bits in the distributed system. We define an auxiliary sequence and give convergence results of the algorithm with the help of Lyapunov function analysis. Experiments on training deep neural networks show that our algorithm can significantly reduce the number of communication rounds and bits compared to the previous methods, with little or no impact on training and testing accuracy.
CODING & PROGRAMMING
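
The "sparse communication with memory" idea named in the SASG abstract above can be sketched with a generic top-k sparsifier that keeps the untransmitted remainder for the next round. This is an assumed, simplified stand-in: it is not the SASG algorithm and omits the adaptive aggregation rule entirely, and the function name top_k_with_memory is hypothetical.

```python
def top_k_with_memory(gradient, memory, k):
    """Keep the k largest-magnitude entries of (gradient + memory); remember the rest.

    Returns (sparse, new_memory), where sparse is what a worker would transmit
    and new_memory holds the entries that were held back this round.
    """
    corrected = [g + m for g, m in zip(gradient, memory)]
    # Indices of the k entries with the largest absolute value.
    keep = set(
        sorted(range(len(corrected)), key=lambda j: abs(corrected[j]), reverse=True)[:k]
    )
    sparse = [c if j in keep else 0.0 for j, c in enumerate(corrected)]
    new_memory = [0.0 if j in keep else c for j, c in enumerate(corrected)]
    return sparse, new_memory


grad = [0.5, -2.0, 0.1, 1.5]
sent, mem = top_k_with_memory(grad, [0.0, 0.0, 0.0, 0.0], k=2)
print(sent, mem)
```

Because the held-back entries are added to the next gradient, no coordinate is dropped permanently; it is only delayed until its accumulated magnitude is large enough to be sent.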
arxiv.org

Deep Q-Learning Market Makers in a Multi-Agent Simulated Stock Market

Market makers play a key role in financial markets by providing liquidity. They usually fill order books with buy and sell limit orders to provide traders with alternative price levels at which to operate. This paper focuses precisely on the study of these market makers' strategies from an agent-based perspective. In particular, we propose the application of Reinforcement Learning (RL) for the creation of intelligent market makers in simulated stock markets. This research analyzes how RL market maker agents behave in non-competitive (only one RL market maker learning at the same time) and competitive scenarios (multiple RL market makers learning at the same time), and how they adapt their strategies in a Sim2Real scope with interesting results. Furthermore, it covers the application of policy transfer between different experiments, describing the impact of competing environments on RL agents' performance. RL and deep RL techniques are shown to be profitable market maker approaches, leading to a better understanding of their behavior in stock markets.
MARKETS
arxiv.org

Sample Average Approximation for Stochastic Optimization with Dependent Data: Performance Guarantees and Tractability

Sample average approximation (SAA), a popular method for tractably solving stochastic optimization problems, enjoys strong asymptotic performance guarantees in settings with independent training samples. However, these guarantees are not known to hold generally with dependent samples, such as in online learning with time series data or distributed computing with Markovian training samples. In this paper, we show that SAA remains tractable when the distribution of unknown parameters is only observable through dependent instances and still enjoys asymptotic consistency and finite sample guarantees. Specifically, we provide a rigorous probability error analysis to derive $1 - \beta$ confidence bounds for the out-of-sample performance of SAA estimators and show that these estimators are asymptotically consistent. We then, using monotone operator theory, study the performance of a class of stochastic first-order algorithms trained on a dependent source of data. We show that approximation error for these algorithms is bounded and concentrates around zero, and establish deviation bounds for iterates when the underlying stochastic process is $\phi$-mixing. The algorithms presented can be used to handle numerically inconvenient loss functions such as the sum of a smooth and non-smooth function or of non-smooth functions with constraints. To illustrate the usefulness of our results, we present several stochastic versions of popular algorithms such as stochastic proximal gradient descent (S-PGD) and stochastic relaxed Peaceman--Rachford splitting algorithms (S-rPRS), along with numerical experiments.
COMPUTERS
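
The basic idea of sample average approximation in the abstract above is to replace an expected cost with its average over observed samples and minimize that average. The sketch below shows only this basic step on a tiny hand-written data set with a finite candidate grid; the function name saa_minimize, the squared-error cost, and the sample values are assumptions for illustration, and nothing here models the dependent-data setting the paper analyzes.

```python
def saa_minimize(cost, candidates, samples):
    """Return the candidate decision with the smallest sample-average cost."""

    def average_cost(x):
        # Empirical stand-in for the expectation E[cost(x, xi)].
        return sum(cost(x, xi) for xi in samples) / len(samples)

    return min(candidates, key=average_cost)


def squared_error(x, xi):
    """Hypothetical cost: squared distance between decision x and outcome xi."""
    return (x - xi) ** 2


observations = [1, 2, 3, 6]
best = saa_minimize(squared_error, candidates=range(0, 7), samples=observations)
print(best)
```

With a squared-error cost the sample-average minimizer is the sample mean, so the grid search picks the candidate closest to it.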
arxiv.org

Stochastic Vertex Cover with Few Queries

We study the minimum vertex cover problem in the following stochastic setting. Let $G$ be an arbitrary given graph, $p \in (0, 1]$ a parameter of the problem, and let $G_p$ be a random subgraph that includes each edge of $G$ independently with probability $p$. We are unaware of the realization $G_p$, but can learn if an edge $e$ exists in $G_p$ by querying it. The goal is to find an approximate minimum vertex cover (MVC) of $G_p$ by querying few edges of $G$ non-adaptively.
COMPUTERS
arxiv.org

A closed-measure approach to stochastic approximation

This paper introduces a new methodology to tackle the issue of the almost sure convergence of stochastic approximation algorithms defined from a differential inclusion. Under the assumption of slowly decaying step-sizes, we establish that the set of essential accumulation points of the iterates belongs to the Birkhoff center associated with the differential inclusion. Unlike previous works, our results do not rely on the notion of asymptotic pseudotrajectories introduced by Benaim-Hofbauer-Sorin, which is the predominant technique to tackle the convergence problem. They follow as a consequence of Young's superposition principle for closed measures. This perspective allows us to revisit certain results of Faure and Roth, and bridges the gap between Young's principle and the notion of invariant measure recently introduced by Bolte-Pauwels-Rios-Zertuche. As another asset, the proposed approach allows us to obtain sufficient conditions under which the velocities locally compensate around any essential accumulation point.
MATHEMATICS
arxiv.org

MetaCloth: Learning Unseen Tasks of Dense Fashion Landmark Detection from a Few Samples

Recent advanced methods for fashion landmark detection are mainly driven by training convolutional neural networks on large-scale fashion datasets, which have a large number of annotated landmarks. However, such large-scale annotations are difficult and expensive to obtain in real-world applications, thus models that can generalize well from a small amount of labelled data are desired. We investigate this problem of few-shot fashion landmark detection, where only a few labelled samples are available for an unseen task. This work proposes a novel framework named MetaCloth via meta-learning, which is able to learn unseen tasks of dense fashion landmark detection with only a few annotated samples. Unlike previous meta-learning work that focuses on solving "N-way K-shot" tasks, where each task predicts N classes by training with K annotated samples for each class (N is fixed for all seen and unseen tasks), a task in MetaCloth detects N different landmarks for different clothing categories using K samples, where N varies across tasks, because different clothing categories usually have different numbers of landmarks. Therefore, the number of parameters varies across seen and unseen tasks in MetaCloth. MetaCloth is carefully designed to dynamically generate different numbers of parameters for different tasks, and learn a generalizable feature extraction network from a few annotated samples with a set of good initialization parameters. Extensive experiments show that MetaCloth outperforms its counterparts by a large margin.
BEAUTY & FASHION
arxiv.org

Sampling rate-corrected analysis of irregularly sampled time series

By Tobias Braun, Cinthya N. Fernandez, Deniz Eroglu, Adam Hartland, Sebastian F. M. Breitenbach, Norbert Marwan

The analysis of irregularly sampled time series remains a challenging task requiring methods that account for continuous and abrupt changes of sampling resolution without introducing additional biases. The edit-distance is an effective metric to quantitatively compare time series segments of unequal length by computing the cost of transforming one segment into the other. We show that transformation costs generally exhibit a non-trivial relationship with local sampling rate. If the sampling resolution undergoes strong variations, this effect impedes unbiased comparison between different time episodes. We study the impact of this effect on recurrence quantification analysis, a framework that is well-suited for identifying regime shifts in nonlinear time series. A constrained randomization approach is put forward to correct for the biased recurrence quantification measures. This strategy involves the generation of a novel type of time series and time axis surrogates which we call sampling rate constrained (SRC) surrogates. We demonstrate the effectiveness of the proposed approach with a synthetic example and an irregularly sampled speleothem proxy record from Niue island in the central tropical Pacific. Application of the proposed correction scheme identifies a spurious transition that is solely imposed by an abrupt shift in sampling rate and uncovers periods of reduced seasonal rainfall predictability associated with enhanced ENSO and tropical cyclone activity.
SCIENCE
arxiv.org

A GMRT Narrowband vs. Wideband Analysis of the ACT-CL J0034.4+0225 Field Selected from the ACTPol Cluster Sample

By Sinenhlanhla P. Sikhosana, Kenda Knowles, C.H. Ishwara-Chandra, Matt Hilton, Kavilan Moodley, Neeraj Gupta

Low frequency radio observations of galaxy clusters are a useful probe of the non-thermal intracluster medium (ICM), through observations of diffuse radio emission such as radio halos and relics. Current formation theories cannot fully account for some of the observed properties of this emission. In this study, we focus on the development of interferometric techniques for extracting extended, faint diffuse emissions in the presence of bright, compact sources in wide-field and broadband continuum imaging data. We aim to apply these techniques to the study of radio halos, relics and radio mini-halos using a uniformly selected and complete sample of galaxy clusters selected via the Sunyaev-Zel'dovich (SZ) effect by the Atacama Cosmology Telescope (ACT) project, and its polarimetric extension (ACTPol). We use the upgraded Giant Metrewave Radio Telescope (uGMRT) for targeted radio observations of a sample of 40 clusters. We present an overview of our sample, confirm the detection of a radio halo in ACT-CL J0034.4+0225, and compare the narrowband and wideband analysis results for this cluster. Due to the complexity of the ACT-CL J0034.4+0225 field, we use three pipelines to process the wideband data. We conclude that the experimental SPAM wideband pipeline produces the best results for this particular field. However, due to the severe artefacts in the field, further analysis is required to improve the image quality.
ASTRONOMY
arxiv.org

Deterministic particle flows for constraining stochastic nonlinear systems

Devising optimal interventions for constraining stochastic systems is a challenging endeavour that has to confront the interplay between randomness and nonlinearity. Existing methods for identifying the necessary dynamical adjustments resort either to space discretising solutions of ensuing partial differential equations, or to iterative stochastic path sampling schemes. Yet, both approaches become computationally demanding for increasing system dimension. Here, we propose a generally applicable and practically feasible non-iterative methodology for obtaining optimal dynamical interventions for diffusive nonlinear systems. We estimate the necessary controls from an interacting particle approximation to the logarithmic gradient of two forward probability flows evolved following deterministic particle dynamics. Applied to several biologically inspired models, we show that our method provides the necessary optimal controls in settings with terminal-, transient-, or generalised collective-state constraints and arbitrary system dynamics.
SCIENCE

