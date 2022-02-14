
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences

By Aadirupa Saha, Pierre Gaillard
 2 days ago

We study the problem of $K$-armed dueling bandits for both stochastic and adversarial environments, where the goal of the learner is to aggregate information through relative preferences of pairs of decision points queried in an online sequential manner. We first propose a novel reduction from any (general) dueling bandits to multi-armed...

10 Best Online Courses to Learn Oracle and PL/SQL for Beginners

PL/SQL is a powerful programming language and an in-demand skill, given the popularity of the Oracle database. The "PL" stands for procedural language, and like other procedural languages it has conditional statements and loops. The good thing is that [SQL commands] are built into the language, which sets it apart from other programming languages and makes it a comprehensive solution for working with the Oracle database. This is a collection of the best online courses for you to learn Oracle database and PL/SQL.
CODING & PROGRAMMING
Non-Stationary Dueling Bandits

We study the non-stationary dueling bandits problem with $K$ arms, where the time horizon $T$ consists of $M$ stationary segments, each of which is associated with its own preference matrix. The learner repeatedly selects a pair of arms and observes a binary preference between them as feedback. To minimize the accumulated regret, the learner needs to pick the Condorcet winner of each stationary segment as often as possible, despite preference matrices and segment lengths being unknown. We propose the $\mathrm{Beat\, the\, Winner\, Reset}$ algorithm and prove a bound on its expected binary weak regret in the stationary case, which tightens the bound of current state-of-the-art algorithms. We also show a regret bound for the non-stationary case, without requiring knowledge of $M$ or $T$. We further propose and analyze two meta-algorithms, $\mathrm{DETECT}$ for weak regret and $\mathrm{Monitored\, Dueling\, Bandits}$ for strong regret, both based on a detection-window approach that can incorporate any dueling bandit algorithm as a black-box algorithm. Finally, we prove a worst-case lower bound for expected weak regret in the non-stationary case.
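
To make the notion of a Condorcet winner per stationary segment concrete, here is a minimal sketch. It assumes a preference matrix `P` in which `P[i][j]` is the probability that arm `i` beats arm `j`; the function name and the example matrices are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def condorcet_winner(P: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the arm that beats every other arm with probability > 1/2.

    P[i][j] is assumed to be the probability that arm i wins a duel
    against arm j. Returns None when no such arm exists.
    """
    k = len(P)
    for i in range(k):
        if all(P[i][j] > 0.5 for j in range(k) if j != i):
            return i
    return None


# Hypothetical preference matrices for two stationary segments:
# the Condorcet winner changes from arm 0 to arm 1 between segments.
segment_1 = [
    [0.5, 0.6, 0.7],
    [0.4, 0.5, 0.6],
    [0.3, 0.4, 0.5],
]
segment_2 = [
    [0.5, 0.3, 0.4],
    [0.7, 0.5, 0.8],
    [0.6, 0.2, 0.5],
]
# A cyclic matrix (0 beats 1, 1 beats 2, 2 beats 0) has no Condorcet winner.
cyclic = [
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.6],
    [0.6, 0.4, 0.5],
]

winners = [condorcet_winner(m) for m in (segment_1, segment_2, cyclic)]
```

In the non-stationary setting the learner does not see these matrices; it only observes duel outcomes, which is why a change of winner has to be detected from feedback.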
On learning Whittle index policy for restless bandits with scalable regret

Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as $\tilde O(\mathsf{S} \sqrt{\mathsf{A} T})$, where $\mathsf{S}$ is the size of the state space, $\mathsf{A}$ is the size of the action space, $T$ is the horizon, and the $\tilde{O}(\cdot)$ notation hides logarithmic terms. Due to the linear dependence on the size of the state space, these regret bounds are prohibitively large for resource allocation and scheduling problems. In this paper, we present a model-based RL algorithm for such problems which has scalable regret. In particular, we consider a restless bandit model, and propose a Thompson-sampling based learning algorithm which is tuned to the underlying structure of the model. We present two characterizations of the regret of the proposed algorithm with respect to the Whittle index policy. First, we show that for a restless bandit with $n$ arms and at most $m$ activations at each time, the regret scales either as $\tilde{O}(mn\sqrt{T})$ or $\tilde{O}(n^2 \sqrt{T})$ depending on the reward model. Second, under an additional technical assumption, we show that the regret scales as $\tilde{O}(n^{1.5} \sqrt{T})$. We present numerical examples to illustrate the salient features of the algorithm.
MARKETS
Communication Efficient Federated Learning for Generalized Linear Bandits

Contextual bandit algorithms have recently been studied under the federated learning setting to satisfy the demand of keeping data decentralized and pushing the learning of bandit models to the client side. But limited by the required communication efficiency, existing solutions are restricted to linear models to exploit their closed-form solutions for parameter estimation. Such a restricted model choice greatly hampers these algorithms' practical utility. In this paper, we take the first step toward addressing this challenge by studying generalized linear bandit models under a federated learning setting. We propose a communication-efficient solution framework that employs online regression for local updates and offline regression for global updates. We rigorously prove that, though the setting is more general and challenging, our algorithm can attain sub-linear rates in both regret and communication cost, which is also validated by our extensive empirical evaluations.
TECHNOLOGY
#Bandits#Dueling#Online Learning#Condorcet#Lg#Machine Learning
Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts

Contextual bandits are widely used in the study of learning-based control policies for finite action spaces. While the problem is well-studied for bandits with perfectly observed context vectors, little is known about the case of imperfectly observed contexts. For this setting, existing approaches are inapplicable and new conceptual and technical frameworks are required. We present an implementable posterior sampling algorithm for bandits with imperfect context observations and study its performance for learning optimal decisions. The provided numerical results relate the performance of the algorithm to different quantities of interest including the number of arms, dimensions, observation matrices, posterior rescaling factors, and signal-to-noise ratios. In general, the proposed algorithm learns efficiently from noisy, imperfect observations and takes actions accordingly. The insights the analyses provide, as well as the interesting future directions they point to, are also discussed.
COMPUTERS
CAUSPref: Causal Preference Learning for Out-of-Distribution Recommendation

Despite the tremendous development of recommender systems owing to recent progress in machine learning, current recommender systems are still vulnerable to distribution shifts of users and items in realistic scenarios, leading to a sharp decline in performance in testing environments. The problem is even more severe in many common applications where only implicit feedback from sparse data is available. Hence, it is crucial to promote the performance stability of recommendation methods across different environments. In this work, we first make a thorough analysis of the implicit recommendation problem from the viewpoint of out-of-distribution (OOD) generalization. Then, under the guidance of our theoretical analysis, we propose to incorporate a recommendation-specific DAG learner into a novel causal preference-based recommendation framework named CAUSPref, mainly consisting of causal learning of invariant user preference and anti-preference negative sampling to deal with implicit feedback. Extensive experimental results from real-world datasets clearly demonstrate that our approach surpasses the benchmark models significantly under different types of out-of-distribution settings, and show its impressive interpretability.
COMPUTERS
Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

We consider the regret minimization task in a dueling bandits problem with context information. In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information. We assume that the feedback process is determined by a linear stochastic transitivity model with contextualized utilities (CoLST), and the learner's task is to include the best arm (with highest latent context-dependent utility) in the duel. We propose a computationally efficient algorithm, $\texttt{CoLSTIM}$, which makes its choice based on imitating the feedback process using perturbed context-dependent utility estimates of the underlying CoLST model. If each arm is associated with a $d$-dimensional feature vector, we show that $\texttt{CoLSTIM}$ achieves a regret of order $\tilde O( \sqrt{dT})$ after $T$ learning rounds. Additionally, we establish the optimality of $\texttt{CoLSTIM}$ by showing a lower bound for the weak regret that refines the existing average regret analysis. Our experiments demonstrate its superiority over state-of-the-art algorithms for special cases of CoLST models.
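
The feedback model described above can be illustrated with a small sketch. It assumes a logistic comparison function applied to the difference of linear utilities, which is one common instance of a stochastic transitivity model; the abstract does not state which link function the paper uses, and the names `theta`, `preference_probability`, and `sample_duel` are hypothetical.

```python
import math
import random
from typing import Sequence


def utility(theta: Sequence[float], x: Sequence[float]) -> float:
    """Latent utility of an arm: inner product of weights and arm features."""
    return sum(t * f for t, f in zip(theta, x))


def preference_probability(
    theta: Sequence[float], x_i: Sequence[float], x_j: Sequence[float]
) -> float:
    """Probability that arm i beats arm j under an assumed logistic link."""
    diff = utility(theta, x_i) - utility(theta, x_j)
    return 1.0 / (1.0 + math.exp(-diff))


def sample_duel(
    theta: Sequence[float],
    x_i: Sequence[float],
    x_j: Sequence[float],
    rng: random.Random,
) -> bool:
    """Draw one noisy binary preference: True means arm i won the duel."""
    return rng.random() < preference_probability(theta, x_i, x_j)


# Hypothetical 2-dimensional example (d = 2).
theta = [1.0, -0.5]
arm_a = [1.0, 0.0]   # utility 1.0
arm_b = [0.0, 1.0]   # utility -0.5
rng = random.Random(0)
p_ab = preference_probability(theta, arm_a, arm_b)
wins = sum(sample_duel(theta, arm_a, arm_b, rng) for _ in range(1000))
```

The learner never observes `theta`; it only sees outcomes like those returned by `sample_duel`, which is why utilities have to be estimated from noisy duels.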
SCIENCE
Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets

We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset collected a priori. When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving. We propose a pessimism-based algorithm, dubbed as pessimistic minimax value iteration (PMVI), which overcomes the distributional shift by constructing pessimistic estimates of the value functions for both players and outputs a policy pair by solving NEs based on the two value functions. Furthermore, we establish a data-dependent upper bound on the suboptimality which recovers a sublinear rate without the assumption on uniform coverage of the dataset. We also prove an information-theoretical lower bound, which suggests that the data-dependent term in the upper bound is intrinsic. Our theoretical results also highlight a notion of "relative uncertainty", which characterizes the necessary and sufficient condition for achieving sample efficiency in offline MGs. To the best of our knowledge, we provide the first nearly minimax optimal result for offline MGs with function approximation.
COMPUTERS
Between Stochastic and Adversarial Online Convex Optimization: Improved Regret Bounds via Smoothness

Stochastic and adversarial data are two widely studied settings in online learning. But many optimization tasks are neither i.i.d. nor fully adversarial, which makes it of fundamental interest to get a better theoretical understanding of the world between these extremes. In this work we establish novel regret bounds for online convex optimization in a setting that interpolates between stochastic i.i.d. and fully adversarial losses. By exploiting smoothness of the expected losses, these bounds replace a dependence on the maximum gradient length by the variance of the gradients, which was previously known only for linear losses. In addition, they weaken the i.i.d. assumption by allowing adversarially poisoned rounds or shifts in the data distribution. To accomplish this goal, we introduce two key quantities associated with the loss sequence, that we call the cumulative stochastic variance and the adversarial variation. Our upper bounds are attained by instances of optimistic follow the regularized leader, and we design adaptive learning rates that automatically adapt to the cumulative stochastic variance and adversarial variation. In the fully i.i.d. case, our bounds match the rates one would expect from results in stochastic acceleration, and in the fully adversarial case they gracefully deteriorate to match the minimax regret. We further provide lower bounds showing that our regret upper bounds are tight for all intermediate regimes for the cumulative stochastic variance and the adversarial variation.
CODING & PROGRAMMING
Information-Theoretic Analysis of Minimax Excess Risk

Two main concepts studied in machine learning theory are generalization gap (difference between train and test error) and excess risk (difference between test error and the minimum possible error). While information-theoretic tools have been used extensively to study the generalization gap of learning algorithms, the information-theoretic nature of excess risk has not yet been fully investigated. In this paper, some steps are taken toward this goal. We consider the frequentist problem of minimax excess risk as a zero-sum game between algorithm designer and the world. Then, we argue that it is desirable to modify this game in a way that the order of play can be swapped. We prove that, under some regularity conditions, if the world and designer can play randomly the duality gap is zero and the order of play can be changed. In this case, a Bayesian problem surfaces in the dual representation. This makes it possible to utilize recent information-theoretic results on minimum excess risk in Bayesian learning to provide bounds on the minimax excess risk. We demonstrate the applicability of the results by providing information theoretic insight on two important classes of problems: classification when the hypothesis space has finite VC-dimension, and regularized least squares.
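
The two quantities defined at the start of this abstract can be written out as simple arithmetic. The sketch below uses hypothetical error values chosen only for illustration; they do not come from the paper.

```python
def generalization_gap(train_error: float, test_error: float) -> float:
    """Difference between test error and train error."""
    return test_error - train_error


def excess_risk(test_error: float, min_possible_error: float) -> float:
    """Difference between test error and the minimum possible error."""
    return test_error - min_possible_error


# Hypothetical numbers: a model with 5% train error and 12% test error,
# on a problem where no predictor can do better than 8% error.
train_error = 0.05
test_error = 0.12
min_possible_error = 0.08

gap = generalization_gap(train_error, test_error)
risk = excess_risk(test_error, min_possible_error)
```

A small generalization gap does not imply a small excess risk: a model can match its training error at test time and still sit far above the minimum possible error.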
COMPUTERS
The Best Way to Learn Online? Be a Lurker

Lately I have been trying to get through the UN Intergovernmental Panel on Climate Change's big report, the one that came out late last year, called “Climate Change 2021: The Physical Science Basis.” It's a challenge because (a) I must, um, learn as I go, and (b) the PDF is nearly 4,000 pages of aggregated, footnoted, illustrated scientific consensus about weather. That's too much consensus; everyone agrees. Yet obviously they—the hundreds of IPCC-affiliated scientists who compiled this behemoth—want us to read it, right? The thing exists, so I should at least try. Plus it's free to download. I love a bargain.
TWITTER
Measuring frequency and period separations in red-giant stars using machine learning

By Siddharth Dhanpal, Othman Benomar, Shravan Hanasoge, Abhisek Kundu, Dattaraj Dhuri, Dipankar Das, Bharat Kaul

Asteroseismology is used to infer the interior physics of stars. The \textit{Kepler} and TESS space missions have provided a vast data set of red-giant light curves, which may be used for asteroseismic analysis. These data sets are expected to significantly grow with future missions such as \textit{PLATO}, and efficient methods are therefore required to analyze these data rapidly. Here, we describe a machine learning algorithm that identifies red giants from the raw oscillation spectra and captures \textit{p} and \textit{mixed} mode parameters from the red-giant power spectra. We report algorithmic inferences for large frequency separation ($\Delta \nu$), frequency at maximum amplitude ($\nu_{max}$), and period separation ($\Delta \Pi$) for an ensemble of stars. In addition, we have discovered $\sim$25 new probable red giants among 151,000 \textit{Kepler} long-cadence stellar-oscillation spectra analyzed by the method, among which four are binary candidates which appear to possess red-giant counterparts. To validate the results of this method, we selected $\sim$3,000 \textit{Kepler} stars, at various evolutionary stages ranging from subgiants to red clumps, and compared inferences of $\Delta \nu$, $\Delta \Pi$, and $\nu_{max}$ with estimates obtained using other techniques. The power of the machine-learning algorithm lies in its speed: it is able to accurately extract seismic parameters from 1,000 spectra in $\sim$5 seconds on a modern computer (single core of the Intel Xeon Platinum 8280 CPU).
ASTRONOMY
Exact Penalty Algorithm of Strong Convertible Nonconvex Optimization

This paper defines a strong convertible nonconvex (SCN) function for solving unconstrained optimization problems with nonconvex or nonsmooth (nondifferentiable) functions. First, many examples of SCN functions are given, where the SCN functions are nonconvex or nonsmooth. Second, the operational properties of SCN functions are proved, including addition, multiplication, compound operations and so on. Third, the SCN forms of some special functions common in machine learning and engineering applications are presented, where these SCN function optimization problems can be transformed into minmax problems with a convex and concave objective function. Fourth, a minmax optimization problem of an SCN function and its penalty function are defined. The optimization condition, exactness and stability of the minmax optimization problem are proved. Finally, a penalty function algorithm for solving the minmax optimization problem is given, together with its convergence. This paper provides an efficient technique for solving unconstrained nonconvex or nonsmooth (nondifferentiable) optimization problems that avoids subdifferentiation or smoothing techniques.
MATHEMATICS
TRADITIONAL VERSUS ONLINE LEARNING

Ever since the pandemic arrived in late 2019, countless schools across the country have been greatly affected by it. Graduations were postponed, sports were canceled, and other events hosted by schools were gone. MSUM was not an exception to the consequences of the pandemic. During that time, the college eliminated a list of programs due to financial reasons. This also resulted in positions, possibly associated with those eliminated programs, being cut. As of now, none of the casualties are even rumored to return to the school’s overall list of programs.
COLLEGES
A Statistical Learning View of Simple Kriging

In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees of the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
MATHEMATICS
Identifying strongly correlated groups of sections in a large motorway network

In a motorway network, correlations between the different links, i.e. between the parts of (different) motorways, are of considerable interest. Knowledge of fluxes and velocities on individual motorways is not sufficient; rather, their correlations determine or reflect, respectively, the functionality of and the dynamics on the network as a whole. These correlations are time dependent because the dynamics on the network is highly non-stationary: it varies strongly during the day and over the week. Correlations are indispensable to detect risks of failure in a traffic network. Discovery of alternative routes less correlated with the vulnerable ones helps to make the traffic network robust and to avoid a collapse. Hence, the identification of groups of strongly correlated road sections, in particular, is needed. To this end, we employ an optimized $k$-means clustering method. A major ingredient is the spectral information of certain correlation matrices in which the leading collective motion of the network has been removed. We identify strongly correlated groups of sections in the large motorway network of North Rhine-Westphalia (NRW), Germany. The groups classify the motorway sections in terms of spectral and geographic features as well as of traffic phases during different time periods. The representation and visualization of the groups on the real topology, i.e. on the road map, provides new results on the dynamics on the motorway network. Our approach is very general and can also be applied to other correlated complex systems.
TRAFFIC
New Penalized Stochastic Gradient Methods for Linearly Constrained Strongly Convex Optimization

For minimizing a strongly convex objective function subject to linear inequality constraints, we consider a penalty approach that allows one to utilize stochastic methods for problems with a large number of constraints and/or objective function terms. We provide upper bounds on the distance between the solutions to the original constrained problem and the penalty reformulations, guaranteeing the convergence of the proposed approach. We give a nested accelerated stochastic gradient method and propose a novel way for updating the smoothness parameter of the penalty function and the step-size. The proposed algorithm requires at most $\tilde O(1/\sqrt{\epsilon})$ expected stochastic gradient iterations to produce a solution within an expected distance of $\epsilon$ to the optimal solution of the original problem, which is the best complexity for this problem class to the best of our knowledge. We also show how to query an approximate dual solution after stochastically solving the penalty reformulations, leading to results on the convergence of the duality gap. Moreover, the nested structure of the algorithm and upper bounds on the distance to the optimal solutions allow one to safely eliminate constraints that are inactive at an optimal solution throughout the algorithm, which leads to improved complexity results. Finally, we present computational results that demonstrate the effectiveness and robustness of our algorithm.
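
The general idea of a penalty reformulation can be sketched on a toy problem. The example below is not the paper's nested accelerated stochastic method: it swaps in plain deterministic gradient descent with a quadratic penalty, on a hypothetical one-dimensional problem (minimize $(x-2)^2$ subject to $x \le 1$), purely to show how the penalized solution approaches the constrained one as the penalty weight grows.

```python
def penalized_gradient(x: float, rho: float) -> float:
    """Gradient of (x - 2)^2 + rho * max(0, x - 1)^2."""
    violation = max(0.0, x - 1.0)  # amount by which x <= 1 is violated
    return 2.0 * (x - 2.0) + 2.0 * rho * violation


def solve_penalized(rho: float, steps: int = 2000) -> float:
    """Minimize the penalized objective with fixed-step gradient descent."""
    step_size = 1.0 / (2.0 + 2.0 * rho)  # 1 / Lipschitz constant of the gradient
    x = 0.0
    for _ in range(steps):
        x -= step_size * penalized_gradient(x, rho)
    return x


# The constrained optimum is x* = 1. Setting the gradient to zero for x > 1
# gives the penalized optimum (2 + rho) / (1 + rho), so its distance to x*
# is 1 / (1 + rho) and shrinks as rho grows.
solutions = {rho: solve_penalized(rho) for rho in (1.0, 10.0, 100.0)}
distances = {rho: abs(x - 1.0) for rho, x in solutions.items()}
```

The abstract's upper bounds on the distance between constrained and penalized solutions play the role that the closed form $1/(1+\rho)$ plays in this toy case.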
SCIENCE
Towards machine learning for microscopic mechanisms: a formula search for crystal structure stability based on atomic properties

Machine Learning (ML) techniques are revolutionizing the way to perform efficient materials modeling. Nevertheless, not all ML approaches allow for the understanding of the microscopic mechanisms at play in different phenomena. To address the latter aspect, we propose a combinatorial machine-learning approach to obtain physical formulas based on simple and easily accessible ingredients, such as atomic properties. The latter are used to build materials features that are finally employed, through Linear Regression, to predict the energetic stability of semiconducting binary compounds with respect to zincblende and rocksalt crystal structures. The adopted models are trained using a dataset built from first-principles calculations. Our results show that one-dimensional (1D) formulas already describe the energetics well; a simple grid-search optimization of the automatically obtained 1D formulas enhances the prediction performance at a very small computational cost. In addition, our approach allows us to highlight the role of the different atomic properties involved in the formulas. The computed formulas clearly indicate that "spatial" atomic properties (i.e. radii indicating maximum probability densities for $s,p,d$ electronic shells) drive the stabilization of one crystal structure with respect to the other, suggesting the major relevance of the radius associated with the $p$-shell of the cation species.
COMPUTERS
Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods

Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent algorithms for solving min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. The success of the method led to several advanced extensions of the classical SGDA, including variants with arbitrary sampling, variance reduction, coordinate randomization, and distributed variants with compression, which were extensively studied in the literature, especially during the last few years. In this paper, we propose a unified convergence analysis that covers a large variety of stochastic gradient descent-ascent methods, which so far have required different intuitions, have different applications and have been developed separately in various communities. A key to our unified framework is a parametric assumption on the stochastic estimates. Via our general theoretical framework, we either recover the sharpest known rates for the known special cases or tighten them. Moreover, to illustrate the flexibility of our approach we develop several new variants of SGDA such as a new variance-reduced method (L-SVRGDA), new distributed methods with compression (QSGDA, DIANA-SGDA, VR-DIANA-SGDA), and a new method with coordinate randomization (SEGA-SGDA). Although variants of the new methods are known for solving minimization problems, they were never considered or analyzed for solving min-max problems and VIPs. We also demonstrate the most important properties of the new methods through extensive numerical experiments.
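
For readers unfamiliar with the classical method, here is a minimal sketch of plain SGDA, without any of the variance-reduction, compression, or coordinate-randomization extensions named in the abstract. It assumes a hypothetical strongly-convex-strongly-concave objective $f(x, y) = \tfrac{1}{2}x^2 + xy - \tfrac{1}{2}y^2$, whose saddle point is $(0, 0)$, and models stochasticity by adding Gaussian noise to the exact gradients.

```python
import random
from typing import Tuple


def sgda(
    x: float,
    y: float,
    step_size: float = 0.1,
    noise_std: float = 0.01,
    steps: int = 500,
    seed: int = 0,
) -> Tuple[float, float]:
    """Simultaneous stochastic gradient descent-ascent on
    f(x, y) = 0.5 * x**2 + x * y - 0.5 * y**2 (minimize in x, maximize in y).
    """
    rng = random.Random(seed)
    for _ in range(steps):
        # Noisy gradient estimates: exact partial derivatives plus Gaussian noise.
        grad_x = (x + y) + rng.gauss(0.0, noise_std)
        grad_y = (x - y) + rng.gauss(0.0, noise_std)
        # Descent step in x and ascent step in y, both from the old iterate.
        x, y = x - step_size * grad_x, y + step_size * grad_y
    return x, y


x_final, y_final = sgda(x=1.0, y=1.0)
```

With a constant step size the iterates settle in a small neighborhood of the saddle point rather than converging exactly, and the size of that neighborhood depends on the gradient noise.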
MATHEMATICS
How to learn a language online

Last year, I decided to start learning Korean. It was entirely on a whim — I don’t live in Korea and have no reason I’d ever need to go there. Nonetheless, it’s been an incredibly rewarding experience, and I’ve gotten to a point where I can speak, read, and write comfortably much faster than I ever thought I could.
CELL PHONES

