Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality

By Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu
arxiv.org
 2 days ago

Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, the problem still lacks a formal theoretical formulation. In this paper, we propose such...


Related
arxiv.org

FedSpace: An Efficient Federated Learning Framework at Satellites and Ground Stations

Large-scale deployments of low Earth orbit (LEO) satellites collect massive amounts of Earth imagery and sensor data, which can empower machine learning (ML) to address global challenges such as real-time disaster navigation and mitigation. However, it is often infeasible to download all the high-resolution images and train these ML models on the ground because of limited downlink bandwidth, sparse connectivity, and regularization constraints on the imagery resolution. To address these challenges, we leverage Federated Learning (FL), where ground stations and satellites collaboratively train a global ML model without sharing the captured images on the satellites. We show fundamental challenges in applying existing FL algorithms among satellites and ground stations, and we formulate an optimization problem which captures a unique trade-off between staleness and idleness. We propose a novel FL framework, named FedSpace, which dynamically schedules model aggregation based on the deterministic and time-varying connectivity according to satellite orbits. Extensive numerical evaluations based on real-world satellite images and satellite networks show that FedSpace reduces the training time by 1.7 days (38.6%) over the state-of-the-art FL algorithms.
AEROSPACE & DEFENSE
arxiv.org

HECO: Automatic Code Optimizations for Efficient Fully Homomorphic Encryption

In recent years, Fully Homomorphic Encryption (FHE) has undergone several breakthroughs and advancements leading to a leap in performance. Today, performance is no longer a major barrier to adoption. Instead, it is the complexity of developing an efficient FHE application that currently limits deploying FHE in practice and at scale. Several FHE compilers have emerged recently to ease FHE development. However, none of these answer how to automatically transform imperative programs to secure and efficient FHE implementations. This is a fundamental issue that needs to be addressed before we can realistically expect broader use of FHE. Automating these transformations is challenging because the restrictive set of operations in FHE and their non-intuitive performance characteristics require programs to be drastically transformed to achieve efficiency. In addition, existing tools are monolithic and focus on individual optimizations. Therefore, they fail to fully address the needs of end-to-end FHE development. In this paper, we present HECO, a new end-to-end design for FHE compilers that takes high-level imperative programs and emits efficient and secure FHE implementations. In our design, we take a broader view of FHE development, extending the scope of optimizations beyond the cryptographic challenges existing tools focus on.
CODING & PROGRAMMING
PC Gamer

Human reinforced learning could mean 'more truthful and less toxic' AI

AI has been making huge leaps in terms of scientific research, and companies like Nvidia and Meta are continuing to throw more resources towards the technology. But AI learning can have a pretty huge setback when it adopts the prejudices of those who make it. Like all those chatbots that wind up spewing hate speech thanks to their exposure to the criminally online.
SOFTWARE
arxiv.org

Security-Aware Virtual Network Embedding Algorithm based on Reinforcement Learning

Virtual network embedding (VNE) is a key problem in network virtualization (NV) technology, and research in this field still faces several problems. The traditional way to solve the VNE problem is to use a heuristic algorithm. However, this method relies on manual embedding rules, which do not match the actual conditions of VNE. In addition, as intelligent learning algorithms are increasingly used to solve the VNE problem, the heuristic approach is gradually becoming outdated. At the same time, VNE has security problems, yet no intelligent algorithm addresses them. For this reason, this paper proposes a security-aware VNE algorithm based on reinforcement learning (RL). In the training phase, we use a policy network as the learning agent and take the extracted attributes of the substrate nodes, arranged as a feature matrix, as input. The learning agent is trained in this environment to obtain the mapping probability of each substrate node. In the test phase, we map nodes according to the mapping probability and use a breadth-first strategy (BFS) to map links. For the security problem, we add a security-requirement-level constraint to each virtual node and a security-level constraint to each substrate node. A virtual node can only be embedded on a substrate node whose security level is not lower than the virtual node's required level. Experimental results show that the proposed algorithm is superior to other typical algorithms in terms of long-term average return, long-term revenue consumption ratio and virtual network request (VNR) acceptance rate.
SOFTWARE
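The security constraint described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the integer security levels, and the greedy choice of the highest-probability node are all assumptions made for illustration, and the mapping probabilities stand in for the output of the trained policy network.

```python
from typing import Dict, List, Optional


def eligible_nodes(required_level: int, substrate_levels: Dict[str, int]) -> List[str]:
    """Return substrate nodes whose security level is not lower than the requirement."""
    return sorted(n for n, lvl in substrate_levels.items() if lvl >= required_level)


def pick_node(
    required_level: int,
    substrate_levels: Dict[str, int],
    mapping_prob: Dict[str, float],
) -> Optional[str]:
    """Pick the eligible node with the highest mapping probability.

    Greedy selection is an assumption; the abstract only says nodes are
    mapped "according to the mapping probability".
    """
    candidates = eligible_nodes(required_level, substrate_levels)
    if not candidates:
        return None  # no substrate node satisfies the security requirement
    return max(candidates, key=lambda n: mapping_prob.get(n, 0.0))
```

With hypothetical levels `{"a": 1, "b": 3, "c": 2}` and a virtual node that requires level 2, node `a` is excluded even if the policy assigns it the highest probability.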
arxiv.org

GADGET: Online Resource Optimization for Scheduling Ring-All-Reduce Learning Jobs

Fueled by advances in distributed deep learning (DDL), recent years have witnessed a rapidly growing demand for resource-intensive distributed/parallel computing to process DDL computing jobs. To resolve network communication bottleneck and load balancing issues in distributed computing, the so-called "ring-all-reduce" decentralized architecture has been increasingly adopted to remove the need for dedicated parameter servers. To date, however, there remains a lack of theoretical understanding on how to design resource optimization algorithms for efficiently scheduling ring-all-reduce DDL jobs in computing clusters. This motivates us to fill this gap by proposing a series of new resource scheduling designs for ring-all-reduce DDL jobs. Our contributions in this paper are three-fold: i) We propose a new resource scheduling analytical model for ring-all-reduce deep learning, which covers a wide range of objectives in DDL performance optimization (e.g., excessive training avoidance, energy efficiency, fairness); ii) Based on the proposed performance analytical model, we develop an efficient resource scheduling algorithm called GADGET (greedy ring-all-reduce distributed graph embedding technique), which enjoys a provable strong performance guarantee; iii) We conduct extensive trace-driven experiments to demonstrate the effectiveness of the GADGET approach and its superiority over the state of the art.
TECHNOLOGY
arxiv.org

Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT

The continuously expanding scale of the industrial Internet of Things (IIoT) leads to IIoT equipment generating massive amounts of user data every moment. Depending on the requirements of end users, these data are usually highly heterogeneous and private, and most users are reluctant to expose them to public view. How to manage these time series data efficiently and safely in the field of IIoT is still an open issue that has attracted extensive attention from academia and industry. As a new machine learning (ML) paradigm, federated learning (FL) has great advantages in training on heterogeneous and private data. This paper studies applications of FL technology to manage IIoT equipment data in wireless network environments. In order to increase the model aggregation rate and reduce communication costs, we apply deep reinforcement learning (DRL) to the IIoT equipment selection process, specifically to select those IIoT equipment nodes with accurate models. Therefore, we propose an FL algorithm assisted by DRL, which can take into account the privacy and efficiency of data training of IIoT equipment. By analyzing the data characteristics of IIoT equipment, we use the MNIST, Fashion-MNIST and CIFAR-10 data sets to represent the data generated by IIoT. During the experiment, we employ a deep neural network (DNN) model to train on the data, and experimental results show that the accuracy can reach more than 97%, which corroborates the effectiveness of the proposed algorithm.
SOFTWARE
arxiv.org

Demystify Optimization and Generalization of Over-parameterized PAC-Bayesian Learning

PAC-Bayesian analysis is a framework in which the training error can be expressed as the weighted average of the hypotheses in the posterior distribution whilst incorporating prior knowledge. In addition to being a pure generalization bound analysis tool, a PAC-Bayesian bound can also be incorporated into an objective function to train a probabilistic neural network, making it a powerful and relevant framework that can numerically provide a tight generalization bound for supervised learning. For simplicity, we refer to probabilistic neural networks learned using training objectives derived from PAC-Bayesian bounds as PAC-Bayesian learning. Despite its empirical success, the theoretical analysis of PAC-Bayesian learning for neural networks is rarely explored. This paper proposes a new class of convergence and generalization analysis for PAC-Bayes learning when it is used to train over-parameterized neural networks by the gradient descent method. For a wide probabilistic neural network, we show that when PAC-Bayes learning is applied, the convergence result corresponds to solving a kernel ridge regression when the probabilistic neural tangent kernel (PNTK) is used as its kernel. Based on this finding, we further characterize the uniform PAC-Bayesian generalization bound, which improves over the Rademacher complexity-based bound for non-probabilistic neural networks. Finally, drawing insight from our theoretical results, we propose a proxy measure for efficient hyperparameter selection, which is proven to be time-saving.
CODING & PROGRAMMING
arxiv.org

Algorithms for Efficiently Learning Low-Rank Neural Networks

We study algorithms for learning low-rank neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices. First, we present a provably efficient algorithm which learns an optimal low-rank approximation to a single-hidden-layer ReLU network up to additive error $\epsilon$ with probability $\ge 1 - \delta$, given access to noiseless samples with Gaussian marginals in polynomial time and samples. Thus, we provide the first example of an algorithm which can efficiently learn a neural network up to additive error without assuming the ground truth is realizable. To solve this problem, we introduce an efficient SVD-based Nonlinear Kernel Projection algorithm for solving a nonlinear low-rank approximation problem over Gaussian space. Inspired by the efficiency of our algorithm, we propose a novel low-rank initialization framework for training low-rank deep networks, and prove that for ReLU networks, the gap between our method and existing schemes widens as the desired rank of the approximating weights decreases, or as the dimension of the inputs increases (the latter point holds when network width is superlinear in dimension). Finally, we validate our theory by training ResNet (He et al., 2016) and EfficientNet (Tan and Le, 2019) models on ImageNet (Russakovsky et al., 2015).
CODING & PROGRAMMING
arxiv.org

Model-Free Reinforcement Learning for Symbolic Automata-encoded Objectives

Reinforcement learning (RL) is a popular approach for robotic path planning in uncertain environments. However, the control policies trained for an RL agent crucially depend on user-defined, state-based reward functions. Poorly designed rewards can lead to policies that do get maximal rewards but fail to satisfy desired task objectives or are unsafe. There are several examples of the use of formal languages such as temporal logics and automata to specify high-level task specifications for robots (in lieu of Markovian rewards). Recent efforts have focused on inferring state-based rewards from formal specifications; here, the goal is to provide (probabilistic) guarantees that the policy learned using RL (with the inferred rewards) satisfies the high-level formal specification. A key drawback of several of these techniques is that the rewards that they infer are sparse: the agent receives positive rewards only upon completion of the task and no rewards otherwise. This naturally leads to poor convergence properties and high variance during RL. In this work, we propose using formal specifications in the form of symbolic automata: these serve as a generalization of both bounded-time temporal logic-based specifications as well as automata. Furthermore, our use of symbolic automata allows us to define non-sparse potential-based rewards which empirically shape the reward surface, leading to better convergence during RL. We also show that our potential-based rewarding strategy still allows us to obtain the policy that maximizes the satisfaction of the given specification.
SOFTWARE
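The idea of potential-based rewards derived from an automaton, as described in the abstract above, can be sketched in Python using the standard shaping form r' = r + gamma * Phi(s') - Phi(s). This is a sketch under stated assumptions rather than the paper's construction: the toy automaton, the function names, and the choice of potential (negative graph distance to an accepting state) are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def distance_to_accept(
    transitions: Dict[str, Iterable[str]], accepting: Set[str]
) -> Dict[str, int]:
    """Breadth-first search on reversed edges: hops from each state to an accepting state."""
    reverse: Dict[str, Set[str]] = {}
    for src, dsts in transitions.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)
    dist = {q: 0 for q in accepting}
    queue = deque(accepting)
    while queue:
        q = queue.popleft()
        for p in reverse.get(q, ()):
            if p not in dist:
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist


def shaped_reward(reward: float, phi_s: float, phi_next: float, gamma: float = 0.99) -> float:
    """Standard potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s
```

With the potential set to the negative distance, a transition that moves the automaton one step closer to acceptance receives a positive shaped reward even when the environment reward is zero, which is what makes the signal non-sparse in this toy setting.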
arxiv.org

5G Network on Wings: A Deep Reinforcement Learning Approach to UAV-based Integrated Access and Backhaul

Fast and reliable wireless communication has become a critical demand in human life. When natural disasters strike, providing ubiquitous connectivity becomes challenging by using traditional wireless networks. In this context, unmanned aerial vehicle (UAV) based aerial networks offer a promising alternative for fast, flexible, and reliable wireless communications in mission-critical (MC) scenarios. Due to the unique characteristics such as mobility, flexible deployment, and rapid reconfiguration, drones can readily change location dynamically to provide on-demand communications to users on the ground in emergency scenarios. As a result, the usage of UAV base stations (UAV-BSs) has been considered as an appropriate approach for providing rapid connection in MC scenarios. In this paper, we study how to control a UAV-BS in both static and dynamic environments. We investigate a situation in which a macro BS is destroyed as a result of a natural disaster and a UAV-BS is deployed using integrated access and backhaul (IAB) technology to provide coverage for users in the disaster area. We present a data collection system, signaling procedures and machine learning applications for this use case. A deep reinforcement learning algorithm is developed to jointly optimize the tilt of the access and backhaul antennas of the UAV-BS as well as its three-dimensional placement. Evaluation results show that the proposed algorithm can autonomously navigate and configure the UAV-BS to satisfactorily serve the MC users on the ground.
TECHNOLOGY
arxiv.org

Almost Optimal Proper Learning and Testing Polynomials

We give the first almost optimal polynomial-time proper learning algorithm of Boolean sparse multivariate polynomial under the uniform distribution. For $s$-sparse polynomial over $n$ variables and $\epsilon=1/s^\beta$, $\beta>1$, our algorithm makes $$q_U=\left(\frac{s}{\epsilon}\right)^{\frac{\log \beta}{\beta}+O(\frac{1}{\beta})}+ \tilde O\left(s\right)\left(\log\frac{1}{\epsilon}\right)\log n$$ queries. Notice that our query complexity is sublinear in $1/\epsilon$ and almost linear in $s$. All previous algorithms have query complexity at least quadratic in $s$ and linear in $1/\epsilon$.
MATHEMATICS
arxiv.org

COIL: Constrained Optimization in Learned Latent Space -- Learning Representations for Valid Solutions

Constrained optimization problems can be difficult because their search spaces have properties not conducive to search, e.g., multimodality, discontinuities, or deception. To address such difficulties, considerable research has been performed on creating novel evolutionary algorithms or specialized genetic operators. However, if the representation that defined the search space could be altered such that it only permitted valid solutions that satisfied the constraints, the task of finding the optimal would be made more feasible without any need for specialized optimization algorithms. We propose the use of a Variational Autoencoder to learn such representations. We present Constrained Optimization in Latent Space (COIL), which uses a VAE to generate a learned latent representation from a dataset comprising samples from the valid region of the search space according to a constraint, thus enabling the optimizer to find the objective in the new space defined by the learned representation. We investigate the value of this approach on different constraint types and for different numbers of variables. We show that, compared to an identical GA using a standard representation, COIL with its learned latent representation can satisfy constraints and find solutions with distance to objective up to two orders of magnitude closer.
COMPUTERS
arxiv.org

Offline Reinforcement Learning for Mobile Notifications

Mobile notification systems have taken a major role in driving and maintaining user engagement for online platforms. They are interesting recommender systems to machine learning practitioners with more sequential and long-term feedback considerations. Most machine learning applications in notification systems are built around response-prediction models, trying to attribute both short-term impact and long-term impact to a notification decision. However, a user's experience depends on a sequence of notifications and attributing impact to a single notification is not always accurate, if not impossible. In this paper, we argue that reinforcement learning is a better framework for notification systems in terms of performance and iteration speed. We propose an offline reinforcement learning framework to optimize sequential notification decisions for driving user engagement. We describe a state-marginalized importance sampling policy evaluation approach, which can be used to evaluate the policy offline and tune learning hyperparameters. Through simulations that approximate the notifications ecosystem, we demonstrate the performance and benefits of the offline evaluation approach as a part of the reinforcement learning modeling approach. Finally, we collect data through online exploration in the production system, train an offline Double Deep Q-Network and launch a successful policy online. We also discuss the practical considerations and results obtained by deploying these policies for a large-scale recommendation system use-case.
SOFTWARE
arxiv.org

Mold into a Graph: Efficient Bayesian Optimization over Mixed-Spaces

Real-world optimization problems are generally not just black-box problems, but also involve mixed types of inputs in which discrete and continuous variables coexist. Such mixed-space optimization possesses the primary challenge of modeling complex interactions between the inputs. In this work, we propose a novel yet simple approach that entails exploiting the graph data structure to model the underlying relationship between variables, i.e., variables as nodes and interactions defined by edges. Then, a variational graph autoencoder is used to naturally take the interactions into account. We first provide empirical evidence of the existence of such graph structures and then suggest a joint framework of graph structure learning and latent space optimization to adaptively search for optimal graph connectivity. Experimental results demonstrate that our method shows remarkable performance, exceeding the existing approaches with significant computational efficiency for a number of synthetic and real-world tasks.
COMPUTERS
arxiv.org

Communication Efficient Federated Learning for Generalized Linear Bandits

Contextual bandit algorithms have been recently studied under the federated learning setting to satisfy the demand of keeping data decentralized and pushing the learning of bandit models to the client side. But limited by the required communication efficiency, existing solutions are restricted to linear models to exploit their closed-form solutions for parameter estimation. Such a restricted model choice greatly hampers these algorithms' practical utility. In this paper, we take the first step to addressing this challenge by studying generalized linear bandit models under a federated learning setting. We propose a communication-efficient solution framework that employs online regression for local update and offline regression for global update. We rigorously proved that, though the setting is more general and challenging, our algorithm can attain sub-linear rate in both regret and communication cost, which is also validated by our extensive empirical evaluations.
TECHNOLOGY
arxiv.org

Towards Training Reproducible Deep Learning Models

Reproducibility is an increasing concern in Artificial Intelligence (AI), particularly in the area of Deep Learning (DL). Being able to reproduce DL models is crucial for AI-based systems, as it is closely tied to various tasks like training, testing, debugging, and auditing. However, DL models are challenging to be reproduced due to issues like randomness in the software (e.g., DL algorithms) and non-determinism in the hardware (e.g., GPU). There are various practices to mitigate some of the aforementioned issues. However, many of them are either too intrusive or can only work for a specific usage context. In this paper, we propose a systematic approach to training reproducible DL models. Our approach includes three main parts: (1) a set of general criteria to thoroughly evaluate the reproducibility of DL models for two different domains, (2) a unified framework which leverages a record-and-replay technique to mitigate software-related randomness and a profile-and-patch technique to control hardware-related non-determinism, and (3) a reproducibility guideline which explains the rationales and the mitigation strategies on conducting a reproducible training process for DL models. Case study results show our approach can successfully reproduce six open source and one commercial DL models.
CODING & PROGRAMMING
arxiv.org

Reward is not enough: can we liberate AI from the reinforcement learning paradigm?

I present arguments against the hypothesis put forward by Silver, Singh, Precup, and Sutton: reward maximization is not enough to explain many activities associated with natural and artificial intelligence including knowledge, learning, perception, social intelligence, evolution, language, generalisation and imitation. I show such reductio ad lucrum has its intellectual origins in the political economy of Homo economicus and substantially overlaps with the radical version of behaviourism. I show why the reinforcement learning paradigm, despite its demonstrable usefulness in some practical application, is an incomplete framework for intelligence -- natural and artificial. Complexities of intelligent behaviour are not simply second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.
COMPUTERS
arxiv.org

Communication Efficient Federated Learning via Ordered ADMM in a Fully Decentralized Setting

The challenge of communication-efficient distributed optimization has attracted attention in recent years. In this paper, a communication efficient algorithm, called ordering-based alternating direction method of multipliers (OADMM) is devised in a general fully decentralized network setting where a worker can only exchange messages with neighbors. Compared to the classical ADMM, a key feature of OADMM is that transmissions are ordered among workers at each iteration such that a worker with the most informative data broadcasts its local variable to neighbors first, and neighbors who have not transmitted yet can update their local variables based on that received transmission. In OADMM, we prohibit workers from transmitting if their current local variables are not sufficiently different from their previously transmitted value. A variant of OADMM, called SOADMM, is proposed where transmissions are ordered but transmissions are never stopped for each node at each iteration. Numerical results demonstrate that given a targeted accuracy, OADMM can significantly reduce the number of communications compared to existing algorithms including ADMM. We also show numerically that SOADMM can accelerate convergence, resulting in communication savings compared to the classical ADMM.
COMPUTERS
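The two transmission rules described in the abstract above (ordering workers, and suppressing transmissions that differ too little from the last one sent) can be illustrated with a short Python sketch. The abstract does not define "most informative" or the suppression test, so the Euclidean change since the last transmission and the fixed threshold used here are assumptions, as are all function names.

```python
import math
from typing import Dict, List, Sequence


def should_transmit(
    current: Sequence[float], last_sent: Sequence[float], threshold: float
) -> bool:
    """Transmit only if the local variable moved at least `threshold` since the last send."""
    return math.dist(current, last_sent) >= threshold


def transmission_order(
    current: Dict[str, Sequence[float]],
    last_sent: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Order non-suppressed workers by how much their variable changed, largest first.

    Using the change magnitude as the informativeness measure is an assumption.
    """
    change = {w: math.dist(current[w], last_sent[w]) for w in current}
    active = [w for w in current if should_transmit(current[w], last_sent[w], threshold)]
    return sorted(active, key=lambda w: change[w], reverse=True)
```

Setting the threshold to zero keeps the ordering but never suppresses a transmission, which loosely mirrors the SOADMM variant mentioned in the abstract.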
arxiv.org

Machine Learning Aided Holistic Handover Optimization for Emerging Networks

In the wake of network densification and multi-band operation in emerging cellular networks, mobility and handover management is becoming a major bottleneck. The problem is further aggravated by the fact that holistic mobility management solutions for different types of handovers, namely inter-frequency and intra-frequency handovers, remain scarce. This paper presents a first mobility management solution that concurrently optimizes inter-frequency related A5 parameters and intra-frequency related A3 parameters. We analyze and optimize five parameters, namely A5-time to trigger (TTT), A5-threshold1, A5-threshold2, A3-TTT, and A3-offset, to jointly maximize three critical key performance indicators (KPIs): edge user reference signal received power (RSRP), handover success rate (HOSR) and load between frequency bands. In the absence of tractable analytical models due to system level complexity, we leverage machine learning to quantify the KPIs as a function of the mobility parameters. An XGBoost-based model has the best performance for edge RSRP and HOSR, while random forest outperforms others for load prediction. An analysis of the mobility parameters provides several insights: 1) there exists a strong coupling between A3 and A5 parameters; 2) an optimal set of parameters exists for each KPI; and 3) the optimal parameters vary for different KPIs. We also perform a SHAP-based sensitivity analysis to help resolve the parametric conflict between the KPIs. Finally, we formulate a maximization problem, show it is non-convex, and solve it utilizing simulated annealing (SA). Results indicate that the ML-based, SA-aided solution is more than 14x faster than the brute force approach with a slight loss in optimality.
SOFTWARE
