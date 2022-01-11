
Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics

By Swagat Kumar, Hayden Sampson, Ardhendu Behera
 7 days ago

This paper presents a benchmarking study of some of the state-of-the-art reinforcement learning algorithms used for solving two simulated vision-based robotics problems. The algorithms considered in this study include soft actor-critic (SAC), proximal policy optimization (PPO), interpolated policy gradients (IPG),...

Offline Reinforcement Learning for Road Traffic Control

Traffic signal control is an important problem in urban mobility with a significant potential of economic and environmental impact. While there is a growing interest in Reinforcement Learning (RL) for traffic control, the work so far has focussed on learning through interactions which, in practice, is costly. Instead, real experience data on traffic is available and could be exploited at minimal cost. Recent progress in offline or batch RL has enabled just that. Model-based offline RL methods, in particular, have been shown to generalize to the experience data much better than others. We build a model-based learning framework, A-DAC, which infers a Markov Decision Process (MDP) from a dataset with pessimistic costs built in to deal with data uncertainties. The costs are modeled through an adaptive shaping of rewards in the MDP which provides better regularization of data compared to the prior related work. A-DAC is evaluated on a complex signalized roundabout using multiple datasets varying in size and in batch collection policy. The evaluation results show that it is possible to build high performance control policies in a data efficient manner using simplistic batch collection policies.
Machine Learning: Algorithms, Models, and Applications

Jaydip Sen, Sidra Mehtab, Rajdeep Sen, Abhishek Dutta, Pooja Kherwa, Saheel Ahmed, Pranay Berry, Sahil Khurana, Sonali Singh, David W. W Cadotte, David W. Anderson, Kalum J. Ost, Racheal S. Akinbo, Oladunni A. Daramola, Bongs Lainjo. Recent times are witnessing rapid development in machine learning algorithm systems, especially in reinforcement...
A Light in the Dark: Deep Learning Practices for Industrial Computer Vision

In recent years, large pre-trained deep neural networks (DNNs) have revolutionized the field of computer vision (CV). Although these DNNs have been shown to be very well suited for general image recognition tasks, application in industry is often precluded for three reasons: 1) large pre-trained DNNs are built on hundreds of millions of parameters, making deployment on many devices impossible, 2) the underlying dataset for pre-training consists of general objects, while industrial cases often consist of very specific objects, such as structures on solar wafers, 3) potentially biased pre-trained DNNs raise legal issues for companies. As a remedy, we study neural networks for CV that we train from scratch. For this purpose, we use a real-world case from a solar wafer manufacturer. We find that our neural networks achieve similar performances as pre-trained DNNs, even though they consist of far fewer parameters and do not rely on third-party datasets.
Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks

During recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving or video games. In the search for more sample-efficient algorithms, a promising direction is to leverage as much external off-policy data as possible. One staple of this data-driven approach is to learn from expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method, able to leverage demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes, encouraging expert imitation and self-imitation. First, we give a reward bonus to the transitions coming from demonstrations to encourage the agent to match the demonstrated behaviour. Then, upon collecting a successful episode, we relabel its transitions with the same bonus before adding them to the replay buffer, encouraging the agent to also match its previous successes. Our experiments focus on manipulation robotics, specifically on three tasks for a 6 degrees-of-freedom robotic arm in simulation. We show that our method based on reward relabeling improves the performance of the base algorithm (SAC and DDPG) on these tasks, even in the absence of demonstrations. Furthermore, integrating into our method two improvements from previous works allows our approach to outperform all baselines.
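The relabelling step described in this abstract can be illustrated with a minimal sketch. The abstract does not say whether the bonus is added to the environment reward or replaces it, so the additive form below is an assumption, and the names `Transition`, `ReplayBuffer`, `add_demonstration`, and `add_online_episode` are hypothetical rather than taken from the authors' code.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    state: tuple
    action: tuple
    reward: float
    next_state: tuple
    done: bool


def relabel(episode: List[Transition], bonus: float) -> List[Transition]:
    """Return a copy of the episode with the bonus added to every reward.

    Assumption: the bonus is additive; the abstract does not specify this.
    """
    return [replace(t, reward=t.reward + bonus) for t in episode]


class ReplayBuffer:
    """Hypothetical buffer usable with any off-policy learner (e.g. SAC, DDPG)."""

    def __init__(self) -> None:
        self.storage: List[Transition] = []

    def add_demonstration(self, episode: List[Transition], bonus: float) -> None:
        # Expert transitions always receive the bonus (expert imitation).
        self.storage.extend(relabel(episode, bonus))

    def add_online_episode(
        self, episode: List[Transition], success: bool, bonus: float
    ) -> None:
        # Only successful online episodes are relabelled (self-imitation);
        # failed episodes are stored with their original rewards.
        self.storage.extend(relabel(episode, bonus) if success else episode)
```

Because the relabelling happens before transitions enter the buffer, the off-policy update rule itself is untouched in this sketch.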
Deep Learning Based Classification System For Recognizing Local Spinach

Mirajul Islam, Nushrat Jahan Ria, Jannatul Ferdous Ani, Abu Kaisar Mohammad Masum, Sheikh Abujar, Syed Akhter Hossain. A deep learning model gives an incredible result for image processing by studying from the trained dataset. Spinach is a leaf vegetable that contains vitamins and nutrients. Our research uses a deep learning method that can automatically identify spinach, with a dataset of 3785 images covering five species of spinach. Four Convolutional Neural Network (CNN) models were used to classify our spinach. These models give more accurate results for image classification. Before these models are applied, the image data are preprocessed through RGB conversion, filtering, resizing and rescaling, and categorization. After these steps, the image data are ready to be used in the classifier algorithms. The accuracy of these classifiers is between 98.68% and 99.79%. Among those models, VGG16 achieved the highest accuracy of 99.79%.
Review of Reinforcement Learning Papers #13

I present 4 publications from my research area: reinforcement learning. Let’s discuss them! Ye, W., Liu, S., Kurutach, T., Abbeel, P., & Gao, Y. (2021). Mastering Atari Games with Limited Data. arXiv preprint arXiv:2111.00210. EfficientZero is the name given by the authors to their new reinforcement learning algorithm. What...
Dynamic Price of Parking Service based on Deep Learning

The improvement of air quality in urban areas is one of the main concerns of public government bodies. This concern emerges from the evidence of a link between air quality and public health. Major efforts from government bodies in this area include monitoring and forecasting systems, banning more polluting motor vehicles, and traffic limitations during periods of low air quality. In this work, a proposal for dynamic prices in regulated parking services is presented. The dynamic prices in parking services must discourage motor vehicle parking when low air quality episodes are predicted. For this purpose, diverse deep learning strategies are evaluated. They have in common the use of collective air-quality measurements for forecasting labels about air quality in the city. The proposal is evaluated by using economic parameters and deep learning quality criteria in Madrid (Spain).
Reinforcement Learning to Solve NP-hard Problems: an Application to the CVRP

In this paper, we evaluate the use of Reinforcement Learning (RL) to solve a classic combinatorial optimization problem: the Capacitated Vehicle Routing Problem (CVRP). We formalize this problem in the RL framework and compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances. We measure the different approaches by the quality of the solution returned and the time required to return it. We found that despite not returning the best solution, the RL approach has many advantages over traditional solvers. First, the versatility of the framework allows the resolution of more complex combinatorial problems. Moreover, instead of trying to solve a specific instance of the problem, the RL algorithm learns the skills required to solve the problem. The trained policy can then almost instantly provide a solution to an unseen problem without having to solve it from scratch. Finally, the use of trained models makes the RL solver by far the fastest, and therefore makes this approach more suited for commercial use where the user experience is paramount. Techniques like Knowledge Transfer can also be used to improve the training efficiency of the algorithm and help solve bigger and more complex problems.
Sample Efficient Deep Reinforcement Learning via Uncertainty Estimation

In model-free deep reinforcement learning (RL) algorithms, using noisy value estimates to supervise policy evaluation and optimization is detrimental to the sample efficiency. As this noise is heteroscedastic, its effects can be mitigated using uncertainty-based weights in the optimization process. Previous methods rely on sampled ensembles, which do not capture all aspects of uncertainty. We provide a systematic analysis of the sources of uncertainty in the noisy supervision that occurs in RL, and introduce inverse-variance RL, a Bayesian framework which combines probabilistic ensembles and Batch Inverse Variance weighting. We propose a method whereby two complementary uncertainty estimation methods account for both the Q-value and the environment stochasticity to better mitigate the negative impacts of noisy supervision. Our results show significant improvement in terms of sample efficiency on discrete and continuous control tasks.
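To make the idea of uncertainty-based weights concrete, here is a minimal sketch of a generic inverse-variance weighted squared loss. It is not the paper's exact Batch Inverse Variance formulation: the stabilizing constant `eps`, the normalization over the batch, and the function names are assumptions made for illustration.

```python
from typing import List, Sequence


def inverse_variance_weights(variances: Sequence[float], eps: float = 1e-3) -> List[float]:
    """Weights proportional to 1 / (variance + eps), normalized to sum to 1.

    eps (an assumed hyperparameter) keeps near-zero variances from
    dominating the batch.
    """
    raw = [1.0 / (v + eps) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_td_loss(
    predictions: Sequence[float],
    targets: Sequence[float],
    target_variances: Sequence[float],
    eps: float = 1e-3,
) -> float:
    """Squared error in which noisier targets contribute less."""
    weights = inverse_variance_weights(target_variances, eps)
    return sum(w * (p - y) ** 2 for w, p, y in zip(weights, predictions, targets))
```

With equal variances the loss reduces to the ordinary mean squared error, so the weighting only changes the update when the target noise is heteroscedastic.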
An Efficient Contact Algorithm for Rigid/Deformable Interaction based on the Dual Mortar Method

In a wide range of practical problems, such as forming operations and impact tests, assuming that one of the contacting bodies is rigid is an excellent approximation to the physical phenomenon. In this work, the well-established dual mortar method is adopted to enforce interface constraints in the finite deformation frictionless contact of rigid and deformable bodies. The efficiency of the nonlinear contact algorithm proposed here is based on two main contributions. Firstly, a variational formulation of the method using the so-called Petrov-Galerkin scheme is investigated, as it unlocks a significant simplification by removing the need to explicitly evaluate the dual basis functions. The corresponding first-order dual mortar interpolation is presented in detail. Particular focus is, then, placed on the extension for second-order interpolation by employing a piecewise linear interpolation scheme, which critically retains the geometrical information of the finite element mesh. Secondly, a new definition for the nodal orthonormal moving frame attached to each contact node is suggested. It reduces the geometrical coupling between the nodes and consequently decreases the stiffness matrix bandwidth. The proposed contributions decrease the computational complexity of dual mortar methods for rigid/deformable interaction, especially in the three-dimensional setting, while preserving accuracy and robustness.
On robust risk-based active-learning algorithms for enhanced decision support

Classification models are a fundamental component of physical-asset management technologies such as structural health monitoring (SHM) systems and digital twins. Previous work introduced risk-based active learning, an online approach for the development of statistical classifiers that takes into account the decision-support context in which they are applied. Decision-making is considered by preferentially querying data labels according to expected value of perfect information (EVPI). Although several benefits are gained by adopting a risk-based active learning approach, including improved decision-making performance, the algorithms suffer from issues relating to sampling bias as a result of the guided querying process. This sampling bias ultimately manifests as a decline in decision-making performance during the later stages of active learning, which in turn corresponds to lost resource/utility.
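For readers unfamiliar with EVPI, the standard decision-theoretic definition is the expected utility of acting after the true state is revealed minus the expected utility of the best action chosen under uncertainty. The sketch below computes it for a discrete state; the function name and the utility table layout are hypothetical, and how the paper turns EVPI into a querying rule is not reproduced here.

```python
from typing import Sequence


def expected_value_of_perfect_information(
    state_probs: Sequence[float],
    utilities: Sequence[Sequence[float]],
) -> float:
    """EVPI for a discrete state y and a finite set of actions a.

    state_probs[j] is p(y = j); utilities[a][j] is the utility of
    taking action a when the true state is j.
    """
    # Best expected utility when the action is chosen before y is known.
    best_prior = max(
        sum(p * u for p, u in zip(state_probs, action_utils))
        for action_utils in utilities
    )
    # Expected utility when y is revealed first and the best action is taken.
    best_informed = sum(
        p * max(action_utils[j] for action_utils in utilities)
        for j, p in enumerate(state_probs)
    )
    return best_informed - best_prior
```

As a usage example with made-up numbers, take two equally likely states (healthy, damaged) and two actions: "do nothing" with utilities (0, -10) and "repair" with utilities (-2, -2). Acting without information yields at best -2, acting with perfect information yields -1 on average, so the EVPI is 1.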
An Efficient Multi-Indicator and Many-Objective Optimization Algorithm based on Two-Archive

Indicator-based algorithms are gaining prominence as traditional multi-objective optimization algorithms based on domination and decomposition struggle to solve many-objective optimization problems. However, previous indicator-based multi-objective optimization algorithms suffer from the following flaws: 1) The environment selection process takes a long time; 2) Additional parameters are usually necessary. As a result, this paper proposes a multi-indicator and multi-objective optimization algorithm based on two-archive (SRA3) that can efficiently select good individuals in environment selection based on indicator performance and uses an adaptive parameter strategy for parental selection without setting additional parameters. Then we normalized the algorithm and compared its performance before and after normalization, finding that normalization improved the algorithm's performance significantly. We also analyzed how normalizing affected the indicator-based algorithm and observed that the normalized $I_{\epsilon+}$ indicator is better at finding extreme solutions and can reduce the influence of each objective's different extent of contribution to the indicator due to its different scope. However, it also has a preference for extreme solutions, which causes the solution set to converge to the extremes. As a result, we give some suggestions for normalization. Then, on the DTLZ and WFG problems, we conducted experiments on 39 problems with 5, 10, and 15 objectives, and the results show that SRA3 has good convergence and diversity while maintaining high efficiency. Finally, we conducted experiments on the DTLZ and WFG problems with 20 and 25 objectives and found that the algorithm proposed in this paper is more competitive than other algorithms as the number of objectives increases.
Algorithm helps robots avoid obstacles in their path

If you've ever ordered a product from Amazon, chances are that a robot selected your purchase from a shelf, read the barcode and delivered it to the counter for packaging. Hopefully, it didn't collide with a human worker on its journey and lose its way. The odds of that happening...
Deep Learning-based Predictive Control of Battery Management for Frequency Regulation

This paper proposes a deep learning-based optimal battery management scheme for frequency regulation (FR) by integrating model predictive control (MPC), supervised learning (SL), reinforcement learning (RL), and high-fidelity battery models. By taking advantage of deep neural networks (DNNs), the derived DNN-approximated policy is computationally efficient in online implementation. The design procedure of the proposed scheme consists of two sequential processes: (1) the SL process, in which we first run a simulation with an MPC embedding a low-fidelity battery model to generate a training data set, and then, based on the generated data set, we optimize a DNN-approximated policy using SL algorithms; and (2) the RL process, in which we utilize RL algorithms to improve the performance of the DNN-approximated policy by balancing short-term economic incentives and long-term battery degradation. The SL process speeds up the subsequent RL process by providing a good initialization. By utilizing RL algorithms, one prominent property of the proposed scheme is that it can learn from the data generated by simulating the FR policy on the high-fidelity battery simulator to adjust the DNN-approximated policy, which is originally based on a low-fidelity battery model. A case study using real-world data of FR signals and prices is performed. Simulation results show that, compared to conventional MPC schemes, the proposed deep learning-based scheme can effectively achieve higher economic benefits of FR participation while maintaining lower online computational cost.
Solving Dynamic Graph Problems with Multi-Attention Deep Reinforcement Learning

Graph problems such as the traveling salesman problem or finding minimal Steiner trees are widely studied and used in data engineering and computer science. Typically, in real-world applications, the features of the graph tend to change over time, so finding a solution to the problem becomes challenging. The dynamic versions of many graph problems are key to a plethora of real-world problems in transportation, telecommunication, and social networks. In recent years, using deep learning techniques to find heuristic solutions for NP-hard graph combinatorial problems has gained much interest as these learned heuristics can find near-optimal solutions efficiently. However, most of the existing methods for learning heuristics focus on static graph problems. The dynamic nature makes NP-hard graph problems much more challenging to learn, and the existing methods fail to find reasonable solutions.
A Kernel-Expanded Stochastic Neural Network

The deep neural network suffers from many fundamental issues in machine learning. For example, it often gets trapped in a local minimum during training, and its prediction uncertainty is hard to assess. To address these issues, we propose the so-called kernel-expanded stochastic neural network (K-StoNet) model, which incorporates support vector regression (SVR) as the first hidden layer and reformulates the neural network as a latent variable model. The former maps the input vector into an infinite dimensional feature space via a radial basis function (RBF) kernel, ensuring the absence of local minima on its training loss surface. The latter breaks the high-dimensional nonconvex neural network training problem into a series of low-dimensional convex optimization problems, and allows its prediction uncertainty to be easily assessed. The K-StoNet can be easily trained using the imputation-regularized optimization (IRO) algorithm. Compared to traditional deep neural networks, K-StoNet possesses a theoretical guarantee to asymptotically converge to the global optimum and allows the prediction uncertainty to be easily assessed. The performance of the new model in training, prediction and uncertainty quantification is illustrated by simulated and real data examples.
Reinforcing Cybersecurity Hands-on Training With Adaptive Learning

This paper presents how learning experience influences students' capability to learn and their motivation for learning. Although each student is different, standard instruction methods do not adapt to individuals. Adaptive learning reverses this practice and attempts to improve the student experience. While adaptive learning is well-established in programming, it is rarely used in cybersecurity education. This paper is one of the first works investigating adaptive learning in security training. First, we analyze the performance of 95 students in 12 training sessions to understand the limitations of the current training practice. Less than half of the students completed the training without displaying a solution, and in only two sessions did all students complete all phases. Then, we simulate how students would proceed in one of the past training sessions if it offered more paths of varying difficulty. Based on this simulation, we propose a novel tutor model for adaptive training, which considers students' proficiency before and during an ongoing training session. The proficiency is assessed using a pre-training questionnaire and various in-training metrics. Finally, we conduct a study with 24 students and a new training using the proposed tutor model and adaptive training format. The results show that the adaptive training does not overwhelm students as the original static training does. Adaptive training enables students to enter several alternative training phases with lower difficulty than the original training. The proposed format is not restricted to a particular training. Therefore, it can be applied to practicing any security topic or even in related fields, such as networking or operating systems. Our study indicates that adaptive learning is a promising approach for improving the student experience in security education. We also highlight implications for educational practice.
HYLDA: End-to-end Hybrid Learning Domain Adaptation for LiDAR Semantic Segmentation

In this paper we address the problem of training a LiDAR semantic segmentation network using a fully-labeled source dataset and a target dataset that only has a small number of labels. To this end, we develop a novel image-to-image translation engine, and couple it with a LiDAR semantic segmentation network, resulting in an integrated domain adaptation architecture we call HYLDA. To train the system end-to-end, we adopt a diverse set of learning paradigms, including 1) self-supervision on a simple auxiliary reconstruction task, 2) semi-supervised training using a few available labeled target domain frames, and 3) unsupervised training on the fake translated images generated by the image-to-image translation stage, together with the labeled frames from the source domain. In the latter case, the semantic segmentation network participates in the updating of the image-to-image translation engine. We demonstrate experimentally that HYLDA effectively addresses the challenging problem of improving generalization on validation data from the target domain when only a few target labeled frames are available for training. We perform an extensive evaluation where we compare HYLDA against strong baseline methods using two publicly available LiDAR semantic segmentation datasets.
Reinforcement Online Learning to Rank with Unbiased Reward Shaping

Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users' interactions, such as clicks. Clicks however are a biased signal: specifically, top-ranked documents are likely to attract more clicks than documents down the ranking (position bias). In this paper, we propose a novel learning algorithm for OLTR that uses reinforcement learning to optimize rankers: Reinforcement Online Learning to Rank (ROLTR). In ROLTR, the gradients of the ranker are estimated based on the rewards assigned to clicked and unclicked documents. In order to correct for the position bias contained in the reward signals, we introduce unbiased reward shaping functions that exploit inverse propensity scoring for clicked and unclicked documents. The fact that our method can also model unclicked documents provides a further advantage in that fewer user interactions are required to effectively train a ranker, thus providing gains in efficiency. Empirical evaluation on standard OLTR datasets shows that ROLTR achieves state-of-the-art performance, and provides significantly better user experience than other OLTR approaches. To facilitate the reproducibility of our experiments, we make all experiment code available at this https URL.
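Inverse propensity scoring can be sketched for the clicked documents only. The position-based examination model with propensity (1/rank)^eta is a common assumption in the unbiased learning-to-rank literature, not something stated in this abstract, and the function names are hypothetical. The paper's shaping function for unclicked documents is not given in the abstract, so unclicked documents simply receive zero here.

```python
from typing import List, Sequence


def examination_propensity(rank: int, eta: float = 1.0) -> float:
    """Assumed probability that a user examines the document at a 1-based rank."""
    return (1.0 / rank) ** eta


def ips_click_rewards(clicks: Sequence[int], eta: float = 1.0) -> List[float]:
    """Reweight each click by the inverse of its examination propensity.

    clicks[i] is 1 if the document shown at rank i + 1 was clicked, else 0.
    Clicks at low positions are rarer under position bias, so they are
    weighted up; unclicked documents get 0 in this simplified sketch.
    """
    return [
        click / examination_propensity(rank, eta)
        for rank, click in enumerate(clicks, start=1)
    ]
```

Under this assumed model with eta = 1, a click at rank 3 counts three times as much as a click at rank 1, which is what offsets the lower chance that the third result is examined at all.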
Neuromorphic Vision Based Control for the Precise Positioning of Robotic Drilling Systems

The manufacturing industry is currently witnessing a paradigm shift with the unprecedented adoption of industrial robots, and machine vision is a key perception technology that enables these robots to perform precise operations in unstructured environments. However, the sensitivity of conventional vision sensors to lighting conditions and high-speed motion sets a limitation on the reliability and work-rate of production lines. Neuromorphic vision is a recent technology with the potential to address the challenges of conventional vision with its high temporal resolution, low latency, and wide dynamic range. In this paper and for the first time, we propose a novel neuromorphic vision based controller for faster and more reliable machining operations, and present a complete robotic system capable of performing drilling tasks with sub-millimeter accuracy. Our proposed system localizes the target workpiece in 3D using two perception stages that we developed specifically for the asynchronous output of neuromorphic cameras. The first stage performs multi-view reconstruction for an initial estimate of the workpiece's pose, and the second stage refines this estimate for a local region of the workpiece using circular hole detection. The robot then precisely positions the drilling end-effector and drills the target holes on the workpiece using a combined position-based and image-based visual servoing approach. The proposed solution is validated experimentally for drilling nutplate holes on workpieces placed arbitrarily in an unstructured environment with uncontrolled lighting. Experimental results prove the effectiveness of our solution with an average positional error of less than 0.1 mm, and demonstrate that the use of neuromorphic vision overcomes the lighting and speed limitations of conventional cameras.
