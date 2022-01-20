
Technology

Self-Awareness Safety of Deep Reinforcement Learning in Road Traffic Junction Driving

By Zehong Cao, Jie Yun
arxiv.org
 4 days ago

Autonomous driving has been at the forefront of public interest, and safety in the transportation system is a pivotal concern in the surrounding debate. Deep reinforcement learning (DRL) has been applied to autonomous driving to provide solutions for obstacle avoidance. However, in a road traffic junction scenario, the vehicle typically receives...

arxiv.org


Related
arxiv.org

Cooperative Multi-Agent Deep Reinforcement Learning for Reliable Surveillance via Autonomous Multi-UAV Control

CCTV-based surveillance using unmanned aerial vehicles (UAVs) is considered a key technology for security in smart city environments. This paper creates a case where the UAVs with CCTV cameras fly over the city area for flexible and reliable surveillance services. UAVs should be deployed to cover a large area while minimizing overlapping and shadow areas for a reliable surveillance system. However, the operation of UAVs is subject to high uncertainty, necessitating autonomous recovery systems. This work develops a multi-agent deep reinforcement learning-based management scheme for reliable industry surveillance in smart city applications. The core idea this paper employs is autonomously replenishing the UAV's deficient network requirements with communications. Via intensive simulations, our proposed algorithm outperforms the state-of-the-art algorithms in terms of surveillance coverage, user support capability, and computational costs.
TECHNOLOGY
arxiv.org

Hybrid Reinforcement Learning-Based Eco-Driving Strategy for Connected and Automated Vehicles at Signalized Intersections

Taking advantage of both vehicle-to-everything (V2X) communication and automated driving technology, connected and automated vehicles are quickly becoming one of the transformative solutions to many transportation problems. However, in a mixed traffic environment at signalized intersections, it is still a challenging task to improve overall throughput and energy efficiency considering the complexity and uncertainty in the traffic system. In this study, we proposed a hybrid reinforcement learning (HRL) framework which combines the rule-based strategy and deep reinforcement learning (deep RL) to support connected eco-driving at signalized intersections in mixed traffic. Vision-perceptive methods are integrated with vehicle-to-infrastructure (V2I) communications to achieve higher mobility and energy efficiency in mixed connected traffic. The HRL framework has three components: a rule-based driving manager that operates the collaboration between the rule-based policies and the RL policy; a multi-stream neural network that extracts the hidden features of vision and V2I information; and a deep RL-based policy network that generates both longitudinal and lateral eco-driving actions. In order to evaluate our approach, we developed a Unity-based simulator and designed a mixed-traffic intersection scenario. Moreover, several baselines were implemented to compare with our new design, and numerical experiments were conducted to test the performance of the HRL model. The experiments show that our HRL method can reduce energy consumption by 12.70% and save 11.75% travel time when compared with a state-of-the-art model-based Eco-Driving approach.
CARS
arxiv.org

Dynamic Cooperative Vehicle Platoon Control Considering Longitudinal and Lane-changing Dynamics

This paper presents a distributed cascade Proportional Integral Derivative (DCPID) control algorithm for the connected and automated vehicle (CAV) platoon considering the heterogeneity of CAVs in terms of the inertial lag. Furthermore, a real-time dynamic cooperative lane-changing model for CAVs, which can seamlessly combine the DCPID algorithm and the improved sine function, is developed. The DCPID algorithm determines the appropriate longitudinal acceleration and speed of the lane-changing vehicle considering the speed fluctuations of the front vehicle on the target lane (TFV). In the meantime, the sine function plans a reference trajectory which is further updated in real time using model predictive control (MPC) to avoid potential collisions until lane-changing is completed. Both the local and the asymptotic stability conditions of the DCPID algorithm are mathematically derived, and the sensitivity of the DCPID control parameters under different states is analyzed. Simulation experiments are conducted to assess the performance of the proposed model and the results indicate that the DCPID algorithm can provide robust control for tracking and adjusting the desired spacing and velocity for all 400 scenarios, even in the relatively extreme initial state. Besides, the proposed dynamic cooperative lane-changing model can guarantee effective and safe lane-changing at different speeds and even in emergency situations (such as the sudden deceleration of the TFV).
CARS
arxiv.org

Conservative Distributional Reinforcement Learning with Safety Constraints

Safety exploration can be regarded as a constrained Markov decision problem where the expected long-term cost is constrained. Previous off-policy algorithms convert the constrained optimization problem into the corresponding unconstrained dual problem by introducing the Lagrangian relaxation technique. However, the cost function of the above algorithms provides inaccurate estimations and causes the instability of the Lagrange multiplier learning. In this paper, we present a novel off-policy reinforcement learning algorithm called Conservative Distributional Maximum a Posteriori Policy Optimization (CDMPO). At first, to accurately judge whether the current situation satisfies the constraints, CDMPO adapts distributional reinforcement learning method to estimate the Q-function and C-function. Then, CDMPO uses a conservative value function loss to reduce the number of violations of constraints during the exploration process. In addition, we utilize Weighted Average Proportional Integral Derivative (WAPID) to update the Lagrange multiplier stably. Empirical results show that the proposed method has fewer violations of constraints in the early exploration process. The final test results also illustrate that our method has better risk control.
SCIENCE
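To make the Lagrangian idea in the abstract above concrete, here is a minimal sketch of a PID-style update for a Lagrange multiplier that penalizes constraint violations. It uses a plain PID rule, not the paper's Weighted Average PID (WAPID), whose details the abstract does not give. The class name, gain values, and cost limit are all hypothetical and chosen only for illustration.

```python
class PIDLagrangeMultiplier:
    """Plain PID update of a Lagrange multiplier (illustrative, not WAPID)."""

    def __init__(self, kp, ki, kd, cost_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.cost_limit = cost_limit  # allowed expected long-term cost
        self.integral = 0.0           # accumulated violation, kept non-negative
        self.prev_error = 0.0
        self.value = 0.0              # current multiplier, always >= 0

    def update(self, episode_cost):
        # Positive error means the constraint is currently violated.
        error = episode_cost - self.cost_limit
        self.integral = max(0.0, self.integral + error)
        # Only react to a growing violation, not to a shrinking one.
        derivative = max(0.0, error - self.prev_error)
        self.prev_error = error
        self.value = max(
            0.0,
            self.kp * error + self.ki * self.integral + self.kd * derivative,
        )
        return self.value


multiplier = PIDLagrangeMultiplier(kp=0.1, ki=0.01, kd=0.0, cost_limit=10.0)
first = multiplier.update(episode_cost=15.0)   # violation: multiplier rises
second = multiplier.update(episode_cost=5.0)   # satisfied: multiplier falls back
```

In a constrained policy update, the multiplier would typically weight the cost term in the objective, for example `reward_objective - multiplier.value * cost_objective`; that wiring is left out here.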
IN THIS ARTICLE
#Reinforcement Learning#Traffic Collisions#Deep Learning#Driving#Self Awareness#Drl#Dqn#Ppo#Machine Learning#Lg
GTNationEd

Helpful Road Safety Tips You Should Follow For Long Drives

According to research, America recorded about 2.21 million road accidents in 2018, the highest number of crashes worldwide that year. As a driver, it is therefore essential to be aware of the checks and safeguards designed to keep motorists and pedestrians safe. This way, you can prevent damage to your car, financial distress, severe injuries, and even death. Here are four essential road safety tips you should follow when going on a long drive.
TRAFFIC
cbs4indy.com

New traffic safety study released

The Advocates for Highway and Auto Safety says states across the country have work to do to make people safer on the roads. The group just released its 2022 Roadmap of State Highway Safety Laws.
TRAFFIC
arxiv.org

Sim-to-Lab-to-Real: Safe Reinforcement Learning with Shielding and Generalization Guarantees

Safety is a critical component of autonomous systems and remains a challenge for learning-based policies to be utilized in the real world. In particular, policies learned using reinforcement learning often fail to generalize to novel environments due to unsafe behavior. In this paper, we propose Sim-to-Lab-to-Real to safely close the reality gap. To improve safety, we apply a dual policy setup where a performance policy is trained using the cumulative task reward and a backup (safety) policy is trained by solving the reach-avoid Bellman Equation based on Hamilton-Jacobi reachability analysis. In Sim-to-Lab transfer, we apply a supervisory control scheme to shield unsafe actions during exploration; in Lab-to-Real transfer, we leverage the Probably Approximately Correct (PAC)-Bayes framework to provide lower bounds on the expected performance and safety of policies in unseen environments. We empirically study the proposed framework for ego-vision navigation in two types of indoor environments including a photo-realistic one. We also demonstrate strong generalization performance through hardware experiments in real indoor spaces with a quadrupedal robot. See this https URL for supplementary material.
COMPUTERS
arxiv.org

Deep reinforcement learning for large-eddy simulation modeling in wall-bounded turbulence

The development of a reliable subgrid-scale (SGS) model for large-eddy simulation (LES) is of great importance for many scientific and engineering applications. Recently, deep learning approaches have been tested for this purpose using high-fidelity data such as direct numerical simulation (DNS) in a supervised learning process. However, such data are generally not available in practice. Deep reinforcement learning (DRL) using only limited target statistics can be an alternative algorithm in which the training and testing of the model are conducted in the same LES environment. The DRL of turbulence modeling remains challenging owing to its chaotic nature, high dimensionality of the action space, and large computational cost. In the present study, we propose a physics-constrained DRL framework that can develop a deep neural network (DNN)-based SGS model for the LES of turbulent channel flow. The DRL models that produce the SGS stress were trained based on the local gradient of the filtered velocities. The developed SGS model automatically satisfies the reflectional invariance and wall boundary conditions without an extra training process so that DRL can quickly find the optimal policy. Furthermore, direct accumulation of reward, spatially and temporally correlated exploration, and the pre-training process are applied for the efficient and effective learning. In various environments, our DRL could discover SGS models that produce the viscous and Reynolds stress statistics perfectly consistent with the filtered DNS. By comparing various statistics obtained by the trained models and conventional SGS models, we present a possible interpretation of better performance of the DRL model.
COMPUTERS
Shropshire Star

Third of motorists ‘do not know Highway Code is changing next week’

A major revamp of the code includes the introduction of a hierarchy of road users and new rules at junctions. One in three drivers are unaware major changes to road rules aimed at protecting cyclists and pedestrians come into force next week, a new survey suggests. Some 33% of motorists...
TRAFFIC
upr.org

The Hyundai Sonata 2021: a look at the safety of self-driving cars

That’s the Sonata that Hyundai loaned us merging onto I-15. No complaints here. The 1.6 liter turbo is less than half the size of an SUV engine. Car and Driver got it 0-60 in 7.3 seconds, which is decent considering the trade-off of 37 miles per gallon highway. And when...
CARS
arxiv.org

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer. The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path...
COMPUTERS
arxiv.org

Reinforcement Learning in Time-Varying Systems: an Empirical Study

Recent research has turned to Reinforcement Learning (RL) to solve challenging decision problems, as an alternative to hand-tuned heuristics. RL can learn good policies without the need for modeling the environment's dynamics. Despite this promise, RL remains an impractical solution for many real-world systems problems. A particularly challenging case occurs when the environment changes over time, i.e. it exhibits non-stationarity. In this work, we characterize the challenges introduced by non-stationarity and develop a framework for addressing them to train RL agents in live systems. Such agents must explore and learn new environments, without hurting the system's performance, and remember them over time. To this end, our framework (1) identifies different environments encountered by the live system, (2) explores and trains a separate expert policy for each environment, and (3) employs safeguards to protect the system's performance. We apply our framework to two systems problems: straggler mitigation and adaptive video streaming, and evaluate it against a variety of alternative approaches using real-world and synthetic data. We show that each component of our framework is necessary to cope with non-stationarity.
COMPUTERS
arxiv.org

Environment Generation for Zero-Shot Compositional Reinforcement Learning

Many real-world problems are compositional: solving them requires completing interdependent sub-tasks, either in series or in parallel, that can be represented as a dependency graph. Deep reinforcement learning (RL) agents often struggle to learn such complex tasks due to the long time horizons and sparse rewards. To address this problem, we present Compositional Design of Environments (CoDE), which trains a Generator agent to automatically build a series of compositional tasks tailored to the RL agent's current skill level. This automatic curriculum not only enables the agent to learn more complex tasks than it could have otherwise, but also selects tasks where the agent's performance is weak, enhancing its robustness and ability to generalize zero-shot to unseen tasks at test-time. We analyze why current environment generation techniques are insufficient for the problem of generating compositional tasks, and propose a new algorithm that addresses these issues. Our results assess learning and generalization across multiple compositional tasks, including the real-world problem of learning to navigate and interact with web pages. We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments. We contribute two new benchmark frameworks for generating compositional tasks, compositional MiniGrid and gMiniWoB for web navigation. CoDE yields 4x higher success rate than the strongest baseline, and demonstrates strong performance of real websites learned on 3500 primitive tasks.
CODING & PROGRAMMING
arxiv.org

Reinforcement Learning based Air Combat Maneuver Generation

The advent of artificial intelligence technology has paved the way for much research in the air combat sector. Academics and many other researchers have investigated a prominent research direction, the autonomous maneuver decision of UAVs. This extensive research has produced some outcomes, but decisions that include Reinforcement Learning (RL) have turned out to be more efficient. Many studies and experiments have sought to make an agent reach its target in an optimal way, most prominently using Genetic Algorithm (GA), A star, RRT and various other optimization techniques. But Reinforcement Learning is well known for its success. In the DARPA Alpha Dogfight Trials, reinforcement learning prevailed against a real veteran F16 human pilot who was trained by Boeing. This successor model was developed by Heron Systems. After this accomplishment, reinforcement learning drew tremendous attention. In this research, we aimed to move our UAV, which has Dubins vehicle dynamics, to the target in two-dimensional space along an optimal path using Twin Delayed Deep Deterministic Policy Gradients (TD3), with Hindsight Experience Replay (HER) used for experience replay. We ran tests in two different environments using simulations.
TECHNOLOGY
arxiv.org

Enabling Deep Reinforcement Learning on Energy Constrained Devices at the Edge of the Network

Deep Reinforcement Learning (DRL) solutions are becoming pervasive at the edge of the network as they enable autonomous decision-making in a dynamic environment. However, to be able to adapt to the ever-changing environment, the DRL solution implemented on an embedded device has to continue to occasionally take exploratory actions even after initial convergence. In other words, the device has to occasionally take random actions and update the value function, i.e., re-train the Artificial Neural Network (ANN), to ensure its performance remains optimal. Unfortunately, embedded devices often lack the processing power and energy required to train the ANN. The energy aspect is particularly challenging when the edge device is powered only by means of Energy Harvesting (EH). To overcome this problem, we propose a two-part algorithm in which the DRL process is trained at the sink. Then the weights of the fully trained underlying ANN are periodically transferred to the EH-powered embedded device taking actions. Using an EH-powered sensor, a real-world measurements dataset, and optimizing for the Age of Information (AoI) metric, we demonstrate that such a DRL solution can operate without any degradation in performance, with only a few ANN updates per day.
COMPUTERS
Seattle Times

Robot trucks raising self-driving safety stakes

Shipping companies and software developers are experimenting with self-driving trucks as a way to solve a driver shortage worsened by the COVID-19 pandemic, drawing fire from safety advocates who call the technology a risk to motorists. J.B. Hunt Transport Services, Uber Technologies’ freight division and FedEx are among the operators...
TECHNOLOGY
arxiv.org

Criticality-Based Varying Step-Number Algorithm for Reinforcement Learning

In the context of reinforcement learning we introduce the concept of criticality of a state, which indicates the extent to which the choice of action in that particular state influences the expected return. That is, a state in which the choice of action is more likely to influence the final outcome is considered as more critical than a state in which it is less likely to influence the final outcome.
CODING & PROGRAMMING
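As a rough illustration of the criticality notion described in the abstract above, the sketch below scores a state by how much its estimated return depends on the chosen action. The proxy used here (the spread between the best and worst action-value estimate) is an assumption made for illustration and is not necessarily the paper's definition; the state names and Q-values are made up.

```python
def criticality(q_values):
    """Proxy for criticality: spread of action-value estimates in one state.

    A large spread means the choice of action strongly changes the
    expected return; a small spread means it hardly matters.
    """
    return max(q_values) - min(q_values)


# Hypothetical action-value estimates Q(s, a) for two states.
q_table = {
    "open_road": [1.00, 0.98, 0.99],  # every action is about equally good
    "junction": [1.0, -5.0, 0.2],     # one action is much worse than the rest
}

# Rank states from most to least critical under this proxy.
ranked = sorted(q_table, key=lambda s: criticality(q_table[s]), reverse=True)
```

Under this proxy the junction state ranks as more critical than the open-road state, matching the intuition that action choice matters most where outcomes diverge.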
arxiv.org

Reinforcement Learning to Solve NP-hard Problems: an Application to the CVRP

In this paper, we evaluate the use of Reinforcement Learning (RL) to solve a classic combinatorial optimization problem: the Capacitated Vehicle Routing Problem (CVRP). We formalize this problem in the RL framework and compare two of the most promising RL approaches with traditional solving techniques on a set of benchmark instances. We measure the different approaches by the quality of the solution returned and the time required to return it. We found that despite not returning the best solution, the RL approach has many advantages over traditional solvers. First, the versatility of the framework allows the resolution of more complex combinatorial problems. Moreover, instead of trying to solve a specific instance of the problem, the RL algorithm learns the skills required to solve the problem. The trained policy can then almost instantly provide a solution to an unseen problem without having to solve it from scratch. Finally, the use of trained models makes the RL solver by far the fastest, and therefore makes this approach more suited for commercial use where the user experience is paramount. Techniques like Knowledge Transfer can also be used to improve the training efficiency of the algorithm and help solve bigger and more complex problems.
ENGINEERING
arxiv.org

A Prescriptive Dirichlet Power Allocation Policy with Deep Reinforcement Learning

Prescribing optimal operation based on the condition of the system and, thereby, potentially prolonging the remaining useful lifetime has a large potential for actively managing the availability, maintenance and costs of complex systems. Reinforcement learning (RL) algorithms are particularly suitable for this type of problem given their learning capabilities. A special case of a prescriptive operation is the power allocation task, which can be considered as a sequential allocation problem, where the action space is bounded by a simplex constraint. A general continuous action-space solution of such sequential allocation problems has remained an open research question for RL algorithms. In continuous action-space, the standard Gaussian policy applied in reinforcement learning does not support simplex constraints, while the Gaussian-softmax policy introduces a bias during training. In this work, we propose the Dirichlet policy for continuous allocation tasks and analyze the bias and variance of its policy gradients. We demonstrate that the Dirichlet policy is bias-free and provides significantly faster convergence, better performance and better hyperparameter robustness over the Gaussian-softmax policy. Moreover, we demonstrate the applicability of the proposed algorithm on a prescriptive operation case, where we propose the Dirichlet power allocation policy and evaluate the performance on a case study of a set of multiple lithium-ion (Li-I) battery systems. The experimental results show the potential to prescribe optimal operation, improve the efficiency and sustainability of multi-power source systems.
COMPUTERS
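The simplex constraint mentioned in the abstract above can be seen in a few lines: a Dirichlet sample is non-negative and sums to one by construction, whereas independent Gaussian draws carry no such guarantee. This is a minimal sketch of sampling only, not the paper's policy network or training procedure; the concentration parameters and helper names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def on_simplex(x, tol=1e-9):
    """True if x is a valid allocation: non-negative entries that sum to 1."""
    return all(v >= 0.0 for v in x) and abs(sum(x) - 1.0) <= tol


rng = random.Random(0)
alpha = [2.0, 1.0, 0.5]  # hypothetical concentrations, one per power source

# Dirichlet action: a valid split of the total load across three sources.
allocation = sample_dirichlet(alpha, rng)

# Gaussian action for comparison: entries may be negative or fail to sum to 1,
# so it would need an extra projection or softmax step before use.
gaussian_action = [rng.gauss(1.0 / 3.0, 0.2) for _ in alpha]
```

In a learned policy, the concentration parameters would presumably be produced by a network from the current state; here they are fixed constants to keep the example self-contained.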
CAR AND DRIVER

IIHS to Add Safety Scores for Self-Driving Technology in Cars

There are no self-driving cars available to own today, but that isn't stopping automakers from hyping that these Level 2+ systems are offering more than they're capable of. Now, the Insurance Institute for Highway Safety (IIHS) wants to see just how good these "partial automation" technologies are at identifying distracted drivers, the ones who are supposed to be actively participating even if the car can change lanes or drive the speed of other cars automatically.
CARS