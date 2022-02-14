
Coding & Programming

Convex Programs and Lyapunov Functions for Reinforcement Learning: A Unified Perspective on the Analysis of Value-Based Methods

By Xingang Guo, Bin Hu
arxiv.org
 2 days ago

Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value...


Related
OEM Off-Highway

Eaton Cummins Automated Transmission Technologies Endurant XD Series Transmissions

Eaton Cummins Automated Transmission Technologies has released specifications for its Endurant XD series transmissions, which will include Endurant XD and Endurant XD Pro models. Purpose-built, high-performance automated transmissions designed for on-highway applications with high gross combined weight ratings, such as double and triple trailer trucks, and severe-duty on-/off-highway applications like...
CARS
scitechdaily.com

Crucial Superabsorption Breakthrough Unlocks Key to Next-Generation Quantum Batteries

Researchers at the University of Adelaide and their overseas partners have taken a key step in making quantum batteries a reality. They have successfully proved the concept of superabsorption, a crucial idea underpinning quantum batteries. “Quantum batteries, which use quantum mechanical principles to enhance their capabilities, require less charging time...
ENGINEERING
electropages.com

Researchers demonstrate scalable single-molecule sensors

Researchers have recently demonstrated a full-scale integrated circuit that can detect individual molecules. What challenges does individual molecular detection face, what did the researchers demonstrate, and where could this technology be used? What challenges does molecular detection face? When it comes to molecular science, the ingeniousness of researchers cannot go...
SCIENCE
PC Gamer

Human reinforced learning could mean 'more truthful and less toxic' AI

AI has been making huge leaps in terms of scientific research, and companies like Nvidia and Meta are continuing to throw more resources towards the technology. But AI learning can have a pretty huge setback when it adopts the prejudices of those who make it. Like all those chatbots that wind up spewing hate speech thanks to their exposure to the criminally online.
SOFTWARE
Freethink

Smartphone COVID test is as accurate as PCR test

If you’re like me, then you’ve shown up at Walgreens for an appointment for a COVID test, only to find they are out of tests or even closed. You check the hospital, walk-in clinic, school, and other local pharmacies — all the usual places to get a COVID test, but it is impossible. The wait is too long, or you don’t fit the criteria for testing.
CELL PHONES
Charleston Regional Business Journal

Viewpoint: Moving innovations out of the lab and into the market

Innovation ecosystems and economic growth in the U.S. and across the world depend on an efficient lab-to-market process. A lab can be a highly sophisticated and well-equipped place at a research university or federal research facility. On the other hand, it can be in a garage or a basement, which...
ECONOMY
arxiv.org

Relating cognition to both brain structure and function: A systematic review of methods

Cognitive neuroscience explores the mechanisms of cognition by studying its structural and functional brain correlates. Here, we report the first systematic review that assesses how information from structural and functional neuroimaging methods can be integrated to investigate the brain substrates of cognition. Web of Science and Scopus databases were searched for studies of healthy young adult populations that collected cognitive data, and structural and functional neuroimaging data. Five percent of screened studies met all inclusion criteria. Next, 54% of included studies related cognitive performance to brain structure and function without quantitative analysis of the relationship. Finally, 32% of studies formally integrated structural and functional brain data. Overall, many studies consider either structural or functional neural correlates of cognition, and in those that consider both, the two have rarely been integrated. We identified four emergent approaches to the characterisation of the relationship between brain structure, function and cognition: comparative, predictive, fusion and complementary. We discuss the insights provided by each approach and how authors can select approaches to suit their research questions.
SCIENCE
arxiv.org

Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT

The continuously expanding scale of the industrial Internet of Things (IIoT) leads to IIoT equipment generating massive amounts of user data every moment. Depending on the different requirements of end users, these data are usually highly heterogeneous and private, and most users are reluctant to expose them to public view. How to manage these time series data efficiently and safely in the field of IIoT is still an open issue, and it has attracted extensive attention from academia and industry. As a new machine learning (ML) paradigm, federated learning (FL) has great advantages in training on heterogeneous and private data. This paper studies applications of FL technology to managing IIoT equipment data in wireless network environments. In order to increase the model aggregation rate and reduce communication costs, we apply deep reinforcement learning (DRL) to the IIoT equipment selection process, specifically to select those IIoT equipment nodes with accurate models. We therefore propose an FL algorithm assisted by DRL, which can take into account both the privacy and the efficiency of data training for IIoT equipment. By analyzing the data characteristics of IIoT equipment, we use the MNIST, Fashion MNIST and CIFAR-10 data sets to represent the data generated by IIoT. During the experiment, we employ a deep neural network (DNN) model to train on the data, and experimental results show that the accuracy can reach more than 97%, which corroborates the effectiveness of the proposed algorithm.
SOFTWARE
arxiv.org

Energy Management Based on Multi-Agent Deep Reinforcement Learning for A Multi-Energy Industrial Park

Owing to large industrial energy consumption, industrial production has brought a huge burden to the grid in terms of renewable energy access and power supply. Due to the coupling of multiple energy sources and the uncertainty of renewable energy and demand, centralized methods require large calculation and coordination overhead. Thus, this paper proposes a multi-energy management framework achieved by decentralized execution and centralized training for an industrial park. The energy management problem is formulated as a partially-observable Markov decision process, which is intractable by dynamic programming due to the lack of the prior knowledge of the underlying stochastic process. The objective is to minimize long-term energy costs while ensuring the demand of users. To solve this issue and improve the calculation speed, a novel multi-agent deep reinforcement learning algorithm is proposed, which contains the following key points: counterfactual baseline for facilitating contributing agents to learn better policies, soft actor-critic for improving robustness and exploring optimal solutions. A novel reward is designed by Lagrange multiplier method to ensure the capacity constraints of energy storage. In addition, considering that the increase in the number of agents leads to performance degradation due to large observation spaces, an attention mechanism is introduced to enhance the stability of policy and enable agents to focus on important energy-related information, which improves the exploration efficiency of soft actor-critic. Numerical results based on actual data verify the performance of the proposed algorithm with high scalability, indicating that the industrial park can minimize energy costs under different demands.
ENERGY INDUSTRY
techxplore.com

Accessible social media analysis through web-based application programming interfaces

Web-based application programming interfaces (APIs) provide researchers studying online social networks with a sophisticated route into those networks that can allow them to study the activity of users in detail, given ethical constraints and specific limitations of the APIs. Writing in the International Journal of Services Operations and Informatics, a team from India reviews the state-of-the-art in this constantly evolving realm. In their review, they reveal the challenges that might be faced in using an API and the suitability of a given API for particular research purposes. They also discuss how social media analytical tools might be adopted to support knowledge-based business strategies.
INTERNET
arxiv.org

Security-Aware Virtual Network Embedding Algorithm based on Reinforcement Learning

The virtual network embedding (VNE) algorithm has always been a key problem in network virtualization (NV) technology. At present, research in this field still faces the following problems. The traditional way to solve the VNE problem is to use heuristic algorithms. However, this method relies on manual embedding rules, which do not accord with the actual situation of VNE. In addition, as the use of intelligent learning algorithms to solve the VNE problem has become a trend, this method is gradually becoming outdated. At the same time, there are security problems in VNE, and there is no intelligent algorithm to solve them. For this reason, this paper proposes a security-aware VNE algorithm based on reinforcement learning (RL). In the training phase, we use a policy network as a learning agent and take the extracted attributes of the substrate nodes to form a feature matrix as input. The learning agent is trained in this environment to get the mapping probability of each substrate node. In the test phase, we map nodes according to the mapping probability and use the breadth-first strategy (BFS) to map links. For the security problem, we add a security requirement level constraint for each virtual node and a security level constraint for each substrate node. Virtual nodes can only be embedded on substrate nodes whose security level is not lower than the required level. Experimental results show that the proposed algorithm is superior to other typical algorithms in terms of long-term average return, long-term revenue consumption ratio and virtual network request (VNR) acceptance rate.
SOFTWARE
arxiv.org

VNE Solution for Network Differentiated QoS and Security Requirements: From the Perspective of Deep Reinforcement Learning

The rapid development and deployment of network services has brought a series of challenges to researchers. On the one hand, the needs of Internet end users/applications reflect the characteristics of travel alienation, and they pursue different perspectives of service quality. On the other hand, with the explosive growth of information in the era of big data, a lot of private information is stored in the network. End users/applications naturally start to pay attention to network security. In order to meet the requirements of differentiated quality of service (QoS) and security, this paper proposes a virtual network embedding (VNE) algorithm based on deep reinforcement learning (DRL), aimed at the CPU, bandwidth, delay and security attributes of the substrate network. The DRL agent is trained in the network environment constructed from the above attributes. The purpose is to deduce the mapping probability of each substrate node and map the virtual nodes according to this probability. Finally, the breadth-first strategy (BFS) is used to map the virtual links. In the experimental stage, the DRL-based algorithm is compared with other representative algorithms in three aspects: long-term average revenue, long-term revenue consumption ratio and acceptance rate. The results show that the algorithm proposed in this paper achieves good experimental results, which indicates that the algorithm can be effectively applied to meet end user/application differentiated QoS and security requirements.
COMPUTERS
arxiv.org

Provable Reinforcement Learning with a Short-Term Memory

Real-world sequential decision making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded by the most recent history of a short length $m$. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length $m$ rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.
COMPUTERS
arxiv.org

Comprehensive survey of computational learning methods for analysis of gene expression data in genomics

Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of the gene expression data. However, more complex analysis for classification and discovery of feature genes or sample observations requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarray data, they can also be applied for the analysis of RNA sequencing or quantitative proteomics datasets. We specifically discuss methods for missing value (gene expression) imputation, feature gene scaling, selection and extraction of features for dimensionality reduction, and learning and analysis of expression data. We discuss the types of missing values and the methods and approaches usually employed in their imputation. We also discuss methods of data transformation and feature scaling, viz. normalization and standardization. Various approaches used in feature selection and extraction are also reviewed. Lastly, learning and analysis methods including class comparison, class prediction, and class discovery along with their evaluation parameters are described in detail. We have described the process of generating microarray gene expression data along with advantages and limitations of the above-mentioned techniques. We believe that this detailed review will help users select appropriate methods based on the type of data and the expected outcome.
SCIENCE
HackerNoon

Choosing a Computing Method: a Serverless SWOT Analysis

This is an attempt to create a decision framework and break down arguments for and against using serverless vs. other computing models. Strengths: Developers only focus on the business logic, which drastically increases development speed and shortens time to market. Weaknesses: It's difficult to test locally and it's hard to navigate debugging data. The benefits of serverless outweigh the downsides, and if you make sure to structure your software right, they greatly outweigh them. If you have any feedback, the comment section is all yours!
SOFTWARE
arxiv.org

Federated Reinforcement Learning for Collective Navigation of Robotic Swarms

The recent advancement of Deep Reinforcement Learning (DRL) has contributed to robotics by allowing automatic controller design. Automatic controller design is a crucial approach for designing swarm robotic systems, which require a more complex controller than a single-robot system to produce a desired collective behaviour. Although the DRL-based controller design method has shown its effectiveness, the reliance on a central training server is a critical problem in real-world environments where robot-server communication is unstable or limited. We propose a novel Federated Learning (FL) based DRL training strategy for use in swarm robotic applications. As FL reduces the number of robot-server communications by only sharing neural network model weights, not local data samples, the proposed strategy reduces the reliance on the central server during controller training with DRL. The experimental results from the collective learning scenario showed that the proposed FL-based strategy dramatically reduced the number of communications, by at least 1600 times, and even increased the success rate of navigation with the trained controller by 2.8 times compared to the baseline strategies that share a central server. The results suggest that our proposed strategy can efficiently train swarm robotic systems in real-world environments with limited robot-server communication, e.g. agri-robotics, underwater and damaged nuclear facilities.
COMPUTERS
arxiv.org

Reward is not enough: can we liberate AI from the reinforcement learning paradigm?

I present arguments against the hypothesis put forward by Silver, Singh, Precup, and Sutton: reward maximization is not enough to explain many activities associated with natural and artificial intelligence including knowledge, learning, perception, social intelligence, evolution, language, generalisation and imitation. I show such reductio ad lucrum has its intellectual origins in the political economy of Homo economicus and substantially overlaps with the radical version of behaviourism. I show why the reinforcement learning paradigm, despite its demonstrable usefulness in some practical applications, is an incomplete framework for intelligence -- natural and artificial. Complexities of intelligent behaviour are not simply second-order complications on top of reward maximisation. This fact has profound implications for the development of practically usable, smart, safe and robust artificially intelligent agents.
COMPUTERS
arxiv.org

A Reinforcement Learning Framework for PQoS in a Teleoperated Driving Scenario

In recent years, autonomous networks have been designed with Predictive Quality of Service (PQoS) in mind, as a means for applications operating in the industrial and/or automotive sectors to predict unanticipated Quality of Service (QoS) changes and react accordingly. In this context, Reinforcement Learning (RL) has come out as a promising approach to perform accurate predictions, and optimize the efficiency and adaptability of wireless networks. Along these lines, in this paper we propose the design of a new entity, implemented at the RAN-level that, with the support of an RL framework, implements PQoS functionalities. Specifically, we focus on the design of the reward function of the learning agent, able to convert QoS estimates into appropriate countermeasures if QoS requirements are not satisfied. We demonstrate via ns-3 simulations that our approach achieves the best trade-off in terms of QoS and Quality of Experience (QoE) performance of end users in a teleoperated-driving-like scenario, compared to other baseline solutions.
COMPUTERS
arxiv.org

Machine Learning Method for Functional Assessment of Retinal Models

Nikolas Papadopoulos, Nikos Melanitis, Antonio Lozano, Cristina Soto-Sanchez, Eduardo Fernandez, Konstantina S Nikita. Challenges in the field of retinal prostheses motivate the development of retinal models to accurately simulate Retinal Ganglion Cells (RGCs) responses. The goal of retinal prostheses is to enable blind individuals to solve complex, real-life visual tasks. In this paper, we introduce the functional assessment (FA) of retinal models, which describes the concept of evaluating the performance of retinal models on visual understanding tasks. We present a machine learning method for FA: we feed traditional machine learning classifiers with RGC responses generated by retinal models, to solve object and digit recognition tasks (CIFAR-10, MNIST, Fashion MNIST, Imagenette). We examined critical FA aspects, including how the performance of FA depends on the task, how to optimally feed RGC responses to the classifiers and how the number of output neurons correlates with the model's accuracy. To increase the number of output neurons, we manipulated input images by splitting them and then feeding them to the retinal model, and we found that image splitting does not significantly improve the model's accuracy. We also show that differences in the structure of datasets result in largely divergent performance of the retinal model (MNIST and Fashion MNIST exceeded 80% accuracy, while CIFAR-10 and Imagenette achieved ~40%). Furthermore, retinal models which perform better in standard evaluation, i.e. more accurately predict RGC response, perform better in FA as well. However, unlike standard evaluation, FA results can be straightforwardly interpreted in the context of comparing the quality of visual perception.
ENGINEERING