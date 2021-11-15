
Versatile Inverse Reinforcement Learning via Cumulative Rewards

By Niklas Freymuth, Philipp Becker, Gerhard Neumann
 5 days ago

Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning...

Cumulants as the Variables of Density Cumulant Theory: A Path to Hermitian Triples

We study the combination of orbital-optimized density cumulant theory and a new parameterization of the reduced density matrices in which the variables are the particle-hole cumulant elements. We call this combination O$\lambda$DCT. We find that this new ansatz solves problems identified in the previous unitary coupled cluster ansatz for density cumulant theory: the theory is now free of near-zero denominators between occupied and virtual blocks, can correctly describe the dissociation of H$_2$, and is rigorously size-extensive. In addition, the new ansatz has fewer terms than the previous unitary ansatz, and the optimal orbitals delivered by the exact theory are the natural orbitals. Numerical studies on systems amenable to full configuration interaction show that the amplitudes from the previous ODC-12 method approximate the exact amplitudes predicted by this ansatz. Studies on equilibrium properties of diatomic molecules show that even with the new ansatz, it is necessary to include triples to improve the accuracy of the method compared to orbital optimized linearized coupled cluster doubles. With a simple iterative triples correction, O$\lambda$DCT outperforms other orbital-optimized methods truncated at comparable levels in the amplitudes, as well as CCSD(T). By adding four more terms to the cumulant parameterization, O$\lambda$DCT outperforms CCSDT while having the same $\mathcal{O}(V^5 O^3)$ scaling.
PHYSICS
DQRE-SCnet: A novel hybrid approach for selecting users in Federated Learning with Deep-Q-Reinforcement Learning based on Spectral Clustering

Mohsen Ahmadi, Ali Taghavirashidizadeh, Danial Javaheri, Armin Masoumian, Saeid Jafarzadeh Ghoushchi, Yaghoub Pourasad. Machine learning models based on sensitive data in the real-world promise advances in areas ranging from medical screening to disease outbreaks, agriculture, industry, defense science, and more. In many applications, learning participant communication rounds benefit from collecting their own private data sets, teaching detailed machine learning models on the real data, and sharing the benefits of using these models. Due to existing privacy and security concerns, most people avoid sharing sensitive data for training. Without each user revealing their local data to a central server, Federated Learning allows various parties to jointly train a machine learning algorithm on their shared data. This method of collective privacy learning comes at the expense of significant communication during training. Most large-scale machine-learning applications require decentralized learning based on data sets generated on various devices and places. Such datasets represent an essential obstacle to decentralized learning, as their diverse contexts contribute to significant differences in the distribution of data across devices and locations. Researchers have proposed several ways to achieve data privacy in Federated Learning systems. However, there are still challenges with homogeneous local data. The approach in this research is to select nodes (users) to share their data in Federated Learning for independent data-based equilibrium to improve accuracy, reduce training time, and increase convergence. Therefore, this research presents a combined Deep-Q-Reinforcement Learning Ensemble based on Spectral Clustering called DQRE-SCnet to choose a subset of devices in each communication round. The results show that it is possible to decrease the number of communication rounds needed in Federated Learning.
SOFTWARE
Robust Deep Reinforcement Learning for Quadcopter Control

Deep reinforcement learning (RL) has made it possible to solve complex robotics problems using neural networks as function approximators. However, policies trained on stationary environments generalize poorly when transferred from one environment to another. In this work, we use Robust Markov Decision Processes (RMDP), which combine ideas from Robust Control and RL, to train the drone control policy. The approach opts for pessimistic optimization to handle potential gaps when a policy is transferred from one environment to another. The trained control policy is tested on the task of quadcopter positional control. RL agents were trained in a MuJoCo simulator. During testing, different environment parameters (unseen during the training) were used to validate the robustness of the trained policy for transfer from one environment to another. The robust policy outperformed the standard agents in these environments, suggesting that the added robustness increases generality and helps the policy adapt to non-stationary environments.
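The pessimistic optimization mentioned in this abstract can be illustrated with a small tabular sketch. This is not the authors' drone controller: it is plain robust value iteration over a finite set of candidate transition models, taking the worst model separately for each state-action pair. The two-state problem, the `nominal` and `perturbed` models, and the function name are all hypothetical.

```python
def robust_value_iteration(models, rewards, gamma=0.9, iters=200):
    """Value iteration where each backup uses the worst model in the set.

    models[k][s][a] is a list of next-state probabilities under model k;
    rewards[s][a] is the immediate reward.
    """
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(iters):
        new_values = []
        for s in range(n_states):
            action_values = []
            for a in range(len(rewards[s])):
                # Pessimistic step: smallest expected next value over all models.
                worst = min(
                    sum(p * v for p, v in zip(m[s][a], values)) for m in models
                )
                action_values.append(rewards[s][a] + gamma * worst)
            new_values.append(max(action_values))
        values = new_values
    return values


# Hypothetical 2-state, 2-action problem; state 1 is the rewarding state.
rewards = [[0.0, 0.0], [1.0, 1.0]]
nominal = [
    [[1.0, 0.0], [0.1, 0.9]],
    [[0.0, 1.0], [0.0, 1.0]],
]
perturbed = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.2, 0.8], [0.2, 0.8]],
]

nominal_values = robust_value_iteration([nominal], rewards)
robust_values = robust_value_iteration([nominal, perturbed], rewards)
print(nominal_values, robust_values)
```

With a single model the routine reduces to ordinary value iteration, so the robust values can never exceed the nominal ones. That gap is the price paid for guarding against the perturbed dynamics.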
TECHNOLOGY
FinRL-Podracer: High Performance and Scalable Deep Reinforcement Learning for Quantitative Finance

Machine learning techniques are playing more and more important roles in finance market investment. However, finance quantitative modeling with conventional supervised learning approaches has a number of limitations. The development of deep reinforcement learning techniques is partially addressing these issues. Unfortunately, the steep learning curve and the difficulty in quick modeling and agile development are impeding finance researchers from using deep reinforcement learning in quantitative trading. In this paper, we propose an RLOps in finance paradigm and present a FinRL-Podracer framework to accelerate the development pipeline of deep reinforcement learning (DRL)-driven trading strategy and to improve both trading performance and training efficiency. FinRL-Podracer is a cloud solution that features high performance and high scalability and promises continuous training, continuous integration, and continuous delivery of DRL-driven trading strategies, facilitating a rapid transformation from algorithmic innovations into a profitable trading strategy. First, we propose a generational evolution mechanism with an ensemble strategy to improve the trading performance of a DRL agent, and schedule the training of a DRL algorithm onto a GPU cloud via multi-level mapping. Then, we carry out the training of DRL components with high-performance optimizations on GPUs. Finally, we evaluate the FinRL-Podracer framework for a stock trend prediction task on an NVIDIA DGX SuperPOD cloud. FinRL-Podracer outperforms three popular DRL libraries, Ray RLlib, Stable Baselines3 and FinRL, with 12% to 35% improvements in annual return, 0.1 to 0.6 improvements in Sharpe ratio and a 3 to 7 times speed-up in training time. We show the high scalability by training a trading agent in 10 minutes with 80 A100 GPUs, on NASDAQ-100 constituent stocks with minute-level data over 10 years.
MARKETS
Representation Learning via Quantum Neural Tangent Kernels

Variational quantum circuits are used in quantum machine learning and variational quantum simulation tasks. Designing good variational circuits or predicting how well they perform for given learning or optimization tasks is still unclear. Here we discuss these problems, analyzing variational quantum circuits using the theory of neural tangent kernels. We define quantum neural tangent kernels, and derive dynamical equations for their associated loss function in optimization and learning tasks. We analytically solve the dynamics in the frozen limit, or lazy training regime, where variational angles change slowly and a linear perturbation is good enough. We extend the analysis to a dynamical setting, including quadratic corrections in the variational angles. We then consider hybrid quantum-classical architecture and define a large width limit for hybrid kernels, showing that a hybrid quantum-classical neural network can be approximately Gaussian. The results presented here show limits for which analytical understandings of the training dynamics for variational quantum circuits, used for quantum machine learning and optimization problems, are possible. These analytical results are supported by numerical simulations of quantum machine learning experiments.
CODING & PROGRAMMING
Learning Data Teaching Strategies Via Knowledge Tracing

Teaching plays a fundamental role in human learning. Typically, a human teaching strategy would involve assessing a student's knowledge progress for tailoring the teaching materials in a way that enhances the learning progress. A human teacher would achieve this by tracing a student's knowledge over important learning concepts in a task. However, such a teaching strategy is not yet well exploited in machine learning, as current machine teaching methods tend to directly assess the progress on individual training samples without paying attention to the underlying learning concepts in a learning task. In this paper, we propose a novel method, called Knowledge Augmented Data Teaching (KADT), which can optimize a data teaching strategy for a student model by tracing its knowledge progress over multiple learning concepts in a learning task. Specifically, the KADT method incorporates a knowledge tracing model to dynamically capture the knowledge progress of a student model in terms of latent learning concepts. Then we develop an attention pooling mechanism to distill knowledge representations of a student model with respect to class labels, which enables us to develop a data teaching strategy on critical training samples. We have evaluated the performance of the KADT method on four different machine learning tasks including knowledge tracing, sentiment analysis, movie recommendation, and image classification. Comparisons with state-of-the-art methods empirically validate that KADT consistently outperforms others on all tasks.
EDUCATION
FinRL: Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance

Deep reinforcement learning (DRL) has been envisioned to have a competitive edge in quantitative finance. However, there is a steep development curve for quantitative traders to obtain an agent that automatically positions to win in the market, namely to decide where to trade, at what price and what quantity, due to the error-prone programming and arduous debugging. In this paper, we present the first open-source framework FinRL as a full pipeline to help quantitative traders overcome the steep learning curve. FinRL features simplicity, applicability and extensibility under the key principles of full-stack framework, customization, reproducibility and hands-on tutoring.
COMPUTERS
PrivacyRaven: Implementing a proof of concept for model inversion

During my Trail of Bits winternship and springternship, I had the pleasure of working with Suha Hussain and Jim Miller on PrivacyRaven, a Python-based tool for testing deep-learning frameworks against a plethora of privacy attacks. I worked on improving PrivacyRaven’s versatility by adding compatibility for services such as Google Colab and expanding its privacy attack and assurance functionalities.
SOFTWARE
Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments

Attitude control of fixed-wing unmanned aerial vehicles (UAVs) is a difficult control problem in part due to uncertain nonlinear dynamics, actuator constraints, and coupled longitudinal and lateral motions. Current state-of-the-art autopilots are based on linear control and are thus limited in their effectiveness and performance. Deep reinforcement learning (DRL) is a machine learning method to automatically discover optimal control laws through interaction with the controlled system, that can handle complex nonlinear dynamics. We show in this paper that DRL can successfully learn to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as three minutes of flight data. We initially train our model in a simulation environment and then deploy the learned controller on the UAV in flight tests, demonstrating comparable performance to the state-of-the-art ArduPlane proportional-integral-derivative (PID) attitude controller with no further online learning required. To better understand the operation of the learned controller we present an analysis of its behaviour, including a comparison to the existing well-tuned PID controller.
TECHNOLOGY
Time-Varying Channel Prediction for RIS-Assisted MU-MISO Networks via Deep Learning

To mitigate the effects of shadow fading and obstacle blocking, reconfigurable intelligent surface (RIS) has become a promising technology to improve the signal transmission quality of wireless communications by controlling the reconfigurable passive elements with less hardware cost and lower power consumption. However, accurate, low-latency and low-pilot-overhead channel state information (CSI) acquisition remains a considerable challenge in RIS-assisted systems due to the large number of RIS passive elements. In this paper, we propose a three-stage joint channel decomposition and prediction framework to acquire CSI. The proposed framework exploits the two-timescale property that the base station (BS)-RIS channel is quasi-static and the RIS-user equipment (UE) channel is fast time-varying. Specifically, in the first stage, we use the full-duplex technique to estimate the channel between a BS's specific antenna and the RIS, addressing the critical scaling ambiguity problem in the channel decomposition. We then design a novel deep neural network, namely, the sparse-connected long short-term memory (SCLSTM), and propose an SCLSTM-based algorithm in the second and third stages, respectively. The algorithm can simultaneously decompose the BS-RIS channel and RIS-UE channel from the cascaded channel and capture the temporal relationship of the RIS-UE channel for prediction. Simulation results show that our proposed framework has lower pilot overhead than the traditional channel estimation algorithms, and the proposed SCLSTM-based algorithm can also achieve more accurate CSI acquisition robustly and effectively.
COMPUTERS
Learn-Morph-Infer: a new way of solving the inverse problem for brain tumor modeling

Ivan Ezhov, Kevin Scibilia, Katharina Franitza, Felix Steinbauer, Suprosanna Shit, Lucas Zimmer, Jana Lipkova, Florian Kofler, Johannes Paetzold, Luca Canalini, Diana Waldmannstetter, Martin Menten, Marie Metz, Benedikt Wiestler, Bjoern Menze. Current treatment planning of patients diagnosed with brain tumor could significantly benefit by accessing the spatial distribution of tumor cell...
CANCER
Dealing with the Unknown: Pessimistic Offline Reinforcement Learning

Reinforcement Learning (RL) has been shown to be effective in domains where the agent can learn policies by actively interacting with its operating environment. However, if we change the RL scheme to an offline setting where the agent can only update its policy via static datasets, one of the major issues in offline reinforcement learning emerges, i.e. distributional shift. We propose a Pessimistic Offline Reinforcement Learning (PessORL) algorithm to actively lead the agent back to the area where it is familiar by manipulating the value function. We focus on problems caused by out-of-distribution (OOD) states, and deliberately penalize high values at states that are absent in the training dataset, so that the learned pessimistic value function lower bounds the true value anywhere within the state space. We evaluate the PessORL algorithm on various benchmark tasks, where we show that our method gains better performance by explicitly handling OOD states, when compared to those methods merely considering OOD actions.
CODING & PROGRAMMING
Dueling RL: Reinforcement Learning with Trajectory Preferences

We consider the problem of preference based reinforcement learning (PbRL), where, unlike traditional reinforcement learning, an agent receives feedback only in terms of a 1 bit (0/1) preference over a trajectory pair instead of absolute rewards for them. The success of the traditional RL framework crucially relies on the underlying agent-reward model, which, however, depends on how accurately a system designer can express an appropriate reward function, which is often a non-trivial task. The main novelty of our framework is the ability to learn from preference-based trajectory feedback that eliminates the need to hand-craft numeric reward models. This paper sets up a formal framework for the PbRL problem with non-Markovian rewards, where the trajectory preferences are encoded by a generalized linear model of dimension $d$. Assuming the transition model is known, we then propose an algorithm with almost optimal regret guarantee of $\tilde {\mathcal{O}}\left( SH d \log (T / \delta) \sqrt{T} \right)$. We further extend the above algorithm to the case of unknown transition dynamics, and provide an algorithm with near optimal regret guarantee $\widetilde{\mathcal{O}}((\sqrt{d} + H^2 + |\mathcal{S}|)\sqrt{dT} +\sqrt{|\mathcal{S}||\mathcal{A}|TH} )$. To the best of our knowledge, our work is one of the first to give tight regret guarantees for preference based RL problems with trajectory preferences.
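To make the 1-bit trajectory feedback concrete, here is a minimal sketch of one way a generalized linear preference model could generate it. The logistic link, the choice to embed a trajectory as the sum of its per-step feature vectors, and all names and numbers below are assumptions made for illustration; the abstract only states that preferences follow a generalized linear model of dimension $d$.

```python
import math
import random


def trajectory_features(trajectory):
    """Sum per-step feature vectors into one d-dimensional trajectory embedding."""
    d = len(trajectory[0])
    return [sum(step[i] for step in trajectory) for i in range(d)]


def preference_probability(w, traj_a, traj_b):
    """P(traj_a is preferred over traj_b) under an assumed logistic link."""
    phi_a = trajectory_features(traj_a)
    phi_b = trajectory_features(traj_b)
    score = sum(wi * (a - b) for wi, a, b in zip(w, phi_a, phi_b))
    return 1.0 / (1.0 + math.exp(-score))


def sample_preference(w, traj_a, traj_b, rng):
    """Draw the 1-bit feedback: 1 if traj_a wins, 0 otherwise."""
    return 1 if rng.random() < preference_probability(w, traj_a, traj_b) else 0


# Hypothetical weight vector and two-step trajectories with d = 2 features.
w = [1.0, -0.5]
traj_a = [[1.0, 0.0], [1.0, 0.0]]
traj_b = [[0.0, 1.0], [0.0, 1.0]]

p_ab = preference_probability(w, traj_a, traj_b)
bit = sample_preference(w, traj_a, traj_b, random.Random(0))
print(p_ab, bit)
```

The learner only ever observes `bit`, never `w` or a per-step reward, which is why the reward signal here depends on the whole trajectory rather than on individual states.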
CODING & PROGRAMMING
Evolving Reinforcement Learning Agents Using Genetic Algorithms

Utilizing evolutionary methods to evolve agents that can outperform state-of-the-art Reinforcement Learning algorithms in Python. I started this project with the intention of applying genetic algorithms to predictive or classification neural networks. After some testing, I noticed that a genetic algorithm was able to minimize the loss function fast but when I ran it on the test data, it failed terribly. I could have probably spent more time trying different techniques and methods to improve it but this problem sparked the idea of applying the same concept but for a Reinforcement Learning Environment where the problem simply cannot exist.
CODING & PROGRAMMING
On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods

The increasing adoption of Reinforcement Learning in safety-critical systems domains such as autonomous vehicles, health, and aviation raises the need for ensuring their safety. Existing safety mechanisms such as adversarial training, adversarial detection, and robust learning are not always adapted to all disturbances in which the agent is deployed. Those disturbances include moving adversaries whose behavior can be unpredictable by the agent, and as a matter of fact harmful to its learning. Ensuring the safety of critical systems also requires methods that give formal guarantees on the behaviour of the agent evolving in a perturbed environment. It is therefore necessary to propose new solutions adapted to the learning challenges faced by the agent. In this paper, first, we generate adversarial agents that exhibit flaws in the agent's policy by presenting moving adversaries. Second, we use reward shaping and a modified Q-learning algorithm as defense mechanisms to improve the agent's policy when facing adversarial perturbations. Finally, probabilistic model checking is employed to evaluate the effectiveness of both mechanisms. We have conducted experiments on a discrete grid world with a single agent facing non-learning and learning adversaries. Our results show a reduction in the number of collisions between the agent and the adversaries. Probabilistic model checking provides lower and upper probabilistic bounds regarding the agent's safety in the adversarial environment.
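The abstract does not spell out its shaping term or its modification to Q-learning, so the sketch below only shows the general idea: a standard tabular Q-learning update fed with a shaped reward. The distance-based penalty, its parameters, and the grid-world names are all hypothetical.

```python
from collections import defaultdict


def shaped_reward(base_reward, distance_to_adversary, penalty=1.0, safe_distance=2):
    """Hypothetical shaping: subtract a penalty when an adversary is too close."""
    if distance_to_adversary < safe_distance:
        return base_reward - penalty
    return base_reward


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """Standard (unmodified) tabular Q-learning update; returns the new Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# One illustrative step on a grid world with four moves.
actions = ["up", "down", "left", "right"]
q = defaultdict(float)
reward = shaped_reward(base_reward=0.0, distance_to_adversary=1)
updated = q_learning_update(q, state=(0, 0), action="right", reward=reward,
                            next_state=(0, 1), actions=actions)
print(reward, updated)
```

Because the penalty arrives through the reward, the learning rule itself stays unchanged. The agent simply learns lower values for moves that bring it near an adversary.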
COMPUTERS
Uncovering Material Deformations via Machine Learning Combined with Four-Dimensional Scanning Transmission Electron Microscopy

Chuqiao Shi, Michael C. Cao, Sarah M. Rehn, Sang-Hoon Bae, Jeehwan Kim, Matthew R. Jones, David A. Muller, Yimo Han. Understanding lattice deformations is crucial in determining the properties of nanomaterials, which can become more prominent in future applications ranging from energy harvesting to electronic devices. However, it remains challenging to reveal unexpected deformations that crucially affect material properties across a large sample area. Here, we demonstrate a rapid and semi-automated unsupervised machine learning approach to uncover lattice deformations in materials. Our method utilizes divisive hierarchical clustering to automatically unveil multi-scale deformations in the entire sample flake from the diffraction data using four-dimensional scanning transmission electron microscopy (4D-STEM). Our approach overcomes the current barriers of large 4D data analysis and enables extraction of essential features even without a priori knowledge of the sample. Using this purely data-driven analysis, we have uncovered different types of material deformations, such as strain, lattice distortion, bending contour, etc., which can significantly impact the band structure and subsequent performance of nanomaterials-based devices. We envision that this data-driven procedure will provide insight into the intrinsic structures and accelerate the discovery of novel materials.
CHEMISTRY
Explainable Deep Reinforcement Learning for Portfolio Management: An Empirical Approach

Deep reinforcement learning (DRL) has been widely studied in the portfolio management task. However, it is challenging to understand a DRL-based trading strategy because of the black-box nature of deep neural networks. In this paper, we propose an empirical approach to explain the strategies of DRL agents for the portfolio management task. First, we use a linear model in hindsight as the reference model, which finds the best portfolio weights by assuming knowledge of the actual future stock returns. In particular, we use the coefficients of a linear model in hindsight as the reference feature weights. Secondly, for DRL agents, we use integrated gradients to define the feature weights, which are the coefficients between reward and features under a linear regression model. Thirdly, we study the prediction power in two cases, single-step prediction and multi-step prediction. In particular, we quantify the prediction power by calculating the linear correlations between the feature weights of a DRL agent and the reference feature weights, and similarly for machine learning methods. Finally, we evaluate a portfolio management task on Dow Jones 30 constituent stocks from 01/01/2009 to 09/01/2021. Our approach empirically reveals that a DRL agent exhibits a stronger multi-step prediction power than machine learning methods.
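The "linear correlation between feature weights" used here to quantify prediction power can be read as a Pearson correlation coefficient. The sketch below assumes that reading, and the weight vectors are made-up numbers, not results from the paper.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature weights: one vector from a DRL agent (e.g. obtained via
# integrated gradients) and one from the hindsight linear reference model.
drl_weights = [0.2, 0.5, -0.1, 0.4]
reference_weights = [0.25, 0.45, -0.05, 0.35]

prediction_power = pearson_correlation(drl_weights, reference_weights)
print(prediction_power)
```

A value near 1 means the agent weights features much as the hindsight model does. Repeating the calculation for single-step and multi-step reference weights gives the two cases the abstract compares.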
MARKETS
How to build a versatile marketing team

Every business needs some level of marketing support. From start-ups that need to introduce a new brand to large corporations that are navigating growth, the role of marketing is critical and ever-evolving. As your marketing, communications and advertising needs become more complex, so do the roles of an effective marketing...
ECONOMY
Bayesian Approach to Inverse Problems: an Application to NNPDF Closure Testing

We discuss the Bayesian approach to the solution of inverse problems and apply the formalism to analyse the closure tests performed by the NNPDF collaboration. Starting from a comparison with the approach that is currently used for the determination of parton distributions (PDFs) by the NNPDF collaboration, we discuss some analytical results that can be obtained for linear problems and use these results as guidance for the more complicated non-linear problems. We show that, in the case of Gaussian distributions, the posterior probability density of the parametrized PDFs is fully determined by the results of the NNPDF fitting procedure. In the particular case that we consider, the fitting procedure and the Bayesian analysis yield exactly the same result. Building on the insight that we obtain from the analytical results, we introduce new estimators to assess the statistical faithfulness of the fit results in closure tests. These estimators are defined in data space, and can be studied analytically using the Bayesian formalism in a linear model in order to clarify their meaning. Finally we present numerical results from a number of closure tests performed with current NNPDF methodologies. These further tests allow us to validate the NNPDF4.0 methodology and provide a quantitative comparison of the NNPDF4.0 and NNPDF3.1 methodologies. As PDF determinations move into precision territory, the need for a careful validation of the methodology becomes increasingly important: the error bar has become the focal point of contemporary PDF determinations. In this perspective, theoretical assumptions and other sources of error are best formulated and analysed in the Bayesian framework, which provides an ideal language to address the precision and the accuracy of current fits.
COMPUTERS

