# Reinforcement Learning

Cracking Random Number Generators using Machine Learning – Part 1: xorshift128

This blog post proposes an approach to cracking Pseudo-Random Number Generators (PRNGs) using machine learning. By cracking, we mean predicting the sequence of random numbers from previously generated numbers, without knowledge of the seed. We started by breaking a simple PRNG, namely xorshift128, following the lead of the post published in [1]. We simplified the structure of the neural network model proposed in that post and achieved higher accuracy. This post aims to show how to train a machine learning model that reaches 100% accuracy in predicting the generated random numbers without knowing the seed, and then dives into the trained model to show how it works and what useful information can be extracted from it.
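For reference, the generator being attacked is tiny. A minimal Python sketch of Marsaglia's xorshift128 (the seed values below are illustrative):

```python
MASK32 = 0xFFFFFFFF  # keep arithmetic in 32 bits, as the C version would

def xorshift128(state):
    """Advance Marsaglia's xorshift128 one step.
    state is [x, y, z, w]; returns (new_state, output)."""
    x, y, z, w = state
    t = (x ^ (x << 11)) & MASK32
    w_new = (w ^ (w >> 19) ^ t ^ (t >> 8)) & MASK32
    return [y, z, w, w_new], w_new

# Generate a few outputs from an illustrative (nonzero) seed:
state = [1, 2, 3, 4]
outputs = []
for _ in range(5):
    state, r = xorshift128(state)
    outputs.append(r)
```

Note that after each step the new output joins the state, so four consecutive 32-bit outputs fully determine the generator's state, which is what makes it learnable from its output stream.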

DeepMind Introduces ‘RGB-Stacking’: A Reinforcement Learning Based Approach For Tackling Robotic Stacking of Diverse Shapes

For many people, stacking one thing on top of another seems a simple job. Even the most advanced robots, however, struggle with such manipulation tasks. Stacking requires a variety of motor, perceptual, and analytical abilities, along with the ability to interact with a wide range of objects. Because of this complexity, the simple human task has been elevated to a “grand problem” in robotics, spawning a small industry dedicated to creating new techniques and approaches.

Shift from learning to support in the learning technology stack

Remote work is one of the key drivers governing the working and learning behaviors of employees. Between juggling work and life, formal learning could seem like the last thing on employees’ minds. Additionally, the emotional stress caused by the pandemic can badly hamper any motivation to learn something in classic terms.

The Science Behind The Effectiveness Of Microlearning

Ever noticed how your attention wanders off midway through a meeting or lecture? Why can't people remember everything they are taught and trained to do? You may have a feeling that traditional learning methods aren't the most effective. And to prove the point, recent studies have shown the deficiencies of these traditional methods and verified the effectiveness of a new approach: microlearning.

Parcel+Post Expo: Fizyr reveals latest developments in deep learning technology for automated picking robots

Exhibiting its product in conjunction with EuroSort’s Split Tray Sorter, Fizyr’s AI demonstrated how the company has been able to automate the last manual link in the picking-sortation chain. The ability to detect and pick up a box that is slightly out of place has so far been a uniquely human skill. To keep up with the increase in parcels over the past year, sortation centers would have had to employ far more people for these very simple tasks. However, Fizyr’s technology rises to the challenge.

Examining learning coherence in group decision-making: triads vs. tetrads

This study examined whether three heads are better than four in terms of performance and learning properties in group decision-making. It was predicted that learning incoherence took place in tetrads because the majority rule could not be applied when two subgroups emerged. As a result, tetrads underperformed triads. To examine this hypothesis, we adopted a reinforcement learning framework using simple Q-learning and estimated learning parameters. Overall, the results were consistent with the hypothesis. Further, this study is one of a few attempts to apply a computational approach to learning behavior in small groups. This approach enables the identification of underlying learning parameters in group decision-making.
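The “simple Q-learning” the authors refer to is the standard tabular update. A hedged sketch (the learning rate `alpha` and discount `gamma` below are illustrative defaults, not the parameters the study estimated):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)
# A single update after choosing "left" in state 0 and receiving reward 1.0:
q_update(Q, 0, "left", 1.0, 1, actions=["left", "right"])
```

Fitting `alpha` and `gamma` to observed choice sequences is what “estimated learning parameters” means in this kind of computational study.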

Tech News: uncomplicated learning | #education | #technology | #training

All over the world Covid-19 has crudely interrupted centuries-old teaching and learning practices resulting in about 1.3 billion children and students studying at home. As a result, education has changed radically, with online and remote learning on digital platforms becoming the norm during the Covid-19 pandemic. This largely unplanned and...

Solutions of Reinforcement Learning 2nd Edition

Solutions of Reinforcement Learning, 2nd Edition (original book by Richard S. Sutton and Andrew G. Barto). How to contribute, and the current situation (9/11/2021~): I have been working as a full-time AI engineer and barely have free time to manage this project any more, so I want to give simple guidance on how I respond to contributions:

Review: Pretraining Representations for Data-Efficient Reinforcement Learning

In this week's Deep Learning Paper Review, we look at the following paper: Pretraining Representations for Data-Efficient Reinforcement Learning. In recent years, pretraining has proved to be an essential ingredient for success in NLP and computer vision. The idea is to first pretrain a general model in an unsupervised manner, before fine-tuning it on smaller supervised datasets; this makes the fine-tuning stage far more data-efficient while also achieving superior performance. Nevertheless, in reinforcement learning, pretraining has yet to become the standard. As a result, RL algorithms are notoriously data-inefficient: a simple Atari game requires tens of millions of frames of training data to converge to human performance. Intuitively, this is because the RL agent has to learn two difficult tasks at once: visual representation from raw pixels, and the policy and value functions.
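The pretrain-then-fine-tune recipe described above can be illustrated with a toy numpy sketch (the linear autoencoder, data shapes, and learning rates here are all illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Phase 1: unsupervised pretraining -- a linear autoencoder on plentiful
# unlabeled data, trained by gradient descent on reconstruction error.
X = rng.normal(size=(500, 8))
W = rng.normal(scale=0.3, size=(8, 3))   # encoder: 8-d input -> 3-d features
D = rng.normal(scale=0.3, size=(3, 8))   # decoder
for _ in range(1000):
    Z = X @ W                            # encode
    E = Z @ D - X                        # reconstruction error
    grad_D = Z.T @ E / len(X)
    grad_W = X.T @ (E @ D.T) / len(X)
    D -= 0.02 * grad_D
    W -= 0.02 * grad_W

# Phase 2: supervised fine-tuning -- a logistic head on a small labeled set,
# reusing the pretrained encoder W as frozen features.
X_small = rng.normal(size=(20, 8))
y_small = (X_small[:, 0] > 0).astype(float)
head = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_small @ W) @ head))
    head -= 0.1 * (X_small @ W).T @ (p - y_small) / len(X_small)
```

The point of the two phases is visible in the shapes: the expensive phase uses only unlabeled data, while the labeled set can be tiny.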

Training data: the milestone of machine learning

Machine learning is a type of AI that teaches machines how to learn, interpret, and predict results based on a set of data. As the world (and the internet) has grown exponentially in the past few years, machine learning processes have become common in organizations of all kinds. For example, companies in the healthcare sector use ML to detect and treat diseases better, while in the farming sector machine learning helps predict harvest yields.

Advance your career with AI and Machine Learning with this course

The significance of AI has increased rapidly over time, transforming industries and cutting across various sectors. According to a report by Accenture (AI Research: How AI Boosts Industry Profits and Innovation), AI technologies have the potential to increase labor productivity by up to 40%, which means demand for these skills is surging, making them a must-have for growth-focused professionals.

Editorial: Is Texas failing to address COVID-19 learning loss?

State lawmakers hammered out a plan to help students overcome COVID-19 learning losses by requiring school districts to match them with high-performing teachers or to provide tutoring in a failing subject. The law, known as House Bill 4545, was designed with good intentions, but in practice, school leaders in North Texas say it is straining their resources beyond capacity.

Connection Management xAPP for O-RAN RIC: A Graph Neural Network and Reinforcement Learning Approach

Connection management is an important problem for any wireless network to ensure smooth and well-balanced operation throughout. Traditional methods for connection management (specifically user-cell association) rely on sub-optimal, greedy solutions, such as connecting each user to the cell with maximum receive power. However, network performance can be improved by leveraging machine learning (ML) and artificial intelligence (AI) based solutions. The next-generation software-defined 5G networks specified by the Open Radio Access Network (O-RAN) alliance facilitate the inclusion of ML/AI-based solutions for various network problems. In this paper, we consider intelligent connection management based on the O-RAN network architecture to optimize user association and load balancing in the network. We formulate connection management as a combinatorial graph optimization problem. We propose a deep reinforcement learning (DRL) solution that uses the underlying graph to learn the weights of a graph neural network (GNN) for optimal user-cell association. We consider three candidate objective functions: sum user throughput, cell coverage, and load balancing. Our results show up to a 10% gain in throughput, a 45-140% gain in cell coverage, and a 20-45% gain in load balancing, depending on the network deployment configuration, compared to baseline greedy techniques.
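The greedy baseline the paper compares against (each user simply attaches to the cell with maximum receive power) can be sketched as follows; the power values are illustrative:

```python
def greedy_association(rsrp):
    """Greedy baseline user-cell association: each user attaches to the cell
    with maximum receive power, ignoring load balance.
    rsrp[u][c] is user u's receive power from cell c (dBm, illustrative)."""
    return {u: max(range(len(powers)), key=lambda c: powers[c])
            for u, powers in enumerate(rsrp)}

# Three users, two cells: users 0 and 1 both pile onto cell 0,
# even though that may overload it.
rsrp = [[-70, -90], [-72, -95], [-88, -80]]
assoc = greedy_association(rsrp)
```

The DRL/GNN approach replaces this per-user rule with a learned policy that trades off throughput, coverage, and load balance jointly across the graph.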

Automating Performance Tuning with Machine Learning

An SRE's main goal is to achieve optimal application performance, stability, and availability. Configurations play a crucial role here (e.g. container resource limits and replicas, runtime settings, etc.): wrong settings are among the top causes of poor performance, inefficiency, and incidents. But tuning configurations is a very complex and manual task, as there are hundreds of settings across the stack. We present a novel approach that leverages machine learning to find optimal configurations of the tech stack in an automated fashion. It uses reinforcement learning techniques to find the best configurations for an optimization goal that SREs can define (e.g. minimizing service latency or cloud costs). We show an example of optimizing the cost and latency of a Kubernetes microservice by tuning container resources and JVM options. We analyze the optimal configurations that were found, the most impactful parameters, and the lessons learned for tuning microservices.
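As a toy illustration of this kind of automated search (not the talk's actual algorithm), an epsilon-greedy loop over a handful of hypothetical configurations might look like this; the knob names and latency model are made up:

```python
import random

random.seed(0)  # deterministic demo

# Hypothetical candidate configurations (knob names are illustrative).
configs = [
    {"cpu_millicores": 500,  "jvm_heap_mb": 512},
    {"cpu_millicores": 1000, "jvm_heap_mb": 1024},
    {"cpu_millicores": 2000, "jvm_heap_mb": 2048},
]

def measure_latency(cfg):
    """Stand-in for a real benchmark run: returns a noisy latency (ms)."""
    base = 500_000 / cfg["cpu_millicores"] + 100_000 / cfg["jvm_heap_mb"]
    return base + random.gauss(0, 20)

# Epsilon-greedy loop: mostly exploit the best-known config, sometimes explore.
estimates = {i: float("inf") for i in range(len(configs))}
for step in range(60):
    if step < len(configs) or random.random() < 0.2:
        i = step % len(configs)                 # explore (measure each at least once)
    else:
        i = min(estimates, key=estimates.get)   # exploit the current best
    latency = measure_latency(configs[i])
    old = estimates[i]
    estimates[i] = latency if old == float("inf") else 0.8 * old + 0.2 * latency

best = min(estimates, key=estimates.get)        # index of the best config found
```

A real system would measure against a live benchmark instead of a stand-in function and search a far larger, continuous configuration space.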

Omni-Training for Data-Efficient Deep Learning

Learning a generalizable deep model from a few examples in a short time remains a major challenge of machine learning, which has impeded its wide deployment in many scenarios. Recent advances reveal that a properly pre-trained model carries an important property: transferability. Higher transferability of the learned representations indicates better generalizability across domains of different distributions (domain transferability) or across tasks of different semantics (task transferability). Transferability has become the key to enabling data-efficient deep learning; however, existing pre-training methods focus only on domain transferability, while meta-training methods focus only on task transferability. This restricts their data-efficiency in downstream scenarios of diverging domains and tasks. A finding of this paper is that even a tight combination of pre-training and meta-training cannot achieve both kinds of transferability. This motivates the proposed Omni-Training framework for data-efficient deep learning. Our first contribution is Omni-Net, a tri-flow architecture. Besides the joint representation flow, Omni-Net introduces two new parallel flows for pre-training and meta-training, responsible for learning representations of domain transferability and task transferability, respectively. Omni-Net coordinates the parallel flows by routing them via the joint flow, so that each gains the other kind of transferability. Our second contribution is Omni-Loss, in which a mean-teacher regularization is imposed to learn generalizable and stabilized representations. Omni-Training is a general framework that accommodates many existing pre-training and meta-training algorithms. A thorough evaluation on cross-task and cross-domain datasets in classification, regression, and reinforcement learning problems shows that Omni-Training consistently outperforms state-of-the-art methods.

Offline Reinforcement Learning with Soft Behavior Regularization

Most prior approaches to offline reinforcement learning (RL) utilize *behavior regularization*, typically augmenting existing off-policy actor-critic algorithms with a penalty measuring divergence between the policy and the offline data. However, these approaches lack guaranteed performance improvement over the behavior policy. In this work, starting from the performance difference between the learned policy and the behavior policy, we derive a new policy learning objective that can be used in the offline setting: the advantage function of the behavior policy, multiplied by a state-marginal density ratio. We propose a practical way to compute the density ratio and demonstrate its equivalence to a state-dependent behavior regularization. Unlike the state-independent regularization used in prior approaches, this *soft* regularization allows more freedom of policy deviation at high-confidence states, leading to better performance and stability. We thus term our resulting algorithm Soft Behavior-regularized Actor Critic (SBAC). Our experimental results show that SBAC matches or outperforms the state of the art on a set of continuous-control locomotion and manipulation tasks.

Rockford University’s Center for Learning Strategies Receives Certification for Peer Tutoring Program

Rockford University issued the following announcement on Oct. 5. Rockford University is proud to announce that its Center for Learning Strategies (CLS) received the College Reading & Learning Association (CRLA) Level 1 certification for its Peer Tutoring Academic Support Program. CLS provides a range of academic support services to Rockford...

Summary: What is reinforcement learning?

Reinforcement learning is a branch of ML that deals with an agent trying to do something in an environment. The agent could be trying to start a fire when stranded on an island, or the agent could be a car…. A policy is used to calculate the probability with which an agent will...
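A minimal sketch of what “policy” means here, assuming a softmax over made-up action preferences:

```python
import math

def softmax_policy(preferences):
    """A policy maps a state to a probability distribution over actions;
    here the probabilities come from a softmax over preference scores."""
    exps = [math.exp(p) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative: an agent stranded on an island scoring its options
# (the actions and scores are made up for this sketch).
actions = ["start a fire", "build a shelter", "search for food"]
probs = softmax_policy([2.0, 1.0, 0.5])
```

The agent then samples an action according to these probabilities, and learning adjusts the preference scores based on the rewards it receives.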

What is an Optimizer

Optimizers are a group of tools used to train a machine learning model. They apply the principles of gradient descent to find a locally optimal point of a loss function. The use of optimizers in machine learning is currently being adopted by the big...
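A minimal sketch of gradient descent, the principle these optimizers share (the example loss is illustrative):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """A minimal optimizer: repeatedly step against the gradient of the loss
    until we settle near a locally optimal point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize the loss f(x) = (x - 3)^2, whose gradient is 2 * (x - 3);
# the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Practical optimizers such as SGD with momentum or Adam refine this same loop with per-step adjustments to the direction and size of the update.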