An Introduction to Reinforcement Learning

Reinforcement learning is one of the areas of machine learning I’ve been most interested in recently. Research results have shown promise across a range of applications, from mastering the complex game of Go to efficient architecture search for neural networks. Many of these advances have combined reinforcement learning with deep learning, though it is still unclear how effective this combination really is. Nevertheless, the underlying theory and algorithms of the field are mathematically elegant, and they offer an alternative way of approaching many interesting problems. This is the first in a series of articles that give an overview of these topics for readers who are interested in, but new to, the field.

These articles will largely follow the material taught in David Silver’s course; Sutton and Barto (2nd ed.) is also highly recommended for more in-depth reading.


Reinforcement learning is the process of training agents to interact with their environments so as to maximize the total reward they collect from these interactions. This differs from the two more familiar settings in machine learning: supervised learning is the problem of generalizing from a set of labeled training examples, and unsupervised learning is the problem of recovering hidden structure from unlabeled data. Neither is concerned with maximizing a cumulative reward, nor with the interaction between an agent and its environment. This interaction, and in particular the balance between exploring an environment and exploiting what is already known about it, is one of the primary challenges in reinforcement learning.
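To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, a simplified single-state setting that this article does not itself cover. The function name `run_epsilon_greedy`, the Bernoulli reward model, and the parameter values are hypothetical choices for illustration. With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the highest estimated reward so far.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=1000, seed=0):
    """Play a Bernoulli multi-armed bandit with an epsilon-greedy agent.

    true_means are hypothetical per-arm success probabilities that the
    agent never sees directly. Returns (total_reward, estimates, counts).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward observed per arm
    counts = [0] * n_arms       # how often each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to refine our estimates.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best
            # (ties go to the lowest index, as max() keeps the first).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment pays 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        total_reward += reward

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return total_reward, estimates, counts
```

With `run_epsilon_greedy([0.0, 1.0], epsilon=0.0, steps=100)`, the purely greedy agent never tries the second arm, so it earns a total reward of `0.0` with counts `[100, 0]`; any positive `epsilon` lets it eventually discover the arm that always pays out.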

The field sits at the intersection of many others, including computer science, psychology, and control theory.

Markov Decision Processes

A fundamental assumption of reinforcement learning is that the problem can be modeled as a Markov Decision Process (MDP). We formally define an MDP as a tuple:

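$\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$, where

  • $\mathcal{S}$ is a finite set of states
  • $\mathcal{A}$ is a finite set of actions
  • $\mathcal{P}$ is a state transition probability matrix, $\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$
  • $\mathcal{R}$ is a reward function, $\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$
  • $\gamma$ is a discount factor, $\gamma \in [0, 1]$
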
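As a rough sketch, assuming the standard ⟨S, A, P, R, γ⟩ formulation from David Silver’s course, the MDP tuple can be written down directly as a Python dataclass. The two-state example (`"cool"`/`"hot"` states, `"slow"`/`"fast"` actions, and all of the numbers) is made up for illustration, as are the names `MDP`, `validate`, and `discounted_return`. The last function shows how the discount factor γ is typically used: a reward received k steps in the future is weighted by γ^k when summed into a return.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str


@dataclass
class MDP:
    states: List[State]
    actions: List[Action]
    # P[(s, a)] maps each successor s' to Pr[S_{t+1}=s' | S_t=s, A_t=a]
    P: Dict[Tuple[State, Action], Dict[State, float]]
    # R[(s, a)] is the expected immediate reward E[R_{t+1} | S_t=s, A_t=a]
    R: Dict[Tuple[State, Action], float]
    gamma: float  # discount factor in [0, 1]

    def validate(self) -> None:
        """Check gamma's range and that every P[(s, a)] is a distribution."""
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
        for s in self.states:
            for a in self.actions:
                total = sum(self.P[(s, a)].values())
                if abs(total - 1.0) > 1e-9:
                    raise ValueError(f"P[({s}, {a})] sums to {total}, not 1")


def discounted_return(rewards: List[float], gamma: float) -> float:
    """G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last reward backwards
        g = r + gamma * g
    return g


# Hypothetical two-state example, purely for illustration.
example = MDP(
    states=["cool", "hot"],
    actions=["slow", "fast"],
    P={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    R={
        ("cool", "slow"): 1.0,
        ("cool", "fast"): 2.0,
        ("hot", "slow"): 1.0,
        ("hot", "fast"): -10.0,
    },
    gamma=0.9,
)
example.validate()
```

For instance, `discounted_return([1.0, 1.0, 1.0], 0.5)` evaluates to `1.75` (that is, 1 + 0.5 + 0.25); a smaller γ makes the agent care less about rewards further in the future.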
Written on May 10, 2018