Reading list – October 2017

This month’s reading list comes in two parts: a non-Reinforcement Learning list and a Reinforcement Learning list. Since our next blog post will be about Reinforcement Learning, readers may like to refer to the RL reading list separately.

Non-Reinforcement Learning reading list

  • A Framework for Searching for General Artificial Intelligence

Authors: Marek Rosa, Jan Feyereisl, The GoodAI Collective
Type: arXiv preprint
Publication date: 2 November 2016

This document outlined a framework that provides a unified collection of principles, ideas, definitions and formalisations for the development of general AI, aiming to define a basis on which AI researchers can build. The authors also proposed strategies for building and educating general AI systems quickly and effectively through gradual and guided learning, and identified a list of next steps covering the research areas AI researchers should focus on to maximise progress in the field. By the authors’ own admission, the document is a work in progress and should be regarded as an early (but important) analytic stage that will eventually be developed into a robust framework for building general AI.

  • Guidelines for Artificial Intelligence Containment

Authors: James Babcock, Janos Kramar, Roman V. Yampolskiy
Type: arXiv preprint
Publication date: 24 July 2017

This paper proposed a set of guidelines to help AI safety researchers develop reliable sandboxing (or containment) software: software that enables the study and analysis of intelligent artificial systems while safeguarding against unwanted problems that could potentially threaten human existence. The authors identified seven major subproblems of AI containment: threat modeling, navigating the security/usability tradeoff, concrete mechanisms, sensitive information management, human factors, tripwires and graceful degradation. The paper also outlined several evaluation and success criteria for AI containment software.

  • Learning with Opponent-Learning Awareness (LOLA)

Authors: Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch
Type: arXiv preprint
Publication date: 13 September 2017

LOLA, new research from OpenAI and the University of Oxford, lets an RL agent take the learning of other agents into account when updating its own strategy. Each LOLA agent adjusts its policy in order to shape the learning of the other agents in a way that is advantageous to itself. Results showed that the encounter of two LOLA agents leads to the emergence of self-interested yet collaborative (‘tit-for-tat’) behaviour in the iterated prisoner’s dilemma. While state-of-the-art deep reinforcement learning methods typically learn selfish actions that disregard the goals of other agents, LOLA agents learn to cooperate out of selfish interests.
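The core update can be sketched numerically. The toy below is our own simplification, not the paper’s code: each agent differentiates its payoff after anticipating one naive gradient step by its opponent, computed with finite differences on one-shot matching pennies. (The paper’s tit-for-tat result arises in the richer iterated prisoner’s dilemma with memory-one policies.)

```python
# Toy sketch of the LOLA idea (our own simplification, not the paper's
# setup): each agent differentiates its payoff AFTER anticipating one
# naive gradient step by its opponent. Game: one-shot matching pennies,
# with p and q the probabilities of playing heads.

def payoff1(own, other):            # agent 1 wants to match
    return (2 * own - 1) * (2 * other - 1)

def payoff2(own, other):            # agent 2 wants to mismatch (zero-sum)
    return -(2 * own - 1) * (2 * other - 1)

def num_grad(f, x, eps=1e-4):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def lola_grad(v_self, v_opp, own, other, eta):
    """Gradient of one's own payoff, assuming the opponent first takes a
    naive gradient step of size eta (a step that depends on `own`)."""
    def shaped(x):
        other_step = other + eta * num_grad(lambda o: v_opp(o, x), other)
        return v_self(x, other_step)
    return num_grad(shaped, own)

p, q, alpha, eta = 0.6, 0.6, 0.1, 0.1
for _ in range(300):
    dp = lola_grad(payoff1, payoff2, p, q, eta)
    dq = lola_grad(payoff2, payoff1, q, p, eta)
    p, q = p + alpha * dp, q + alpha * dq   # simultaneous updates

print(round(p, 3), round(q, 3))
```

In this toy game, replacing `lola_grad` with the plain gradient makes the simultaneous dynamics spiral away from the mixed equilibrium, whereas the opponent-aware update pulls both agents towards playing 0.5/0.5 — a small illustration of the contrast the paper draws.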

  • CommAI: Evaluating the first steps towards a useful general AI

Authors: Marco Baroni, Armand Joulin, Allan Jabri, Germán Kruszewski, Angeliki Lazaridou, Klemen Simonic, Tomas Mikolov
Type: arXiv preprint
Publication date: 27 March 2017

This paper introduced a new framework developed by researchers from Facebook AI Research to evaluate progress towards general AI. Their approach emphasises the development of AIs that are useful for us and focuses on communication-based tasks. The four evaluation criteria they proposed are: 1) communication through natural language – we must be able to communicate with the machine; 2) learning to learn – the ability to master various new tasks; 3) feedback – the ability to master complex tasks with decreasing amounts of reward; 4) interface – the machine should learn the best way to process different kinds of input and output. The authors have also released CommAI-env, an open-source platform for training and testing AI systems.

  • The hippocampus as a predictive map

Authors: Kimberly L Stachenfeld, Matthew M Botvinick, Samuel J Gershman
Type: Article in Nature Neuroscience
Publication date: 2 October 2017

This paper proposed a new representation for hippocampal functions in the form of a predictive map. The hippocampus has long been thought to encode a cognitive map, but researchers are uncertain about the exact nature of this map. The traditional view is that the map is essentially spatial. The authors argued instead that the map is predictive, representing each state in terms of its successor states. They formalised this predictive map as the successor representation from reinforcement learning, providing a novel perspective on how the hippocampus supports adaptive behaviour.
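The predictive map can be made concrete through the successor representation: under a fixed policy with transition matrix T, the matrix M = Σₜ γᵗ Tᵗ = (I − γT)⁻¹ stores, for every state, the expected discounted number of future visits to every other state. A minimal sketch on a toy five-state corridor (our own example, not the paper’s simulations):

```python
import numpy as np

n, gamma = 5, 0.9

# Random-walk transition matrix on a 5-state corridor (our toy policy).
T = np.zeros((n, n))
for s in range(n):
    for s2 in (s - 1, s + 1):
        if 0 <= s2 < n:
            T[s, s2] = 1.0
    T[s] /= T[s].sum()

# Successor representation: M = sum_t gamma^t T^t = (I - gamma*T)^(-1).
M = np.linalg.inv(np.eye(n) - gamma * T)

# Each row peaks at its own state and decays along likely future paths:
# every state is literally represented in terms of its successors.
print(np.round(M[2], 2))

# Values are then just a linear readout of the SR: reward in last state.
r = np.zeros(n)
r[-1] = 1.0
V = M @ r
print(np.round(V, 2))
```

The linear readout V = Mr is one reason the representation is attractive for reinforcement learning accounts of the hippocampus: changing the reward landscape only changes r, not the learned map M.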

  • The Seven Deadly Sins of Predicting the Future of AI

Author: Rodney Brooks
Type: Essay
Publication date: 7 September 2017

In a detailed and insightful essay, Rodney Brooks outlined seven common ways of thinking that lead to mistaken predictions about Artificial Intelligence. Brooks first listed the four popular topics of prediction: Artificial General Intelligence, the Singularity, misaligned values and humanity-destroying AI entities. He then explained the seven errors we tend to make, all of which influence assessments of the timescale and likelihood of each of the four scenarios.

The other essays in his series ‘Future of Robotics and Artificial Intelligence’ are well worth reading too.

Reinforcement Learning reading list

  • Autonomous Quadrotor Landing using Deep Reinforcement Learning

Authors: Riccardo Polvara, Massimiliano Patacchiola, Sanjay Sharma, Jian Wan, Andrew Manning, Robert Sutton, Angelo Cangelosi
Type: arXiv preprint
Publication date: 11 September 2017

A group of UK researchers has created a quadcopter that can learn to navigate to a landmark and land on it. Their approach is based on deep reinforcement learning and applies a hierarchy of Deep Q-Networks (DQNs), with learning achieved without human supervision. Two networks let the drone achieve its goal: one for landmark spotting and another for vertical descent. The DQN outperformed human pilots in certain conditions.
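The two-network structure can be sketched as a simple two-phase controller. The mock below is entirely our own, with greedy rules standing in for the two trained DQNs and a flat grid standing in for the simulated world:

```python
# Mock of the paper's two-phase structure (our own toy, with greedy rules
# standing in for the two trained DQNs): a "landmark spotting" policy
# first centres the drone over the target in the horizontal plane, then
# a "vertical descent" policy takes over.

def spotting_policy(pos, target):
    """Stand-in for DQN #1: move one cell toward the landmark."""
    dx = (target[0] > pos[0]) - (target[0] < pos[0])
    dy = (target[1] > pos[1]) - (target[1] < pos[1])
    return dx, dy

def descent_policy(alt):
    """Stand-in for DQN #2: descend one unit per step (ignores alt here)."""
    return -1

def land(pos, alt, target, max_steps=100):
    for _ in range(max_steps):
        if (pos[0], pos[1]) == target:
            if alt == 0:
                return pos, alt                    # landed on the landmark
            alt += descent_policy(alt)             # phase 2: vertical descent
        else:
            dx, dy = spotting_policy(pos, target)  # phase 1: align
            pos = (pos[0] + dx, pos[1] + dy)
    return pos, alt

print(land(pos=(0, 0), alt=5, target=(3, 2)))   # -> ((3, 2), 0)
```

The point of the hierarchy is that each sub-policy faces a much smaller decision problem than a single end-to-end controller would.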

  • An Empirical Study of AI Population Dynamics with Million-agent Reinforcement Learning

Authors: Yaodong Yang, Lantao Yu, Yiwei Bai, Jun Wang, Weinan Zhang, Ying Wen, Yong Yu
Type: arXiv preprint
Publication date: 5 October 2017

This paper described an experiment by researchers from University College London and Shanghai Jiao Tong University that places AI agents in a simulated natural setting in order to understand their dynamics at the population level. The simulation mimics a predator-prey system and involves up to a million entities, with the agents trained via deep reinforcement learning. The results closely resembled what happens in real life: although agents are driven by self-interest, those that collaborate gain the upper hand. The emergent dynamics are similar to the Lotka-Volterra model used to explain predator-prey phenomena in the natural world, suggesting that such systems could let us simulate dynamic problems where behaviours emerge through learning rather than programming.
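For reference, the classical Lotka-Volterra model couples prey x and predators y through dx/dt = ax − bxy and dy/dt = dxy − cy. A small forward-Euler simulation (parameters illustrative only, not from the paper) reproduces the characteristic out-of-phase oscillations:

```python
# Lotka-Volterra predator-prey dynamics (parameters illustrative only):
#   dx/dt = a*x - b*x*y    (prey breed, get eaten)
#   dy/dt = d*x*y - c*y    (predators breed by eating, otherwise die off)
a, b, c, d = 1.0, 0.1, 1.5, 0.075
x, y, dt = 10.0, 5.0, 0.001

history = []
for _ in range(60_000):                 # 60 time units of forward Euler
    x, y = (x + (a * x - b * x * y) * dt,
            y + (d * x * y - c * y) * dt)
    history.append((x, y))

# Both populations keep oscillating instead of settling down, which is
# the qualitative signature the million-agent simulation reproduces.
peak_prey = max(h[0] for h in history)
low_prey = min(h[0] for h in history)
print(round(peak_prey, 1), round(low_prey, 1))
```

The contrast the paper draws is that here the cycles are hand-coded into the equations, whereas in the million-agent study comparable dynamics emerge from learning alone.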

  • A Deep Reinforcement Learning Chatbot

Authors: Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Mudumba, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio
Type: arXiv preprint
Publication date: 7 September 2017

This paper presented MILABOT, a deep reinforcement learning chatbot developed by researchers from the Montreal Institute for Learning Algorithms (MILA) and entered into Amazon’s Alexa Prize competition. The bot can converse with humans through both speech and text in open-ended conversations, e.g. on small-talk topics. MILABOT uses an ensemble of natural language generation and retrieval models, and applies reinforcement learning to crowdsourced data and real-world user interactions to learn, during training, how to select among the different models.

  • Deep Reinforcement Learning that Matters

Authors: Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, David Meger
Type: arXiv preprint
Publication date: 19 September 2017

This paper tackled the problem of reproducibility in deep reinforcement learning research. The authors argue that reproducing results for state-of-the-art deep RL methods is seldom straightforward. They investigated the challenges associated with reproducibility, including factors that are intrinsic (e.g. random seeds, environment properties) and extrinsic (e.g. hyperparameters, codebases). They found that results varied widely due to intrinsic sources alone, illustrating the need for proper significance testing, and proposed several such methods to make future deep RL research more reproducible.
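The seed problem is easy to sketch. The numbers below are made up, with a one-line stand-in for a full training run; the point is that when seed-induced variance is of the same order as the algorithmic difference, a handful of runs cannot separate two methods without a significance test (here a simple pooled bootstrap, our choice rather than the paper’s exact procedure):

```python
import random
import statistics

def fake_final_return(seed, bias):
    """Stand-in for an entire training run (all numbers are made up):
    only the random seed and a small algorithmic bias differ."""
    rng = random.Random(seed)
    return bias + rng.gauss(0, 1.0)   # seed noise rivals the algo effect

algo_a = [fake_final_return(s, bias=5.0) for s in range(10)]
algo_b = [fake_final_return(s + 100, bias=5.3) for s in range(10)]

def bootstrap_p(xs, ys, n_boot=10_000):
    """Two-sided pooled bootstrap test of whether the means differ."""
    rng = random.Random(0)
    observed = statistics.mean(ys) - statistics.mean(xs)
    pooled = xs + ys
    hits = 0
    for _ in range(n_boot):
        resample = [rng.choice(pooled) for _ in pooled]
        a, b = resample[:len(xs)], resample[len(xs):]
        if abs(statistics.mean(b) - statistics.mean(a)) >= abs(observed):
            hits += 1
    return hits / n_boot

p_value = bootstrap_p(algo_a, algo_b)
print(f"mean A={statistics.mean(algo_a):.2f}  "
      f"mean B={statistics.mean(algo_b):.2f}  p={p_value:.3f}")
```

With ten seeds apiece and a 0.3 effect buried under unit-variance seed noise, the test will usually (and correctly) refuse to declare a winner — which is exactly the paper’s warning about comparing single runs.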

  • Hybrid Reward Architecture for Reinforcement Learning

Authors: Harm van Seijen, Mehdi Fatemi, Joshua Romoff, Romain Laroche, Tavian Barnes, Jeffrey Tsang
Type: arXiv preprint
Publication date: 13 June 2017

One of the major challenges of reinforcement learning is scaling methods so that they can be applied to large, real-world problems. This paper proposed a new method, the Hybrid Reward Architecture (HRA), to address this generalisation problem. The main strategy is to decompose the reward function of the environment into several component reward functions and assign each of them to a separate reinforcement learning agent. The authors tested HRA on a toy problem and on the Atari game Ms. Pac-Man, achieving above-human performance.
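The decomposition is easy to sketch in tabular form. The toy below is our own (a five-state corridor with two reward channels, not the paper’s Ms. Pac-Man setup): each head runs Q-learning on its own reward component only, while the agent acts greedily on the summed Q-values.

```python
import random

# Toy HRA sketch (our own example): the environment reward decomposes as
# r = r1 + r2, one tabular Q-learning head per component, and the agent
# behaves greedily with respect to the SUM of the heads' Q-values.

n_states, actions, gamma, alpha = 5, (-1, +1), 0.9, 0.5

def step(s, a):
    s2 = min(max(s + a, 0), n_states - 1)
    r1 = 1.0 if s2 == 0 else 0.0             # channel 1: small reward, left
    r2 = 2.0 if s2 == n_states - 1 else 0.0  # channel 2: big reward, right
    return s2, (r1, r2)

# One Q-table per reward head: Q[k][state][action_index].
Q = [[[0.0] * len(actions) for _ in range(n_states)] for _ in (0, 1)]
rng = random.Random(0)

for _ in range(2000):
    s = rng.randrange(n_states)
    for _ in range(20):
        if rng.random() < 0.1:               # epsilon-greedy exploration
            ai = rng.randrange(len(actions))
        else:                                # greedy on the sum of heads
            ai = max(range(len(actions)),
                     key=lambda i: Q[0][s][i] + Q[1][s][i])
        s2, rs = step(s, actions[ai])
        for k in (0, 1):                     # each head sees only r_k
            target = rs[k] + gamma * max(Q[k][s2])
            Q[k][s][ai] += alpha * (target - Q[k][s][ai])
        s = s2

# From the middle state, the aggregated policy heads for the bigger reward.
best = max(range(len(actions)), key=lambda i: Q[0][2][i] + Q[1][2][i])
print(actions[best])
```

Each head’s learning problem is simpler than the full task, which is what lets the approach scale to games like Ms. Pac-Man, where the score decomposes naturally over pellets, fruit and ghosts.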

  • Imagination-augmented agents for deep reinforcement learning

Authors: Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li, Razvan Pascanu, Peter Battaglia, David Silver, Daan Wierstra
Type: arXiv preprint
Publication date: 19 July 2017

Researchers from DeepMind developed Imagination-Augmented Agents (I2As), a new deep reinforcement learning architecture that incorporates elements of imaginative planning. Their approach combines aspects of model-free and model-based learning: the agents learn, end to end, to extract useful knowledge from model simulations rather than relying exclusively on simulated returns. This lets the agent take advantage of model-based imagination without the drawbacks of conventional model-based planning.

Also published on Medium.

Yi-Ling Hwong