Our roadmap, and progress to date. The map has two principal stages – improved representations, and generating behaviour using those representations. The end goal is a “General Purpose Agent” capable of efficiently learning to perform many tasks in complex simulated or real environments. The agent will be able to use the same learning algorithms, representations and control systems in many applications and environments. At the “Representation” end, our research is divided into two streams – Semantic and Episodic memory. Together, these learned representations provide the necessary flexibility for many tasks.
Red filled areas are fully or partially complete to a satisfactory level of performance. It goes without saying that the final result will not be a superhuman AGI, but rather a software agent that has more general capabilities than current agents. Many additional areas of research necessary for AGI are not even mentioned – such as learning by imitation, transfer learning by language, and improvements in ML technology more broadly. This roadmap has been chosen to address the questions we believe are most crucial, while also having a reasonable chance of leading to a functional result in the near-term.
To force a separation between our work and the dominant trajectory of Machine Learning, we self-impose "Biological Plausibility" constraints. That doesn't mean we need to do everything exactly as it is done in "Wet Brains"; rather, we aim to exclude the most implausible features of ML and develop biologically-plausible alternatives with similar computational qualities.
Rather than focus on building control and planning systems explicitly, later moving towards more abstract reasoning systems, we take the approach of building a sophisticated simulation of the world first, and then exploiting this simulation for planning and goal-directed behaviour. Our strategy will be to learn to build predictive models, extend predictions many steps into the future in Generative mode, and then learn to control these predictions, resulting in the ability to evaluate action choices over many time steps. This is our definition of mental simulation.
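The idea of rolling a predictive model forward and scoring action choices over several steps can be sketched as follows. This is only an illustration under loud assumptions: the 1-D world, the hand-written `predict_next` function (standing in for a learned predictive model), the `score` function and the exhaustive search in `plan` are all hypothetical and are not our actual system.

```python
from itertools import product


def predict_next(state, action):
    """Hypothetical stand-in for a learned one-step predictive model."""
    return state + action


def rollout(state, actions):
    """Extend predictions many steps ahead by feeding each prediction back in."""
    trajectory = [state]
    for action in actions:
        state = predict_next(state, action)
        trajectory.append(state)
    return trajectory


def score(trajectory, goal):
    """Assumed evaluation: the closer the final predicted state is to the goal, the better."""
    return -abs(trajectory[-1] - goal)


def plan(state, goal, horizon=3, actions=(-1, 0, 1)):
    """Evaluate every action sequence of length `horizon` in simulation and keep the best."""
    return max(
        product(actions, repeat=horizon),
        key=lambda seq: score(rollout(state, seq), goal),
    )


best_sequence = plan(state=0, goal=3)
imagined_path = rollout(0, best_sequence)
```

Exhaustive search is used here only because the toy problem is tiny; the point is that actions are evaluated against imagined futures rather than against the real environment.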
In practice, we have found that sparse representations have many desirable properties including combinatoric representational capacity and the ability to represent novel combinations of concepts without additional learning.
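Both properties can be illustrated with a small worked example. The sketch below assumes binary codes in which exactly `n_active` of `n_cells` cells are on; the concept codes `red` and `ball` and their cell indices are invented purely for illustration.

```python
from math import comb


def sparse_capacity(n_cells, n_active):
    """Number of distinct binary codes with exactly n_active of n_cells active."""
    return comb(n_cells, n_active)


def overlap(code_a, code_b):
    """Number of active cells shared by two codes."""
    return len(code_a & code_b)


# Hypothetical codes: each concept is the set of indices of its active cells.
red = frozenset({1, 7, 12})
ball = frozenset({3, 20, 31})

# A novel combination is formed as the union of existing codes, with no extra learning.
red_ball = red | ball

capacity = sparse_capacity(100, 5)  # codes available with 5 of 100 cells active
```

With only 5 of 100 cells active there are already about 75 million distinct codes, and the combined code `red_ball` still overlaps fully with each of its parts, so the components remain recoverable.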
Disentangled latent variable learning
Capsule networks learn to describe observations using only a few parameters of a subset of cells. This bottleneck forces Capsule networks to disentangle the latent variables and entities in the world and describe them efficiently. These representations are more powerful and flexible than conventional neural models.
Attentional filtering has recently delivered state-of-the-art results in a number of tasks, especially natural language modelling. However, the dominant approach to attentional filtering still relies on deep backpropagation to learn attentional strategies. To avoid biologically implausible time-travel we will use the Bellman equation (a key part of discounted reinforcement learning) to associate delayed rewards with their causes.
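To show how a Bellman-style update carries a delayed reward back to earlier states without backpropagating through time, here is a minimal sketch using tabular TD(0) on a toy chain. The chain environment, the function name `td0_chain` and all parameter values are assumptions chosen for illustration; this is a generic value-learning example, not our attentional mechanism.

```python
def td0_chain(n_states=4, gamma=0.9, alpha=0.5, episodes=200):
    """Tabular TD(0) on a deterministic chain 0 -> 1 -> ... -> n_states - 1.

    The only reward (1.0) arrives on the transition into the final, terminal state.
    """
    values = [0.0] * n_states  # the terminal state's value is never updated and stays 0
    for _ in range(episodes):
        for state in range(n_states - 1):
            next_state = state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Bellman target: immediate reward plus discounted value of the successor.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
    return values


values = td0_chain()
```

Each update uses only the current state, its successor and the immediate reward, yet after repeated episodes the early states come to reflect the discounted delayed reward (approximately `gamma ** 2` for state 0 in this chain).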
Rodney Brooks' early work on the subsumption architecture and the importance of embodiment in robotics still resonates with us today. Many of the capabilities and features we intend to demonstrate only fit into an agent-based paradigm, which means we must develop an environment for the agent to operate in. Fortunately, these days there are a number of excellent (mostly RL-based) simulated environments to play with, such as the OpenAI Gym and the AnimalAI Olympics.
One of the most biologically-implausible aspects of conventional ML is the use of deep backpropagation as the core mechanism for credit assignment (i.e. learning network parameters). Excluding it leaves us with the considerable challenge of building entirely different learning systems that are still able to perform as effectively as those trained with deep backpropagation.
One of the key constraints we apply is that our methods must learn continually even in nonstationary environments (i.e. the statistics of the data may change at any time, and learning effectiveness must not be impaired by this). Many ML methods fail to function well in nonstationary environments and therefore cannot learn continually. The criterion of "lifelong" learning is also often used, meaning that learning cannot stop.
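A very simple illustration of why nonstationarity matters: an estimator whose learning rate decays over time stops adapting, while one with a constant learning rate keeps tracking the data. The sketch below is an assumed toy example (a stream whose mean jumps from 0 to 10 halfway through), not one of our methods.

```python
def sample_average(stream):
    """Running mean with a decaying step size (1/n); weights all history equally."""
    mean = 0.0
    for n, x in enumerate(stream, start=1):
        mean += (x - mean) / n
    return mean


def constant_step_average(stream, step=0.1):
    """Running estimate with a fixed step size; old data is gradually forgotten."""
    estimate = 0.0
    for x in stream:
        estimate += step * (x - estimate)
    return estimate


# Hypothetical nonstationary stream: the statistics change abruptly at the midpoint.
stream = [0.0] * 500 + [10.0] * 500

decaying = sample_average(stream)       # ends near 5, halfway between the two regimes
tracking = constant_step_average(stream)  # ends near 10, the current regime
```

The decaying-step estimator is the better choice for stationary data, which is exactly why methods tuned for stationary datasets can fail when the statistics shift.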
We assume that external feedback is always sparse (rare), whether it takes the form of labels or rewards. In addition, we require that our methods function with only rewards (the quality of a state or action) rather than the ideal state or action. This matches the conditions in which people learn.
Unsupervised or Self-Supervised Learning
Given only sparse global rewards, the vast majority of learning must be either unsupervised or self-supervised. One of the most promising learning rules may be Predictive Coding - learning to predict the next input, whatever it is. The rule needs only local information and naturally learns the dynamics of observed sensor input.
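As a minimal sketch of next-input prediction with a local rule, the example below trains a single linear layer with a delta rule on a repeating symbol sequence. The one-hot encoding, the function names and the learning rate are assumptions for illustration, and this simple rule is a stand-in rather than a full Predictive Coding model.

```python
def one_hot(index, size):
    """Encode a symbol as a one-hot input vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def predict(weights, x):
    """Linear prediction of the next input from the current input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_next_input_predictor(sequence, size, lr=0.5, epochs=20):
    """Learn to predict each next input; no labels or rewards are used."""
    weights = [[0.0] * size for _ in range(size)]
    for _ in range(epochs):
        for current, nxt in zip(sequence, sequence[1:]):
            x, target = one_hot(current, size), one_hot(nxt, size)
            error = [t - p for t, p in zip(target, predict(weights, x))]
            # Local update: weight (i, j) sees only its own input x[j]
            # and the prediction error of its own output unit, error[i].
            for i in range(size):
                for j in range(size):
                    weights[i][j] += lr * error[i] * x[j]
    return weights


sequence = [0, 1, 2] * 10  # hypothetical sensor stream: 0 -> 1 -> 2 -> 0 -> ...
weights = train_next_input_predictor(sequence, size=3)
prediction = predict(weights, one_hot(0, 3))
predicted_symbol = max(range(3), key=lambda i: prediction[i])
```

The training signal is simply the next observation, so the learned weights end up encoding the transition dynamics of the stream.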
One of the key inspirations for our work was Jeff Hawkins' elimination of the symbol-grounding problem by redefining it as an incremental process of accumulating invariances. Later, researchers generalized this to include notions of equivariance (Hinton) and disentanglement (Y. Bengio). Whichever qualities are produced in the process of abstraction, the transformation must be incremental (and reversible), not discrete.
Having established that representations must be symbolic to varying degrees, and building on the effective functioning of deep neural networks, it is natural that representations should be hierarchical - that is, concepts at different levels in a graph should have varying scope and specificity. For efficiency, the same representation and hierarchy should be used for planning - meaning that plans would not exist as a series of steps but as a set of features in the graph that are incrementally unfolded and developed to match observed reality. This is the process by which we hope to translate mental simulations to effective action in simulated environments.