
AGI Experimental Framework

Posted by Gideon Kowadlo

We’re very excited to launch our open source AGI Experimental Framework (AGIEF).

We first introduced it at the end of 2015 here, and it has certainly come a long way since.

AGIEF was created to make running rigorous AI experiments convenient, reproducible and scalable. The goals are:

  • Repeatability: ability to save/load, stop/start an experiment from any execution step, and know that it will execute deterministically
  • Visualisation: ability to visualise all the data structures at any step
  • Distributed operation for performance

The GitHub wiki and README describe the project in detail and explain how to get started.

The framework comprises three repositories:

agi – the Java project containing the core algorithmic code and the framework package that supports compute nodes.

run-framework – Python scripts to run and interact with the compute nodes, covering tasks such as generating input files, launching cloud infrastructure, running experiments (locally or remotely), executing parameter sweeps, and exporting and uploading the output artefacts.

experiment-definitions – the experiment definitions: the files required to run and repeat specific experiments.


Region-Layer Experiments

Posted by ProjectAGI
Typical results from our experiments: Some active cells in layer 3 of a 3 layer network, transformed back into the input pixels they represent. The red pixels are positive weights and the blue pixels are negative weights; absence of colour indicates neutral weighting (ambiguity). The white pixels are the input stimulus that produced the selected set of active cells in layer 3. It appears these layer 3 cells collectively represent a generic ‘5’ digit. The input was a specific ‘5’ digit. Note that the weights of the hidden layer cells differ from the input pixels, but are recognizably the same digit.

We are running a series of experiments to test the capabilities of the Region-Layer component. The objective is to understand to what extent these ideas work, and to expose limitations both in implementation and theory.

Results will be posted to the blog and written up for publication if, or when, we reach an acceptable level of novelty and rigor.

We are not trying to beat benchmarks here. We’re trying to show whether certain ideas have useful qualities – the best way to tackle specific AI problems is almost certainly not an AGI way. But what we’d love to see is that AGI-inspired methods can perform close to state-of-the-art (e.g. deep convolutional networks) on a wide range of problems. Now that would be general intelligence!

Dataset Choice

We are going to start with the MNIST digit classification dataset, and perform a number of experiments based on that. In future we will look at some more sophisticated image / object classification datasets such as LabelMe or Caltech_101.

The good thing about MNIST is that it’s simple and has been extremely widely studied. It’s easy to work with the data and the images are a practical size – big enough to be interesting, but not so big as to require lots of preprocessing or too much memory. Despite the images being only 28×28 pixels, variation in digit appearance gives the data considerable depth (see the example digit ‘5’ above).

The bad thing about MNIST is that it’s largely “solved” by supervised learning algorithms. A range of different supervised techniques have reached human performance and it’s debatable whether any further improvements are genuine.

So what’s the point of trying new approaches? Well, supervised algorithms have some odd qualities, perhaps due to the narrowness of training samples or the specificity of the cost function. For example, the discovery of “adversarial examples” – images that look easily classifiable to the naked eye, but that a trained network cannot classify correctly because they exploit weaknesses in what it has learned.

But the biggest drawback of supervised learning is the need to tell it the “correct” answer for every input. This has led to a range of techniques – such as transfer learning – to make the most of what training data is available, even if not directly relevant. But fundamentally, supervised learning is unlike the experience of an agent learning as it explores its world. Animals can learn without a tutor.

However, unsupervised results on MNIST are less widely reported. Partly this is because you need to come up with a way to measure the performance of an unsupervised method. The most common approach is to use unsupervised networks to boost the performance of a final supervised network layer – but on MNIST the supervised layer is so powerful that it’s hard to distinguish the contribution of the unsupervised layers. Nevertheless, these experiments are encouraging, because having a few unsupervised layers seems to improve overall performance compared to all-supervised networks. So beyond easing the limited-data problem of supervised learning, unsupervised learning actually seems to add something.

One possible method of capturing the contribution of the unsupervised layers alone is the Rand Index, which measures the similarity between two clusterings. However, we intend to use a distributed representation in which similar inputs have overlapping representations – that’s one of the features of the algorithm! – which makes a hard clustering metric like the Rand Index a poor fit.

So, for now we’re going to go with the simplest approach we can think of: measure the correlation between the active cells in selected hidden layers and each digit label, and see whether correlation alone is enough to pick the right label given a set of active cells. If the concepts defined by the digits exist somewhere in the hierarchy, they should be detectable as features uniquely correlated with specific labels…

Note also that we’re not doing any preprocessing of the MNIST images except binarization at a threshold of 0.5. Since the MNIST data is very high contrast, hopefully the threshold doesn’t matter much: the images are almost binary already.
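As a concrete footnote, here is a minimal sketch of that binarization step, assuming the MNIST pixel intensities have already been scaled to the range [0, 1]:

```java
// Minimal sketch of the only preprocessing step applied: binarization at 0.5.
// Assumes 'pixels' holds MNIST intensities already scaled to [0, 1].
public final class Binarize {
    public static boolean[] binarize(float[] pixels, float threshold) {
        boolean[] bits = new boolean[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            bits[i] = pixels[i] >= threshold; // on/off pixel
        }
        return bits;
    }

    public static void main(String[] args) {
        float[] example = {0.0f, 0.2f, 0.51f, 0.98f};
        System.out.println(java.util.Arrays.toString(binarize(example, 0.5f)));
        // prints [false, false, true, true]
    }
}
```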

Sequence Learning Tests

Before starting the experiments proper, we conducted some ad hoc tests to verify that the features of the Region-Layer are implemented as intended. Remember, the Region-Layer has two key capabilities:

  • Classification … of the feedforward input, and
  • Prediction … of future classification results (i.e. future internal states)

See here and here to understand the classification role, and here for more information about prediction. Taken together, the ability to classify and predict future classifications allows sequences of input to be learned. This is a topic we have looked at in detail in earlier blog posts and we have some fairly effective techniques at our disposal.

We completed the following tests:

  • Cycle 0,1,2: We verified that the algorithm could predict the set of active cells in a short cycle of images. This ensures the sequence learning feature is working. The same image was used for each instance of a particular digit (i.e. there was no variation in digit appearance).
  • Cycle 0,1,…,9: We tested a longer cycle. Again, the Region-Layer was able to predict the sequence perfectly.
  • Cycle 0,1,2,3, 0,2,3,1: We tested an ambiguous cycle. After a 0, the next state can be 1 or 2, and similarly, after a 3, the next state can be 0 or 1. However, due to the variable order modelling behaviour of the Region-Layer, a single Region-Layer is able to predict this cycle perfectly. Note that first-order prediction cannot predict this sequence correctly.
  • Cycle 0,1,2,3,1,2,4,0,2,3,1,2,1,5,0,3,2,1,4,5: We tested a complex graph of state sequences, and again a single Region-Layer was able to predict the sequence perfectly. We were also able to predict it using only first-order modelling and a deep hierarchy.

After completing these tests we were satisfied that our Region-Layer component has the ability to efficiently produce variable-order models of observed sequences using unsupervised learning, assuming that the states can be reliably detected.
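To make the variable-order point concrete, here is a small self-contained demonstration – not the Region-Layer itself, just a plain fixed-order frequency model used as a foil – showing that the ambiguous cycle 0,1,2,3,0,2,3,1 only becomes fully predictable once enough context is used:

```java
import java.util.HashMap;
import java.util.Map;

// A throwaway k-th order model of the ambiguous cycle 0,1,2,3,0,2,3,1.
// First-order context is ambiguous after 0 and after 3; second-order is still
// ambiguous after (2,3); with third-order context every transition is unique.
public final class VariableOrderDemo {
    public static void main(String[] args) {
        int[] s = {0,1,2,3,0,2,3,1, 0,1,2,3,0,2,3,1, 0,1,2,3,0,2,3,1};
        for (int k = 1; k <= 3; k++) {
            // Map each k-symbol context to the successor frequencies observed after it.
            Map<String, Map<Integer, Integer>> counts = new HashMap<>();
            for (int i = k; i < s.length; i++) {
                StringBuilder context = new StringBuilder();
                for (int j = i - k; j < i; j++) context.append(s[j]).append(',');
                counts.computeIfAbsent(context.toString(), c -> new HashMap<>())
                      .merge(s[i], 1, Integer::sum);
            }
            // The cycle is predictable at order k iff every context has exactly
            // one observed successor.
            boolean deterministic = counts.values().stream().allMatch(m -> m.size() == 1);
            System.out.println("order " + k + " deterministic: " + deterministic);
            // prints: order 1 false, order 2 false, order 3 true
        }
    }
}
```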


Now we come to the harder part. What if each digit exemplar image is ambiguous? In other words, what if each ‘0’ is represented by a randomly selected ‘0’ image from the MNIST dataset? The ambiguity of appearance means that the observed sequences will appear to be non-deterministic.

We decided to run the following experiments:

Experiment 1: Random image classification

In this experiment there is no predictable sequence; each digit must be recognized solely by its appearance. The classic protocol is used: up to N training passes over the entire MNIST training set, after which the internal weights are fixed and a single pass calculates the correlation between each active cell in selected hidden layer[s] and the digit labels. Then a single pass over the test set records, for each test input image, the most highly correlated digit label for the set of active hidden cells. The algorithm gets a “correct” result if the most correlated label is the true label.

  • Passes 1-N: Train networks

Present each digit in the training set once, in a random order, and train the internal weights of the algorithm. Repeat several times if necessary.

  • Pass N+1: Measure correlation of hidden layer features with the training labels.

Present each digit in the training set once, in a random order. Accumulate the frequency with which each active cell is associated with each digit label. After all images have been seen, convert the observed frequencies to correlations.

  • Pass N+2: Predict label of test images. 

Present each digit in the testing set once, in a random order. Use the correlations between cell activity and training labels to predict the most likely digit label given the set of active cells in selected Region-Layer components (they are arranged into a hierarchy).
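To make the protocol concrete, here is a minimal sketch of passes N+1 and N+2 for a single selected hidden layer. It assumes the trained, frozen network exposes the indices of its active cells for each image; since the exact conversion from frequencies to correlations is not pinned down above, the simple conditional frequency P(label | cell active) is used as a stand-in:

```java
// A sketch of passes N+1 and N+2 for one selected hidden layer. Sizes, names
// and the 'correlation' measure (conditional frequency) are illustrative
// assumptions, not the actual implementation.
public final class CorrelationEval {
    static final int NUM_CELLS = 1024, NUM_LABELS = 10; // illustrative sizes
    final double[][] count = new double[NUM_CELLS][NUM_LABELS];
    final double[] cellTotal = new double[NUM_CELLS];

    // Pass N+1: accumulate the frequency with which each active cell
    // co-occurs with each digit label.
    void accumulate(int[] activeCells, int label) {
        for (int cell : activeCells) {
            count[cell][label]++;
            cellTotal[cell]++;
        }
    }

    // Pass N+2: score each label by summing per-cell conditional frequencies
    // over the active cells, then predict the best-scoring label.
    int predict(int[] activeCells) {
        double[] score = new double[NUM_LABELS];
        for (int cell : activeCells) {
            if (cellTotal[cell] == 0) continue; // cell never active in training
            for (int label = 0; label < NUM_LABELS; label++) {
                score[label] += count[cell][label] / cellTotal[cell];
            }
        }
        int best = 0;
        for (int label = 1; label < NUM_LABELS; label++) {
            if (score[label] > score[best]) best = label;
        }
        return best;
    }

    public static void main(String[] args) {
        CorrelationEval eval = new CorrelationEval();
        eval.accumulate(new int[]{3, 7}, 5); // cells 3 and 7 active for a '5'
        eval.accumulate(new int[]{3, 9}, 5);
        eval.accumulate(new int[]{2, 9}, 0);
        System.out.println(eval.predict(new int[]{3, 7})); // prints 5
    }
}
```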

Experiment 2: Image classification & sequence prediction

What if the digit images are not in a random order? We can use the English language to generate a training set of digit sequences. For example, we can take a book, convert each character to a two-digit number, and select random appropriate digit images to represent each number, as sketched below.
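Here is a minimal sketch of that generator. The character code used (‘a’..‘z’ mapping to 10..35, everything else to 00) and the representation of images as binarized pixel arrays are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A sketch of the proposed generator. The character code ('a'..'z' -> 10..35,
// everything else -> 00) is an illustrative assumption, as is representing
// images as binarized pixel arrays.
public final class DigitSequenceGenerator {
    // Convert text into a digit sequence, two digits per character.
    public static List<Integer> toDigits(String text) {
        List<Integer> digits = new ArrayList<>();
        for (char ch : text.toLowerCase().toCharArray()) {
            int code = (ch >= 'a' && ch <= 'z') ? ch - 'a' + 10 : 0;
            digits.add(code / 10); // tens digit
            digits.add(code % 10); // units digit
        }
        return digits;
    }

    // Replace each digit with a randomly selected exemplar image of that
    // digit; imagesByDigit.get(d) holds the binarized MNIST images of digit d.
    public static List<boolean[]> toImages(
            List<Integer> digits, List<List<boolean[]>> imagesByDigit, Random rng) {
        List<boolean[]> sequence = new ArrayList<>();
        for (int d : digits) {
            List<boolean[]> pool = imagesByDigit.get(d);
            sequence.add(pool.get(rng.nextInt(pool.size())));
        }
        return sequence;
    }
}
```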

The motivation for this experiment is to see how the sequence learning can boost image recognition: Our Region-Layer component is supposed to be able to integrate both sequential and spatial information. This experiment actually has a lot of depth because English isn’t entirely predictable – if we use a different book for testing, there’ll be lots of sub-sequences the algorithm has never observed before. There’ll be uncertainty in image appearance and uncertainty in sequence, and we’d like to see how a hierarchy of Region-Layer components responds to both. Our expectation is that it will improve digit classification performance beyond the random image case.

In the next article, we will describe the specifics of the algorithms we implemented and tested on these problems.

A final article will present some results.


AGI Experimental Framework: A platform for AGI R&D

Posted by Gideon Kowadlo

By Gideon Kowadlo and David Rawlinson


We’ve been building and testing AGI algorithms for the last few years. As the systems become more complex, we have found it ever more difficult to run meaningful experiments. To summarise, the main challenges are:

  • testing a version of the algorithm repeatedly and over some range of parameters or conditions,
  • scaling it up so that it can run quickly,
  • debugging: the complexity of the ‘brain’ makes visualising and interpreting its state almost as hard as the problem itself!

Platforms for testing AIs already exist, such as the Arcade Learning Environment. There are also a number of standard datasets and frameworks for testing them. What we want is a framework for understanding the behaviour of an AI that can be applied successfully to any problem – it is supposed to be an Artificial General Intelligence, after all. The goal isn’t to advance the gold-standard incrementally; instead we want to better understand the behaviour of algorithms that might work reasonably well on many different problems.

Whereas most AI testing frameworks are designed to facilitate a particular problem, we want to facilitate understanding of the algorithms used. Further, the algorithms will have complex internal state and be variably parameterised from small instances on trivial problems to large instances – comprising many computers – on complex problems. As such there will be a lot of emphasis on interfaces that allow the state of the algorithm to be explored.

These design goals mean that we need to look more at the enterprise and web-scale frameworks for distributed systems, than test harnesses for AIs. There’s a huge variety of tools out there: Distributed filesystems, cloud resourcing (such as Elastic Compute), and cluster job management (e.g. many scientific packages available in Python). We’ll design a framework with the capability to jump between platforms as available technologies evolve.

Developing distributed applications is significantly harder than single-process software. Synchronization and coordination is harder (c.f. Apache Zookeeper), and there’s a lot of crud to get right before you can actually get to the interesting bits (i.e. the AGI). We’re going to try to get the boring stuff done nicely, so that others can focus on the interesting bits!

Foundational Principles

  • Agent/World conceptualisation
    • For AGI, we have developed a system based around Experiments, with each Experiment having Agents situated in a World.
  • Reproducible
    • All data is persisted by default so that any experiment can be reproduced from any time step.
  • Easy to run and use
    • Minimal setup and dependencies.
    • No knowledge of the framework internals is required to implement a custom module (primarily the intelligent Agent, or the World in which it operates).
  • Highly modular (Scalability)
    • Different parts of the system can be customised, extended or overridden independently.
  • Distributed architecture (Scalability)
    • Modules can be run on physically separated machines, without any modification to the interactions between modules (i.e. the programmer’s perspective is not affected by scaling of the system to multiple computers).
  • Easy to develop
    • Code is open source.
    • Code is well documented.
    • All APIs are well documented and use standard protocols (at the moment RESTful; in future this could be websockets or other protocols).
  • Explorable / Visualisable
    • High priority placed on debugging and understanding of data rather than simply efficiency and throughput. We don’t yet know what the algorithm should look like!
    • All state is accessible, and relations can be explored.
    • Execution is on demand (step by step) or automatic (until a stopping criterion is met, or a batch of experiments completes).
    • It must be easy for anyone to build a UI client that can explore the state of all parts of the system.

Conceptual Entities

We have defined a number of components that make up an experiment. We refer to these components as Entities, and give them a specific interface.

  • World
    • The simulated environment within which all the other simulated components exist.
  • Agent 
    • The intelligent agent itself. It operates within a World, and interacts with that World and (optionally) other Agents via a set of Sensors and Actuators.
  • Sensor
    • A means by which the Agent senses the world. The output is a function of a subset of the World state. For example, a unidirectional light sensor may provide the perceived brightness at the location of the sensor.
  • Actuator
    • A means by which an Agent acts on the World. The output is a simulated physical action. For example, a motor rotating a wheel.
  • Experiment
    • The Experiment Entity is a container for a World, a set of Agents (each of which has a set of Sensors and Actuators), and an ObjectiveFunction which determines the terminating condition of the experiment (which may simply be a time duration).
  • Laboratory
    • A collection of Experiments that form a suite to be analysed collectively. This may be a set of Experiments that have similar setups with minor parameter variations.
  • ObjectiveFunction
    • The objective function computes metrics about the World and/or Agents that are necessary to provide Supervised Learning or Reinforcement Learning signals. It might instead provide a multivariate Optimization function. The ObjectiveFunction is a useful encapsulation because it is often easy to separate objective measurements from the AI that is needed to achieve them.
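As a rough illustration of how these Entities might map onto code, here is a minimal sketch in Java. The names and signatures are illustrative assumptions, not the actual AGIEF interfaces:

```java
import java.util.List;

// A minimal sketch of how the Entities above might look as Java interfaces.
// Names and signatures are illustrative assumptions, not the actual AGIEF API.
interface World {
    void step(); // advance the simulated environment by one time step
}

interface Sensor {
    double[] sense(World world); // output is a function of a subset of World state
}

interface Actuator {
    void actuate(World world, double[] command); // apply a simulated physical action
}

interface Agent {
    void step(List<Sensor> sensors, List<Actuator> actuators); // sense, think, act
}

interface ObjectiveFunction {
    double evaluate(World world, List<Agent> agents); // metric for learning signals
    boolean isComplete(World world, long stepCount);  // Experiment terminating condition
}
```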


To enforce good design principles, the architecture is multi-layered and highly modular. Multiple layers (also known as a multi-tier architecture) allow you to work with concepts at the appropriate level of abstraction, which simplifies development and use of the system.

Each entity is a module. Use of particular entities is optional and extensible: a user inherits from the entities they choose and implements the desired functionality. Another axis of modularisation is the AGIEF Nodes, which communicate via interprocess conventions so that components can be split between multiple host computers.

Interprocess communication occurs via a central interface called the Coordinator, which is a single point of contact for all Entities and the shared system state. This also enables graphical user interfaces to be built to control and explore the system.
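For instance, an Entity or a UI client might interact with the Coordinator along these lines. The endpoint paths, host and port below are hypothetical, invented purely for illustration – the GitHub wiki documents the real API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// A sketch of how an Entity (or a UI client) might talk to the Coordinator
// over HTTP. The base URL and endpoint paths are hypothetical, made up for
// illustration only.
public final class CoordinatorClient {
    private final HttpClient http = HttpClient.newHttpClient();
    private final String base; // e.g. "http://localhost:8080" (hypothetical)

    public CoordinatorClient(String base) { this.base = base; }

    // Fetch the persisted state of a named entity as JSON.
    public String getEntityState(String entityName) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(base + "/entities/" + entityName + "/state"))
                .GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Ask the Coordinator to advance the experiment by one execution step.
    public void stepExperiment(String experimentName) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(base + "/experiments/" + experimentName + "/step"))
                .POST(HttpRequest.BodyPublishers.noBody()).build();
        http.send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```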

These concepts are expanded in the sections below.

Design Considerations

The various components of the system may have huge in-memory data structures. This is an important consideration for persisting state, for distributed operation, and for the ability to visualise the state.

Processing to update the state of Worlds and Agents will be compute-intensive. Many AI methods can easily be accelerated by parallel execution. Therefore, the system can be broken down into many computing nodes, each tasked with performing a specific computational function on some part of the shared system state. We hope to support massively parallel hardware such as GPUs in these compute nodes.

We will write the bulk of the framework and initial algorithm implementations in Java. Others can extend on this, or develop against the framework in other languages. We will also write a graphical user interface using web technologies that will allow easy management of the system.

Perspectives on the system design

The architectural layers are shown in the diagram below.

Figure 1: ‘Architectural Layers’

Each layer is distinct, with strict separation. No layer has access to the layers above, which operate at a higher level of abstraction.

  • State:
    • State persistence: storage and retrieval of state of all parts of the system at every time step. This comprises the shared filesystem.
  • Interprocess:
    • Communications between all modules running in the system, locally and/or across a network.
    • Provides a single point of contact via a local interface, to any part of the system (which may be running in different physical locations), for both control signals and state.
  • Experiment:
    • Provides all of the entities that are required for an experiment. These are expanded shortly.
  • UI:
    • The user interface that an experimenter uses to run experiments, debug and visualise results.
    • The typical features would be:
      • set up parameters of an experiment,
      • run, stop, step through an experiment,
      • save/load an experiment,
      • visualise the state of any part of the experiment.
  • Specific Experiments:
    • This is defined by the person experimenting with the system. For example, a specific Agent that seeks light, a specific World that contains a light source, and an objective function that defines the time span for operation.

Another perspective on the design is to view the Services and Entities and their lines of communication. The diagram is colour coded to indicate Layers, as per the diagram above.

Figure 2: ‘Services and Entities’

The Coordinator and Database are services. The Coordinator is shown at the centre, as described earlier (Architecture section), being the primary point of contact for Entities and potentially other clients such as a Graphical User Interface.

A similar perspective is shown in an expanded diagram below that illustrates the Database API module and the distributed implementation of the Coordinator in the Interprocess layer, enabling Entities to run on separate machines. This is just one possible configuration; there can be multiple slaves, each with multiple entities.

Each bounding box indicates what we refer to as an AGIEF Node (or node for short). A node comprises a process that provides a context for the execution of one or more entities, or other clients such as the GUI.

Figure 3: ‘AGIEF Nodes’

We looked at popular NoSQL web storage systems (basically key-value stores), which are very convenient and flexible due to their inherently dynamic, software-defined schemas and HTTP interfaces. However, we have a relatively static schema for our data, on which we will build utilities for managing experiments and visualising data. In addition, relational databases such as MySQL and PostgreSQL are beginning to offer HTTP interfaces as well. Whether we pick a NoSQL or a relational database, we will require an HTTP interface.

A third perspective is the data model that represents the system in its entirety. This is the model implemented in the database.

Figure 4: ‘Data Model’

The data model stores the entire system state, including hierarchy and relationship between entities, as well as the state of each entity. With a RESTful API exposing the database, we have a shared filesystem accessible as a service, essential for distributed operation and restoring the system at any point in time.
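As a rough sketch, the kind of record the data model might store for each entity at each step could look like the following; the field names are illustrative assumptions, not the actual schema:

```java
// A sketch of a per-entity snapshot: identity, parent (hierarchy), execution
// step and serialised state. Field names are illustrative, not the schema.
final class EntityStateRecord {
    final String entityName; // unique entity identifier
    final String parentName; // parent entity, encoding the hierarchy
    final long step;         // execution step this snapshot belongs to
    final String stateJson;  // serialised entity state

    EntityStateRecord(String entityName, String parentName, long step, String stateJson) {
        this.entityName = entityName;
        this.parentName = parentName;
        this.step = step;
        this.stateJson = stateJson;
    }
}
```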

Future Work

We will shortly be releasing an initial version of our framework and we’ll post about the technology choices we’ve made, and some alternatives. We’ll include a demonstration problem with the initial release and then start rolling out some more exciting algorithms and graphics, including lots of AI methods from the literature (we have hundreds in our old codebase ready to go).