Category Archives

3 Articles

Predictive Coding/pyramidal cell/Rao & Ballard/unsupervised learning

Pyramidal Neurons and Predictive Coding

Posted by David Rawlinson on

Today’s post tries to fit the theoretical concept of Predictive Coding with the unusual structure and connectivity of Pyramidal cells in the Neocortex.

A reconstruction of a pyramidal cell (source: Wikipedia / Wikimedia Commons). Soma and dendrites are labeled in red, axon arbor in blue. 1) Soma (cell body) 2) Basal dendrite (feed-forward input) 3) Apical dendrite (feed-back input) 4) Axon (output) 5) Collateral axon (output).

Pyramidal neurons

Pyramidal neurons are interesting because they are one of the most common neuron types in the computational layers of the neocortex. This almost certainly means they are critical to many of the key cortical functions, such as forming representations of knowledge and reasoning about the world.

Anatomy of a Pyramidal Neuron

Pyramidal neurons are so-called because they tend to have a triangular body (soma). But this isn’t the most interesting feature! While all neurons have dendrites (inputs) and at least one axon (output), Pyramidal cells have more than one type of input – Basal and Apical dendrites.

Apical Dendrite

Pyramidal neurons tend to have a single, long Apical dendrite that extends with few forks a long way from the body of the neuron. When it reaches layer 1 of the cortex (which contains mostly top-down feedback from cortical areas that are believed to represent more abstract concepts), the apical dendrite branches out. This suggests the apical dendrite likes to receive feedback input. If feedback represents more abstract, longer-term context, then this data would be useful for predicting bottom-up input. More on this later.

Basal Dendrites

Pyramidal cells tend to have a few Basal dendrites that branch almost immediately, in the vicinity of the cell body. Note that this means the input provided to basal and apical dendrites is physically separated. We know from analysis of cortical microcircuits that axons terminating around the body of pyramidal cells in cortex layers 2 and 3 contain bottom-up data that is propagating in a feed-forward direction – i.e. information about the external state of the world.


Pyramidal cells have a single Axonal output that may fork, and may travel a very long distance to its targets including other areas of the cortex.

Predictive Coding

Predictive Coding (PC) is a method of transforming data from its original form, to a representation in terms of prediction errors. There’s not much interest in PC In the Machine Learning community, but in Neuroscience there is substantial evidence that the Cortex encodes information in this way. Similar but unrelated concepts have also been used for efficient compression of data in signal processing. The benefit of this transformation is due to compression: We assume that only prediction errors are important, because by definition, everything else can be predicted and is therefore sufficiently described elsewhere.

There are several research groups looking at computational models of Predictive Coding – in particular those of Karl Friston and Andy Clark.

Two uses for feedback

Assuming feedback contains a more processed and abstract representation of a broader set of data, it has two uses.

  • Prediction for a more efficient representation of the world (e.g. Predictive Coding)
  • Prediction for more robust interpretation (via integration of top-down information in perception)

Predictive coding aims to transform the representation inside the cortex to a more efficient one that encodes only the relationships between prediction errors. Take some time to decide for yourself whether this loses anything…!

But there are many perceptual phenomena that show how internal state affects perception and interpretation of external input. For example, the phenomenon of multistable perception in some visual illusions: We need to know what we’re looking for before we can see it, and we can deliberately change from one interpretation to another (see figure).

A Necker Cube: This object can be interpreted in two distinct ways; as a cube from slightly above or slightly below. With a little practice you can easily switch between interpretations. One explanation of this is that a high-level decision as to the preferred interpretation is provided as feedback to hierarchically-lower processing areas.

Now consider Bayesian inference, such as Belief Propagation, or Markov Random Fields – in all cases we combine a Prior (e.g. top-down feedback) with a Likelihood produced from current, bottom-up data. Good inference depends on effective integration of both inputs.

Ideally we would be able to resolve how both the modelling and inference benefits could be realized in the pyramidal cell, and how physical segregation of apical & basal dendrites might help this happen.

False-Negative Error Coding

The simplest scheme for predictive coding is simply to propagate only false-negative errors – where something was observed, but it was not predicted in advance. In this encoding, if the event was predicted, simply suppress any output. (Note: This assumes that another mechanism limits the number of false-positive errors – for example a homeostatic system to limit the total number of predictions.)

When a neuron fires, it represents a set of coincident input on a number of synapses. A pattern of input was observed. If the neuron was in a “predicted” state, immediately prior to firing, then we could safely suppress the output and achieve a simple predictive coding scheme. If a neuron is not in a predicted state when it fires, then the output should be propagated as normal.

False-Negative Error Coding in Pyramidal Cells

Since Pyramidal cells have 2 distinct inputs – basal and apical dendrites – we can implement the false negative coding as follows:

  • Basal dendrites recognize patterns of bottom-up input; the neuron “represents” those patterns by generating a spike on its axonal output when stimulated by the basal dendrites.
  • Apical dendrite learns to detect input that allows the cell’s spiking to be predicted. The apical dendrite determines the “predicted” state of the cell. Top-down feedback input is used for this purpose.
  • If the cell is “predicted” when the basal dendrite tries to generate an output, then suppress that output.
  • The cell internally self-regulates to ensure that it is rarely in a predicted state, and typically only at the right times.
  • Physical segregation of the two dendrite types ensures that they can target feedback data for prediction and feed-forward data for classification.

Spike bursts (spike trains)

When Pyramidal cells fire, they usually don’t fire just once. They tend to generate a short sequence of spikes known as a “burst” or “train”. So it’s possible that False-Negative coding doesn’t completely eliminate the spike, but rather truncates the sequence of spikes to make the output far less significant and less likely to significantly drive activity in other cells. There may also be some benefit to being able to broadcast the event in a subtle way, perhaps as a form of timing signal.

Time series plots of typical spike trains produced by pyramidal cells.

So to evidence this theory, we could look for truncated or absent spike trains in presence of predictive input to the apical dendrite. Specifically, to observe that input causing a spike in the apical dendrite truncates or eliminates an expected spike train resulting from basal stimulation.

Is there any direct neurological evidence for different integration of spikes from Apical and Basal dendrites in Pyramidal cells? It turns out, yes, there is! Metz, Spruston and Martina [1] say: “… our data present evidence for a dendritic segregation of Kv1-like channels in CA1 pyramidal neurons and identify a novel action for these channels, showing that they inhibit action potential bursting by restricting the size of the [afterdepolarization]”.

Now for the AI/ML audience it’s necessary to translate this a bit. An “action potential” “occurs when the membrane potential (voltage) of a specific axon location rapidly rises and falls. Action potentials in neurons are also known as “nerve impulses” or “spikes” So bursting is the generation of a short sequence of rapid spikes.

So in other words, apical stimulation inhibits bursts of axonal output spikes from a pyramidal neuron. There’s our smoking gun!

According to this paper, the Apical dendrite uniquely inhibits the spike burst from soma (the basal dendrites don’t). This matches the behaviour we would expect, if pyramidal cells implement false-negative predictive coding via the different inputs to different dendrite types: If the apical dendrite fires, there’s no axonal burst. If there wasn’t a spike in the apical dendrite, but basal activity drives the cell over its threshold, then the cell output does burst.

Note there are many other papers with similar claims; we found that search terms such as “differential basal apical dendrite integration” to be helpful.

[1] “Dendritic D-type potassium currents inhibit the spike afterdepolarization in rat hippocampal CA1 pyramidal neurons”
Alexia E. Metz, Nelson Spruston and Marco Martina. J. Physiol. 581.1 pp 175–187 (2007)


We’ve seen how we might combine the observed phenomena of multistable perception via separation of feedback and feed-forward input to the basal and apical dendrites, and predictive coding, via a simple model of pyramidal cell function by false-negative error coding.

Unlike existing models of predictive coding within the cortex, which often posit separate populations of cells representing predictions and residual errors (e.g. Rao and Ballard, 1999), we have proposed that coding could occur within the known biology of individual pyramidal cells, due to the different integration of apical and basal dendrite activity. At the same time, the proposed method allows feedback and feedforward information to be integrated within the same mechanism.

Over the next few months we’ll be testing some of these ideas in simulation!

Baar/CLA/Cortical Learning Algorithm/Friston/Global Workspace/Lee & Mumford/Predictive Coding/Rao & Ballard/Ryan McCall

Cortical Learning Algorithms with Predictive Coding for a Systems-Level Cognitive Architecture

Posted by ProjectAGI on
This is a quick post to link a poster paper by Ryan McCall, who has experimented with a Predictive-Coding / Cortical Learning Algorithm (PC-CLA) hybrid approach. We found the paper via Ryan writing to the NUPIC theory mailing list.

What’s great about the paper is it links to some of the PC papers we mentioned in a previous post and covers all the relevant literature, with clear and detailed descriptions of key features of each method.

So we have Lee & Mumford, Rao and Ballard, Friston (Generalized Filtering)… It’s also nice to see Baar’s Global Workspace Theory and LIDA (a model of consciousness or, at least, attention).

Ryan has added a PC-CLA module to LIDA and tested robustness to varying levels of input noise. So, early days with the experiments but great start. 

CLA/Generative Models/Hierarchical Generative Models/HTM/Predictive Coding/Rao & Ballard/Temporal Pooling

On Predictive Coding and Temporal Pooling

Posted by ProjectAGI on


Predictive Coding (PC) is a popular theory of cortical function within the neuroscience community. There is considerable biological evidence to support the essential concepts (see e.g. “Canonical microcircuits for predictive coding” by Bastos et al).

PC describes a method of encoding messages passed between processing units. Specifically, PC states that messages encode prediction failures; when prediction is perfect, there is no message to be sent. The content of each message is the error produced by comparing predictions to observations.

A good introduction to the various theories and models under the PC umbrella has been written by Andy Clark (“Whatever next? Predictive brains, situated agents, and the future of cognitive science”). As Clark explains, the history of the PC concept goes back at least several decades to Ashby, quote: “The whole function of the brain is summed up in: error correction.” Mumford pretty much nailed the concept back in 1992, before it was known as predictive coding (the cited paper gives a good discussion of how the neocortex might implement a PC-like scheme).

The majority of PC theories also model uncertainty explicitly, using Bayesian principles. This is a natural fit when providing explicit messaging of errors and attempting to generate predictions. Of course, it is also a robust framework for generative models.

It can be difficult to search for articles regarding PC because a similar concept exists in Signal Processing, although this seems to be coincidental, or at least the connection goes back beyond our reading. Unfortunately, many articles on the subject are written at a high level and do not include sufficient detail for implementation. However, we found work by Friston et al (example) and Rao et al (example, example) to be well described, although the former is difficult to grasp if one is not familiar with dynamical systems theory.

Rao’s papers include application of PC to visual processing and Friston’s work includes both the classification of birdsong and extends the concept to the control of motor actions. Friston et al wrote a paper titled “Perceptions as hypotheses; saccades as experiments” in which they suggest that actions are carefully chosen to optimally reduce uncertainty in internal predictive models. The PC concept throws up interesting new perspectives on many topics!

Comparison to MPF/CLA

There are significant parallels between MPF/CLA and PC. Both postulate a hierarchy of processing units with FeedForward (FF) and reciprocal FeedBack (FB) connections. MPF/CLA explicitly aims to produce increasingly stable FF signals in higher levels of the hierarchy. MPF/CLA tries to do this by identifying patterns via spatial and temporal pooling, and replacing these patterns with a constant signal.

Many PC theories create “hierarchical generative models” (e.g. Rao and Ballard). Hierarchical is enforced by restrictions on the topology of the model. The generative part refers to the fact that variables (in the Bayesian sense), in each vertex of the model, are defined by identification of patterns in input data. This agrees with MPF/CLA.

Both MPF/CLA and PC posit that processing units use FB data from higher layers to improve local prediction. In conjunction with local learning, this serves to reduce errors and therefore, in PC also stabilizes FF output.

In MPF/CLA it is assumed that cells’ input dendrites determine the set of inputs the cell represents. This performs a form of Spatial Pooling – the cell comes to represent a set of input cells firing simultaneously, and hence the cell becomes a label or symbol representing that set. In PC it is similarly assumed that the generative model will produce objects (cells, variables) that represent combinations of inputs.

However, MPF/CLA and PC differ in their approach to Temporal Pooling, i.e. changes in input over time.

Implicit Temporal Pooling

Predictive coding does not expressly aim to produce stability in higher layers, but increasing stability over time is an expected side-effect of the technique. Assuming successful learning within a processing unit, its FF output will be stable (no signal) for the duration of any periods of successful prediction.

Temporal Pooling in MPF/CLA attempts to replace FF input with a (more stable) pattern that is constantly output for the duration of some sequence of events. In contrast, PC explicitly outputs prediction errors whenever they occur. If errors do not occur, PC does not produce any output, and therefore the output is stable. A similar outcome has occurred, but via different processes.

Since the content of PC messages differs to MPF/CLA messages, it also changes the meaning of the variables defined in each vertex of the hierarchy. In MPF/CLA the variables will represent chains of sequences of sequences … in PC, variables will represent a succession of forks in sequences, where prediction failed.

So it turns out that Predictive Coding is an elegant way to implement Temporal Pooling.

Benefits of Predictive Coding

Where PC gets really interesting is that the amplitude or magnitude of the FF signal corresponds to the severity of the error.  A totally unexpected event will cause a signal of large amplitude, whereas an event that was considered a possibility will produce a less significant output.

This occurs because most PC frameworks model uncertainty explicitly, and these probability distributions can account for the possibility of multiple future events. Anticipated events will have some mass in the prior distribution; unanticipated events have very little prior probability. If the FF output is calculated as the difference between prior and posterior distributions, we naturally get an amplitude that is correlated with the surprise of the event.

This is a very useful property. We can distribute representational resources across the hierarchy, giving the resources preferentially to the regions where larger errors are occurring more frequently. These events are being badly represented and need improvement.

In biological terms this response would be embodied as a proliferation of cells in columns receiving or producing large or frequent FF signals.

Next post

In the next post we will describe a hybrid Predictive-Coding / Memory Prediction Framework which has some nice properties, and is appealingly simple to implement. We will include some empirical results that show how well the two go together.