Reading list – May 2016
|Digit classification error over time in our experiments. The image isn’t very helpful but it’s a hint as to why we’re excited 🙂|
A few weeks ago we paused the “How to build a General Intelligence” series (part 1, part 2, part 3, part 4). We paused it because the next article in the series requires us to specify everything in detail, and we need working code to do that.
We have been testing our algorithm on a variety of MNIST-derived handwritten digit datasets, to better understand how well it generalizes its representation of digit-images and how it behaves when exposed to varying degrees of predictability. Initial results look promising: We will post everything here once we’ve verified them and completed the first batch of proper experiments. The series will continue soon!
Deep Unsupervised Learning
Our algorithm is a type of Online Deep Unsupervised Learning, so naturally we’re looking carefully at similar algorithms.
We recommend this video of a talk by Andrew Ng. It starts with a good introduction to the methods and importance of feature representation and touches on types of automatic feature discovery. He looks at some of the important feature detectors in computer vision, such as SIFT and HoG and shows how feature detectors – such as edge detectors – can emerge from more general pattern recognition algorithms such as sparse coding. For more on sparse coding see Shakir’s excellent machine learning blog.
For anyone struggling to intuit deep feature discovery, I also loved this post on yCombinator which nicely illustrates how and why deep networks discover useful features, and why the depth helps.
The latter part of the video covers Ng’s latest work on deep hierarchical sparse coding using Deep Belief Networks, in turn based on AutoEncoders. He reports benchmark-beating results on video activity and phoneme recognition with this framework. You can find details of his deep unsupervised algorithm here:
Finally he presents a plot suggesting that training dataset size is a more important determiner of eventual supervised network performance than algorithm choice! This is a fundamental limitation of supervised learning where the necessary training data is much more limited than in unsupervised learning (in the latter case, the real world provides a handy training set!)
|Effect of algorithm and training set size on accuracy. Training set size more significant. This is a fundamental limitation of supervised learning.|
Online K-sparse autoencoders (with some deep-ness)
We’ve also been reading this paper by Makhzani and Frey about deep online learning with auto-encoders (a type of supervised learning neural network that is used in an unsupervised way to reconstruct its input, often known as semi-supervised learning). Actually we’ve struggled to find any comparison of autoencoders to earlier methods of unsupervised learning both in terms of computational efficiency and ability to cover the search space effectively. Let us know if you find a paper that covers this.
The limitation introduced can be thought of as an inability to escape from local minima that result from prior training. This paper by Choromanska et al tries to explain why this happens.
For more information here’s some papers on deep sparse networks built from autoencoders:
Variations on Supervised Learning – a Taxonomy
Back to supervised learning, and the limitation of training dataset size. Thanks to a discussion with Jay Chakravarty we have this brief taxonomy of supervised learning workarounds for insufficient training datasets:
Cross-modal adaptation: where one mode of data supervises another – e.g. audio supervises video or vice-versa.
Domain adaptation: model learnt on one set of data is adapted, in unsupervised fashion, to new datasets with slightly different data distributions.
Transfer learning: using the knowledge gained in learning one problem on a different, but related problem. Here’s a good example of transfer learning, a finalist in the NVIDIA 2016 Global Impact Award. The system learns to predict poverty from day and night satellite images, with very few labelled samples.
Interactive Brain Concept Map
We enjoyed this interactive map of the distribution of concepts within the cortex captured using fMRI and produced by the Gallant Lab (source papers here).
Thanks to David Ray @ http://cortical.io for the link.
|Interactive brain concept map|
OpenAI Gym – Reinforcement Learning platform
We also follow the OpenAI project with interest. OpenAI have just released their “Gym” – a platform for training and testing reinforcement learning algorithms. Have a play with it here:
According to Wired magazine, OpenAI will continue to release free and open source software (FOSS) for the wider impact this will have on uptake. There are many companies now competing to win market share in this space.
The Talking Machines Blog
How the brain generates actions
This is probably quite informative for reverse-engineering purposes. Full paper here.
Hierarchical Temporal Memory
HTM is an online method for feature discovery and representation and now we have a baseline result for HTM on the famous MNIST numerical digit classification problem. Since HTM works with time-series data, the paper compares HTM to LSTM (Long-Short-Term Memory), the leading supervised-learning approach to this problem domain.
It is also interesting that the paper deals with adaptation to sudden changes in the input data statistics, the very problem that frustrates the deep belief networks described above.