Choosing a Machine Learning Framework in 2018
There are plenty of established machine learning frameworks out there, and new ones appear frequently to address specific niches. We were interested in examining whether one of these frameworks fits our workflow. Because there are so many, it is very difficult to examine them all at once, and many of the comparisons available online focus on only one or two frameworks. I surveyed the most popular frameworks and aim to provide a helpful comparative analysis, focused on distributed execution, optimisation for relevant architectures, visualisation, community support and portability.
It’s important to note that most comparisons have some bias. Authors may not be aware of their bias when comparing frameworks, and sometimes choose not to disclose it. Therefore, I think it is worth mentioning that I have experience using both TensorFlow and PyTorch, and a personal preference for PyTorch.
What are Computational Graphs?
One of the main differences between some of these machine learning frameworks is how they handle creating the computational graph. Computational graphs are an easy way to think about mathematical expressions. Consider the following expression e = (a + b) * (b + 1), where there are three operations: two additions and one multiplication. This can be expressed differently by introducing two additional variables:
c = a + b
d = b + 1
e = c * d
A computational graph can then be created to express these operations, where each operation and input variable becomes a node in the graph. Directed edges (visualised as arrows) connect two nodes when one node’s value is an input to the other.
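The following is a minimal sketch of this idea, not any framework’s actual API. The graph for e = (a + b) * (b + 1) is stored as a mapping from each computed node to its operation and its input nodes. The names `GRAPH`, `edges` and `evaluate` are hypothetical. Evaluation computes each node only after the nodes it depends on.

```python
import operator

# Each computed node maps to (operation, names of its input nodes).
# The input variables a and b are leaves: their values are supplied at evaluation time.
GRAPH = {
    "c": (operator.add, ("a", "b")),   # c = a + b
    "d": (lambda x: x + 1, ("b",)),    # d = b + 1
    "e": (operator.mul, ("c", "d")),   # e = c * d
}


def edges(graph):
    """Directed edges (source, destination): source's value is an input to destination."""
    return [(src, node) for node, (_, srcs) in graph.items() for src in srcs]


def evaluate(graph, inputs, target):
    """Compute `target`, evaluating each dependency once and caching the result."""
    values = dict(inputs)

    def value(name):
        if name not in values:
            op, srcs = graph[name]
            values[name] = op(*(value(src) for src in srcs))
        return values[name]

    return value(target)


print(edges(GRAPH))
print(evaluate(GRAPH, {"a": 2, "b": 1}, "e"))  # 6
```

Note that b has two outgoing edges, to c and to d, because its value feeds both additions.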
Computational graphs are quite common in computer science and are at the core of many popular machine learning frameworks. They typically come in two forms: static and dynamic. Chris Olah provides a great introduction to computational graphs and how they relate to backpropagation.
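For the example above, backpropagation amounts to applying the chain rule along the graph’s edges and summing over every path from e back to an input. The values a = 2 and b = 1 are chosen only for illustration.

```latex
\begin{aligned}
\frac{\partial e}{\partial c} &= d, \qquad \frac{\partial e}{\partial d} = c, \qquad
\frac{\partial c}{\partial a} = \frac{\partial c}{\partial b} = \frac{\partial d}{\partial b} = 1,\\
\frac{\partial e}{\partial a} &= \frac{\partial e}{\partial c}\,\frac{\partial c}{\partial a} = d = b + 1,\\
\frac{\partial e}{\partial b} &= \frac{\partial e}{\partial c}\,\frac{\partial c}{\partial b}
  + \frac{\partial e}{\partial d}\,\frac{\partial d}{\partial b} = d + c = a + 2b + 1,\\
\frac{\partial e}{\partial a}\Big|_{a=2,\,b=1} &= 2, \qquad
\frac{\partial e}{\partial b}\Big|_{a=2,\,b=1} = 5.
\end{aligned}
```

The derivative with respect to b has two terms because b feeds into both c and d.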
Frameworks that declare computational graphs statically, such as TensorFlow, let you first define the architecture of the graph and then execute it many times by running data through it. This makes it easy to distribute work by executing parts of the graph on different machines. In addition, the framework can optimise the graph for you before it runs. Users can also serialise the graph once it is built and execute it without the code that originally built it. As a result, static graph frameworks effectively become a new programming language in which common, familiar features are re-invented (e.g. string.split() in Python becomes tf.string_split() in TensorFlow).
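The sketch below illustrates define-then-run and serialisation in plain Python. It assumes a made-up JSON graph format and a hypothetical `OPS` table that plays the role of a framework’s runtime. This is not TensorFlow’s format. It only shows that once a graph is described as data, it can be stored, reloaded and executed repeatedly without the code that built it.

```python
import json

# Hypothetical op table: the "runtime" that knows how to execute each op name.
OPS = {
    "add": lambda x, y: x + y,
    "mul": lambda x, y: x * y,
}


def build_graph():
    """Define phase: describe e = (a + b) * (b + 1) as plain data; nothing is computed."""
    return {
        "placeholders": ["a", "b"],
        "constants": {"one": 1},
        "nodes": [  # listed in topological order, so each input is ready before use
            {"name": "c", "op": "add", "inputs": ["a", "b"]},
            {"name": "d", "op": "add", "inputs": ["b", "one"]},
            {"name": "e", "op": "mul", "inputs": ["c", "d"]},
        ],
        "output": "e",
    }


def run(graph, feed):
    """Run phase: execute a (possibly deserialised) graph for one set of inputs."""
    values = dict(graph["constants"])
    for name in graph["placeholders"]:
        values[name] = feed[name]
    for node in graph["nodes"]:
        args = [values[src] for src in node["inputs"]]
        values[node["name"]] = OPS[node["op"]](*args)
    return values[graph["output"]]


# Build and serialise once; the JSON string could be written to disk or sent elsewhere.
serialised = json.dumps(build_graph())

# Later, possibly in another process: reload and run many times without build_graph().
graph = json.loads(serialised)
print([run(graph, {"a": a, "b": 1}) for a in range(3)])  # [2, 4, 6]
```

A real framework could also inspect this data-only description to optimise the graph before running it. That step is omitted from this sketch.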
Frameworks that declare computational graphs dynamically, such as PyTorch, define the computational graph implicitly as the forward computation is executed. Dynamic graph frameworks are typically less invasive and integrate deeply with the host programming language. This results in cleaner, more easily debuggable code. It also lets you use ordinary control flow, such as conditionals and loops, to build more complicated graph structures such as recurrent neural networks. However, because graph building and execution are intertwined, there is little opportunity to optimise the graph ahead of time.
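By contrast, a define-by-run framework records the graph while ordinary code executes. The toy `Value` class below is a hypothetical sketch of this idea, not PyTorch’s API. Every arithmetic operation creates a node that remembers its inputs and local derivatives, and `backward()` walks the recorded graph in reverse to apply the chain rule (reverse-mode automatic differentiation). Because the graph is whatever actually ran, a plain Python loop decides its size.

```python
class Value:
    """A scalar that records how it was computed (a toy define-by-run graph node)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        """Reverse-mode pass: apply the chain rule from this node back to the leaves."""
        order, seen = [], set()

        def visit(node):  # depth-first topological sort of the recorded graph
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


def power(w, n):
    """w ** n via an ordinary Python loop: the recorded graph has n multiplications."""
    result = Value(1.0)
    for _ in range(n):
        result = result * w
    return result


a, b = Value(2.0), Value(1.0)
e = (a + b) * (b + 1)  # the graph is recorded as this line runs
e.backward()
print(e.data, a.grad, b.grad)  # 6.0 2.0 5.0

w = Value(2.0)
y = power(w, 3)
y.backward()
print(y.data, w.grad)  # 8.0 12.0
```

For `power(w, 3)`, the loop records three multiplications, and the gradient matches 3 * w**2 = 12 at w = 2.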
Now that we’ve covered one of the fundamental differences between the machine learning frameworks, I want to provide a brief overview of the most popular and widely used frameworks, highlighting some of their strengths and weaknesses.
TensorFlow is a framework developed by Google that uses a static graph, which means the graph is built once and then executed many times. TensorFlow is a very low-level numerical library. This has led to a number of libraries that provide high-level abstraction layers, such as Keras, Sonnet and TFLearn. In TensorFlow, you define Placeholders, which are input nodes in the computational graph such as the input data. You also define Variables, which are values within the graph that may be updated, such as weights and biases. The same graph can then be executed many times in a session.
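The roles of placeholders, variables and sessions can be sketched with a few toy Python classes. These are hypothetical stand-ins loosely modelled on TensorFlow 1.x concepts, not TensorFlow’s API. Placeholders are fed on every run, variables keep their values in the session between runs, and an `Assign` node updates a variable when it is run.

```python
import operator


class Node:
    """A graph node: an operation applied to input nodes. Creating it computes nothing."""

    def __init__(self, op=None, *inputs):
        self.op, self.inputs = op, inputs

    def __add__(self, other):
        return Node(operator.add, self, other)

    def __mul__(self, other):
        return Node(operator.mul, self, other)


class Placeholder(Node):
    """Input node whose value must be fed on every run (e.g. a batch of data)."""


class Variable(Node):
    """Node whose value lives in the session and persists across runs (e.g. a weight)."""

    def __init__(self, initial):
        super().__init__()
        self.initial = initial


class Assign(Node):
    """Node that, when run, stores the value of `value_node` into `variable`."""

    def __init__(self, variable, value_node):
        super().__init__(None, value_node)
        self.variable = variable


class Session:
    """Executes nodes on demand and holds variable state between runs."""

    def __init__(self):
        self.state = {}

    def run(self, node, feed=None):
        feed = feed or {}
        if not isinstance(node, Node):  # plain Python constants such as 1
            return node
        if isinstance(node, Placeholder):
            return feed[node]
        if isinstance(node, Variable):
            return self.state.setdefault(node, node.initial)
        if isinstance(node, Assign):
            self.state[node.variable] = self.run(node.inputs[0], feed)
            return self.state[node.variable]
        return node.op(*(self.run(src, feed) for src in node.inputs))


# Define the graph once...
x = Placeholder()
w = Variable(0.0)
y = w * x
step = Assign(w, w + 1)

# ...then execute it many times in one session.
sess = Session()
print(sess.run(y, {x: 3.0}))  # 0.0
sess.run(step)
sess.run(step)
print(sess.run(y, {x: 3.0}))  # 6.0
```

Defining `y` and `step` computes nothing. Values only appear when the session runs a node, and the variable `w` keeps its updated value across runs.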
Static graphs have the advantages outlined above. Unfortunately, they also have some drawbacks that Google has recently attempted to address. Code can be difficult to debug because it is not executed imperatively. The define-then-run workflow also adds significant overhead to prototyping, which is not ideal for research. To address this, Google introduced Eager Execution, an imperative, define-by-run interface for TensorFlow. However, this feature is still at an early stage at the time of writing.
PyTorch is a framework developed by Facebook that uses dynamic graphs, which means a new computational graph is built on each forward pass. It is quite similar to Torch and shares some of its backend. PyTorch is deeply integrated with Python and follows an object-oriented paradigm, so you can easily extend its functionality by defining your own classes. For example, you can create a custom neural network by subclassing nn.Module.
The imperative nature of PyTorch makes it easy to write clean code that is simple to debug, and to use typical Python functionality such as conditionals and loops.
PyTorch ships with three levels of abstraction. A Tensor is an imperative n-dimensional array, similar to a NumPy array, with the added ability to run on the GPU. A Variable is a node in the computational graph and is similar to the Tensor, Variable and Placeholder in TensorFlow. A Module is a neural network layer that can store state or learnable weights, and Modules are the building blocks for your own neural network classes.
While PyTorch is less mature than TensorFlow, it has gained a lot of popularity in the research community thanks to its imperative nature and Pythonic API. A development community has formed around the framework, producing various libraries such as visualisation tools.
CNTK is a framework developed by Microsoft that uses a static graph, similar to TensorFlow. The library includes feed-forward, convolutional and recurrent neural networks, and offers a Python API on top of C++ code. Its main advantage is that it makes it easy to build models for speech and image products. With Microsoft’s backing, it also integrates easily with Azure cloud services.
One of the main criticisms of CNTK is its restrictive license, as Microsoft has not adopted a conventional open-source license such as GPL, ASF or MIT. The licensing also affects a feature that makes distributed training easier, because that feature is not licensed for commercial use.
MXNet is an Apache Software Foundation project supported by Amazon, Microsoft and others. It aims to offer the best of both worlds by supporting both imperative and symbolic programming. On the surface it seems like a relatively new framework because of recent news coverage. In fact, it has been in development since around 2015 but gained little traction until recently. That changed with support from Amazon and Microsoft and the introduction of the high-level wrapper Gluon. With its first official release, it became an Apache incubating project. It is not yet clear whether the framework brings anything new to the table, as it has not been widely adopted by the community.
Torch is a framework that uses dynamic graphs and is maintained by contributors from DeepMind, Facebook, Twitter and others. Before the introduction of TensorFlow and PyTorch, it was the main competitor to Theano. It shares the main advantages discussed for PyTorch, such as dynamic graphs and extensibility. While it is more mature than PyTorch, one of its major disadvantages is that it uses Lua as its programming language. It also does not have automatic differentiation built in. Apart from maintaining legacy codebases, the community seems to have moved on from Torch: Facebook and Twitter moved to PyTorch, and DeepMind moved to TensorFlow. Starting new projects in Torch is no longer recommended.
Caffe2 is Facebook’s successor to Caffe. It is a lightweight, modular framework built to excel at mobile and large-scale deployments. It uses static graphs similar to TensorFlow and has a friendlier Python interface than Caffe. Models can be trained in Python, serialised and then deployed without Python. It can also be used with other frameworks, such as PyTorch, through ONNX for efficient model deployment.
Caffe, originally developed at UC Berkeley, is quite different from typical frameworks and relies heavily on model configurations loaded from prototxt files. It provides Python and MATLAB bindings. Caffe is good for feed-forward networks, fine-tuning existing networks and training models without writing much code. However, it has several weaknesses:
- its Python interface is poorly documented;
- custom GPU layers require writing C++/CUDA code;
- it is not well suited to RNNs;
- it is very cumbersome for large networks.

Caffe is not commonly used in research, but may be used in production.
Theano was a widely used framework developed by the Montreal Institute for Learning Algorithms (MILA), and TensorFlow drew many ideas from it. Active development has been halted since version 1.0. It is worth mentioning, but the community is moving away from it, and I would not recommend it for new projects.
Distributed execution is supported by most frameworks:
- TensorFlow was built with distributed execution in mind, since it originated at Google, and its computational graph can be distributed across a cluster.
- PyTorch can perform distributed operations and training, and supports several backends: TCP, MPI and Gloo.
- CNTK has a distributed module for using multiple GPUs and machines; however, as mentioned previously, there are licensing caveats.
- MXNet is also capable of distributed training, but it seems to be behind the other frameworks and its documentation lacks detail.
- Caffe2 includes built-in distributed training that also uses the Gloo backend.
- Torch, Caffe and Theano have no built-in support for distributed execution.
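A common form of the distributed training mentioned above is synchronous data parallelism. Each worker holds a copy of the model, computes gradients on its own shard of the data, and the gradients are averaged across workers before every update. The sketch below simulates this in a single process for a one-parameter linear model. The helper `all_reduce_mean` is a hypothetical stand-in for the collective communication that backends such as MPI or Gloo provide, and the dataset and learning rate are illustrative.

```python
def shard_gradient(w, shard):
    """Gradient of the mean squared error of the model y = w * x over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Hypothetical stand-in for an all-reduce: every worker receives the average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train_data_parallel(data, num_workers, steps, lr):
    """Simulate synchronous data-parallel SGD in a single process."""
    shards = [data[i::num_workers] for i in range(num_workers)]  # split the data
    replicas = [0.0] * num_workers  # every worker keeps its own copy of the parameter
    for _ in range(steps):
        # In a real cluster each worker would compute its gradient in parallel.
        grads = [shard_gradient(w, s) for w, s in zip(replicas, shards)]
        grads = all_reduce_mean(grads)  # communication step
        replicas = [w - lr * g for w, g in zip(replicas, grads)]
    return replicas


data = [(x, 3.0 * x) for x in range(1, 9)]  # toy dataset where the true weight is 3
print(train_data_parallel(data, num_workers=2, steps=100, lr=0.01))
```

Here, averaging the per-shard mean gradients gives the full-batch gradient because both shards have the same size. As a result, every replica applies the same update and the copies never drift apart, which is the point of synchronising gradients rather than letting each worker train independently.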
All frameworks are optimised to run on either a CPU or a GPU, and most if not all have an option to easily switch between them. In TensorFlow, there are separate packages for CPU and GPU enabled versions. TensorFlow attempts to figure out which device to use depending on the operation and the available devices. However, you can also explicitly specify which device you want to use:
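import tensorflow as tf

with tf.device('/cpu:0'):
    # ops created in this block are pinned to the CPU (values are illustrative)
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0, 6.0], [7.0, 8.0]])
c = tf.matmul(a, b)  # placed automatically: on the GPU if one is available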
In PyTorch, CPU and GPU support ship in a single package. You explicitly choose whether each tensor lives on the CPU or the GPU, and you can easily transfer tensors between devices. For example, if a tensor is created on the GPU, subsequent operations on it will take place on the GPU until it is transferred back to the CPU.
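import torch

# Create a tensor on the CPU
dtype = torch.FloatTensor  # or torch.cuda.FloatTensor to create it on the GPU
x = torch.randn(4, 4).type(dtype)
# Transfer to the GPU only if one is available; x.cpu() transfers it back
if torch.cuda.is_available():
    x = x.cuda()
y = torch.mm(x, x)  # runs on whichever device x is on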
Most frameworks target NVIDIA GPUs and rely on the CUDA and cuDNN libraries for operations such as convolutions and matrix multiplication. As a result, raw performance differences between frameworks are generally small, although performance may still vary with other factors such as network size. Because of performance issues with OpenCL, most frameworks lack support for GPUs outside the NVIDIA ecosystem at the time of writing. Work is ongoing to support more libraries, such as AMD’s HIP for GPUs and Intel’s MKL-DNN for CPUs.
TensorFlow comes with a suite of visualisation tools called TensorBoard, which lets you visualise training progress, convergence and more. TensorBoard relies on summary data written to disk: when you run an experiment, you save the summary data for that run, and TensorBoard then visualises it. For example, you can do multiple runs for a hyperparameter search, launch TensorBoard and compare all runs side by side. It also works while training is in progress. TensorBoard periodically refreshes, so as long as you keep writing summaries to the directory it is watching, its visualisations will update. You can re-visualise any run at any time, as long as you keep the saved data.
PyTorch and Torch can utilise the visdom package from Facebook for visualisation. Other frameworks either provide minimal visualisation tools or rely on open source libraries, such as graphviz and matplotlib.
Fortunately, since TensorBoard is open source, it is no longer exclusive to TensorFlow. A number of libraries let you log data from other frameworks, including PyTorch and CNTK, to TensorBoard.
TensorFlow beats all other frameworks in terms of community, as it is backed by Google and has been available for much longer than the others. It seems to be the go-to tool for beginners, and Google is making it easier to access through its Google Cloud services. A large developer community surrounds TensorFlow, with libraries such as Keras that make developing in TensorFlow easier. Google uses it in its products and in its research teams, Google Brain and DeepMind. It is also used by Airbnb, Dropbox, SAP, eBay and others, and it has great documentation, tutorials and books.
PyTorch has been mainly adopted by researchers due to the advantages mentioned previously. It is being developed and used by Facebook, Twitter, NVIDIA, Salesforce, Uber, Stanford, CMU, NYU and others. While it has a smaller community compared to TensorFlow, it has an active discussion board and Slack community. PyTorch also has great and extensive documentation, official tutorials and books.
CNTK is backed predominantly by Microsoft, and it is not clear whether other companies are actively using it. It is positioned as a framework for deploying applications at scale, which unfortunately neglects other use cases such as research. Its community seems to consist mainly of Windows developers who want to include machine learning models in desktop or mobile applications. Overall, its community seems lacking compared with those of the other frameworks, even though Microsoft has recently been trying to embrace the open-source community more.
MXNet joined Apache and entered incubation in 2017, and a community has formed around it, with a discussion board and mailing list. It is now heavily backed by Amazon as their framework of choice and is being integrated into AWS. Microsoft also provides some support through the high-level wrapper Gluon, which will eventually support CNTK as a backend. The community criticised the framework earlier in the year for its lack of good documentation and tutorials; however, this has been improving.
All frameworks can save and load a model’s configuration as well as its learned parameters. The Open Neural Network Exchange (ONNX) takes this further by letting a saved model be used for inference in a different framework. For instance, PyTorch models can be exported to ONNX and deployed on Caffe2 or CNTK, which also makes it possible to integrate models on mobile devices. ONNX is supported by default in PyTorch, Caffe2, CNTK and MXNet. While TensorFlow does not yet support ONNX natively, open-source converters bring some of this functionality to TensorFlow models.
[Summary table comparing the frameworks on Distributed Execution, Architecture Optimisations, Visualisations, Community Support and Portability]
In this comparison, there are some obvious winners and losers. It was worth mentioning Torch, Theano and Caffe to provide some background information; however, I would not recommend starting new projects using them. This is not entirely about their features or performance, rather the lack of active development and community support moving forward. This leaves us with TensorFlow, PyTorch, MXNet, CNTK and Caffe2, although I have some reservations about CNTK.
The purpose of this article was to provide an overview of the most widely used machine learning frameworks and examine them against our selection criteria to narrow down the options. Most framework comparisons are akin to Vim vs Emacs arguments, where use cases and biases heavily influence the comparisons and the choices. In most cases, the real answer is: it depends. Some frameworks have a specific use case, and it’s best to evaluate frameworks within your actual workflow to see which one suits you best.
PyTorch and TensorFlow are both well suited to our workflow and to general low-level research. While PyTorch is easy to use and technically impressive thanks to its Pythonic API and object-oriented design, we will mostly focus our efforts on TensorFlow. This is largely due to its wide usage, ecosystem and community support, as it’s essential that we can easily understand and execute other researchers’ code, and vice versa.
UPDATED: As of 18 March 2018, Microsoft switched CNTK to the MIT License, which addresses the criticisms regarding the license limitation on distributed training. The conclusion remains largely the same, with a split between PyTorch and TensorFlow. We have already started incorporating TensorFlow into new projects, and recently examined Eager Execution as a workflow option within the TensorFlow ecosystem.
Also published on Medium.