Eagerly awaiting TensorFlow Eager Execution?
If you’re using TensorFlow and recently heard about Eager Execution but don’t know if or when to switch, or you’re just getting started with TensorFlow and don’t know whether to use Eager or Graph Execution… then this article is for you.
We are in the second scenario, as we have recently started using TensorFlow and were intrigued by the official introduction of Eager Execution at the TensorFlow Dev Summit a few months ago. I went through the process of evaluating Eager Execution and wrote this review to help guide our decision for our next TensorFlow projects.
What is Eager Execution?
Eager Execution is an imperative, object-oriented and more Pythonic way of using TensorFlow. Operations are evaluated immediately and return concrete values, instead of constructing a computational graph to be executed later.
It is a flexible machine learning platform for research and experimentation which provides an intuitive interface that allows you to naturally structure your code and use standard Python data structures. It also allows you to easily debug and profile your models by calling operations directly, and using standard Python debugging tools for immediate error reporting. More complex and dynamic models can make use of standard Python control flow instead of using the graph control flow.
There is no magic happening in the background to attempt to convert static computational graphs into dynamic ones. Eager Execution is a completely separate execution engine, and TensorFlow will no longer build graphs when it is enabled. It is similar to other frameworks that adopted dynamic computational graphs such as PyTorch, DyNet and Chainer.
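As a minimal illustration of this difference (a sketch assuming TensorFlow 1.7+ with eager enabled, or TF 2.x where it is the default), operations below run immediately and return concrete values with no graph or session in sight:

```python
import tensorflow as tf

# Enable eager execution on TF 1.x; it is already the default in TF 2.x.
# (On 1.x this must run before any other TensorFlow operation.)
if hasattr(tf, "enable_eager_execution") and not tf.executing_eagerly():
    tf.enable_eager_execution()

x = tf.constant([[2.0, 3.0]])
y = tf.matmul(x, x, transpose_b=True)  # runs immediately, no Session needed
result = float(y.numpy()[0, 0])        # concrete value: 2*2 + 3*3 = 13.0
```

With graph execution the same `tf.matmul` call would only build a node; here `y` already holds the answer and interoperates directly with NumPy.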
It officially moved to TensorFlow Core in version 1.7 after being available for a while in the contrib package. This is a good indication that the functionality is here to stay and will be supported going forward, as it is seen as an essential part of the TensorFlow experience.
Why Enable Eager Execution?
- You want to learn TensorFlow, as it’s more intuitive and a much nicer experience
- You work in research and want to iterate quickly on non-trivial, complex models that are uncommon in the TensorFlow community
- You want to debug your model using standard Python debugging tools
- You want to profile your model to find bottlenecks using standard Python profiling tools
- You work on dynamic models with complex control flow
Graph vs Eager Execution
TensorFlow has always been known as the graph execution engine for machine learning. Automatic differentiation was a big factor in favour of static computational graphs. Constructing a computational graph that is independent of the host programming language lets you easily deploy to a Python-free environment, such as mobile, and makes it straightforward to automatically distribute the graph across hundreds of machines. Due to the static nature of graph execution, TensorFlow can also perform optimisations on the computational graph prior to execution.
If graph execution is so good, why did the TensorFlow team decide to move beyond it and allow eager execution? It turns out that you don’t need to give up automatic differentiation when using eager execution: tape-based automatic differentiation methods (e.g. Autograd) record operations as they run and trace back through them to compute gradients. Eager Execution allows developers and researchers to iterate more quickly, and to inspect, poke and prod their models more easily. Models can be debugged using standard debuggers (e.g. pdb), profilers (e.g. cProfile) and other tools available in Python. All the functionality of the host programming language (i.e. Python) can be fully utilised, which is key for dynamic models, allowing you to use more complex data structures and highly dynamic control and data flows.
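To make the tape idea concrete, here is a toy pure-Python sketch of tape-based reverse-mode differentiation — the general technique behind Autograd and TensorFlow’s gradient tape, not TensorFlow’s actual implementation. Every operation records a backward step on the tape; replaying the tape in reverse accumulates gradients:

```python
# Toy tape-based reverse-mode autodiff; all names here are illustrative.

class Tape:
    def __init__(self):
        self.ops = []  # backward steps, replayed in reverse order

class Var:
    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

def mul(a, b):
    out = Var(a.value * b.value, a.tape)
    def backward():           # d(a*b)/da = b, d(a*b)/db = a
        a.grad += b.value * out.grad
        b.grad += a.value * out.grad
    out.tape.ops.append(backward)
    return out

def add(a, b):
    out = Var(a.value + b.value, a.tape)
    def backward():           # d(a+b)/da = d(a+b)/db = 1
        a.grad += out.grad
        b.grad += out.grad
    out.tape.ops.append(backward)
    return out

tape = Tape()
x = Var(3.0, tape)
y = add(mul(x, x), x)         # y = x^2 + x = 12.0
y.grad = 1.0
for op in reversed(tape.ops): # replay the tape backwards
    op()
# x.grad is now dy/dx = 2*x + 1 = 7.0
```

Because recording happens as ordinary Python executes, arbitrary control flow (loops, conditionals, recursion) is differentiated for free — exactly the property that makes eager-mode autodiff attractive for dynamic models.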
The Pythonic nature of Eager Execution allows for a more object-oriented approach in programming the models. This is demonstrated across most of the new APIs introduced by Eager Execution, some of which are highlighted below.
Variables are standard Python objects instead of complex entities in a computational graph. This allows variables to be easily assigned and modified, and memory, including GPU memory, can be reclaimed once the variables are no longer used, unlike in graph execution. This property also allows variables to be easily shared by reusing those objects, without worrying about TensorFlow variable scopes and other potential complications.
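A short sketch of what that looks like in practice (assuming TF 1.7+ with eager enabled, or TF 2.x): no variable scopes, no initializer ops, no session.

```python
import tensorflow as tf

# Enable eager on TF 1.x; it is the default in TF 2.x.
if hasattr(tf, "enable_eager_execution") and not tf.executing_eagerly():
    tf.enable_eager_execution()

# A variable is just a Python object: create it, read it, mutate it.
w = tf.Variable(10.0)
w.assign_add(5.0)           # in-place update; no session or initializer needed
current = float(w.numpy())  # 15.0
```

Sharing the variable is simply passing the `w` object around; when it goes out of scope, its memory (including GPU memory) can be freed.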
tfe.metrics is a new object-oriented API for metrics, similar to tf.metrics, that supports both eager and graph execution environments. A metric is updated by passing new data to the callable, and the aggregate is retrieved with the metric’s result method. The metrics currently available are Mean and Accuracy, and the Metric class can be extended to create custom metrics.
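The pattern looks like this — sketched here with tf.keras.metrics.Mean from more recent TensorFlow releases, which follows the same callable-update / result protocol as the contrib-era tfe.metrics classes the article describes:

```python
import tensorflow as tf

# Enable eager on TF 1.x; it is the default in TF 2.x.
if hasattr(tf, "enable_eager_execution") and not tf.executing_eagerly():
    tf.enable_eager_execution()

# Object-oriented metric: calling it with new data updates internal state;
# result() returns the aggregate computed so far.
mean = tf.keras.metrics.Mean()
mean(4.0)
mean(6.0)
average = float(mean.result().numpy())  # (4 + 6) / 2 = 5.0
```

Because the metric is an ordinary object, it can be created inside a training loop, stored on a model, or reset and reused, with no graph-level bookkeeping.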
Saving and loading checkpoints was typically quite painful in a graph execution environment, especially when you’re not reloading the checkpoint within the same program with the exact same structure. This is because graph checkpoints depend on exact variable names, which in turn depend on the model structure and the order in which variables were created.
Eager Execution introduced standard Python object-oriented checkpoints which allow you to save and load checkpoints with ease without worrying about variable names, which is quite similar to the Python object serialization module, pickle. This also allows you to save and load the whole model, or subsets of the model. For example, you can save the generator and discriminator in a GAN separately, and they can then be loaded separately as standalone models.
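A sketch of object-based saving, using tf.train.Checkpoint (the core successor of the contrib-era eager checkpoint API; assumes a reasonably recent TensorFlow). Variables are tracked by the Python attribute they hang off, not by graph-level names:

```python
import os
import tempfile
import tensorflow as tf

# Enable eager on TF 1.x; it is the default in TF 2.x.
if hasattr(tf, "enable_eager_execution") and not tf.executing_eagerly():
    tf.enable_eager_execution()

v = tf.Variable(3.0)
ckpt = tf.train.Checkpoint(value=v)   # tracked by attribute, not variable name
prefix = os.path.join(tempfile.mkdtemp(), "ckpt")
path = ckpt.save(prefix)

v.assign(0.0)                         # clobber the value...
ckpt.restore(path)                    # ...and recover it from the checkpoint
restored = float(v.numpy())           # 3.0
```

The same mechanism scales up: wrap the generator and discriminator of a GAN in separate Checkpoint objects and each can be saved and reloaded independently.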
Computations are automatically offloaded to GPUs during eager execution, similar to how it works in graph execution. The same device allocation will also work if you want to control where a computation runs.
Graph Execution enables TensorFlow to perform additional optimisations prior to execution, which helps improve performance. This makes performance a deciding factor when considering a switch to Eager Execution. For small models that require little computation, there is additional overhead (typically a few microseconds per operation) compared to Graph Execution. On the other hand, the performance of computationally expensive models with Eager Execution enabled is comparable to Graph Execution, with no (or negligible) additional overhead. The latter is what matters in most cases, as standard models in computer vision or natural language processing fit this criterion.
Working with Graphs
TensorFlow Graph Execution has advantages for distributed training and production deployment. The new high-level APIs introduced in TensorFlow version 1.7 allow developers to write code compatible with both Eager and Graph Execution. There are also additional APIs which allow calls into Graph or Eager Execution from either execution mode. For example, you can use tfe.py_func to evaluate functions eagerly while using Graph Execution.
Interoperating with Graphs
New APIs introduced in version 1.7 allow both Eager Execution and Graph Execution to do some magic in the background so that developers can call into either mode. To call into graph execution from eager, use tfe.make_template to convert a standard Python function (with TensorFlow operations) into a graph operation. To call into eager execution from a graph, use tfe.py_func to execute a function eagerly within the graph.
Most TensorFlow code can work in both Eager and Graph execution environments. Developers can easily write code that is compatible with both environments by using the recommended APIs highlighted below. This may be trickier with complex and highly dynamic models that rely on Pythonic control and data flows.
High-level APIs to use for better compatibility
- Use tf.data for input processing instead of queues or placeholders
- Use object-oriented layer APIs: tf.keras.layers and tf.keras.Model
- Use tf.contrib.summary, the new summary APIs that are compatible with both Eager and Graph execution environments
- Use tfe.metrics
- Use object-based saving
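A sketch of the layer recommendation above: a model built from the object-oriented tf.keras APIs runs as-is under eager execution, and the same class can be used when building a graph (layer and model subclassing as available from roughly TF 1.7 onwards; the class name here is illustrative).

```python
import numpy as np
import tensorflow as tf

# Enable eager on TF 1.x; it is the default in TF 2.x.
if hasattr(tf, "enable_eager_execution") and not tf.executing_eagerly():
    tf.enable_eager_execution()

class TwoLayer(tf.keras.Model):
    """Small model from object-oriented layer APIs; works eagerly or in a graph."""
    def __init__(self):
        super(TwoLayer, self).__init__()
        self.hidden = tf.keras.layers.Dense(4, activation="relu")
        self.out = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.out(self.hidden(x))

model = TwoLayer()
y = model(tf.constant(np.ones((2, 3), dtype=np.float32)))  # runs immediately
shape = tuple(int(d) for d in y.shape)                      # (2, 1)
```

Combined with tf.data for input and object-based saving for checkpoints, this style keeps a single codebase usable in both execution modes.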
Switching from standard TensorFlow Graph Execution to Eager Execution has various implications due to the differences between the two modes of execution. Here’s a summary of the main differences:
- There’s no more graph or session management
- There are no more placeholders and no more graph operations. Using TensorFlow operations will return concrete values
- Better and seamless integration with NumPy and other Python packages
- More difficulty deploying complex dynamic models that rely on Python control flow, as these parts would need converting to graphs for production deployment
- Lack of support for automatic distributed training across machines
- Note: Distribution Strategy, a new high-level API for distributed training, currently exists in the contrib package which supports both Graph and Eager execution
- Getting used to using new automatic differentiation APIs for computing gradients
- tf.GradientTape can be used to trace operations to compute gradients later
- Gradient tape can record the operation to compute the derivative of the loss with respect to weights and biases
- Replay the gradient tape in the training loop to apply gradients using the optimizer
- Additional Autograd-style APIs are available for automatic differentiation
- tfe.gradients_function — returns a function that computes the derivative of the input function with respect to its arguments
- tfe.value_and_gradients_function — similar to the above, but it returns the value of the input function in addition to the derivatives
- Custom gradients
- tf.custom_gradient can be used to override the default gradient of an operation or a composition of operations
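The tape-and-optimizer steps above can be sketched as a minimal eager training loop. This fits y = 2x with a single weight; it is illustrative, and uses a plain gradient-descent update (`assign_sub`) in place of an optimizer object to stay version-agnostic:

```python
import tensorflow as tf

# Enable eager on TF 1.x; it is the default in TF 2.x.
if hasattr(tf, "enable_eager_execution") and not tf.executing_eagerly():
    tf.enable_eager_execution()

w = tf.Variable(0.0)                    # the single trainable weight
xs = tf.constant([1.0, 2.0, 3.0])
ys = tf.constant([2.0, 4.0, 6.0])       # targets follow y = 2x

for _ in range(50):
    # Record the forward pass on the tape...
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * xs - ys))
    # ...then replay it to get d(loss)/dw and apply a gradient step.
    grad = tape.gradient(loss, w)
    w.assign_sub(0.1 * grad)

learned = float(w.numpy())              # converges to ~2.0
```

In a real loop the gradients would typically be handed to an optimizer’s apply_gradients instead of updated by hand, but the tape-record / tape-replay structure is the same.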
Based on a previous framework assessment I’ve done, and having gained more experience working with TensorFlow since then, I still find static graph execution unintuitive and difficult to use and debug. The benefits of graph execution are well known and well marketed; however, I found that it ends up fighting against you when working with more complex and dynamic models, and when you need to iterate quickly in research. I think we have all experienced this, and it’s likely to continue moving forward.
I have been following Eager Execution for a while, and it’s a good sign that it’s now a core part of TensorFlow. The performance overhead turns out to be negligible when using Eager Execution. The disadvantages are still the same. There’s no official support for distributed training across hundreds of machines unless we write graph-compatible code, although this will change soon if and when the new Distribution Strategy APIs become officially supported. In addition, there’s no support for production deployment (e.g. mobile), which means that deployment-related APIs and libraries, such as TensorFlow Serving and TensorFlow Lite, are not supported, and developers must resort to manually converting the model to a graph and running inference via some Python API. While the deployment aspect is not currently a top priority for us at Project AGI, it may be a big factor for others considering switching to Eager Execution.
I’m a bit biased by my opinions on static computational graphs in general, which are supported by my experience using them, but I think it’s an objective view that the benefits of a dynamic framework really outweigh any disadvantages for research purposes. These disadvantages are not really disadvantages unless you’re at a scale that requires distributed training and production deployment. Switching to Eager Execution is not a magical solution that will solve all of your problems; one still needs to tackle the standard problems experienced during research, especially when building complex architectures that don’t fit the standard machine learning models. But at least we can avoid the standard graph-related issues.
- Getting Started with Eager Execution
- Example Models using Eager Execution
- Programmer’s Guide for Eager Execution
- TF Dev Summit Introductory Notebook
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
Also published on Medium.