This paper introduces TensorFlow, which uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices. TensorFlow is a familiar tool for machine learning users, and this paper explains how the system design came about, along with its implementation and performance. The paper first states the motivation for TensorFlow: previous systems fall short in three respects: 1) defining new layers in a familiar programming language, 2) experimenting easily with new optimization methods, and 3) defining new training algorithms. In comparison, TensorFlow provides a single programming model and runtime that fulfills these goals and is flexible enough for both large- and small-scale setups. There are two main abstractions in TensorFlow: 1) execution is expressed as a graph of operators, where the graph vertices are operations and the edges carry data between them; 2) a common abstraction over heterogeneous accelerators lets the same program run on different hardware (CPU, GPU, TPU, other accelerators, or a whole cluster). I think the most important architectural idea it subsumes is the parameter server (PS). A PS is a set of servers that manage shared state updated by a set of parallel workers, and it is key to the scalability of the system, since the PS holds the updated parameters and serves as a synchronization point. It is worth mentioning that TensorFlow differs from batch dataflow systems in two respects: 1) it supports multiple concurrent executions on overlapping subgraphs, and 2) individual vertices may hold mutable state that is shared between different executions of the graph. I enjoyed reading this paper very much; it is easy to read and clearly states the technical choices made and the reasons and applications that drove those choices. I like TensorFlow for three reasons: 1) it is flexible and provides easy-to-use abstractions; 2) for a company like Google the system can focus on large clusters, yet TensorFlow also makes (partial) optimizations for small-scale users; 3) TensorFlow has an abstraction over different devices, and learning how those abstraction layers are designed is very instructive. |
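To make the "graph of operators plus shared mutable state" point above concrete, here is a minimal TF 1.x (graph-mode) sketch; the variable and tensor names are illustrative, not from the paper. A Variable plays the role the review attributes to the parameter server: shared state that persists across steps and is mutated in place.

```python
# Minimal sketch: operations are graph vertices, tensors are edges, and a
# Variable holds shared mutable state that persists across steps.
import tensorflow as tf

w = tf.Variable([[1.0, 2.0], [3.0, 4.0]])        # stateful vertex (model parameter)
x = tf.placeholder(tf.float32, shape=[2, 2])     # input edge, fed at run time
y = tf.matmul(x, w)                              # pure operator vertex
update = w.assign_add(tf.ones([2, 2]))           # mutates the shared state in place

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Each sess.run call is one "step"; the Variable's state carries over.
    print(sess.run(y, feed_dict={x: [[1.0, 0.0], [0.0, 1.0]]}))
    sess.run(update)
    print(sess.run(y, feed_dict={x: [[1.0, 0.0], [0.0, 1.0]]}))
```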
Machine learning has been applied in many fields in recent years. Previously, Google Brain developed DistBelief, a system for training deep neural networks. However, the Python-based scripting interface of DistBelief only covered fairly simple use cases. More advanced needs, such as defining new layers, refining the training algorithm, or defining entirely new training algorithms, could not be met by DistBelief. The team therefore designed TensorFlow to satisfy these more flexible requirements. TensorFlow provides a simple dataflow-based programming abstraction that allows users to deploy applications on distributed clusters, local workstations, mobile devices, and custom-designed accelerators. Its design principles include dataflow graphs of primitive operators, deferred execution, and a common abstraction for heterogeneous accelerators; these principles ensure TensorFlow's flexibility. Some contributions and strengths of TensorFlow are: 1. TensorFlow offers better graph visualization than other libraries such as Torch and Theano. 2. TensorFlow maintains high scalability for both large-scale and small-scale machine learning. 3. TensorFlow is highly parallel and designed to use a variety of backends such as GPUs and ASICs. Some drawbacks of TensorFlow are: 1. TensorFlow lacks symbolic loops comparable to the scan feature in Theano, which are needed for variable-length sequences. 2. TensorFlow does not support Windows directly. 3. TensorFlow supports only NVIDIA GPUs, and the only fully supported language is Python. |
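The "define new layers in a familiar language" point above is easy to illustrate. Below is a hedged sketch, assuming TF 1.x graph mode: a layer is just a Python function composing primitive operators, with no C++ layer class involved. The function name and layer sizes are illustrative, not from the paper.

```python
# A "layer" built from primitive operators (matmul, add, relu) in plain Python.
import tensorflow as tf

def dense_relu(x, fan_in, fan_out):
    w = tf.Variable(tf.random_normal([fan_in, fan_out], stddev=0.1))
    b = tf.Variable(tf.zeros([fan_out]))
    return tf.nn.relu(tf.matmul(x, w) + b)

inputs = tf.placeholder(tf.float32, shape=[None, 784])
hidden = dense_relu(inputs, 784, 128)   # compose layers freely, no library change needed
logits = dense_relu(hidden, 128, 10)
```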
DistBelief is the distributed system used inside Google for training neural networks; it uses the parameter server architecture. The interface provided by DistBelief lacks three advanced features: 1) defining new layers, 2) refining the training algorithm, and 3) defining new training algorithms. This paper therefore proposes TensorFlow, a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses a single dataflow graph to represent all computation and state in a machine learning algorithm, including the individual mathematical operations, the parameters and their update rules, and the input preprocessing. It has two distinctive features: 1. The model supports multiple concurrent executions on overlapping subgraphs of the overall graph. 2. Individual vertices may have mutable state that can be shared between different executions of the graph. TensorFlow uses a dataflow graph to represent all possible computations in a particular application. The API for executing a graph allows the client to specify declaratively the subgraph that should be executed. Stateful operations allow steps to share data and synchronize when necessary. The TensorFlow runtime places operations on devices, subject to implicit or explicit constraints in the graph. The main contributions of this paper are as follows: 1. TensorFlow enables developers to experiment with novel optimizations and training algorithms. 2. TensorFlow supports a variety of applications, with a focus on training and inference for deep neural networks. The main advantages of TensorFlow are as follows: 1. It provides a state-of-the-art machine learning library. 2. Its performance matches the best in the industry. 3. Packages are available that let users easily program voice recognition, machine translation, video tagging, and other advanced artificial intelligence tasks. 4. Its approach allows monitoring the training progress of models and tracking several metrics. 5. It has great community support due to its large number of users. The main disadvantages of TensorFlow are as follows: 1. The only GPUs supported are NVIDIA GPUs. 2. Some machine learning packages support more types of models out of the box. 3. The only fully supported programming language is Python. 4. There is a lack of authoritative examples for data ingestion. |
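The review's point that "the client specifies declaratively the subgraph that should be executed" is concrete in the TF 1.x API: the fetches and feeds passed to Session.run select which part of the graph actually runs for that step, and the rest is pruned. A small sketch, with illustrative names; note that even an internal tensor can be fed, which skips its upstream operations.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32)
h = x * 2.0          # intermediate edge
y = h + 1.0
z = h * h

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 3.0}))          # only the subgraph needed for y runs
    print(sess.run([y, z], feed_dict={h: 10.0}))    # feed an internal edge; x's branch is pruned
```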
Problem & Motivation: Machine learning has driven advances in many different fields. Originally, Google used DistBelief, its first-generation distributed system for training neural networks. DistBelief, as I understand it, is the system introduced by another paper we were supposed to read. It has many advantages, in particular running independent tasks simultaneously. However, it does not offer much flexibility for users who want to define new layers, refine the training algorithm, or define complex new training algorithms that differ from the usual forward and backpropagation rounds. Contributions: The authors propose TensorFlow (also well known today as a strong ML library). It rests on several key insights. First, it allows the user to define a high-level dataflow graph before any input data enters the system; the system therefore has global knowledge of the whole graph, can cache data that will be used later, and can defer operations to improve performance. Second, it provides a well-designed abstract interface that lets the user switch the whole model to GPU/TPU training. Third, it considers not only the common case but also the corner cases: for example, if the model is too big to fit in one place, TensorFlow partitions it into subgraphs and reassembles the pieces with operations such as Gather. Fourth, the system includes many optimizations; for instance, Variables are stateful operations that own mutable buffers inside the graph, which lets parameters live in the dataflow itself and be updated in place. Drawbacks: Nowadays, not many people around me use TensorFlow, for several reasons. First, while the static graph undoubtedly offers great performance, it is genuinely hard to master; in other words, its learning curve is steep. Second, the way the API is defined is not explicit enough; for examples, see the article "TensorFlow sucks". |
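The first insight above, defining the dataflow before any data enters the system, is what the paper calls deferred execution. Here is a hedged TF 1.x sketch (names illustrative): building the graph does no numerical work, which is what gives the runtime a global view it can optimize and place on devices; computation only happens when the client runs the (sub)graph.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 4])   # no data yet, only a symbolic edge
w = tf.Variable(tf.zeros([4, 1]))
y = tf.sigmoid(tf.matmul(x, w))                   # still symbolic: nothing has executed

print(y)   # prints a Tensor description, not values

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})  # execution happens here
    print(values)
```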
In recent years, machine learning has become increasingly important and pervasive across more and more industries. Along with collecting data and developing suitable algorithms, having a robust, high-performance, and flexible architecture for supporting these machine learning computations matters to many people working in the field. At the time, there were a number of deep learning frameworks such as Torch, Caffe, DistBelief, and others, but many of them suffered from limitations such as a lack of flexibility in experimenting with new neural network architectures. This paper introduces a new system developed by Google Brain called TensorFlow that uses a unified dataflow graph to represent both the algorithm's calculations and its state, with the goal of letting programmers quickly experiment with different parallelization schemes while handling many of the other low-level details automatically. In general, it aims to be more flexible in its implementation, for example by representing individual operators like matrix multiplication as nodes in a dataflow graph (as opposed to DistBelief, which is composed of relatively few complex layers). Additionally, TensorFlow abstracts away the type of computing device, regardless of whether it is running on a CPU, GPU, TPU, etc. As previously mentioned, TensorFlow represents all computation and state in a single dataflow graph, and communication between subcomputations is expressed explicitly. Each vertex represents a single computation, and an edge represents the output from one vertex or an input to another. Data is modeled as dense n-dimensional arrays called tensors, and operations work on these tensors. Execution of the dataflow graph can be partial or concurrent, which adds to its flexibility, since it allows users to represent a wide variety of neural network models without modifying TensorFlow's internal code. The decision to make communication between subcomputations explicit simplifies dataflow execution, which allows TensorFlow to be deployed to a heterogeneous array of computing devices. Support for both conditional and iterative control flow allows TensorFlow to support more sophisticated algorithms like recurrent neural networks. In addition to this support, TensorFlow includes user-level libraries that give users the ability to add functionality without hard-coding it into the core. For example, differentiation and backpropagation can be easily implemented, allowing users to experiment with a wide range of optimization algorithms like stochastic gradient descent and momentum, to name a few. To handle very large datasets or models in a distributed representation, TensorFlow implements sparse embedding layers as a composition of primitive operations: the Gather operator extracts a sparse set of rows from a tensor, Part (dynamic partition) divides the indices into variable-sized tensors containing the indices destined for each shard, and Stitch reassembles the partial results from each shard into a single result tensor (a small sketch of this pattern follows this review). Consistent with TensorFlow's design philosophy of abstracting out low-level details, these graphs do not have to be constructed manually. In order to provide fault tolerance, TensorFlow provides user-level checkpointing.
It does not attempt to provide consistent checkpoints unless the user asks for them (by specifying a synchronous approach), but this is usually not a problem because neural networks trained with stochastic gradient descent are generally robust to small variations. TensorFlow is implemented behind a C API, with the core libraries written in C++ for performance reasons. The distributed master turns user requests into execution across a set of tasks, while the dataflow executor in each task handles requests from the distributed master and schedules the kernel executions for the given subgraph. The runtime contains over 200 standard operations, and the distribution also includes visualization tools for users to track training progress. The main strength of this paper is that it provides a working, flexible system capable of performing machine learning computations with reasonably good performance. At the time of the paper's writing, TensorFlow was already widely used inside Google, and its explosive adoption by a huge part of the community since its release speaks to its flexibility in handling a wide array of architectures, as well as its success in abstracting away many low-level programming details. In addition to these advantages, TensorFlow also manages to achieve comparable or even superior performance to other contemporary deep learning frameworks like Caffe, Neon, and Torch on tasks such as training several widely used image classification architectures (AlexNet, GoogleNet, etc.). One potential weakness is that while TensorFlow does very well in terms of flexibility and abstraction, it is not necessarily the best choice for raw performance. For example, even in the benchmarks used for training AlexNet and the other models, Torch outperforms TensorFlow on nearly all of the architectures compared. While the numbers are similar, with further optimization and development Torch could probably widen that gap. |
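As promised above, here is a hedged sketch of the sparse-embedding pattern (Gather / Part / Stitch) using the corresponding TF 1.x primitives tf.gather, tf.dynamic_partition, and tf.dynamic_stitch. The shard count, shapes, and round-robin sharding scheme are illustrative assumptions, not details taken from the paper.

```python
import tensorflow as tf

num_shards = 2
# Two embedding shards, as if the full table were partitioned across tasks.
shards = [tf.Variable(tf.random_normal([50, 8])) for _ in range(num_shards)]

ids = tf.placeholder(tf.int32, shape=[None])           # sparse lookup indices
shard_ids = ids % num_shards                            # which shard owns each id
local_ids = tf.dynamic_partition(ids // num_shards, shard_ids, num_shards)
# Gather the requested rows from each shard.
gathered = [tf.gather(shards[i], local_ids[i]) for i in range(num_shards)]
# Stitch the per-shard results back into one tensor, in the original order.
positions = tf.dynamic_partition(tf.range(tf.size(ids)), shard_ids, num_shards)
embeddings = tf.dynamic_stitch(positions, gathered)
```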
The purpose of this paper is to describe Google's TensorFlow dataflow model and show how it performs in some real-world applications. Recent advances in machine learning applications can be attributed in part to better tools for handling large datasets and to the computational resources available for training on them. Google's TensorFlow system draws on dataflow systems as well as parameter servers, and has since been released as an open-source project. The paper uses neural network training as an example task to illustrate TensorFlow's efficiency and scalability. The two specific problems it focuses on are image classification, which stresses computational throughput, and language modeling, which stresses aggregate model size. The paper first looks at TensorFlow's predecessor, Google's DistBelief, which was built on the parameter server architecture: stateless worker processes and a stateful parameter server that keeps track of the shared parameters. DistBelief layers were implemented as C++ classes, while its scripting interface was in Python. Neural network training typically uses stochastic gradient descent, but if a user wanted to change the optimization method, they would have to modify the parameter server implementation. DistBelief could also handle simple feed-forward models but not recurrent neural networks, which contain loops. TensorFlow was built on design principles derived from the shortcomings of DistBelief. It is built on a dataflow graph that includes a node for each primitive operation, which makes it possible to distribute the computation. It supports deferred execution, meaning computation runs only after the entire program has been defined and is available to the runtime. I like how this paper did not spend too much time on the faults of its predecessors; just enough to give us context. However, I did not like how the related work is placed so that it separates the history from the main contribution, since it derailed my concentration on the material. |
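The contrast with "modify the parameter server to change the optimization method" can be made concrete. A hedged TF 1.x sketch, with an illustrative toy loss: swapping the update rule is a one-line change in the client program rather than a change to any server implementation.

```python
import tensorflow as tf

w = tf.Variable(5.0)
loss = tf.square(w - 2.0)

# Swap one line to experiment with a different optimization method.
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
# train_op = tf.train.MomentumOptimizer(0.1, momentum=0.9).minimize(loss)
# train_op = tf.train.AdagradOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        sess.run(train_op)
    print(sess.run(w))   # approaches 2.0
```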
Tensorflow: This paper introduces TensorFlow, a successful system developed by Google for large-scale machine learning. There are several reasons for building such a system. First, we need large-scale distributed systems because the more data used to train a machine learning algorithm, the better it tends to perform. Second, datasets themselves are large: ImageNet and the One Billion Word Benchmark, for instance, contain huge amounts of data, which motivates parallel training. Third, deep learning models are big: ResNet, for example, contains more than 200 convolutional layers and a very large number of floating-point weights. Fourth, accelerating large-scale data processing speeds up model development, which is very good for product iteration. Fifth, training with large batches can bring better performance. The core of TensorFlow is its execution model. It uses a single dataflow graph to represent all computation and state in a machine learning algorithm, including the individual mathematical operations, the parameters and their update rules, and the input preprocessing. The dataflow graph has several kinds of elements: tensors, operations, and stateful operations, including variables and queues. For distributed execution, dataflow makes communication between subcomputations explicit, which keeps distributed execution simple. TensorFlow supports dynamic control flow to express advanced machine learning algorithms that contain conditional and iterative control flow. TensorFlow also addresses differentiation and optimization, the most important parts of a machine learning algorithm, which can be parallelized well over large datasets and large models. For fault tolerance, since a training run can take a long time, TensorFlow supports user-level checkpointing for recovery, so users can resume training from a previous checkpoint of their choosing. The contributions of TensorFlow are that it is open source, which allows the whole community to contribute and benefit; the library is under excellent management and can be deployed on many kinds of machines; and the project supports debugging. The advantage of TensorFlow is that it supports large-scale training and inference, scaling to hundreds of GPUs training together. TensorFlow also supports multiple platforms, from distributed clusters to mobile devices. It is also flexible and general, supporting experiments with new machine learning models as well as system-level optimizations. The drawbacks of TensorFlow, from my own use, are that it did not support Windows and lacks symbolic loops, it is not as user friendly as PyTorch, and it is not as fast as Caffe. |
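The user-level checkpointing mentioned above is exposed in the TF 1.x API through the Saver. A hedged sketch (the variable name and checkpoint path are illustrative): the user decides when to save, and after a failure a new process restores the variables and resumes training from that point.

```python
import tensorflow as tf

w = tf.Variable(tf.zeros([10, 10]), name="weights")
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps would go here ...
    saver.save(sess, "/tmp/model.ckpt")      # periodic, user-triggered checkpoint

# After a failure, restore the saved variables and continue training.
with tf.Session() as sess:
    saver.restore(sess, "/tmp/model.ckpt")
```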
TensorFlow is a general-purpose computing framework developed by Jeff Dean's Google Brain team, based on Google's first-generation deep learning system, DistBelief. The TensorFlow framework supports a variety of deep learning algorithms, but its applications are not limited to deep learning. TensorFlow uses a dataflow-like model in which the dataflow is a directed graph. The nodes/vertices of the graph represent operations, and the edges represent tensors transferred between nodes. A special kind of edge called a control dependency can also exist in the graph, indicating that the source node must finish executing before the target node begins. Nodes are assigned to computing devices and are executed asynchronously in parallel once all tensors on their input edges become available. An operation takes one or more tensors as input and produces one or more tensors as output; its compile-time attributes determine the expected types and arities of its inputs and outputs. An operation may contain mutable state that is read and/or written each time it executes. The special Variable operation owns a mutable buffer that can be used to store the shared parameters of the model during training. In this way, parameters can be part of the dataflow itself, rather than being "external" to the system in a parameter server. So, what are the features of TensorFlow? Dataflow simplifies distributed execution because it makes communication between subcomputations explicit. It enables the same TensorFlow program to be deployed to a GPU cluster for training, a TPU cluster for serving, and a mobile phone for on-device inference. TensorFlow supports advanced machine learning algorithms that include conditional and iterative control flow; for example, a recurrent neural network such as an LSTM can generate predictions from sequential data. TensorFlow includes a user-level library that differentiates the symbolic expression of the loss function and generates a new symbolic expression for the gradient. Fault tolerance is supported by user-level checkpointing operations. The contribution of this paper is that it proposes the TensorFlow platform and makes it more flexible: researchers can use TensorFlow to define new layers, run RNNs, and so on. The TensorFlow model is also more efficient than parameter servers, and it enables training and inference everywhere from a GPU cluster down to a small mobile phone, which is awesome. The downside of this paper is that I do not think it makes the concept of a variable very clear. |
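Since the review singles out control-dependency edges and the Variable operation, here is a hedged TF 1.x sketch combining the two (names illustrative): the control dependency forces the read below to happen only after the in-place update of the Variable has executed.

```python
import tensorflow as tf

counter = tf.Variable(0, dtype=tf.int32)
increment = tf.assign_add(counter, 1)     # mutates the Variable's buffer

with tf.control_dependencies([increment]):
    # Control-dependency edge: the source node (increment) must finish
    # before this read runs.
    read_after_update = tf.identity(counter)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(read_after_update))   # 1
    print(sess.run(read_after_update))   # 2
```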
TensorFlow is a system for large-scale machine learning implemented by Google. Its design is partially based on Google's earlier system, DistBelief, which was used for training deep neural networks on huge datasets. TensorFlow aims to provide more flexibility in the datasets and machine learning algorithms it supports. The dataflow graph model is made up of nodes that represent computations and edges that carry tensors (multidimensional arrays) to nodes. Users have the option of running only small subgraphs if needed. While it is impossible to cover all of the details of TensorFlow in this review, one area I found interesting was the discussion of synchronization schemes. In particular, while it was once believed that asynchronous parameter updates were necessary for scalability, the addition of GPUs, which require fewer machines, means that it may be possible to rely on synchronous updates. TensorFlow uses backup workers, similar to MapReduce backup tasks, to further improve throughput. I actually see many similarities between TensorFlow and MapReduce. The idea is to abstract away the distributed/parallel-systems aspects for the programmer, so that anyone who understands the abstraction well enough to write a simple machine learning model can scale it to huge amounts of data. While it may not do everything perfectly compared to a system designed for a specific algorithm and dataset, it makes it significantly easier to hit the ground running. TensorFlow's primary value comes in two places: 1) its suitability for distributed workloads, and 2) its ability to serve novice and expert users alike, on anything from a huge cluster to a single laptop. I have already discussed the first, so I will discuss the second here. TensorFlow attempts to provide an abstraction that works well for novice users training their first neural network, while also providing the knobs that machine learning researchers need to try new things. Additionally, it allows training small networks on a single machine as well as training networks with gigantic amounts of data on a distributed cluster, which is really valuable for debugging and developing a program. Finally, the dataflow abstraction even allows inference over trained TensorFlow models on a mobile phone, which is impressive and necessary in today's world. While I thought this was generally a well-written paper, I had some questions and concerns. First, in the evaluation section, it is noted that "we defer the analysis of such improvements to other papers" (276). This left me wondering what the follow-up papers were, and what else would need to be covered in the future. How has TensorFlow continued to evolve, and when is that evolution publishable material versus another 1.x.x release? I was also a bit skeptical of the huge usage numbers presented in the conclusion. While TensorFlow is clearly extremely flexible, it does seem a bit bloated for personal, smaller machine learning projects that run on a single computer; however, I think I have seen plenty of people using it that way. Is it a system for large-scale machine learning, or for machine learning in general? |
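For the synchronous-updates-with-backup-workers idea discussed above, TF 1.x ships an optimizer wrapper, tf.train.SyncReplicasOptimizer. The sketch below is heavily hedged: the worker counts are illustrative, and a real deployment would additionally set up a cluster, a chief worker, and the hook returned by make_session_run_hook; this only shows how the aggregation and backup-worker margin are expressed.

```python
import tensorflow as tf

w = tf.Variable(1.0)
loss = tf.square(w)
global_step = tf.train.get_or_create_global_step()

opt = tf.train.GradientDescentOptimizer(0.1)
sync_opt = tf.train.SyncReplicasOptimizer(
    opt,
    replicas_to_aggregate=4,   # gradients aggregated per synchronous step
    total_num_replicas=5)      # one extra worker acts as a backup for stragglers

train_op = sync_opt.minimize(loss, global_step=global_step)
hook = sync_opt.make_session_run_hook(is_chief=True)  # used by the chief's session
```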
This paper introduces TensorFlow, a machine learning system that is able to operate at large scale and in heterogeneous environments. The motivation of TensorFlow is to overcome several limitations of the DistBelief system. DistBelief uses the parameter server architecture, but this architecture is not flexible enough to meet the needs of advanced users. For example, defining a new neural network layer in DistBelief is not trivial, since all layers are implemented as C++ classes that users may not be familiar with. It is also difficult to modify or create a new training algorithm, which involves modifying the parameter server implementation. Another problem with DistBelief is that it is hard to scale down, so users cannot run their machine learning programs in a local development environment before pushing the model into production. To solve these problems, the design of TensorFlow focuses on the following: a high-level scripting interface, a dataflow graph of primitive operators, deferred execution, and a common abstraction for heterogeneous accelerators. The high-level interface allows users to experiment with new layers, architectures, and optimization algorithms easily; the TensorFlow team itself used these APIs to create many different optimizers, a sparse embedding operator, and more. To enable such a high-level and easy-to-use interface, the underlying graph model contains only primitive mathematical operators, for example matrix multiplication, addition, and convolution, so users can use them as building blocks to create new operators. TensorFlow also supports algorithms that contain conditional and iterative control flow through special operators. Deferred execution is closely related to the common abstraction for heterogeneous accelerators. Once a user has created a dataflow graph, the runtime tries to optimize the graph, divides it into subgraphs, and assigns them to the available devices, for example a TPU, GPU, or CPU. This is only possible under the deferred execution model, where the graph is built first. To make sure an operator in the graph can be assigned to different devices and executed properly, each operator has multiple kernels, each a specialized implementation for a particular device or data type; when a device is assigned an operator, only the kernel written for that device needs to run. I think it would have been better if the paper had said more about the architecture of the TensorFlow system, such as how different devices communicate with each other, how the state of the computation graph is maintained, and what happens if a graph cannot be stored on a single computer. As written, it can read like an engineering report on improving the extensibility and scaling problems of their previous system. |
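The device-assignment point above is visible at the API level. A hedged TF 1.x sketch: device placement can be constrained explicitly with tf.device, and the runtime selects a registered kernel for each operation on its assigned device; the device strings and soft-placement option are illustrative choices, not requirements from the paper.

```python
import tensorflow as tf

with tf.device("/cpu:0"):
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])

with tf.device("/gpu:0"):          # explicit placement hint for the matmul kernel
    c = tf.matmul(a, b)

config = tf.ConfigProto(allow_soft_placement=True,   # fall back if no GPU kernel/device
                        log_device_placement=True)   # log where each op actually ran
with tf.Session(config=config) as sess:
    sess.run(c)
```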
In the paper "TensorFlow: A system for large-scale machine learning", Martin Abadi and Co. discuss TensorFlow, a machine learning system that operates at large scale and in heterogeneous environments. In the past several years, machine learning has been a booming topic that has accelerated the growth of many different fields. This is, in part, due to the creation of more sophisticated models, the availability of larger datasets, and the development of software that enables users to direct their computational resources to train agents over these datasets. To target these points, TensorFlow efficiently uses hundreds of GPU-enabled servers for fast training and displays flexibility by allowing experimentation on new training models and system-level optimizations. Mimicking the high-level programming models of dataflow systems and the low-level efficiency of parameter servers, TensorFlow uses a unified dataflow graph to represent both the computation and states at which an algorithm operates. By unifying these two models into one, programmers are now able to play around with different parallelization schemes to control network traffic to their liking. Unsurprisingly, TensorFlow has a huge community backing it - even employees at Google integrate TensorFlow into their application development. With a sizable community and far ranging use cases, it is clear that TensorFlow is a system that comes with great advantages. Thus, this is both an important and interesting system to explore. The paper is divided into several sections, outlining their motivation and architecture: 1) DistBelief: DistBelief is the ancestor to TensorFlow and uses the parameter server architecture. In this architecture, stateless worker processes performs most of the computation while training, while the parameter server processes maintain the current version of the model parameters. DistBelief creates a DAG and uses knowledge of layer semantics to compute gradients for each of the model parameters (values that we want to optimize). However, some problems arose due to this structure: Unfamiliar programming languages create barriers for programmers, it is hard to experiment with new optimization methods without modifying parameter server implementations, and new training algorithms that do loops suffer from poor performance. 2) TensorFlow execution model: TensorFlow is written in C++ and uses dataflow graphs to represent simpler operators in ML. Each vertex (operation) represents a unit of local computation, while an edge (Tensor) represents the output from, or input to a vertex. Tensors are n-dimensional arrays and these represent common mathematical operations done in machine learning algorithms. Operations take these tensors and return outputs. On the user side, the client views the dataflow graph that represents all possible computations and selects a subgraph that should be executed. Each invocation counts as a step - and multiple steps can occur concurrently. To help with this, dataflow is simplified to a distributed execution and is optimized for executing large subgraphs repeatedly with low latency. 3) Case Studies: TensorFlow mainly helps with four features that were originally hard coded into DistBelief: Differentiation and optimization, training on large datasets, fault tolerance and synchronous replica coordination. The most important of these is fault tolerance. 
Running algorithms for several days makes crashes likely, so user-level checkpointing is implemented to give developers breathing room: in the case of a failure, training is replayed from the last known checkpoint. 4) Implementation: A notable fact is that this is an open-source project. It also uses a master/worker architecture to schedule work accordingly. Much like other papers, this paper also had some drawbacks. One thing to note is that it is difficult to understand for readers with very little experience in machine learning; I personally felt the motivation was much less compelling since I have not experienced the pain points of machine learning myself. Beyond this, the first drawback I noticed was in the graphs of the evaluation. Information visualization is a subtle field with big consequences: the x-axis is logarithmically scaled, which was a consequence of trying to demonstrate TensorFlow's scalability, but it makes the data somewhat misleading. They have very few data points at larger numbers of workers, which makes the curves look more linear than they are. Another drawback is that I felt they could have run experiments that varied the types of jobs, such as spam detection; this would boost support for TensorFlow and community contributions. |
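As noted above, here is a hedged TF 1.x sketch of the differentiation-and-optimization case study (toy loss and learning rate are illustrative): the gradient is a symbolic expression produced by tf.gradients, and the update rule is ordinary graph code, so new optimizers can be written in Python without touching the runtime.

```python
import tensorflow as tf

w = tf.Variable(3.0)
loss = tf.square(w - 1.0)

grad = tf.gradients(loss, [w])[0]       # symbolic gradient expression
lr = 0.1
train_op = tf.assign_sub(w, lr * grad)  # hand-written SGD update rule

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(50):
        sess.run(train_op)
    print(sess.run(w))   # close to 1.0
```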
This paper describes TensorFlow, a machine learning framework built as an improvement on DistBelief, a previous system. DistBelief operated on a directed graph of computation layers and was divided into worker nodes, which do most of the actual computation but are stateless, and parameter servers, which maintain the current parameters. This approach had several issues: there wasn't an easy way to define new layers, to refine the existing training algorithm, or to introduce new training algorithms. TensorFlow, instead of representing its execution plan as a graph of layers, uses a graph of mathematical operations, which are much less complex than layers and so allow for more varied graphs. TensorFlow also delays execution until the entire graph is constructed, allowing for more optimizations. Each vertex of the graph represents an operation, and the operations are performed on data arrays called tensors. Each tensor is an arbitrary-dimensional array of primitive data types, such as integers, floats, or strings. An operation can hold state in variables, so that it can be updated for subsequent executions. TensorFlow is designed to work on any kind of architecture, since some machine learning tasks run on CPUs and others on GPUs. Google also built a custom accelerator, called the Tensor Processing Unit, which is optimized for TensorFlow tasks; any architecture that implements the correct API can be used with TensorFlow. A TensorFlow workload is first defined as a single large graph, but smaller jobs can be run on subgraphs: the user just determines which edges of the graph will be fed input tensors and which edges will be fetched as output, and the rest of the graph is pruned. TensorFlow has many advantages. It has very broad usability: it can run on almost any architecture, and its operations can be spread easily across several devices, so TensorFlow works well on a distributed system and scales easily. TensorFlow also allows conditional execution of its operations. In addition, TensorFlow is easily extensible; new training algorithms can be defined without modifying the underlying system. The user can also define their own checkpointing scheme for fault tolerance; users usually don't need strong fault tolerance, which can be slow, so it is not built in. TensorFlow can also be run in either synchronous or asynchronous mode, trading off speed against learning progress. |
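The conditional-execution point above corresponds to tf.cond in the TF 1.x API. A hedged sketch with illustrative names: only the operations of the taken branch execute at run time.

```python
import tensorflow as tf

x = tf.placeholder(tf.float32)
use_square = tf.placeholder(tf.bool, shape=[])

y = tf.cond(use_square,
            lambda: tf.square(x),   # this branch runs only when the predicate is true
            lambda: tf.sqrt(x))

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: 9.0, use_square: True}))    # 81.0
    print(sess.run(y, feed_dict={x: 9.0, use_square: False}))   # 3.0
```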
The paper presents TensorFlow, a distributed machine learning platform developed by Google. TensorFlow uses a unified dataflow graph to represent both the computation in an algorithm and the state on which the algorithm operates, incorporating the high-level programming models of dataflow systems and the low-level efficiency of parameter servers. Edges carry tensors (multi-dimensional arrays) between nodes, and TensorFlow transparently inserts the appropriate communication between distributed subcomputations. TensorFlow's dataflow representation subsumes existing work on parameter server systems, and offers a set of uniform abstractions that allow users to harness large-scale heterogeneous systems, both for production tasks and for experimenting with new approaches. Benefits: (1) Making everything part of a dataflow graph makes it easier for users to compose novel layers using just a high-level scripting interface, and having state in the dataflow graph enables experimentation with different update rules. (2) Having global information about the computation enables optimization of the execution phase; for example, TensorFlow achieves high GPU utilization by using the graph's dependency structure to issue a sequence of kernels to the GPU without waiting for intermediate results. Weak points: (1) The paper mentions the limitations of a static dataflow graph, especially for algorithms like deep reinforcement learning; this has since become a real problem for TensorFlow as more and more researchers choose PyTorch instead. (2) The application interface is less friendly compared with PyTorch. |
TensorFlow is a machine learning system that operates at large scale. It uses dataflow graphs to represent computation and shared state, and maps the nodes of a dataflow graph across many machines in a cluster. TensorFlow supports a variety of applications, with a focus on training and inference for deep neural networks. This paper describes the TensorFlow dataflow model and demonstrates its performance. The predecessor of TensorFlow is DistBelief, Google's distributed system for training neural networks. However, it is not very flexible, and users sought three further kinds of flexibility. The first is defining new layers without a separate programming language, so that machine learning researchers can experiment with new layer architectures. The second is refining the training algorithm, so that researchers can experiment with new optimization methods without having to modify the parameter server implementation. The third is defining new training algorithms to support more advanced models such as recurrent neural networks and adversarial networks. TensorFlow is designed around these requirements. Its design principles are supporting both functional operators and a representation of mutable state, issuing a sequence of kernels to the GPU without waiting for intermediate results, and using tensors of primitive values as a common interchange format that all devices understand. The main consequence of these principles is that in TensorFlow there is no such thing as a parameter server. The paper discusses the execution model of TensorFlow in detail. TensorFlow differs from batch dataflow systems mainly in two respects: the model supports multiple concurrent executions on overlapping subgraphs of the overall graph, and individual vertices may have mutable state that can be shared between different executions of the graph. In TensorFlow, all data are modeled as tensors, which naturally represent the inputs to and results of the common mathematical operations in many machine learning algorithms. At the lowest level, all TensorFlow tensors are dense. Operations in TensorFlow take tensors as input and produce tensors as output, and an operation can contain mutable state that is read and/or written each time it executes. Partial and concurrent execution is responsible for much of TensorFlow's flexibility, and this asynchrony makes it straightforward to implement machine learning algorithms with weak consistency requirements. Dataflow simplifies distributed execution, because it makes communication between subcomputations explicit. Each operation resides on a particular device, such as a CPU or GPU in a particular task, and a device is responsible for executing a kernel for each operation assigned to it. TensorFlow allows multiple kernels to be registered for a single operation, with specialized implementations for a particular device or data type. The execution of iterations can overlap, and TensorFlow can also partition conditional branches and loop bodies across multiple devices and processes. The paper also evaluates extensibility case studies including optimization, training large models, fault tolerance, and synchronous replica coordination. The strength of this paper is that the evaluation is very thorough and convincing. However, instead of explaining so many details of the previous system, it might have been better to spend more time on the technical details of TensorFlow. |
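The iterative control flow and overlapping loop iterations mentioned above are exposed in the TF 1.x API as tf.while_loop, the kind of in-graph loop that recurrent models build on. A hedged toy sketch (it just sums 1 through 10; the loop variables are illustrative):

```python
import tensorflow as tf

i0 = tf.constant(0)
acc0 = tf.constant(0)

def cond(i, acc):
    return i < 10              # loop while fewer than 10 iterations have run

def body(i, acc):
    return i + 1, acc + i + 1  # accumulate 1, 2, ..., 10

i_final, total = tf.while_loop(cond, body, [i0, acc0])

with tf.Session() as sess:
    print(sess.run(total))     # 55
```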
"TensorFlow: A System for Large-Scale Machine Learning" by Abadi et al. at Google Brain presents TensorFlow, a new system that supports machine learning computations via a distributed dataflow architecture. TensorFlow improves on Google’s prior system, DistBelief, in that it allows machine learning researchers more flexibility in the machines and cluster architectures they train and run on. TensorFlow also allows for creation of more complex algorithms (e.g., RNNs, LSTMs) without needing to modify low-level internal library code, as would be required with DistBelief. This is made possible in TensorFlow through the high level programming language it provides. TensorFlow is made up of a dataflow graph whose vertices are the operations that can be performed (e.g., Const, MatMul, Assign, Read) and whose edges are tensors, or the values (n-dimensional arrays) that are output from or input to operations. Different subgraphs of the dataflow graph can be performed on different machines in the utilized cluster, which enables partial and concurrent execution. Stateful operations enable coordination between different processes running different subgraphs. The paper goes on to explain 4 extensions Google Brain built using the high-level API TensorFlow provides: differentiation and optimization (i.e., specialized gradient descent algorithms), training very large models (e.g., with sparse embedding layers), fault tolerance (e.g., with user-level checkpointing), and synchronous replica coordination (i.e., modifying how/when reads and writes are done in what is the default asynchronous training dataflow). The authors evaluated TensorFlow on several workload types, against other machine learning frameworks, and using different configurations of itself. For single-machine benchmarks, TensorFlow achieved shorter step times than the Caffe library, and comparable times to Torch, but was outperformed by Neon on 3 of the 4 benchmarks. For the synchronous replica microbenchmark of a null model, the authors test TensorFlow with scalar, sparse, and dense accesses with different number of workers to understand the performance and overhead for different configurations; scalar performs the best. The authors also evaluate TensorFlow for common neural network applications, in particular image classification and language modeling. The paper does a good job demonstrating use cases TensorFlow would be helpful for (section 4; the extensions it supports), as well as experiments for considering performance for different kinds of workloads (section 6). It is also good to hear that many groups within and outside of Google have already used TensorFlow for their work. I was a little confused on the low-level details of how data moves around in TensorFlow. The example dataflow graph in figure 2 is nice, but I think a more detailed and end-to-end example, explained in a diagram, prose, and/or code, of TensorFlow’s usage would have been very helpful. |
This paper proposes TensorFlow, a large-scale machine learning system that can be executed in heterogeneous environments. The key design of TensorFlow is using dataflow graphs to represent computation, shared state, and the operations that mutate that state. As an open-source project backed by Google, TensorFlow has caught a lot of attention from both academia and industry: many projects are built upon TensorFlow, it is the most popular deep learning platform in academia, and a lot of papers have run their experiments with it. This paper gives the design principles of TensorFlow, including dataflow graphs of primitive operators, deferred execution, and a common abstraction for heterogeneous accelerators. The execution model of TensorFlow is a single dataflow graph representing all computation and state in a machine learning algorithm, including the individual mathematical operations, the parameters and their update rules, and the input preprocessing. The paper also details the dataflow model, such as the graph elements, partial and concurrent execution, and dynamic control flow. I think the strongest part of the paper is actually the TensorFlow project itself. TensorFlow is so popular that the influence of the paper can never be ignored. The most important aspect of TensorFlow is that it has so many contributors maintaining the code, adding features, and fixing bugs; it is a really active community that has implemented almost all common deep learning components, which we can use in one line. The weak part of TensorFlow, to me, is that it is difficult to debug and to write code for, which is why other systems like PyTorch have been getting more attention in recent years. |
In this paper, researchers from Google Brain propose a novel system for large-scale machine learning called TensorFlow; as we all know, this system is one of the most popular and active machine learning systems, widely used in both academia and industry. Designing and implementing a machine learning platform that can handle large-scale workloads is definitely an important problem. As the authors say, machine learning has driven advances in many different fields, so it is necessary to follow this trend by inventing more sophisticated machine learning models and developing software platforms that make it easy to apply large amounts of computational resources to training models on very large datasets. This contribution benefits machine learning researchers and engineers by making their lives much easier and reducing unnecessary repeated programming. Moreover, previous systems like DistBelief use a parameter server architecture that is subject to limitations. Based on these demands, the TensorFlow system was introduced, focusing on experimenting with new models, training them on large datasets, and moving them into production; next I will summarize the crux of TensorFlow's design as I understand it. First of all, the authors discuss their previous system, DistBelief: although DistBelief is good enough for many applications, it is limited in the flexibility it provides. The goal of TensorFlow is to provide more flexibility, such as supporting the definition of new layers, refining the training algorithms, and defining new training algorithms. The design principles of TensorFlow include dataflow graphs of primitive operators, deferred execution, and a common abstraction for heterogeneous accelerators. TensorFlow uses a unified dataflow graph to represent both the computation in an algorithm and the state on which the algorithm operates. Unlike traditional dataflow systems, in which graph vertices represent functional computation on immutable data, TensorFlow allows vertices to represent computations that own or update mutable state. Edges carry tensors between nodes, and TensorFlow transparently inserts the appropriate communication between distributed sub-computations. By unifying computation and state management in a single programming model, TensorFlow allows programmers to experiment with different parallelization schemes. The authors also built various coordination protocols and achieved encouraging results with synchronous replication, echoing recent results that contradict the commonly held belief that asynchronous replication is required for scalable learning. As they say in the paper, more than 150 teams at Google have used TensorFlow, and it is also very popular in the open-source community. The paper introduces two representative applications, image classification and language modeling, which illustrate that TensorFlow offers high efficiency, scalability, and flexibility. The main technical contribution of this paper is the introduction of TensorFlow itself, and it shares many great insights about building such a system for machine learning workloads. There are many advantages of TensorFlow that have made it successful. First of all, from a system design perspective, TensorFlow is efficient and scalable; it is a very successful model for distributed computation aimed at machine learning workloads.
TensorFlow is configurable and very flexible, and it provides several interface abstractions. TensorFlow supports GPUs, which provide high-performance parallel computing for training, and supports multiple GPUs spread across different machines. TensorFlow provides a variety of models and useful tools that make algorithm design and implementation much easier. Besides, the TensorFlow open-source community is very active and many people contribute to the project, which means people can learn and use the product easily. Generally speaking, this is a nice paper with great insights, and I think its downsides are minor. I have used TensorFlow to build some deep neural networks, and I see a few drawbacks in practice. First, PyTorch is much cleaner and easier to use than TensorFlow. Also, the speed of TensorFlow is not as good as MXNet. There are so many high-level interfaces for TensorFlow that people can feel overwhelmed, and the many layers of encapsulation in these high-level interfaces make it less flexible. Finally, TensorFlow is not friendly to users with limited computational resources (GPUs); it performs well on a huge cluster, but ordinary developers do not have those resources. |
This paper introduces TensorFlow, a machine learning system developed by Google. TensorFlow was designed as an improvement to DistBelief; the main complaint against DistBelief was that it did not offer advanced developers enough flexibility. Specifically, users wanted the ability to add new layers, experiment with new optimization methods, and work with different kinds of training algorithms such as adversarial networks and reinforcement learning. Thus, TensorFlow's selling point is its flexibility for advanced users. Here are the main architectural features of TensorFlow: 1) TF is a dataflow-oriented architecture, where data is modeled as tensors (multidimensional arrays). DB is also dataflow-oriented, but one key difference is that layers in DB are much more complicated than in TF, where each node simply represents a single mathematical operator; this makes adding new layers much easier in TF. 2) A TF application consists of 2 main phases: defining the program as a dataflow graph, and then executing an optimized version **after** more information about the entire program is available, which makes advanced optimizations possible. 3) State is mutable during execution of the graph. This is a very important difference from DB and the parameter server style; when this was not possible, it inhibited developers from using algorithms that required modifying parameters during execution. 4) User-level checkpointing is used for fault tolerance, and both synchronous and asynchronous replication are possible in TF. Backup workers are present to help deal with straggling workers. By allowing a much higher degree of user control, TensorFlow offers the flexibility that DistBelief could not provide, which is its main contribution. One weakness is that increasing flexibility and options for users almost always steepens the learning curve, so for users with relatively simple tasks, TF is probably overkill. Another weakness mentioned in the paper is that TF fails to match the performance of Neon, due to Neon's assembly-language kernel implementations. |