AI weekly (13/2020)

My selection of news on AI/ML and Data Science

AI weekly (13/2020)

My selection of news on AI/ML and Data Science

+++ Uber details Fiber, a framework for distributed AI model training +++ Google open-sources framework that helps massively scale RL workloads +++ Rightsizing Neural Nets +++ Generating music in the waveform domain +++

Breakthrough — Or So They Say

Tools and Frameworks

Uber details Fiber, a framework for distributed AI model training. Increasing computation underlies many recent advances in machine learning. In classic use cases such as supervised learning, distributed training is often straightforward to set up thanks to established machine learning frameworks (e.g. TensorFlow or PyTorch). But reinforcement and population-based methods pose challenges for reliability, efficiency, and flexibility that some frameworks fall short of satisfying. Fiber addresses these challenges with a lightweight strategy to handle task scheduling. It leverages cluster management software for job scheduling and tracking, doesn’t require preallocating resources, and can dynamically scale up and down on the fly, allowing users to migrate from one machine to multiple machines seamlessly. It supports Linux systems running Python 3.6 and up and Kubernetes running on public cloud environments like Google Cloud, and the research team says that it can scale to hundreds or even thousands of machines. More details can be found on the GitHub site and the preprint arXiv:2003.11164.

Google open-sources framework that helps massively scale RL workloads. Addressing the same problem as Uber’s Fiber (see above), Google AI has now open-sourced SEED RL, an RL agent that scales to thousands of machines, which enables training at millions of frames per second, and significantly improves computational efficiency. This is achieved with a novel architecture that takes advantage of accelerators (GPUs or TPUs) at scale by centralizing model inference and introducing a fast communication layer. The architecture of an RL agent is usually separated into actors and learners: The actors typically run on CPUs and iterate between taking steps in the environment and running inference on the model to predict the next action, the learner trains the model on GPUs using input from distributed inference on hundreds of machines. In contrast to that, the SEED RL architecture executes inference by the learner centrally on specialized hardware (GPUs or TPUs), enabling accelerated inference and avoiding data transfer bottlenecks by ensuring that the model parameters and state are kept local. While observations are sent to the learner at every environment step, latency is kept low due to a very efficient network library based on the gRPC framework with asynchronous streaming RPCs. This makes it possible to achieve up to a million queries per second on a single machine. The learner can be scaled to thousands of cores. More details can be found in the related blog post and preprint arXiv:1910.06591, the code is published on github.

Business News and Applications

Publications

Rightsizing Neural Nets — A Constructive Prediction of the Generalization Error Across Scales. With the success and heightened adoption of neural networks for real world tasks, some questions remain poorly answered. For a given task and model architecture, how much data would one require to reach a prescribed performance level? How big a model would be needed? New research estimates the impact of dataset and model sizes on neural network performance. Researchers from MIT, York University, Harvard and other institutions have introduced an equation — a so-called error landscape — that predicts how much data and how large a model it takes to generalize well. The researchers trained various state-of-the-art vision and language models on a number of benchmark datasets. Considering 30 combinations of architecture and dataset, they observed three effects when varying data and model size: (1) For a fixed amount of data, increasing model size initially boosted performance on novel data, though effect leveled off. (2) The researchers observed a similar trend as they increased the amount of training data. (3) The effect of boosting both model and dataset size was approximately the same as the combined impact of changing each one individually. An equation captures these observations. It describes the error as a function of model and data size, forming a 3D surface or error landscape.

Tutorials

Generating music in the waveform domain. Sander Dielman is research scientist at DeepMind. His blog post is a broad introduction to music generation in the waveform domain, contain links to lots of related papers and code.

See also