AI weekly (02/2020)

My selection of news on AI/ML and Data Science

Kleinmachnow (2020)

AI weekly (02/2020)

My selection of news on AI/ML and Data Science

+++ Uber is Open Sourcing Manifold, a Visual Debugging Tool for Machine Learning +++ Lyft releases Flyte, a platform for maintaining AI workflows +++ Google Research: Looking Back at 2019, and Forward to 2020 and Beyond +++ Facebook AI year in review: 2019 +++ An Idea From Physics Helps AI See in Higher Dimensions +++

Breakthrough — Or So They Say

AI system outperforms radiologists in first reading of breast cancer screening. An AI system designed to read mammogram images and identify breast cancer “outperformed” radiologists in the first reading, a study published in Nature has claimed. The research, funded by Google and carried out by Google employees and consultants, tested the AI model using two datasets from the UK (25 856 mammograms) and US (3097 mammograms) and found it reduced the number of false negatives and positives, compared with radiologists. The area under the receiver operating characteristic curve (AUC-ROC) for the AI system was greater than the AUC-ROC for the average radiologist by an absolute margin of 11.5%. Also, in a simulation in which the AI system participated in the double-reading process that is used in the UK, it was found that the AI system maintained non-inferior performance and reduced the workload of the second reader by 88%.

Tools and Frameworks

Uber is Open Sourcing Manifold, a Visual Debugging Tool for Machine Learning. In January 2019, Uber introduced Manifold, a model-agnostic visual debugging tool for machine learning. Now they have released it as an open source project on github with a related demo website. As described in their blog post (and a later one here), Manifold helps engineers and scientists identify performance issues across ML data slices and models, and diagnose their root causes by surfacing feature distribution differences between subsets of data. One of the nice additions of the current release is that Manifold now accepts data input as Pandas DataFrame objects and renders a visualization of this data within Jupyter.

Lyft releases Flyte, a platform for maintaining AI workflows. Lyft has announced the open sourcing of Flyte, a structured programming and distributed processing platform for highly concurrent, scalable, and maintainable workflows. Flyte has been serving production model training and data processing at Lyft for over three years now, becoming the de-facto platform for teams like Pricing, Locations, Estimated Time of Arrivals (ETA), Mapping, Self-Driving (L5), and more. In fact, Flyte manages over 7,000 unique workflows at Lyft, totaling over 100,000 executions every month, 1 million tasks, and 10 million containers. Workflows are often composed of heterogeneous steps. One step, for example, might use Spark to prepare data, while the next uses this data to train a deep learning model. Each step can be written in a different language and utilize different frameworks. Flyte supports heterogeneity by having container images bound to a task. They can be anything from a single container execution, to a remote query in a Hive cluster, to a distributed Spark execution. When fine-grained control is desirable on the execution semantics of a task, Flyte provides backend plugins. These can be used to create and manage Kubernetes resources, including CRDs like Spark-on-k8s, or any remote system like Amazon Sagemaker, Qubole, BigQuery, and more. Code and documentation can be found on github.

Business News and Applications

Google Research: Looking Back at 2019, and Forward to 2020 and Beyond. A rare post by Jeff Dean, Senior Fellow and SVP of Google Research and Health, on behalf of the entire Google Research community. In his look back, he provides links to plenty of Google achievements and publications in areas such as ethical AI, AI for social good, applications to various fields of science, AI in mobile phones, applications to healthcare, quantum computing, ML algos, AutoML, NLP, and machine perception. The breadth of topics and publications is simply overwhelming. In his look forward to 2020 and beyond, Jeff Dean mentions the following topics (1) Redesigning computing platforms with specialized TPUs for inference and others for training-and-inference, (2) building machine learning systems that can handle millions of tasks, and that can learn to successfully accomplish new tasks automatically, (3) avoiding bias, increasing interpretability & understandability, improving privacy and ensuring safety, (4) applying computation and machine learning to make advances in important new areas of science, and (5) ensuring diversity in the research communities.

Facebook AI year in review: 2019. One day ahead to the above-mentioned post by Google, Facebook has also looked back at their last year in AI. Among the highlights they mention are beating human players at Hannabi and Poker, the application of deep learning to symbolic mathematics, achievements in NLP and their contribution to open source resources, foremost PyTorch. And they don’t fail to mention that their chief AI scientist, Yann LeCun, was co-recipient of the 2019 Turing Award.

Publications

An Idea From Physics Helps AI See in Higher Dimensions. Convolutional neural networks (CNNs) have proved surprisingly adept at learning patterns in two-dimensional data — especially in computer vision tasks like recognizing handwritten words and objects in digital images. But when applied to data sets without a built-in planar geometry — say, models of irregular shapes used in 3D computer animation, or the point clouds generated by self-driving cars to map their surroundings — this powerful machine learning architecture doesn’t work well. Around 2016, a new discipline called geometric deep learning emerged with the goal of lifting CNNs out of flatland. Already, so-called gauge CNNs have greatly outperformed their predecessors in learning patterns in simulated global climate data, which is naturally mapped onto a sphere. Technical details may be found in a preprint on arXiv:1902.04615, but the topic is also featured in Quanta Magazine in a less technical manner.

Tutorials

See also