AI weekly (49/2019)

My selection of news on AI/ML and Data Science

+++ An Epidemic of AI Misinformation +++ AWS launches SageMaker Studio, a web-based IDE for machine learning +++ Netflix open-sources Metaflow framework +++ ‘Short window’ to stop AI taking control of society, warns ex-Google employee +++ Deep Learning For Symbolic Mathematics +++ Alibaba: Dynamic Pricing with Deep RL +++ Training multi-agent AI systems to solve complex tasks through cooperation +++ A Visual Guide to Using BERT for the First Time +++ Vaex: How to analyse 100 GB of data on your laptop with Python +++

Breakthrough — Or So They Say

An Epidemic of AI Misinformation. An article by Gary Marcus in The Gradient highlights the risk of overselling current achievements in AI and draws connections to the historical “AI winters”. A similar, earlier article can be found here. Marcus gives several examples of stark overpromising, including statements from luminaries such as Hinton, Ng, LeCun, and Sutskever. On the other hand, he also highlights much more balanced contributions by Yoshua Bengio and Pieter Abbeel.

Tools and Frameworks

AWS launches SageMaker Studio, a web-based IDE for machine learning. AWS has announced the launch of SageMaker Studio, a web-based IDE for building and training machine learning workflows. It includes what a data scientist needs to get started, such as ways to organize notebooks, data sets, code and models. It essentially wants to be a one-stop shop for all the machine learning tools and results you need. At the core of Studio is the ability to share projects and folders with others who are working on the same project, including the ability to discuss notebooks and results. SageMaker Studio is integrated with AWS’s SageMaker machine learning service, which can automatically scale based on your needs. It now includes a debugger, a monitoring tool and Autopilot, which automatically creates models based on your data, with full visibility into how it decides to build them. Related to this, AWS also launched SageMaker Notebooks, a managed service that is likewise integrated into Studio. Data scientists won’t have to provision instances themselves, as the service provisions them automatically when needed.

AWS launches Braket, its quantum computing service. AWS has announced the preview launch of Braket (named after the common bra-ket notation for quantum states), its own quantum computing service. It’s not building its own quantum computer, though. Instead, it’s partnering with D-Wave, IonQ and Rigetti and making their systems available through its cloud. In addition, it’s also launching the AWS Center for Quantum Computing and AWS Quantum Solutions Lab. Amazon Braket is a fully managed service that makes it possible for customers to get started on quantum computing by providing a single development environment to build quantum algorithms, test them on simulated quantum computers, and try them on a range of different quantum hardware architectures. Customers can assess the present and future capabilities of a variety of different quantum technologies that initially include quantum annealing from D-Wave, ion trap devices from IonQ, and superconducting chips from Rigetti, with more to be added in the coming months. News coverage can be found here, here, here, and here.

Netflix open-sources Metaflow, its Python framework for building and managing data science projects. Netflix announced on its tech blog that it has open-sourced Metaflow, a framework to “build and manage real-life data science projects with ease”. Metaflow is a simple Python library. Data scientists structure their workflow as a Directed Acyclic Graph (DAG) of steps, each of which can be arbitrary Python code. While many existing frameworks, such as Apache Airflow or Luigi, also execute DAGs of arbitrary Python code, Metaflow is supposed to differentiate itself through many carefully designed details, for instance storing data and models as normal Python instance variables and supporting execution on a distributed compute platform by design. Netflix is one of the largest users of Amazon Web Services (AWS) and has accumulated plenty of operational experience and expertise in dealing with the cloud. For this open-source release, Netflix partnered with AWS to provide a seamless integration between Metaflow and various AWS services, leveraging the elasticity of the cloud by design, both for compute and storage. A critical discussion can be found on Hacker News. Alternative frameworks include MLflow, Dagster, Prefect, and others.
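
To make the "DAG of steps with data as instance variables" idea more concrete, here is a minimal sketch in plain Python. It deliberately does not use the Metaflow API: the class ToyFlow, its dag dictionary and its step names are hypothetical and only illustrate the concept using the standard library (graphlib needs Python 3.9 or later).

```python
from graphlib import TopologicalSorter


class ToyFlow:
    """Hypothetical mini-workflow for illustration; NOT the Metaflow API."""

    # Each step maps to the set of steps that must finish before it runs.
    dag = {
        "start": set(),
        "load": {"start"},
        "train": {"load"},
        "evaluate": {"load", "train"},
    }

    def start(self):
        self.n_rows = 7

    def load(self):
        # Data is kept as an ordinary instance variable.
        self.data = [(x, 2 * x + 1) for x in range(self.n_rows)]

    def train(self):
        # The "model" is a least-squares line, also stored on self.
        n = len(self.data)
        mean_x = sum(x for x, _ in self.data) / n
        mean_y = sum(y for _, y in self.data) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in self.data)
        var = sum((x - mean_x) ** 2 for x, _ in self.data)
        self.slope = cov / var
        self.intercept = mean_y - self.slope * mean_x

    def evaluate(self):
        self.max_error = max(
            abs(self.slope * x + self.intercept - y) for x, y in self.data
        )

    def run(self):
        # Execute the steps in an order that respects the DAG.
        order = list(TopologicalSorter(self.dag).static_order())
        for name in order:
            getattr(self, name)()
        return order
```

Calling ToyFlow().run() executes the four steps in dependency order. A real framework adds what this sketch leaves out, such as persisting the instance variables between steps and running steps on remote compute.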

Probability Distribution Explorer. This tool by Justin Bois makes it easy to explore commonly used probability distributions, including information about the stories behind them, their probability mass/probability density functions, their moments, etc. Each distribution includes interactive vignettes and syntax for NumPy, SciPy, and Stan.
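
As a small taste of what "probability mass function" and "moments" mean in practice, the following sketch computes the mean and variance of a binomial distribution directly from its PMF. It uses only the Python standard library rather than the NumPy, SciPy or Stan syntax the tool documents, and the function names are my own.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_and_variance(pmf, support):
    """First moment and second central moment of a discrete distribution."""
    mean = sum(k * pmf(k) for k in support)
    variance = sum((k - mean) ** 2 * pmf(k) for k in support)
    return mean, variance


n, p = 10, 0.3
mean, variance = mean_and_variance(lambda k: binomial_pmf(k, n, p), range(n + 1))
```

The results can be checked against the closed forms n*p for the mean and n*p*(1-p) for the variance.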

Business News and Applications

‘Short window’ to stop AI taking control of society, warns ex-Google employee. An article by The Irish Times, featuring a presentation by Meredith Whittaker at the Falling Walls conference during Berlin Science Week. As one of the original organisers of the Google Walkout, she has something to say about the vast amount of power that global technology companies like Google hold. For her, working for Google ultimately was incompatible with her role at the AI Now Institute, which she co-founded to examine the social implications of artificial intelligence. She states: “What’s frightening about AI isn’t terminators and super intelligent machines: it’s the way AI works to centralise knowledge and power in the hands of those who already have it and further disempower those who don’t.” As a case in point she mentions image recognition and the ImageNet dataset, claiming that ImageNet was created on the back of questionable data collection practices and cheap human labour. “It consists of over 15 million images scraped from Flickr and the web without consent, then labelled by low-paid Amazon Mechanical Turk workers.” She explains that large technology companies happen to have all the necessary ingredients to push the envelope on AI: they have masses of social data thanks to their vast consumer market reach, as well as powerful infrastructure designed to collect, process and store such data. “In short, the current crop of AI, the AI that is touching our lives and institutions, is a corporate technology. Only five or so companies in the West have the resources to develop this technology at scale, which means that we cannot talk about AI without confronting issues of power.” The article covers many more aspects and examples and is well worth reading.

Publications

Deep Learning For Symbolic Mathematics (arXiv:1912.01412). Two researchers from Facebook AI Research claim they have developed neural nets that can solve complicated integrals and differential equations. Quote: “Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data. In this paper, we show that they can be surprisingly good at more elaborated tasks in mathematics, such as symbolic integration and solving differential equations. We propose a syntax for representing mathematical problems, and methods for generating large datasets that can be used to train sequence-to-sequence models. We achieve results that outperform commercial Computer Algebra Systems such as Matlab or Mathematica.”

Dynamic Pricing On E-commerce Platform With Deep Reinforcement Learning (arXiv:1912.02572v1). A team from Alibaba Supply Chain Platform (Hangzhou) outlines a framework that represents dynamic pricing as a Markov Decision Process (MDP). The agent periodically changes product prices as its action after observing the environment state. Each pricing episode ends when the product is out of stock. The model is pre-trained based on historical sales data and previous specialists’ pricing actions. The Alibaba team uses ‘difference of revenue conversion rates’ as a reward function which is supposed to indicate if pricing actions had a positive impact on revenue conversion. The framework was tested offline and online and showed better performance than manual pricing by operations experts.
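
The MDP framing can be sketched as a toy environment. Everything concrete here is an assumption for illustration and not taken from the paper: the linear demand curve, the numbers, the names ToyPricingEnv and run_episode, and my reading of "revenue conversion rate" as revenue per visitor in a pricing period.

```python
from dataclasses import dataclass


@dataclass
class ToyPricingEnv:
    """Toy pricing MDP; demand model and numbers are invented."""

    stock: int = 20
    visitors: int = 100       # visitors per pricing period (assumed constant)
    prev_rate: float = 0.0    # revenue per visitor in the previous period

    def step(self, price: int):
        demand = max(0, 20 - price)          # hypothetical linear demand curve
        sold = min(self.stock, demand)
        self.stock -= sold
        rate = sold * price / self.visitors  # assumed "revenue conversion rate"
        reward = rate - self.prev_rate       # difference between two periods
        self.prev_rate = rate
        done = self.stock == 0               # episode ends when out of stock
        return (self.stock, rate), reward, done


def run_episode(env, policy, max_steps=50):
    """Let a policy set prices until the product sells out."""
    state, total, steps, done = (env.stock, env.prev_rate), 0.0, 0, False
    while not done and steps < max_steps:
        state, reward, done = env.step(policy(state))
        total += reward
        steps += 1
    return steps, total
```

A fixed-price policy such as lambda state: 10 sells out this toy product in two periods. In the paper's setting, the policy would instead be a learned agent that was pre-trained on historical data.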

Training multi-agent AI systems to solve complex tasks through cooperation (arXiv:1910.08809). Facebook AI has published a novel approach to cooperative multi-agent reinforcement learning, with source code on GitHub, that assigns tasks to individual agents within a group and thereby improves the entire group’s ability to collaborate. The approach focuses on multi-agent collaborative (MAC) problems where agents have to carry out multiple intermediate tasks in order to accomplish a larger one. As the number of agents and tasks in these kinds of MAC problems increases, the complexity grows exponentially, which prevents systems from learning directly from large-scale scenarios. Systems must instead generalize from smaller scenarios and tackle tasks that were not part of their RL-based trial-and-error training runs. Since RL-trained systems often struggle with this exact type of generalization, the approach breaks MAC policies down into high- and low-level policies. The high-level policies determine which agents should be assigned specific tasks. To encourage collaboration between agents, a quadratic cost function is employed that optimizes for long-term performance.
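
To illustrate why a quadratic term matters for the high-level assignment, here is a toy sketch. It is not the authors' implementation: the scores are invented, the function best_assignment is hypothetical, and brute-force enumeration stands in for whatever optimisation the paper actually uses. The linear scores rate each agent-task pair in isolation, while the pairwise (quadratic) scores reward or penalise two agents sharing a task.

```python
from itertools import product


def best_assignment(linear, pairwise):
    """Return the agent-to-task assignment with the highest total score.

    linear[i][t]   : score for giving task t to agent i
    pairwise[i][j] : extra score (or penalty) if agents i and j share a task
    """
    n_agents, n_tasks = len(linear), len(linear[0])
    best, best_score = None, float("-inf")
    # Enumerate every assignment; only feasible for tiny toy problems.
    for assignment in product(range(n_tasks), repeat=n_agents):
        score = sum(linear[i][t] for i, t in enumerate(assignment))
        for i in range(n_agents):
            for j in range(i + 1, n_agents):
                if assignment[i] == assignment[j]:
                    score += pairwise[i][j]
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


linear = [[3, 1], [3, 2], [0, 2]]                # 3 agents, 2 tasks
no_interaction = [[0, 0, 0] for _ in range(3)]
penalties = [[0, -3, -1], [0, 0, -1], [0, 0, 0]]  # discourage redundant work

greedy, greedy_score = best_assignment(linear, no_interaction)
coordinated, coordinated_score = best_assignment(linear, penalties)
```

With linear scores only, agents 0 and 1 both pick task 0. Once the pairwise penalties are included, agent 1 moves to task 1, which is the kind of coordination a purely per-agent score cannot express.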

Tutorials

A Visual Guide to Using BERT for the First Time. Jay Alammar has a great blog under the motto “Visualizing machine learning one concept at a time”. His latest post is a simple tutorial for how to use a variant of BERT to classify sentences. It is an example that is basic enough as a first intro, yet advanced enough to showcase some of the key concepts involved.

Vaex: How to analyse 100 GB of data on your laptop with Python. This article by Jovan Veljanoski shows a fast, secure, and convenient way to do data science using data of almost arbitrary size, with an extended example in a Jupyter notebook. Vaex is an open-source DataFrame library that enables visualisation, exploration, analysis and even machine learning on tabular datasets that are as large as your hard drive. To do this, Vaex employs concepts such as memory mapping, efficient out-of-core algorithms and lazy evaluation. All of this is wrapped in a familiar Pandas-like API, so anyone can get started right away.
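
The three concepts mentioned above can be sketched with the Python standard library alone. This is not the Vaex API and says nothing about how Vaex is implemented internally: LazyColumn, write_column and demo are hypothetical names, and the raw float64 file layout is an assumption made for the example.

```python
import mmap
import os
import tempfile
from array import array

ITEM_SIZE = array("d").itemsize  # bytes per float64 value


def write_column(path, values):
    """Store a column of float64 values as raw bytes on disk."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


class LazyColumn:
    """A column on disk plus a deferred element-wise expression."""

    def __init__(self, path, expr=lambda x: x):
        self.path = path
        self.expr = expr

    def map(self, fn):
        # Lazy evaluation: only compose functions, no data is read here.
        prev = self.expr
        return LazyColumn(self.path, lambda x: fn(prev(x)))

    def mean(self, chunk_rows=1024):
        total, count = 0.0, 0
        step = chunk_rows * ITEM_SIZE
        # Memory mapping: the OS pages the file in on demand.
        with open(self.path, "rb") as f, mmap.mmap(
            f.fileno(), 0, access=mmap.ACCESS_READ
        ) as mm:
            # Out-of-core: only one chunk is materialised at a time.
            for offset in range(0, len(mm), step):
                chunk = array("d")
                chunk.frombytes(mm[offset:offset + step])
                total += sum(self.expr(x) for x in chunk)
                count += len(chunk)
        return total / count


def demo():
    """Write 10,000 values to a temp file and average 2*x + 1 over them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "x.bin")
        write_column(path, range(10_000))
        column = LazyColumn(path).map(lambda x: 2 * x).map(lambda x: x + 1)
        return column.mean()
```

The two map calls cost nothing until mean is called, and the peak memory use depends on chunk_rows rather than on the file size.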

Forecasting in Python with Facebook Prophet. In this extensive blog post, Greg Rafferty shows how to tune and optimize Prophet using domain knowledge to add greater control to forecasts. Quote: “In this post, I’ll explain how to forecast using Facebook’s Prophet and demonstrate a few advanced techniques for handling trend inconsistencies by using domain knowledge.”
