AI weekly (16/2020)

My selection of news on AI/ML and Data Science

+++ OpenAI Open Sources Microscope and the Lucid Library to Visualize Neurons in Deep Neural Networks +++ What you need to know about product management for AI +++ Forecasting Best Practices, from Microsoft +++

Breakthrough — Or So They Say

Tools and Frameworks

OpenAI Open Sources Microscope and the Lucid Library to Visualize Neurons in Deep Neural Networks. The abilities of modern neural networks are the result of the interactions of thousands of neurons (sometimes tens of thousands or more!). To understand their behavior, we’d like to be able to quickly and easily investigate these neurons’ interactions in detail, and share those observations. OpenAI has introduced the OpenAI Microscope, a collection of visualizations of common deep neural networks, to make them easier to interpret. Microscope makes it easier to analyze the features that form inside these networks as well as the connections between their neurons. Using OpenAI Microscope, you can select a sample dataset and visualize the core architecture of a neural network, e.g. AlexNet, alongside the state of the image classification process at each layer. The visualizations in Microscope are based on the open-source Lucid library. More background on the idea behind these visualizations can be found in a recent article on Distill.
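To give a flavor of what Lucid does, here is a minimal feature-visualization sketch along the lines of the Lucid quickstart; the model and the layer/channel name (InceptionV1, mixed4a_pre_relu:476) are just the tutorial’s example choices, not anything specific to Microscope.

```python
# Minimal Lucid feature-visualization sketch (assumes TensorFlow 1.x and
# `pip install lucid`); model and layer follow the Lucid quickstart example.
import lucid.modelzoo.vision_models as models
from lucid.optvis import render

# Load a pre-trained InceptionV1 (GoogLeNet) from the Lucid model zoo.
model = models.InceptionV1()
model.load_graphdef()

# Optimize an input image that maximally activates one channel of one layer;
# this is the kind of neuron-level visualization Microscope collects.
images = render.render_vis(model, "mixed4a_pre_relu:476")
```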

Business News and Applications

Publications

What you need to know about product management for AI. O’Reilly has published a long article by Peter Skomoroch and Mike Loukides on the typical challenges of being an AI product manager, and why it differs from being a traditional (software) PM. Here are a few highlights from their article:

1. AI systems differ from traditional software in many ways, but the biggest difference is that machine learning shifts engineering from a deterministic process to a probabilistic one. With machine learning, we often get a system that is statistically more accurate than simpler techniques, but with the tradeoff that some small percentage of model predictions will always be incorrect, sometimes in ways that are hard to understand.
2. This shift requires a fundamental change in your software engineering practice and has serious implications for software testing, versioning, deployment, and other core development processes. The model is produced by code, but it isn’t code; it’s an artifact of the code and the training data. That data is never as stable as we’d like to think. As your user base grows, the demographics and behavior of the user population in production shift away from your initial training data, which was based on early adopters. Models also become stale and outdated over time.
3. Planning and estimation are difficult for AI products because it is rare to find two real-world systems where the training data and algorithms applied are the same. There are strategies for dealing with all of this uncertainty, starting with the proverb from the early days of Agile: “do the simplest thing that could possibly work.” You don’t always need to start with a complex neural network; a simple regression (or even simpler, an average) might be enough to get your project off the ground, as sketched after this list. In some cases, that simple model may be all you ever need.
4. Particularly at a company that’s new to AI, part of an AI product manager’s job is helping the organization build the culture it needs to succeed with AI. Because it’s so different from traditional software development, where the risks are more or less well-known and predictable, AI rewards people and companies that are willing to take intelligent risks, and that have (or can develop) an experimental culture. You need to understand that many solutions can’t be bought (yet), that AI products require collaboration between teams, that data silos stand in the way of success, and that the best remedy for failure is picking yourself up and trying again.
5. You are unlikely to succeed at AI if you haven’t laid a proper foundation for it. That foundation means that you have already shifted the culture and data infrastructure of your company. In The AI Hierarchy of Needs, Monica Rogati argues that you can build an AI capability only after you’ve built a solid data infrastructure, including data collection, data storage, data pipelines, data preparation, and traditional analytics. If you can’t walk, you’re unlikely to run.
6. Effective product managers for AI know the difference between easy, hard, and impossible problems. For AI products, one important part of being “feasible” is being precisely defined. As Jeremy Jordan says, “A problem well-defined is half solved.”
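As a purely illustrative sketch of point (3), the snippet below compares a trivial mean-value baseline against a plain linear regression with scikit-learn; the data and feature setup are made up for the example and are not from the article.

```python
# Hypothetical illustration of "do the simplest thing that could possibly work":
# start from a trivial mean baseline and only move to a learned model if it pays off.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))              # made-up features
y = 2.0 * X[:, 0] + rng.normal(size=500)   # made-up target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)   # "an average"
model = LinearRegression().fit(X_train, y_train)                   # "a simple regression"

print("baseline MAE:  ", mean_absolute_error(y_test, baseline.predict(X_test)))
print("regression MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```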

Tutorials

Forecasting Best Practices, from Microsoft. Microsoft has released a GitHub repository to share best practices for time series forecasting. From the repo: “This repository provides examples and best practice guidelines for building forecasting solutions. The goal of this repository is to build a comprehensive set of tools and examples that leverage recent advances in forecasting algorithms to build solutions and operationalize them. Rather than creating implementations from scratch, we draw from existing state-of-the-art libraries and build additional utilities around processing and featurizing the data, optimizing and evaluating models, and scaling up to the cloud. The repository includes detailed examples of various time series modeling techniques, as Jupyter Notebooks for Python, and R Markdown documents for R. […] Currently, we focus on a retail sales forecasting use case as it is widely used in assortment planning, inventory optimization, and price optimization. To enable high-throughput forecasting scenarios, we have included examples for forecasting multiple time series with distributed training techniques such as Ray in Python, parallel package in R, and multi-threading in LightGBM.” Featured models currently include Auto ARIMA, Linear Regression, LightGBM, DilatedCNN, Mean Forecast, ARIMA, ETS, and Prophet.
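The repository’s own notebooks are the place to go for the full workflow; purely as an illustration of one of the featured model classes, here is a minimal Auto ARIMA sketch using pmdarima (my library choice for the example, not necessarily the one the repo uses) on a made-up monthly series standing in for retail sales data.

```python
# Minimal Auto ARIMA sketch (assumes `pip install pmdarima`); the toy monthly
# series below is made up and only stands in for real retail sales data.
import numpy as np
import pmdarima as pm

# Made-up monthly series: trend + yearly seasonality + noise, 10 years long.
rng = np.random.default_rng(42)
t = np.arange(120)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=3, size=t.size)

train, test = y[:-12], y[-12:]

# Let auto_arima search for a reasonable seasonal ARIMA order (monthly data, m=12).
model = pm.auto_arima(train, seasonal=True, m=12, suppress_warnings=True)
forecast = model.predict(n_periods=12)

print(model.summary())
print("MAE over the held-out year:", np.abs(forecast - test).mean())
```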

See also