AI weekly (01/2020)

My selection of news on AI/ML and Data Science

+++ An artificial intelligence predicts the future +++ Google’s AI breast cancer screening tool is learning to generalize across countries +++ The Decade of Deep Learning +++ Using Machine Learning to “Nowcast” Precipitation in High Resolution +++ A Decade in Review in Tech +++ Catalog of Variable Transformations To Make Your Model Work Better +++

Breakthrough — Or So They Say

An artificial intelligence predicts the future. “The World in 2020” is the 34th edition of The Economist’s future-focused franchise publication providing views on the coming year. For the first time, one of the contributions was written by an AI called GPT-2, created by OpenAI. GPT-2 is an unsupervised language model trained on 40 gigabytes of text from the internet. Given a prompt, such as a question or the first sentence of a story, it generates what might plausibly come next. Here is one sample answer: Q: “Which technologies are worth watching in 2020?” A: “I would say it is hard to narrow down the list. The world is full of disruptive technologies with real and potentially huge global impacts. The most important is artificial intelligence, which is becoming exponentially more powerful.” And one interesting prediction: Q: “Do you think Donald Trump will win a second term?” A: “I think he will not win a second term. I think he will be defeated in the general election.” More background on the interview can be found in a blog post on Medium.

Tools and Frameworks

Business News and Applications

Google’s AI breast cancer screening tool is learning to generalize across countries. DeepMind and Google Health have developed a new AI system to help doctors detect breast cancer early. The researchers trained an algorithm on mammogram images from female patients in the US and UK, and it performed better than human radiologists. The results were published in Nature (behind a paywall, but see also their blog post). ZDNet provides a critical assessment of the publication. Details on the implementation can be found in the supplementary information to the paper.

Publications

The Decade of Deep Learning. Driven by the development of ever more powerful compute and the increased availability of big data, deep learning has seen monumental progress over the last decade and successfully tackled many previously intractable problems, especially in Computer Vision and Natural Language Processing. This post is an overview of some of the most influential Deep Learning papers of the last decade. It should provide a jumping-off point into many disparate areas of Deep Learning by providing succinct and dense summaries that go slightly deeper than a surface-level exposition, with many references to the relevant resources.

Using Machine Learning to “Nowcast” Precipitation in High Resolution. Accurate weather predictions can be particularly challenging for localized storms or events that evolve on hourly timescales, such as thunderstorms. In the preprint arXiv:1912.12132, Google AI presents new research on machine learning models for precipitation forecasting that addresses this challenge by making highly localized “physics-free” predictions for the immediate future. A significant advantage of machine learning is that inference is computationally cheap given an already-trained model, allowing forecasts that are nearly instantaneous and in the native high resolution of the input data. This precipitation nowcasting focuses on 0-6 hour forecasts and can generate them at 1 km resolution with a total latency of just 5-10 minutes, including data collection delays. Even at this early stage of development, it outperforms traditional models.

A Decade in Review in Tech. An interesting post on Medium that reviews (and previews) developments in the software engineering space, covering trends like containerization, CI/CD, streaming, and lots more.

Tutorials

Catalog of Variable Transformations To Make Your Model Work Better. This post on Towards Data Science presents 10+ numeric variable transformation techniques and 7 categorical variable techniques, ranging from basic to advanced. Though some are known to work better than others (e.g. target encoding has a better track record than other encoding techniques), no single approach works well in every model with any data. It is therefore important to research which encoding works best in which case and, when you face a new problem, to try as many of the approaches introduced in the post as you need until you find the one that works best.
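To make the target encoding idea concrete, here is a minimal sketch in plain Python. It is not taken from the linked post: the function names, the `smoothing` parameter, and the toy city/click data are all assumptions chosen for illustration. Each category is replaced by the mean of the target observed for that category, optionally pulled towards the global mean so that rare categories do not get extreme values.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=0.0):
    """Learn a category -> encoded value mapping (hypothetical helper).

    Each category maps to (sum_of_targets + smoothing * global_mean)
    / (count + smoothing). With smoothing=0 this is the plain
    per-category mean of the target.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not categories:
        raise ValueError("need at least one training row")

    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_target_encoding(categories, encoding, global_mean):
    """Encode categories; unseen categories fall back to the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


# Toy data (made up): city of a visitor and whether they clicked.
cities = ["berlin", "berlin", "paris", "paris", "paris", "rome"]
clicked = [1, 0, 1, 1, 0, 1]

encoding, prior = fit_target_encoding(cities, clicked)
encoded = apply_target_encoding(["berlin", "rome", "oslo"], encoding, prior)

# Same data with smoothing: the single "rome" row is pulled towards the prior.
smoothed, _ = fit_target_encoding(cities, clicked, smoothing=2.0)
```

Because the encoding is computed from the target itself, it should be fitted on training data only (or per cross-validation fold) and then applied to validation and test rows. Otherwise the model sees leaked label information.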
