2020 DS/ML digest 08

Posted by snakers41 on June 15, 2020

Misc

Ben Evans - https://mailchi.mp/edab3d4e1df8/benedicts-newsletter-no-451162?e=b7fff6bc1c

  • A whole bunch of SV companies (FB, Slack, Box etc etc) announced they’ll let people shift entirely to remote work once the lockdown is over


How to think about old and new tech - https://www.jujens.eu/posts/en/2020/May/31/javascript-fatigue/
Ultimate Guide to Python Debugging - https://martinheinz.dev/blog/24
If you still have any illusions about Yandex - https://habr.com/ru/post/505240/
Netflix content amortization - https://www.behindthebalancesheet.com/blog-1/netflix-cooked
Online live events are similar to early e-commerce - https://www.ben-evans.com/benedictevans/2020/6/4/solving-online-events
Python CLI formatting - https://github.com/willmcgugan/rich


Speech

Not sure why this exists - https://voxclamantisproject.github.io/data.html


NLP

This is retarded on so many levels - https://openai.com/blog/openai-api/

They … finally guessed that training 9000 huge models till convergence is stupid - https://huggingface.co/calculator/

  • Log of model size is linearly related to log of best achievable performance (i.e. a power-law relationship)
  • A model of any given size squeezes out ~90% of its final performance during the first 15-25% of epochs
  • If you multiply your model size by ca. 10x, your compute cost also grows by ca. 10x

Linformer: Self-Attention with Linear Complexity

  • http://arxiv.org/abs/2006.04768
  • looks like a decent solution to apply transformer layers to very long sequences
  • looks simple enough
  • works with fixed sequence lengths (or you can just have several projection layers)
  • i.e. it does not make sense to project your length down unless k is much smaller than n, e.g. 128 vs 512
  • looks like it is most applicable to text, because you cannot down-scale text, unlike sound / images
  • best gains are achieved on long sequences (a minimal sketch of the idea follows this list)
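A minimal single-head sketch of the Linformer idea (not the paper's full multi-head setup): keys and values are projected along the sequence dimension from n down to k with learned fixed-length projections, so the attention map is (n x k) instead of (n x n). Dimensions below are illustrative.

```python
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    """Single-head sketch: project K and V from length n down to length k."""
    def __init__(self, dim, seq_len, k=128):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # fixed-length projections - this is why the layer is tied to seq_len
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / k)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / k)

    def forward(self, x):                                     # x: (batch, n, dim)
        q, key, val = self.to_q(x), self.to_k(x), self.to_v(x)
        key = torch.einsum('bnd,nk->bkd', key, self.proj_k)   # (batch, k, dim)
        val = torch.einsum('bnd,nk->bkd', val, self.proj_v)   # (batch, k, dim)
        attn = torch.softmax(q @ key.transpose(-2, -1) * self.scale, dim=-1)  # (batch, n, k)
        return attn @ val                                      # (batch, n, dim)

x = torch.randn(2, 512, 64)
print(LinformerSelfAttention(dim=64, seq_len=512, k=128)(x).shape)  # (2, 512, 64)
```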

Google translate - recent improvements

  • https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html
  • Transformer encoder and an RNN decoder
  • Most of these quality gains were from the transformer encoder
  • The RNN decoder is much faster at inference time; they applied a variety of optimizations to it before coupling it with the transformer encoder
  • Training data consists of examples of translated sentences and documents, typically collected from the public web
  • Noisy data is handled as a curriculum learning problem: the models start out training on all data, and then gradually train on smaller and cleaner subsets (a rough sketch of the idea follows this list)
  • Advances That Benefited Low-Resource Languages in Particular:
    • Synthetic parallel data, where the sentences in one language are written by a human, but their translations have been generated by a neural translation model
    • M4, which uses a single, giant model to translate between all languages and English
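A rough sketch of that curriculum idea, assuming a per-example quality score is available; all names and thresholds here are illustrative, not Google's actual pipeline.

```python
# Train on everything first, then progressively restrict training to cleaner subsets.
def curriculum_stages(examples, thresholds=(0.0, 0.5, 0.8)):
    for t in thresholds:
        yield t, [ex for ex in examples if ex["quality"] >= t]

examples = [{"src": "...", "tgt": "...", "quality": q} for q in (0.1, 0.6, 0.9)]
for threshold, subset in curriculum_stages(examples):
    print(f"quality >= {threshold}: {len(subset)} examples")
    # model.fit(subset)  # hypothetical training call for this stage
```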

How to generate text: using different decoding methods for language generation with Transformers
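The post compares greedy decoding, beam search and top-k / nucleus sampling; a minimal sketch of those calls with the transformers generate() API (model name and prompt are arbitrary examples):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("The dataset consists of", return_tensors="pt")

greedy = model.generate(input_ids, max_length=40)                                   # greedy decoding
beam = model.generate(input_ids, max_length=40, num_beams=5, early_stopping=True)   # beam search
sampled = model.generate(input_ids, max_length=40, do_sample=True, top_k=50, top_p=0.95)  # top-k / nucleus sampling

for out in (greedy, beam, sampled):
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```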

A state-of-the-art open source chatbot - https://ai.facebook.com/blog/state-of-the-art-open-source-chatbot/:

  • 9.4 billion parameters
  • Pretrained large (up to 9.4 billion parameters) Transformer neural networks on large amounts of conversational data
  • Trained on previously available public domain conversations: 1.5 billion training examples of extracted conversations
  • Too large to fit on a single device
  • The 2.7B-parameter model can be interacted with on a 16 GB P100 GPU or better; the 9.4B-parameter model requires at least two 32 GB V100 GPUs for interaction
  • Someone made a demo - https://colab.research.google.com/drive/1JxuWRZCV0C7bfCR6gvrju8noagIUj0oi?usp=sharing#scrollTo=mLvt19JdEaHA

When Does Unsupervised Machine Translation Work?

  • https://arxiv.org/pdf/2004.05516.pdf
  • Performance rapidly deteriorates when source and target corpora are from different domains
  • Unsupervised MT performance declines when source and target languages use different scripts
  • Very poor performance on authentic low-resource language pairs

Sparse softmax

Very compact BERT

Extracting data from templatic documents - https://ai.googleblog.com/2020/06/extracting-structured-data-from.html
Abstractive summarization with transformers and pre-training, should work on low-resource - https://ai.googleblog.com/2020/06/pegasus-state-of-art-model-for.html


ML

Yet another hilarious AI bullshit post from F. Piekniewski - https://blog.piekniewski.info/2020/06/08/ai-the-no-bullshit-approach/
PyTorch vs TensorFlow in production - https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2
Stack Overflow survey 2020 - https://insights.stackoverflow.com/survey/2020#technology-most-loved-dreaded-and-wanted-languages-loved
Do not become a data scientist - https://towardsdatascience.com/dont-become-a-data-scientist-ee4769899025
Looks like a proper simple solution to PyTorch DataLoader “memory leak” issue - https://github.com/pytorch/pytorch/issues/13246#issuecomment-612396143
Article review from ODS - https://habr.com/ru/company/ods/blog/505040/
Properly working image classifier visualization technique? - https://github.com/jacobgil/pytorch-grad-cam (a generic sketch of the Grad-CAM idea below)
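A generic sketch of the Grad-CAM idea in plain PyTorch (this is not the linked repo's API): hook the last conv block, weight its activations by the spatially pooled gradients of the top class score, then ReLU and upsample. Model and input are placeholders.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(pretrained=True).eval()
feats, grads = {}, {}

def fwd_hook(module, inputs, output):
    feats['a'] = output
    output.register_hook(lambda g: grads.update(a=g))  # gradient w.r.t. these activations

model.layer4.register_forward_hook(fwd_hook)             # last conv block of ResNet-18

x = torch.randn(1, 3, 224, 224)                          # stand-in for a normalized image
score = model(x)[0].max()                                # score of the top predicted class
model.zero_grad()
score.backward()

weights = grads['a'].mean(dim=(2, 3), keepdim=True)            # pooled gradients per channel
cam = F.relu((weights * feats['a']).sum(dim=1, keepdim=True))  # weighted activation map
cam = F.interpolate(cam, size=x.shape[-2:], mode='bilinear', align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # normalize to [0, 1] for overlay
print(cam.shape)  # torch.Size([1, 1, 224, 224])
```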

A hilarious paper debunking academic BS in the field of metric learning

A cool object detection alternative

What is achievable in fashion images with GANs

Some review of self-supervised image techniques - https://dyakonov.org/2020/06/03/самообучение-self-supervision/:

  • To be honest, I believe more in weakly-supervised approaches
  • I.e. train, annotate, filter, re-annotate, re-train with augmentations, etc.

Yet another large-scale NAS paper by FAIR et al - http://arxiv.org/abs/2006.02049

  • TLDR - now they also search for training recipes as well as architectures via NAS
  • A paper from a series of papers claiming to be “the most efficient”
  • Very complicated; does not transfer to real life, except maybe for the models themselves
  • “We use distributed training with 8 nodes for the final models”, let me guess, each node has 8 GPUs?
  • Stochastic weight averaging via EMA yields a significant accuracy gain for classification tasks (a minimal EMA sketch follows this list)
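A minimal sketch of EMA weight averaging in a plain PyTorch training loop; the decay value and the exact place where the update is called are illustrative, not the paper's recipe.

```python
import copy
import torch

def update_ema(ema_model, model, decay=0.999):
    # Exponential moving average of parameters, updated after each optimizer step.
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1 - decay)

model = torch.nn.Linear(10, 2)
ema_model = copy.deepcopy(model)   # frozen shadow copy used for evaluation / export

# ... inside the training loop, right after optimizer.step():
update_ema(ema_model, model)
```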


Datasets

A small collection of sentiment datasets in English - https://github.com/NVIDIA/sentiment-discovery#data-downloads - also includes some older pre-trained sentiment classifiers and transformers
NLP dataset viewer - https://huggingface.co/nlp/viewer/