2020 DS/ML digest 09

Posted by snakers41 on June 15, 2020

Misc

Ben Evans - https://mailchi.mp/76b5df661c91/benedicts-newsletter-no-451198?e=b7fff6bc1c
What happens after Zoom - https://www.ben-evans.com/benedictevans/2020/6/22/zoom-and-the-next-video
Ben Evans - https://mailchi.mp/bad1c520af3b/benedicts-newsletter-no-451186?e=b7fff6bc1c
It is always funny when Mac’s captive audience discovers … that they have a Unix shell and core programs … and decides not to use them - https://tonsky.me/blog/syncthing/
A nice, more in-depth look at the history of print media - https://www.ben-evans.com/benedictevans/2020/6/14/75-years-of-us-advertising
Just an awesome blog - blog.cerebralab.com

NLP

This is just silly: a 600B-parameter NMT model by Google - https://arxiv.org/pdf/2006.16668.pdf

SmartReply for YouTube Creators https://ai.googleblog.com/2020/07/smartreply-for-youtube-creators.html

  • Old approach (Gmail Smart Reply): encoded input emails word by word with a recurrent neural network, then decoded potential replies with another word-level recurrent neural network
  • Current model: searches through a predefined list of suggestions for the most appropriate response
  • Current approach in detail:
    • Encode the text without any preprocessing
    • Pre-compute the embeddings of all suggestions in the predefined list
    • Feed the text to the model as a sequence of characters or bytes
    • Shrink the sequence length by applying temporal reduction at each layer of the network, which gives a good trade-off between computation and quality
    • A dual encoder network
    • A contrastive objective
    • A single cross-lingual model for all supported languages
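The Google post does not publish code, so what follows is only a toy sketch in plain Python of how the listed pieces fit together, under loudly stated assumptions. The byte embeddings are random and untrained. A `tanh` stands in for a real network layer. "Temporal reduction" is modeled as averaging adjacent pairs of vectors. The contrastive objective is modeled as an in-batch softmax loss, where each input's own reply is the positive and every other reply in the batch is a negative. All names (`DualEncoder`, `temporal_reduce`, `suggest`, `contrastive_loss`, `DIM`) are hypothetical and are not the production model or any real API.

```python
import math
import random
from typing import List, Sequence

DIM = 16  # embedding size: a hypothetical choice for this toy sketch


def make_byte_table(seed: int) -> List[List[float]]:
    """Random (untrained) embedding for each of the 256 possible byte values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(256)]


def temporal_reduce(seq: List[List[float]]) -> List[List[float]]:
    """Halve the sequence length by averaging adjacent pairs of vectors."""
    if len(seq) <= 1:
        return seq
    out = [[(a + b) / 2 for a, b in zip(seq[i], seq[i + 1])]
           for i in range(0, len(seq) - 1, 2)]
    if len(seq) % 2:  # keep a trailing odd element unchanged
        out.append(seq[-1])
    return out


def encode(text: str, table: List[List[float]], num_layers: int = 3) -> List[float]:
    # No tokenizer or preprocessing: raw UTF-8 bytes are the input units,
    # so any language (or emoji) maps onto the same 256-entry table.
    seq = [table[b] for b in text.encode("utf-8")]
    if not seq:
        return [0.0] * DIM
    for _ in range(num_layers):
        seq = [[math.tanh(x) for x in vec] for vec in seq]  # stand-in for a layer
        seq = temporal_reduce(seq)  # shorter sequence => less compute per layer
    pooled = [sum(col) / len(seq) for col in zip(*seq)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]  # L2-normalized embedding


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class DualEncoder:
    """Two separate towers: one for the incoming comment, one for candidate replies."""

    def __init__(self) -> None:
        self.input_table = make_byte_table(seed=0)
        self.reply_table = make_byte_table(seed=1)

    def encode_input(self, text: str) -> List[float]:
        return encode(text, self.input_table)

    def encode_reply(self, text: str) -> List[float]:
        return encode(text, self.reply_table)


def build_index(model: DualEncoder, suggestions: List[str]) -> List[List[float]]:
    # The suggestion list is fixed, so its embeddings are computed once, offline.
    return [model.encode_reply(s) for s in suggestions]


def suggest(model: DualEncoder, comment: str, suggestions: List[str],
            index: List[List[float]], k: int = 3) -> List[str]:
    # At serving time only the comment is encoded; replies are just scored.
    q = model.encode_input(comment)
    ranked = sorted(zip(suggestions, index), key=lambda p: dot(q, p[1]), reverse=True)
    return [s for s, _ in ranked[:k]]


def contrastive_loss(inputs_emb: List[List[float]], replies_emb: List[List[float]],
                     temperature: float = 0.1) -> float:
    """In-batch softmax cross-entropy: reply i is the positive for input i."""
    total = 0.0
    for i, q in enumerate(inputs_emb):
        logits = [dot(q, r) / temperature for r in replies_emb]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(inputs_emb)
```

Usage: build the index once with `build_index(model, suggestions)`, then call `suggest(model, "Great video!", suggestions, index)` per comment. The point of the dual encoder design is that cost at serving time is one encoder pass plus a dot product per suggestion, rather than generating a reply token by token as the old recurrent decoder did.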

Speech

Pre-training and transfer learning for speech tasks:

An investigation of phone-based subword units for end-to-end speech recognition http://arxiv.org/abs/2004.04290

Speech separation of 5 speakers talking at the same time, by FAIR

ML

A review of what works in tracking / re-id tasks - https://habr.com/ru/company/recognitor/blog/505694/
When can you assume a neural network can solve a given problem? - https://blog.cerebralab.com/When_to_assume_neural_networks_can_solve_a_problem
Sensing Force-Based Gestures on the Pixel 4 https://ai.googleblog.com/2020/06/sensing-force-based-gestures-on-pixel-4.html
New Python 3.9 features - https://martinheinz.dev/blog/21
GPU acceleration for WSL 2 - https://devblogs.nvidia.com/announcing-cuda-on-windows-subsystem-for-linux-2/ - tectonic plates are shifting
GitHub Super-Linter - https://github.blog/2020-06-18-introducing-github-super-linter-one-linter-to-rule-them-all/
Roof damage via ML on maps - https://ai.googleblog.com/2020/06/machine-learning-based-damage.html
The Tesla LIDAR fallacy - https://www.forbes.com/sites/bradtempleton/2020/04/14/if-teslas-dream-of-making-cameras-perform-as-well-as-lidar-comes-true-it-may-help-teslas-competitors-more/#2f2b83c658e8
RepNet: Counting Repetitions in Videos - https://ai.googleblog.com/2020/06/repnet-counting-repetitions-in-videos.html
What's new in Core ML - https://machinethink.net/blog/new-in-apple-machine-learning-2020/
SpineNet: A Novel Architecture for Object Detection Discovered with Neural Architecture Search - https://ai.googleblog.com/2020/06/spinenet-novel-architecture-for-object.html
The machine learning community has a toxicity problem - https://www.reddit.com/r/MachineLearning/comments/hiv3vf/d_the_machine_learning_community_has_a_toxicity/

Code

GitHub Super-Linter README - https://github.com/github/super-linter/blob/master/README.md
gRPC for Python guide - https://martinheinz.dev/blog/23
Speed of Python async libraries - http://calpaterson.com/async-python-is-not-faster.html
https://amir.rachum.com/blog/2020/06/25/cheat-sheet/
https://codewithoutrules.com/2020/06/25/dev-environment/

Datasets

Russian paraphrasing dataset http://paraphraser.ru/download/