2020 DS/ML digest 13

Posted by snakers41 on November 26, 2020


Improving On-Device Speech Recognition with VoiceFilter-Lite - https://ai.googleblog.com/2020/11/improving-on-device-speech-recognition.html

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

  • Link http://arxiv.org/abs/2010.10504
  • TLDR - looks like your typical huge compute overfitting exercise
    • Unlabeled audio of the Libri-Light dataset
    • Noisy student training with SpecAugment, using giant Conformer models pre-trained with wav2vec 2.0
    • Giant Conformer models
    • Pre-training: We use wav2vec 2.0 pre-training
    • Iterative Self-Training: We use noisy student training (NST) with adaptive SpecAugment
  • Architecture:
    • Sequence transducer consisting of an LSTM decoder and a Conformer encoder
    • Get rid of relative positional embedding
    • Conformer XL and Conformer XXL, which have 600M and 1B parameters
    • Conformer L has a single layer LSTM as its decoder, while XL and XXL have two-layer LSTM decoders
  • Compute
    • Pre-training on 256/512 Google TPU v3 cores for 3-4 days for the XL/XXL models, respectively
    • Fine-tuning the pre-trained checkpoints (400k steps) with global batch size 1024/512 on 256/512 Google TPU v3 cores for 1-3 days for the XL/XXL models
  • Pre-training Ideas
    • log-mel spectrograms instead of raw wavs
  • LM and Fusion
    • Eight-layer 103M-parameter transformer language model
    • On the LibriSpeech language model corpus
    • 1024-token word-piece-model (WPM)
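
A toy sketch of the noisy student loop described above, not the paper's code. Everything here is an assumption for illustration: a 1-D threshold classifier stands in for the Conformer, Gaussian input noise stands in for SpecAugment, and the names `fit_threshold`, `predict` and `noisy_student` are made up. The idea it shows: the teacher labels clean unlabeled data, the student trains on labeled data plus noised pseudo-labeled data, then the student becomes the next teacher.

```python
import random


def fit_threshold(xs, ys):
    """Fit a 1-D threshold classifier: the midpoint between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, x):
    """Label is 1 above the threshold, 0 otherwise."""
    return 1 if x > threshold else 0


def noisy_student(labeled_x, labeled_y, unlabeled_x,
                  generations=3, noise=0.1, seed=0):
    """Iterative self-training; returns the final model (a threshold)."""
    rng = random.Random(seed)
    # Generation 0: the teacher is trained on labeled data only.
    teacher = fit_threshold(labeled_x, labeled_y)
    for _ in range(generations):
        # The teacher pseudo-labels the clean unlabeled inputs.
        pseudo_y = [predict(teacher, x) for x in unlabeled_x]
        # The student sees noised inputs (toy stand-in for SpecAugment).
        noisy_x = [x + rng.gauss(0.0, noise) for x in unlabeled_x]
        student = fit_threshold(labeled_x + noisy_x, labeled_y + pseudo_y)
        # The student becomes the teacher for the next generation.
        teacher = student
    return teacher


if __name__ == "__main__":
    model = noisy_student([0.0, 1.0], [0, 1], [0.1, 0.2, 0.8, 0.9])
    print("final threshold:", model)
```

In the paper the expensive part is that every generation retrains a giant model, which is where the TPU-days above come from.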

Navigating Recorder Transcripts Easily, with Smart Scrolling - https://ai.googleblog.com/2020/11/navigating-recorder-transcripts-easily.html


Silero models now has a Ukrainian model - https://t.me/snakers4/2582
Text augs (RU) - https://dyakonov.org/2020/11/09/text-augmentation/
CV inference 101 - https://habr.com/ru/company/recognitor/blog/524980/
Associative memories (AM) are pattern storage and retrieval systems inspired by the psychological concept of the same name - https://thegradient.pub/dont-forget-about-associative-memories/
The Narrated Transformer - https://www.youtube.com/watch?v=-QH8fRhqFHM
Google’s URL2Video - https://ai.googleblog.com/2020/10/experimenting-with-automatic-video.html
Developing Real-Time, Automatic Sign Language Detection for Video Conferencing - https://ai.googleblog.com/2020/10/developing-real-time-automatic-sign.html
First decent AV analysis article - https://www.eetimes.com/is-av-software-driver-detecting-what-we-are-seeing/
What is Vokenization - https://www.technologyreview.com/2020/11/06/1011726/ai-natural-language-processing-computer-vision/
Life and Death of a Linux Process - https://natanyellin.com/posts/life-and-death-of-a-linux-process/
YouTube DL reinstated - https://github.blog/2020-11-16-standing-up-for-developers-youtube-dl-is-back/
Using GANs to Create Fantastical Creatures - https://ai.googleblog.com/2020/11/using-gans-to-create-fantastical.html

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

  • https://github.com/facebookresearch/pifuhd
  • Looks awesome
  • End-to-end multi-level framework that infers 3D geometry of clothed humans at 1k image resolution in a pixel-aligned manner
  • No explicit geometric representation is enforced in the coarse levels
  • Pixel-Aligned Implicit Function (PIFu) representation
  • PIFu does not explicitly discretize the output space representation but instead regresses a function which determines the occupancy for any given 3D location
  • Without having a discretized representation of the entire output volume in memory simultaneously
  • Large scale dataset synthetically generated by rendering hundreds of high quality scanned 3D human mesh models
  • The input size and image feature resolution of PIFu are limited to 512×512 and 128×128, respectively, due to hardware memory limitations
  • Network should be designed such that its receptive field covers the entire image
  • No info about compute
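
A minimal sketch of what "pixel-aligned implicit function" means, under heavy assumptions: this is not the PIFuHD code. The feature map is a plain nested list, sampling is nearest-neighbour rather than bilinear, the projection is orthographic, and `toy_classifier` is a hypothetical stand-in for the learned MLP. The point is that occupancy is queried per 3D point, so no voxel grid of the whole volume is ever held in memory.

```python
def pixel_aligned_occupancy(feature_map, point, classifier):
    """Query occupancy at one 3D point.

    feature_map: 2-D list of per-pixel features (rows x cols).
    point: (x, y, z) with x and y in [-1, 1] image coordinates.
    classifier: function (feature, depth) -> occupancy in [0, 1].
    """
    x, y, z = point
    rows, cols = len(feature_map), len(feature_map[0])
    # Orthographic projection: the 3D point lands on pixel (row, col).
    col = min(cols - 1, max(0, int((x + 1) / 2 * cols)))
    row = min(rows - 1, max(0, int((y + 1) / 2 * rows)))
    # The feature at that pixel plus the depth decides the occupancy.
    return classifier(feature_map[row][col], z)


def toy_classifier(feature, depth):
    """Hypothetical stand-in: the feature is read as a half-thickness."""
    return 1.0 if abs(depth) < feature else 0.0


if __name__ == "__main__":
    features = [[0.0, 0.5],
                [0.5, 0.0]]
    for p in [(0.5, -0.5, 0.2), (-0.5, -0.5, 0.2)]:
        print(p, pixel_aligned_occupancy(features, p, toy_classifier))
```

A mesh is then extracted by evaluating this function on as many points as needed, which is why memory scales with the image features rather than with the output volume.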

Background Features in Google Meet - https://ai.googleblog.com/2020/10/background-features-in-google-meet.html

Audiovisual Speech Enhancement in YouTube Stories - https://ai.googleblog.com/2020/10/audiovisual-speech-enhancement-in.html

  • MobileNet (v2) architecture, 6MB
  • output_audio = 0.1 x original_audio + 0.9 x speech
  • Uses lip movement
  • Acts as a denoiser
  • 1+ year to develop
  • QRNN
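
The mixing rule from the notes above, written out as a small sketch. The 0.1 / 0.9 weights come from the post; the function name `mix_enhanced` and the plain-list sample format are assumptions for illustration, not the production code.

```python
def mix_enhanced(original_audio, speech,
                 original_weight=0.1, speech_weight=0.9):
    """Blend the original track with the separated speech, sample by sample."""
    if len(original_audio) != len(speech):
        raise ValueError("both signals must have the same number of samples")
    return [original_weight * o + speech_weight * s
            for o, s in zip(original_audio, speech)]


if __name__ == "__main__":
    original = [0.5, -0.5, 0.25]   # noisy input samples
    speech = [0.4, -0.6, 0.0]      # model output: speech only
    print(mix_enhanced(original, speech))
```

Keeping a small share of the original signal leaves a bit of the ambient scene in, instead of outputting only the separated speech.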

Interpretability in Machine Learning: An Overview - https://thegradient.pub/interpretability-in-ml-a-broad-overview/

Python / engineering

Make Python Docker Builds Slim & Fast - https://avilpage.com/2020/10/python-docker-build-slim-fast.html
PyTorch large scale speed and performance

Build Your Next Project with Wolfram Alpha API and Python - https://martinheinz.dev/blog/36

  • Interesting application - math captcha generator

How to Build an Open-Domain Question Answering System? - https://lilianweng.github.io/lil-log/2020/10/29/open-domain-question-answering.html
A C library by NVIDIA for inference - https://github.com/triton-inference-server/server/blob/master/docs/architecture.md
Nice feature in the GitHub Desktop app - https://github.blog/2020-11-17-introducing-split-diffs-in-github-desktop/
Docker explains for 10th time why they cut free accounts - https://www.docker.com/blog/rate-limiting-by-the-numbers/
Python Pitfalls - Expecting The Unexpected - https://martinheinz.dev/blog/37
A post by the maintainer of pandas - https://tomaugspurger.github.io/whats-next.html


Do Nvidia's competitors want it to fail with Arm - https://digitstodollars.com/2020/11/04/do-nvidias-competitors-want-it-to-fail-with-arm/
How Huawei will survive - https://digitstodollars.com/2020/11/03/how-huawei-will-survive/
NVIDIA A100 Launches on AWS - https://blogs.nvidia.com/blog/2020/11/02/nvidia-a100-launches-on-aws/
Mobile driving license - https://security.googleblog.com/2020/10/privacy-preserving-features-in-mobile.html
The Mystery of Apple’s Missing Profits - https://digitstodollars.com/2020/11/19/the-mystery-of-apples-missing-profits/
Facebook … building subsea cables in Africa - https://engineering.fb.com/2020/11/19/connectivity/subsea-cables/
A case for … Keeping encryption elitist - https://blog.cerebralab.com/Keeping encryption elitist
Electronics supply chain - https://digitstodollars.com/2020/10/27/what-is-happening-to-the-supply-chain/
Resetting online commerce - https://www.ben-evans.com/benedictevans/2020/10/24/resetting-online-commerce
Market definitions and tech monopolies - https://www.ben-evans.com/benedictevans/2020/10/31/market-definitions-and-tech-monopolies


Objectron dataset, a collection of short, object-centric video clips capturing a larger set of common objects from different angles