2020 DS/ML digest 14

Posted by snakers41 on December 26, 2020

Speech

Our VAD was released - https://github.com/snakers4/silero-vad
A post from the nmslib author on GAN augmentations for speech - http://searchivarius.org/blog/data_augm_gan_2020
Also, if this is at least 50% true then this guy is a real CS/ML role model - http://searchivarius.org/about

A series of releases in Speech:

ML

Will natural language processing engineers find it hard to get work in the future? Once computers are capable of near-perfect text and speech processing and good tools are freely available, will most NLP engineers be out of work? - http://searchivarius.org/blog/will-natural-language-processing-engineers-find-it-hard-get-work-future-once-computers-are
A Microsoft custom data type for efficient inference - https://www.microsoft.com/en-us/research/blog/a-microsoft-custom-data-type-for-efficient-inference (they really want a large trillion-parameter transformer)
Looks like the new GitHub report is … meh - https://octoverse.github.com/
AMP guide from Nvidia - https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
Transformers for Image Recognition at Scale, models - https://console.cloud.google.com/storage/browser/vit_models/imagenet21k%2Bimagenet2012?pageState=("StorageObjectListTable":("f":"%5B%5D"))&prefix=&forceOnObjectsSortingFiltering=false
3D Object Recognition 101 - https://habr.com/ru/company/ods/blog/522836/
Once again stumbled upon JAX - https://iaml.it/blog/jax-intro-english
Also JAX again - https://deepmind.com/blog/article/using-jax-to-accelerate-our-research
This “AI” newsletter from investment bankers did not entirely suck this time - https://newsletter.airstreet.com/issues/your-guide-to-ai-november-2020-290213
Machine learning could be fundamentally unexplainable - https://blog.cerebralab.com/Machine%20learning%20could%20be%20fundamentally%20unexplainable
You might not need machine learning - https://nullprogram.com/blog/2020/11/24/
MPNet combines strengths of masked and permuted language modeling for language understanding - https://www.microsoft.com/en-us/research/blog/mpnet-combines-strengths-of-masked-and-permuted-language-modeling-for-language-understanding/
Use summary ROC curves to compare ML in medicine to individual doctors - https://lukeoakdenrayner.wordpress.com/2020/12/08/docs-are-rocs-a-simple-fix-for-a-methodologically-indefensible-practice-in-medical-ai-studies/
A Friendly Introduction to PCA - http://peterbloem.nl/blog/pca
Portrait Light: Enhancing Portrait Lighting with Machine Learning - https://ai.googleblog.com/2020/12/portrait-light-enhancing-portrait.html
Naturally Occurring Equivariance in Neural Networks - https://distill.pub/2020/circuits/equivariance/
‘Seeing’ on tiny battery-powered microcontrollers with RNNPool - https://habr.com/ru/company/southbridge/blog/531820
Privacy Considerations in Large Language Models - https://ai.googleblog.com/2020/12/privacy-considerations-in-large.html
MediaPipe Holistic — Simultaneous Face, Hand and Pose Prediction, on Device - https://ai.googleblog.com/2020/12/mediapipe-holistic-simultaneous-face.html
A holistic representation toward integrative AI - https://www.microsoft.com/en-us/research/blog/a-holistic-representation-toward-integrative-ai/

Code

Retvals, terrible teaching, and admitting we have a problem - https://rachelbythebay.com/w/2020/12/06/forked/
Commits are snapshots, not diffs - https://github.blog/2020-12-17-commits-are-snapshots-not-diffs/
Wrapping SSH into HTTPS (RU) - https://habr.com/ru/post/531590/
De Bruijn Sequence Generator for Faster Shift Register Code Bruteforcing - https://damip.net/article-de-bruijn-sequence
Endlessh: an SSH Tarpit - https://nullprogram.com/blog/2019/03/22/
Networking Tools Every Developer Needs to Know - https://martinheinz.dev/blog/38
Fighting python memory leaks - https://habr.com/ru/company/domclick/blog/532030/

Tech / Market / Hardware

FB building … phone cell towers - https://engineering.fb.com/2020/12/03/connectivity/supercell-reaching-new-heights-for-wider-connectivity/
NXP-Amazon Deal Promises Carmakers Vehicle-Wide Data - https://www.eetimes.com/nxp-amazon-deal-promises-carmakers-vehicle-wide-data/
AWS Trainium - https://aws.amazon.com/machine-learning/trainium/
A Look at Roblox’s User Data - https://digitstodollars.com/2020/12/08/a-look-at-robloxs-user-data/
Amazon AI Assemble - https://digitstodollars.com/2020/12/08/amazon-ai-assemble/
WSL 2 GPU Support is Here - https://www.docker.com/blog/wsl-2-gpu-support-is-here/
3080 Ti Confirmed - https://docs.google.com/spreadsheets/d/1xAo6TcSgHdd25EdQ-6GqM0VKbTYu8cWyycgJhHRVIgY/edit?usp=sharing

Random

Stenotyping in English - https://habr.com/ru/post/530682
Recovers passwords from pixelized screenshots - https://github.com/beurtschipper/Depix
Managing AWS surprise bills - https://bahr.dev/2020/12/02/surprise-bills/
Acquiring the Reflectance Field of a Human Face - http://www.pauldebevec.com/Research/LS/

Datasets

English spell-checking dictionaries - http://wordlist.aspell.net/12dicts-readme/
MLS dataset - https://arxiv.org/abs/2012.03411
Some Russian audio books and English data - https://github.com/sovaai/sova-dataset