2021 DS/ML digest 04

Posted by snakers41 on April 28, 2021

Speech

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling - http://arxiv.org/abs/2103.14574
Mozilla partners with NVIDIA to democratize and diversify voice technology - https://foundation.mozilla.org/en/blog/mozilla-partners-with-nvidia-to-democratize-and-diversify-voice-technology/
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction - http://arxiv.org/abs/2104.08189

ML / Papers

Paper Review: Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning - https://andlukyane.com/blog/paper-review-nlptriplettricks
Paper Review: EfficientNetV2: Smaller Models and Faster Training - https://andlukyane.com/blog/paper-review-effnetv2
Scaling Local Self-Attention For Parameter Efficient Visual Backbones - http://arxiv.org/abs/2103.12731
Shortcut learning in deep neural networks - https://arxiv.org/abs/2004.07780
Russian DialoGPT - https://habr.com/ru/company/icl_services/blog/548244/
Multilingual datasets have major quality problems and need to be inspected before being used to train models - https://arxiv.org/abs/2103.12028
Reinforcement Learning for Robust Parameterized Locomotion Control of Bipedal Robots - https://www.youtube.com/watch?v=goxCjGPQH7U&ab_channel=HybridRobotics
Can Vision Transformers Learn without Natural Images? - https://arxiv.org/pdf/2103.13023.pdf
Towards Lifelong Learning of End-to-end ASR - http://arxiv.org/abs/2104.01616
Paper Review: Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains - https://andlukyane.com//blog/paper-review-furrycars
MADGRAD, a new optimizer from Facebook Research - https://github.com/facebookresearch/madgrad
Federated learning may be helpful in medicine - https://thegradient.pub/decentralized-ai-for-healthcare/
Branch Specialization - https://distill.pub/2020/circuits/branch-specialization/
Weight Banding - https://distill.pub/2020/circuits/weight-banding/
An update on embedded devices for ML acceleration - https://habr.com/ru/company/recognitor/blog/551552/
GPU, FPGA, ASIC, TPU, VPU, IPU, DPU, NPU, RPU, NNP - https://habr.com/ru/post/455353/
Presenting the iGibson Challenge on Interactive and Social Navigation - https://ai.googleblog.com/2021/04/presenting-igibson-challenge-on.html
Time is Brain: AI helps cut down stroke diagnosis time in the Himalayan foothills - https://blog.qure.ai/notes/ai-cuts-down-stroke-diagnosis-time-himalayas
Simple S2S PyTorch Transformer Example with Greedy Decoding - https://colab.research.google.com/drive/1swXWW5sOLW8zSZBaQBYcGQkQ_Bje_bmI
The state of transformers in computer vision - https://habr.com/ru/company/recognitor/blog/553478/
Transformer S2S example - https://twitter.com/full_stack_dl/status/1349156930518859780
HDR+ with Bracketing on Pixel Phones - https://ai.googleblog.com/2021/04/hdr-with-bracketing-on-pixel-phones.html
Self-Organising Textures - https://distill.pub/2020/selforg/textures/
MaX-DeepLab: Dual-Path Transformers for End-to-End Panoptic Segmentation - https://ai.googleblog.com/2021/04/max-deeplab-dual-path-transformers-for.html
Monster Mash: A Sketch-Based Tool for Casual 3D Modeling and Animation - https://ai.googleblog.com/2021/04/monster-mash-sketch-based-tool-for.html
Fit More and Train Faster With ZeRO via DeepSpeed and FairScale - https://huggingface.co/blog/zero-deepspeed-fairscale
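The "Simple S2S PyTorch Transformer Example" linked above decodes greedily. As a framework-free illustration of what greedy decoding means, here is a minimal sketch, assuming a hypothetical `next_token_scores` callable that stands in for the Transformer decoder (conditioning on the source sentence is assumed to live inside that callable). It is not the notebook's code, just the technique: at every step take the single highest-scoring token, and stop at EOS or after `max_len` steps.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Greedy decoding: repeatedly append the argmax token to the prefix."""
    tokens = [bos_id]
    for _ in range(max_len):
        # Hypothetical model call: one score per vocabulary entry for the next position.
        scores = next_token_scores(tokens)
        # Argmax over the vocabulary; max() returns the first maximum, so ties go to the lowest id.
        best = max(range(len(scores)), key=lambda i: scores[i])
        tokens.append(best)
        if best == eos_id:
            break
    return tokens


def toy_scores(prefix: Sequence[int]) -> List[float]:
    # Hypothetical stand-in for a trained decoder over a 5-token vocabulary:
    # it always prefers "previous token + 1", capped at the last id (used as EOS).
    vocab_size = 5
    target = min(prefix[-1] + 1, vocab_size - 1)
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


print(greedy_decode(toy_scores, bos_id=0, eos_id=4, max_len=10))  # [0, 1, 2, 3, 4]
```

Greedy decoding is the cheapest strategy (one model call per output token, no hypotheses kept), which is why tutorials use it; beam search trades extra compute for exploring several prefixes at once.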

Datasets

AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild - https://arxiv.org/abs/2103.13282
CUAD: A free $2 million legal dataset! - https://arxiv.org/abs/2103.06268
800GB of cleaned Common Crawl text - https://github.com/allenai/allennlp/discussions/5056
Facebook AI has built and open-sourced a new dataset, Casual Conversations, consisting of 45,186 videos of 3,011 participants having unscripted conversations - https://ai.facebook.com/blog/shedding-light-on-fairness-in-ai-with-a-new-data-set/
A collection of 3D object meshes (Google Scanned Objects) - https://app.ignitionrobotics.org/liuyuanpal/fuel/collections/Google%20Scanned%20Objects

Code

Farewell to fsync(): 10× faster database tests with Docker - https://pythonspeed.com/articles/faster-db-tests/
All The Important Features and Changes in Python 3.10 - https://martinheinz.dev/blog/46
Six Secret Easter Eggs in GitHub - https://dev.to/github/six-secret-easter-eggs-in-github-2j17
Loading SQL data into Pandas without running out of memory - https://pythonspeed.com/articles/pandas-sql-chunking/
Python packaging guide - https://antonz.ru/packaging/
Oh Shit, Git!?! - https://ohshitgit.com/ru
Don’t leak your Docker image’s build secrets - https://pythonspeed.com/articles/docker-build-secrets/ (lol, no mention of the easiest and most obvious method: env variables)
Docker security best practices - https://sysdig.com/blog/dockerfile-best-practices/
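The idea behind "Loading SQL data into Pandas without running out of memory" is to process a query result in fixed-size batches instead of materializing it all at once (in pandas, `pandas.read_sql(query, con, chunksize=N)` returns an iterator of DataFrames). Below is a minimal stdlib-only sketch of the same batching pattern, swapping pandas for the DB-API `Cursor.fetchmany` from Python's `sqlite3`; the helper name `iter_row_batches` and the table `t` are hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_row_batches(
    conn: sqlite3.Connection, query: str, batch_size: int
) -> Iterator[List[Tuple]]:
    """Yield the rows of `query` in lists of at most `batch_size` rows."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)  # an empty list means the result is exhausted
        if not batch:
            break
        yield batch


# Demo on an in-memory database with a hypothetical table `t` holding 0..9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

sizes = []
total = 0
for batch in iter_row_batches(conn, "SELECT x FROM t", batch_size=4):
    sizes.append(len(batch))
    total += sum(row[0] for row in batch)  # aggregate per batch, keep only the running total

print(sizes, total)  # [4, 4, 2] 45
```

Whether batching actually saves client memory depends on the driver: with PostgreSQL and psycopg2, for example, a regular cursor fetches the whole result on `execute`, and you need a named (server-side) cursor for rows to be streamed.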

Tech

Intel to invest $20bn in new plants in Arizona, with further plants to follow in the USA and Europe. This (partly) reflects the changed geopolitical environment and growing concern, partly military, about the concentration of key parts of the semiconductor industry in Taiwan, which is within China’s sights
Microsoft is apparently looking at buying Discord, the voice and text chat platform, for $10bn.
EU proposes new rules to boost start-ups and catch up with the U.S. and China on tech - https://www.cnbc.com/2021/03/19/eu-start-up-nations-standard-aims-to-help-europe-catch-us-and-china-.html
The mess at Medium - https://www.platformer.news/p/-the-mess-at-medium
Average Smartphone NAND Flash Capacity Crossed 100GB in 2020 - https://www.counterpointresearch.com/average-smartphone-nand-flash-capacity-crossed-100gb-2020/
Trust in tech craters - https://www.axios.com/edelman-trust-barometer-tech-5787acea-8ef5-4d0b-9694-6e4f8eb006c4.html
Tesla Owners Take To Reddit Asking What Happens If ‘Full Self Driving’ Isn’t Real - https://jalopnik.com/tesla-owners-take-to-reddit-asking-what-happens-if-full-1846553907
People’s Expensive NFTs Keep Vanishing. This Is Why - https://www.vice.com/en/article/pkdj79/peoples-expensive-nfts-keep-vanishing-this-is-why
THE HANDSET INDUSTRY IS A FLAT CIRCLE - https://digitstodollars.com/2021/04/07/the-handset-industry-is-a-flat-circle/
Your legacy database is outgrowing itself - https://ikonicscale.com/your-legacy-database-is-outgrowing-itself
Amazon Delivery Drivers Forced to Sign ‘Biometric Consent’ Form or Lose Job - https://www.vice.com/en/article/dy8n3j/amazon-delivery-drivers-forced-to-sign-biometric-consent-form-or-lose-job
MOBILE IP: DON’T HATE THE PLAYER, HATE THE GAME - https://digitstodollars.com/2021/04/13/mobile-ip-dont-hate-the-player-hate-the-game/
YouTube removals transparency report - https://transparencyreport.google.com/youtube-policy/removals
Ultra-rich American idiot makes Boston Dynamics dog piss in a cup - https://www.youtube.com/watch?v=tqsy9Wtr1qE&ab_channel=MichaelReeves
Bezos on values and employees (Amazon 2020 shareholder letter) - https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf
The state of surveillance in Moscow - https://habr.com/ru/post/553448/
Nvidia to challenge Intel directly with its Arm-based Grace server CPU - https://www.reuters.com/technology/nvidia-directly-challenge-intel-with-arm-based-grace-server-chip-2021-04-12/

Blogs

It doesn’t work - https://00f.net/2021/03/26/it-doesnt-work/
Your guide to AI March 2021 - https://newsletter.airstreet.com/issues/your-guide-to-ai-march-2021-481150
Machine Learning, Ethics, and Open Source Licensing - https://thegradient.pub/machine-learning-ethics-and-open-source-licensing/
The social contract of open source - https://snarky.ca/the-social-contract-of-open-source/
Last Week in AI 113 - https://lastweekin.ai/p/113

Hardware

MLPerf Inference v1.0:

Cerebras launches new AI supercomputing processor with 2.6 trillion transistors - https://venturebeat.com/2021/04/20/cerebras-systems-launches-new-ai-supercomputing-processor-with-2-6-trillion-transistors/