Week highlights
- New variation of Adam?
- Website;
- Code;
Eliminate the generalization gap between adaptive methods and SGD
TL;DR: a faster and better optimizer with highly robust performance;
- Dynamic bounds on learning rates, inspired by gradient clipping;
- Not very sensitive to hyperparameters, especially compared with SGD(M);
- Tested on MNIST, CIFAR, Penn Treebank - no serious datasets;
- Dynamically transforms from Adam into SGD as the training step grows (rough sketch below);
- Meh, we tested it - works about the same;
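A back-of-the-envelope sketch of the dynamic-bound idea (hand-rolled NumPy, illustrative hyperparameters and bound schedules, not the authors' code): clip the Adam-style per-parameter step size between bounds that start wide (Adam-like) and shrink towards a single final learning rate (SGD-like).

```python
import numpy as np

def adabound_like_update(param, grad, state, lr=1e-3, final_lr=0.1,
                         betas=(0.9, 0.999), gamma=1e-3, eps=1e-8):
    """One update step; the bounds tighten around final_lr as t grows."""
    t = state["t"] + 1
    m = betas[0] * state["m"] + (1 - betas[0]) * grad        # Adam first moment
    v = betas[1] * state["v"] + (1 - betas[1]) * grad ** 2   # Adam second moment
    lower = final_lr * (1 - 1 / (gamma * t + 1))             # -> final_lr from below
    upper = final_lr * (1 + 1 / (gamma * t))                 # -> final_lr from above
    # "gradient clipping", but applied to the learning rate instead of the gradient
    step = np.clip(lr / (np.sqrt(v) + eps), lower, upper)
    state.update(m=m, v=v, t=t)
    return param - step * m

# toy usage: minimize 0.5 * x^2, so grad = x
x = np.array([5.0])
state = {"m": np.zeros_like(x), "v": np.zeros_like(x), "t": 0}
for _ in range(1000):
    x = adabound_like_update(x, grad=x, state=state)
print(x)  # ~0: behaves like Adam early on and like SGD with lr=final_lr later
```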
Dependency parsing and POS tagging in Russian
A less popular set of NLP tasks.
Popular tools reviewed
https://habr.com/ru/company/sberbank/blog/418701/
Only morphology:
(0) The well-known pymorphy2 package (usage sketch after this list);
Only POS tags and morphology:
(0) https://github.com/IlyaGusev/rnnmorph (easy to use);
(1) https://github.com/nlpub/pymystem3 (easy to use);
Full dependency parsing
(0) Russian spacy plugin:
- https://github.com/buriy/spacy-ru - installation
- https://github.com/buriy/spacy-ru/blob/master/examples/POS_and_syntax.ipynb - usage with examples
(1) Malt-parser-based solution (drawback: no examples) - https://github.com/oxaoo/mp4ru
(2) Google’s syntaxnet - https://github.com/tensorflow/models/tree/master/research/syntaxnet
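A minimal pymorphy2 usage sketch for the morphology-only option above (the example words and grammemes are just for illustration):

```python
import pymorphy2  # pip install pymorphy2

morph = pymorphy2.MorphAnalyzer()

# Analyze an ambiguous word form: each parse carries a tag, a lemma and a score.
for p in morph.parse('стали')[:3]:
    print(p.tag, p.normal_form, p.score)

# Inflect a lemma into a specific grammatical form (genitive plural here).
print(morph.parse('сталь')[0].inflect({'plur', 'gent'}).word)
```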
NLP
- Pay Less Attention with Lightweight and Dynamic Convolutions:
- Link https://arxiv.org/abs/1901.10430;
- Essentially they say that complex key-value attention can be replaced with an approach inspired by depthwise separable convolutions (rough sketch after this list);
- Wait till someone builds a transformer with these!;
- A very lightweight convolution can perform competitively to the best reported self-attention results;
- Self-attention is computationally very challenging due to its quadratic complexity in the input length; in practice, long sequences require the introduction of hierarchies;
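A rough PyTorch sketch of the "lightweight convolution" idea: a depthwise 1-D convolution whose kernels are softmax-normalized over the kernel width and shared across groups of channels (heads). The module name, kernel size and head count here are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightweightConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, heads: int = 4):
        super().__init__()
        assert channels % heads == 0
        self.channels, self.heads, self.kernel_size = channels, heads, kernel_size
        # one kernel per head instead of one kernel per channel (weight sharing)
        self.weight = nn.Parameter(torch.randn(heads, 1, kernel_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        w = F.softmax(self.weight, dim=-1)                       # normalize over kernel width
        w = w.repeat_interleave(self.channels // self.heads, 0)  # broadcast head kernels to channels
        return F.conv1d(x, w, padding=self.kernel_size // 2, groups=self.channels)

x = torch.randn(2, 16, 50)              # (batch, channels, time)
print(LightweightConv1d(16)(x).shape)   # torch.Size([2, 16, 50])
```

Unlike self-attention, the cost of such a convolution is linear rather than quadratic in the sequence length.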
- Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization;
- The debate on whether OpenAI should release their full language model:
- Link https://thegradient.pub/openai-shouldnt-release-their-full-language-model/;
- Arguments:
Libraries / code / articles
- Finally, a proper distributed training PyTorch example (generic DDP skeleton below for context);
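Not the example referred to above (no link survived in the notes), just a generic DistributedDataParallel skeleton for context; the model and training loop are toy placeholders:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")           # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])        # set by the launcher (e.g. torchrun)
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 1).cuda(local_rank)   # toy model
    model = DDP(model, device_ids=[local_rank])       # gradients are all-reduced across workers
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(10):                               # toy training loop
        x = torch.randn(32, 10, device=local_rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

if __name__ == "__main__":
    main()  # e.g.: torchrun --nproc_per_node=<num_gpus> train.py
```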
- An AWS EC2 wrapper by fast.ai https://www.fast.ai/2019/02/15/fastec2/ - why?
- FAIR open-sourced a C++ implementation of wav2letter;
Google open sources GPipe for HUGE networks;
ML + wind turbines:
- The model recommends how to make optimal hourly delivery commitments to the power grid a full day in advance;
- +20% value is a lot;
- Self-fulfilling prophecies in DS;
How Google fights fake news;
Google’s take on quantum computing:
- 72 qubits now; roughly 1M qubits are estimated to be required to build a real (error-corrected) computer;
- The number of physical wires running from room temperature to the qubits inside the cryostat, and the finite cooling power of the cryostat, represent a significant constraint;
ASG - a criterion similar to CTC loss in its role in speech-to-text applications;
Sooo many links about ethics in AI;
Uber Eats recommendations and building a query understanding engine;
Whale competition approaches;
Amazing blog about ML in medicine;
Papers
- Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes https://arxiv.org/abs/1902.06855
- A Generalized Framework for Population Based Training https://arxiv.org/abs/1902.01894
Datasets / competitions
- Soon http://bigearth.net;
- The ML track of Unearthed.ai will open soon;
- GQA: a new dataset for compositional question answering over real-world images https://arxiv.org/abs/1902.09506;
- 500k x-ray dataset:
“…the ChestXray14 dataset (older one), as it exists now, is not fit for training medical AI systems to do diagnostic work.”