2019 DS/ML digest 05

Posted by snakers41 on February 18, 2019

Week highlights

  • New variation of Adam?
    • Website;
    • Code;
    • Claims to eliminate the generalization gap between adaptive methods and SGD;
    • TL;DR: a faster and better optimizer with highly robust performance;
    • Applies a dynamic bound on learning rates, inspired by gradient clipping;
    • Not very sensitive to hyperparameters, especially compared with SGD(M);
    • Tested only on MNIST, CIFAR, and Penn Treebank - no serious datasets;
    • Dynamically transforms from Adam into SGD as the training step grows;
    • Meh - we tested it, and it works about the same;
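The core trick - clipping Adam's per-parameter step size into a band that shrinks toward a fixed SGD-like learning rate - can be sketched in a few lines. This is an illustrative numpy version, not the authors' code; the hyperparameter names (`final_lr`, `gamma`) are assumptions:

```python
import numpy as np

def bounded_adam_step(param, grad, m, v, t, lr=1e-3, final_lr=0.1,
                      beta1=0.9, beta2=0.999, gamma=1e-3, eps=1e-8):
    """One update of Adam with dynamically bounded learning rates:
    the per-parameter step size is clipped into [lower, upper],
    and both bounds converge to final_lr as t grows, so the
    optimizer smoothly morphs from Adam into SGD(M)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # standard Adam bias correction
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # dynamic bounds: start as (almost) no constraint, tighten over time
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    step = np.clip(lr / (np.sqrt(v_hat) + eps), lower, upper)
    return param - step * m_hat, m, v
```

Early in training the band is wide and the update is plain Adam; late in training both bounds squeeze to `final_lr`, leaving SGD with momentum.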

Dependency parsing and POS tagging in Russian

A less popular set of NLP tasks.

Popular tools reviewed

Only morphology:
(0) The well-known pymorphy2 package;

Only POS tags and morphology:
(0) https://github.com/IlyaGusev/rnnmorph (easy to use);
(1) https://github.com/nlpub/pymystem3 (easy to use);

Full dependency parsing:
(0) Russian spacy plugin:


  • Pay Less Attention with Lightweight and Dynamic Convolutions:
    • Link https://arxiv.org/abs/1901.10430;
    • Essentially, they say that complex key-value self-attention can be replaced with an approach inspired by depthwise separable convolutions;
    • Wait till someone builds a transformer with these!;
    • A very lightweight convolution can perform competitively to the best reported self-attention results;
    • Self-attention is computationally very challenging due to its quadratic complexity in the input length; in practice, long sequences require the introduction of hierarchies;
  • Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization:
  • https://thegradient.pub/openai-shouldnt-release-their-full-language-model/
  • Arguments:
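Returning to the lightweight-convolutions paper: the idea is simple enough to sketch. Below is a minimal numpy illustration (not the paper's fairseq implementation; shapes and the per-head channel grouping are assumptions) of a depthwise 1D convolution whose kernel is softmax-normalized over the temporal axis and shared across channel groups - linear in sequence length, unlike O(T²) self-attention:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lightweight_conv(x, w):
    """Lightweight convolution sketch.
    x: (T, C) input sequence; w: (H, K) raw kernel weights,
    one kernel of width K per head, shared by C/H channels.
    The kernel is softmax-normalized so each output is a
    convex combination of a fixed local window - a cheap,
    position-invariant stand-in for attention weights."""
    T, C = x.shape
    H, K = w.shape
    assert C % H == 0
    w = softmax(w, axis=-1)          # normalize each head's kernel
    pad = K // 2
    xp = np.pad(x, ((pad, K - 1 - pad), (0, 0)))
    out = np.zeros_like(x)
    for c in range(C):
        h = c * H // C               # channels share their head's kernel
        for t in range(T):
            out[t, c] = xp[t:t + K, c] @ w[h]
    return out
```

The key contrast with self-attention: the mixing weights here depend only on position within a fixed window (dynamic convolutions in the paper additionally predict them from the current timestep), never on pairwise comparisons across the whole sequence.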

Libraries / code / articles

  • Google open-sources GPipe for training HUGE networks;
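GPipe's core idea is pipeline parallelism: the model is partitioned into stages across accelerators, and each mini-batch is split into micro-batches that flow through the stages concurrently. A toy schedule simulator (purely illustrative, not the GPipe API) shows why this keeps stages busy:

```python
def pipeline_schedule(n_stages, n_micro):
    """Forward-pass schedule of micro-batch pipeline parallelism.
    Returns, for each time tick, the list of (stage, micro_batch)
    pairs active at that tick. With a single big batch, only one
    stage would work at a time; with micro-batches, up to n_stages
    work in parallel, leaving only a (n_stages - 1)-tick 'bubble'."""
    ticks = []
    for t in range(n_stages + n_micro - 1):
        active = [(s, t - s) for s in range(n_stages)
                  if 0 <= t - s < n_micro]
        ticks.append(active)
    return ticks
```

For 3 stages and 4 micro-batches the whole forward pass takes 6 ticks instead of 12 sequential ones, and the larger `n_micro` is relative to `n_stages`, the smaller the idle bubble.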

  • ML + wind turbines - the model recommends how to make optimal hourly delivery commitments to the power grid a full day in advance; +20% value is a lot;

  • Self-fulfilling prophecies in DS;

  • https://thispersondoesnotexist.com/

  • How Google fights fake news;

  • Google’s take on quantum computing;

    • 72 qubits now; ~1M qubits are required to build a real computer;
    • The number of physical wires running from room temperature to the qubits inside the cryostat, and the cryostat's finite cooling power, represent a significant constraint;
  • AGS - a module similar in role to the CTC loss in speech-to-text applications;

  • Sooo many links about ethics in AI;

  • Uber Eats recommendations and building query understanding;

  • Whale competition approaches;

  • An amazing blog about ML in medicine;


Datasets / competitions