2019 DS/ML digest 08

Posted by snakers41 on April 9, 2019

Papers / highlights:

  • Approaches to eliminate bugs in ML models:
    adversarial testing, robust learning, and formal verification - the ideas are cool, but they do not seem really practical in real life yet. Back-propagating error bounds through the network seems like the most interesting part - see the sketch below;
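To make the error-bound idea concrete, here is a minimal sketch of propagating an input interval through a single linear layer with plain interval arithmetic - the layer, sizes and epsilon are made up for illustration, and this is not code from the linked work:

```python
import torch
import torch.nn as nn

def interval_bound_linear(layer: nn.Linear, lower: torch.Tensor, upper: torch.Tensor):
    """Propagate the box [lower, upper] through a linear layer.

    Standard interval arithmetic: the centre goes through W as usual,
    the radius goes through |W|, so the returned box is guaranteed to
    contain every possible output for any input inside the input box.
    """
    center = (upper + lower) / 2
    radius = (upper - lower) / 2
    new_center = layer(center)                    # W @ c + b
    new_radius = radius @ layer.weight.abs().t()  # |W| @ r
    return new_center - new_radius, new_center + new_radius

# toy example: bound the outputs for all inputs within an L_inf ball of radius eps
layer = nn.Linear(4, 2)
x, eps = torch.randn(1, 4), 0.1
lb, ub = interval_bound_linear(layer, x - eps, x + eps)
print(lb, ub)
```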

NLP

Smart reply

Google’s smart reply works well … but it is kind of useless. It is interesting … that it also uses EmbeddingBag (lol) - a minimal usage sketch follows after the list:

  • How smart reply by Google works - looks just like seq2seq with post-processing and filtering. Most interesting part - canonical response set generation;
  • Google then substitutes the seq2seq network with an n-gram based network;
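The EmbeddingBag pattern itself - pooling a whole message of token ids into a single vector; the vocabulary size, dimension and token ids below are made up for illustration:

```python
import torch
import torch.nn as nn

vocab_size, dim = 10_000, 128
embedding = nn.EmbeddingBag(vocab_size, dim, mode="mean")

# two messages packed into one flat tensor plus offsets, as EmbeddingBag expects
tokens = torch.tensor([3, 51, 7, 812, 4, 99])  # token ids of both messages concatenated
offsets = torch.tensor([0, 3])                 # message 1 = tokens[0:3], message 2 = tokens[3:]

message_vectors = embedding(tokens, offsets)   # shape (2, 128): one averaged vector per message
print(message_vectors.shape)
```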

Free sentiment annotation

  • Cool idea - use emojis in tweets as free annotation. As for CVAEs - this seems kind of academic, but cool as well. Even a plain classifier trained on a corpus of texts with emojis as labels would already be useful - see the sketch after this list;
  • A separate attention layer per emoji allows for easier visualizations;
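A minimal sketch of the "free annotation" part - harvesting sentiment labels from emojis and stripping them from the text; the emoji-to-polarity map here is a made-up toy example, not from the referenced paper:

```python
import re

# toy emoji-to-sentiment map; a real setup would cover far more emojis
EMOJI_SENTIMENT = {"😊": "positive", "😂": "positive", "😡": "negative", "😢": "negative"}
EMOJI_RE = re.compile("|".join(map(re.escape, EMOJI_SENTIMENT)))

def tweet_to_example(tweet: str):
    """Turn a raw tweet into a (text, label) pair using its emojis as free annotation."""
    emojis = EMOJI_RE.findall(tweet)
    if not emojis:
        return None                          # no emoji => no free label
    label = EMOJI_SENTIMENT[emojis[0]]       # naive: take the first emoji's polarity
    text = EMOJI_RE.sub("", tweet).strip()   # strip emojis so the classifier cannot cheat
    return text, label

print(tweet_to_example("great weather today 😊"))  # ('great weather today', 'positive')
```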

Graph networks

FAIR released a toolkit to learn embeddings from graphs.
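The general idea of learning embeddings from a graph can be sketched with a dot-product edge score plus negative sampling - the toy graph below is made up, and this is not the toolkit's actual API:

```python
import torch
import torch.nn as nn

# toy graph: edges as (source, destination) node id pairs
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 0]])
num_nodes, dim = 4, 16

emb = nn.Embedding(num_nodes, dim)
optimizer = torch.optim.Adam(emb.parameters(), lr=0.01)

for step in range(100):
    src, dst = edges[:, 0], edges[:, 1]
    neg = torch.randint(0, num_nodes, (len(edges),))       # randomly corrupted destinations
    pos_score = (emb(src) * emb(dst)).sum(dim=-1)          # dot-product score of real edges
    neg_score = (emb(src) * emb(neg)).sum(dim=-1)          # score of corrupted edges
    loss = torch.relu(1.0 - pos_score + neg_score).mean()  # margin ranking loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```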

Transformer from FAIR

The coolest thing - now they operate on the subword level.
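What "subword level" means in practice - words are greedily split into pieces from a fixed subword vocabulary; the tiny vocabulary below is made up, and real models learn theirs with BPE or a similar algorithm:

```python
# made-up toy subword vocabulary; real ones have tens of thousands of entries
SUBWORD_VOCAB = {"trans", "form", "er", "un", "believ", "able"}

def subword_split(word: str, vocab=SUBWORD_VOCAB):
    """Greedily split a word into the longest subword pieces found in the vocabulary."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):     # try the longest piece first
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # fall back to single characters
                pieces.append(piece)
                i = j
                break
    return pieces

print(subword_split("transformer"))   # ['trans', 'form', 'er']
print(subword_split("unbelievable"))  # ['un', 'believ', 'able']
```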

1m breast cancer images

  • New dataset;
  • Judging by comments from people in the industry, this dataset is properly assembled and actually useful;

Posts / articles