2018 DS/ML digest 27

Posted by snakers41 on October 15, 2018



  • !!! Annotated encoder-decoder !!!:

    • PyTorch text processing utils;
    • A real working seq2seq NMT example with explanations and greedy search;
    • Works best for word level problems;
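The greedy-search part of the tutorial boils down to a short loop; a minimal framework-free sketch (the `step` function standing in for one real PyTorch decoder step is hypothetical):

```python
def greedy_decode(step, sos_id, eos_id, max_len=50):
    """Greedy search: at every step keep only the single highest-scoring token.

    `step` is a hypothetical stand-in for one decoder step: it takes the
    tokens generated so far and returns a list of vocabulary scores.
    """
    tokens = [sos_id]
    for _ in range(max_len):
        scores = step(tokens)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax
        tokens.append(best)
        if best == eos_id:
            break
    return tokens

# Toy "decoder": favours token 2 until length 3, then EOS (id 3).
def toy_step(tokens):
    return [0.0, 0.1, 0.9, 0.2] if len(tokens) < 3 else [0.0, 0.0, 0.0, 1.0]

print(greedy_decode(toy_step, sos_id=0, eos_id=3))  # [0, 2, 2, 3]
```

Beam search replaces the single `argmax` with the top-k continuations per step; the tutorial covers the greedy case.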
  • Building semantic code search;

  • Dynamic Meta-Embeddings for Improved Sentence Representations

    • FAIR, link;
    • Dynamic meta-embeddings - a simple yet effective method for the supervised learning of embedding ensembles - essentially embedding stacking;
    • Meta embeddings are usually created in a separate preprocessing step, rather than in a process that is dynamically adapted to the task;
    • Naive concatenation as a baseline;
    • Essentially learning a transition layer with some attention / skip-connections from the LSTM used as a sentence encoder;
    • Core architecture - bi-LSTM sentence encoder;
    • Their reported gains seem marginal;
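The mechanism is easy to sketch without any framework: each pretrained embedding of a word gets a scalar score, the scores are softmax-normalized, and the final embedding is the weighted sum. All names below are illustrative, and the fixed gate scores stand in for the learned per-word attention from the paper:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_meta_embedding(word_embeddings, gate_scores):
    """Combine several pretrained embeddings of one word into a single vector.

    word_embeddings: one vector per embedding source, already projected
                     to a common dimensionality.
    gate_scores:     one scalar per source (fixed here for illustration;
                     the paper computes them with learned attention).
    """
    weights = softmax(gate_scores)
    dim = len(word_embeddings[0])
    return [sum(w * emb[i] for w, emb in zip(weights, word_embeddings))
            for i in range(dim)]

# Two "sources" for the same word; equal scores reduce to simple averaging.
fasttext_vec = [1.0, 0.0]
glove_vec = [0.0, 1.0]
print(dynamic_meta_embedding([fasttext_vec, glove_vec], [0.0, 0.0]))  # [0.5, 0.5]
```

Naive concatenation (the paper's baseline) would instead produce a vector of both sources' dimensions glued together; the gating keeps the dimensionality fixed.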
  • Lyrics Segmentation: Textual Macrostructure Detection using Convolutions

    • Text self-similarity measures: string similarity + phonetic similarity (simphon) + lexico-structural similarity (n-grams);
    • Applied CNNs and RNNs for song lyric structure prediction / genre prediction;
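The lexico-structural part of the self-similarity matrix can be sketched with character n-grams and Jaccard overlap (a simplification of the paper's features; `n=3` is an arbitrary choice here):

```python
def char_ngrams(line, n=3):
    line = line.lower()
    return {line[i:i + n] for i in range(max(len(line) - n + 1, 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def self_similarity(lines, n=3):
    """Line-by-line self-similarity matrix of a lyric; repeated sections
    (e.g. the chorus) show up as bright off-diagonal stripes."""
    grams = [char_ngrams(line, n) for line in lines]
    return [[jaccard(gi, gj) for gj in grams] for gi in grams]

lyrics = ["la la la hey", "something else entirely", "la la la hey"]
sim = self_similarity(lyrics)
print(sim[0][2])  # 1.0 — identical chorus lines
```

The CNN in the paper then slides over such matrices to predict segment boundaries, treating the matrix like an image.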
  • TPUs vs GPUs for Transformers (BERT):

    • BERT was done with 4 TPU pods (256 TPU chips) in 4 days;
    • “our model predicts that a GPU is 32% slower than a TPU for this specific scenario”;
    • We can expect to train BERT on 64 GPUs (the equivalent of 4 TPU pods) in 5 1/3 days (V100) or 8 1/2 days (RTX 2080 Ti). On an 8-GPU machine with V100s or RTX 2080 Tis, with any software and any parallelization algorithm (PyTorch, TensorFlow), one can expect to train BERT in 42 or 68 days respectively. For a standard 4-GPU desktop with RTX 2080 Tis (much cheaper than the other options), one can expect to replicate BERT in 99 days;
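The quoted numbers follow from straightforward linear scaling off the TPU baseline; a back-of-envelope sketch (assuming perfect scaling and ignoring communication overhead, roughly as the cited estimate does):

```python
def bert_training_days(n_gpus, tpu_days=4.0, tpu_equiv_gpus=64, gpu_slowdown=1.32):
    """Linearly scale the 4-day BERT TPU run to n_gpus GPUs,
    applying the quoted 32% per-device GPU slowdown."""
    return tpu_days * gpu_slowdown * tpu_equiv_gpus / n_gpus

print(round(bert_training_days(64), 2))  # 5.28  (~5 1/3 days on 64 V100s)
print(round(bert_training_days(8), 1))   # 42.2  (~42 days on an 8-GPU machine)
```

The RTX 2080 Ti figures come from applying a larger slowdown factor on top of this; in practice inter-GPU communication would push all of these numbers up.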
  • Chinese scientists predict stems / suffixes separately for English-to-Russian NMT:

    • Link;
    • Predict the stem and suffix separately during decoding +1.98 BLEU on English to Russian translation;
    • The total number of distinct stems in a morphologically rich language is much smaller than the number of word forms;
    • Two-step approach in the decoder: at each decoding step the stem is generated first, and then the suffix is predicted;
    • Distributed training;
    • Snowball stemmer;
    • They also use corpora crawled from e-commerce websites;
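Target-side preprocessing for the two-step decoder amounts to factoring each word into a stem plus a suffix; a toy sketch with a longest-match stand-in for the Snowball stemmer (the `<null>` suffix token and the mini stem lexicon are illustrative):

```python
KNOWN_STEMS = {"translat", "decod", "work"}  # illustrative mini-lexicon

def split_word(word, stems=KNOWN_STEMS):
    """Factor a word into (stem, suffix); longest known stem wins.
    A real system would use the Snowball stemmer instead."""
    for i in range(len(word), 0, -1):
        if word[:i] in stems:
            return word[:i], word[i:] or "<null>"
    return word, "<null>"  # unknown word: whole word is the stem

def join(stem, suffix):
    return stem + ("" if suffix == "<null>" else suffix)

print(split_word("translation"))  # ('translat', 'ion')
print(split_word("work"))         # ('work', '<null>')
```

At decoding time the model emits the stem token, conditions on it, then emits the suffix token; `join` reconstructs the surface form, which is where the stem-vocabulary shrinkage pays off.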

Market / articles / libraries / datasets

Just cool papers / links

  1. Monocular depth estimation from video;
  2. Faster RefineNet for semseg;
  3. 3D object detection with RGB-D images;
  4. Someone from Google trains GANs on ImageNet at 128x128 - 512x512 resolution (samples);
  5. Interesting idea: "demographic information of authors is encoded in – and can be recovered from – the intermediate representations learned by text-based neural classifiers";
  6. Global network that enables you to send email from anywhere at sea;
  7. Stanford's graph course;
  8. De-anonymization on the Internet: https://vas3k.ru/blog/389/;
  9. Image forensics: https://vas3k.ru/blog/390/;