2018 DS/ML digest 32

Posted by snakers41 on December 19, 2018

NLP

  • A bit about GCNN;

  • PyText from Facebook:

    • TLDR - FastText meets PyTorch;
    • Very similar to AllenNLP in nature;
    • Will be useful if you can afford to write modules for their framework to solve 100 identical tasks (i.e. like Facebook with 200 languages);
    • On its own, it seems too high-maintenance to use;
  • Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

    • Link;
    • Softmax is used in virtually all NMT models (hierarchical softmax being the usual alternative), and it is the slowest part of the model;
    • Replace the softmax layer with a continuous embedding layer;
    • A novel probabilistic loss, plus a training and inference procedure in which the model generates a probability distribution over pre-trained word embeddings;
    • Trains up to 2.5x faster with comparable accuracy;
    • Produce more meaningful errors than the softmax-based models;
    • BPE is now the de-facto SOTA approach;
    • The decoder of the model produces a continuous vector; the output word is then predicted by searching for its nearest neighbor in the embedding space;
    • Essentially a probabilistic variant of a cosine loss; see the sketch after this list;
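
A minimal sketch of the continuous-output idea, with a plain cosine loss standing in for the paper's full von Mises-Fisher density. All tensor names and shapes here are illustrative, not taken from the paper's code:

```python
import torch
import torch.nn.functional as F

def cosine_loss(pred, target_ids, emb):
    # emb: frozen pre-trained embedding matrix (vocab_size, dim)
    # pred: decoder outputs (batch, dim); target_ids: (batch,)
    target_vecs = emb[target_ids]
    return (1.0 - F.cosine_similarity(pred, target_vecs, dim=-1)).mean()

def nearest_neighbor_decode(pred, emb):
    # Predict each word as the nearest neighbor (by cosine) of the
    # continuous decoder output in the embedding space.
    sims = F.normalize(pred, dim=-1) @ F.normalize(emb, dim=-1).t()
    return sims.argmax(dim=-1)

emb = torch.randn(32000, 300)    # stand-in for pre-trained word vectors
pred = torch.randn(8, 300)       # fake decoder outputs for a batch
loss = cosine_loss(pred, torch.randint(0, 32000, (8,)), emb)
word_ids = nearest_neighbor_decode(pred, emb)
```

Note that the vocabulary-sized matmul happens only at decoding time here; training touches just the target embeddings, which is where the speed-up over a full softmax comes from.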

ML / DS Articles / posts

  • How Google handles gender neutrality in its NMT: annotation, extra classification steps, and “multi-language” translations;
  • Google’s Grasp2Vec:
    • It is common to assume that images can be compressed into a low-dimensional space, and that frames in a video can be predicted from previous frames;
    • The architecture embeds the pre-grasp and post-grasp images into a dense spatial feature map;
  • What is wrong with pandas:
    • “My rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset”;
    • Key problems:
      • All of the data has to sit in RAM all of the time (see the chunked-reading sketch after this list);
      • Reliance on numpy arrays internally;
      • (Fixed) slow I/O from disk;
      • (?) Does not parallelize well;
  • Why Python is slow;
  • Libraries to explain feature importance for black-box ML models; this seems more interesting (a permutation-importance sketch follows this list);
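
For the pandas RAM issue above, a minimal chunked-reading sketch that keeps memory bounded instead of loading the whole dataset at once (the file and column names are made up):

```python
import pandas as pd

total = 0
# Stream the file in 1M-row chunks instead of one giant DataFrame
for chunk in pd.read_csv("big_dataset.csv", chunksize=1_000_000):
    total += chunk["amount"].sum()  # aggregate per chunk, keep only the result
print(total)
```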
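And for black-box feature importance, a model-agnostic permutation-importance sketch showing the core trick behind such libraries: shuffle one feature at a time and measure how much the score drops. `model` and `metric` are placeholders for any fitted estimator with a `.predict()` method and any higher-is-better score function:

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    base = metric(y, model.predict(X))     # score on intact data
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # destroy feature j's signal
            drops[j] += base - metric(y, model.predict(Xp))
    return drops / n_repeats               # bigger drop = more important
```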

News / entertainment pieces

  • This is insanity;

Datasets:

  • New alternative to MNIST - Japanese cursive;
  • 20 hours of transcribed audio for 700 languages, with texts and all. Bible readings … ;
  • Visual Commonsense Reasoning:
    • Paper;
    • 290k multiple choice questions;
    • 290k correct answers and rationales: one per question;
    • 110k images;
    • Scaffolded on top of 80 object categories from COCO;