NLP
A bit about GCNN;
PyText from Facebook:
- TLDR - FastText meets PyTorch;
- Very similar to AllenNLP in nature;
- Will be useful if you can afford to write modules for their framework to solve hundreds of near-identical tasks (e.g. like Facebook with 200 languages);
- In itself - seems to be too high maintenance to use;
Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
- Link;
- Softmax is used in essentially all NMT models (hierarchical softmax is an alternative) and is the slowest part of the model;
- Replace the softmax layer with a continuous embedding layer;
- Novel probabilistic loss, and a training and inference procedure in which we generate a probability distribution over pre-trained word embeddings;
- Train up to 2.5x faster, comparable accuracy;
- Produce more meaningful errors than the softmax-based models;
- BPE is now the de facto SOTA approach;
- The decoder of the model produces a continuous vector. The output word is then predicted by searching for the nearest neighbor of this vector in the embedding space;
- Probabilistic variant of cosine loss;
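The continuous-output idea above can be sketched in a few lines. This is a toy illustration, not the paper's actual vMF loss: the embeddings, vocabulary, and helper names are made up, and the loss shown is only the deterministic cosine core that the paper builds its probabilistic variant on.

```python
import numpy as np

# Toy pre-trained embedding table (assumption: real models would use
# ~300-dim fastText/word2vec vectors); rows are unit-norm word embeddings.
vocab = ["the", "cat", "sat", "mat"]
emb = np.eye(4)  # one-hot toy embeddings, already unit norm

def cosine_loss(pred, target):
    """1 - cosine similarity: the deterministic core of the loss.
    The paper's vMF loss is a probabilistic variant of this."""
    pred = pred / np.linalg.norm(pred)
    return 1.0 - float(pred @ target)

def decode(pred):
    """Predict the output word as the nearest neighbor (by cosine)
    of the decoder's continuous output vector in embedding space."""
    pred = pred / np.linalg.norm(pred)
    return vocab[int(np.argmax(emb @ pred))]

pred = np.array([0.10, 0.90, 0.05, 0.00])  # decoder output near "cat"
print(decode(pred))
print(cosine_loss(pred, emb[1]) < 0.01)
```

Note that decoding is a single nearest-neighbor lookup over the embedding table instead of a full softmax over the vocabulary, which is where the speedup comes from.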
ML / DS Articles / posts
- How Google handles gender neutrality in its NMT - annotation, extra classification steps and “multi-language” translations;
- Google’s Grasp2Vec:
- It is common to assume that images can be compressed into a low-dimensional space, and that frames in a video can be predicted from previous frames;
- The architecture embeds the pre-grasp and post-grasp images into a dense spatial feature map;
- DeepMind tackles the protein folding problem;
- The future of ML in medicine - DL assisted decisions;
- Spatial transformer networks https://arxiv.org/pdf/1506.02025.pdf - learnable affine transformations;
- NIPS statistics;
- Probabilistic U-Net code: https://github.com/SimonKohl/probabilistic_unet;
- Some intro to Vaex;
- Plain intro to Word2Vec;
- What is wrong with pandas:
- “My rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset”;
- Key problem - reliance on:
- Having all of the data in RAM all of the time;
- Relying on numpy arrays internally;
- (Fixed) slow i/o from disk;
- (?) Does not parallelize well;
- Why Python is slow;
- Libraries for explaining feature importance of black-box ML models. This one seems more interesting;
- This is insanity;
News / entertainment pieces
- Recent interview with Hinton;
- Google vs FAIR;
- Looming AI apocalypse;
- Ben Evans’ Newsletter;
Datasets:
- New alternative to MNIST - Japanese cursive (Kuzushiji-MNIST);
- ~20 hours of transcribed audio per language for 700 languages, with texts and all - Bible readings … ;
- Visual Commonsense Reasoning:
- Paper;
- 290k multiple choice questions;
- 290k correct answers and rationales: one per question;
- 110k images;
- Scaffolded on top of 80 object categories from COCO;