2019 DS/ML digest 13

Posted by snakers41 on July 29, 2019


  • New universal sentence transformer by Google;
  • Evolved Transformer;
  • New C++ text tokenizer;
  • “More data & compute = SOTA” is NOT research news;
  • XLNet vs BERT;
  • A BERT embeddings use case: a plain recommendation service;
  • Interpreting BERT embeddings as a tree: a cool new idea?
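The recommendation use case above boils down to nearest-neighbour search over embeddings. Below is a minimal sketch, assuming item embeddings were already computed (e.g. by pooling BERT outputs) and that cosine similarity is the ranking score. `cosine`, `recommend` and the toy vectors are hypothetical illustrations, not code from the linked post:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_id: str,
              embeddings: Dict[str, Sequence[float]],
              k: int = 3) -> List[Tuple[str, float]]:
    """Return the k items closest to query_id by cosine similarity, best first."""
    query = embeddings[query_id]
    scored = [(item_id, cosine(query, vec))
              for item_id, vec in embeddings.items() if item_id != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical 3-d vectors; real BERT-base embeddings are 768-d.
    items = {
        "a": [1.0, 0.0, 0.0],
        "b": [0.9, 0.1, 0.0],
        "c": [0.0, 1.0, 0.0],
        "d": [-1.0, 0.0, 0.0],
    }
    print(recommend("a", items, k=2))
```

A linear scan like this is fine for a few thousand items; for larger catalogues an approximate nearest-neighbour index would be the usual replacement.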


  • SincNet: yet another learnable front-end for ASR, with code and an explanation video;
  • Using generated speech as annotation in a Tacotron-like network;
  • Separable convolutions + BPE for STT;
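Why separable convolutions are attractive for STT models: they factor a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) channel mixer, cutting the weight count from k·C_in·C_out to k·C_in + C_in·C_out. A back-of-envelope sketch, biases ignored; the helper names and the layer size are hypothetical, not taken from the linked work:

```python
def conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a standard 1-D convolution (biases ignored)."""
    return kernel * c_in * c_out


def separable_conv1d_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights of a depthwise-separable 1-D convolution (biases ignored):
    a depthwise conv (one kernel per input channel) followed by a
    pointwise 1x1 conv that mixes channels."""
    depthwise = kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: kernel 11 over 256 channels.
    k, c = 11, 256
    full = conv1d_params(k, c, c)
    sep = separable_conv1d_params(k, c, c)
    print(full, sep, round(full / sep, 2))  # 720896 68352 10.55
```

With stride 1, the multiply-adds per output frame scale with the same two expressions, so the compute saving is of the same order as the parameter saving.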


  • Screw CV: a very cool ontology project to detect and classify screws and label them with SKUs, plus a cool extension of the DICE metric for semseg;
  • Really cool: a semi-supervised approach beats a purely supervised one;
  • Tesla fires 10% of its self-driving car team;
  • CNN encoded in … glass?
  • A new lightweight workhorse network: MixNet?
  • Satellite road mapping hits the mainstream;
  • Greedy layer-wise training can scale to ImageNet;
  • Tesla’s presentation on fleet learning:
    • Shared architectures
    • Oversampling
    • Oversampling across tasks
    • Sparse supervision
  • Fixing the train-test resolution discrepancy. Cool old ideas, now from FAIR:
    • “that increasing the size of the crops used at test time compensates for randomly sampling the RoCs at training time”;
    • Train in a weakly semi-supervised fashion on publicly available data;
    • Use smaller images when training: training is 2x faster;
    • Fine-tune a bit (the last N layers and batch-norm) at the test resolution: gives 1-2pp;
    • RandomResizedCrop adds 3-5pp versus RandomCrop when training at 1/2 of the test resolution;
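The quoted train-test argument can be checked with a back-of-envelope calculation of apparent object size. This is a simplified sketch, not FAIR's code: it assumes square images, ignores the aspect-ratio jitter and crop rejection/fallback logic of real RandomResizedCrop implementations, and uses torchvision's default area range (0.08, 1.0) plus the common 224/256 = 0.875 test resize ratio. All function names are hypothetical:

```python
import math


def mean_inv_sqrt_uniform(a: float, b: float) -> float:
    """E[s ** -0.5] for s ~ Uniform(a, b), 0 < a < b (closed form)."""
    return 2.0 * (math.sqrt(b) - math.sqrt(a)) / (b - a)


def train_test_size_ratio(train_res: int, test_res: float,
                          resize_ratio: float = 0.875,
                          scale: tuple = (0.08, 1.0)) -> float:
    """Average apparent object size at train time divided by that at test time.

    Simplifying assumptions (for illustration only):
      * square images of side S and square crops;
      * the training crop covers a fraction s ~ Uniform(scale) of the image
        area, so its side is sqrt(s) * S, and it is resized to train_res;
      * at test time the image is resized to test_res / resize_ratio and
        center-cropped to test_res.
    Magnification = output pixels per original pixel; S cancels in the ratio.
    """
    train_mag = train_res * mean_inv_sqrt_uniform(*scale)  # times 1/S
    test_mag = test_res / resize_ratio                       # times 1/S
    return train_mag / test_mag


def matching_test_res(train_res: int, resize_ratio: float = 0.875,
                      scale: tuple = (0.08, 1.0)) -> float:
    """Test resolution at which the average apparent sizes match (ratio == 1)."""
    return train_res * mean_inv_sqrt_uniform(*scale) * resize_ratio


if __name__ == "__main__":
    print(round(train_test_size_ratio(224, 224), 3))  # 1.364
    print(round(matching_test_res(224), 1))           # 305.6
```

Under these assumptions, objects look about 1.36x larger during training than at a same-resolution center-crop test, so a larger test crop closes the gap on average. In practice the network's activation statistics also shift with resolution, which is what the fine-tuning of the last layers and batch-norm at the test resolution addresses.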

ML in general

Code / python