- New universal sentence transformer by Google;
- Evolved transformer;
- New C++ text tokenizer;
- “More data & compute = SOTA” is NOT research news.
- XLNet vs BERT;
- A BERT embeddings use case: a plain recommendation service;
- Interpreting BERT embeddings as a tree: a cool new idea?
- SincNet: yet another learnable front-end for ASR, with code and an explanation video;
- Using generated speech as annotation in a Tacotron-like network;
- Separable convolutions + BPE for STT;
- Screw CV: a very cool ontology project to detect, classify and match screws to their SKUs; also a cool extension of the Dice metric for semantic segmentation;
- Really cool: a semi-supervised approach beats purely supervised training;
- Tesla fires 10% of its self-driving car team;
- CNN encoded in … glass?
- A new lightweight workhorse network: MixNet?
- Satellite road mapping hits the mainstream;
- Greedy layer-wise training can scale to ImageNet;
- Tesla’s presentation on fleet learning:
  - Shared architectures
  - Oversampling across tasks
  - Sparse supervision
- Fixing the train-test resolution discrepancy. Cool old ideas, now from FAIR:
  - “that increasing the size of the crops used at test time compensates for randomly sampling the RoCs at training time”;
  - Train in a weakly supervised fashion on publicly available data;
  - Train on smaller images, which makes training 2x faster;
  - Briefly fine-tune the last N layers and batch norm at the test resolution: gives +1-2pp;
  - RandomResizedCrop adds 3-5pp over RandomCrop when training at 1/2 of the test resolution;
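Why separable convolutions help the STT item above: a back-of-envelope parameter count. This sketch assumes "separable" means depthwise separable 1D convolutions over the time axis (a per-channel k-tap filter followed by a 1x1 pointwise mix); the channel count and kernel size are illustrative, not taken from the linked work.

```python
def conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard 1D convolution (bias omitted)."""
    return c_in * c_out * k


def separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise conv (one k-tap filter per input channel)
    followed by a pointwise 1x1 conv that mixes channels (biases omitted)."""
    depthwise = c_in * k
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only: 256 channels in and out, kernel spanning 11 time steps.
    full = conv1d_params(256, 256, 11)
    sep = separable_conv1d_params(256, 256, 11)
    print(full, sep, round(full / sep, 1))  # 720896 68352 10.5
```

With wide kernels the pointwise term dominates, so the saving grows roughly linearly with the kernel size.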
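For reference on the Screw CV item: the standard Dice coefficient, 2|A∩B| / (|A| + |B|), for a pair of binary masks. This is only the plain baseline metric; the project's own extension is not described here, and the flat 0/1 list input and the `eps` smoothing term are assumptions of this sketch.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-7) -> float:
    """Plain Dice coefficient for binary masks given as flat 0/1 sequences.

    eps keeps the score defined (and equal to 1.0) when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    # One overlapping pixel, three foreground pixels in total -> 2/3.
    print(round(dice([1, 1, 0, 0], [1, 0, 0, 0]), 4))  # 0.6667
```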
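A rough model of the FAIR train-test resolution point: RandomResizedCrop zooms objects in at training time, so objects look larger during training than under the usual resize-and-center-crop test transform. The sketch assumes square images, ignores aspect-ratio jitter, and takes the crop's area fraction s as uniform on (0.08, 1.0), the torchvision default `scale` range. It is a simplified back-of-envelope estimate, not the paper's derivation or its reported optimal test resolution.

```python
import math


def expected_train_zoom(min_scale: float = 0.08, max_scale: float = 1.0) -> float:
    """E[1/sqrt(s)] for an area fraction s ~ Uniform(min_scale, max_scale).

    A crop covering fraction s of a square image has side sqrt(s) * W; resizing
    it to the training resolution magnifies objects by 1/sqrt(s) compared with
    resizing the whole image to that same resolution.
    """
    # The integral of s**-0.5 is 2*sqrt(s); divide by the interval length for the mean.
    integral = 2 * (math.sqrt(max_scale) - math.sqrt(min_scale))
    return integral / (max_scale - min_scale)


def matching_test_resize(train_res: int, zoom: float) -> int:
    """Short-side resize at test time that reproduces the average training zoom
    (a center crop taken afterwards does not change object size)."""
    return round(train_res * zoom)


if __name__ == "__main__":
    zoom = expected_train_zoom()
    print(round(zoom, 3))                   # 1.559: average zoom at train time
    print(round(256 / 224, 3))              # 1.143: zoom of the usual resize-256 / crop-224 test
    print(matching_test_resize(224, zoom))  # 349
```

Under this model, objects appear about 1.56x larger in training crops but only about 1.14x larger at test time, which is why raising the test resolution (and then fine-tuning batch norm and the last layers) helps.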
ML in general
Code / Python