Sound
- Unsupervised pitch estimation. Key idea - you only need relative pitch, and you can supervise it with synthetic pitch shifts of the same audio (a rough sketch follows this bullet)
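A minimal PyTorch sketch of that idea, assuming CQT-frame inputs and a scalar pitch head (the `PitchHead` module, shapes, and hyperparameters are illustrative assumptions, not the paper's architecture): shift a frame by a known number of bins and train the model so the difference between its two predictions matches the known shift.

```python
import torch
import torch.nn as nn

class PitchHead(nn.Module):
    """Illustrative stand-in: maps one CQT frame to a scalar pitch in [0, 1]."""
    def __init__(self, n_bins=190):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def relative_pitch_loss(model, frames, max_shift=12):
    """Pitch-shift each frame by a random, known number of CQT bins and
    require the predicted pitch difference to match that shift."""
    k = torch.randint(-max_shift, max_shift + 1, (frames.size(0),))
    shifted = torch.stack([torch.roll(f, int(s)) for f, s in zip(frames, k)])
    p1, p2 = model(frames), model(shifted)
    # Known shift in bins, rescaled to the model's [0, 1] output range.
    target = k.float() / frames.size(1)
    return ((p2 - p1) - target).pow(2).mean()

model = PitchHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
frames = torch.randn(32, 190)  # stand-in for a batch of CQT frames
opt.zero_grad()
loss = relative_pitch_loss(model, frames)
loss.backward()
opt.step()
```

No pitch labels appear anywhere: the synthetic shift itself is the supervision signal.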
CV
- OCR at Yandex. A legit article
- 87.4% top-1 accuracy on ImageNet with Noisy Student self-training:
- Beats the previous SOTA, which required 3.5B weakly labeled Instagram images
- (i) train an EfficientNet model on labeled ImageNet images
- (ii) use it as a teacher to generate pseudo labels for 300M unlabeled images
- (iii) train a larger EfficientNet as a student model on the combination of labeled and pseudo-labeled images
- (iv) iterate this process by making the student the new teacher (a toy version of the loop follows this list)
- During pseudo-label generation the teacher is not noised, so the pseudo labels are as accurate as possible; the student is trained with noise
- 480M params
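A runnable toy version of the loop above, with tiny MLPs standing in for EfficientNets and random tensors standing in for the data; the paper uses soft pseudo labels and stronger student noise (RandAugment, dropout, stochastic depth), which hard labels and input jitter merely approximate here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_model(width):
    # Tiny MLP classifier; a stand-in for an EfficientNet.
    return nn.Sequential(nn.Linear(32, width), nn.ReLU(), nn.Linear(width, 10))

def train(model, xs, ys, noised, steps=100):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(steps):
        # Input jitter stands in for the paper's richer noise.
        x = xs + 0.1 * torch.randn_like(xs) if noised else xs
        loss = F.cross_entropy(model(x), ys)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

x_l, y_l = torch.randn(256, 32), torch.randint(0, 10, (256,))
x_u = torch.randn(1024, 32)                      # "unlabeled" pool

teacher = train(make_model(64), x_l, y_l, noised=False)     # (i)
with torch.no_grad():                                       # (ii) teacher not noised
    y_u = teacher(x_u).argmax(dim=1)                        # hard pseudo labels
student = train(make_model(128),                            # (iii) larger student, noised
                torch.cat([x_l, x_u]), torch.cat([y_l, y_u]), noised=True)
teacher = student                                           # (iv) iterate
```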
ML in general
Top trends from ICLR
Google uses RL in recommendations
Open Images 2019 solutions
Why you should not get a PhD
AI circus, end of 2019 update
Training pose estimation on radio signals vs. images
Differentiable Convex Optimization Layers
Rigging the Lottery: Making All Tickets Winners:
- Train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy
- Method updates the topology of the network during training by using parameter magnitudes and infrequent gradient calculations
- Deficiencies of current methods
- You have to train a large dense network first, so you are limited by its size
- It is inefficient - a lot of zeros
- Pruning does not reduce the FLOPs required to execute the model during training
- Lottery Ticket Hypothesis - if we can find a sparse neural network with iterative pruning, then we can train that sparse network from scratch, to the same level of accuracy, by starting from the original initial conditions
- The Rigged Lottery method:
- Memory efficient / computationally efficient / accurate
- Infrequently using instantaneous gradient information to inform a re-wiring of the network (a sketch of one update follows)
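A rough sketch of a single RigL connectivity update on one weight matrix, assuming a binary mask per layer: drop the lowest-magnitude active weights, grow the inactive connections with the largest dense-gradient magnitude. The `rigl_update` helper is illustrative and simplified; the paper additionally anneals the drop fraction and runs these updates only every few hundred steps, which is why the gradient cost stays negligible.

```python
import torch

def rigl_update(weight, grad, mask, drop_frac=0.3):
    n_update = int(drop_frac * mask.sum())
    # Drop: among active weights, remove the smallest magnitudes.
    active = weight.abs() * mask
    active[mask == 0] = float("inf")       # inactive slots are never "dropped"
    drop_idx = torch.topk(active.view(-1), n_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0
    # Grow: among inactive weights, enable the largest gradient magnitudes.
    inactive_grad = grad.abs() * (1 - mask)
    grow_idx = torch.topk(inactive_grad.view(-1), n_update).indices
    mask.view(-1)[grow_idx] = 1.0
    weight.view(-1)[grow_idx] = 0.0        # new connections start at zero
    return mask

w = torch.randn(64, 64)
g = torch.randn(64, 64)                    # instantaneous dense gradient
m = (torch.rand(64, 64) < 0.1).float()     # 90% sparse initial topology
m = rigl_update(w, g, m)                   # parameter count stays fixed
```

The drop and grow counts are equal, so sparsity (and hence memory and inference FLOPs) stays constant throughout training.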
Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition
- IdleBlock, which naturally prunes connections within the block
- Architecture = the design of a normal block and a reduction block
- ResNet repeats a Bottleneck block, ShuffleNet repeats a ShuffleBlock, MobileNet v2/v3 and EfficientNet monotonically repeat an Inverted Residual Block (MBBlock), NASNet repeats a Normal Cell, and FBNet repeats a variant of MBBlock with different hyper-parameters
- In the Idle design, a subspace of the input is not transformed
- Given an input tensor x with C channels and an idle factor α ∈ (0, 1) (the pruning ratio), the tensor is sliced into two branches:
- Active branch x1 with C · (1 − α) channels
- Idle branch with C · α channels
- Output tensor contains C · α channels copied directly
- Key distinction of hybrid composition is the enhanced receptive field of the stacked output
- The theoretical computation cost of one MBBlock is roughly equal to the cost of two IdleBlocks (see the sketch below)
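A minimal sketch of the Idle design, with a plain conv standing in for the block's real transform (the paper wraps an MBBlock, not a single conv): a fraction α of the input channels bypasses the block untouched, so copying them costs nothing.

```python
import torch
import torch.nn as nn

class IdleBlock(nn.Module):
    """Illustrative Idle design: transform C*(1-alpha) channels, copy C*alpha."""
    def __init__(self, channels, alpha=0.5):
        super().__init__()
        self.n_idle = int(channels * alpha)
        active = channels - self.n_idle
        self.transform = nn.Conv2d(active, active, 3, padding=1)

    def forward(self, x):
        # Active branch x1: C*(1-alpha) channels go through the transform.
        # Idle branch x2: C*alpha channels are copied to the output directly.
        x_active, x_idle = torch.split(
            x, [x.size(1) - self.n_idle, self.n_idle], dim=1)
        return torch.cat([self.transform(x_active), x_idle], dim=1)

block = IdleBlock(channels=64, alpha=0.5)
out = block(torch.randn(1, 64, 32, 32))    # output keeps all 64 channels
```

With α = 0.5 the transform sees half the channels, which is where the "one MBBlock ≈ two IdleBlocks" cost comparison comes from.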
Datasets
- Violence detection dataset
- Objects in the wild - 365 categories, 600k images
- Open Images is now at v5, 600 categories, millions of images, a lot of annotation types
- Groq [announces](https://groq.com/groq-announces-worlds-first-architecture-capable-of-petaops-on-a-single-chip/) the world’s first architecture capable of 10^15 operations per second on a single chip
NLP
- A visual guide to BERT
- New transformers released by HuggingFace (ALBERT / CamemBERT / DistilRoBERTa / GPT-2 XL)