2019 DS/ML digest 20

Posted by snakers41 on December 23, 2019


  • Unsupervised pitch estimation. Key idea - you only need relative pitch, and you can train on synthetic data


  • OCR in Yandex. A legit article
  • 87.4% top-1 accuracy on ImageNet:
    • Beats the previous SOTA, which required 3.5B weakly labeled Instagram images
    • (i) train an EfficientNet model on labeled ImageNet images
    • (ii) use it as a teacher to generate pseudo labels on 300M unlabeled images
    • (iii) train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images
    • (iv) iterate this process by putting back the student as the teacher
    • During the generation of the pseudo labels, the teacher is not noised so that the pseudo labels are as good as possible
    • 480M params
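The iterative teacher-student loop in steps (i)-(iv) can be sketched as follows; `train()`, `pseudo_label()` and the toy dict-based "models" are hypothetical stand-ins for the real EfficientNet training, not the authors' code:

```python
# Sketch of the Noisy Student self-training loop.
# A "model" here is just a memorized label lookup, purely for illustration.

def train(labeled, noise=True):
    """Stand-in for training a (noised) student on (image, label) pairs."""
    return dict(labeled)

def pseudo_label(teacher, unlabeled):
    """Teacher predicts labels on unlabeled data; the teacher is NOT noised."""
    return [(x, teacher.get(x, 0)) for x in unlabeled]

def noisy_student(labeled, unlabeled, iterations=3):
    teacher = train(labeled, noise=False)          # (i) initial teacher
    for _ in range(iterations):                    # (iv) iterate
        pseudo = pseudo_label(teacher, unlabeled)  # (ii) pseudo labels
        student = train(labeled + pseudo)          # (iii) noised student on both
        teacher = student                          # student becomes the new teacher
    return teacher

model = noisy_student([("a", 1), ("b", 2)], ["a", "b", "c"])
```

The asymmetry is the point: noise (dropout, augmentation, stochastic depth in the paper) is applied only to the student, so the pseudo labels stay clean while the student is forced to generalize beyond them.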

ML in general

  • Pixel phones now can make long exposure night sky photos

  • Top trends from ICLR

  • Google uses RL in recommendations

  • Open Images 2019 solutions

  • Why you should not get a PhD

  • AI circus: end-of-2019 update

  • Training pose estimation on radio signals vs. images

  • Differentiable Convex Optimization Layers

  • Rigging the Lottery: Making All Tickets Winners:

    • Train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy
    • Method updates the topology of the network during training by using parameter magnitudes and infrequent gradient calculations
    • Deficiencies of current methods
      • You have to train a large network first, so you are limited by it
      • It is inefficient - a lot of zeros
    • Does not change the FLOPs required to execute the model during training
    • Lottery Ticket Hypothesis - if we can find a sparse neural network with iterative pruning, then we can train that sparse network from scratch, to the same level of accuracy, by starting from the original initial conditions
    • The Rigged Lottery method:
      • Memory efficient / computationally efficient / accurate
      • Infrequently using instantaneous gradient information to inform a re-wiring of the network
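The drop-and-grow update described above (drop the smallest-magnitude active weights, grow inactive connections with the largest instantaneous gradients, parameter count unchanged) can be sketched like this; `drop_grow_update` and the dense-matrix representation are my assumptions for illustration, not the paper's implementation:

```python
# Minimal sketch of one RigL topology update on a single layer.
import numpy as np

def drop_grow_update(weights, mask, grads, k):
    """Drop the k active weights with smallest magnitude and grow the k
    inactive connections with largest gradient magnitude, so the number
    of active parameters stays fixed."""
    w = np.where(mask, np.abs(weights), np.inf)   # only active weights can be dropped
    drop = np.argsort(w, axis=None)[:k]           # smallest-magnitude active weights
    g = np.where(mask, -np.inf, np.abs(grads))    # only inactive connections can grow
    grow = np.argsort(g, axis=None)[-k:]          # largest-gradient inactive connections

    new_mask = mask.copy().ravel()
    new_mask[drop] = False
    new_mask[grow] = True
    new_weights = weights.copy().ravel()
    new_weights[grow] = 0.0                       # grown connections start at zero
    return new_weights.reshape(weights.shape), new_mask.reshape(mask.shape)

w = np.array([[1.0, 0.5], [0.2, 3.0]])
m = np.array([[True, True], [False, True]])       # (1, 0) is inactive
g = np.array([[0.0, 0.0], [2.0, 0.0]])
nw, nm = drop_grow_update(w, m, g, k=1)
```

Because the dense gradient is only needed at these infrequent update steps, the per-step training cost stays that of the sparse network.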
  • Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition

    • IdleBlock, which naturally prunes connections within the block
    • Architecture = the design of a normal block and a reduction block
    • ResNet repeats a Bottleneck block, ShuffleNet repeats a ShuffleBlock, MobileNet v2/v3 and EfficientNet monotonically repeat an Inverted Residual Block (MBBlock), NASNet repeats a Normal Cell, and FBNet repeats a variant of MBBlock with different hyper-parameters
    • In the Idle design, a subspace of the input is not transformed
    • Given an input tensor x with C channels and an idle (pruning) factor α ∈ (0, 1), the tensor is sliced into two branches:
      • Active branch x1 with C · (1 − α) channels
      • Idle branch with C · α channels
      • The idle branch's C · α channels are copied directly into the output tensor
    • The key distinction of hybrid composition is the enhanced receptive field of the stacked output
    • The theoretical computation cost of one MBBlock is roughly equal to the cost of two IdleBlocks
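The channel split above can be sketched in a few lines; the NCHW numpy tensors and the `transform` placeholder are assumptions for illustration (in the real block the active branch goes through an MBBlock-style transform):

```python
# Minimal sketch of the Idle design: transform only (1 - alpha) of the
# channels, pass the idle alpha fraction through untouched.
import numpy as np

def idle_block(x, alpha, transform):
    """Split C channels into an active and an idle branch, transform the
    active branch, and concatenate the idle channels back unchanged."""
    c = x.shape[1]                                 # NCHW layout assumed
    c_active = int(round(c * (1 - alpha)))
    active, idle = x[:, :c_active], x[:, c_active:]
    return np.concatenate([transform(active), idle], axis=1)

x = np.ones((1, 8, 4, 4))
out = idle_block(x, alpha=0.5, transform=lambda t: t * 2)
```

Since the idle branch costs nothing, the compute of the block scales with (1 - α), which is where the "one MBBlock ≈ two IdleBlocks" accounting comes from.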



  • A visual guide to BERT;
  • New transformers released by HuggingFace (ALBERT / CamemBERT / DistilRoBERTa / GPT-2 XL);