2019 DS/ML digest 18

Posted by snakers41 on November 25, 2019

Random shit

  • Subscribe to Ben Evans. No srsly.
  • "Quantum Supremacy Achieved" is an almost irresistible headline to print, but it will inevitably mislead the general public. Some explanations of what it actually means;

Speech / audio

NLP

ML

CV

  • Why ML in medicine does not work;
  • How JPEG loses data (see the compression-loss sketch after this list);
  • Very funny over-engineering: using a Jetson to detect cats - http://myplace.frontier.com/~r.bond/cats/cats.htm
  • Unified embedding for visual search at Pinterest. All the tasks share a common base network until the embedding is generated, and then things split off into task-specific branches. The task branches are fully connected layers (see the sketch after this list);
  • Ideas 3 and 5 from that write-up sound applicable to any CV task;
  • Once again stumbled upon this awesome paper by FAIR;
  • Dataset distillation paper and website;
  • Computing Receptive Fields of Convolutional Neural Networks (see the receptive field helper after this list);
  • The Visual Task Adaptation Benchmark;
  • Google releases pre-trained MobileNetV3;
  • CNNs are biased towards texture:
    • If you limit the receptive field (thereby making your CNN rely only on texture), performance does not drop drastically;
    • Stylized training data actually increases performance at test time, even though the test data is entirely unstylized;
    • Figure there: 5-shot 5-way miniImageNet test accuracy versus pre-training data composition when using test-time augmentation;
  • Unsupervised Pre-Training of Image Features on Non-Curated Data:
    • Key pain - academic datasets are curated and contain easy-to-separate clusters of data;
    • Everything revolves around modes of self-supervision - e.g. rotation prediction, some closeness function, clustering;
    • Labels are obtained via hierarchical k-means + rotations (see the pseudo-labeling sketch after this list);
    • Seems to beat vanilla approaches on miniImageNet;
    • They use a rather old network - VGG - though;
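
On the JPEG point above - a minimal sketch (assuming Pillow and numpy; any photo loaded via Image.open behaves the same as the synthetic image here) showing that a save/load round-trip is lossy, with pixel-level error growing as quality drops:

```python
import numpy as np
from io import BytesIO
from PIL import Image

# Synthetic RGB image; noise is the worst case for JPEG, which makes the loss obvious
rgb = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)
original = Image.fromarray(rgb)

for quality in (95, 75, 50, 10):
    buffer = BytesIO()
    # Lower quality -> coarser quantization of the DCT coefficients -> more loss
    original.save(buffer, format="JPEG", quality=quality)
    size = buffer.tell()
    buffer.seek(0)
    decoded = np.asarray(Image.open(buffer), dtype=np.float32)
    mse = np.mean((decoded - rgb.astype(np.float32)) ** 2)
    print(f"quality={quality:3d}  bytes={size:6d}  MSE={mse:8.2f}")
```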
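On the Pinterest unified embedding - a toy PyTorch sketch of the shared-trunk / task-branch layout they describe; the ResNet-18 trunk, embedding size, and head sizes are illustrative assumptions, not their actual architecture:

```python
import torch
import torch.nn as nn
from torchvision import models

class UnifiedEmbedding(nn.Module):
    """One shared base network produces the embedding; each task then gets
    its own fully connected branch on top of it. All sizes are made up."""
    def __init__(self, embedding_dim=256, num_classes_per_task=(1000, 500, 100)):
        super().__init__()
        trunk = models.resnet18(weights=None)  # any backbone would do
        trunk.fc = nn.Linear(trunk.fc.in_features, embedding_dim)
        self.trunk = trunk
        # One FC branch per task, all consuming the same embedding
        self.heads = nn.ModuleList(
            [nn.Linear(embedding_dim, n) for n in num_classes_per_task]
        )

    def forward(self, x):
        emb = self.trunk(x)                          # shared embedding
        logits = [head(emb) for head in self.heads]  # task-specific branches
        return emb, logits

model = UnifiedEmbedding()
emb, logits = model(torch.randn(2, 3, 224, 224))
print(emb.shape, [l.shape for l in logits])
```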
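For the receptive field article - a small helper (a sketch of the standard recurrence, not code from the article): per layer, the receptive field grows by (k - 1) * j, where k is the kernel size and j is the accumulated stride:

```python
def receptive_field(layers):
    """layers: iterable of (kernel_size, stride) per conv/pool layer.
    Returns (receptive_field_size, effective_stride) of the final layer,
    using r += (k - 1) * j; j *= s."""
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j

# Example: three 3x3 convs, the second with stride 2
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # -> (9, 2)
```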
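And a toy sketch of the pseudo-labeling idea from the non-curated pre-training paper: cluster features with k-means (flat here, hierarchical in the paper) and combine cluster IDs with rotation classes into joint targets. Uses scikit-learn; the feature matrix and all sizes are made up:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 128))  # stand-in for CNN features

# Cluster assignments act as pseudo-labels (the paper runs hierarchical
# k-means over far more images; plain k-means here for brevity)
cluster_ids = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(features)

# Each image also gets a rotation class in {0, 90, 180, 270} degrees
rotation_ids = rng.integers(0, 4, size=len(features))

# Joint target: one label per (rotation, cluster) pair
pseudo_labels = rotation_ids * 10 + cluster_ids
print(pseudo_labels[:10])
```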

Competitions

  • Dockerization in competitions on DD as well?