2018 DS/ML digest 26


Posted by snakers41 on October 15, 2018



Yet another list of GPU benchmarks:

  • One, two - beware: building a single GPU system with these does not make sense;

NLP / NLP Papers

While the relations word2vec captured had an intuitive and almost magical quality to them, later studies showed that there is nothing inherently special about word2vec: word embeddings can also be learned via matrix factorization (Pennington et al., 2014; Levy & Goldberg, 2014), and with proper tuning, classic matrix factorization approaches like SVD and LSA achieve similar results (Levy et al., 2015).
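To make the matrix-factorization point concrete, here is a tiny NumPy sketch of the SVD-over-PPMI recipe: count co-occurrences, compute positive PMI, take a truncated SVD. The corpus, window size and embedding dimension are toy assumptions, not anything from the cited papers.

```python
# Word embeddings via matrix factorization: PPMI co-occurrence matrix + truncated SVD.
import numpy as np

corpus = ["the cat sat on the mat", "the dog sat on the rug"]  # toy corpus
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a +/-2 token window
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# Positive PMI: max(log(p(w, c) / (p(w) p(c))), 0)
total = counts.sum()
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((counts / total) / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Truncated SVD -> dense word vectors (rows of U * sqrt(S))
u, s, _ = np.linalg.svd(ppmi)
dim = 2  # tiny embedding size on purpose
embeddings = u[:, :dim] * np.sqrt(s[:dim])
print({w: embeddings[idx[w]].round(2) for w in vocab})
```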

Semantic sentence embeddings for paraphrasing and text summarization

  • Link
  • Encoder-decoder model trained on sentence paraphrase pairs sourced from CV datasets;
  • All the datasets are in English, of course;
  • On a small benchmark dataset this performs better than off-the-shelf methods;

Semi-Supervised Sequence Modeling with Cross-View Training

  • Link;
  • On labeled examples standard supervised learning is used;
  • On unlabeled examples - CVT teaches auxiliary prediction modules that see restricted views of the input (only part of a sentence) to match the predictions of the full model seeing the whole input (a minimal sketch of this consistency loss follows after this list);
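To get a feel for the unlabeled-data part, here is a minimal PyTorch sketch of that consistency loss: the full model's softmax output is treated as a fixed soft target, and each auxiliary module (seeing only a restricted view) is trained to match it. Shapes and names are my assumptions for a token-tagging setup, not the paper's code.

```python
import torch
import torch.nn.functional as F

def cvt_unlabeled_loss(full_logits, aux_logits_list):
    """full_logits: (batch, seq, n_tags) from the primary model on the full input.
    aux_logits_list: same-shaped logits from modules that see restricted views."""
    # Primary predictions act as soft targets; no gradient flows through them.
    target = F.softmax(full_logits, dim=-1).detach()
    loss = 0.0
    for aux_logits in aux_logits_list:
        # KL(target || aux), averaged over the batch
        loss = loss + F.kl_div(
            F.log_softmax(aux_logits, dim=-1), target, reduction="batchmean"
        )
    return loss / len(aux_logits_list)

# Toy usage with random logits standing in for real encoder outputs
full = torch.randn(4, 10, 5)
aux_forward_only = torch.randn(4, 10, 5, requires_grad=True)
aux_backward_only = torch.randn(4, 10, 5, requires_grad=True)
print(cvt_unlabeled_loss(full, [aux_forward_only, aux_backward_only]))
```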

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  • Link;
  • Bidirectional Encoder Representations from Transformers;
  • Essentially this is: Transformer + bi-directional + language modelling = SOTA;
  • Claimed link (goo.gl/language/bert) not yet active;
  • BERT Transformer uses bidirectional self-attention, while the GPT Transformer uses constrained self-attention where every token can only attend to context to its left;
  • All in all looks overly complicated for down-to-earth applications;
  • Masking pre-training schemes are cool, but also overly complicated (a toy sketch of the masked-LM corruption follows after this list);
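The masked-LM corruption itself is easy to sketch: pick ~15% of positions; of those, replace 80% with [MASK], 10% with a random token, and leave 10% unchanged. The split follows the paper's description; the token ids below are made-up toy values.

```python
import numpy as np

def mask_for_mlm(token_ids, mask_id, vocab_size, rng, p_select=0.15):
    token_ids = np.array(token_ids)
    corrupted = token_ids.copy()
    labels = np.full_like(token_ids, -1)   # -1 = position not predicted
    for i, tok in enumerate(token_ids):
        if rng.random() < p_select:
            labels[i] = tok                # the model must recover the original token
            r = rng.random()
            if r < 0.8:
                corrupted[i] = mask_id     # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = rng.integers(vocab_size)  # 10%: random token
            # else: 10% keep the original token
    return corrupted, labels

rng = np.random.default_rng(0)
print(mask_for_mlm([12, 7, 42, 3, 99, 15], mask_id=103, vocab_size=30000, rng=rng))
```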

Zero-Shot Style Transfer in Text Using Recurrent Neural Networks

  • Link;
  • Creates paraphrases written in the style of another existing text;
  • Uses thirty-two stylistically distinct versions of the Bible (Old and New Testaments);
  • In zero-shot translation, the system must translate from one language to another even though it has never seen a translation between the particular language pair;
  • Multi-layer recurrent neural network encoder and multi-layer recurrent network with attention for decoding;
  • Essentially a seq2seq model for paraphrasing; no details about the actual forward / backward pass mechanics (a generic sketch of such a model follows after this list);
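For reference, a generic multi-layer RNN encoder plus attention decoder of the kind described above looks roughly like this in PyTorch. The hyperparameters and the target-style token trick (borrowed from zero-shot NMT, where a token tells the model which target to produce) are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class Seq2SeqParaphraser(nn.Module):
    def __init__(self, vocab_size, emb=256, hidden=512, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.GRU(emb + hidden, hidden, num_layers=layers, batch_first=True)
        self.attn = nn.Linear(hidden, hidden)   # bilinear-style attention scores
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        # src starts with a target-style token (e.g. a "<KJV>" id) so one model
        # can be steered toward any of the styles seen in training.
        enc_states, enc_hidden = self.encoder(self.embed(src))            # (B, S, H)
        dec_in = torch.cat(
            [self.embed(tgt),
             enc_states[:, -1:].expand(-1, tgt.size(1), -1)], dim=-1)
        dec_states, _ = self.decoder(dec_in, enc_hidden)                   # (B, T, H)
        # Attention: each decoder step attends over all encoder states
        scores = torch.bmm(self.attn(dec_states), enc_states.transpose(1, 2))
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_states)     # (B, T, H)
        return self.out(dec_states + context)                              # logits

model = Seq2SeqParaphraser(vocab_size=10000)
src = torch.randint(0, 10000, (4, 12))   # style token + source sentence ids
tgt = torch.randint(0, 10000, (4, 15))   # shifted target sentence ids
print(model(src, tgt).shape)             # torch.Size([4, 15, 10000])
```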

New super cool competitions:

  • Astronomical dataset competition - the dataset itself is very cool, but as a competition it is of little interest;
  • ISS RFID tracking challenge - very challenging and interesting;
  • Human protein multi-class - huge full-resolution original dataset;

Jobs / market

  • Head of DS job at Ostrovok;
  • Yet another piece of toxic BS from large companies;

Articles / papers / libraries:

  • nvtop - htop for GPUs;
  • Running back-prop on mobile devices;
  • Google + DL + cancer - the reality of using DL in assisting diagnostics: “Algorithm-assisted pathologists demonstrated higher accuracy than either the algorithm or the pathologist alone”;
  • NMT (question reformulation) + BiDAF + some classifier = better QA system;
  • Google confirms building search engine for China;
  • CNN benchmark for mobile phones and the CNNs behind it; looks really reasonable;
  • Building a book semantic search engine;