2018 DS/ML digest 31

Posted by snakers41 on December 9, 2018


Highlights of the week

  • PyTorch 1.0 released:
    • Deploy support;
    • Torch hub;
    • Overhaul of distributed data parallel;
    • No docs on running on actual mobile devices;
  • BERT illustrated. The article is amazing, but:
    • No real feedback yet from people who have actually used it;
    • ELMo is published via AllenNLP, which is a no-go for normal people (non-researchers);
    • The Transformer trains 10x longer on the same task;
    • Of course, Google shares its models / code / pre-trained weights;
    • Poor support for languages other than English;
  • Painting with GANs:
    • Neurons in GAN generators understand composition and correlations between objects;

Rule-based text parsing for Russian

Articles / posts

  • RU - information etiquette;
  • Amazing GAN drawer, code;
  • Important papers in 2018 short review;
  • Video compression by CNNs beats traditional codecs … except it does not - read the comments;
  • How Google builds its depth estimation networks for Pixel;
  • The state of tech presentation by Ben Evans;
  • Ben Evans Newsletter;
  • Google employees against project Dragonfly;
  • It is better to perform a minimal amount of computation on each individual training example (thus processing more of them) rather than performing extensive computation on a smaller amount of data;
  • TF based library for ranking;
  • Immigration to Australia;


  • Novel alternatives to Triplet Loss for contrastive embedding training

    • ArcFace:
      • Assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in an angular space, and penalises the angle between each deep feature and its corresponding weight with an additive angular margin;
    • Contrastive center loss:
      • It learns a class center for each class;
      • It considers the intra-class compactness and inter-class separability simultaneously by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers;
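
The ArcFace idea above can be sketched in a few lines of PyTorch: compute cosine similarities between L2-normalised features and the class-centre weights, add an angular margin to the ground-truth class angle, then scale. This is a minimal sketch, not the paper's full implementation; `arcface_logits` and the defaults `s=64.0`, `m=0.5` are illustrative.

```python
import torch
import torch.nn.functional as F

def arcface_logits(features, weight, labels, s=64.0, m=0.5):
    """ArcFace-style logits (illustrative sketch): cosine similarity with
    an additive angular margin m on the target class, scaled by s.
    Rows of `weight` act as class centres in angular space."""
    # Normalise features and class centres -> cosine similarities
    cos = F.linear(F.normalize(features), F.normalize(weight))
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # Add the margin only to the ground-truth class angle
    target = F.one_hot(labels, num_classes=weight.size(0)).bool()
    theta = torch.where(target, theta + m, theta)
    return s * torch.cos(theta)

# Usage: plug the logits into plain cross-entropy
feats = torch.randn(8, 128)
w = torch.randn(10, 128)            # 10 classes, 128-d embeddings
y = torch.randint(0, 10, (8,))
loss = F.cross_entropy(arcface_logits(feats, w, y), y)
```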
  • You do not need RNNs for sequence learning;

    • Model Code;
    • A simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets;
    • Common association between sequence modeling and recurrent networks should be reconsidered;
    • Temporal convolutional network (TCN):
      • Convolutions in the architecture are causal, meaning that there is no information “leakage” from future to past;
      • The architecture can take a sequence of any length and map it to an output sequence of the same length, just as with an RNN;
      • Very long effective history sizes possible;
      • TCN = 1D FCN + causal convolutions + skip connections + dilated convs;
    • Pros:
      • Faster and more memory efficient;
      • Fewer problems with gradients;
      • Longer memory;
    • Cons:
      • Larger memory usage than RNNs at inference;
      • Receptive field tweaking required for each domain;
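
The causal-convolution constraint above (no leakage from future to past) boils down to left-only padding. A minimal sketch in PyTorch, assuming a hypothetical `CausalConv1d` helper rather than the authors' code:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution with left-only padding, so the output at time t
    never depends on inputs after t (no future leakage)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # pad the past only
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))    # left padding, no right padding
        return self.conv(x)

# Stacking dilations 1, 2, 4, ... grows the receptive field exponentially
tcn = nn.Sequential(
    CausalConv1d(16, 32, kernel_size=3, dilation=1), nn.ReLU(),
    CausalConv1d(32, 32, kernel_size=3, dilation=2), nn.ReLU(),
    CausalConv1d(32, 32, kernel_size=3, dilation=4),
)
x = torch.randn(4, 16, 100)   # any sequence length works
out = tcn(x)                  # same time dimension as the input
```

Because each layer preserves the time dimension, the stack maps a sequence of any length to an output of the same length, just as the bullet points describe.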
  • Trellis Networks for Sequence Modeling - paper

    • Temporal convolutional network with special structure, characterized by weight tying across depth and direct injection of the input into deep layers;
    • Code;
    • Weights are shared not only by all time steps but also by all network layers, tying them into a regular trellis pattern;
    • Input is injected into all network layers. That is, the input at a given time-step is provided not only to the first layer, but directly to all layers in the network;
    • Trellis networks generalize truncated recurrent networks (recurrent networks with bounded memory horizon);
    • Bridge between recurrent and convolutional architectures;
    • TrellisNet is a special kind of temporal convolutional network;
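
The two key points above (one set of weights shared across all layers, and the raw input injected into every layer) can be illustrated with a toy module. This is an assumed simplification; the real TrellisNet uses gated LSTM-style activations, and `TinyTrellis` is a hypothetical name:

```python
import torch
import torch.nn as nn

class TinyTrellis(nn.Module):
    """Toy trellis-style network: one causal convolution whose weights
    are reused at every depth, with the raw input concatenated
    (injected) into every layer."""
    def __init__(self, in_ch, hid_ch, depth=4, kernel_size=2):
        super().__init__()
        self.depth = depth
        self.pad = kernel_size - 1
        # A single weight matrix shared across ALL layers (weight tying)
        self.conv = nn.Conv1d(in_ch + hid_ch, hid_ch, kernel_size)

    def forward(self, x):                     # x: (batch, in_ch, time)
        h = x.new_zeros(x.size(0), self.conv.out_channels, x.size(2))
        for _ in range(self.depth):           # the same conv at every depth
            z = torch.cat([x, h], dim=1)      # inject the input into each layer
            z = nn.functional.pad(z, (self.pad, 0))   # causal (left) padding
            h = torch.tanh(self.conv(z))
        return h

net = TinyTrellis(in_ch=8, hid_ch=16)
out = net(torch.randn(2, 8, 30))   # time dimension preserved: (2, 16, 30)
```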

TDS Picks

  • ResNet over ResNet (ROR) explained:
    • Shortcut connection across a group of Residual Blocks;
  • Writing a starter Jupyter extension;
  • ETH transaction clustering;
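
The ROR shortcut structure is easy to sketch: in addition to each block's own identity shortcut, an extra (level-2) shortcut spans the whole group of residual blocks. `Block` and `RoRGroup` below are hypothetical names, and the plain two-conv blocks are a simplification:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Plain residual block: relu(x + f(x))."""
    def __init__(self, ch):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.f(x))

class RoRGroup(nn.Module):
    """ROR idea: a group-level shortcut across several residual blocks,
    on top of each block's own shortcut."""
    def __init__(self, ch, n_blocks=3):
        super().__init__()
        self.blocks = nn.Sequential(*[Block(ch) for _ in range(n_blocks)])

    def forward(self, x):
        return torch.relu(x + self.blocks(x))   # shortcut across the group

group = RoRGroup(8)
out = group(torch.randn(1, 8, 16, 16))   # shape preserved
```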