2019 DS/ML digest 07

Posted by snakers41 on March 18, 2019

Normalization techniques other than batch norm:

Weight normalization (used in TCN, paper):

  • Decouples length of weight vectors from their direction;
  • Does not introduce any dependencies between the examples in a minibatch;
  • Can be applied successfully to recurrent models such as LSTMs;
  • Tested only on small-scale setups (CIFAR, VAEs, DQN);
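
As a rough illustration of the first bullet, the reparameterization can be sketched in NumPy as w = g * v / ||v||, where g carries the length and v the direction. The function name `weight_norm`, the `eps` term, and the shapes below are assumptions made for this example, not code from the paper.

```python
import numpy as np


def weight_norm(v, g, eps=1e-12):
    """Build weights whose row lengths are set by g and directions by v.

    v: (out_features, in_features) unnormalized direction parameters
    g: (out_features,) learned length of each weight vector
    eps is a hypothetical safeguard against division by zero.
    """
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / (norms + eps)


rng = np.random.default_rng(0)
v = rng.normal(size=(4, 3))
g = np.array([1.0, 2.0, 0.5, 3.0])

w = weight_norm(v, g)
# The length of every row of w is controlled by g alone.
row_norms = np.linalg.norm(w, axis=1)
```

Nothing here depends on other examples in a minibatch: the normalization acts on the parameters, not on the activations.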

Instance norm (used in style transfer):

  • Proposed for style transfer;
  • Essentially batch norm applied to a single image;
  • The mean and standard deviation are calculated per channel, separately for each object in a mini-batch;
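
A minimal NumPy sketch of the statistics described above, assuming image tensors laid out as (batch, channels, height, width) and leaving out the optional affine parameters. The name `instance_norm` and the `eps` value are illustrative choices.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) plane with its own statistics.

    x: array of shape (N, C, H, W)
    """
    # Reduce over the spatial axes only, so every image and every
    # channel gets a separate mean and variance.
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(2, 3, 8, 8))

out = instance_norm(x)
# Normalizing the first image alone gives the same result as
# normalizing it inside the batch: there is no cross-sample dependency.
first_alone = instance_norm(x[:1])
```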

Layer norm (used in Transformer, paper):

  • Designed especially for sequential networks;
  • Computes the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case;
  • The mean and standard deviation are calculated separately over a chosen number of trailing dimensions;
  • Unlike batch norm and instance norm, which apply a scalar scale and bias to each entire channel/plane with the affine option, layer norm applies a per-element scale and bias;
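
The points above can be sketched in NumPy as follows. This is an illustration under assumptions, not a reference implementation: the name `layer_norm` is hypothetical, and the number of trailing dimensions to normalize over is inferred from the shape of `gamma`, which gives the per-element scale and bias.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the trailing dimensions that gamma covers.

    x:     array whose trailing shape matches gamma.shape
    gamma: per-element scale, one value per normalized element
    beta:  per-element bias, same shape as gamma
    """
    axes = tuple(range(x.ndim - gamma.ndim, x.ndim))
    # Statistics come from a single training case (and time step),
    # never from other examples in the batch.
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=4.0, size=(2, 5, 8))  # (batch, steps, features)
gamma = np.ones(8)
beta = np.zeros(8)

out = layer_norm(x, gamma, beta)
# With a non-trivial bias, each feature is shifted by its own offset.
shifted = layer_norm(x, gamma, np.arange(8.0))
```

Because the statistics are computed per example, the same function works for a batch of size one and at every step of a recurrent model.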

Articles / posts: