2018 DS/ML digest 23

Posted by snakers41 on September 6, 2018

Market / hardware:

  • GPU purchase guide update 2018. TLDR:
    • Among new cards, the latest-generation GPUs are the most cost-effective;
    • But USED 1080Tis are the best overall option;
    • Specialized solutions (Titan, TPU, Tesla) are not competitive;

Papers / blog posts / releases:

  • Building a semantic search engine:
    • Mostly generic advice with links to popular solutions;
    • A fast search library in Python for production;
  • Google Open Images 2nd place solution - this is insanity:
    • 512 GPUs, Japanese guys using Chainer;
    • Co-occurrence loss - for bounding box proposals spatially close to ground-truth boxes carrying a subject-class annotation, the co-occurrence loss ignores all learning signals for classifying the part classes of that subject class;
    • Train models exclusively on rare classes and ensemble them with the rest of the models. We find this technique beneficial especially for the first 250 rarest classes, sorted by their occurrence count;
    • Feature Pyramid Network (FPN) with SE-ResNeXt-101 and SENet-154;
    • Extensive ablation tests;
  • Crazy vid2vid papers:
    • Everybody dance now:
      • Pose estimation + pix2pix + temporal smoothing;
    • Nvidia vid2vid:
  • Real-World perception for Embodied Agents - learning in simulation:
    • 572 full 3D scanned buildings / 211k m^2;
    • This is insanity!
  • Facebook’s unsupervised machine translation:
    • Equivalent to supervised approaches trained with nearly 100,000 reference translations;
    • Steps:
      • Learn word embeddings (vectorial representations of words) for every word in each language;
      • Learn a rotation of the word embeddings in one language to match the word embeddings in the other language, using a combination of various new and old techniques, such as adversarial training;
      • After the rotation, word translation is performed via nearest neighbor search;
      • Equipped with a language model and the word-by-word initialization, we can now build an early version of a translation system;
      • Treat these system translations (original sentence in Urdu, translation in English) as ground truth data to train an MT system in the opposite direction;
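The rotation step above (aligning one language's embedding space with another's) has a closed-form solution via orthogonal Procrustes once some word pairs are aligned; the adversarial training mentioned is what bootstraps that alignment without supervision. A minimal sketch of the Procrustes rotation and the nearest-neighbor word translation step, assuming toy numpy embeddings (all function names here are illustrative, not Facebook's actual code):

```python
import numpy as np

def learn_rotation(src_emb, tgt_emb):
    """Orthogonal Procrustes: the W minimising ||src_emb @ W.T - tgt_emb||_F.

    src_emb, tgt_emb: (n_words, dim) arrays of paired word embeddings.
    """
    u, _, vt = np.linalg.svd(tgt_emb.T @ src_emb)
    return u @ vt  # W is orthogonal: W @ W.T = I

def translate(word_idx, src_emb, tgt_emb, w):
    """Rotate a source word into the target space, return the nearest target word."""
    mapped = w @ src_emb[word_idx]
    sims = tgt_emb @ mapped  # cosine similarity if rows are L2-normalised
    return int(np.argmax(sims))
```

With a synthetic rotation this recovers the mapping exactly; the real pipeline then iterates, using the word-by-word translations plus a language model to seed the back-translation loop described in the last step.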
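The semantic search recipe earlier in this list boils down to the same primitive: embed documents and queries into one vector space, then retrieve by cosine similarity. A minimal sketch with numpy (names are placeholders; a production setup would swap the exact scan for an approximate nearest-neighbor library):

```python
import numpy as np

def build_index(doc_vectors):
    """L2-normalise document embeddings so a dot product equals cosine similarity."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    return doc_vectors / np.clip(norms, 1e-12, None)

def search(index, query_vector, top_k=3):
    """Return (indices, scores) of the top_k most similar documents."""
    q = query_vector / max(np.linalg.norm(query_vector), 1e-12)
    scores = index @ q
    top = np.argsort(-scores)[:top_k]
    return top, scores[top]
```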


  • Finally, proper guides and explanations about:
    • Asyncio / “callbacks” / concurrency / threading / multi-processing;
    • TLDR - for DS mostly people use multi-processing;
    • Explanation of different forms of concurrency;
    • Asyncio guide;
  • Yet another list of Python hacks (RU)

Just for lulz

Internet / tech

  • Apple buying AR companies
  • Tesla equals iPhone?
    • Making decent cars is only an entry ticket
    • Everyone will have cheap(er) batteries; Tesla only helped the shift
    • Assembly makes a difference; traditional cars (and OEMs) consist of a mesh of independent components
    • Complex cars with simple software => simple cars with complex software
    • Eliminating dealers from value chain
    • Tesla needs to solve autonomy with CV
  • Li battery prices
  • Snapchat MAU 188m
  • Instagram to launch a shopping app