Although I consider myself proficient in foreign languages, this is my first post on my website in English. Enjoy.
I personally like starting all my articles with a mysterious picture... but if you are reading this, you probably have a hunch what this is...
0. How it started
This article will be a bit different from the rest, because I cannot publish the code and all the details for personal reasons.
Without further ado: one day I woke up and found SSH server access in one of my Telegram chats. The machine I SSH-ed into had some peculiar details:
- It was a virtual machine inside a VMware container;
- It had a Titan X GPU passed through from the host machine via KVM-QEMU, which is kinda advanced stuff (for GPU benchmarks for deep learning go here);
- (It also had a second similar GPU, not passed through);
- It had ca. 1.5m random flat photos;
- I also found this paper in my Telegram chat;
Well, what can you do with 1.5m photos (obviously scraped from websites I am kinda familiar with: Russian flat booking / purchase websites) and a bleeding-edge GPU? Train neural networks, of course (or mine cryptocurrencies)! But for that you need annotation, and a clear purpose. The purpose is kinda obvious:
- Wall carpet recognition (a running joke in Russia);
- Architectural style detection;
- Finding similar objects / flats;
- Finding objects like TVs, fridges, etc;
There was no annotated subset of these pictures. So I decided to have fun with them and see what I could squeeze out of the dataset in a couple of hours using just plain-vanilla unsupervised learning methods and ready pre-trained neural networks.
First of all, I searched the Net for a moment and found a list of amazing articles and results:
- Major neural network architecture comparison;
- Latest neural networks accuracy normalized by parameter count;
- Famous VGG-16 architecture;
- Neural network architecture history;
- Keras built-in models;
- Siamese neural networks;
- Top models of the last 5-10 years in ImageNet competitions;
So, having no training dataset, the best thing we can do is take a couple of existing big pre-trained neural networks and play with their layers. Let's begin.
1. Preliminary investigation
First of all, let's count all the pictures we have (I will omit some steps as the owner of the machine asked me not to show any real details).
import os
import numpy as np
import pandas as pd

currentDir = os.getcwd()
images_path = '/mnt/dataset1/Images/'
pic_path = '/mnt/dataset1/Images/Download/'
pic_list = pd.read_csv(images_path + "downloadedImages.txt")
pic_list.shape
We get (1530210, 1) ~ 1.5m pictures. Not bad. I will not go into great detail about my analysis of where the pictures came from / their sizes / formats, etc. It's enough to say that there are ca. 10-15 top Russian flat boards, and the pictures are usually dull photos of Russian flats, mostly ~HD quality with mediocre lighting, in jpg and png file formats.
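If you want to reproduce such a format breakdown yourself, a minimal stdlib-only sketch could look like this (the filenames below are hypothetical stand-ins; in practice you would iterate over the values from downloadedImages.txt):

```python
import os
from collections import Counter

# hypothetical file list standing in for the real downloaded-images index
files = ['flat1.jpg', 'flat2.JPG', 'plan.png', 'view.jpeg']

# normalize extensions so 'JPG' and 'jpg' are counted together
ext_counts = Counter(os.path.splitext(f)[1].lower() for f in files)
print(ext_counts.most_common())  # -> [('.jpg', 2), ('.png', 1), ('.jpeg', 1)]
```

The same Counter pattern works for bucketing image dimensions or file sizes.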
Let's create a subset of pictures for our purposes.
pic_array = pic_list[pic_list['2_split'] == 'folder1'].raw.values
pic_array = np.random.permutation(pic_array)
pic_array_shuf = pic_array[0:10000]
pic_array_shuf.size
Because pictures are located on an external mounted volume, let's also copy them to our virtual drive with a simple list of commands:
from shutil import copyfile

for pic in log_progress(pic_array_shuf):
    orig_path = pic_path + pic
    copyfile(orig_path, test_path + pic.split('/')[-1])

%ls -ls $test_path | wc -l
Now we have 10,000 randomly selected pictures in our folder. Why 10,000? To keep all the calculations within a reasonable amount of time. This is only an investigation, after all.
2. Naive approach
So. We have 10,000 pictures and we do not know anything about them except their name, path and size. Let's steal a keras documentation example from here and do the following.
Oh, I have not mentioned that to do this within reasonable time, you need to set up keras and your GPU properly. My dependency list is small:
sudo passwd
sudo apt-get install tmux
sudo pip3 install jupyter_contrib_nbextensions
sudo pip3 install jupyter_nbextensions_configurator
sudo jupyter nbextensions_configurator enable --user
sudo pip3 install numpy
sudo pip3 install matplotlib
sudo pip3 install keras
sudo pip3 install tensorflow
sudo pip3 install sklearn
sudo apt install glances
cd ~/
cd flat-nn/
jupyter notebook --no-browser --port=8888 --ip=server-ip
But to run keras on the GPU properly you also need to set up the CUDA drivers and do some configs. I posted a compilation here on my telegram channel. Initially this config was taken from the fast.ai forums. Key parts of the config include (this is very system-specific, use with caution, and replace pip with pip3 if necessary):
# install and configure theano
pip install theano
echo "[global]
device = gpu
floatX = float32

[cuda]
root = /usr/local/cuda" > ~/.theanorc

# install and configure keras
pip install keras==1.2.2
echo '{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}' > ~/.keras/keras.json
# install cudnn libraries
wget "http://platform.ai/files/cudnn.tgz" -O "cudnn.tgz"
tar -zxf cudnn.tgz
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/
So, having installed everything, let's start:
# Extract features from an arbitrary intermediate layer with VGG19
from keras.applications.vgg19 import VGG19
from keras.preprocessing import image
from keras.applications.vgg19 import preprocess_input
from keras.models import Model
import numpy as np
If you did everything properly, you should receive some variation of this message:
Using gpu device 0: GeForce GTX TITAN X (CNMeM is disabled, cuDNN not available)
Let's see what is inside the VGG-19 model:
base_model = VGG19(weights='imagenet')
base_model.summary()
We can take the fc1 or fc2 layer and run some tests with it. These layers are supposed to contain high-level abstract features that recognize shapes, small objects, corners, text, eyes, etc.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv4 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv4 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv4 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 143,667,240
Trainable params: 143,667,240
Non-trainable params: 0
_________________________________________________________________
base_model = VGG19(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc2').output)

images_path = '/mnt/dataset1/Images/'
pic_path = '/mnt/dataset1/Images/Download/'
test_path = currentDir + '/flat_pics/'
imageSizeTuple = (224, 224)

def predict_batches(model, path, batch_size=8):
    # get_batches is a fast.ai-style helper around keras' ImageDataGenerator
    test_batches = get_batches(path, shuffle=False, batch_size=batch_size,
                               class_mode=None, target_size=imageSizeTuple)
    return test_batches, model.predict_generator(test_batches, test_batches.nb_sample)

test_batches, preds = predict_batches(model, test_path, batch_size=128)
So we get a (10000, 4096) matrix filled with VGG-19 fc2 activations (the stock imagenet model works for this purpose). Ideally we should do the following:
- Stack two neural networks together using keras functional API;
- Set all the layers except for the new layers in our new NNs as not-trainable;
- Train a siamese NN for a day to a week and see the results;
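The siamese idea in the last bullet amounts to learning an embedding where distance encodes similarity. As a toy, framework-free illustration (pure NumPy, made-up 2-D embeddings; a real siamese network would of course run the VGG backbone twice), the classic contrastive loss such a pair of networks is trained with looks like this:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, is_same, margin=1.0):
    # pull "same" pairs together, push "different" pairs at least `margin` apart
    d = np.linalg.norm(emb_a - emb_b, axis=1)
    pos = is_same * d ** 2
    neg = (1 - is_same) * np.maximum(margin - d, 0) ** 2
    return np.mean(pos + neg)

a = np.array([[0.0, 0.0], [0.0, 0.0]])
b = np.array([[0.0, 0.0], [2.0, 0.0]])
same = np.array([1, 0])   # first pair is the same flat, second is not

print(contrastive_loss(a, b, same))   # -> 0.0, both pairs already satisfy the objective
```

With annotation, minimizing this over frozen-plus-new layers is exactly the "train for a day to a week" step described above.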
...but we have no annotation, remember? So let's do the easiest thing possible. Let's use the sklearn version of the affinity propagation algorithm to calculate distances between the 4096 VGG-19 features. This is really inefficient, because it is not parallelized and does not use the GPU. Of course we could modify this example and calculate everything in seconds, but let's just be lazy and wait a couple of minutes instead of thinking too much =). Let's also fill the diagonal elements of the matrix with some neutral value (a picture looks most like itself, and we do not need that).
from sklearn.cluster import AffinityPropagation
from sklearn import metrics

af = AffinityPropagation().fit(preds)
aff_matrix = af.affinity_matrix_

# the diagonal holds each picture's affinity with itself -
# overwrite it so argmin/argmax do not just return the picture itself
np.fill_diagonal(aff_matrix, np.median(aff_matrix))

most_different = aff_matrix.argmin(axis=1)
most_similar = aff_matrix.argmax(axis=1)
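To make the argmin/argmax logic concrete: sklearn's affinity_matrix_ (with the default euclidean affinity) holds negative squared euclidean distances, so larger values mean more similar. A tiny self-contained sketch with toy 2-D vectors (hypothetical data standing in for the 4096-dim fc2 activations):

```python
import numpy as np

# three toy feature vectors; pictures 0 and 1 are near-duplicates
feats = np.array([[0.0, 0.0],
                  [0.1, 0.0],
                  [5.0, 5.0]])

# negative squared euclidean distances: closer to 0 means more similar
diff = feats[:, None, :] - feats[None, :, :]
aff = -np.sum(diff ** 2, axis=-1)

# mask the diagonal so a picture is not its own nearest neighbour
np.fill_diagonal(aff, -np.inf)

most_similar = aff.argmax(axis=1)
print(most_similar)   # -> [1 0 1]
```

Rows 0 and 1 pick each other, and even the outlier row 2 picks its least-distant neighbour, which is exactly how the most_similar pairs in the post are produced.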
3. Naive results
Let's see what our naive approach gives us! Do not forget that this was just exploration, without any data annotation whatsoever. Also note that I did not do any proper picture size analysis or adjustment - I just fed everything to keras as-is.
import matplotlib.pyplot as plt

# utility for easy plot generation (adapted from the fast.ai utils)
def plots(ims, figsize=(12,6), rows=1, interp=False, titles=None):
    if type(ims[0]) is np.ndarray:
        ims = np.array(ims).astype(np.uint8)
        if (ims.shape[-1] != 3):
            ims = ims.transpose((0,2,3,1))
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None:
            sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i], interpolation=None if interp else 'none')
def plots_idx(idx, titles, path, filenames, figsize):
    plots([image.load_img(path + filenames[i]) for i in idx], titles=titles, figsize=figsize, rows=1)

for pic in np.arange(9):
    idx = [pic, most_different[pic], most_similar[pic]]
    titles = ['pic', 'most_different', 'most_similar']
    path = test_path
    filenames = pic_filenames
    plots_idx(idx, titles, path, filenames, figsize=(10,20))
See some random examples below.
Apart from the awful Soviet toilet, which pops up randomly, we can see that this naive approach:
- Detects copies of the same picture easily;
- Can distinguish corners;
- Can distinguish outdoor photos vs. indoor photos;
- Can distinguish city landscapes and sea-view;
- Can distinguish flat floor maps;
Not bad for a model trained on a totally unrelated dataset (imagenet), without any data processing or training on our random images! Imagine what the results would be if we trained our own classifier.
4. Let's go deeper
Let's use a t-SNE projection (or plain PCA) to project our 4,096-dimensional data onto a 2D plane.
from sklearn.manifold import TSNE

tsne = TSNE(random_state=17)
X_tsne = tsne.fit_transform(preds)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1],
            edgecolor='none', alpha=0.7, s=40)
plt.title('Flats VGG-19 last FCN layer. t-SNE projection')
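If t-SNE is too slow for you, the plain-PCA option mentioned above gives a rougher but near-instant 2D projection. A minimal NumPy sketch (random data standing in for the real (10000, 4096) preds matrix):

```python
import numpy as np

rng = np.random.RandomState(17)
preds_demo = rng.rand(100, 4096)   # stand-in for the real fc2 feature matrix

# PCA via SVD: center the data, then project onto the top-2 right singular vectors
centered = preds_demo - preds_demo.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
X_pca = centered @ Vt[:2].T        # (100, 2) coordinates, ready for plt.scatter

print(X_pca.shape)   # -> (100, 2)
```

The resulting X_pca can be fed to the same plt.scatter call as the t-SNE coordinates.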
We clearly can spot some clusters here:
What is remarkable is that they clearly correspond to different picture types.
Exteriors of blocks of flats
Not bad for data exploration analysis, huh?