What did the bird say? Part 1 - The beginning

Did you hear the word of the bird? No seriously

Posted by snakers41 on July 16, 2017

As usual - let me grab your attention with a nice chart without explaining what it is. Use your imagination to guess!

Article list

What did the bird say? Bird voice recognition. Part 1 - the beginning

What did the bird say? Bird voice recognition. Part 2 - taxonomy

What did the bird say? Bird voice recognition. Part 3 - Listen to the birds

What did the bird say? Bird voice recognition. Part 4 - Dataset choice, data download and pre-processing, visualization and analysis

What did the bird say? Bird voice recognition. Part 5 - Data pre-processing for CNNs

What did the bird say? Bird voice recognition. Part 6 - Neural network MVP

What did the bird say? Bird voice recognition. Part 7 - full dataset preprocessing (169GB)

What did the bird say? Bird voice recognition. Part 8 - fast Squeeze-net on 800k images - 500s per epoch (10-20x speed up)

0. What the hell is this article about? Birds? Sounds? Neural networks? Data Science?

All of these categories, be patient. Let me give you some backstory first.

Three or four years ago I found this. It is a giant board with bird songs visualized as spectrograms "using machine learning". It is an ideal example of a Google project: beautiful, polished, expensive to make, and utterly useless, lacking any connection to reality. At that moment I was impressed by it, because I did not know anything about Data Science and machine learning and genuinely believed that producing such visualizations required some "black magic".

Several years later, after having started this website and our telegram channel, and after our neural chicken coop project, I saw that page again and was a bit more skeptical. Roughly at the same time I stumbled upon this video on Youtube (there are literally dozens of related videos covering different taxonomic units).

I binge-watched several dozen of these videos. Then I remembered this scene from the renowned Silicon Valley TV series.

And then I just happened to remember reading the following articles / news items / blog posts:

  • How the Silicon Valley "Not Hotdog" app was made using React Native, TensorFlow and squeeze nets;
  • Squeeze net in Keras, and the paper;
  • Another (sic! bird-related) project using a Raspberry Pi and squeeze nets;
  • My brief remark on TensorFlow for Android by Google (Apple also launched something similar);

Do you get the connections =) ?

Well, you do not have to be a genius to see the following patterns / projections:

  1. Computing follows cyclical patterns: when some kind of calculation can be moved to end-user devices (PCs, notebooks, smartphones) with some benefit, eventually it is;
  2. Squeeze net lets us run the prediction (inference) part of a powerful neural network architecture on a mobile device without significant limitations - the weights take just 5-10 MB of memory;
  3. As of ~July 2017, nobody had built an app / algorithm for recognizing a bird species by its song;

1. Action plan

So it's decided - let's make an app that will recognize a bird by listening to its song. Sounds cool, right? It takes a lot of time to describe all of this, but when you know all the inputs, the plan forms in your head in literally seconds - which is what happened to me. And I got really inspired.

The project can be roughly separated into the following chunks:

  1. Find bird songs, download them, do some basic statistical analysis;
  2. Analyze them and choose a representative subsample for a proof of concept;
  3. Learn how to extract features from sound;
  4. Run a lot of experiments with plain vanilla neural networks to see if the project is viable;
  5. If it is, migrate the NN to the squeeze net architecture to reduce the weight matrix size;
  6. Build a real React Native app that will listen to the birds and tell you which bird it is;
Sounds somewhat easy. But there are a lot of tricky parts, especially around collecting data, sampling it and the NN architecture.

2. Without further ado, let's jump in?

After a bit of research I found this spectacular website with ca. 350k bird voice recordings of ca. 10k bird species. For reference, in the Animal kingdom (a taxonomic term) there are ca. 30k animals (birds are also animals).

If you did not study biology well in school - this is your last chance to catch up. Here you can find the interactive version.

This website also features a simple but powerful API. So - let's start.
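Before writing the full download pipeline, the API can be smoke-tested with a single request. A minimal sketch (the endpoint and field names match the v2 API used throughout this post; the actual network call is commented out so the snippet stays offline):

```python
# Build a query URL for the xeno-canto v2 API by hand
api_endpoint = 'http://www.xeno-canto.org/api/2/recordings'
url = api_endpoint + '?query=area:europe&page=1'
print(url)

# Uncomment to actually hit the API:
# import requests
# data = requests.get(url).json()
# print(data['numRecordings'], data['numSpecies'], data['numPages'])
```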

Let's include the libraries that we will most likely need (I am lazy).

from __future__ import print_function
import os.path
import string
import random
import time
from collections import defaultdict

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
import wordcloud

%matplotlib inline

As usual, let's use this utility for showing progress:

# https://github.com/alexanderkuk/log-progress
# Progress indicator utility
def log_progress(sequence, every=None, size=None, name='Items'):
    from ipywidgets import IntProgress, HTML, VBox
    from IPython.display import display

    is_iterator = False
    if size is None:
        try:
            size = len(sequence)
        except TypeError:
            is_iterator = True

    if size is not None:
        if every is None:
            if size <= 200:
                every = 1
            else:
                every = int(size / 200)     # every 0.5%
    else:
        assert every is not None, 'sequence is iterator, set every'

    if is_iterator:
        progress = IntProgress(min=0, max=1, value=1)
        progress.bar_style = 'info'
    else:
        progress = IntProgress(min=0, max=size, value=0)
    label = HTML()
    box = VBox(children=[label, progress])
    display(box)

    index = 0
    try:
        for index, record in enumerate(sequence, 1):
            if index == 1 or index % every == 0:
                if is_iterator:
                    label.value = '{name}: {index} / ?'.format(
                        name=name,
                        index=index
                    )
                else:
                    progress.value = index
                    label.value = u'{name}: {index} / {size}'.format(
                        name=name,
                        index=index,
                        size=size
                    )
            yield record
    except:
        progress.bar_style = 'danger'
        raise
    else:
        progress.bar_style = 'success'
        progress.value = index
        label.value = "{name}: {index}".format(
            name=name,
            index=str(index or '?')
        )

Let's store some important variables and write some simple helper functions:

api_endpoint = 'http://www.xeno-canto.org/api/2/recordings'
area_list = ['africa', 'america', 'asia', 'australia', 'europe']

def api_query(query=None, area=None, country=None, page=None):
    # Build a query URL; country, area and free-text queries are mutually exclusive
    if (page is None) or (page == 0) or (page == ''):
        page = 1
    if (query is None) or (query == ''):
        if (area is None) or (area == ''):
            if (country is None) or (country == ''):
                return None
            else:
                return api_endpoint + '?query=cnt:' + country + '&page=' + str(page)
        else:
            return api_endpoint + '?query=area:' + area + '&page=' + str(page)
    else:
        return api_endpoint + '?query=' + query + '&page=' + str(page)

As usual, I am not showing the boring intermediate steps - you will be able to find all the code in ipynb format at the end of the article. I am focusing on the important milestones.

This code gives us an idea of how many pages we will need to collect:

area_df = pd.DataFrame(columns=['area', 'numRecordings', 'numSpecies', 'numPages'])
for area in log_progress(area_list):
    try:
        result = requests.get(api_query(area=area))
        temp_dict = {'area': area,
                     'numRecordings': result.json()['numRecordings'],
                     'numSpecies': result.json()['numSpecies'],
                     'numPages': result.json()['numPages']}
        area_df = area_df.append(temp_dict, ignore_index=True)
    except Exception as ex:
        print('API request failed for area {}: {}'.format(area, str(ex)))

This yields this. Not bad, right?

The code required to collect all the data is shockingly short! Note that we use iter_df just in case some of our requests get lost.

area_list = ['africa', 'america', 'asia', 'australia', 'europe']
response_cols = ['cnt', 'date', 'en', 'file', 'gen', 'id', 'lat', 'lic',
                 'lng', 'loc', 'q', 'rec', 'sp', 'ssp', 'time', 'type',
                 'url', 'area', 'page']

# One row per (area, page) pair; 'processed' tracks which pages succeeded
iter_df = pd.DataFrame(columns=['area', 'page', 'processed'])
for index, row in area_df.iterrows():
    iter_df_append = pd.DataFrame(columns=['area', 'page', 'processed'])
    iter_df_append.page = np.arange(1, row['numPages'] + 1, 1)
    iter_df_append.processed = 0
    iter_df_append.area = row['area']
    iter_df = iter_df.append(iter_df_append, ignore_index=True)

result_df = pd.DataFrame(columns=response_cols)

from fake_useragent import UserAgent

# Pretend to be a regular Chrome browser
ua = UserAgent()
headers = {'User-Agent': ua.chrome}

response_cols = ['cnt', 'date', 'en', 'file', 'gen', 'id', 'lat', 'lic',
                 'lng', 'loc', 'q', 'rec', 'sp', 'ssp', 'time', 'type',
                 'url', 'area', 'page']
result_df = pd.DataFrame(columns=response_cols)

idx = np.arange(0, 738)
for num in log_progress(idx):
    try:
        query_url = api_query(area=iter_df.iloc[num].area,
                              page=int(iter_df.iloc[num].page))
        result = requests.get(query_url, headers=headers)
        temp_df = pd.DataFrame(result.json()['recordings'])
        temp_df['area'] = iter_df.iloc[num].area
        temp_df['page'] = int(iter_df.iloc[num].page)
        result_df = result_df.append(temp_df, ignore_index=True)
        iter_df.set_value(num, 'processed', 1)
        time.sleep(1)
        # Testing break
        # if (num == 1):
        #     break
    except Exception as ex:
        print('Script failed for num: {}\nError type: {}\n'.format(str(num), str(ex)))
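The `processed` flag is what makes the download loop restartable: after a run, any rows still marked 0 can simply be re-queried. A toy illustration (hypothetical values):

```python
import pandas as pd

# Toy iter_df after an interrupted run (hypothetical values)
iter_df = pd.DataFrame({
    'area': ['europe', 'europe', 'asia'],
    'page': [1, 2, 1],
    'processed': [1, 0, 1],
})

# Only the pages that were not fetched successfully need a retry
retry_rows = iter_df[iter_df['processed'] == 0]
print(retry_rows)
```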


result_df.shape gives us (367577, 19).

3. Let's do some basic analysis on the data we have!

I will not bore you with the details too much here (you will find them in the ipynb file), but I will point out a few things we have to understand prior to jumping into neural networks and stuff.

We should:

  • Understand our data (who collects it, why, when, using which form, etc.) - this is out of scope, but take it for granted that I did the research;
  • Look at the basic distributions of our variables;
  • Find out which countries / species / call types are the most popular;
  • Check whether our classes can be balanced (i.e. that we do not end up with 9,000 songs for one bird and 10 for another in the final dataset);
  • Listen to the bird songs (we will do it, I promise!);

Essentially, for non-technical datasets it's better to start with basic pivot tables. Yes, you heard it right - good old pivot tables (you can build them in Excel or in python).

This piece of code 

# index / columns below are assumed here (the full call is in the notebook):
table = pd.pivot_table(df,               # df is the collected recordings table
                       index=['en'],     # assumed: English species name
                       columns=['q'],    # assumed: recording quality rating
                       values=['id'],
                       aggfunc='count',
                       margins=True)
pd.set_option('display.max_rows', len(table))
table = table.sort_values(by=('id', 'All'), ascending=False)

yields this for example

This is a bit boring and underwhelming, but you need to do some basic analysis before drawing any conclusions. The same can be done for species, bird song types, countries, etc.
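If you prefer one-liners to pivot tables, `value_counts` gives the same kind of balance check. A toy sketch with made-up rows (the real `en` column comes from the API response):

```python
import pandas as pd

# Toy stand-in for the collected recordings table (made-up rows)
df = pd.DataFrame({
    'en': ['Common Blackbird'] * 3 + ['European Robin'] * 2 + ['Great Tit'],
})

# Recordings per species -- the quickest class-balance check
per_species = df['en'].value_counts()
print(per_species)
```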

To finish the article I will just present a couple of info-graphics that are interesting:

How many songs we have per bird species:

How many songs we have per bird species. Log10 scale:

It's easy to see that the average is around ca. 10^1.5, which is ca. 30-40.
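As a quick sanity check on the log scale:

```python
import math

# 10 ** 1.5 is roughly 31.6, i.e. a typical species has ~30-40 recordings
print(10 ** 1.5)
print(math.log10(35))   # ~35 recordings sit near 1.54 on the log10 axis
```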

Most popular words used to describe bird calls (code is in the notebook):

And finally, bird songs by country (we also have lat and lng, but I was too lazy to draw the map - I will explain why later).

This was done using this tool.

4. So what?

What? Can't we just download all the songs, load them into VGG-16 and that's it?

Not so fast. Above you could see that we have ~30-40 bird songs per species on average, which is not really enough. And we want to build a classifier that is actually usable in real-world conditions, not in some kind of walled garden. So we need at least hundreds of songs per class (in a balanced dataset!) for neural networks to work properly.

So the obvious idea is to use taxonomic tree data to predict not the bird species but, for example, the bird genus, which may be easier (it may be scientifically trivial, but we are talking about the proof-of-concept stage now).
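To see why rolling species up to genus helps, note that the API response already carries a `gen` (genus) column. A toy sketch with made-up recordings:

```python
import pandas as pd

# Made-up recordings: two Turdus species collapse into one genus class
df = pd.DataFrame({
    'gen': ['Turdus', 'Turdus', 'Erithacus', 'Parus'],
    'en':  ['Common Blackbird', 'Song Thrush', 'European Robin', 'Great Tit'],
    'id':  [1, 2, 3, 4],
})

# Per-genus counts are always at least as large as per-species counts
per_genus = df.groupby('gen')['id'].count().sort_values(ascending=False)
print(per_genus)
```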

After some research, I found that the biggest resource in this field is ITIS, and it offers a direct database download. So we will need to be a bit technical about it.


As usual - to view the cells properly, I highly recommend using the collapsible cells plug-in for Jupyter notebook.