What did the bird say? Bird voice recognition. Part 3 - Listen to the birds

And a bit more of taxonomic analysis

Posted by snakers41 on July 22, 2017
Article list

What did the bird say? Bird voice recognition. Part 1 - the beginning

What did the bird say? Bird voice recognition. Part 2 - taxonomy

What did the bird say? Bird voice recognition. Part 3 - Listen to the birds

What did the bird say? Bird voice recognition. Part 4 - Dataset choice, data download and pre-processing, visualization and analysis

What did the bird say? Bird voice recognition. Part 5 - Data pre-processing for CNNs

What did the bird say? Bird voice recognition. Part 6 - Neural network MVP

What did the bird say? Bird voice recognition. Part 7 - full dataset preprocessing (169GB)

What did the bird say? Bird voice recognition. Part 8 - fast Squeeze-net on 800k images - 500s per epoch (10-20x speed up)

Eurasian bullfinch looks like the berries it eats

1. Add taxonomy to the mix

In the previous part we finished by downloading a bird taxonomy database. Let's make use of it! We will read the data, add it to our database and see how far taxonomic aggregation lets us balance the dataset. (I understand that 10 bird songs probably translate into ca. 10x as many training samples, but a decent database is a must-have for any problem like this.)

Let's join the tables with pd.merge, using the genus with lower() applied as the key (all other taxonomic units have some variation in their spelling). This gives us taxonomy for ca. 70% of the xeno-canto database out of the box, with zero data wrangling!

The code for joining tables looks like this (I omitted all the boring parts):

import pandas as pd

bird_calls = pd.read_csv('bird_api_data.csv')
bird_taxonomy = pd.read_csv('birds_pivot.csv')
bird_taxonomy['l_genus'] = bird_taxonomy.genus.str.lower()
bird_taxonomy[bird_taxonomy['genus'].apply(lambda x: x.lower()) == 'Automolus'.lower()]
taxonomy_cols = ['class',
bird_taxonomy_distinct = bird_taxonomy[taxonomy_cols].drop_duplicates()
merged_df = pd.merge(bird_calls,

The merged table has shape (367577, 28). Note that merged_df[merged_df.family.isnull()].shape gives (81430, 28): these are the records that found no match in the taxonomy, which illustrates the point above. Also pay attention to the drop_duplicates() bit in the code above. It is very important: it reduces the relatively big taxonomy table (30k+ entries) to a table of shape (1702, 8).
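To make the role of drop_duplicates() concrete, here is a minimal sketch of the same join on toy data. The rows and the column names (gen, family, species) are assumptions made up for illustration, not the real contents of the two CSV files.

```python
import pandas as pd

# Toy stand-ins for bird_api_data.csv and birds_pivot.csv (hypothetical rows)
bird_calls = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gen": ["Automolus", "automolus", "Pyrrhula", "Mystery"],
})
bird_taxonomy = pd.DataFrame({
    "family": ["Furnariidae", "Furnariidae", "Fringillidae"],
    "genus": ["Automolus", "Automolus", "Pyrrhula"],
    "species": ["infuscatus", "ochrolaemus", "pyrrhula"],
})

# Normalise the join key on both sides
bird_calls["l_genus"] = bird_calls["gen"].str.lower()
bird_taxonomy["l_genus"] = bird_taxonomy["genus"].str.lower()

# Keep one row per genus; the taxonomy table has one row per species
taxonomy_distinct = bird_taxonomy[["family", "l_genus"]].drop_duplicates()

# Left join keeps every recording, matched or not
merged = pd.merge(bird_calls, taxonomy_distinct, how="left", on="l_genus")
unmatched = merged["family"].isnull().sum()

# Without drop_duplicates() each recording is repeated once per species
naive = pd.merge(bird_calls, bird_taxonomy[["family", "l_genus"]],
                 how="left", on="l_genus")

print(merged.shape, unmatched, len(naive))
```

In this toy case the de-duplicated join keeps one row per recording, while the naive join duplicates every Automolus recording because that genus appears with two species in the taxonomy.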

2. Is the amount of data now sufficient?

The key reason we went for the taxonomy database (apart from the fact that it is awesome) is that, even given an average file length of 30s (which I did not know when making the decision), an average of ca. 30 bird songs per species does not seem enough to train a resilient classifier. So let's see whether that helped. Let's briefly analyze the data we have using the vernacular names of bird orders, families and genera!

Bird orders

table = pd.pivot_table(merged_df,
    margins = True,
pd.set_option('display.max_rows', len(table))
table = table.sort_values(by=('id','All'),ascending=False)

We get this. Seems too general.

Bird families

Also seems too general.

Bird genus

Seems like the best fit. Also please notice that below genus there are only species, which are too granular. It would also be cool to analyze bird call similarity based on how close the birds are in the taxonomy tree...
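One way to put numbers on "too general" versus "too granular" is to count, for each taxonomic level, how many classes there are and how many recordings each class gets. The sketch below does that on a tiny made-up table; the column names (order, family, genus, id) are assumptions, and the helper class_balance is hypothetical, not part of the original notebook.

```python
import pandas as pd

# Hypothetical merged table: one row per recording
merged = pd.DataFrame({
    "id": range(1, 9),
    "order": ["Passeriformes"] * 6 + ["Strigiformes"] * 2,
    "family": ["Fringillidae"] * 3 + ["Paridae"] * 3 + ["Strigidae"] * 2,
    "genus": ["Pyrrhula", "Fringilla", "Fringilla",
              "Parus", "Parus", "Parus", "Strix", "Bubo"],
})


def class_balance(df, level):
    """Summarise how recordings are spread over the classes of one level."""
    counts = df.groupby(level)["id"].count().sort_values(ascending=False)
    return {
        "classes": len(counts),
        "min": int(counts.min()),
        "median": float(counts.median()),
        "max": int(counts.max()),
    }


# One row per taxonomic level, one column per statistic
summary = pd.DataFrame(
    {level: class_balance(merged, level) for level in ["order", "family", "genus"]}
).T
print(summary)
```

Fewer classes with more recordings each are easier to train on but less informative, so a table like this can help pick the level where the trade-off looks acceptable.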

3. A small treat - listen to the birds!

Now that we have a rough idea of what data we have, let's finally listen to the birds. For this purpose there are generally 2 approaches:

All the bird calls are in MP3, so we will use the following function to play them:

from __future__ import division  # must be the first statement; only needed on Python 2

import scipy
import scipy.constants as const
from scipy.io import wavfile
from IPython.display import HTML, display
%pylab inline

def mp3Player(filepath):
    """ will display html 5 player for compatible browser
    Parameters :
    filepath : relative filepath with respect to the notebook directory ( where the .ipynb are not cwd)
               of the file to play
    The browser need to know how to play MP3 through html5.
    there is no autoplay to prevent file playing when the browser opens
    """
    src = """
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Simple Test</title>
    <audio controls="controls" style="width:600px" >
      <source src="%s" type="audio/mpeg" />
      Your browser does not support the audio element.
    </audio>
    """ % filepath
    # Render the player in the notebook output cell
    display(HTML(src))

And the following function downloads random bird calls and lets us play them. It can be run as many times as you want, and each run produces 5 random samples of bird calls from the xeno-canto database.

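import requests

cols = ['date',
def get_random_bird_calls(df, vgenus):
    # Filter the frame that was passed in (not the global merged_df)
    sample_df = df[df['vgenus']==vgenus].sample(n=5)[cols]
    for index, row in sample_df.iterrows():
        file_path = row.file
        # Follow redirects to resolve the final file URL
        page = requests.get(file_path)
        file_path = page.url
        # IPython shell escape: save the recording as <index>.mp3
        ! curl $file_path --output $index".mp3"
    return sample_df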
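The function above relies on the notebook's "! curl" shell escape, so it only runs inside IPython. As a rough alternative, here is a standard-library sketch that saves recordings without a shell. The helper name download_calls and its opener parameter are hypothetical (the parameter exists only so the function can be tried without network access), and it assumes each record is an (id, url) pair.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_calls(records, out_dir=".", opener=urlopen):
    """Save each (record_id, url) pair as <out_dir>/<record_id>.mp3.

    `opener` must return a readable, closable object for a URL;
    by default this is urllib's urlopen, which follows HTTP redirects.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for record_id, url in records:
        target = out_dir / f"{record_id}.mp3"
        # Stream the response to disk instead of holding it in memory
        with opener(url) as response, open(target, "wb") as fh:
            shutil.copyfileobj(response, fh)
        saved.append(target)
    return saved
```

In the notebook this could be called as download_calls(zip(sample_df.index, sample_df.file)), assuming the file column holds the download URLs as in the function above.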

The output looks something like this:

Now, for a final treat, you can listen to what I have just listened to, right in your browser!

4. Downloads and further reading

All of the above demonstrations are based on these notebooks

  • This notebook in ipynb
  • This notebook in html

World-class examples of audio processing: