What did the bird say? Bird voice recognition. Part 1 - the beginning
What did the bird say? Bird voice recognition. Part 2 - taxonomy
What did the bird say? Bird voice recognition. Part 3 - Listen to the birds
What did the bird say? Bird voice recognition. Part 4 - Dataset choice, data download and pre-processing, visualization and analysis
What did the bird say? Bird voice recognition. Part 5 - Data pre-processing for CNNs
What did the bird say? Bird voice recognition. Part 6 - Neural network MVP
What did the bird say? Bird voice recognition. Part 7 - full dataset preprocessing (169GB)
What did the bird say? Bird voice recognition. Part 8 - fast Squeeze-net on 800k images - 500s per epoch (10-20x speed up)
Eurasian bullfinch looks like the berries it eats
1. Add taxonomy to the mix
Earlier in this series we finished off by downloading a bird taxonomy database. Let's make use of it! Let's read the data, add it to our database and see how far taxonomic aggregation lets us balance the dataset. (I understand that 10 bird songs probably yield ca. 10x as many samples, but a decent database is a must-have for any problem like this.)
Let's join the tables using pd.merge on the lowercased genus (all other taxonomic units have some variation in their spelling). This gives us taxonomy for ca. 70% of the xeno-canto database out of the box, with zero data wrangling!
The code for joining tables looks like this (I omitted all the boring parts):
bird_calls = pd.read_csv('bird_api_data.csv')
bird_taxonomy = pd.read_csv('birds_pivot.csv')
bird_taxonomy['l_genus'] = bird_taxonomy.genus.str.lower()
bird_taxonomy[bird_taxonomy['genus'].apply(lambda x: x.lower()) == 'Automolus'.lower()]
taxonomy_cols = ['class',
bird_taxonomy_distinct = bird_taxonomy[taxonomy_cols].drop_duplicates()
merged_df = pd.merge(bird_calls,
This produces a table of shape (367577, 28). Note that merged_df[merged_df.family.isnull()].shape gives us (81430, 28), i.e. the rows that found no match in the taxonomy, which illustrates the point above. Also pay attention to the drop_duplicates() call in the code above. It is very important: it reduces the relatively big taxonomy table (30k+ entries) to shape (1702, 8).
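To make the role of the lowercased key and of drop_duplicates() concrete, here is a minimal sketch on toy data. The tables, the `gen` column name and the choice of a left join are assumptions made for illustration; they are not taken from the original notebook.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (all values are illustrative).
bird_calls = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gen": ["Automolus", "automolus", "Pyrrhula", "Unknownus"],
})
bird_taxonomy = pd.DataFrame({
    "genus": ["Automolus", "Automolus", "Pyrrhula"],
    "family": ["Furnariidae", "Furnariidae", "Fringillidae"],
    "species": ["ochrolaemus", "infuscatus", "pyrrhula"],
})

# Build the same lowercase join key on both sides.
bird_calls["l_genus"] = bird_calls["gen"].str.lower()
bird_taxonomy["l_genus"] = bird_taxonomy["genus"].str.lower()

# Keep only genus-level-and-above columns, one row per genus.
taxonomy_distinct = bird_taxonomy[["family", "l_genus"]].drop_duplicates()

# A left join keeps every recording; unmatched rows get NaN in `family`.
merged = pd.merge(bird_calls, taxonomy_distinct, how="left", on="l_genus")
unmatched = merged["family"].isnull().sum()

# Without drop_duplicates() every recording is repeated once per species.
naive = pd.merge(bird_calls, bird_taxonomy, how="left", on="l_genus")

print(merged.shape, unmatched, naive.shape)
```

In this toy case the deduplicated join keeps one row per recording, while the naive join multiplies the recordings of any genus that has several species in the taxonomy table.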
2. Is the amount of data now sufficient?
The key reason why we decided to go for the taxonomy database (apart from the fact that it is awesome) is that, even given an average file length of 30s (which I did not know when making the decision), an average of ca. 30 bird songs does not seem enough for training a resilient classifier. So let's see whether that helped. Let's briefly analyze the data we have using the vernacular names of bird families, orders and genera!
table = pd.pivot_table(merged_df,
margins = True,
table = table.sort_values(by=('id','All'),ascending=False)
We get this. Seems too general.
Also seems too general.
Seems like the best fit. Also please notice that below genus there are only species, which are too granular. It would also be cool to analyze bird call similarity based on closeness within the taxonomy tree...
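The counting step behind these tables can be sketched as follows. This is a simplified version on toy data: the `vgenus` values are invented, and the pivot uses a single value column, so it sorts by `'id'` rather than by the `('id','All')` key of the original call.

```python
import pandas as pd

# Toy recordings table; `vgenus` holds a vernacular genus name.
toy_df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "vgenus": ["finch", "finch", "finch", "warbler", "warbler", "owl"],
})

# Count recordings per vernacular genus; margins=True adds an "All" total row.
table = pd.pivot_table(toy_df,
                       index="vgenus",
                       values="id",
                       aggfunc="count",
                       margins=True)

# Largest groups first; the "All" row ends up on top.
table = table.sort_values(by="id", ascending=False)
print(table)
```

Swapping `vgenus` for a family-level or order-level column gives the coarser views, which makes it easy to compare how balanced each aggregation level is.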
3. A small treat - listen to the birds!
Now that we have a rough idea of what data we have, let's finally listen to the birds. For this purpose there are generally 2 approaches:
- Use HTML5 player, that can play both MP3 and WAV;
- Use ipython display audio - plays only WAV;
All the bird calls are in MP3. So we will use the following function to play MP3:
from __future__ import division  # __future__ imports must come first
import scipy.constants as const
from scipy.io import wavfile
from IPython.core.display import HTML
""" will display html 5 player for compatible browser
filepath : relative filepath with respect to the notebook directory ( where the .ipynb are not cwd)
of the file to play
The browser need to know how to play wav through html5.
there is no autoplay to prevent file playing when the browser opens
src = """
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<audio controls="controls" style="width:600px" >
<source src="%s" type="audio/mpeg" />
Your browser does not support the audio element.
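For reference, the same idea as a small self-contained helper: it builds an HTML5 audio tag for an MP3 file, with no autoplay. The name `audio_player_html` and its parameters are hypothetical, not the function from the original notebook.

```python
from html import escape


def audio_player_html(src, mime="audio/mpeg", width_px=600):
    """Return an HTML5 <audio> snippet for `src` (controls only, no autoplay)."""
    return (
        '<audio controls="controls" style="width:%dpx">'
        '<source src="%s" type="%s" />'
        "Your browser does not support the audio element."
        "</audio>"
    ) % (width_px, escape(src, quote=True), escape(mime, quote=True))


# `12345.mp3` is a placeholder path relative to the notebook directory.
snippet = audio_player_html("12345.mp3")
print(snippet)
```

In a notebook cell the snippet can be rendered with `from IPython.display import HTML` followed by `HTML(snippet)`.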
And the following function downloads random bird calls and lets us play them. It can be run as many times as you want, and each run produces 5 random samples of bird calls from the xeno-canto database.
cols = ['date',
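def get_random_bird_calls(df, vgenus):
    # filter the frame that was passed in, not the global merged_df
    sample_df = df[df['vgenus']==vgenus].sample(n=5)[cols]
    for index, row in sample_df.iterrows():
        file_path = row.file
        # follow redirects to get the final download URL (needs `import requests`)
        page = requests.get(file_path)
        file_path = page.url
        # IPython shell escape: save the recording as <index>.mp3
        ! curl $file_path --output $index".mp3"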
The output looks something like this:
Now, for a final treat, you can listen in your browser to what I have just listened to!
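The download step in get_random_bird_calls relies on the IPython `!curl` shell escape, so it only runs inside a notebook. Below is a rough pure-Python sketch of the same step. The names `fetch_bytes` and `download_random_calls` are hypothetical, and the sketch assumes the `file` column holds complete http(s) URLs.

```python
import os
import urllib.request

import pandas as pd


def fetch_bytes(url):
    """Download `url` (urlopen follows HTTP redirects) and return the body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def download_random_calls(df, vgenus, n=5, out_dir=".", fetch=fetch_bytes, seed=None):
    """Save up to `n` random recordings of `vgenus` as <index>.mp3; return the paths."""
    subset = df[df["vgenus"] == vgenus]
    # Do not ask for more rows than exist, otherwise sample() raises.
    sample_df = subset.sample(n=min(n, len(subset)), random_state=seed)
    paths = []
    for index, row in sample_df.iterrows():
        path = os.path.join(out_dir, "%s.mp3" % index)
        with open(path, "wb") as fh:
            fh.write(fetch(row["file"]))
        paths.append(path)
    return paths
```

Passing the downloader in as `fetch` keeps the sampling logic testable without network access. A call such as `download_random_calls(merged_df, vgenus)` would use the default downloader and write the files next to the notebook.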
4. Downloads and further reading
All of the above demonstrations are based on these notebooks
Best in the world examples of Audio processing: