The radio playlists you see on Muziekweb are compiled automatically by our computers, which listen to streaming radio stations. The music is recognized from 'audio fingerprints' and matched against the complete Muziekweb collection. If a piece of music is not in the Muziekweb collection, our computers cannot recognize it. This can happen, for example, with live performances, which is why you will sometimes see gaps in a playlist.
All of the approximately 6.5 million music files are indexed using a fingerprint algorithm. First the audio signal is downsampled from 44.1 kHz to 5 kHz. Then a Fast Fourier Transform (FFT) is applied to a window of 371 ms of audio, and a 32-bit number is calculated from the result. This number is the 'subfinger' that is saved. We then move 11.6 ms further along the audio and do the same again, until the entire track has been converted to subfingers. Together, all the subfingers form the fingerprint of the music. This fingerprint is considerably smaller than the original audio track. It is stored in a MySQL database so that it can be queried later and compared with another piece of music.
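The framing described above can be made concrete with a small sketch. The sample rate, window length and step come from the text. The rule that turns FFT band energies into 32 bits is not given here, so `subfinger_from_energies` uses an assumed rule in the style of the published Haitsma-Kalker fingerprint (33 band energies per frame, compared with the previous frame). The function names are hypothetical and the FFT step itself is left out.

```python
SAMPLE_RATE_HZ = 5000      # reduced sample rate, from the text
WINDOW_MS = 371            # FFT window length, from the text
HOP_MS = 11.6              # step between windows, from the text
BYTES_PER_SUBFINGER = 4    # one 32-bit number per window


def frame_layout(duration_s):
    """Return (window_samples, hop_samples, subfinger_count) for one track."""
    window = round(SAMPLE_RATE_HZ * WINDOW_MS / 1000)
    hop = round(SAMPLE_RATE_HZ * HOP_MS / 1000)
    total = int(duration_s * SAMPLE_RATE_HZ)
    # Count the windows that fit completely inside the track.
    count = 0 if total < window else (total - window) // hop + 1
    return window, hop, count


def subfinger_from_energies(prev_bands, bands):
    """Pack 32 bits from two consecutive frames of 33 band energies.

    Assumed rule (not confirmed by the text): bit m is 1 when the energy
    difference between bands m and m+1 is larger than it was in the
    previous frame, and 0 otherwise.
    """
    if len(prev_bands) != 33 or len(bands) != 33:
        raise ValueError("expected 33 band energies per frame")
    value = 0
    for m in range(32):
        delta = (bands[m] - bands[m + 1]) - (prev_bands[m] - prev_bands[m + 1])
        value = (value << 1) | (1 if delta > 0 else 0)
    return value


if __name__ == "__main__":
    window, hop, count = frame_layout(180)  # a three-minute track
    print(window, hop, count, count * BYTES_PER_SUBFINGER)
```

With these numbers a window is 1855 samples and a step is 58 samples, so a three-minute track yields 15,486 subfingers, or about 62 KB. The same three minutes of 16-bit stereo CD audio take roughly 31.8 MB, which illustrates why the fingerprint is so much smaller than the track.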
An inverted index is built from the MySQL database. We use Lucene for this, the same software that powers the search interface on Muziekweb. The index makes it possible to quickly select the correct track based on subfingers. This Lucene index is approximately 286 GB and grows weekly. It contains approximately 2.5 billion unique entries (subfingers).
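To illustrate what an inverted index does here, the following is a minimal in-memory sketch using a plain Python dictionary. It is not Lucene and not the production code. The function names and the simple vote count are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_index(fingerprints):
    """Map each subfinger to the (track_id, position) pairs where it occurs.

    fingerprints: dict of track_id -> list of 32-bit subfingers.
    """
    index = defaultdict(list)
    for track_id, subfingers in fingerprints.items():
        for position, subfinger in enumerate(subfingers):
            index[subfinger].append((track_id, position))
    return index


def candidate_tracks(index, query_subfingers):
    """Rank tracks by how many of the query subfingers they contain."""
    votes = Counter()
    for subfinger in query_subfingers:
        # Each track gets at most one vote per query subfinger.
        for track_id in {t for t, _ in index.get(subfinger, ())}:
            votes[track_id] += 1
    return votes.most_common()
```

Looking up a subfinger is a single dictionary access, regardless of how many tracks are indexed. That property is what makes selecting candidate tracks fast.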
The MySQL database and the Lucene index are updated weekly; this process takes approximately 24 hours. As soon as the new index is available, it is copied to the various computers that listen to the radio stations.
The computers that listen to the radio stations proceed as follows:
The audio signal (an MP3 or AAC stream) is decoded to WAV format and converted to a 5 kHz signal. Subfingers are then created from it using the algorithm described above. This is done on 15 seconds of audio at a time.
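As a rough sketch of the buffering step, the generator below groups already decoded and downsampled samples into blocks of 15 seconds. It assumes the blocks are consecutive and do not overlap, which the text does not state. The function name and parameters are hypothetical.

```python
def blocks_of(samples, sample_rate_hz=5000, block_seconds=15):
    """Yield consecutive lists of samples, each covering block_seconds.

    Samples left over at the end (less than a full block) are not yielded;
    a live listener would keep them until more audio arrives.
    """
    block_len = sample_rate_hz * block_seconds
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == block_len:
            yield buffer
            buffer = []
```

At 5 kHz, each 15-second block holds 75,000 samples.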
The matching algorithm looks up the subfingers in the Lucene index and selects candidate tracks. The Hamming distance then determines which candidate is most similar. The original music file as it is on CD differs from the broadcast signal because of all kinds of compression, for example MP3, so a match is found by allowing a certain degree of error. Ultimately this will (hopefully) result in the 'recognition' of the correct audio track. Some radio stations, for example Radio 538, speed up the audio somewhat, which makes detection difficult. To still recognize the music, the audio is first slowed down before the fingerprint is calculated.
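The comparison by Hamming distance can be sketched as follows. The Hamming distance between two subfingers is the number of bits in which they differ. Averaged over a sequence, it gives a bit error rate, and the candidate with the lowest rate wins. The threshold `max_ber=0.35` and the function names are assumptions for illustration, not values taken from the actual software.

```python
def hamming_distance(a, b):
    """Number of differing bits between two 32-bit subfingers."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def bit_error_rate(query, reference):
    """Fraction of differing bits between two equally long sequences."""
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and equally long")
    wrong = sum(hamming_distance(q, r) for q, r in zip(query, reference))
    return wrong / (32 * len(query))


def best_match(query, candidates, max_ber=0.35):
    """Slide the query over each candidate and keep the lowest error rate.

    candidates: dict of track_id -> list of subfingers.
    Returns (track_id, offset, ber), or None if nothing is close enough.
    """
    best = None
    for track_id, reference in candidates.items():
        for offset in range(len(reference) - len(query) + 1):
            window = reference[offset:offset + len(query)]
            ber = bit_error_rate(query, window)
            if ber <= max_ber and (best is None or ber < best[2]):
                best = (track_id, offset, ber)
    return best
```

Allowing a non-zero error rate is what lets a compressed radio stream match the version taken from CD, even though their subfingers are rarely bit-for-bit identical.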
A similar algorithm, with a description, can be found here. A disadvantage of that implementation is that the algorithm used is difficult to scale.
Two servers and three Intel NUCs are used. One server hosts the MySQL database; the other is used for making the fingerprints and the inverted (Lucene) index. The Intel NUCs listen to the radio stations. Each NUC can follow three radio stations simultaneously. We currently follow 8 radio stations, so three NUCs are in use.
The code is available on GitHub, including a database of 1.3 million fingerprints.
The software was developed for Muziekweb by Yvo Nelemans.
Muziekweb.nl is part of the Nederlands instituut voor Beeld & Geluid (Netherlands Institute for Sound & Vision).
© Stichting Nederlands instituut voor Beeld & Geluid, 1995 - 2023