Image: Courtesy of Busta Rhymes, for the nostalgics
In this article, we are going to build your very own personalized music search engine using an open-source sound-embedding model and the Qdrant vector database, which has been gaining serious attention in the AI community lately.
Musicians, sound designers and audio professionals alike all face the same reality: finding a specific sound in the vast ocean of samples, stems and tracks is hard. First and foremost, sound is bound to time: it is difficult to listen to two tracks at once, and virtually impossible to listen to more than that simultaneously.
Images, on the contrary, are easy to skim: scroll through a page and you can absorb dozens at a glance. Sound is a far more demanding medium, which sets the stage for a much more deliberate approach to search.
As our world becomes increasingly auditory, the demand for efficient ways to search and access information through sound is rapidly growing. Traditional text-based search methods struggle to capture the essence of complex audio or musical pieces (imagine trying to describe a symphony or the ambient sounds of a rainforest accurately in words — it's quite challenging).
Enter the revolutionary concept of Reverse Audio Search, akin to Google Images for sound. This technology enables users to upload or input a sound clip into a search engine to locate similar or identical audio tracks, marking a significant advancement in the realm of auditory search.
Leveraging the power of AI takes this efficiency and convenience to unprecedented levels. By incorporating AI capabilities with a Reverse Audio Search Engine, the search experience is greatly enhanced. Imagine the possibility of creating your own personalized audio search engine right on your device, akin to having your own bespoke version of Google for sound.
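At its core, such a search engine maps each audio clip to an embedding vector and ranks the library by similarity to the query's embedding. Here is a minimal sketch with toy vectors, using plain cosine similarity; in the real pipeline the vectors come from the sound-embedding model, and Qdrant performs this search at scale:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" — in practice these are produced by the embedding model.
query = [0.9, 0.1, 0.3]
library = {
    "track_a": [0.8, 0.2, 0.4],   # close to the query
    "track_b": [-0.5, 0.9, 0.0],  # far from the query
}

# The best match is the library track whose embedding is most similar.
best = max(library, key=lambda name: cosine_similarity(query, library[name]))
```

Qdrant does exactly this kind of nearest-neighbour ranking, only over millions of vectors and with proper indexing.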
The process is divided into the following steps: install Docker, run the Qdrant container, set up the Python environment, download the dataset, and launch the app.
Please use the great documentation from our friends at DigitalOcean!
Here's the guide for Ubuntu 22.04 (guides exist for other flavours too):
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-22-04
Kicking off this musical project starts with retrieving the Qdrant Docker image and deploying it on your local Docker daemon. (Make sure the Docker application is running before proceeding.)
Pull the Qdrant container, a symphony of data, from the Docker Hub repository, then conduct it with the following commands; the application will perform at localhost:6333, a digital concert hall for your project's operatic debut.
docker pull qdrant/qdrant
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage:z \
qdrant/qdrant
NOTE: If you are running on Windows, kindly replace $(pwd) with your local path.
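Once the container is up, you can sanity-check that Qdrant answers on its REST port. Here is a minimal check using only the Python standard library (the host and port match the docker run command above; `qdrant_is_up` is a helper name introduced here for illustration):

```python
import urllib.request
import urllib.error

def qdrant_is_up(host="localhost", port=6333, timeout=2.0):
    """Return True if a Qdrant instance answers on its REST port."""
    try:
        # /collections is a standard Qdrant REST endpoint that lists collections.
        url = f"http://{host}:{port}/collections"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False
```

If this returns False, double-check that the container is running and that the ports are mapped as shown above.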
Set up your environment and install the requirements:
git clone git@github.com:stackadoc/audiosearch.git
cd audiosearch
virtualenv --python 3.10 venv
source venv/bin/activate
pip install -r requirements.txt
For this project, I've used the https://www.kaggle.com/competitions/park-spring-2023-music-genre-recognition/data dataset, a collection of 4500 files split into 10 music genre categories.
kaggle competitions download -c park-spring-2023-music-genre-recognition
# For me, it gives the following :
KAGGLE_DB_PATH = '/home/arthur/data/kaggle/park-spring-2023-music-genre-recognition/train/train'
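Once the archive is extracted, a quick way to verify the download is to count the files in each genre subfolder under KAGGLE_DB_PATH. A small stdlib-only sketch (`count_files_per_genre` is a helper name introduced here, not part of the repository):

```python
from pathlib import Path
from collections import Counter

def count_files_per_genre(root):
    """Count files under each genre subfolder of the dataset root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # The parent folder name is the genre label in this dataset layout.
            counts[path.parent.name] += 1
    return counts
```

With the full dataset in place, the per-genre counts should sum to 4500 across the 10 genres.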
Simply modify the CACHE_FOLDER and KAGGLE_DB_PATH parameters in the database.py file.
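As an illustration, the two parameters might look like this in database.py (the CACHE_FOLDER value below is hypothetical; both paths should point to locations on your own machine):

```python
# database.py — illustrative values only, adapt to your setup
CACHE_FOLDER = '/home/arthur/data/cache'  # hypothetical cache location
KAGGLE_DB_PATH = '/home/arthur/data/kaggle/park-spring-2023-music-genre-recognition/train/train'
```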
Finally, launch the app:
python app.py
Open your favorite web browser and go to the local URL displayed in the console output (for me, Gradio opens at http://127.0.0.1:7861/).
For the impatient and those who want to see results right away, please take a look at our video!