Machine Learning Techniques for Music Prediction
This research project, supervised by Prof. Nick Webb, aims to predict the release year and genre of songs using machine learning methods. The study uses the Million Song Dataset and the WEKA software for analysis. Early challenges included data in the wrong format and a large amount of missing data, which led to a focus on the attributes most relevant to prediction. The attributes considered include loudness, duration, tempo, time signature, key, and mode, along with timbre.
Presentation Transcript
MACHINE LEARNING TECHNIQUES FOR MUSIC PREDICTION
S. Grant Lowe
Advisor: Prof. Nick Webb

RESEARCH QUESTIONS
Can we predict the year in which a song was released?
Can we predict the genre of a song?
Can we identify which attributes are the strongest in answering these questions?

BACKGROUND
Hit Song Science
Genre Classification
Year Prediction

APPROACH
Use WEKA
Use the Million Song Dataset

WEKA
Machine learning software
Contains visualization tools and algorithms for data analysis and modeling

DATA
Million Song Dataset: commercial tracks from 1922-2011, collected by LabROSA

EARLY CHALLENGES
Data in the wrong format: HDF5 vs. CSV
Lots of missing data!
Almost half of the songs are missing year, a very important attribute
Many attributes are being ignored because a majority of the songs are missing data
ArtistID -> Year?

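The missing-year problem can be illustrated with a minimal sketch that separates songs with a usable release year from those without one. It assumes the data has already been exported from HDF5 to CSV and that a missing year is stored as 0. The column names and the sample rows are hypothetical, made up for illustration only.

```python
import csv
import io

# Hypothetical CSV export of a few song fields; column names and rows are
# invented for illustration and are not taken from the actual dataset.
SAMPLE_CSV = """track_id,artist_id,year,tempo
T1,A1,1987,120.5
T2,A1,0,98.0
T3,A2,2003,140.2
T4,A3,0,101.7
"""

def split_by_year(rows):
    """Separate rows with a usable year from rows where year is missing.

    Assumption: a missing year is encoded as 0 in the export.
    """
    known, missing = [], []
    for row in rows:
        if int(row["year"]) > 0:
            known.append(row)
        else:
            missing.append(row)
    return known, missing

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
known, missing = split_by_year(rows)
missing_fraction = len(missing) / len(rows)
print(f"{len(known)} usable, {len(missing)} missing ({missing_fraction:.0%})")
```

Only the rows in `known` could serve as labeled examples for year prediction; the rows in `missing` would have to be dropped or have their year inferred some other way.
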
ATTRIBUTES
The MSD contains 53 descriptive attributes for each song, along with 90 timbre attributes.
Attributes were removed if they were not good indicators of release year or genre, or if they were too closely tied to what was being classified.

ATTRIBUTE MOTIVATION
Ranked Descriptive Attributes
Loudness (measured in decibels)
Duration (in seconds)
Tempo (estimated tempo in BPM)
Time Signature (estimated beats per bar)
Key
Mode (major or minor)
Timbre is the quality of a musical note or sound that distinguishes different types of musical instruments or voices. It is a complex notion, also referred to as sound color, texture, or tone quality, and is derived from the shape of a segment's spectro-temporal surface, independently of pitch and loudness.

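The slides do not say how the descriptive attributes were ranked. One common way to rank attributes against a class label is information gain, sketched below under that assumption. The toy records are invented for illustration and say nothing about the real data.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in label entropy after splitting on a nominal attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy, made-up records: "mode" separates the classes, "time_signature" does not.
songs = [
    {"mode": "major", "time_signature": 4, "decade": "1960s"},
    {"mode": "major", "time_signature": 3, "decade": "1960s"},
    {"mode": "minor", "time_signature": 4, "decade": "1990s"},
    {"mode": "minor", "time_signature": 3, "decade": "1990s"},
]
labels = [s["decade"] for s in songs]

# Rank attributes from most to least informative.
ranked = sorted(
    ((name, information_gain([s[name] for s in songs], labels))
     for name in ("mode", "time_signature")),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, gain in ranked:
    print(f"{name}: {gain:.3f} bits")
```

Numeric attributes such as loudness or tempo would need to be discretized before this nominal-attribute version applies.
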
EARLY RESULTS: DESCRIPTIVE ATTRIBUTES
Release year discretized into 6 decades: 1960-1970, 1970-1980, etc.
Baseline (chance selection): 16.67%
First tests: 6-9% correctly classified
More recent tests: 25-30%
Why Random Forest and BayesNet?

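The decade discretization and the chance baseline can be sketched as follows. The slide names only the first two bins, so this sketch assumes six consecutive ten-year bins starting at 1960, with a boundary year such as 1970 assigned to the later decade. The function name and label format are hypothetical.

```python
FIRST_DECADE = 1960  # first bin named on the slide
NUM_DECADES = 6      # number of classes given on the slide

def decade_label(year):
    """Map a release year to a decade class such as '1970s'.

    Returns None for years outside the six assumed bins (1960-2019).
    """
    index = (year - FIRST_DECADE) // 10
    if 0 <= index < NUM_DECADES:
        return f"{FIRST_DECADE + 10 * index}s"
    return None

# Chance baseline: picking one of six classes uniformly at random.
chance_baseline = 1 / NUM_DECADES
print(f"{chance_baseline:.2%}")  # 16.67%
```

For example, `decade_label(1975)` gives `'1970s'`, and `decade_label(1955)` gives `None`. The 16.67% figure is a uniform-guess baseline; if the decades are unevenly represented, always predicting the most common decade could score higher.
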
GENRE PREDICTION
Genres:
Classic pop and rock
Classical
Dance and Electronica
Folk
Hip-Hop
Jazz
Metal
Pop
Rock and Indie
Soul and Reggae

CONCLUSIONS & FUTURE WORK
Timbre attributes are better than descriptive attributes
Why?
Taste Profile
Lyrical/Emotional Content
Tag Dataset