Application of Deep Learning in Music Genre Classification

Authors

  • Minghao Chen, School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China

DOI:

https://doi.org/10.71465/fair331

Keywords:

Music Genre Classification, Deep Learning, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), GTZAN Dataset

Abstract

Music genre classification is a fundamental task in Music Information Retrieval, yet achieving high accuracy remains challenging because genre characteristics overlap. This paper investigates deep learning approaches—specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs)—for automatic music genre classification on the GTZAN dataset. Audio tracks are transformed into time-frequency representations (spectrograms and Mel-frequency cepstral coefficients, MFCCs) that serve as input features for the deep models. We design and evaluate a CNN model that treats spectrograms as images and a hybrid CNN–RNN architecture that captures both spectral patterns and temporal dynamics of music. The study details the data preprocessing pipeline (including audio segmentation and feature extraction), network architectures, training configuration, and evaluation metrics (accuracy, precision, recall, F1-score, and confusion matrix). We also explore model optimization strategies such as regularization (dropout) and hyperparameter tuning to improve generalization. Experimental results demonstrate that the proposed deep learning models achieve high classification performance on GTZAN, with the best model (a CNN with a bidirectional gated recurrent unit, CNN–BiGRU) attaining an accuracy of approximately 89% on the test set. A detailed analysis of the results, including per-genre performance and the confusion matrix, confirms that the deep learning approach outperforms traditional methods in capturing music genre characteristics.
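The preprocessing the abstract describes (segmenting each track into fixed-length clips, then converting clips to time-frequency representations) can be sketched as follows. This is a minimal NumPy-only illustration, not the paper's implementation: the function names, the 3-second clip length, and the STFT parameters (2048-sample frames, 512-sample hop) are assumptions; a real pipeline would more likely use a library such as librosa for Mel-spectrograms and MFCCs.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=2048, hop=512):
    """Log-magnitude spectrogram via a short-time Fourier transform.

    Returns an array of shape (freq_bins, n_frames), i.e. an "image"
    a CNN can consume. Frame/hop sizes here are illustrative defaults.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return np.log1p(mag).T                     # log compression, transpose to (freq, time)

def segment(signal, sr, seconds=3.0):
    """Split a track into non-overlapping fixed-length clips (segmentation step)."""
    clip = int(sr * seconds)
    return [signal[i : i + clip] for i in range(0, len(signal) - clip + 1, clip)]
```

With GTZAN's 30-second tracks sampled at 22,050 Hz, `segment` with 3-second clips yields 10 training examples per track, each mapped by `stft_spectrogram` to a fixed-size 2-D input for the CNN.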


Published

2025-09-07