Application of Deep Learning in Music Genre Classification
DOI: https://doi.org/10.71465/fair331

Keywords: Music Genre Classification, Deep Learning, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), GTZAN Dataset

Abstract
Music genre classification is a fundamental task in Music Information Retrieval, yet achieving high accuracy remains challenging due to overlapping genre characteristics. This paper investigates deep learning approaches—specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs)—for automatic music genre classification using the GTZAN dataset. Audio tracks are transformed into time-frequency representations (spectrograms and Mel-frequency cepstral coefficients, MFCCs) to serve as input features for the deep models. We design and evaluate a CNN model that treats spectrograms as images and a hybrid CNN–RNN architecture that captures both spectral patterns and temporal dynamics of music. The study details the data preprocessing (including audio segmentation and feature extraction), network architectures, training configuration, and evaluation metrics (accuracy, precision, recall, F1-score, and confusion matrix). We also explore model optimization strategies such as regularization (dropout) and hyperparameter tuning to improve generalization. Experimental results demonstrate that the proposed deep learning models achieve high classification performance on GTZAN, with the best model (a CNN combined with a bidirectional gated recurrent unit, BiGRU) attaining an accuracy of approximately 89% on the test set. A detailed analysis of the results, including per-genre performance and the confusion matrix, confirms that the deep learning approach outperforms traditional methods in capturing music genre characteristics.
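The time-frequency transformation described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual pipeline (which likely uses a library such as librosa for spectrograms and MFCCs); it is a self-contained NumPy example, under assumed parameters (1024-sample Hann window, 512-sample hop), of converting an audio segment into a log-magnitude spectrogram that a CNN can treat as a single-channel image:

```python
import numpy as np

def log_spectrogram(signal, frame_len=1024, hop=512):
    """Compute a log-magnitude spectrogram via a Hann-windowed STFT.

    Returns an array of shape (n_frames, frame_len // 2 + 1), usable as a
    single-channel "image" input to a CNN.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum
    return np.log1p(spectrum)                       # log compression

# Example: a 3-second segment at 22,050 Hz (GTZAN tracks are 30 s clips,
# commonly split into shorter segments for training)
sr = 22050
t = np.linspace(0, 3, 3 * sr, endpoint=False)
segment = np.sin(2 * np.pi * 440 * t)               # synthetic 440 Hz tone
spec = log_spectrogram(segment)
print(spec.shape)                                   # (time frames, freq bins)
```

A Mel filterbank and discrete cosine transform applied to such a spectrogram would yield the MFCC features the paper also uses.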
License
Copyright (c) 2025 Minghao Chen (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.