Nonperiodic Pathologic Voice Signals Classification Using Mel-Spectrogram and VGGish

In: Health Technologies and Demographic Challenges

Author

Listed:

Joana Filipa Teixeira Fernandes
(Research Centre in Digitalization and Intelligent Robotics (CeDRI), Laboratório para a Sustentabilidade e Tecnologia em Regiões de Montanha (SusTEC), Instituto Politécnico de Bragança (IPB)
Faculty of Engineering of University of Porto (FEUP))
João Viana Pinto
(University Hospital Centre of São João, Otorhinolaryngology Department
University of Oporto, Faculty of Medicine, Unit of Otorhinolaryngology, Department of Surgery and Physiology
Centre for Health Technology and Services Research (CINTESIS), Rua Dr. Plácido da Costa)
Carla Pinto Moura
(University of Porto, Genetics, Faculty of Medicine, Department of Pathology
University Hospital Centre of São João, Porto, Department of Otorhinolaryngology)
Helena Vilarinho
(University Hospital Centre of São João, Porto, Department of Otorhinolaryngology
School of Health Sciences (ESSUA), University of Aveiro)
Felipe Teixeira
(Research Centre in Digitalization and Intelligent Robotics (CeDRI), Laboratório para a Sustentabilidade e Tecnologia em Regiões de Montanha (SusTEC), Instituto Politécnico de Bragança (IPB)
Applied Management Research Unit (UNIAG), Instituto Politécnico de Bragança (IPB)
School of Sciences and Technology, University of Trás-os-Montes and Alto Douro (UTAD), Quinta de Prados, Engineering Department)
Diamantino Freitas
(Faculty of Engineering of University of Porto (FEUP))
João Paulo Teixeira
(Research Centre in Digitalization and Intelligent Robotics (CeDRI), Laboratório para a Sustentabilidade e Tecnologia em Regiões de Montanha (SusTEC), Instituto Politécnico de Bragança (IPB))

Abstract

In this work, as in the literature, voice signals are classified as periodic (type 1), showing some periodicity (type 2), or chaotic (type 3). This work aims to classify signals into types 1, 2, or 3 so that the result can subsequently be applied in a classification system for pathological/control signals. The original dataset is composed of 466 type 1 individuals, 900 type 2 individuals, and 84 type 3 individuals, classified by an otolaryngologist. 15% of the data was used for testing, and the remaining 85% was used for training and validation. A data augmentation technique was applied to balance the training data. After augmentation, 3380 sounds were used: 1020 type 1, 1280 type 2, and 1080 type 3. Of these, 80% were used for training and 20% for validation. The Mel spectrograms of the signals were used as the input to a VGGish network, which was retrained to classify the three types of signals. On the test set, this network obtained an accuracy of 71.2%.
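The dataset figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the class counts and split percentages come from the abstract, while the helper names (`shares`, `split_count`) and the choice to round the smaller part down are assumptions, since the abstract does not say how fractional counts were handled or whether the splits were stratified.

```python
# Class counts as reported in the abstract.
original = {"type 1": 466, "type 2": 900, "type 3": 84}
augmented = {"type 1": 1020, "type 2": 1280, "type 3": 1080}


def shares(counts):
    """Return each class's share of the total, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


def split_count(total, percent):
    """Split `total` items into (part, rest) using integer arithmetic.

    Rounding the part down is an assumption made for this sketch.
    """
    part = total * percent // 100
    return part, total - part


total_original = sum(original.values())
total_augmented = sum(augmented.values())

# 15% held out for testing, the remainder for training and validation.
test_size, train_val_size = split_count(total_original, 15)

# The augmented sounds are divided 80/20 into training and validation.
train_size, val_size = split_count(total_augmented, 80)

if __name__ == "__main__":
    print("original total:", total_original)
    print("augmented total:", total_augmented)
    print("test / train+val (original):", test_size, train_val_size)
    print("train / val (augmented):", train_size, val_size)
    for name, counts in (("original", original), ("augmented", augmented)):
        formatted = {k: f"{v:.1f}%" for k, v in shares(counts).items()}
        print(name, "class shares:", formatted)
```

Under these assumptions, the class shares show why balancing was needed: type 3 is by far the smallest class in the original data, while the three classes are of comparable size after augmentation.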

Suggested Citation

Joana Filipa Teixeira Fernandes & João Viana Pinto & Carla Pinto Moura & Helena Vilarinho & Felipe Teixeira & Diamantino Freitas & João Paulo Teixeira, 2025. "Nonperiodic Pathologic Voice Signals Classification Using Mel-Spectrogram and VGGish," Springer Proceedings in Business and Economics, in: Pedro Miguel Gaspar & Juan Manuel Cueva Lovelle & Carlos Montenegro-Marín & Teresa Guarda (ed.), Health Technologies and Demographic Challenges, pages 3-13, Springer.

Handle: RePEc:spr:prbchp:978-3-031-94901-2_1
DOI: 10.1007/978-3-031-94901-2_1
