Using Cough Sounds for COVID-19 Classification: Pretraining and Data Augmentation Approach

John Harvill, Yash R. Wani, Mark Hasegawa-Johnson, Narendra Ahuja, David Beiser, David Chestek




Explore how cough sounds can aid COVID-19 classification through autoregressive predictive coding pretraining and spectral data augmentation. Using the DiCOVA and COUGHVID datasets, the goal is to develop a model that detects the presence or absence of COVID-19 from cough audio, providing a scalable and remote screening method.

Uploaded by pako712 on Sep 08, 2024


Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation
John Harvill, Yash R. Wani, Mark Hasegawa-Johnson, Narendra Ahuja, David Beiser, David Chestek

COVID-19
Coronavirus Disease of 2019 (COVID-19) is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2).
It is highly contagious and has resulted in many deaths.
Isolating infected individuals is key to inhibiting transmission.
Fast and accurate testing is needed.

Motivation
Existing techniques such as viral and serology tests are expensive and difficult to scale.
Machine learning techniques have used CT scans, X-rays, etc., which are also expensive and require a visit to a hospital.
Using cough sounds allows for cheap, scalable, fast and remote screening for COVID-19.

DiCOVA Challenge (dataset)
Cough audio data was distributed at 44.1 kHz in FLAC format.
Data was manually annotated for cough type (shallow or heavy).
1040 total cough samples (75 COVID-19 positive, 965 COVID-19 negative).
Training data was split into 5 folds; 233 blind test samples were distributed with no attached metadata.
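The class counts above imply a strong imbalance, which helps explain why overfitting is a concern. The following is only an illustrative calculation from the numbers on the slide; the variable names are made up for this example.

```python
# Class counts as stated for the DiCOVA challenge data.
positives, negatives = 75, 965
total = positives + negatives

# Share of COVID-19 positive samples in the labeled data.
positive_rate = positives / total

# Accuracy of a trivial classifier that always predicts "negative".
majority_accuracy = negatives / total

print(f"{total} samples, {positive_rate:.1%} positive")
print(f"always-negative accuracy: {majority_accuracy:.1%}")
```

Because a classifier that never predicts COVID-19 already reaches about 93% accuracy on this data, plain accuracy says little here; a ranking-based metric is a more informative choice for such a split.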

COUGHVID dataset
Over 20,000 crowdsourced cough audio recordings.
Collected at 48 kHz.
23% of collected samples were either COVID-19 symptomatic or COVID-19 positive.

Overview of Approach
The goal is to classify the presence or absence of COVID-19.
Due to data scarcity in the challenge dataset, we use pretraining and data augmentation to avoid overfitting.
We pretrain an LSTM network using the COUGHVID dataset.
We finetune using additional layers on the DiCOVA dataset and augment the training data using spectral augmentation.

Pretraining
Autoregressive Predictive Coding (APC) [1]: the goal is to predict a future spectral frame given previous spectral frames.
Given an audio signal $x[t]$ with $T$ frames and the network output $y[t]$, the error $L$ is computed as $L = \sum_{t=1}^{T-n} \lVert y[t] - x[t+n] \rVert_2^2$.
The number of future frames $n$ to predict is a hyperparameter.
[1] Chung et al., “An unsupervised autoregressive model for speech representation learning”
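The APC objective can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: it assumes a squared Euclidean error between each network output and the input frame `n` steps later, and the names `apc_loss`, `frames` and `predictions` are hypothetical.

```python
def apc_loss(frames, predictions, n):
    """Sum of squared errors between each prediction and the frame n steps ahead.

    frames:      list of T spectral frames, each a list of floats
    predictions: list of T model outputs, same shape as frames
    n:           how many frames ahead the model predicts (1 <= n < T)
    """
    if len(frames) != len(predictions):
        raise ValueError("frames and predictions must have the same length")
    if not 1 <= n < len(frames):
        raise ValueError("n must satisfy 1 <= n < number of frames")
    total = 0.0
    # The output at time t is scored against the input frame at time t + n;
    # the last n outputs have no target and are ignored.
    for t in range(len(frames) - n):
        target = frames[t + n]
        total += sum((p - x) ** 2 for p, x in zip(predictions[t], target))
    return total


# Toy example: 4 frames with 2 frequency bins each.
frames = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
# A perfect predictor for n = 1 outputs the next frame at every step.
perfect = frames[1:] + [[0.0, 0.0]]
# A naive predictor that simply copies its current input.
copy_input = [list(f) for f in frames]

loss_perfect = apc_loss(frames, perfect, n=1)
loss_copy = apc_loss(frames, copy_input, n=1)
```

In this toy setup the copying predictor is penalised and the penalty grows with `n`, which illustrates why a larger prediction horizon forces the network to model more than local smoothness.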

Pretraining
We use 4 LSTM layers followed by 2 fully-connected layers.
A BLSTM cannot be used because it would break causality.
The network is split into “upper” and “lower” layers, where the lower layers are used to extract features for finetuning.

Finetuning
Two BLSTM layers are added on top of the frozen LSTM layers from APC pretraining.
The forward and backward summaries from the BLSTM map the sequence to a fixed-length vector, which is passed through 3 fully-connected layers to predict the probability that COVID-19 is present.
SpecAugment [2] is used during finetuning.
[2] Park et al., “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”
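The masking part of spectral augmentation can be sketched as follows. This is a simplified illustration under stated assumptions, not the configuration used in the talk: it applies one frequency mask and one time mask, fills masked cells with zero, and omits time warping. The function name `spec_augment` and the parameters `max_freq_width`, `max_time_width` and `p_apply` are hypothetical.

```python
import random


def spec_augment(spec, max_freq_width, max_time_width, rng, p_apply=1.0):
    """Return a copy of spec with one frequency band and one time span zeroed.

    spec:    list of T frames, each a list of F floats (time-major spectrogram)
    p_apply: probability that this sample is augmented at all
    """
    out = [list(frame) for frame in spec]  # never modify the caller's data
    if rng.random() >= p_apply:
        return out
    num_frames, num_bins = len(out), len(out[0])

    # Frequency mask: zero a band of adjacent bins in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_bins))
    f_start = rng.randint(0, num_bins - f_width)
    for frame in out:
        for f in range(f_start, f_start + f_width):
            frame[f] = 0.0

    # Time mask: zero a run of consecutive frames.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [0.0] * num_bins
    return out


rng = random.Random(0)
spec = [[1.0] * 8 for _ in range(20)]  # 20 frames, 8 frequency bins
augmented = spec_augment(spec, max_freq_width=3, max_time_width=5, rng=rng)
untouched = spec_augment(spec, 3, 5, rng, p_apply=0.0)
```

Setting `p_apply` to 1.0, 0.5 or 0.0 is one way to express augmenting all, half, or none of the training samples; masks are redrawn on every call, so the same recording looks different in each epoch.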

Experiments
Baseline: models are trained at the frame level, and the sample probability is the mean of all frame probabilities.
Logistic Regression baseline: the classifier is trained for a maximum of 25 iterations.
Multi-layer Perceptron: one hidden layer with 25 units.
Random Forest: 50 trees with Gini impurity.
Best test configuration: the proposed approach.
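The baseline scoring rule, where a recording's probability is the mean of its frame probabilities, can be written as a short helper. The function name and the example values below are hypothetical and only illustrate the averaging step.

```python
def sample_probability(frame_probs):
    """Average per-frame COVID-19 probabilities into one score per recording."""
    if not frame_probs:
        raise ValueError("recording has no frames")
    return sum(frame_probs) / len(frame_probs)


# Hypothetical frame-level classifier outputs for two recordings.
recordings = {
    "cough_a": [0.5, 0.75, 0.25],
    "cough_b": [0.125, 0.125],
}
scores = {name: sample_probability(p) for name, p in recordings.items()}
```

Averaging treats every frame as equally informative, in contrast to the proposed approach, which summarises the whole sequence with a BLSTM before classifying.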

Ablations
Spectral augmentation: 50% or 0% augmentation instead of 100%.
Future frames: $n = 1$ instead of $n = 10$.
Higher layers: use features output from higher layers of the APC pretraining network (purple layers).
Pretraining data: use LibriSpeech data instead of COUGHVID data.

Conclusions
APC and spectral augmentation are both critical for improved performance over baselines in classifying the presence of COVID-19 from cough audio recordings.
Our approach demonstrates that a large amount of unlabeled cough data can be leveraged to improve performance on a small labeled cough dataset.
Related audio such as speech can also be used during pretraining to improve over baselines, but it is not as effective as using cough recordings.
Overall, a cough-based classification system can be used to assist in the diagnosis of COVID-19 almost instantaneously and at virtually no cost.
