Using Cough Sounds for COVID-19 Classification: Pretraining and Data Augmentation Approach
Explore how cough sounds can aid in COVID-19 classification through autoregressive predictive coding pretraining and spectral data augmentation. Leveraging datasets like DiCOVA and COUGHVID, the goal is to develop a model that detects the presence of COVID-19 from cough audio, providing a scalable and remote screening method.
Presentation Transcript
Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation. Authors: John Harvill, Yash R. Wani, Mark Hasegawa-Johnson, Narendra Ahuja, David Beiser, David Chestek
COVID-19: Coronavirus Disease 2019 (COVID-19) is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). It is highly contagious and has resulted in many deaths. Isolating infected individuals is key to inhibiting transmission, so fast and accurate testing is needed.
Motivation: Existing techniques like viral and serology tests are expensive and difficult to scale. Machine learning techniques have used CT scans, X-rays, etc., which are also expensive and require a visit to a hospital. Using cough sounds allows for cheap, scalable, fast and remote screening for COVID-19.
DiCOVA Challenge (dataset): Cough audio data was distributed at 44.1 kHz in FLAC format. Data was manually annotated for cough type (shallow or heavy). There are 1040 total cough samples (75 COVID-19 positive, 965 COVID-19 negative). The training data is split into 5 folds, with 233 blind test samples distributed with no attached metadata.
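The class counts above imply a strong imbalance, which is worth quantifying before training anything. The sketch below computes the positive-class prevalence, the accuracy of a trivial "always negative" predictor, and inverse-frequency class weights. The weighting scheme is an illustrative assumption and is not described in the slides.

```python
# Class counts reported for the DiCOVA training data.
n_positive = 75
n_negative = 965
n_total = n_positive + n_negative

# Fraction of samples that are COVID-19 positive (roughly 7%).
prevalence = n_positive / n_total

# Accuracy of a predictor that always answers "negative".
# It is high despite the predictor being useless, which is why
# plain accuracy is a poor metric on data this imbalanced.
trivial_accuracy = n_negative / n_total

# Inverse-frequency class weights (an assumption for illustration,
# not the authors' method): weight_c = n_total / (n_classes * n_c).
n_classes = 2
weight_positive = n_total / (n_classes * n_positive)
weight_negative = n_total / (n_classes * n_negative)

print(f"prevalence       = {prevalence:.3f}")
print(f"trivial accuracy = {trivial_accuracy:.3f}")
print(f"class weights    = {weight_positive:.2f} (pos), {weight_negative:.2f} (neg)")
```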
COUGHVID dataset: Over 20,000 crowdsourced cough audio recordings, collected at 48 kHz. 23% of the collected samples were either COVID-19 symptomatic or COVID-19 positive.
Overview of Approach: The goal is to classify the presence or absence of COVID-19. Because data is scarce in the challenge dataset, we use pretraining and data augmentation to avoid overfitting. We pretrain an LSTM network using the COUGHVID dataset. We then finetune with additional layers on the DiCOVA dataset and augment the training data using spectral augmentation.
Pretraining: Autoregressive Predictive Coding (APC) [1]. The goal is to predict a future spectral frame given previous spectral frames. Given an audio signal $x[t]$ of $T$ frames and the network output $y[t]$, the error $E$ is computed as $E = \sum_{t=1}^{T-n} \lVert y[t] - x[t+n] \rVert_2^2$. The number of future frames $n$ to predict is a hyperparameter. [1] Chung et al., An unsupervised autoregressive model for speech representation learning.
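As a rough illustration of the APC objective, the sketch below scores a sequence of predicted frames against the input shifted a fixed number of frames into the future. It assumes the error is a sum of squared differences over all frames that still have a target inside the signal; the exact norm on the slide is not fully legible, so treat that choice, and the names `apc_loss`, `x`, `y` and `n`, as assumptions.

```python
def apc_loss(x, y, n):
    """Sum of squared errors between predictions y[t] and targets x[t + n].

    x: list of T input spectral frames (each a list of floats)
    y: list of T predicted frames, where y[t] is the prediction for x[t + n]
    n: number of frames to look ahead (1 <= n < T)
    """
    T = len(x)
    if len(y) != T:
        raise ValueError("x and y must have the same number of frames")
    if not 1 <= n < T:
        raise ValueError("n must satisfy 1 <= n < T")
    total = 0.0
    # Only the first T - n predictions have a target inside the signal.
    for t in range(T - n):
        total += sum((p - q) ** 2 for p, q in zip(y[t], x[t + n]))
    return total


# Toy example: 4 frames with 2 spectral bins each.
frames = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

# A "copy the input" predictor, y[t] = x[t], is wrong by 1 in every bin.
copy_prediction = [frame[:] for frame in frames]
loss_copy = apc_loss(frames, copy_prediction, n=1)

# A perfect one-step predictor, y[t] = x[t + 1]; the last output is unused.
perfect_prediction = frames[1:] + [[0.0, 0.0]]
loss_perfect = apc_loss(frames, perfect_prediction, n=1)

print(loss_copy, loss_perfect)
```

Larger values of `n` leave fewer usable frames per recording but force the model to predict further ahead, which is the trade-off the hyperparameter controls.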
Pretraining: We use 4 LSTM layers followed by 2 fully connected layers. A BLSTM cannot be used because it would break causality. The network is split into upper and lower layers, where the lower layers are used to extract features for finetuning.
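The causality constraint can be seen with a toy example. The sketch below uses a simple tanh recurrence as a stand-in for an LSTM (it is not an LSTM, and the function names are hypothetical). It checks that changing only the last frames of a signal leaves the earlier forward states untouched, whereas states that include a backward pass change, meaning a bidirectional model would already have seen the future frame it is asked to predict.

```python
import math


def forward_states(xs, decay=0.5):
    """Toy causal recurrence: the state at step t depends only on xs[0..t]."""
    h, out = 0.0, []
    for x in xs:
        h = math.tanh(decay * h + x)
        out.append(h)
    return out


def bidirectional_states(xs, decay=0.5):
    """Pair each forward state with a state computed right-to-left."""
    fwd = forward_states(xs, decay)
    bwd = forward_states(xs[::-1], decay)[::-1]
    return list(zip(fwd, bwd))


signal = [0.1, 0.4, -0.2, 0.3, 0.0, 0.5]
changed = signal[:4] + [9.0, 9.0]  # alter only the last two frames

# Causal states for the first four frames ignore the change.
causal_same = forward_states(signal)[:4] == forward_states(changed)[:4]

# Bidirectional states for the first four frames are affected by it.
bidir_same = bidirectional_states(signal)[:4] == bidirectional_states(changed)[:4]

print(causal_same, bidir_same)  # prints: True False
```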
Finetuning: Two BLSTM layers are added on top of the frozen LSTM layers from APC pretraining. The forward and backward summaries from the BLSTM are used to map the sequence to a fixed-length vector, which is passed through 3 fully connected layers to predict the probability that COVID-19 is present. SpecAugment [2] is used during finetuning. [2] Park et al., SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition.
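A minimal sketch of the masking part of SpecAugment follows, assuming a spectrogram stored as a list of frames. It applies one frequency mask and one time mask and omits the time-warping step of the original method. The mask widths, the zero fill value, and the function name `spec_augment` are illustrative assumptions, not settings taken from the slides.

```python
import random


def spec_augment(spec, max_freq_width, max_time_width, rng):
    """Return a copy of spec with one frequency mask and one time mask.

    spec: list of T frames, each a list of F spectral bins
    max_freq_width, max_time_width: largest allowed mask widths
    rng: random.Random instance, so results are reproducible
    """
    T, F = len(spec), len(spec[0])
    out = [frame[:] for frame in spec]  # leave the input untouched

    # Frequency mask: zero a band of f consecutive bins in every frame.
    f = rng.randint(0, min(max_freq_width, F))
    f0 = rng.randint(0, F - f)
    for frame in out:
        for k in range(f0, f0 + f):
            frame[k] = 0.0

    # Time mask: zero t consecutive frames across all bins.
    t = rng.randint(0, min(max_time_width, T))
    t0 = rng.randint(0, T - t)
    for i in range(t0, t0 + t):
        out[i] = [0.0] * F

    return out


rng = random.Random(0)
spec = [[1.0] * 8 for _ in range(20)]  # 20 frames x 8 bins, all ones
augmented = spec_augment(spec, max_freq_width=3, max_time_width=5, rng=rng)
masked = sum(v == 0.0 for frame in augmented for v in frame)
print(f"{masked} of {20 * 8} values masked")
```

Because a fresh mask is drawn on every call, each training epoch sees a different corrupted view of the same recording, which is what makes the technique useful on a small labeled set.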
Experiments: Baseline models are trained at the frame level, and the sample probability is the mean of all frame probabilities. Logistic regression baseline: the classifier is trained for a maximum of 25 iterations. Multi-layer perceptron: one layer with 25 hidden units. Random forest: 50 trees with Gini impurity. Best test configuration: the proposed approach.
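The frame-to-sample aggregation used by the baselines amounts to a plain average. The sketch below shows that step only; the per-frame probabilities are made-up numbers and the function name is hypothetical.

```python
def sample_probability(frame_probs):
    """Average per-frame COVID-19 probabilities into one score per recording."""
    if not frame_probs:
        raise ValueError("recording has no frames")
    return sum(frame_probs) / len(frame_probs)


# Hypothetical per-frame outputs of a frame-level classifier for one cough.
frame_probs = [0.2, 0.4, 0.9, 0.5]
score = sample_probability(frame_probs)
print(f"sample-level probability: {score:.2f}")
```

Averaging treats every frame as equally informative, which contrasts with the proposed approach, where the BLSTM summaries condense the whole sequence before classification.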
Ablations: Spectral augmentation: 50% or 0% augmentation instead of 100%. Future frames: $n = 1$ instead of $n = 10$. Higher layers: use features output from the higher layers of the APC pretraining network (purple layers). Pretraining data: use LibriSpeech data instead of COUGHVID data.
Conclusions: APC and spectral augmentation are both critical for improved performance over baselines for classification of COVID-19 presence from cough audio recordings. Our approach demonstrates that a large amount of unlabeled cough data can be leveraged to improve performance on a small labeled cough dataset. Related audio like speech can also be used during pretraining to improve over baselines, but it is not as effective as using cough recordings. Overall, a cough-based classification system can be used to assist in diagnosis of COVID-19 almost instantaneously and at virtually no cost.