Using Cough Sounds for COVID-19 Classification: Pretraining and Data Augmentation Approach

 
Classification of COVID-19 from Cough Using Autoregressive Predictive Coding Pretraining and Spectral Data Augmentation
 
John Harvill, Yash R. Wani, Mark Hasegawa-Johnson, Narendra Ahuja, David Beiser, David Chestek
 
COVID-19
Coronavirus Disease of 2019 (COVID-19) is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV2)
Highly contagious and has resulted in many deaths
Isolating infected individuals is key to inhibiting transmission
Fast and accurate testing is needed
 
Motivation
Existing techniques like viral and serology tests are expensive and difficult to scale
Machine learning techniques have used CT scans, X-rays, etc. which are also expensive and require a visit to a hospital
Using cough sounds allows for cheap, scalable, fast and remote screening for COVID-19
 
DiCOVA Challenge (dataset)
Cough audio data was distributed at 44.1kHz in FLAC format
Data was manually annotated for cough type (shallow or heavy)
1040 total cough samples (75 COVID-19 positive, 965 COVID-19 negative)
Training data split into 5 folds with 233 blind test samples distributed with no attached metadata
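The slide does not say how the 5 folds were constructed. As a minimal sketch of why fold construction matters with only 75 positives out of 1040 samples, the snippet below assigns sample indices to folds per class so that every fold keeps roughly the same positive rate. The function name `stratified_folds`, the seed, and the stratification itself are assumptions for illustration, not the challenge's actual procedure.

```python
import random


def stratified_folds(labels, num_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps the class balance.

    Hypothetical helper: the DiCOVA folds were distributed by the challenge
    organizers, and this is only one plausible way to build such folds.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(num_folds)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % num_folds].append(index)
    return folds


# Class counts taken from the slide: 75 positive, 965 negative.
labels = [1] * 75 + [0] * 965
folds = stratified_folds(labels)
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With these counts each fold receives 15 positive and 193 negative samples, so a fold-level metric is never computed on a fold with no positive examples.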
 
COUGHVID dataset
Over 20,000 crowdsourced cough audio recordings
Collected at 48kHz
23% of collected samples were either COVID-19 symptomatic or COVID-19 positive
 
Overview of Approach
Goal is to classify presence or absence of COVID-19
Due to data scarcity in challenge dataset, we use pretraining and data augmentation to avoid overfitting
We pretrain an LSTM network using the COUGHVID dataset
We finetune using additional layers on the DiCOVA dataset and augment the training data using spectral augmentation
 
Pretraining
[1] Chung et al. “An unsupervised autoregressive model for speech representation learning”
 
Pretraining
We use 4 LSTM layers followed by 2 fully-connected layers
BLSTM cannot be used because it would break causality
The network is split into “upper” and “lower” layers, where the lower layers are used to extract features for finetuning
 
Finetuning
Two BLSTM layers are added to frozen LSTM layers from APC pretraining.
Forward and backward summaries from BLSTM are used to map sequence to fixed-length vector and pass through 3 fully-connected layers to predict the probability of presence of COVID-19
SpecAugment [2] is used during finetuning
[2] Park et al. “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition”
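To make the spectral augmentation step concrete, here is a minimal sketch of the masking part of SpecAugment: one band of frequency bins and one run of time frames are zeroed in a copy of the spectrogram. It assumes the spectrogram is a list of frames, each a list of frequency-bin values. The function name `spec_augment` and the mask-width parameters are hypothetical, time warping is left out, and the settings used in this work are not given on the slides.

```python
import random


def spec_augment(spec, max_freq_mask=2, max_time_mask=3, rng=None):
    """Return a masked copy of `spec` (frames x frequency bins).

    Illustrative only: applies one frequency mask and one time mask,
    with widths drawn uniformly up to the (assumed) maxima.
    """
    if not spec or not spec[0]:
        raise ValueError("spec must contain at least one non-empty frame")
    rng = rng or random.Random()
    num_frames, num_bins = len(spec), len(spec[0])
    out = [list(frame) for frame in spec]  # leave the input untouched

    # Frequency mask: zero bins [f0, f0 + f) in every frame.
    f = rng.randint(0, min(max_freq_mask, num_bins))
    f0 = rng.randint(0, num_bins - f)
    for frame in out:
        for k in range(f0, f0 + f):
            frame[k] = 0.0

    # Time mask: zero all bins in frames [t0, t0 + t).
    t = rng.randint(0, min(max_time_mask, num_frames))
    t0 = rng.randint(0, num_frames - t)
    for i in range(t0, t0 + t):
        out[i] = [0.0] * num_bins
    return out


spec = [[1.0] * 4 for _ in range(6)]  # toy 6-frame, 4-bin spectrogram
augmented = spec_augment(spec, rng=random.Random(0))
```

Because a fresh mask is drawn on every call, the same recording yields a different training example each epoch, which is what makes the technique useful on a small labeled set.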
 
Experiments
Baseline: Models are trained at frame level and sample probability is the mean of all frame probabilities
Logistic Regression Baseline: Classifier is trained for maximum of 25 iterations
Multi-layer Perceptron: One layer with 25 hidden units
Random Forest: 50 trees and Gini impurity
Best Test Config: Proposed approach
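The baseline's pooling rule (sample probability as the mean of frame probabilities) can be sketched in a few lines. The function name `sample_probability` and the example values are illustrative assumptions; the frame-level classifier that would produce these probabilities is not shown.

```python
def sample_probability(frame_probabilities):
    """Average per-frame COVID-19 probabilities into one score per recording."""
    if not frame_probabilities:
        raise ValueError("a recording must contain at least one frame")
    return sum(frame_probabilities) / len(frame_probabilities)


# Made-up per-frame outputs of a frame-level classifier for one recording.
frame_probabilities = [0.25, 0.5, 0.75]
score = sample_probability(frame_probabilities)
```

Mean pooling treats every frame as equally informative, which contrasts with the proposed approach, where the BLSTM summaries map the whole sequence to a single vector before classification.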
 
Ablations
 
Conclusions
APC and spectral augmentation are both critical for improved performance over baselines for classification of COVID-19 presence from cough audio recordings
Our approach demonstrates that a large amount of unlabeled cough data can be leveraged to improve performance on a small labeled cough dataset
Related audio like speech can also be used during pretraining to improve over baselines but is not as effective as using cough recordings
Overall, a cough-based classification system can be used to assist in diagnosis of COVID-19 almost instantaneously and at virtually no cost

Explore how cough sounds can aid in COVID-19 classification through autoregressive predictive coding pretraining and spectral data augmentation. A network pretrained on the large COUGHVID dataset is finetuned on the small DiCOVA dataset to classify the presence or absence of COVID-19 from cough audio, providing a scalable and remote screening method.


Uploaded on Sep 08, 2024 | 1 Views



Presentation Transcript



  7. Pretraining
  Autoregressive Predictive Coding (APC) [1]: The goal is to predict a future spectral frame given previous spectral frames.
  Given an audio signal $x[t]$ and the output $y[t]$, the error $E$ is computed as:
  $$E = \sum_{t=1}^{T-n} \left\lVert y[t] - x[t+n] \right\rVert_2^2$$
  where $T$ is the number of frames. The number of future frames $n$ to predict is a hyperparameter.
  [1] Chung et al. “An unsupervised autoregressive model for speech representation learning”
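As a minimal sketch of the APC objective described above, the snippet below sums the squared error between the network output at each frame and the input frame a fixed number of steps ahead. The names `apc_loss`, `frames`, `predictions` and `n` are illustrative, and the "copy the current frame" predictor stands in for the LSTM, which is not shown. The squared-error form follows the slide's formula as far as it can be read.

```python
def apc_loss(frames, predictions, n):
    """Sum of squared errors between predictions[t] and frames[t + n].

    frames:      list of spectral frames, each a list of floats
    predictions: model output for each input frame (same shape as frames)
    n:           how many frames ahead the model is asked to predict
    """
    if len(predictions) != len(frames):
        raise ValueError("predictions and frames must have the same length")
    if not 1 <= n < len(frames):
        raise ValueError("n must satisfy 1 <= n < number of frames")
    total = 0.0
    # Only the first T - n outputs have a target n frames in the future.
    for t in range(len(frames) - n):
        target = frames[t + n]
        total += sum((p - x) ** 2 for p, x in zip(predictions[t], target))
    return total


frames = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
# Trivial stand-in predictor: output the current frame unchanged.
predictions = [list(frame) for frame in frames]
loss_one_step = apc_loss(frames, predictions, n=1)
loss_two_step = apc_loss(frames, predictions, n=2)
```

On this toy ramp the copy predictor is penalized more at two steps ahead than at one, which illustrates why a larger horizon pushes the model to capture longer-range structure rather than local smoothness.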


  11. Ablations
  Spectral Augmentation: 50% or 0% augmentation instead of 100%
  Future Frames: $n = 1$ instead of $n = 10$
  Higher Layers: Use features output from higher layers of APC pretraining network (purple layers)
  Pretraining data: Use LibriSpeech data instead of COUGHVID data

