Efficient Small-Footprint Keyword Spotting Strategies

Explore methods for small-footprint keyword spotting, including end-to-end architectures and domain adversarial training. Learn about their advantages, the dataset used, the experimental setup and results, and directions for future work.

  • Small Footprint
  • Keyword Spotting
  • End-to-End
  • Architecture
  • Domain Adversarial




Presentation Transcript


  1. Small-footprint Keyword Spotting. C. Y. Eddie Lin, MIR Lab, CSIE Dept., National Taiwan University. eddie.lin@mirlab.org. 2020/01/13

  2. Outline
     • Introduction
     • Dataset
     • End-to-end architectures: TDNN+SWSA, TDNN only, Domain Adversarial Training
     • Experimental setup and results
     • Demo
     • Conclusions and future work

  3. Introduction
     • End-to-end models for small-footprint keyword spotting
     • Advantages of end-to-end approaches: (1) the model directly outputs keyword detections, (2) no complicated search is involved, (3) no alignments are needed beforehand
     • Small-footprint requirements: (1) highly accurate, (2) low latency, (3) runs in computationally constrained environments

  4. Dataset: Google Speech Commands
     • 64,752 utterances in total
     • 51,088 for training
     • 6,798 for validation
     • 6,835 for testing
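For reference, the standard splits of this dataset can be loaded with torchaudio's built-in SPEECHCOMMANDS loader; a minimal sketch (this is my illustration, not the talk's code, and the dataset version defaults to v0.02):

```python
import torchaudio

# Google Speech Commands via torchaudio's built-in loader; the exact
# dataset version (v0.02 by default) is an assumption on my part.
train_set = torchaudio.datasets.SPEECHCOMMANDS(
    root="./data", download=True, subset="training")
valid_set = torchaudio.datasets.SPEECHCOMMANDS(
    root="./data", download=True, subset="validation")
test_set = torchaudio.datasets.SPEECHCOMMANDS(
    root="./data", download=True, subset="testing")

# Each item is (waveform, sample_rate, label, speaker_id, utterance_number)
waveform, sample_rate, label, speaker_id, utt_no = train_set[0]
```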

  5. End-to-End Architecture: TDNN+SWSA
     Reference: "A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting", Interspeech 2019
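As I understand the cited paper, the parameter saving in shared-weight self-attention comes from using a single projection matrix for query, key, and value. A minimal sketch of that idea (my simplification, not the paper's exact layer):

```python
import torch
import torch.nn as nn

class SharedWeightSelfAttention(nn.Module):
    """Sketch of shared-weight self-attention: one projection matrix serves
    as query, key, and value, cutting the attention parameters to a third
    of a standard self-attention layer."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)  # shared Q/K/V projection
        self.scale = dim ** -0.5

    def forward(self, x):                 # x: (batch, time, dim)
        h = self.proj(x)                  # the same h is used as Q, K and V
        attn = torch.softmax((h @ h.transpose(1, 2)) * self.scale, dim=-1)
        return attn @ h                   # (batch, time, dim)
```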

  6. End-to-End Architecture: TDNN only

  7. End-to-End Architecture: Domain Adversarial Training

  8. Experimental Setup
     • MFCC: (1) window size = 25 ms, window step = 10 ms; (2) FFT size = 512; (3) number of filters = 40; (4) feature dimension = 40
     • Weight initialization: Xavier uniform
     • Optimizer: Adam
     • Loss: cross entropy
     • Metric: classification error (CER)
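These MFCC settings map directly onto torchaudio's MFCC transform; a sketch, assuming 16 kHz input (which is what Google Speech Commands provides):

```python
import torchaudio

# MFCC front end with the slide's settings, assuming 16 kHz audio
mfcc = torchaudio.transforms.MFCC(
    sample_rate=16000,
    n_mfcc=40,                  # feature dimension = 40
    melkwargs=dict(
        n_fft=512,              # FFT size
        win_length=400,         # 25 ms window at 16 kHz
        hop_length=160,         # 10 ms step at 16 kHz
        n_mels=40,              # number of mel filters
    ),
)
features = mfcc(waveform)       # (channel, 40, T); T is roughly 99-101
                                # frames for a 1 s clip, depending on padding
```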

  9. Xavier Initialization Rationale
     • Poorly scaled random initialization can drive activations into saturation, where gradients shrink toward 0 and learning stalls
     • Xavier initialization scales the weights so that the variance of activations and gradients stays roughly constant across layers, e.g. drawing W ~ U[-a, a] with a = sqrt(6 / (fan_in + fan_out))
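In PyTorch this is a single call per layer; a minimal sketch:

```python
import torch.nn as nn

def init_weights(m):
    # Xavier uniform: U[-a, a] with a = sqrt(6 / (fan_in + fan_out))
    if isinstance(m, (nn.Linear, nn.Conv1d)):
        nn.init.xavier_uniform_(m.weight)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model = nn.Conv1d(40, 32, kernel_size=3)  # any layer or model works here
model.apply(init_weights)
```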

  10. Number of Parameters
      Layer            w    k    d    l    #para
      Input            -    -    40   99   -
      TDNN             3    3    32   33   3840
      TDNN-SUB         3    1    32   31   3072
      TDNN             3    1    32   29   3072
      Global Pooling   -    -    32   -    -
      Softmax          -    -    -    -    352
      Total                                10336
      (w: kernel size in frames; k: steps; d: feature dimension; l: sequence length)
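The table is consistent with plain 1-D convolutions if "steps" is read as stride and all layers are bias-free. Under those assumptions (plus ReLU activations and 11 output classes, which are my guesses that make the softmax count 32 × 11 = 352 work out), this sketch reproduces the totals exactly:

```python
import torch
import torch.nn as nn

class TDNNOnly(nn.Module):
    """Sketch matching the slide-10 table. Bias-free layers, ReLU, stride
    for 'steps', and 11 output classes are my assumptions, chosen so the
    parameter counts match the table."""
    def __init__(self, n_classes=11):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(40, 32, kernel_size=3, stride=3, bias=False),  # 3840, l: 99->33
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=1, bias=False),  # 3072, l: 33->31
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, stride=1, bias=False),  # 3072, l: 31->29
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                                 # global pooling
        )
        self.fc = nn.Linear(32, n_classes, bias=False)               # softmax: 352

    def forward(self, x):              # x: (batch, 40, 99) MFCC frames
        return self.fc(self.net(x).squeeze(-1))

model = TDNNOnly()
print(sum(p.numel() for p in model.parameters()))  # 10336
```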

  11. Number of Parameters (Cont'd)
      For the first TDNN layer: #para = kernel size × input dim × output dim = 3 × 40 × 32 = 3840

  12. Feature Disentanglement
      The encoder output is disentangled into a phonetic feature and a speaker feature. GAN-style domain adversarial training, with a speech (keyword) classifier and an adversarial speaker classifier, pushes the encoder to retain phonetic information while discarding speaker information.

  13. Domain Adversarial Training (DAT)
      Reference: "Noise Adaptive Speech Enhancement using Domain Adversarial Training", Interspeech 2019

  14. Domain Adversarial Training (Cont'd)
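DAT is commonly implemented with a gradient reversal layer (Ganin & Lempitsky, 2015): identity on the forward pass, negated and scaled gradient on the backward pass. A minimal sketch, not necessarily the talk's exact code:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer: identity forward, -lam * grad backward.
    The speaker classifier still learns normally, but the encoder receives
    a reversed gradient and so learns to *confuse* the speaker classifier."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=0.1):
    return GradReverse.apply(x, lam)
```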

  15. Impact of λ
      λ is the importance weight between the two classifiers: the encoder is trained with the Speech_Classifier, while the contribution of the adversarial Speaker_Classifier is scaled by λ. Experiments use DAT with λ = 0.1 and 0.05.
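Putting the pieces together, one DAT training step might look like this sketch (`encoder`, `speech_clf`, and `speaker_clf` are hypothetical modules standing in for the slide's blocks; `grad_reverse` is the sketch above):

```python
import torch.nn.functional as F

def dat_step(encoder, speech_clf, speaker_clf, x, keyword_y, speaker_y, lam=0.1):
    # `encoder`, `speech_clf`, `speaker_clf` are hypothetical nn.Modules.
    z = encoder(x)
    loss_speech = F.cross_entropy(speech_clf(z), keyword_y)
    # grad_reverse (previous sketch) already folds -lam into the backward
    # pass, so the two losses are simply summed here.
    loss_speaker = F.cross_entropy(speaker_clf(grad_reverse(z, lam)), speaker_y)
    return loss_speech + loss_speaker
```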

  16. Training Plot and Result (TDNN only): epoch = 243, CER = 5.2%

  17. Training Plot and Result (DAT): epoch = 208, CER = 4.8%

  18. Confusion Matrix (DAT)

  19. Comparison
      Model        CER     #Para
      Resnet15*    4.2%    238K
      TDNN+SWSA*   4.19%   12K
      DAT          4.8%    10K
      TDNN only    5.2%    10K
      (* results reported in other papers)

  20. Demo
      Demo system: https://140.112.29.4:5566

  21. Conclusions and Future Work
      • Conclusions: an end-to-end TDNN model delivers small-footprint keyword spotting, and adding DAT improves the learned features (CER 5.2% → 4.8%)
      • Future work: further improve the model and the DAT feature disentanglement
