Assistive Communication System for Individuals with Cerebral Palsy Using Head Movement and Speech Recognition
Development of a multimodal communication aid for individuals with cerebral palsy, addressing challenges such as dysarthria and involuntary movements. The system lets users communicate through head movements and speech recognition, reducing reliance on caregivers and minimizing disruption from involuntary movements, with the aim of improving communication speed and accessibility for people with cerebral palsy.
Presentation Transcript
A Multimodal Communication Aid for Persons with Cerebral Palsy Using Head Movement and Speech Recognition
Tomoka Ikeda, Masakazu Hirokawa, Kenji Suzuki
University of Tsukuba
17th International Conference on Computers Helping People with Special Needs (ICCHP), 2020
Background
What is cerebral palsy (CP)? Motor dysfunction caused by brain damage occurring up to 4 weeks after birth. Because vocalization is a movement of the articulatory organs, dysarthria is common, so communication support is needed.

Existing support methods:

| Method | How it works | Merit | Demerit |
|---|---|---|---|
| Communication board | Caregiver checks and judges | Almost accurate | Needs a helper who understands the user's symptoms |
| Talking aid | Character selection by touch panel, etc. | No caregiver needed | Disturbed by involuntary movements |
| Speech recognition for deaf persons | Speech recognition by deep learning | Uses the user's remaining functions | Not compatible with dysarthria |

A communication system with fewer restrictions on use is needed for people with cerebral palsy.
Interview about Current Communication
Interview with a 20-year-old man with athetosis-type cerebral palsy (i.e., with involuntary movements) and his family. Their current exchange proceeds as follows:
1. Expression of intention: the end user calls out by speaking, and the caregiver notices the communication intention.
2. Confirmation of intention: the caregiver prepares the communication board when judging that the speech cannot be understood.
3. Understanding the content: the caregiver points to one character at a time and observes the end user's reaction; the end user signals one character at a time by raising his head to look at the other person's eyes and uttering "Oh" or "CHIGAU" (different); the caregiver then understands the result as a word.
The remaining functions easiest to use for expressing intention were judged to be speech and head movements.
Purpose
Development of a multimodal communication assistance system. The target participants are persons with CP who have dysarthria and involuntary movements. The system requirements remove the restrictions identified above:
- No need to rely on a caregiver
- Less likely to be disturbed by involuntary movements
- Communication time shorter than the conventional method (about 5 seconds per character), since a high communication cost (communication time plus the time to share recognition) makes it difficult for people to communicate
Proposed Method
System overview: the target person wears an IMU and a microphone; a main controller processes both and produces output for the caregiver.
1. The user utters a word one character at a time. Since the recognition rate of consonants, even for human listeners, is less than 5%, the utterance is converted to a vowel string only (e.g., /atsui/ "hot" becomes /aui/).
2. Word prediction consults a language corpus to predict the character string and the number of characters, producing word candidates (e.g., /aui/ yields /atsui/ "hot", /kajui/ "itchy", /samui/ "cold").
3. Decision behavior recognition from head movements lets the user select among the displayed candidates.
4. The result goes to an output device (e.g., display, speaker, email).
A sketch of the word-candidate lookup follows.
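To make the lookup concrete, here is a minimal sketch in Python of matching a recognized vowel string against a corpus indexed by vowel strings. This is illustrative, not the authors' implementation; the three-word `corpus` comes from the slide's example, while `vowel_string` and the dictionary index are assumptions.

```python
# Minimal sketch of vowel-string based word lookup (illustrative only).
from collections import defaultdict

VOWELS = set("aiueo")

def vowel_string(word: str) -> str:
    """Keep only the vowels of a romanized Japanese word, e.g. atsui -> aui."""
    return "".join(c for c in word if c in VOWELS)

# Build an index: vowel string -> candidate words. A real system would load
# a full language corpus; these three words come from the slide.
corpus = ["atsui", "kajui", "samui"]  # hot, itchy, cold
index = defaultdict(list)
for word in corpus:
    index[vowel_string(word)].append(word)

# A recognized utterance /aui/ retrieves all words sharing that vowel string.
print(index["aui"])  # ['atsui', 'kajui', 'samui']
```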
System Configuration (Hardware)
An IMU headset measures head pitch and yaw angles and sends them over Bluetooth (the IMU has its own battery). Voice information and operation information go to the main controller, which performs decision making and estimation.

Decision behavior is estimated by comparing the head-angle variance in a short recent window against a long-window baseline (cf. Pratool Bharti et al., IEEE Journal of Biomedical and Health Informatics, 2017):

$$\mu(t) = \sigma_p[t-15,\,t] + \sigma_y[t-15,\,t], \qquad s(t) = \sigma_p[t-0.5,\,t] + \sigma_y[t-0.5,\,t]$$

where $\sigma_p$ and $\sigma_y$ are the pitch-angle and yaw-angle variances over the indicated time interval (in seconds). When $s(t) > \mu(t)$, the movement is presumed to be a decision.
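A minimal sketch of this two-window variance rule, assuming pitch and yaw angle streams sampled at a fixed rate; the 15 s and 0.5 s windows come from the slide, while the sampling rate, function shape, and array names are assumptions.

```python
# Sketch of the decision-detection rule: short-window head-angle variance
# compared against a 15 s baseline. Illustrative, not the authors' code.
import numpy as np

def is_decision(pitch: np.ndarray, yaw: np.ndarray, fs: int = 100) -> bool:
    """Return True if the latest 0.5 s of head movement looks like a decision.

    pitch, yaw: angle histories in degrees, sampled at fs Hz (assumed).
    """
    long_n = 15 * fs            # 15 s baseline window
    short_n = int(0.5 * fs)     # 0.5 s recent window
    if len(pitch) < long_n or len(yaw) < long_n:
        return False            # not enough history for the baseline
    baseline = np.var(pitch[-long_n:]) + np.var(yaw[-long_n:])
    recent = np.var(pitch[-short_n:]) + np.var(yaw[-short_n:])
    return bool(recent > baseline)  # short-window variance exceeds baseline
```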
System Configuration (Speech Recognition)
Processing pipeline: the user utters a word one character at a time → the system turns on → monitors volume → records → removes silent sections → segments the signal (window: 100 ms, overlap: 30 ms) → extracts MFCC features → identifies the sound with an SVM.
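The following sketch mirrors the classification stage of this pipeline (100 ms windows, 30 ms overlap, MFCC features, SVM) using librosa and scikit-learn as stand-ins; the sampling rate, feature averaging, and kernel choice are assumptions, not the authors' exact configuration.

```python
# Sketch of MFCC + SVM vowel classification with the slide's framing:
# 100 ms windows, 30 ms overlap (hence a 70 ms hop). Illustrative only.
import numpy as np
import librosa
from sklearn.svm import SVC

SR = 16_000                   # assumed sampling rate
WIN = int(0.100 * SR)         # 100 ms analysis window
HOP = int(0.070 * SR)         # 100 ms - 30 ms overlap = 70 ms hop

def features(signal: np.ndarray) -> np.ndarray:
    """Average MFCCs over the utterance into one fixed-length vector."""
    mfcc = librosa.feature.mfcc(y=signal, sr=SR, n_mfcc=13,
                                n_fft=WIN, hop_length=HOP)
    return mfcc.mean(axis=1)

def train(utterances, labels) -> SVC:
    """Fit an SVM on recorded single-vowel utterances (waveform, label) pairs."""
    X = np.stack([features(u) for u in utterances])
    return SVC(kernel="rbf").fit(X, labels)
```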
System Configuration (Word Prediction)
Features of the language corpus used in this work:

| Number of characters in a word | 2 | 3 | 4 | 5 | 6 | 7 | Total |
|---|---|---|---|---|---|---|---|
| Number of unique vowel strings | 6 | 26 | 21 | 7 | 1 | 1 | 62 |
| Maximum number of words with a common vowel string | 3 | 5 | 2 | 1 | 1 | 1 | 5 |
| All words | 19 | 50 | 23 | 7 | 1 | 1 | 101 |

We assume Japanese here, but the technique is possible in other languages as well, provided word candidates can be narrowed down after uttering only a few sounds. A sketch of computing these statistics follows.
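As an illustration, the table's statistics can be computed from any romanized word list in a few lines; the `corpus` below is a toy list, not the authors' corpus, and the number of kana characters is approximated by the number of vowels.

```python
# Sketch: per word length, count unique vowel strings and the largest group
# of words sharing one (the table's rows). Toy corpus, illustrative only.
from collections import defaultdict

def corpus_stats(corpus):
    # length (kana characters ~= number of vowels) -> vowel string -> words
    groups = defaultdict(lambda: defaultdict(list))
    for word in corpus:
        vs = "".join(c for c in word if c in "aiueo")
        groups[len(vs)][vs].append(word)
    for n in sorted(groups):
        g = groups[n]
        print(f"{n}-character words: {len(g)} unique vowel strings, "
              f"largest shared group = {max(len(ws) for ws in g.values())}, "
              f"total words = {sum(len(ws) for ws in g.values())}")

corpus_stats(["atsui", "kajui", "samui", "itai", "nemui"])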
Experiments
(1) Speech recognition: investigate the speech recognition rate.
(2) Decision behavior: investigate the rate at which intentions can be extracted from head movements.
(3) Usability test: evaluate the communication system.
Experiment (1): Speech Recognition
Purpose: investigate the recognition rate of the speech recognition algorithm described in the system configuration.
Data: speech recordings of the target person, 5 vowel types × 12 sets = 60 sounds.
Result: leave-one-out evaluation of the learned model gave an accuracy of 0.6944, whereas the correct answer rate for a vowel listening test by humans was 60%; the system outperformed human listeners. A sketch of the leave-one-out evaluation follows.
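A minimal sketch of such a leave-one-out evaluation with scikit-learn, where `X` and `y` stand in for the 60 recorded feature vectors (5 vowels × 12 sets) and their vowel labels; the classifier choice is an assumption.

```python
# Sketch of leave-one-out evaluation of the vowel classifier. X and y are
# placeholders for the 60 feature vectors and labels from the slide.
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

def loo_accuracy(X: np.ndarray, y: np.ndarray) -> float:
    """Mean accuracy over leave-one-out folds (slide reports 0.6944)."""
    scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=LeaveOneOut())
    return float(scores.mean())
```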
Experiment (2): Decision Behavior
Purpose: examine the accuracy of head-movement recognition and the time from visual stimulus to estimation.
Method: a simple reaction task to visual stimuli (37 seconds, 25 presentations, 6 target stimuli (24%)).
Result: the average time from visual stimulus to estimation of a decision was 0.4 seconds, with precision 0.625, recall 0.857, and F-score 2PR/(P+R) ≈ 0.72. Decision making can therefore be estimated from the user's reaction by the proposed method.
Experiment (3): Usability Test
Purpose: evaluate the communication system using the proposed method.
Method: measure and compare the average time to transfer three words of 3 to 5 letters each, using the conventional method and the developed device.
Result (example of system usage):

| Step | Time [s] |
|---|---|
| Speech | 5 |
| Speech recognition + word prediction | 1 |
| Word selection | 2.5 |

The total of about 8.5 s per word compares with roughly 15 to 25 s for a 3-to-5-character word at about 5 s per character with the conventional method, suggesting that the proposed method may shorten communication time.
Discussion: Speech Recognition
With a correct answer rate of 70%, candidates for 3-to-5-character words can be narrowed down to about 10%: of the 5³ + 5⁴ + 5⁵ = 3,875 possible vowel sequences, about 415 remain even allowing for 40% incorrect answers. In particular, mistakes are likely to be biased toward "a" and "o", and word prediction can narrow the candidates further, so voice recognition is usable.
Discussion: Decision Behavior and Usability
Head movements can be used for decision-making estimation. Points for improvement: decision-making movements must be distinguished from other head movements, involuntary movements, and posture changes, and the threshold should be set with the balance against precision in mind.
Usability test: the proposed method may alleviate constraints in communication. The word selection method is useful, but it should be extended so that selected words can be combined into a single sentence.
Conclusion
We developed a communication system that uses head movements and speech, and confirmed the validity of the proposed method through experiments evaluating the speech recognition system, the decision behavior system, and usability.
Future work: other communication applications (for example, linking the system with chat applications on smartphones) and training the vocalizations of persons with CP.