Addressing the Rare Word Problem in Neural Machine Translation


Thang Luong and collaborators address the rare word problem in neural machine translation by tracking where each rare word in a target sentence comes from. They annotate the training data with unsupervised alignments and relative indices, then post-process test translations with word or identity translations. Attention mechanisms for rare words (Bahdanau et al., 2015) are also considered. The technique improves translation quality on sentences with rare words.



Presentation Transcript


  1. Addressing the Rare Word Problem in Neural Machine Translation Thang Luong ACL 2015 Joint work with: Ilya Sutskever, Quoc Le, Oriol Vinyals, & Wojciech Zaremba.

  2. Standard Machine Translation (MT)

     Example: Cindy loves cute cats → Cindy aime les chats mignons

     - Translates locally, phrase by phrase.
     - Good progress: Moses (Koehn et al., 2007), among many others.
     - Many subcomponents need to be tuned separately.
     - Hybrid systems with neural components: language models (Schwenk et al., 2006; Vaswani et al., 2013) and translation models (Schwenk, 2012; Devlin et al., 2014).
     - Complex pipeline. Desire: a simple system that translates globally.

  3. Neural Machine Translation (NMT)

     [Diagram: an encoder reads the source sentence A B C D; a decoder then emits the target sentence X Y Z.]

     - Encoder-decoder architecture: first proposed at Google & Montreal (Sutskever et al., 2014).
     - Advantages: minimal domain knowledge; dimensionality reduction (up to 100-gram source-conditioned LMs); no gigantic phrase tables or LMs; simple beam-search decoder.
     - A minimal model sketch follows below.
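To make the encoder-decoder picture concrete, here is a minimal sketch in Python/PyTorch. The library choice is ours, the class is illustrative rather than the authors' implementation, and how the 1000-cell, 1000-dim configuration from the experiments maps onto these arguments is an assumption:

```python
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Sketch of an encoder-decoder: one LSTM compresses the source into a
    fixed-size state, which initializes a second LSTM that emits the target."""
    def __init__(self, src_vocab_size, tgt_vocab_size, dim=1000, layers=4):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab_size, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.proj = nn.Linear(dim, tgt_vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))        # encode source
        out, _ = self.decoder(self.tgt_emb(tgt_ids), state)   # teacher-forced decode
        return self.proj(out)                                 # next-token logits
```

At test time, a beam-search loop over the decoder replaces the teacher-forced pass.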

  4. Existing NMT Work

     Work                                           Encoder              Decoder
     (Kalchbrenner & Blunsom, 2013)                 convolutional net    RNN
     (Sutskever et al., 2014)                       LSTM                 LSTM
     (Cho et al., 2014), (Bahdanau et al., 2015)    GRU                  GRU

     (LSTM = long short-term memory; GRU = gated recurrent unit.)

     - All decoders use recurrent networks.
     - All* NMT systems use a fixed, modest-size vocabulary, with <unk> representing all out-of-vocabulary (OOV) words. Translations containing <unk> are troublesome!
     *Except the very recent work of (Jean et al., 2015), which scales to a large vocabulary.
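The fixed-vocabulary setup that creates the <unk> problem is easy to picture in code; a minimal sketch, with helper names and special tokens chosen here purely for illustration:

```python
from collections import Counter

def build_vocab(sentences, size=40000):
    """Keep the `size` most frequent words (e.g. a 40K target vocabulary);
    every other word will map onto the <unk> token."""
    counts = Counter(w for sent in sentences for w in sent)
    vocab = {"<unk>": 0, "<s>": 1, "</s>": 2}
    for w, _ in counts.most_common(size - len(vocab)):
        vocab.setdefault(w, len(vocab))
    return vocab

def to_ids(tokens, vocab):
    # OOV words collapse onto <unk>, which is what hurts rare-word translation.
    return [vocab.get(w, vocab["<unk>"]) for w in tokens]
```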

  5. The Rare Word Problem

     Original:     The ecotax portico in Pont-de-Buis → Le portique écotaxe de Pont-de-Buis
     Actual input: The <unk> portico in <unk> → Le <unk> <unk> de <unk>

     NMTs translate poorly for sentences with rare words.
     [Plot: BLEU (25-40) vs. sentences ordered by average frequency rank; Durrani et al. (37.0) and Sutskever et al. (34.8).]

  6. Our approach

     Original:     The ecotax portico in Pont-de-Buis → Le portique écotaxe de Pont-de-Buis
     Actual input: The <unk> portico in <unk> → Le unk1 unk-1 de unk1

     Idea: track where each target <unk> comes from.
     - Annotate training data with unsupervised alignments & relative indices.
     - Post-process test translations with word/identity translations.
     - Attention for rare words (Bahdanau et al., 2015).

     Treat any neural MT system as a black box: annotate the training data & post-process the translations (see the sketch below).

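The pipeline is easy to sketch end to end. Below is a minimal Python illustration of both steps, treating the NMT system as a black box. The unk{d} token format, the offset bound, the sign convention of the relative index, and all function names are our assumptions for illustration, not the paper's exact specification; `alignment` maps a target position to its aligned source position, as produced by an unsupervised aligner:

```python
import re

UNK = "<unk>"
MAX_D = 7  # assumed bound on the relative index

def annotate(src, tgt, alignment, src_vocab, tgt_vocab):
    """Training-data annotation: each OOV target word becomes a positional
    token unk{d}, where d is the offset to its aligned source word."""
    src_out = [w if w in src_vocab else UNK for w in src]
    tgt_out = []
    for j, w in enumerate(tgt):
        if w in tgt_vocab:
            tgt_out.append(w)
        elif j in alignment:
            d = max(-MAX_D, min(MAX_D, alignment[j] - j))
            tgt_out.append(f"unk{d}")   # e.g. unk1, unk-1
        else:
            tgt_out.append(UNK)         # unaligned OOV target word
    return src_out, tgt_out

def postprocess(src, trans, dictionary):
    """Test-time post-processing: resolve each unk{d} to the source word it
    points at, then translate it via a dictionary or copy it verbatim."""
    out = []
    for j, w in enumerate(trans):
        m = re.fullmatch(r"unk(-?\d+)", w)
        if m:
            i = j + int(m.group(1))     # follow the relative index back
            if 0 <= i < len(src):
                s = src[i]
                out.append(dictionary.get(s, s))  # word translation or identity copy
                continue
        out.append(w)
    return out
```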

  7. Experiments

     - WMT'14 English-French.
     - Hyper-parameters tuned on newstest2012+2013; BLEU reported on newstest2014.
     - Setup similar to (Sutskever et al., 2014): stacked LSTMs with 1000 cells and 1000-dim embeddings; source sentences reversed (a sketch of the reversal follows below).
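The source-reversal trick from (Sutskever et al., 2014) is a one-line data-preparation step; a minimal sketch, with the helper name and the end-of-sentence marker as our assumptions:

```python
def make_pair(src_tokens, tgt_tokens):
    """Reverse the source so that early target words sit close to the
    source words they depend on; close the target with an end marker."""
    return list(reversed(src_tokens)), tgt_tokens + ["</s>"]
```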

  8. Results

     Systems                                     BLEU
     SOTA in WMT'14 (Durrani et al., 2014)       37.0
     Our NMT systems (40K target vocab):
       Single 6-layer LSTM                       30.4
       Single 6-layer LSTM + our technique       32.7 (+2.3)
       Ensemble of 8 LSTMs                       34.1
       Ensemble of 8 LSTMs + our technique       36.9 (+2.8)

     Better models: better gains with our technique.
     Naïve approach (monotonic alignments of <unk>): only a +0.8 BLEU gain (see the sketch below).
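For contrast, the naïve monotonic baseline can be sketched as pairing the k-th <unk> in the translation with the k-th OOV source word, ignoring learned alignments entirely; the names and details here are our illustrative assumptions:

```python
def postprocess_monotonic(src, trans, src_vocab, dictionary):
    """Naive baseline: resolve <unk> tokens in source order, no alignment."""
    src_oovs = [w for w in src if w not in src_vocab]
    out, k = [], 0
    for w in trans:
        if w == "<unk>" and k < len(src_oovs):
            s = src_oovs[k]
            out.append(dictionary.get(s, s))  # dictionary translation or copy
            k += 1
        else:
            out.append(w)
    return out
```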

  9. Results

     Systems                                     BLEU
     SOTA in WMT'14 (Durrani et al., 2014)       37.0
     Our NMT systems (40K target vocab):
       Single 6-layer LSTM                       30.4
       Single 6-layer LSTM + our technique       32.7 (+2.3)
       Ensemble of 8 LSTMs                       34.1
       Ensemble of 8 LSTMs + our technique       36.9 (+2.8)
     Our NMT systems (80K target vocab):
       Single 6-layer LSTM                       31.5
       Single 6-layer LSTM + our technique       33.1 (+1.6)
       Ensemble of 8 LSTMs                       35.6
       Ensemble of 8 LSTMs + our technique       37.5 (+1.9)

     New SOTA: about a +2.0 BLEU gain with our technique.

  10. Existing Work

     Systems                                                             Vocab   BLEU
     Ensemble of 8 LSTMs (this work)                                     80K     37.5
     SOTA in WMT'14 (Durrani et al., 2014)                               All     37.0
     Standard MT + neural components:
       Neural language model (Schwenk, 2014)                             All     33.3
       Phrase table neural features (Cho et al., 2014)                   All     34.5
       Ensemble of 5 LSTMs, rerank n-best lists (Sutskever et al., 2014) All     36.5

  11. Existing Work

     Systems                                                             Vocab   BLEU
     Ensemble of 8 LSTMs (this work)                                     80K     37.5
     SOTA in WMT'14 (Durrani et al., 2014)                               All     37.0
     Standard MT + neural components:
       Neural language model (Schwenk, 2014)                             All     33.3
       Phrase table neural features (Cho et al., 2014)                   All     34.5
       Ensemble of 5 LSTMs, rerank n-best lists (Sutskever et al., 2014) All     36.5
     End-to-end NMT systems:
       Ensemble of 5 LSTMs (Sutskever et al., 2014)                      80K     34.8
       Single RNNsearch (Bahdanau et al., 2015)                          30K     28.5
       Ensemble of 8 RNNsearch + unknown replacement (Jean et al., 2015) 500K    37.2

     Still the state-of-the-art performance to date! We obtained 37.7 after the ACL camera-ready version.

  12. Effects of Translating Rare Words

     [Plot: BLEU (25-40) vs. sentences ordered by average frequency rank; this work (37.5) vs. Durrani et al. (37.0) and Sutskever et al. (34.8).]

     Better than the existing SOTA on both frequent and rare words.

  13. Effects of Network Depth

     [Bar chart: BLEU before vs. after <unk> replacement at depths 3, 4, and 6; per-depth gains between +1.9 and +2.2.]

     Each layer gives on average about a +1 BLEU gain. More accurate models: better gains with our technique.

  14. Perplexity vs. BLEU

     [Plot: BLEU (23-27) vs. perplexity (5.5-7).]

     Training objective: perplexity. Strong correlation: a 0.5 reduction in perplexity gives about +1.0 BLEU.
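Since the model is trained on perplexity but evaluated on BLEU, it helps to recall how the two relate: perplexity is the exponentiated average per-word negative log-likelihood. A minimal sketch (the helper name is ours):

```python
import math

def perplexity(total_neg_log_likelihood, num_target_words):
    # e.g. a drop from 6.5 to 6.0 is the kind of 0.5 reduction the slide
    # links to roughly +1.0 BLEU.
    return math.exp(total_neg_log_likelihood / num_target_words)
```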

  15. Sample translations

     src: An additional 2600 operations including orthopedic and cataract surgery will help clear a backlog .
     ref: 2600 opérations supplémentaires , notamment dans le domaine de la chirurgie orthopédique et de la cataracte , aideront à rattraper le retard .
     trans: En outre , unk1 opérations supplémentaires , dont la chirurgie unk5 et la unk6 , permettront de résorber l' arriéré .
     trans+unk: En outre , 2600 opérations supplémentaires , dont la chirurgie orthopédiques et la cataracte , permettront de résorber l' arriéré .

     The model predicts long-distance alignments well.

  16. Sample translations

     src: This trader , Richard Usher , left RBS in 2010 and is understood to have been given leave from his current position as European head of forex spot trading at JPMorgan .
     ref: Ce trader , Richard Usher , a quitté RBS en 2010 et aurait été suspendu de son poste de responsable européen du trading au comptant pour les devises chez JPMorgan .
     trans: Ce unk0 , Richard unk0 , a quitté unk1 en 2010 et a compris qu' il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au unk5 .
     trans+unk: Ce négociateur , Richard Usher , a quitté RBS en 2010 et a compris qu' il est autorisé à quitter son poste actuel en tant que leader européen du marché des points de vente au JPMorgan .

     The model translates long sentences well.

  17. Sample translations

     src: But concerns have grown after Mr Mazanga was quoted as saying Renamo was abandoning the 1992 peace accord .
     ref: Mais l' inquiétude a grandi après que M. Mazanga a déclaré que la Renamo abandonnait l' accord de paix de 1992 .
     trans: Mais les inquiétudes se sont accrues après que M. unkpos3 a déclaré que la unkpos3 unkpos3 l' accord de paix de 1992 .
     trans+unk: Mais les inquiétudes se sont accrues après que M. Mazanga a déclaré que la Renamo était l' accord de paix de 1992 .

     Incorrect alignment prediction: "was" is rendered as "était" instead of "abandonnait".

  18. Conclusion

     - Simple technique to tackle rare words: applicable to any NMT system (+2.0 BLEU improvement).
     - State-of-the-art result on WMT'14 English-French.
     - Future work: more challenging language pairs.

     Thank you!
