Terminology Translation Accuracy in SMT vs. NMT


This research compares the quality of Google's neural (NMT) and phrase-based statistical (PBMT) machine translation for English-Slovene and Slovene-English domain-specific texts in karstology. It focuses on terminology translation accuracy, which strongly affects professional translators' work efficiency. Using the Karst Corpus and the QUIKK termbase, the study evaluates term translations both automatically and through human assessment by a domain expert, and highlights terminology consistency as a persistent problem of industry-used MT systems.

  • Translation
  • Accuracy
  • SMT
  • NMT
  • Terminology

Uploaded on Mar 09, 2025



Presentation Transcript


  1. Terminology translation accuracy in SMT vs. NMT
     Špela Vintar, Dept. of Translation Studies, University of Ljubljana
     spela.vintar@ff.uni-lj.si, http://www.lojze.si/spela

  2. Our aims
     • Compare the quality of Google's NMT vs. PBMT for English-Slovene and Slovene-English
     • Domain-specific texts: karstology (Karst Corpus)
     • Special focus on terminology translation: automatic evaluation using an existing termbase, and human evaluation by a domain expert

  3. Why terminology matters
     • Professional translators spend up to 45% of their total working time researching terminology.
     • Terminology errors amount to over 70% of the errors found in QA.
     • Guidelines for post-editors emphasize terminology consistency as one of the main problems of industry-used MT systems.

  4. The Karst Corpus & the Karst Termbase
     • 15 abstracts and 5 articles from 2 scientific journals, Acta Carsologica and Acta Geographica Slovenica, fully bilingual
     • Total size: 25,423 (English) and 18,985 (Slovene)
     • All texts translated twice, using Google's PBMT and NMT models (via the Google Translate API)
     • QUIKK termbase: karst landforms and processes, 81 fully populated concepts
     Google Translate is a general-purpose MT system, so why test it on a domain-specific text?
     • Karstology, at least for English-Slovene, is not as exotic as it may sound: there is plenty of parallel data in both directions.
     • In many professional environments, on-the-fly domain adaptation is still not feasible.

  5. Evaluation methods
     • Automatic overall MT evaluation: document-level BLEU and NIST
     • Automatic evaluation of term translations: linguistic pre-processing, then matching terms and equivalents from the QUIKK termbase
     • Human evaluation of term translations: 300 random term occurrences for each system and direction, evaluated manually by a domain expert using three categories:
         • Correct: the system uses the right term equivalent, regardless of grammar errors.
         • False: the system does not use the right equivalent. A partially correct multi-word term was considered wrong.
         • Omitted: the original term was skipped in the translation.
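The slides do not say how the 300 occurrences were drawn. The sketch below shows one reproducible way to do it, assuming each system and direction yields a flat list of term occurrences and that a seeded uniform sample without replacement is acceptable. The helper `sample_occurrences` and the placeholder occurrence records are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample_occurrences(occurrences: Sequence[T], k: int = 300, seed: int = 0) -> list[T]:
    """Draw k distinct term occurrences uniformly at random, reproducibly.

    Hypothetical helper: a fixed seed lets the same sample be re-drawn,
    e.g. for a second annotator or a re-run of the evaluation.
    """
    if k > len(occurrences):
        raise ValueError(f"cannot sample {k} items from {len(occurrences)}")
    rng = random.Random(seed)
    # Random.sample draws without replacement, so no occurrence is judged twice.
    return rng.sample(list(occurrences), k)


# Placeholder records (text id, occurrence index); real ones would come from term matching.
pool = [("AC1", i) for i in range(538)]
sample = sample_occurrences(pool, k=300, seed=42)
print(len(sample), len(set(sample)))
```

Drawing separately per system and direction keeps each cell of the evaluation table at the same size, which makes the category counts directly comparable.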

  6. Automatic evaluation: document-level BLEU and NIST for each text

     Text      En-Sl PBMT      En-Sl NMT       Sl-En PBMT      Sl-En NMT
               BLEU    NIST    BLEU    NIST    BLEU    NIST    BLEU    NIST
     AC1       26.26   4.56    30.72   4.78    26.10   4.73    31.12   5.03
     AC2       7.86    2.36    10.85   2.58    16.95   3.77    15.66   3.70
     AC3       16.00   2.53    15.04   2.48    14.23   3.15    19.77   3.54
     AC4       24.84   4.03    34.47   4.69    26.99   4.41    27.65   4.38
     AC5       4.55    1.56    6.79    1.51    6.37    1.72    8.83    2.04
     AC6       18.30   3.13    20.35   2.93    28.87   3.97    34.10   4.28
     AC7       36.26   4.92    43.41   5.09    38.14   5.40    40.93   5.24
     AC8       17.76   3.29    22.57   3.77    24.23   4.02    24.13   4.00
     AC9       15.06   3.43    31.81   4.30    21.85   4.21    35.75   5.06
     AC10      15.01   3.52    18.14   4.12    23.19   4.34    23.32   4.28
     AC11      19.60   3.65    22.54   3.78    26.12   4.25    25.97   4.46
     AC12      11.76   2.45    11.05   2.19    17.49   3.11    17.63   3.10
     AC13      8.04    2.09    11.94   2.47    16.09   3.33    11.40   3.15
     AC14      21.41   3.87    29.30   4.28    27.11   4.71    35.92   4.79
     AC15      20.96   3.45    24.08   3.85    22.93   4.16    27.25   4.39
     AGS1      25.77   5.08    23.89   4.91    22.60   5.28    23.24   5.28
     AGS2      21.69   4.47    21.30   4.54    21.71   4.87    24.98   4.99
     AGS3      22.02   5.24    28.42   5.92    23.11   4.78    28.11   4.53
     AGS4      13.49   3.41    17.21   3.78    19.00   4.85    23.47   5.08
     AGS5      23.28   4.75    25.97   5.13    27.55   5.76    29.38   5.76
     Average   18.50   3.59    22.49   3.85    22.53   4.24    25.43   4.35
     St. dev.  7.24    1.02    8.85    1.13    6.41    0.90    7.97    0.88
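As a consistency check on the summary rows, the sketch below recomputes the English-Slovene BLEU average and standard deviation from the per-text scores. For these two columns, the reported St. dev. values match the population standard deviation (`statistics.pstdev`) rather than the sample version (`statistics.stdev`); that reading is inferred from the numbers, not stated in the slides. It also counts per-text wins, since an average gain can hide texts where PBMT scores higher.

```python
from statistics import mean, pstdev, stdev

# Document-level BLEU per text, English-Slovene, in table order (AC1..AC15, AGS1..AGS5).
PBMT = [26.26, 7.86, 16.00, 24.84, 4.55, 18.30, 36.26, 17.76, 15.06, 15.01,
        19.60, 11.76, 8.04, 21.41, 20.96, 25.77, 21.69, 22.02, 13.49, 23.28]
NMT = [30.72, 10.85, 15.04, 34.47, 6.79, 20.35, 43.41, 22.57, 31.81, 18.14,
       22.54, 11.05, 11.94, 29.30, 24.08, 23.89, 21.30, 28.42, 17.21, 25.97]


def summarise(scores: list[float]) -> tuple[float, float]:
    """Return (mean, population standard deviation) of the scores."""
    return mean(scores), pstdev(scores)


for name, scores in (("PBMT", PBMT), ("NMT", NMT)):
    avg, sd = summarise(scores)
    print(f"En-Sl {name}: average {avg:.2f}, st. dev. {sd:.2f} "
          f"(sample st. dev. would be {stdev(scores):.2f})")

# Per-text comparison: on how many of the 20 texts does NMT get the higher BLEU?
nmt_wins = sum(n > p for p, n in zip(PBMT, NMT))
print(f"NMT BLEU higher on {nmt_wins} of {len(PBMT)} texts")
```

The Sl-En and NIST columns can be checked the same way by swapping in the corresponding lists.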

  7. Terms and equivalents matching the termbase
     For each source term found in the original, we check whether the translation contains its equivalent. Normalisation is applied on both sides.

                           English-Slovene Slovene-English
                           PBMT    NMT     PBMT    NMT
     Terms in original     538     538     680     680
     Terms in translation  420     431     476     446
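The matching step can be sketched as follows, as a simplification under stated assumptions. The slides mention linguistic pre-processing (for example lemmatisation), which matters for an inflected language like Slovene; here normalisation is only lowercasing and word tokenisation, and each term is counted at most once per segment. The termbase entries reuse source terms and expected equivalents from the error examples on slide 10, while the sentence pairs and the function names are hypothetical.

```python
import re


def normalise(text: str) -> list[str]:
    """Lowercase and split into word tokens.

    Stand-in for real linguistic pre-processing: without lemmatisation,
    an inflected form such as 'brezstrope jame' will NOT match 'brezstropa jama'.
    """
    return re.findall(r"\w+", text.lower())


def contains(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as a contiguous token sequence in `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def match_terms(pairs: list[tuple[str, str]],
                termbase: dict[str, list[str]]) -> tuple[int, int]:
    """Count termbase terms found in the sources and, of those, how many
    have an accepted equivalent in the corresponding translation."""
    in_original = in_translation = 0
    for source, target in pairs:
        src, tgt = normalise(source), normalise(target)
        for term, equivalents in termbase.items():
            if contains(src, normalise(term)):
                in_original += 1
                if any(contains(tgt, normalise(eq)) for eq in equivalents):
                    in_translation += 1
    return in_original, in_translation


# Illustrative Slovene-English entries; the "translations" are hypothetical.
termbase = {
    "brezstropa jama": ["denuded cave"],
    "udornica": ["collapse doline"],
    "vrtača": ["sinkhole"],
    "zakraselost": ["karstification"],
}
pairs = [
    ("brezstropa jama", "roofless cave"),
    ("udornica", "collapse doline"),
    ("vrtača", "sinkhole"),
    ("zakraselost", "naivety"),
]
print(match_terms(pairs, termbase))  # (4, 2)
```

Dividing the second count by the first gives a per-system hit rate for the termbase terms.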

  8. Human evaluation of term translations
     300 random occurrences for each system and language pair were checked by a domain expert.
     Categories:
     • Correct (even if the case and number were wrong)
     • False (even if one part of a multi-word term was correct, or if the system used the correct expression but not for the domain)
     • Omitted

                   English-Slovene             Slovene-English
                   PBMT          NMT           PBMT          NMT
     Correct       184 (61.3%)   211 (70.3%)   201 (67.0%)   195 (65.0%)
     False         113 (37.7%)   85 (28.3%)    94 (31.3%)    99 (33.0%)
     Omitted       3 (1.0%)      4 (1.3%)      5 (1.7%)      6 (2.0%)
     Total         300           300           300           300
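The percentages can be re-derived from the raw category counts. In the sketch below, each system and language pair sums to 300 judgements, so every percentage is a share of 300; the `COUNTS` mapping simply restates the slide's figures and `percentages` is a hypothetical helper name.

```python
# Human-evaluation counts per (direction, system), restated from the slide.
COUNTS = {
    ("English-Slovene", "PBMT"): {"Correct": 184, "False": 113, "Omitted": 3},
    ("English-Slovene", "NMT"): {"Correct": 211, "False": 85, "Omitted": 4},
    ("Slovene-English", "PBMT"): {"Correct": 201, "False": 94, "Omitted": 5},
    ("Slovene-English", "NMT"): {"Correct": 195, "False": 99, "Omitted": 6},
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of each category in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


for (direction, system), counts in COUNTS.items():
    print(direction, system, sum(counts.values()), percentages(counts))
```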

  9. A glance at errors: En-Sl
     PBMT:
     • untranslated term / term component: epigenic aquifer → epigenic vodonosnik; solution runnel → raztopina runnel
     • wrong sense: spring → vzmet; Mlava Spring → Mlava pomlad
     NMT:
     • out-of-the-blue translations: cave diving → jalovo potapljanje
     • coined words: ajerno, nekarska, glacijacija
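The "untranslated term component" errors can be caught with a crude heuristic: flag any source-term token that reappears verbatim in its rendering. This is a minimal sketch, assuming each source term and its rendering are already aligned; `copied_tokens` and its `min_len` threshold are hypothetical, and the check will also flag legitimate carry-overs such as proper names, so its output needs human review.

```python
import re


def copied_tokens(source_term: str, rendering: str, min_len: int = 4) -> set[str]:
    """Return source-term tokens that reappear unchanged in the rendering.

    Heuristic only: tokens shorter than `min_len` are ignored to skip short
    function words, and proper names are flagged as well.
    """
    src = {t for t in re.findall(r"\w+", source_term.lower()) if len(t) >= min_len}
    tgt = set(re.findall(r"\w+", rendering.lower()))
    return src & tgt


# Renderings quoted on the slide.
for term, rendering in [
    ("epigenic aquifer", "epigenic vodonosnik"),
    ("solution runnel", "raztopina runnel"),
    ("Mlava Spring", "Mlava pomlad"),
    ("spring", "vzmet"),
]:
    print(f"{term!r} -> {rendering!r}: {sorted(copied_tokens(term, rendering))}")
```

The Mlava Spring case shows the limits: the proper name is rightly kept and gets flagged, while the actual error (pomlad, "spring" as a season) is a wrong-sense error this check cannot see.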

  10. A glance at errors: Sl-En (expected equivalents in parentheses)
      PBMT:
      • untranslated term / term component: nepaleokraške kamnine → nepaleokraške rocks
      • grammatical but non-terminological translation: brezstropa jama → roofless cave (denuded cave); udornica → hollow / precipice / collapsed / sinkhole (collapse doline)
      NMT:
      • out-of-the-blue translations: vrtača → crop rotation (sinkhole); zakraselost → naivety (karstification); melioracija → reclamation (melioration)
      • unsuccessful attempts at proper names
      • inconsistencies: udornica → collapse / udder / cliff / collision / burrow / groove
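Inconsistencies like the six NMT renderings of udornica can be surfaced by grouping term renderings across a document. A minimal sketch, assuming (source term, rendering) pairs have already been extracted, for example by word alignment, which is not shown; `inconsistent_terms` is a hypothetical helper, and the repeated zakraselost pair is invented for illustration.

```python
from collections import defaultdict


def inconsistent_terms(alignments: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return terms rendered in more than one way, with their sorted renderings.

    alignments: (source_term, rendering) pairs, assumed already extracted.
    """
    renderings: dict[str, set[str]] = defaultdict(set)
    for term, rendering in alignments:
        renderings[term.lower()].add(rendering.lower())
    return {term: sorted(r) for term, r in renderings.items() if len(r) > 1}


# NMT renderings of 'udornica' from the slide, plus a hypothetical repeated pair.
observed = [("udornica", r) for r in
            ["collapse", "udder", "cliff", "collision", "burrow", "groove"]]
observed += [("zakraselost", "naivety"), ("zakraselost", "naivety")]
print(inconsistent_terms(observed))
```

A consistent rendering is not necessarily a correct one: zakraselost → naivety passes this check, so it complements rather than replaces a termbase lookup.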

  11. Conclusions
      • Measured with BLEU/NIST, Google's NMT outperforms PBMT for En-Sl and Sl-En.
      • Translations of domain-specific terminology are not significantly improved in NMT.
      • On-the-fly domain adaptation may not be available in many end-user environments.
      • There is a need for post-processing methods.
