CLIR: Cross-Lingual Information Retrieval in PatentScope System

"Explore the CLIR technology in the PatentScope search system, unveiling its latest developments, benefits, and future additions. Discover how CLIR finds synonyms, translates into 12 languages, and facilitates patent retrieval across different linguistic barriers. Join the webinar to grasp the significance of CLIR in the cyberworld and its implications for patent research. Uncover the power of CLIR in unlocking national patent collections and enhancing cross-lingual information access, paving the way for innovation and collaboration."


Uploaded on Nov 16, 2024


Presentation Transcript


  2. CLIR: PATENTSCOPE search system. Cyberworld, April 2015. Sandrine Ammann, Marketing & Communications Officer

  3. To the PATENTSCOPE search system webinar CLIR

  4. Agenda: Latest developments; CLIR (What is CLIR? How to use it? Why is it useful? How was it developed? What is next?); Quiz; Q & A session

  5. Latest developments

  6. New: https

  7. National patent collections to be added in the future: UK, DK, AU, NZ

  8. CLIR: Cross-Lingual Information Retrieval

  9. What is it? 1. Finds synonyms: container: receptacles / reservoir / tank 2. Translates into 11 languages: emballage, conteneurs, Verpackung, contenants, container, Transportbehälter, behaallare, Behältnisses, viravattenbehållare, pappersmaskins, envase, recipienti, contentor, toevoertank, contenedor, serbatoio, receptáculo, watervat, tanque, riserva, embalagem, opslagtank
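
The two steps on this slide (synonym finding, then translation) can be pictured as building one large Boolean OR query. The sketch below is only an illustration: the function name `build_expanded_query`, the language codes, and the generic `OR` syntax are assumptions, not the actual PATENTSCOPE implementation or query grammar. The terms are taken from the slide.

```python
def build_expanded_query(synonyms, translations):
    """Combine synonyms and their translations into one Boolean OR query.

    Hypothetical helper for illustration; the real CLIR query format may differ.
    """
    terms = list(synonyms)
    for language_terms in translations.values():
        terms.extend(language_terms)
    # Remove duplicates while keeping the original order.
    unique_terms = list(dict.fromkeys(terms))
    # Quote multi-word terms so they are treated as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in unique_terms]
    return "(" + " OR ".join(quoted) + ")"


synonyms = ["container", "receptacles", "reservoir", "tank"]
translations = {
    "fr": ["emballage", "conteneurs"],
    "de": ["Verpackung"],
    "es": ["envase", "contenedor"],
}
query = build_expanded_query(synonyms, translations)
print(query)
```

A single query of this kind is what lets one keyword reach documents written in several languages at once.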

  10. CLIR: 12 languages available. Non-Asian: Dutch, English, French, German, Italian, Portuguese, Russian, Spanish, Swedish. Asian: Chinese, Japanese, Korean

  11. How to use it?

  12. Interface

  13. Query language Define the language of the query:

  14. Expansion mode: 2 modes. Automatic = 1 step; Supervised = 4 steps

  15. CLIR: precision vs recall

  16. CLIR: precision vs recall. Precision = the ability to retrieve only relevant results. Aiming for high precision (finding only the precisely relevant items) means missing important items because they do not use quite the same vocabulary. Recall = the ability to retrieve as many documents as possible that match or are related to a query. Aiming for high recall (finding all the relevant items) often brings in a lot of junk.
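
The trade-off can be made concrete with the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The document IDs below are made up for illustration, and `precision_recall` is a hypothetical helper rather than anything from PATENTSCOPE.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a list of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents are truly relevant.
relevant = {"D1", "D2", "D3", "D4"}
narrow_search = ["D1", "D2"]  # precise query, few results
broad_search = ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"]  # expanded query

print(precision_recall(narrow_search, relevant))  # (1.0, 0.5)
print(precision_recall(broad_search, relevant))   # (0.5, 1.0)
```

The narrow search returns nothing irrelevant but misses half of the relevant documents; the broad search finds them all at the cost of as many irrelevant hits.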

  17. Example: precision

  18. Results for precision

  19. Example: recall

  20. Results for recall

  21. Examples. Source: https://www.kickstarter.com/projects/igreenpod/biodegradable-coffee-pod-from-portland-oregon

  22. Automatic mode

  23. Result list

  24. Supervised mode

  25. Step 1: technical field selection

  26. Step 2: synonym selection

  27. Step 3: translated term selection

  28. Relevance checking

  29. Fields

  30. Acceptable distance

  31. Stemming

  32. Stemming: use of the root form of a word, e.g. display for displayed, displays, displaying
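
As a rough illustration of the idea, the toy suffix stripper below maps the slide's word forms to a single root. It is a deliberately naive sketch, not the stemmer used by PATENTSCOPE; the suffix list and the `naive_stem` name are assumptions made for this example.

```python
# Toy suffix list; a real stemmer handles many more cases.
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["displayed", "Display", "displays", "displaying"]
stems = {naive_stem(w) for w in words}
print(stems)  # {'display'}
```

With stemming switched on, a query for any one of these forms can match documents containing the others.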

  33. IPC checking

  34. Why is CLIR useful? A) Search full-text collections simultaneously in many foreign languages. B) Significantly increase the number of relevant results without significantly increasing the number of irrelevant results. C) Have confidence in your searches: no black box; users have access to the CLIR-generated Boolean queries (albeit complex) and have full control over them. D) Have a responsive system even for complex queries.

  35. How to make the most out of CLIR? Expansion modes: for a very specific keyword with only one meaning, use AUTOMATIC; for any other query, SUPERVISED is recommended. Variants/synonyms: select the words that you would like to appear in your search results; if there is too much noise in the result list, remove generic variants.

  36. How to make the most out of CLIR? Parameters: 1. Title and abstract: unconstrained distance 2. Claims: sentence/paragraph distance 3. Description: sentence/paragraph distance. Stemming recommended.

  37. How was it developed? 1. Compilation of a long list of titles in language pairs 2. Creation of an in-house extraction methodology 3. The tool learns statistical bilingual dictionaries from the titles
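
The slide does not describe the in-house extraction methodology, so the sketch below swaps in a much simpler, well-known stand-in: counting word co-occurrences across paired titles and scoring candidates with the Dice coefficient. The title pairs are invented and simplified, and `learn_dictionary` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def learn_dictionary(title_pairs):
    """Learn a word-to-word dictionary from (source, target) title pairs.

    Uses the Dice coefficient on co-occurrence counts; this is a stand-in
    technique, not the method actually used to build CLIR.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_title, tgt_title in title_pairs:
        src_words = set(src_title.lower().split())
        tgt_words = set(tgt_title.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    best = {}
    for (src, tgt), joint in pair_count.items():
        score = 2 * joint / (src_count[src] + tgt_count[tgt])
        if score > best.get(src, ("", 0.0))[1]:
            best[src] = (tgt, score)
    return {src: tgt for src, (tgt, _) in best.items()}


# Invented, simplified English/French title pairs.
title_pairs = [
    ("water tank", "réservoir eau"),
    ("fuel tank", "réservoir carburant"),
    ("water pump", "pompe eau"),
    ("fuel pump", "pompe carburant"),
]
dictionary = learn_dictionary(title_pairs)
print(dictionary)
```

Even in this toy setting, each word needs to appear in several different titles before its translation stands out, which matches the point that more titles give better coverage.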

  38. Quality of dictionaries: no human intervention. The more titles available, the better the coverage. Languages: Chinese, English, French, German, Japanese, Korean, Portuguese, Russian, Spanish, Dutch, Italian, Swedish

  39. Disambiguation: the process of identifying the sense of a word in a sentence (http://en.wikipedia.org/wiki/Disambiguation_%28disambiguation%29). Disambiguation is applied to keywords through: 1. Technical domains based on the IPC 2. Synonym selection

  40. What is next? Improve terminology coverage of Korean, Chinese and Japanese. Add Polish and Danish.

  41. Q1: About the latest developments. A: Some fee-based search features. B: Secure https protocol. Answer: B
