AI Projects at WIPO: Text Classification Innovations


WIPO is applying artificial intelligence to enhance text classification in international patent and trademark systems. The projects involve automatic text categorization in the International Patent Classification and Nice classification for trademarks using neural networks. Challenges such as the availability of technical expertise and multi-language document classification are being addressed. WIPO's AI solutions aim to improve precision and recall in predicting classifications, with a focus on optimizing IPC coverage and accuracy across various languages.


Uploaded on Jul 22, 2024



Presentation Transcript


  1. Active AI Projects at WIPO. AI applied to text in international classifications: (1) for automatic text classification in the International Patent Classification: IPCCAT-neural (see related presentations); (2) for trademarks, the Nice Classification: NCLCAT-neural. WIPO and Artificial Intelligence. WIPO FOR OFFICIAL USE ONLY

  2. IPCCAT-neural: automatic text categorization in the IPC. What is it about? International Patent Classifications: IPC (and CPC). Automatic text CATegorization, in the specific context of patent documents. It is an AI-based solution, trained to mimic the legacy IPC classification of patent documents in the IPC reference database. Large collections of classified patent documents are used for this training.

  3. IPCCAT challenge: predictions among ~73,000 categories.

  4. IPCCAT-neural. Problems to be addressed: availability of varied technical expertise, e.g. in small patent offices; one document needs several IPC symbols, each with an indicative level of confidence; documents to be classified are in various languages; the same document classified twice by human classifiers may not receive the same IPC symbols both times. Typical usage since 2003: automatic routing of electronic documents based on the technical content of their text, e.g. of a patent abstract.
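The routing use case on this slide can be illustrated with a minimal sketch. It assumes a classifier has already returned a confidence score per IPC symbol; the function names, the routing table, the 0.5 threshold and the example scores are all hypothetical and are not taken from IPCCAT itself.

```python
from typing import Dict, List, Tuple


def top_k_guesses(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k IPC symbols with the highest confidence, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def route_document(
    scores: Dict[str, float],
    routing_table: Dict[str, str],
    min_confidence: float = 0.5,  # hypothetical threshold, not an IPCCAT setting
    fallback: str = "manual-review",
) -> str:
    """Pick a destination from the top guess, or fall back when unsure."""
    best = top_k_guesses(scores, k=1)
    if not best or best[0][1] < min_confidence:
        return fallback
    symbol = best[0][0]
    # Route on the first three characters (IPC section and class), e.g. "H04".
    return routing_table.get(symbol[:3], fallback)


# Made-up scores for one patent abstract (illustration only).
example_scores = {"G06F 21/60": 0.18, "H04L 9/32": 0.71, "H04W 12/06": 0.06}
# Hypothetical mapping from IPC class to an examining unit.
example_routing = {"H04": "telecom-unit", "G06": "computing-unit"}

print(top_k_guesses(example_scores))
print(route_document(example_scores, example_routing))
```

Returning several guesses with their confidence, rather than a single label, lets a small office send low-confidence documents to a human instead of routing them blindly.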

  5. IPCCAT-neural performance evaluation challenge. Precision versus recall for IPC: the highest precision for the top IPC guess is not the best option in the domain of patents (e.g. in prior art search). [Figure: three evaluation schemes, "Top prediction", "Three guesses" and "All classes", each comparing guesses ranked 1 to 3 against the real classifications, labelled mc and ic.]
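The three evaluation schemes named on this slide can be sketched as follows. This is one plausible reading, stated as an assumption: "mc" is taken to be the document's main class and "ic" its additional classes; "Top prediction" checks the first guess against mc, "Three guesses" checks the first three guesses against mc, and "All classes" checks the first guess against mc and every ic. The sample data is invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# One test case: (ranked guesses, main class "mc", additional classes "ic").
Sample = Tuple[Sequence[str], str, Sequence[str]]


def top_prediction_hit(guesses: Sequence[str], mc: str, ic: Sequence[str]) -> bool:
    """First guess must equal the main class."""
    return bool(guesses) and guesses[0] == mc


def three_guesses_hit(guesses: Sequence[str], mc: str, ic: Sequence[str]) -> bool:
    """Main class must appear among the first three guesses."""
    return mc in guesses[:3]


def all_classes_hit(guesses: Sequence[str], mc: str, ic: Sequence[str]) -> bool:
    """First guess must equal the main class or any additional class."""
    return bool(guesses) and guesses[0] in {mc, *ic}


def success_rate(hit: Callable[..., bool], samples: List[Sample]) -> float:
    """Fraction of samples counted as a success under the given scheme."""
    return sum(hit(g, mc, ic) for g, mc, ic in samples) / len(samples)


# Invented test cases (illustration only).
samples: List[Sample] = [
    (["H04L", "G06F", "H04W"], "H04L", []),
    (["G06F", "H04L", "H04W"], "H04L", ["G06F"]),
    (["H04W", "G06F", "H04L"], "H04L", []),
    (["A61K", "A61P", "C07D"], "H04L", ["G06F"]),
]

for scheme in (top_prediction_hit, three_guesses_hit, all_classes_hit):
    print(scheme.__name__, success_rate(scheme, samples))
```

On the same predictions, the more tolerant schemes score higher, which is why a single top-guess precision figure undersells a classifier meant to support tasks such as prior art search.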

  6. IPCCAT-neural English in a nutshell. Baseline of the solution: unsupervised training for ~8,000+ neural networks, with 30 million already classified patent documents in English (see the WIPO-delta dataset). Several IPC predictions with confidence levels. Retrained every year (new vocabulary, IPC revisions, patent reclassification, etc.). IPC coverage and accuracy of predictions measured on millions of test cases. Other languages also need consideration.

  7. IPCCAT-neural is now cross-lingual. A large collection of EN documents is available; however, input text may not always be in English. Input text in XX language is translated into EN by WIPO Translate, then classified by IPCCAT EN (trained on 30 million EN patent documents with IPC), giving an IPC guess for the text in XX language.
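The translate-then-classify flow on this slide can be sketched as a small pipeline. The translator and the English classifier are passed in as plain functions; the stubs below are hypothetical stand-ins for WIPO Translate and IPCCAT EN, whose real interfaces are not described in this presentation.

```python
from typing import Callable, List, Tuple

Guess = Tuple[str, float]  # (IPC symbol, confidence)


def classify_cross_lingual(
    text: str,
    language: str,
    translate: Callable[[str, str], str],
    classify_en: Callable[[str], List[Guess]],
    k: int = 3,
) -> List[Guess]:
    """Translate non-English text into English, then classify it in English."""
    english = text if language == "en" else translate(text, language)
    return classify_en(english)[:k]


def fake_translate(text: str, source_language: str) -> str:
    """Stand-in translator backed by a lookup table (illustration only)."""
    table = {("fr", "procédé de chiffrement de données"): "data encryption method"}
    return table[(source_language, text)]


def fake_classify_en(text: str) -> List[Guess]:
    """Stand-in English classifier with made-up scores (illustration only)."""
    if "encryption" in text:
        return [("H04L", 0.80), ("G06F", 0.15), ("H04W", 0.03), ("G09C", 0.02)]
    return []


guesses = classify_cross_lingual(
    "procédé de chiffrement de données", "fr", fake_translate, fake_classify_en
)
print(guesses)
```

Keeping a single English model and placing translation in front of it means that only the translation step depends on the input language.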

  8. IPCCAT-neural cross-lingual 2019 performance. Automatic prediction among 99% of the IPC, i.e. among 73,633 categories. Three-guess precision: 84%. 9 supported languages: Arabic, German, Spanish, French, Korean, Japanese, Portuguese, Russian, Chinese.

  9. IPCCAT-neural cross-lingual potential use. Consistency of AI-based IPC classification: IPCCAT mimics the legacy usage of the IPC in DOCDB (the IPC reference patent database). When IPCCAT classifies the same document twice, it gives the same IPCs both times.

  10. NCLCAT-neural project. 2017 proof of concept: potential use of AI for the Nice Classification (NCL); AI support for NCL is promising, but the training sets were too small. 2019: larger training collections from Madrid Goods and Services (MGS); prediction of NCL 11 classes for query terms with > 99.5% accuracy; proposal of possibly related MGS terms (in progress); languages: EN, FR, ES. Outcomes of research and development in Q4 2019.
