Innovative Tools and Approaches in Language Annotation and Visualization at CLARIN 2020
The CLARIN 2020 session on annotation and visualization tools presented a neural syntax annotator for Dutch and German (sticker2), the exploration and visualization of wordnet data with GermaNet Rover, named entity recognition for distant reading in ELTeC, the semi-automatic analysis of spontaneous language for Dutch (SASTA), and a neural parsing pipeline for Icelandic.
CLARIN 2020 Session on Annotation and Visualization Tools
Moderators: Stelios Piperidis & Koenraad De Smedt
Monday 5 October, 13:25-14:05
Structure of the session
A. Lightning talks: main goal of the research (only 1 min. per paper)
B. Questions
1. Does your paper address well-identified annotation/visualization requirements of a community of researchers, and if so, how have these requirements been elicited? (2 min. per paper)
2. What new research questions can be addressed or facilitated by the resources and/or tools developed and presented in your paper? (2 min. per paper)
3. How easy has it been to reuse and adapt existing resources and tools for the purposes of your work? What are the main challenges you have faced/are facing in terms of adaptation? (2 min. per paper)
C. Follow-up questions by the audience
sticker2: a neural syntax annotator for Dutch and German
Daniël de Kok, Neele Witte and Tobias Pütz
sticker2 is a neural sequence labeler for Dutch and German:
- Uses multi-task learning to output several layers at the same time
- Supports structured prediction: dependency relations & lemmas
- Finetuning of pretrained transformers: BERT, XLM-RoBERTa, ALBERT
Production features:
- Standalone binary, linked against libtorch
- Fast: up to 200 sentences per second on a CPU with SOTA models
- Supports distillation to extract smaller models
- Thread-safe
Get sticker2 and models from: https://github.com/stickeritis/sticker2
Try sticker2 through WebLicht: https://weblicht.sfs.uni-tuebingen.de/
Exploring and Visualizing Wordnet Data with GermaNet Rover
Marie Hinrichs, Richard Lawrence and Erhard Hinrichs
Rover displays the GermaNet data in an interactive interface designed for researchers. Features include:
- advanced searching for synsets
- visualizing the hypernym graph
- calculating synsets' semantic relatedness via graph-based measures
Try it: https://weblicht.sfs.uni-tuebingen.de/rover/
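The slide does not say which graph-based measures Rover implements. As a rough illustration of the general idea only, the sketch below computes a simple path-based relatedness score, 1 / (1 + shortest path length), over a tiny hand-made hypernym graph. The synset names, the `HYPERNYMS` table and all function names are hypothetical; this is neither GermaNet data nor Rover's API.

```python
from collections import deque

# Toy hypernym edges (child -> parent). Illustrative only, not GermaNet data.
HYPERNYMS = {
    "Hund": "Säugetier",
    "Katze": "Säugetier",
    "Säugetier": "Tier",
    "Vogel": "Tier",
    "Tier": "Lebewesen",
}


def build_undirected(edges):
    """Turn child -> parent edges into an undirected adjacency map."""
    graph = {}
    for child, parent in edges.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def path_length(graph, source, target):
    """Breadth-first search: number of edges on the shortest path, or None."""
    if source not in graph or target not in graph:
        return None
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_relatedness(graph, a, b):
    """Simple path-based score: 1 / (1 + shortest path length)."""
    dist = path_length(graph, a, b)
    return None if dist is None else 1.0 / (1 + dist)


GRAPH = build_undirected(HYPERNYMS)
```

With this toy graph, `path_relatedness(GRAPH, "Hund", "Katze")` uses the two-edge path through "Säugetier", while "Hund" and "Vogel" are three edges apart and therefore score lower.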
Named Entity Recognition for Distant Reading in ELTeC
Francesca Frontini, Carmen Brando, Joanna Byszuk, Ioana Galleron, Diana Santos and Ranka Stankovic
Can state-of-the-art multilingual and language-specific NLP tools produce a sufficiently good and adapted annotation to cater for the needs of literary scholars?
- ELTeC collection: European novels (1840-1902)
- WG 2: subset of ELTeC, manually annotated for NER with domain-specific guidelines
[Slide diagram label: GOLD TEST]
Towards Semi-Automatic Analysis of Spontaneous Language for Dutch (SASTA)
Jan Odijk
App partially automates spontaneous language analysis
- TARSP (children 1-4), STAP (older children), ASTA (aphasia) methods
- Focus on grammatical analysis
- Results: see poster session. Demo: Bazaar
Relation to CLARIN
- Based on CLARIN application GrETEL 4
- Societal impact; may contribute to CLARIN (derivative CHAMP-NL)
Deviant language
- Small experiment
Next steps:
- Successor project SASTA+ (deviant language)
- Testing usage in clinical environments
- Integration with other apps (UU, HU, Auris)
Cooperation: UU, CLARIAH, HU, VKL, Vogellanden
A Neural Parsing Pipeline for Icelandic Using the Berkeley Neural Parser
Þórunn Arnardóttir and Anton Karl Ingason
IceNeuralParsingPipeline is a parsing pipeline for Icelandic.
- Parses using an Icelandic model of the Berkeley Neural Parser, trained on the Icelandic Parsed Historical Corpus
- Delivers an F1 score of 84.74
- Can parse up to 228 sentences per second
- Includes all steps necessary for parsing Icelandic text
Available at http://hdl.handle.net/20.500.12537/17
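The slide reports an F1 score without describing the evaluation setup. For readers unfamiliar with the metric, the sketch below shows how a labeled-bracket F1 is commonly computed for constituency parses: precision and recall over matching (label, start, end) spans, combined by the harmonic mean. The function name and the toy spans are hypothetical and are not taken from the pipeline or its evaluation.

```python
from collections import Counter


def bracket_f1(gold, predicted):
    """Labeled-bracket F1 over (label, start, end) spans.

    Spans are compared as multisets, so a bracket that occurs twice in the
    gold tree must also be predicted twice to count as fully matched.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection keeps the minimum count of each shared bracket.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up spans: the predicted NP has the wrong right edge.
gold_spans = [("IP", 0, 3), ("NP", 0, 1), ("VP", 1, 3)]
pred_spans = [("IP", 0, 3), ("NP", 0, 2), ("VP", 1, 3)]
score = bracket_f1(gold_spans, pred_spans)
```

In the toy example two of the three brackets match on both sides, so precision and recall are both 2/3 and the F1 is 2/3 as well. Reported scores such as the one on the slide are usually this value multiplied by 100.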
Question 1: Does your paper address well-identified annotation/visualization requirements of a community of researchers, and if so, how have these requirements been elicited? (2 min. per paper)
1. sticker2: a neural syntax annotator for Dutch and German
2. Exploring and Visualizing Wordnet Data with GermaNet Rover
3. Named Entity Recognition for Distant Reading in ELTeC
4. Towards Semi-Automatic Analysis of Spontaneous Language for Dutch
5. A Neural Parsing Pipeline for Icelandic Using the Berkeley Neural Parser
Question 2: What new research questions can be addressed or facilitated by the resources and/or tools developed and presented in your paper? (2 min. per paper)
1. sticker2: a neural syntax annotator for Dutch and German
2. Exploring and Visualizing Wordnet Data with GermaNet Rover
3. Named Entity Recognition for Distant Reading in ELTeC
4. Towards Semi-Automatic Analysis of Spontaneous Language for Dutch
5. A Neural Parsing Pipeline for Icelandic Using the Berkeley Neural Parser
Question 3: How easy has it been to reuse and adapt existing resources and tools for the purposes of your work? What are the main challenges you have faced/are facing in terms of adaptation? (2 min. per paper)
1. sticker2: a neural syntax annotator for Dutch and German
2. Exploring and Visualizing Wordnet Data with GermaNet Rover
3. Named Entity Recognition for Distant Reading in ELTeC
4. Towards Semi-Automatic Analysis of Spontaneous Language for Dutch
5. A Neural Parsing Pipeline for Icelandic Using the Berkeley Neural Parser