Advances in Digital Humanities: CLARIN2020 Sessions Overview

Presentations at CLARIN2020 highlighted enhancements to research tools, reproducible annotation services, and the transition to more generalized repository systems. Discussions encompassed the optimization of Wittgenstein research tools, reproducibility in WebLicht workflows, and the implementation of the FLAT repository in the digital humanities domain. Common themes included the importance of deterministic tool generation and the sharing of open data across platforms.

Uploaded on Sep 29, 2024


Presentation Transcript

CLARIN2020 Session on Repositories and Workflows
Moderators: Jan Hajič, Martin Matthiesen
Tuesday 6 October, 12.45-13.25

Alois Pichler: CLARINO+ Optimization of Wittgenstein Research Tools
- Electronic Wittgenstein Archives at U Bergen are in need of an upgrade.
- Three viewers: Source, Interactive Dynamic Presentation (IDP), and Semantic Faceted Search and Browsing (SFB)
- Three search engines: Source, SFB, WITTFind (LMU)
- TEI XML encoding already permits complex views.
- More filtering/facets are desirable, which implies manual annotation and code changes in IDP and SFB.
- WITTFind now enhances IDP; integration into SFB is planned.

Daniël de Kok, Neele Falk: Reproducible annotation services for WebLicht
- Workflow managers like WebLicht offer weak reproducibility.
- Only the latest tools are offered, no versioning.
- Docker+WebLicht would offer more stability, but still has too many moving parts.
- Docker+Nix makes container generation deterministic.
Conclusion:
- Deterministic installations and version-controlled tool handling would vastly improve reproducibility.
- Suggestion for further research: WebLicht + Version Control + Docker + Nix
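
The point about version-controlled tool handling can be illustrated with a minimal Python sketch. It is not WebLicht's API and not the authors' Docker+Nix setup; the function `workflow_fingerprint` and the tool names and versions are hypothetical. It only shows the idea that a workflow whose tools are all pinned to exact versions can be given a stable identifier, while a floating "latest" reference cannot.

```python
import hashlib
import json


def workflow_fingerprint(steps):
    """Return a SHA-256 hex digest identifying a fully pinned workflow.

    `steps` is an ordered list of (tool_name, version) pairs. The names and
    versions are hypothetical examples, not real WebLicht services.
    """
    for name, version in steps:
        # A floating version means the same workflow may run different code
        # tomorrow, so it is rejected rather than fingerprinted.
        if not version or version == "latest":
            raise ValueError(f"tool {name!r} must be pinned to an exact version")
    # Step order is part of the workflow, so the list order is preserved.
    canonical = json.dumps([list(step) for step in steps], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical two-step annotation chain.
pinned = [("tokenizer", "1.2.0"), ("parser", "0.9.1")]
fingerprint = workflow_fingerprint(pinned)
```

Storing such a fingerprint next to the annotated output would let a later reader check whether a rerun used the same tool versions. It says nothing about whether the tool binaries themselves were built deterministically, which is the part the slide attributes to Nix.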

Paul Trilsbeek: Using the FLAT repository: Two years in
- FLAT (Fedora/Islandora) replaced TLA's own LAT platform.
- Users are pleased with ease of use and ingest.
- Admins appreciate the modular approach.
- Issues that required modifications:
  - Slow display of objects consisting of many other objects
  - Cumbersome access management
- Update to a newer Fedora/Islandora version is planned.
Conclusion:
- A move from a self-developed repository system to a more generalized one is possible and does not harm user satisfaction.
- For best results, modifications are needed.

Remarks
- Search in Munich and display in Bergen
- Reuse of existing tools/frameworks (FLAT, IDP, SFB)
- Open data (WAB, WITTFind)
- Deterministic tool (and by proxy: output) generation
  - Relevant in all 3 papers

Javier de la Rosa et al.: PoetryLab as Infrastructure for the Analysis of Spanish Poetry
- PoetryLab: environment for enrichment / annotation of Spanish poetry
  - Focus on ontologies / Linked Open Data
- Open Source software, open access
- UI + REST API to backend
- Backend implemented as Docker images, easy maintenance
- Automatic analysis is part of the backend / workflow:
  - Scansion and rhyme detection
  - Enjambment detection
  - Historical NER

Maarten Janssen: Integrating TEITOK and Kontext at LINDAT
- Corpus search tools: different objectives, different tools
  - Kontext: search tool for language studies
    - Manatee (database) backend, universal ("table format")
  - TEITOK: search (and editing/annotation/enrichment) tool with visualization
    - Corpus WorkBench backend, (semi)direct TEI/XML use
- Combining TEITOK and Kontext
  - Visualisation of linked-to resources (facsimile pages, audio, …)
  - Separate backends and UIs, but interlinked at a fine granularity level
  - Service (REST) request allowed to provide a context in Kontext
  - Ingestion of files in a range of formats (ELAN, FoLiA, PagesXML, …)

Bart Jongejan: The CLARIN-DK Text Tonsortium
- Text Tonsortium (TT): Workflow Management System
- Remake of an older system
- Main features:
  - Automatic computation of the workflow based on static user description
  - Standoff annotation (as an option)
  - Hub-based, two-way communication with engaged tools/services
  - Workflow(s) suggestion: interactive help with workflow selection
- Five dimensions of data description: language, format, type, ambiguity, historical period
- To be integrated again into CLARIN-DK (cst.dk)
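
As a rough illustration of what automatic workflow computation from static descriptions could look like, the Python sketch below chains tools by matching what each consumes and produces. It is not the Text Tonsortium's actual algorithm: the breadth-first search, the `Data` and `Tool` types, the tool names, and the restriction to three of the five description dimensions are all assumptions made for brevity.

```python
from collections import deque
from typing import NamedTuple


class Data(NamedTuple):
    # Three of the five dimensions named on the slide; ambiguity and
    # historical period are left out to keep the sketch short.
    language: str
    format: str
    type: str


class Tool(NamedTuple):
    name: str
    consumes: Data
    produces: Data


def plan_workflow(tools, start, goal):
    """Breadth-first search for the shortest tool chain from start to goal.

    Returns a list of tool names, an empty list if start already equals
    goal, or None if no chain exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for tool in tools:
            if tool.consumes == state and tool.produces not in seen:
                seen.add(tool.produces)
                queue.append((tool.produces, path + [tool.name]))
    return None


# Hypothetical tool registry for Danish plain text.
TOOLS = [
    Tool("tokenizer", Data("da", "plain", "text"), Data("da", "plain", "tokens")),
    Tool("tagger", Data("da", "plain", "tokens"), Data("da", "plain", "pos")),
]
plan = plan_workflow(TOOLS, Data("da", "plain", "text"), Data("da", "plain", "pos"))
```

With this registry, `plan` is the chain tokenizer then tagger. A real system would also need to rank alternative chains and handle partial matches, which is where the interactive workflow suggestion mentioned on the slide would come in.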

Questions to Alois, Daniël, Neele and Paul
- Alois, Paul, do you see a need for Tübingen's approach (stricter versioning of dependencies) in the development of your respective services?
- Daniël, Neele, do you see obstacles to reproducibility more on the technical side or on the side of it being a desirable goal?
- Alois, Paul, do you support versioning, and if you don't, how important do you consider this feature?

Questions to Javier et al., Maarten, Bart
- Javier, looking at poetry from the multilingual perspective, what is inherently Spanish in your system / workflow, and how easily can it be adapted to other languages?
- Maarten, Corpus WorkBench probably cannot handle gigawords in the really large corpora; do you see it as a problem, or can some of the TEITOK functionality be integrated into Kontext for such big data? E.g. by reference?
- Bart, in TT, how difficult is it to add third-party tools to your actual system?

Questions to all presenters in the Workflows session
- Tools in workflows are sometimes missing just a bit to be interoperable, but without it, the workflow cannot work. Any idea how to handle this? Suggesting converters (from outside)? Creating them? I.e., do you think it is possible to automatically suggest "filling the gaps"?
- How do you envision supporting reproducibility in an ever-changing world of NLP and DH services? How to handle versioning, persistency, user-generated content?

Backup questions 3 & 4

Questions to all presenters in the Workflows session
- How important is allowing users to do annotation (in the general sense) in or as part of workflows?
- How important is it to allow users to process their own data by workflows, or even to train tools using their data? What problems arise, and why is this not such a common feature today?
