Latest Developments in GrETEL: An Overview of CLARIN, DARIAH, and CLARIAH Projects
GrETEL is a treebank search application for linguistic research. This presentation introduces the CLARIN and DARIAH research infrastructures and the Dutch CLARIAH project, reviews GrETEL versions 1 to 3, and presents GrETEL 4. The new version adds uploading, automatic parsing and searching of users' own corpora, analysis of search results combined with metadata, better support for XPath queries, and an improved interface.
Presentation Transcript
The Latest and the Greatest of GrETEL 4. Jan Odijk, CLARIN Bazaar, 2018-10-09
Overview: CLARIN, DARIAH, CLARIAH; GrETEL 1, 2, 3; GrETEL 4 (developers: Martijn van der Klis, Sheean Spoel, Gerson Foks, DH Lab); Illustration
CLARIN, DARIAH, CLARIAH: CLARIN is the European research infrastructure for humanities researchers who work with language resources. DARIAH is the European research infrastructure for humanities researchers in general. The Netherlands has a national project that contributes to both: CLARIAH(-CORE), to be followed next year by CLARIAH-PLUS.
CLARIN, DARIAH, CLARIAH: Linguistic research involves identifying a problem, studying the literature, studying grammars, forming and testing hypotheses, looking for relevant data sets, creating new data sets, enriching data with annotations, searching in and through data sets, analyzing data and visualizing analysis results, designing and carrying out experiments, designing and running simulations, …, publishing data, and publishing research results. CLARIN/CLARIAH can assist with the items shown in bold on the original slide. Here: one small example.
GrETEL 1, 2, 3: GrETEL was developed at KU Leuven in a cooperation between CLARIN-NL and CLARIN Flanders; GrETEL 2 and 3 added extensions and improvements in other Flemish projects. It is an application for searching in a treebank, i.e. a text corpus in which each sentence has been assigned a syntactic structure, usually a tree. Core feature: example-based querying.
GrETEL 1, 2, 3: Treebanks (Dutch only): LASSY-Small (1M tokens, written language), CGN (1M tokens, spoken language) and, from V3 on, the SoNaR Treebank (>500M tokens). V1: http://nederbooms.ccl.kuleuven.be/eng/gretel/ V2: http://gretel.ccl.kuleuven.be/gretel-2.0/ V3: http://gretel.ccl.kuleuven.be/gretel3/index.php
GrETEL 4: developed at UU (Utrecht University) in CLARIAH and the UU-internal AnnCor project. New functionality: uploading a user's own corpus, including metadata (Dutch only); searching in the user's own automatically parsed corpus (parsebank); analysis of search results combined with metadata; better support for XPath queries; improved interface functionality. V4 (alpha!): http://gretel.hum.uu.nl/gretel4/
Illustration: Upload Corpus. Input formats: plain text or CHILDES CHAT; TEI and FoLIA to follow. CHAT utterances are cleaned and their metadata uploaded, e.g. "knor knor [!= pigsound], ik heb honger" ("oink oink, I'm hungry") becomes "knor knor, ik heb honger".
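The cleaning step can be illustrated with a minimal Python sketch. It only strips CHAT annotations written in square brackets, such as [!= pigsound] in the example above; real CHAT transcripts use many more codes, so this is an illustration under that simplifying assumption, not GrETEL's own cleaning code. The function name clean_chat_utterance is hypothetical.

```python
import re

# Matches a square-bracketed CHAT annotation such as "[!= pigsound]",
# together with any whitespace directly before it.
_BRACKETED = re.compile(r"\s*\[[^\]]*\]")


def clean_chat_utterance(utterance: str) -> str:
    """Hypothetical helper: remove bracketed CHAT codes and tidy spacing."""
    cleaned = _BRACKETED.sub("", utterance)
    # Collapse any runs of whitespace left behind and trim the ends.
    return " ".join(cleaned.split())


print(clean_chat_utterance("knor knor [!= pigsound], ik heb honger"))
# prints: knor knor, ik heb honger
```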
Query example: constructions with 3 bare verbs in the Dutch CHILDES Van Kampen Laura corpus. Example sentence: Hij zal dat willen doen ("He will want to do that").
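To make the target construction concrete, the following Python sketch builds a hand-made, simplified tree for the example sentence in the style of Alpino/Lassy XML: node elements carrying cat (phrase category), rel (dependency relation), pt (part of speech, where ww means verb) and word attributes. It then looks for a chain of three verbs: a verbal head (rel="hd", pt="ww") with a bare-infinitive complement (rel="vc", cat="inf"), which in turn has a verbal head and a further bare-infinitive complement with a verbal head. The tree is an illustrative assumption, not real parser output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-built, simplified tree for "Hij zal dat willen doen" (illustrative only;
# not real parser output).
TOY_TREE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" word="Hij"/>
      <node rel="hd" pt="ww" word="zal"/>
      <node cat="inf" rel="vc">
        <node rel="hd" pt="ww" word="willen"/>
        <node cat="inf" rel="vc">
          <node rel="obj1" pt="vnw" word="dat"/>
          <node rel="hd" pt="ww" word="doen"/>
        </node>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def verbal_head(node):
    """Return the child with rel="hd" and pt="ww" (a verbal head), if any."""
    for child in node.findall("node"):
        if child.get("rel") == "hd" and child.get("pt") == "ww":
            return child
    return None


def inf_complement(node):
    """Return the child with rel="vc" and cat="inf" (a bare infinitive), if any."""
    for child in node.findall("node"):
        if child.get("rel") == "vc" and child.get("cat") == "inf":
            return child
    return None


def three_verb_clusters(root):
    """Yield the words of every verb -> inf -> inf chain found in the tree."""
    for node in root.iter("node"):
        v1, inf1 = verbal_head(node), inf_complement(node)
        if v1 is None or inf1 is None:
            continue
        v2, inf2 = verbal_head(inf1), inf_complement(inf1)
        if v2 is None or inf2 is None:
            continue
        v3 = verbal_head(inf2)
        if v3 is not None:
            yield v1.get("word"), v2.get("word"), v3.get("word")


root = ET.fromstring(TOY_TREE.strip())
print(list(three_verb_clusters(root)))
# prints: [('zal', 'willen', 'doen')]
```

Only the smain node satisfies all three levels here; the higher infinitive group has two verbs but no third one below it.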
Parse Tree
Select Parts
Query Tree
Query: //node[@cat and node[@pt="ww" and @rel="hd"] and node[@cat="inf" and @rel="vc" and node[@rel="hd" and @pt="ww"] and node[@rel="vc" and @cat="inf" and node[@pt="ww" and @rel="hd"]]]]
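In example-based querying, a query like the one above is derived from the selected parts of the parse tree. The nesting can be reproduced with a short Python sketch: each selected node becomes a node[...] predicate, its attribute constraints are joined with "and", and each selected daughter becomes a nested node[...] predicate. QueryNode and to_xpath are hypothetical names; this illustrates the principle and is not GrETEL's own code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueryNode:
    """Hypothetical query-tree node: attribute constraints plus daughter nodes."""
    # Attribute name -> required value; None means the attribute must merely exist.
    attrs: dict[str, str | None] = field(default_factory=dict)
    children: list[QueryNode] = field(default_factory=list)


def predicate(node: QueryNode) -> str:
    """Build the text inside node[...] for one query-tree node."""
    parts = [f"@{name}" if value is None else f'@{name}="{value}"'
             for name, value in node.attrs.items()]
    # Each daughter must be matched by some child node of the current node.
    parts += [f"node[{predicate(child)}]" for child in node.children]
    return " and ".join(parts)


def to_xpath(root: QueryNode) -> str:
    """Search for the query tree anywhere in a sentence tree."""
    return f"//node[{predicate(root)}]"


# The query tree for three bare verbs, with attributes in the slide's order.
query = QueryNode({"cat": None}, [
    QueryNode({"pt": "ww", "rel": "hd"}),
    QueryNode({"cat": "inf", "rel": "vc"}, [
        QueryNode({"rel": "hd", "pt": "ww"}),
        QueryNode({"rel": "vc", "cat": "inf"}, [
            QueryNode({"pt": "ww", "rel": "hd"}),
        ]),
    ]),
])
print(to_xpath(query))
```

The printed string is identical to the query shown on the slide.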
Analysis
Some results: 3 verbs: 325 hits found, 313 by adults and 12 by the target child (LAU). Of the child's 12 hits, 4 do not occur among the adults' hits and the other 8 are not among the adults' most frequent ones. The child's examples occur as of month 43 (3;7). 2 verbs: 6,645 hits in total, 1,363 of them uttered by the target child (LAU), as of month 23 (1;11).
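Combining hits with metadata essentially means grouping the hits by a metadata field such as the speaker, and the month figures map onto the CHILDES years;months age notation (43 months = 3;7, 23 months = 1;11). A minimal Python sketch follows; the hit list is made up for illustration and is not the actual Laura data, the field names speaker and age_months are assumptions, and "MOT" merely stands in for an adult speaker.

```python
from collections import Counter

TARGET_CHILD = "LAU"  # speaker code of the target child, as on the slide


def chat_age(months: int) -> str:
    """Convert an age in months to CHILDES-style years;months notation."""
    years, rest = divmod(months, 12)
    return f"{years};{rest}"


# Made-up hits for illustration only (not the actual Laura results). Each hit
# carries the speaker code and the child's age in months at recording time.
hits = [
    {"speaker": "MOT", "age_months": 30},
    {"speaker": "LAU", "age_months": 43},
    {"speaker": "MOT", "age_months": 43},
    {"speaker": "LAU", "age_months": 50},
]

# Group the hits by the speaker metadata field.
per_speaker = Counter(hit["speaker"] for hit in hits)
child_hits = per_speaker[TARGET_CHILD]
other_hits = len(hits) - child_hits
print(f"{len(hits)} hits: {other_hits} by others, {child_hits} by {TARGET_CHILD}")

# Earliest age at which the target child produces the construction.
first_child_hit = min(h["age_months"] for h in hits if h["speaker"] == TARGET_CHILD)
print(first_child_hit, chat_age(first_child_hit))  # 43 3;7
print(chat_age(23))  # 1;11
```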
Concluding remarks: GrETEL is a very user-friendly search engine. It enables searching for constructions and for disambiguated words. The Utrecht extensions enable searching in your own research corpus and detailed analysis of search results combined with metadata.
Concluding remarks: User-friendliness also implies limitations! Automatic parsing is not flawless and requires additional checks before conclusions can be reliably drawn. Try it out, even though it is still under development: http://gretel.hum.uu.nl/gretel4/index.php
Other languages? A version of GrETEL for another language requires an automatic parser for that language. A version exists for Afrikaans, and a multilingual version, Poly-GrETEL, searches Dutch and English parallel treebanks.
Other languages? The CLARIN infrastructure offers PMLTQ (>320 treebanks, >70 languages), TüNDRA (>55 treebanks, >45 languages) and INESS (>470 treebanks, >70 languages). All offer search applications, analysis tools, annotation tools, etc.
What about you? Follow a GrETEL course (0.5 day)? Do your thesis research using GrETEL or similar CLARIN tools? Want to know more about other CLARIN tools? Contact me: j.odijk@uu.nl
More information: http://portal.clarin.nl, http://www.clariah.nl
Recorded lecture on GrETEL: http://lecturenet.uu.nl/Site1/Catalog/Full/c9f887bc45154af5bd7cdb218216816621
Educational package: http://dev.clarin.nl/sites/default/files/EducationalModule-v4b.pdf
Augustinus, L., Vandeghinste, V., Schuurman, I., and Van Eynde, F. (2017). GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J. and van Hessen, A. (eds.), CLARIN in the Low Countries, pp. 269-280. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.22. License: CC-BY 4.0.
Odijk, J., van der Klis, M., and Spoel, S. (2018). Extensions to the GrETEL treebank query application. Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16), pp. 46-55, Prague. http://aclweb.org/anthology/W/W17/W17-7608.pdf
Odijk, J. and van Hessen, A. (eds.) (2017). CLARIN in the Low Countries. London: Ubiquity Press (Open Access). DOI: http://dx.doi.org/10.5334/bbi