Latest Developments in GrETEL: An Overview of CLARIN, DARIAH, and CLARIAH Projects


GrETEL is a treebank search application for linguistic research, presented here in the context of the CLARIN, DARIAH, and CLARIAH research infrastructure projects. Its core feature is example-based querying of treebanks. GrETEL 4 adds uploading and automatic parsing of a user's own corpus, analysis of search results combined with metadata, better support for XPath queries, and improved interface functionality.


Uploaded on Sep 23, 2024 | 0 Views



Presentation Transcript


  1. The Latest and the Greatest of GrETEL 4. Jan Odijk, CLARIN Bazar, 2018-10-09

  2. Overview. CLARIN, DARIAH, CLARIAH; GrETEL 1, 2, 3; GrETEL 4 (developers: Martijn van der Klis, Sheean Spoel, Gerson Foks, DH Lab); Illustration

  3. CLARIN, DARIAH, CLARIAH. CLARIN: European research infrastructure for humanities researchers who work with language resources. DARIAH: European research infrastructure for humanities researchers. The Netherlands has a national project that contributes to both: CLARIAH(-CORE); next year: CLARIAH-PLUS

  4. CLARIN, DARIAH, CLARIAH. Linguistic research: identify a problem, study literature, study grammars, form and test hypotheses, look for relevant data sets, create new datasets, enrich data with annotations, search in and through datasets, analyze data and visualize analysis results, design and carry out experiments, design and do simulations, ..., publish data, publish research results. CLARIN/CLARIAH can assist with the bold-faced items. Here: one small example

  5. GrETEL 1, 2, 3. GrETEL: KU Leuven, in a cooperation between CLARIN-NL and CLARIN Flanders. GrETEL 2, 3: extensions and improvements in other Flemish projects. Application for searching in a treebank. Treebank = text corpus in which each sentence has been assigned a syntactic structure. The syntactic structure is usually a tree. Core feature: example-based querying

  6. GrETEL 1, 2, 3. Treebanks (Dutch only): LASSY-Small (1 m tokens, written language); CGN (1 m tokens, spoken language); (V3) SoNaR Treebank (>500 m tokens). V1: http://nederbooms.ccl.kuleuven.be/eng/gretel/ V2: http://gretel.ccl.kuleuven.be/gretel-2.0/ V3: http://gretel.ccl.kuleuven.be/gretel3/index.php

  7. GrETEL 4. GrETEL 4: UU Utrecht, in CLARIAH and the UU-internal AnnCor project. New functionality: upload a user's own corpus incl. metadata (Dutch only); search in the user's own automatically parsed corpus (parsebank); analysis of search results combined with metadata; better support for XPath queries; improved interface functionality. V4 (alpha!): http://gretel.hum.uu.nl/gretel4/

  8. Illustration: Upload Corpus. Plain text or CHILDES CHAT; TEI and FoLiA to follow. CHAT: utterances are cleaned and metadata uploaded, e.g. "knor knor [!= pigsound], ik heb honger" becomes "knor knor, ik heb honger"
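The cleaning step on this slide can be illustrated with a minimal Python sketch. It is not GrETEL 4's actual implementation: the function name `clean_chat_utterance` is hypothetical, and the sketch assumes that cleaning only means dropping square-bracketed CHAT annotations such as `[!= pigsound]`. Real CHAT transcripts contain many other codes that are not handled here.

```python
import re

# Assumption: only square-bracketed annotations are removed.
# Other CHAT codes (retracings, pauses, etc.) are left untouched.
BRACKET_ANNOTATION = re.compile(r"\s*\[[^\]]*\]")


def clean_chat_utterance(utterance: str) -> str:
    """Remove bracketed CHAT annotations and normalise whitespace."""
    without_codes = BRACKET_ANNOTATION.sub("", utterance)
    # Collapse any runs of whitespace left behind by the removal.
    return " ".join(without_codes.split())


if __name__ == "__main__":
    print(clean_chat_utterance("knor knor [!= pigsound], ik heb honger"))
```

Applied to the slide's example, the sketch yields the cleaned utterance shown on the slide, which is the form that would then be sent to the parser.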

  9. Corpus Upload

  10. Corpus Overview

  11. Corpus Details

  12. Query Example. Constructions with 3 bare verbs in the Dutch CHILDES Van Kampen Laura Corpus. Example sentence: Hij zal dat willen doen

  13. Example Sentence

  14. Parse Tree

  15. Select Parts

  16. Query Tree

  17. Select Treebank

  18. Query: //node[@cat and node[@pt="ww" and @rel="hd"] and node[@cat="inf" and @rel="vc" and node[@rel="hd" and @pt="ww"] and node[@rel="vc" and @cat="inf" and node[@pt="ww" and @rel="hd"]]]]
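To make the structure of this query concrete, here is a small Python sketch that checks the same nested conditions by hand, using only the standard library. The XML tree is a hypothetical, hand-written and heavily simplified Alpino-style parse of the example sentence "Hij zal dat willen doen"; real parser output carries many more attributes. The helper names (`children`, `matches_three_verbs`, `find_matches`) are illustrative and not part of GrETEL.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified parse of "Hij zal dat willen doen".
TREE_XML = """
<node cat="top" rel="top">
  <node cat="smain" rel="--">
    <node rel="su" pt="vnw" word="Hij"/>
    <node rel="hd" pt="ww" word="zal"/>
    <node cat="inf" rel="vc">
      <node rel="hd" pt="ww" word="willen"/>
      <node cat="inf" rel="vc">
        <node rel="obj1" pt="vnw" word="dat"/>
        <node rel="hd" pt="ww" word="doen"/>
      </node>
    </node>
  </node>
</node>
"""


def children(node, **attrs):
    """Direct <node> children whose attributes equal the given values."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]


def matches_three_verbs(node):
    """Mirror the predicates of the XPath query on the slide."""
    if node.get("cat") is None:                  # @cat
        return False
    if not children(node, pt="ww", rel="hd"):    # verbal head no. 1
        return False
    for vc1 in children(node, cat="inf", rel="vc"):
        if not children(vc1, rel="hd", pt="ww"):  # verbal head no. 2
            continue
        for vc2 in children(vc1, rel="vc", cat="inf"):
            if children(vc2, pt="ww", rel="hd"):  # verbal head no. 3
                return True
    return False


def find_matches(root):
    """All nodes in the tree (root included) that satisfy the pattern."""
    return [n for n in root.iter("node") if matches_three_verbs(n)]


if __name__ == "__main__":
    for hit in find_matches(ET.fromstring(TREE_XML)):
        print(hit.get("cat"), hit.get("rel"))
```

In this simplified tree only the `smain` node matches: it has a verbal head (`zal`) and an infinitival verbal complement headed by `willen`, which in turn contains an infinitival verbal complement headed by `doen`. In practice the XPath expression itself would be evaluated by an XPath engine; the hand-written matcher only spells out what its nested predicates require.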

  19. Example: Query Output

  20. Utterance Details

  21. Result Statistics

  22. Analysis

  23. Some Results. 3 verbs: 325 hits found; 313 by adults, 12 by target child (LAU); 4 by the child do not occur among adults; 8 others are not among the most frequent of the adults. Child examples as of month 43 (3;7). 2 verbs: 6,645 in total, 1,363 uttered by target child (LAU) as of month 23 (1;11).
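The counts and ages on this slide can be cross-checked with a few lines of Python. This is only an arithmetic sketch: the function names are hypothetical, and the ages are assumed to use the "years;months" notation, as the slide's pairing of month 43 with 3;7 suggests.

```python
def age_notation(months: int) -> str:
    """Convert an age in months to 'years;months' notation."""
    years, rest = divmod(months, 12)
    return f"{years};{rest}"


def share(part: int, total: int) -> float:
    """Percentage of `total` accounted for by `part`, to one decimal."""
    return round(100 * part / total, 1)


if __name__ == "__main__":
    # Ages at which the child starts producing the constructions.
    print(age_notation(43))   # 3-verb constructions
    print(age_notation(23))   # 2-verb constructions

    # The adult and child hits add up to the reported total.
    assert 313 + 12 == 325

    # Child's share of the hits for each construction type.
    print(share(12, 325), "% of 3-verb hits")
    print(share(1363, 6645), "% of 2-verb hits")
```

The target child accounts for roughly 3.7% of the 3-verb hits but about 20.5% of the 2-verb hits, which fits the slide's observation that the 3-verb construction appears much later (month 43 versus month 23).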

  24. Concluding remarks. GrETEL is a very user-friendly search engine: it enables searching for constructions and searching for disambiguated words. The Utrecht extensions enable searching in your own research corpus and detailed analysis of search results + metadata

  25. Concluding remarks. User-friendliness also implies limitations! Automatic parsing is not flawless and requires additional checks before conclusions can be reliably drawn. Try it out, even if it is still under development: http://gretel.hum.uu.nl/gretel4/index.php

  26. Other Languages? A version of GrETEL for another language requires an automatic parser for that language. A version exists for Afrikaans. A multilingual version exists: Poly-GrETEL (Dutch, English in parallel treebanks)

  27. Other Languages? The CLARIN infrastructure offers: PMLTQ (> 320 treebanks, > 70 languages); TüNDRA (> 55 treebanks, > 45 languages); INESS (> 470 treebanks, > 70 languages). All offer search applications, analysis tools, annotation tools, etc.

  28. What about you? Follow a GrETEL course (0.5 day)? Do your thesis research using GrETEL or similar CLARIN tools? Want to know more about other CLARIN tools? Contact me: j.odijk@uu.nl

  29. Thanks for your attention

  30. More information: http://portal.clarin.nl, http://www.clariah.nl. Recorded lecture on GrETEL: http://lecturenet.uu.nl/Site1/Catalog/Full/c9f887bc45154af5bd7cdb218216816621. Educational Package: http://dev.clarin.nl/sites/default/files/EducationalModule-v4b.pdf. References: Augustinus, L., Vandeghinste, V., Schuurman, I. and Van Eynde, F. (2017). GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J. and van Hessen, A. (eds.), CLARIN in the Low Countries, pp. 269-280. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.22. License: CC-BY 4.0. Odijk, J., van der Klis, M. and Spoel, S. (2018). Extensions to the GrETEL treebank query application. Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16), pp. 46-55, Prague. http://aclweb.org/anthology/W/W17/W17-7608.pdf. Odijk, J. and van Hessen, A. (eds.) (2017). CLARIN in the Low Countries. London: Ubiquity Press. (Open Access). DOI: http://dx.doi.org/10.5334/bbi
