Latest Developments in GrETEL: An Overview of CLARIN, DARIAH, and CLARIAH Projects

 
The Latest and the Greatest of GrETEL 4
 
Jan Odijk
CLARIN Bazar
2018-10-09
 
1
 
Overview
 
CLARIN, DARIAH, CLARIAH
GrETEL 1,2,3
GrETEL 4
Developers: Martijn van der Klis, Sheean Spoel, Gerson Foks (DH Lab)
Illustration
 
2
 
CLARIN, DARIAH, CLARIAH

CLARIN: European research infrastructure for humanities researchers who work with language resources
DARIAH: European research infrastructure for humanities researchers
NL: has a national project that contributes to both: CLARIAH(-CORE), next year: CLARIAH-PLUS
 
3
 
CLARIN, DARIAH, CLARIAH

Linguistic Research:
Identify a problem, study literature, study grammars, form and test hypotheses, look for relevant data sets, create new datasets, enrich data with annotations, search in and through datasets, analyze data and visualize analysis results, design and carry out experiments, design and do simulations, …, publish data, publish research results
CLARIN/CLARIAH can assist with the bold-faced items (look for relevant data sets; create new datasets; enrich data with annotations; search in and through datasets; analyze data and visualize analysis results; publish data)
Here: 1 small example
 
4
 
GrETEL 1,2,3
 
GrETEL: KU Leuven
Cooperation CLARIN-NL and CLARIN Flanders
GrETEL 2,3: extensions, improvements in other Flemish projects
Application for searching in a treebank
Treebank = text corpus in which each sentence has been assigned a syntactic structure
Syntactic structure is usually a tree
Core feature: example-based querying
 
5
 
GrETEL 1,2,3
 
Treebanks (Dutch only):
LASSY-Small (1 million tokens, written language)
CGN (1 million tokens, spoken language)
(V3) SoNaR Treebank (>500 million tokens)
V1: http://nederbooms.ccl.kuleuven.be/eng/gretel/
V2: http://gretel.ccl.kuleuven.be/gretel-2.0/
V3: http://gretel.ccl.kuleuven.be/gretel3/index.php
 
 
6
 
GrETEL 4
 
GrETEL 4: UU Utrecht
In CLARIAH and UU-internal AnnCor project
New functionality:
Upload a user’s own corpus incl. metadata (Dutch only)
Search in the user’s own automatically parsed corpus (parsebank)
Analysis of search results combined with metadata
Better support for XPath queries
Improved interface functionality
V4 (alpha!): http://gretel.hum.uu.nl/gretel4/
 
7
 
Illustration
 
Upload Corpus
Plain text or CHILDES CHAT
TEI and FoLIA to follow
CHAT utterances are cleaned and metadata uploaded:
Before: knor knor [!= pigsound], ik heb honger
After: knor knor, ik heb honger ('oink oink, I am hungry')
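
The cleaning step illustrated by the "knor knor" example can be sketched in a few lines of Python. This is only an illustration of the idea, not GrETEL's actual implementation: the function name `clean_chat_utterance` is hypothetical, and the sketch assumes that every square-bracketed CHAT code can simply be dropped, whereas real CHAT transcripts use many more conventions that need separate treatment.

```python
import re

# Assumption: a CHAT code is anything between square brackets, e.g. "[!= pigsound]".
# The optional leading whitespace is removed together with the code itself.
BRACKETED_CODE = re.compile(r"\s*\[[^\]]*\]")


def clean_chat_utterance(utterance: str) -> str:
    """Strip square-bracketed CHAT codes and normalize whitespace."""
    without_codes = BRACKETED_CODE.sub("", utterance)
    return re.sub(r"\s+", " ", without_codes).strip()


if __name__ == "__main__":
    print(clean_chat_utterance("knor knor [!= pigsound], ik heb honger"))
```

Only the cleaned text would be sent to the parser; the removed annotations are not needed for parsing.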
 
8
 
Corpus Upload
 
9
 
Corpus Overview
 
10
 
Corpus Details
 
11
 
Query Example
 
12
 
Constructions with 3 bare verbs in the Dutch CHILDES Van Kampen Laura Corpus
Example sentence: Hij zal dat willen doen ('He will want to do that')
 
Example Sentence
 
13
 
Parse Tree
 
14
 
Select Parts
 
15
 
Query Tree
 
16
 
Select Treebank
 
17
 
Query
 
18
 
//node[@cat and
  node[@pt="ww" and @rel="hd"] and
  node[@cat="inf" and @rel="vc" and
    node[@rel="hd" and @pt="ww"] and
    node[@rel="vc" and @cat="inf" and
      node[@pt="ww" and @rel="hd"]]]]
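
To make the logic of this query concrete, here is a minimal Python sketch that applies the same nested conditions to a toy parse. Everything in it is an assumption made for illustration: the XML is a hand-written, heavily simplified Alpino-style tree for "Hij zal dat willen doen", not real parser output, and the helper names are hypothetical. The conditions are checked by hand because the standard library's ElementTree supports only a limited XPath subset; GrETEL itself evaluates the XPath query directly against the treebank.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified Alpino-style parse of "Hij zal dat willen doen".
TOY_PARSE = """
<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pt="vnw" word="Hij"/>
      <node rel="hd" pt="ww" word="zal"/>
      <node cat="inf" rel="vc">
        <node rel="hd" pt="ww" word="willen"/>
        <node cat="inf" rel="vc">
          <node rel="obj1" pt="vnw" word="dat"/>
          <node rel="hd" pt="ww" word="doen"/>
        </node>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def children(node, **attrs):
    """Direct <node> children whose attributes equal all given values."""
    return [c for c in node.findall("node")
            if all(c.get(k) == v for k, v in attrs.items())]


def matches_three_verbs(node):
    """Mirror the query: a phrase with a verbal head and two nested
    infinitival verbal complements, each with its own verbal head."""
    if node.get("cat") is None or not children(node, pt="ww", rel="hd"):
        return False
    for vc1 in children(node, cat="inf", rel="vc"):
        if not children(vc1, pt="ww", rel="hd"):
            continue
        for vc2 in children(vc1, cat="inf", rel="vc"):
            if children(vc2, pt="ww", rel="hd"):
                return True
    return False


def find_three_verb_clusters(xml_text):
    """Return every node in the parse that satisfies the query."""
    root = ET.fromstring(xml_text)
    return [n for n in root.iter("node") if matches_three_verbs(n)]


if __name__ == "__main__":
    for hit in find_three_verb_clusters(TOY_PARSE):
        print(hit.get("cat"), [w.get("word") for w in hit.iter("node") if w.get("word")])
```

Each level of nesting in the query corresponds to one loop level in `matches_three_verbs`, which is why a third verb requires a third embedded `node[...]` condition.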
 
Example: Query Output
 
19
 
Utterance Details
 
20
 
Result Statistics
 
21
 
Analysis
 
22
 
Some Results
 
23
 
3 verbs:
325 hits found
313 by adults, 12 by target child (LAU)
4 of the child's hits do not occur among the adults
The 8 others are not among the adults' most frequent
Child examples as of month 43 (3;7)
2 verbs:
6,645 in total, 1,363 uttered by target child (LAU) as of month 23 (1;11).
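
The ages in parentheses follow the CHILDES years;months notation, and the hit counts can be cross-checked with simple arithmetic. The short Python sketch below shows both; the function names are hypothetical and the numbers are the ones reported on this slide.

```python
def chat_age(months: int) -> str:
    """Convert an age in months to the years;months notation."""
    years, rest = divmod(months, 12)
    return f"{years};{rest}"


def child_share(child_hits: int, total_hits: int) -> float:
    """Fraction of all hits that were uttered by the target child."""
    return child_hits / total_hits


if __name__ == "__main__":
    print(chat_age(43), chat_age(23))
    # Counts from the slide: adults + child should add up to the total.
    assert 313 + 12 == 325
    print(f"3 verbs, child share: {child_share(12, 325):.1%}")
    print(f"2 verbs, child share: {child_share(1363, 6645):.1%}")
```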
 
Concluding remarks
 
24
 
GrETEL is a very user-friendly search engine
Enables searching for constructions
Enables search for disambiguated words
Utrecht extensions:
Enable searching in your own research corpus
Enable detailed analysis of search results + metadata
 
Concluding remarks
 
25
 
User-friendliness also implies limitations!
Automatic parsing is not flawless and requires additional checks before conclusions can be reliably drawn

Try it out, even if it is still under development: http://gretel.hum.uu.nl/gretel4/index.php
 
Other Languages?
 
26
 
Version of GrETEL for another language? Requires an automatic parser for that language
A version exists for Afrikaans
A multilingual version exists: Poly-GrETEL (Dutch, English in parallel treebanks)
 
Other Languages?
 
27
 
The CLARIN infrastructure offers:
PMLTQ: > 320 treebanks, > 70 languages
TüNDRA: > 55 treebanks, > 45 languages
INESS: > 470 treebanks, > 70 languages
All offer search applications, analysis tools, annotation tools, etc.
 
 
What about you?
 
28
 
Follow a GrETEL course? (0.5 day)
Do your thesis research using GrETEL or similar CLARIN tools?
Want to know more about other CLARIN tools?
Contact me: j.odijk@uu.nl
 
 
 
 
 
 
Thanks for your attention
 
29
 
More information
 
http://portal.clarin.nl, http://www.clariah.nl
Recorded lecture on GrETEL: http://lecturenet.uu.nl/Site1/Catalog/Full/c9f887bc45154af5bd7cdb218216816621
Educational Package: http://dev.clarin.nl/sites/default/files/EducationalModule-v4b.pdf
Augustinus, L., Vandeghinste, V., Schuurman, I. and Van Eynde, F. 2017. GrETEL: A Tool for Example-Based Treebank Mining. In: Odijk, J. and van Hessen, A. (eds.) 2017. CLARIN in the Low Countries, pp. 269–280. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.22 License: CC-BY 4.0
Odijk, J., van der Klis, M. and Spoel, S. 2018. Extensions to the GrETEL treebank query application. Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories (TLT16), pp. 46–55, Prague. http://aclweb.org/anthology/W/W17/W17-7608.pdf
Odijk, J. and van Hessen, A. (eds.) 2017. CLARIN in the Low Countries. London: Ubiquity Press. (Open Access). DOI: http://dx.doi.org/10.5334/bbi
 
 
 
30

GrETEL is a search application for treebanks, presented here in the context of the CLARIN, DARIAH, and CLARIAH research infrastructure projects. Its core feature is example-based querying. GrETEL 4 adds uploading and automatic parsing of a user's own corpus (Dutch only), analysis of search results combined with metadata, better support for XPath queries, and improved interface functionality.

  • GrETEL
  • Linguistic Research
  • CLARIN
  • DARIAH
  • Humanities Research

Uploaded on Sep 23, 2024



