Tools for Language Documentation - Overview and Principles
Explore the tools and methodologies for documenting languages, including physical and conceptual tools, procedural techniques, and dissemination strategies. Understand the goals of language documentation and the principles involved in preserving linguistic data. Delve into the process of language documentation as a subfield of Documentary Linguistics, covering data collection, preservation, and analysis for theoretical and empirical linguistics.
Week 1: Overview Tools for Language Documentation Claire Bowern Yale University LSA Summer Institute: 2013
Tools for documentation Physical tools: Hardware Software Stimuli Conceptual tools: What makes a good documentary corpus Procedural tools: How to go about documenting a language Tools for disseminating results
Overview Week 1: overview, hardware, software Week 2: elicitation techniques, grammar writing Week 3: narratives, conversation, corpus building Week 4: lexicon, archiving
About the class How to describe/document a language *No practical component* (in that we won't be working with speakers) However, there will be time (I hope!) to talk about your own field data And we will be doing some exercises with existing data I will provide datasets for exercises (if you don't have data of your own to use) You can also use data from the field methods class here at the Institute.
A few assumptions for this class Not talking about community-oriented materials here (I see documentary materials as feeding into that though) Assuming that the language doesn't have a lot of other materials apart from what the linguist will be producing Assuming that the linguist will be the one doing most of the writing. Implicitly assuming a grammar/dictionary/texts model (more on this below). None of these assumptions are crucial; they're just there so we can limit the topic a bit.
What is language documentation? Documentary Linguistics as its own subfield. Doing things with linguistic data: Getting the data Preserving it Processing it (Analyzing it) Cf Woodbury (2002): Language documentation is the creation, annotation, preservation, and dissemination of transparent records of a language. Important for both theoretical and empirical branches of linguistics: typology, historical linguistics, etc
What shapes the language record? The linguist (i.e. you!) Their interests Their abilities The speakers and their interests! External circumstances funding time available lucky breaks unlucky breaks
Language Documentation as a Language Legacy Particularly relevant for endangered languages. Your work might be the only substantive record of a language: few speakers field might view the language as done speakers might view the language as done
Planned Documentation vs "Collect it all" "making a record of the language": comprehensive grammar You can't collect everything. All documentation is sampling. Unstructured, unanalyzed corpora usually aren't very useful: They are hard to use; They don't get worked on; They usually aren't big enough to test hypotheses computationally; They require native speakers (or people who are already very familiar with the language) -> fine for languages with a major presence, but what about the quarter of the world's languages with fewer than 10,000 speakers?
What counts as documentation? When is a collection big enough to count as language documentation? Is an article in Linguistic Inquiry language documentation? creation annotation preservation dissemination but only a very small fragment of a language.
How much time/space does a documentary corpus take? Depends on the resources: Time Speakers Money Levels of Interest
Grammar, Dictionary, Texts The Boasian Trilogy Structure, Lexicon, Culture Way to present the analysis and also allow others to recreate it (or challenge it) from the underlying data. Conceived broadly: Capture language structure Capture language in use Capture lexicon and meaning
Sampling: Documentation as snapshots A big part of documentation is constructing a good set of samples. To do that, you will need to consider what the purpose of the documentary record is. That is, why are you collecting data on the language? to make a lasting record of the language to reclaim the language for future speakers to write a reference grammar to document the culture in the traditional language to investigate a particular aspect of the language all of the above
Sampling Are your snapshots representative? Speakers Subjects/Topics Grammatical constructions Lexicon
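One rough way to ask whether the snapshots are representative is to tally the session log by speaker and by topic. The sketch below is only an illustration: the log entries, the field names `speaker` and `topic`, and the 50% cut-off are all assumptions made for the example, not part of the course materials.

```python
from collections import Counter

# Hypothetical session log; speakers and topics are invented placeholders.
SESSIONS = [
    {"speaker": "A", "topic": "elicitation"},
    {"speaker": "A", "topic": "narrative"},
    {"speaker": "A", "topic": "elicitation"},
    {"speaker": "B", "topic": "conversation"},
]


def coverage(sessions, key):
    """Return each value's share of all sessions for the given key."""
    counts = Counter(session[key] for session in sessions)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def overrepresented(sessions, key, threshold=0.5):
    """List values whose share exceeds the threshold (an arbitrary cut-off)."""
    shares = coverage(sessions, key)
    return sorted(value for value, share in shares.items() if share > threshold)


if __name__ == "__main__":
    print(coverage(SESSIONS, "speaker"))
    print(overrepresented(SESSIONS, "speaker"))
```

The same tally could be run over grammatical constructions or lexical domains, provided the sessions are tagged for them.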
Planned versus opportunistic collection Planned: translated sentences. grammaticality judgments etc. Unplanned (or planning gone wrong): Speakers reinterpret your prompts and construct narratives from them. New speaker comes to a session and wants to tell stories. You find a new (to you) morpheme in your data and want to find out how it works. You overhear a new construction in conversation.
What constitutes a documentary corpus? ***Everything*** sound files videos transcripts (elicitation prompts part of the annotation) photographs maps (artifacts) metadata (data about the data) metametadata
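Since metadata is part of the corpus, it can help to keep one small record per session and check which fields are still empty. This is a minimal sketch, assuming a hypothetical `SessionRecord` layout; the field names are illustrative and do not follow any particular archive's schema.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRecord:
    """Hypothetical per-session metadata record (field names are assumptions)."""
    session_id: str
    date: str = ""
    speakers: list = field(default_factory=list)
    recordist: str = ""
    location: str = ""
    files: list = field(default_factory=list)  # sound files, videos, transcripts, photos


def missing_metadata(record):
    """Return the names of fields that are still empty, in declaration order."""
    return [name for name, value in vars(record).items() if not value]


if __name__ == "__main__":
    record = SessionRecord("s001", date="2013-07-01", files=["s001.wav"])
    print(missing_metadata(record))
```

Running the check right after each session is a cheap way to notice gaps while the details can still be recovered.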
Workflow: 1. What do you need to do to document a language? 2. What order do you need to do it in? 3. (How will you know if it's been done right?)
Scaled workflow Project as a whole (timescale of years) e.g. Bardi language documentation Immediate tasks (timescale of weeks or months) e.g. Bardi learner's guide Subtasks (timescale of days or weeks) e.g. write the section on numbers Data gathering (timescale of single session) e.g. get data on numerals in use
Sample field kit: Equipment: Laptop Audio recorder Video recorder + microphones + backup means of recording (e.g. from laptop, second recorder) Media: backup devices [hard drive, DVDs, etc] memory cards for recorders paper! pens! Other ways of keeping the equipment clean carry bag stills camera (cell phone, ipad, etc) batteries, other power equipment tripod Stimuli/research prompts
Audio The field has converged on solid state recorders using SD cards Handy Zoom H2 or H4 (or H6 coming soon!) Edirol R-09 Marantz PMD 660 or 670 And/or laptops (or laptop plus external sound card/preprocessor) small/portable AA batteries high quality, lossless formats easy to use easy to transfer data
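Because the point about high quality, lossless formats is easy to get wrong in the recorder settings, a quick check of each uncompressed WAV file after transfer can be useful. The sketch below uses Python's standard `wave` module, which reads PCM WAV files only. The 44.1 kHz / 16-bit minimum is an assumption chosen for the example, not a figure from the slides; substitute whatever your archive requires.

```python
import wave


def describe_wav(path):
    """Read basic recording parameters from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is given in bytes
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def meets_minimum(info, min_rate_hz=44100, min_bit_depth=16):
    """Check the file against assumed minimum settings (defaults are illustrative)."""
    return (info["sample_rate_hz"] >= min_rate_hz
            and info["bit_depth"] >= min_bit_depth)
```

A typical use would be `meets_minimum(describe_wav("session01.wav"))`, where the file name is a placeholder.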
Not recommended: Dictaphones Cassette recorders DAT
Video Less consensus on models Major component of the documentation or side-project? Options: smart phone ipad stills camera with video function dedicated video camera SD card mic jack Problems: mpeg vs other proprietary video formats large files memory-intensive
Microphones headset vs lapel vs meeting microphone dynamic vs cardioid wired vs wireless XLR vs 1/8" jack The built-in mics in the Edirol, Handy, etc, are also ok You get what you pay for, approximately. Remember that microphone placement and volume monitoring are much more important than the quality of the microphone (far more recordings are ruined through the former than the latter).
Computer Laptop Lots of memory Lots of hard drive space Usually don't need ruggedization features Get cheapest possible and assume it won't last for more than a season, or try for a higher end model Special considerations for high altitude, high humidity, or low temperature work. High altitude: hard drives fail: use solid state High humidity: condensation issues Low temperatures: battery issues (See Lanz 2010)
Tablets? Most language software won't run on iPads or other tablets. Great for stimuli, backup recorder, camera, etc. Too much data