Understanding COCA: Corpus of Contemporary American English Workshop Overview


COCA (Corpus of Contemporary American English) is a resource for researchers and linguists: a large corpus of texts drawn from five registers (spoken, fiction, magazines, newspapers, and academic sources). This overview covers the collection timeframe, the interface, the search functions, and comparisons with other corpora such as the Wikipedia Corpus, CoHA, and the BNC, and it points to possible research applications.


Uploaded on Sep 19, 2024 | 0 Views



Presentation Transcript


  1. BYU COCA: CORPUS OF CONTEMPORARY AMERICAN ENGLISH Workshop Purdue University November 2015

  2. Agenda: Essential background (COCA, other BYU corpora, basics of the interface); Search functions (information & practice); Search syntax (information & practice); Results analysis; Activities; (Possibly: Pedagogical uses)

  3. COCA: Overview (1 & 2). "The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English." (COCA website) Corpus: a database of texts that you can query. Text types (registers) in COCA: spoken, fiction, popular magazines, newspapers, and academic (page 2). Timeframe of COCA collection: 1990-2012.

  4. COCA and other corpora (3): Wikipedia Corpus; Global Web-based English (GloWbE), which gives the power to compare across dialects, e.g. US/UK; Corpus of Historical American English (CoHA), with texts from 1810-2000; Time Magazine; British National Corpus (BNC). Question: What exactly might a researcher be looking for when looking up the same words and phrases in (a) Wikipedia and GloWbE, (b) COCA and BNC, (c) CoHA and COCA?

  5. COCA Interface: Welcome Screen Interface consists of 3 active & independent frames

  6. COCA Interface: Results Display

  7. COCA Interface: How to search? Display: List, Chart, KWIC, Compare. Search String: clicking on the word "collocates" turns the function off and on; the same goes for POS. Sections: Registers (Spoken, Fiction, Magazine, Newspaper, Academic); Time of publication; Subregisters (e.g. MAG: Sci/Tech; FIC: Juvenile). Click and scroll time (click on Collocates, POS List, Section Scroll).

  8. Corpus: What to search for? Cheat Sheet: words (e.g. mysterious); phrases (e.g. nooks and crannies, or faint + noun); lemmas (all forms of a word, like sing or tall); wildcards (e.g. un*ly or r?n*); complex searches (such as un-X-ed adjectives, or verb + any word + a form of ground).
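
The two wildcard patterns on the cheat sheet can be tried out offline. The sketch below assumes that * stands for any run of characters and ? for exactly one character, and it uses Python's glob-style `fnmatchcase` as a stand-in for the corpus search engine. The word list is a hypothetical sample, not corpus data.

```python
from fnmatch import fnmatchcase

# Hypothetical mini word list; a real query runs against the whole corpus.
WORDS = ["unlikely", "unhappily", "until", "run", "running",
         "rain", "ran", "ruin", "roan"]

def wildcard_match(pattern, words):
    """Return the words matching a pattern in which * is any run of
    characters (including none) and ? is exactly one character."""
    return [w for w in words if fnmatchcase(w, pattern)]

print(wildcard_match("un*ly", WORDS))  # words starting with un and ending in ly
print(wildcard_match("r?n*", WORDS))   # r + one character + n + anything
```

Note that r?n* rejects rain and ruin: the ? covers only the second character, so the third character must be n.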

  9. COCA Interface: What are tags? Example: faint + noun phrases = faint [nn*]. Tags can be easily checked in the POS list. Add a space between the word and the tag. Let's check the tags for: singular nouns; wh- adverbs (who, when, where, how).
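
What a word-plus-tag query such as faint [nn*] does can be sketched on a few hand-tagged tokens. Everything in the sketch is an assumption made for illustration: the tokens are invented, the tag labels only imitate the style of the POS list (check the list itself for the real tags), and `match_word_then_tag` is a hypothetical helper, not part of COCA.

```python
# Invented, hand-tagged tokens as (word, tag) pairs. The tag labels are
# illustrative assumptions; the real ones are in the interface's POS list.
TAGGED = [
    ("a", "at1"), ("faint", "jj"), ("smile", "nn1"),
    ("and", "cc"), ("faint", "jj"), ("sounds", "nn2"),
    ("grew", "vvd"), ("faint", "jj"), ("again", "rt"),
]

def match_word_then_tag(tagged, word, tag_prefix):
    """Find `word` immediately followed by a token whose tag starts with
    `tag_prefix` (so the prefix "nn" plays the role of the pattern nn*)."""
    hits = []
    for (w1, _), (w2, t2) in zip(tagged, tagged[1:]):
        if w1 == word and t2.startswith(tag_prefix):
            hits.append(f"{w1} {w2}")
    return hits

print(match_word_then_tag(TAGGED, "faint", "nn"))
```

The third faint is skipped because the token after it is not tagged as a noun.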

  10. Activities time!

  11. Activity 3. FREQ: the number of tokens (raw count). Per million: the proportion of tokens in the corpus, i.e. the frequency normalized per one million words.
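
The per-million figure is simple arithmetic: the raw count divided by the size of the corpus (or section), scaled to one million words. A minimal sketch follows. The counts and section sizes in it are made-up numbers chosen for easy arithmetic, not COCA figures.

```python
def per_million(freq, corpus_size):
    """Normalize a raw token count to occurrences per one million words."""
    return freq * 1_000_000 / corpus_size

# Hypothetical counts: the same word in two sections of different size.
big_section = per_million(4_500, 90_000_000)    # 4,500 hits in 90M words
small_section = per_million(1_200, 20_000_000)  # 1,200 hits in 20M words

print(big_section, small_section)
```

The smaller section has the lower raw count but the higher per-million rate, which is why the normalized figure, not FREQ, is the one to compare across sections of unequal size.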

  12. Activity 4.

  13. Activity 5. Collocates delimiting function: search for any (*) noun collocates of the word laugh (used as a noun) within 5 words before or after laugh. Example: "Crystal threw back her head and laughed, a throaty little laugh of sheer exuberance with a sort of purr in it. In a moment he" Window around the node: LEFT (positions 5 4 3 2 1): and laughed a throaty little; node (position 0): laugh; RIGHT (positions 1 2 3 4 5): of sheer exuberance with a.
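
The 5-left / 5-right window from this slide can be reproduced in a few lines. This is only a sketch of the windowing step, assuming a naive tokenizer that keeps letters and apostrophes. It does not filter by part of speech, so unlike the slide's query it returns every word in the window, not just the nouns. `collocate_window` is a hypothetical helper name.

```python
import re

SENTENCE = ("Crystal threw back her head and laughed, a throaty little laugh "
            "of sheer exuberance with a sort of purr in it.")

def collocate_window(text, node, span=5):
    """Return the `span` tokens to the left and to the right of the
    first occurrence of `node` (naive lowercase tokenization)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    i = tokens.index(node)              # position 0 in the slide's diagram
    left = tokens[max(0, i - span):i]   # positions 5..1 to the left
    right = tokens[i + 1:i + 1 + span]  # positions 1..5 to the right
    return left, right

left, right = collocate_window(SENTENCE, "laugh")
print(left)
print(right)
```

Because the tokenizer matches whole words, laughed is not mistaken for the node laugh.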

  14. Activity 6. KWIC: looking at the prepositions used with research.
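
A KWIC (keyword in context) display lines up every hit of the keyword with a few words on either side; sorting by the word to the right makes patterns such as following prepositions easy to spot. The sketch below is a toy version under loud assumptions: the three sentences are invented for illustration, not corpus results, and `kwic` is a hypothetical helper.

```python
import re

# Invented example sentences, not corpus data.
SENTENCES = [
    "She published research on bilingual children.",
    "Recent research into sleep is promising.",
    "They conduct research in applied linguistics.",
]

def kwic(sentences, keyword, width=3):
    """Return (left, keyword, right) context triples for each hit,
    sorted alphabetically by the right-hand context."""
    lines = []
    for sentence in sentences:
        tokens = re.findall(r"[A-Za-z']+", sentence)
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, tok, right))
    lines.sort(key=lambda line: line[2].lower())
    return lines

for left, key, right in kwic(SENTENCES, "research"):
    print(f"{left:>20}  {key}  {right}")
```

With the right-hand sort, the hits group by the preposition that follows the keyword.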

  15. Pedagogical applications of corpora: Word and Phrase analysis, http://www.wordandphrase.info

  16. THANK YOU!
