COCA: Corpus of Contemporary American English Workshop Overview

BYU COCA: CORPUS OF CONTEMPORARY AMERICAN ENGLISH
 
Workshop
Purdue University
November 2015
 
Agenda
 
Essential background: COCA, other BYU corpora, basics of the interface
Search functions: information & practice
Search syntax: information & practice
Results analysis
Activities
(Possibly: Pedagogical uses)
 
COCA: Overview (1 & 2)
 
“The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English.” (COCA website)
 
Corpus: a database of texts that you can query

Text types (registers) in COCA: spoken, fiction, popular magazines, newspapers, and academic (page 2)

Timeframe of COCA collection: 1990-2012
 
 
COCA and other corpora (3)
 
Wikipedia Corpus
Global Web-based English (GloWbE) (the power to compare across dialects, e.g. US/UK)
Corpus of Historical American English (CoHA) (texts from 1810-2000)
Time Magazine
British National Corpus (BNC)

Question: What exactly might a researcher be looking for when looking up the same words and phrases in:
- Wikipedia and GloWbE
- COCA and BNC
- CoHA and COCA
 
COCA Interface: Welcome Screen
 
 
The interface consists of 3 active and independent frames
 
COCA Interface: Results Display
 
COCA Interface: How to search?

Display: List, Chart, KWIC, Compare
Search String (clicking on the word “collocates” turns the function on and off; the same with POS)
Sections:
Registers (Spoken, Fiction, Magazine, Newspaper, Academic)
Time of publication
Subregisters: MAG: Sci/Tech; FIC: Juvenile

Click and scroll time (click on Collocates, POS List, Section Scroll)
 
 
Corpus: What to search for?

Cheat Sheet
 
COCA Interface: What are tags?

Tags can be easily checked in the POS list
Add a space between the word and the tag

Let’s check the tags for
- singular nouns
- wh- adverbs (when, where, how, why)
 
Activities time!
 
Activity 3
 
FREQ: raw number of tokens

Per million: frequency normalized per one million tokens of the corpus (or section), so that sections of different sizes can be compared
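
The per-million figure can be reproduced by hand. The sketch below is only an illustration of the arithmetic: the function name and the counts are made up for this example and are not real COCA figures.

```python
def per_million(raw_freq: int, section_size: int) -> float:
    """Normalize a raw token count to a rate per one million tokens."""
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    return raw_freq * 1_000_000 / section_size


# Hypothetical counts (assumptions for illustration, not COCA data).
spoken = per_million(raw_freq=450, section_size=90_000_000)
academic = per_million(raw_freq=300, section_size=30_000_000)

print(f"spoken:   {spoken:.1f} per million")
print(f"academic: {academic:.1f} per million")
```

In this made-up case the word has the higher raw frequency in the spoken section but the higher per-million rate in the academic section, which is why the normalized column is the one to compare across registers.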
 
 
 
Activity 4.
 
 
Activity 5. Collocates delimiting function.
 
Crystal threw back her head and laughed, a throaty little laugh of sheer exuberance with a sort of purr in it. In a moment he
LEFT node
 
RIGHT node
 
= Search for any (*) noun collocates of the word laugh (used as a noun) within 5 words before or after the word laugh.
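
The idea of a 5-word span on each side of the node word can be sketched in a few lines of Python. This is a rough illustration, not how the COCA interface is implemented: the function name is hypothetical, the tokenizer is a simple regular expression, and the noun restriction is left out because it needs a POS-tagged corpus.

```python
import re
from collections import Counter


def window_collocates(text: str, node: str, span: int = 5) -> Counter:
    """Count words that occur within `span` tokens left or right of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]       # up to `span` words before
            right = tokens[i + 1:i + 1 + span]      # up to `span` words after
            counts.update(left + right)
    return counts


# Example sentence from the slide.
sentence = ("Crystal threw back her head and laughed, a throaty little laugh "
            "of sheer exuberance with a sort of purr in it.")
collocates = window_collocates(sentence, "laugh")
print(sorted(collocates))
```

Only the exact form laugh is treated as the node here, so laughed counts as a neighbor on the left side; words beyond the fifth position, such as sort and purr, fall outside the span.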
 
Activity 6. KWIC: looking at research prepositions.
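
A KWIC (keyword in context) display simply lines up each hit with a few words on either side, which makes the word right after the keyword easy to scan. The following is a minimal sketch under stated assumptions: the function name is hypothetical and the sample sentences are invented for this illustration, not taken from COCA.

```python
import re


def kwic(text: str, keyword: str, width: int = 3):
    """Return (left context, keyword, right context) for every hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Invented sample text (an assumption for illustration only).
sample = ("She published research on vowel shifts. "
          "Later research into syntax followed, and his research in phonetics continued.")

rows = kwic(sample, "research")
for left, node, right in rows:
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>20}  [{node}]  {right}")
```

Reading down the column to the right of the keyword shows which prepositions follow research in the sample.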
 
Pedagogical applications of corpora:
Words and Phrase Analysis
http://www.wordandphrase.info
 
THANK YOU!

COCA (Corpus of Contemporary American English) is a valuable resource for researchers and linguists containing a vast database of text types from various registers such as spoken, fiction, magazines, newspapers, and academic sources. This overview discusses the collection timeframe, interface, search functions, and comparison with other corpora like Wikipedia, CoHA, and BNC, providing insights into its uses and potential research applications.

  • COCA
  • Corpus of Contemporary American English
  • Linguistics
  • Research
  • Text Analysis

Uploaded on Sep 19, 2024 | 0 Views



Presentation Transcript


  1. BYU COCA: CORPUS OF CONTEMPORARY AMERICAN ENGLISH Workshop Purdue University November 2015

  2. Agenda Essential background: COCA, other BYU corpora, basics of the interface Search functions: information & practice Search syntax: information & practice Results analysis Activities (Possibly: Pedagogical uses)

  3. COCA: Overview (1 & 2) The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. (COCA website) Corpus: a database of texts that you can query Text types (registers) in COCA: spoken, fiction, popular magazines, newspapers, and academic (page 2) Timeframe of COCA collection: 1990-2012

  4. COCA and other corpora (3) Wikipedia Corpus; Global Web-based English (GloWbE) (the power to compare across dialects, e.g. US/UK); Corpus of Historical American English (CoHA) (texts from 1810-2000); Time Magazine; British National Corpus (BNC). Question: What exactly might a researcher be looking for when looking up the same words and phrases in: - Wikipedia and GloWbE - COCA and BNC - CoHA and COCA

  5. COCA Interface: Welcome Screen Interface consists of 3 active & independent frames

  6. COCA Interface: Results Display

  7. COCA Interface: How to search? Display: List, Chart, KWIC, Compare Search String (clicking on the word collocates turns off and on the function; the same with POS) Sections: Registers (Spoken, Fiction, Magazine, Newspaper, Academic) Time of publication Subregisters: MAG: Sci/Tech; FIC:Juvenile Click and scroll time (click on Collocates, POS List, Section Scroll)

  8. Corpus: What to search for? Cheat Sheet: words (mysterious); phrases (nooks and crannies, or faint + noun); lemmas (all forms of words, like sing or tall); wildcards (un*ly or r?n*); complex searches (such as un-X-ed adjectives, or verb + any word + a form of ground).

  9. COCA Interface: What are tags? faint + noun phrases: faint [nn*]. Tags can be easily checked in the POS list. Add a space between the word and the tag. Let’s check the tags for: - singular nouns - wh- adverbs (when, where, how, why)

  10. Activities time!

  11. Activity 3 FREQ: raw number of tokens. Per million: frequency normalized per one million tokens of the corpus (or section)

  12. Activity 4.

  13. Activity 5. Collocates delimiting function. = Search for any (*) noun collocates of the word laugh (used as a noun) within 5 words before or after the word laugh. Example: Crystal threw back her head and laughed, a throaty little laugh of sheer exuberance with a sort of purr in it. In a moment he [Span diagram, LEFT of node: and (5) laughed (4) a (3) throaty (2) little (1); node: laugh (0); RIGHT of node: of (1) sheer (2) exuberance (3) with (4) a (5)]

  14. Activity 6. KWIC: looking at research prepositions.

  15. Pedagogical applications of corpora: Words and Phrase Analysis http://www.wordandphrase.info

  16. THANK YOU!
