Spoken BNC2014 Workshop: Explore Data with Experts
Dive into the world of the Spoken BNC2014 corpus with Robbie Love from Cambridge English and Andrew Hardie from Lancaster University. Discover the background, structure, and purpose of this unique project, led by a dedicated team from Lancaster University and Cambridge University Press. Learn about data collection, transcription, and how participation is encouraged in this valuable linguistic resource.
Presentation Transcript
Introducing the Spoken BNC2014 Explore the data yourself! Robbie Love Cambridge English / Lancaster University Andrew Hardie Lancaster University #CL2017bham #BNC2014 http://cass.lancs.ac.uk
Workshop structure: 1. Robbie: background; compiling the corpus; speaker & text metadata. 2. Andrew: exploring the Spoken BNC2014 with CQPweb. 3. You! Practical session.
Introducing the Spoken BNC2014. Robbie Love, Cambridge English / Lancaster University. r.m.love@lancaster.ac.uk; Twitter: lovermob
Why build a new Spoken BNC? BNC1994, BNC2014
Who built the Spoken BNC2014? Lancaster University: Robbie Love, Andrew Hardie, Vaclav Brezina, Tony McEnery. Cambridge University Press: Claire Dembry, Olivia Goodman, Imogen Dickens, Sarah Grieves, Laura Grimes, Samantha Owen, 20 transcribers.
The Spoken BNC2014: Lancaster University + Cambridge University Press
Lancaster & CUP: plan & design; encourage participation; media campaigns; disseminate information; fund project equally.
CUP (Claire Dembry & team): corresponds with contributors; collects & transcribes recordings.
Lancaster (Robbie Love, Andrew Hardie, Vaclav Brezina, Tony McEnery): documents the compilation of the corpus; carries out methodological investigations; converts transcripts to XML, encoding; annotates corpus; initial analysis; prepares for public release/hosts corpus.
Data collection
Transcription
Processing --> CQPweb
The Spoken BNC2014
The Spoken BNC2014: conversational, L1 British English; 2012-2016; 672 speakers; 1,251 texts; 11,422,622 words.
Metadata: speaker metadata is provided by speakers as part of the consent form; text metadata is provided by contributors.
Speaker metadata: age. BNC1994 categorisation scheme: 0-14 | 15-24 | 25-34 | 35-44 | 45-59 | 60+ | Unknown. BNC2014 offers two schemes, the original plus: 0-10 | 11-18 | 19-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70-79 | 80-89 | 90-99 | Unknown.
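The two age schemes on this slide can be read as alternative binnings of the same speaker age. The sketch below is illustrative only: the band labels are copied from the slide, but the function name `age_band` and the treatment of missing or out-of-range ages as "Unknown" are assumptions, not the project's documented procedure.

```python
# Band labels as listed on the slide; everything else here is hypothetical.
BNC1994_BANDS = ["0-14", "15-24", "25-34", "35-44", "45-59", "60+"]
BNC2014_BANDS = ["0-10", "11-18", "19-29", "30-39", "40-49",
                 "50-59", "60-69", "70-79", "80-89", "90-99"]


def _bounds(band):
    """Turn a label such as '25-34' or '60+' into (low, high); high is None if open-ended."""
    if band.endswith("+"):
        return int(band[:-1]), None
    low, high = band.split("-")
    return int(low), int(high)


def age_band(age, bands):
    """Return the band label containing `age`, or 'Unknown' if none does."""
    if age is None:
        return "Unknown"  # assumption: a missing age is reported as Unknown
    for band in bands:
        low, high = _bounds(band)
        if age >= low and (high is None or age <= high):
            return band
    return "Unknown"  # assumption: ages outside every band are also Unknown


if __name__ == "__main__":
    for age in (9, 27, 64, None):
        print(age, age_band(age, BNC1994_BANDS), age_band(age, BNC2014_BANDS))
```

Keeping both labels per speaker is what lets the same data be compared with the BNC1994 on the old scheme while still being queried on the finer-grained new one.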
Speaker metadata: age (1994 scheme) [bar chart; categories: 0-14, 15-24, 25-34, 35-44, 45-59, 60+, Unknown]
Speaker metadata: age (new scheme) [bar chart; categories: 0-10, 11-18, 19-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, Unknown]
Speaker metadata: dialect. Purely objective metadata seems insufficient. Subjective metadata offers an imperfect solution: self-reported dialect, e.g. Geordie = north east England.
Speaker metadata: dialect. Four levels: (1) Global > (2) Country > (3) Supra-region > (4) Region
UK > England > North > North East; Yorkshire & Humberside; North West (not Merseyside); Merseyside
UK > England > Midlands > East Midlands; West Midlands
UK > England > South > Eastern; South West; South East (not London); London
UK > Scotland > Scotland > Scotland
UK > Wales > Wales > Wales
UK > Northern Ireland > Northern Ireland > Northern Ireland
Non-UK > Republic of Ireland > Republic of Ireland > Republic of Ireland
Non-UK > Other non-UK variety > Other non-UK variety > Other non-UK variety
Unspecified > Unspecified > Unspecified > Unspecified
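One way to picture the four-level scheme is as a nested lookup from a region label up to its supra-region, country and global category. The sketch below is a reading of the slide, not the corpus's actual encoding: the category names are taken from the slide, while `DIALECT_TREE`, `dialect_path` and the fallback to "Unspecified" are hypothetical.

```python
# Hierarchy transcribed from the slide: global > country > supra-region > regions.
DIALECT_TREE = {
    "UK": {
        "England": {
            "North": ["North East", "Yorkshire & Humberside",
                      "North West (not Merseyside)", "Merseyside"],
            "Midlands": ["East Midlands", "West Midlands"],
            "South": ["Eastern", "South West",
                      "South East (not London)", "London"],
        },
        "Scotland": {"Scotland": ["Scotland"]},
        "Wales": {"Wales": ["Wales"]},
        "Northern Ireland": {"Northern Ireland": ["Northern Ireland"]},
    },
    "Non-UK": {
        "Republic of Ireland": {"Republic of Ireland": ["Republic of Ireland"]},
        "Other non-UK variety": {"Other non-UK variety": ["Other non-UK variety"]},
    },
}

UNSPECIFIED = ("Unspecified",) * 4


def dialect_path(region):
    """Return (global, country, supra-region, region) for a level-4 region label."""
    for global_cat, countries in DIALECT_TREE.items():
        for country, supra_regions in countries.items():
            for supra, regions in supra_regions.items():
                if region in regions:
                    return (global_cat, country, supra, region)
    return UNSPECIFIED  # assumption: unrecognised labels count as Unspecified


if __name__ == "__main__":
    # The slide's own example: a self-reported "Geordie" speaker is north east England.
    print(dialect_path("North East"))
    print(dialect_path("Republic of Ireland"))
```

Storing the full path per speaker means a query can be restricted at whichever level has enough data, which matters because the region-level categories are much smaller than the country-level ones.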
Speaker metadata: dialect (level 1 global) [bar chart; categories: uk, non_uk]
Speaker metadata: dialect (level 2 country) [bar chart; categories: england, scotland, wales, n_ireland, r_ireland]
Speaker metadata: dialect (level 3 supra-region) [bar chart; categories: north, midlands, south]
Speaker metadata: dialect (level 4 region) [bar chart; categories: northeast, yorkshire, northwest, liverpool, e_midlands, w_midlands, eastern_engl, southwest, southeast, london]
Speaker metadata: highest qualification [bar chart; categories: Primary, Secondary, Sixth-form, Graduate, Postgraduate, Unknown]
Speaker metadata: gender [bar chart; categories: Female, Male]
Speaker metadata: socio-economic status. Occupation --> socio-economic status. BNC1994: Social Grade: AB | C1 | C2 | DE | Unknown. BNC2014: NS-SEC (--> Social Grade): 1.1 | 1.2 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | * | Unknown.
Speaker metadata: socio-economic status. NS-SEC maps on to Social Grade.
NS-SEC categories:
1 Higher managerial, administrative and professional occupations
1.1 Large employers and higher managerial and administrative occupations
1.2 Higher professional occupations
2 Lower managerial, administrative and professional occupations
3 Intermediate occupations
4 Small employers and own account workers
5 Lower supervisory and technical occupations
6 Semi-routine occupations
7 Routine occupations
8 Never worked and long-term unemployed
* Students/unclassifiable
Social Grade categories:
A Higher managerial, administrative and professional
B Intermediate managerial, administrative and professional
C1 Supervisory, clerical and junior managerial, administrative and professional
C2 Skilled manual workers
D Semi-skilled and unskilled manual workers
E State pensioners, casual and lowest grade workers, unemployed with state benefits only
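The "maps on to" step can be sketched as a simple lookup. The alignment used here (1.1 and 1.2 to A, 2 to B, 3 and 4 to C1, 5 to C2, 6 and 7 to D, 8 to E) is an assumption read off the row layout of the slide, and the slide does not show clearly where the `*` (students/unclassifiable) category lands, so the sketch leaves that as a parameter. The function names are hypothetical.

```python
# Assumed alignment, inferred from the slide's row layout; not confirmed by the source.
NSSEC_TO_GRADE = {
    "1.1": "A", "1.2": "A",
    "2": "B",
    "3": "C1", "4": "C1",
    "5": "C2",
    "6": "D", "7": "D",
    "8": "E",
}

# The BNC1994 scheme on the previous slide merges grades into AB | C1 | C2 | DE.
GRADE_TO_BNC1994 = {"A": "AB", "B": "AB", "C1": "C1", "C2": "C2", "D": "DE", "E": "DE"}


def social_grade(nssec, star_grade="Unknown"):
    """Map an NS-SEC code to a Social Grade letter.

    `star_grade` is what '*' maps to; the slide leaves this open,
    so the default of 'Unknown' is an assumption.
    """
    code = str(nssec).strip()
    if code == "*":
        return star_grade
    return NSSEC_TO_GRADE.get(code, "Unknown")


def bnc1994_grade(nssec, star_grade="Unknown"):
    """Collapse the Social Grade further into the BNC1994 categories."""
    return GRADE_TO_BNC1994.get(social_grade(nssec, star_grade), "Unknown")


if __name__ == "__main__":
    for code in ("1.2", "4", "7", "*", "Unknown"):
        print(code, social_grade(code), bnc1994_grade(code))
```

Deriving Social Grade from NS-SEC, rather than coding it separately, is what allows the 2014 speakers to be compared with the BNC1994 categories.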
Speaker metadata: socio-economic status (2014 scheme) [bar chart; category labels not preserved]
Speaker metadata: socio-economic status (1994 scheme) [bar chart; categories: A, B, C1, C2, D, E, unknown]
Speaker metadata: core set [bar chart; categories: In core set, Not in core set]
Text metadata: conventions [bar chart; categories: original, revised]
Text metadata: sample release [bar chart; categories: sample release, not in sample release]
Text metadata: no. of speakers [bar chart; categories: two, three, four, five, six, seven, eight, nine, twelve]
Text metadata: year of recording [bar chart; categories: twelve, thirteen, fourteen, fifteen, sixteen]
Text metadata: transcriber [bar chart; categories: T10, T15, T09, T02, T18, T11, T04, T19, T03, T06, T20, T13, T01, T05, T14, T07, T08, T12, T17, T16]
Publications. International Journal of Corpus Linguistics, Spoken BNC2014 special issue, eds. McEnery, Love & Brezina: Love, Dembry, Hardie, Brezina & McEnery; Fuchs; Laws et al.; Hessner & Gawlitzek; Calude. Routledge Advances in Corpus Linguistics book, Corpus Approaches to Contemporary British Speech, eds. Brezina, Love & Aijmer: Wong & Kruger; Culpeper & Gillings; Aijmer; Jenset et al.; Caines et al.; Paterson.
A date for your diary: 25 September 2017
References
Brezina, V., Love, R. and Aijmer, K. (eds.). (forthcoming). Corpus Approaches to Contemporary British Speech: Sociolinguistic Studies of the Spoken BNC2014. New York: Routledge.
Hardie, A. (2012). CQPweb - combining power, flexibility and usability in a corpus analysis tool. International Journal of Corpus Linguistics, 17(3): 380-409.
Love, R., Dembry, C., Hardie, A., Brezina, V. and McEnery, T. (2017, forthcoming). The Spoken BNC2014: designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics, 22(3).
McEnery, T., Love, R. and Brezina, V. (eds.). (2017, forthcoming). International Journal of Corpus Linguistics, 22(3), Special Issue.
r.m.love@lancaster.ac.uk @lovermob @BNC_2014
Exploring the Spoken BNC2014 with CQPweb. Andrew Hardie, Lancaster University. a.hardie@lancaster.ac.uk; Twitter: HardieResearch
Login details: 1. Go to: cqpweb.lancs.ac.uk 2. Username: spokbnc[1-50] 3. Password: cl2017 4. Then go to: cqpweb.lancs.ac.uk/bnc2014spoken
Please give us your feedback! tinyurl.com/bncworkshop