Overview of Capital IQ Transcripts: Data Collection and Coverage


The Capital IQ Transcripts package, available through Wharton Research Data Services (WRDS), provides historical conference call transcripts for approximately 8,000 public companies worldwide. It covers several call types, including earnings calls, shareholder/analyst calls, M&A calls, and company conference presentations. The data collection process involves recording calls, automatic transcription, editing, and auditing to ensure accuracy. Only calls in English are included because of the collection methods. The package structure includes metadata, transcript details, key development events, speaker details, and full text components. Earnings calls dominate the call-type breakdown at 76%.


Uploaded on Jul 17, 2024



Presentation Transcript


  1. Wharton Research Data Services: Capital IQ Transcripts. Eunji Oh, April 2020

  2. Agenda: Overview of Capital IQ Transcripts Database
     - Coverage and Data Sources
     - Data Collection Methodology
     - Sample Data

  3. Overview: The Capital IQ Transcripts package provides historical conference call transcripts from around the world, covering approximately 8,000 public companies. It covers not only earnings conference calls but also other call types, such as shareholder/analyst calls, M&A calls, and company conference presentations.

  4. Overview: Only calls spoken in English are covered. The earliest conference call event is from 2002, but the database holds fewer than 5 transcripts from before 2004. Transcripts for events before 2008 were backfilled several years after the event date, while current data is added in close to real time.

  5. Overview - Data Source: How is the data collected?
     - 99% of calls are recorded by S&P.
     - Two individuals are assigned to post-call processing.
     - Calls are transcribed automatically and then go through several stages of editing and auditing.
     - S&P keeps all copies of a transcript in the Xpress feed database.
     - Because of this collection process, only conference calls in English are included.
     - S&P also gets copies from some of its vendors.
     - Accuracy compared to the original audio files: Preliminary = 88.46%, Edited = 97.93%, Proofed = 98.02%.

  6. Overview - Package Structure Detail. The CIQ Transcript Package on WRDS:
     - Metadata of conference calls in connection with CIQ Key Developments/Events
     - Transcript details: version, delay reason, audio length, creation time
     - Associated key development event: event type, announced date, event date, headline
     - Associated company id
     - Speaker details (name, company, type)
     - Full text components: component-level full text

  7. Conference Call Type Breakdown
     - Analyst/Investor Day: 2%
     - M&A Calls: 1%
     - Special Calls: 2%
     - Shareholder/Analyst Calls: 3%
     - Sales/Trading Statement Calls: 1%
     - Company Conference Presentations: 13%
     - Earnings Calls: 76%

  8. Transcripts Breakdown by Country and Year. Note: only the latest copy of each transcript is counted.
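Counting only the latest copy of each transcript can be sketched as follows. This is a minimal illustration, not the WRDS procedure: the field names (`transcript_id`, `event_id`, `created`, `country`, `event_year`) and the sample rows are hypothetical, and it assumes that "latest" means the copy with the most recent creation time for a given event.

```python
from datetime import datetime

# Hypothetical rows; field names are illustrative, not actual WRDS column names.
transcripts = [
    {"transcript_id": 1, "event_id": 100, "created": datetime(2019, 5, 1, 14, 0),
     "country": "US", "event_year": 2019},
    {"transcript_id": 2, "event_id": 100, "created": datetime(2019, 5, 1, 18, 30),
     "country": "US", "event_year": 2019},
    {"transcript_id": 3, "event_id": 200, "created": datetime(2019, 8, 2, 9, 0),
     "country": "GB", "event_year": 2019},
]


def latest_copies(rows):
    """Keep, for each event, only the copy with the most recent creation time."""
    latest = {}
    for row in rows:
        current = latest.get(row["event_id"])
        if current is None or row["created"] > current["created"]:
            latest[row["event_id"]] = row
    return list(latest.values())


def count_by_country_year(rows):
    """Count latest copies per (country, event year) pair."""
    counts = {}
    for row in latest_copies(rows):
        key = (row["country"], row["event_year"])
        counts[key] = counts.get(key, 0) + 1
    return counts


print(count_by_country_year(transcripts))
```

In this sample, the two copies for event 100 collapse into one, so each country contributes a single transcript for 2019.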

  9. Companies Covered
     - 18.5% of transcripts belong to Fortune 500 companies
     - Number of unique CIQ companyid: 13,819 (100%)
     - Number of companyIds with gvkey: 11,616 (84.1%)
     - Number of companyIds with North American gvkey: 8,238 (59.6%)
     - 41% of Compustat NA unique gvkey (2004-2019)
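The coverage percentages on this slide follow directly from the reported counts. The short check below reproduces them; the variable and function names are illustrative only.

```python
# Counts as reported on the slide.
total_company_ids = 13_819
with_gvkey = 11_616
with_na_gvkey = 8_238


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(with_gvkey, total_company_ids))     # 84.1
print(share(with_na_gvkey, total_company_ids))  # 59.6
```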

  10. Data Tables
     - ciqTranscript
     - ciqTranscriptCollectionType
     - ciqTranscriptDelayReason
     - wrds_transcript_detail
     - ciqTranscriptDelayReasonType
     - ciqKeyDevelopment
     - ciqTranscriptPresentationType
     - wrds_transcript_person
     - ciqTranscriptPerson
     - ciqTranscriptSpeakerType
     - ciqTranscriptComponentType
     - ciqTranscriptComponent (Full text search / JSON files / PostgreSQL)

  11. wrds_transcript_detail.sas7bdat: Transcript-level metadata plus some useful Key Development metadata. Company ID, Most Important Date UTC, Event Headline, Key Development Type ID, and Key Development Type Name come from the Capital IQ Key Developments package.

  12. wrds_transcript_person.sas7bdat: Transcript component-level metadata plus WRDS custom variables.
     - Component Text Preview: 200-character preview of the full-text data
     - Word count: the number of words in the full-text data
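The two WRDS custom variables can be approximated as below. This is a rough sketch under stated assumptions: the slide does not say how WRDS truncates the preview or tokenizes words, so the plain 200-character slice and the whitespace split used here are guesses, and the function names are hypothetical.

```python
def component_preview(text, limit=200):
    """Return the first `limit` characters of a component's full text."""
    return text[:limit]


def word_count(text):
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


# Made-up component text for illustration.
sample = "Good morning, everyone. " * 20
print(len(component_preview(sample)))
print(word_count(sample))
```

A text shorter than the limit is returned unchanged by `component_preview`, so short components keep their full text as the preview.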

  13. Full-text sample (search tool)

  14. Summary: Overview of Capital IQ Transcripts Database
     - Coverage and Data Sources
     - Data Collection Methodology
     - Sample Data
