Analyzing the Challenges and Potential of On-line Job Vacancy Data
This presentation explores the use of on-line job vacancy data for statistical purposes, covering the potential benefits, challenges, data sources, and approaches to data access and handling. With a focus on text analysis and classification, it discusses how information can be extracted from job postings and turned into inputs for statistical analysis. It also examines the landscape of on-line job vacancy data sources and the complexities of using them for official statistics.
Presentation Transcript
Big Data ESSNet: Web Scraping for Job Vacancy Statistics
Nigel Swier, UK Office for National Statistics
Potential of On-line Job Vacancy Data
Current Official Estimates (Survey); Monthly (Rolling Qtr); Online data; Frequency; Industry Sector; Enterprise Size; Job type / skills; Geography; National Totals; Real-time?; More frequent; More timely; More granular; Less burden; Cheaper???
Six challenges with using On-line Job Vacancy (OJV) data for statistical purposes
1. Not all jobs are advertised on-line. Coverage is therefore incomplete and not fully representative.
2. There is no definitive data source.
3. Much OJV data is unstructured. Text processing and analysis is required to extract useful information.
4. Some job advertisements are not within the scope of official statistics definitions of a job vacancy (EU).
5. The official definition of a job vacancy does not correspond directly to the concept of a live job advertisement.
6. The specific job vacancy data landscape varies between countries.
On-line Job Vacancy Data Landscape
Private Employment Agencies; Employers; Job Boards; Job Search Engines; National Employment Agency; Enterprise Websites; Data Aggregators; Public Policy; Cedefop; Official Job Vacancy Statistics
Approaches to Data Access
Direct web scraping: Point and click; Programmatic (e.g. Python Scrapy); Web-scraping enterprise websites
Agreed Access: National employment agency; Private job portals; Commercial providers; CEDEFOP
Images: Creative Commons
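As an illustration of the programmatic route, the sketch below pulls job titles and locations out of a listing page. It is not the approach used in the presentation: the slide mentions Scrapy, whereas this example uses only Python's standard-library html.parser so that it runs without extra dependencies. The markup, the class names job-title and job-location, and the names SAMPLE_PAGE, JobAdParser and parse_job_ads are all hypothetical; real job boards structure their pages differently, and access terms should be checked before scraping.

```python
from html.parser import HTMLParser

# Hypothetical markup: real job boards use their own (different) structure.
SAMPLE_PAGE = """
<ul>
  <li class="job"><span class="job-title">Staff Nurse</span>
      <span class="job-location">Leeds</span></li>
  <li class="job"><span class="job-title">Data Analyst</span>
      <span class="job-location">Newport</span></li>
</ul>
"""


class JobAdParser(HTMLParser):
    """Collect one dict per <li class="job"> element."""

    # Assumed CSS class -> output field name.
    FIELDS = {"job-title": "title", "job-location": "location"}

    def __init__(self):
        super().__init__()
        self.ads = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "li" and css_class == "job":
            self.ads.append({})  # start a new advertisement record
        elif tag == "span" and css_class in self.FIELDS:
            self._field = self.FIELDS[css_class]

    def handle_data(self, data):
        # Only keep text that sits inside one of the recognised spans.
        if self._field and self.ads:
            self.ads[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_job_ads(html):
    """Return a list of {'title': ..., 'location': ...} dicts."""
    parser = JobAdParser()
    parser.feed(html)
    parser.close()
    return parser.ads


ads = parse_job_ads(SAMPLE_PAGE)
for ad in ads:
    print(ad["title"], "|", ad["location"])
```

In practice the page would be fetched over HTTP rather than held in a string, and a framework such as Scrapy would take care of crawling, throttling and retries.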
Data Handling: Text analysis and classification [e.g. classifying textual data with machine learning]
Can industry and occupation be classified from a job ad?
Occupation is fairly straightforward in this case.
Industry is more difficult: this company is an employment agency, not the employer. But there are clues.
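To make the idea of classifying job-ad text concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. The choice of naive Bayes is an assumption made for illustration; the slide does not say which machine learning method was used. The training texts and the labels "Nursing", "Software" and "Driving" are invented toy examples, not codes from any official occupation classification.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of ads
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
TRAIN_TEXTS = [
    "registered nurse for hospital ward",
    "staff nurse night shifts care home",
    "python developer building web applications",
    "software engineer java backend",
    "hgv driver delivery routes",
    "delivery van driver multi drop",
]
TRAIN_LABELS = ["Nursing", "Nursing", "Software", "Software", "Driving", "Driving"]

model = NaiveBayesClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("Agency seeking experienced nurse for busy ward"))
```

The example ad comes from an agency, which echoes the point on the slide: the job title words point clearly to an occupation, while the advertiser's identity says little about the industry of the actual employer.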
Data Handling: Flow to stock transformation
Job Vacancy Lifecycle
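One way to picture the flow-to-stock transformation is sketched below: scraped advertisements arrive as a flow of postings and removals, while a vacancy survey measures a stock of open vacancies on a reference date. The rule used here (an ad is live if it was first seen on or before the reference date and has not yet been removed) is an assumption made for illustration, not the definition used in the presentation. The records in ADS and the names is_live, stock_on and inflow_between are likewise hypothetical.

```python
from datetime import date

# Illustrative records only:
# (ad id, date first seen, date removed or None if still on-line).
ADS = [
    ("a1", date(2017, 1, 3), date(2017, 1, 20)),
    ("a2", date(2017, 1, 10), None),
    ("a3", date(2017, 1, 25), date(2017, 2, 14)),
    ("a4", date(2017, 2, 6), date(2017, 2, 9)),
]


def is_live(first_seen, removed, reference_date):
    """Assumed rule: live from first sighting until the removal date."""
    started = first_seen <= reference_date
    not_ended = removed is None or removed > reference_date
    return started and not_ended


def stock_on(ads, reference_date):
    """Stock: number of advertisements live on the reference date."""
    return sum(
        is_live(first_seen, removed, reference_date)
        for ad_id, first_seen, removed in ads
    )


def inflow_between(ads, start, end):
    """Flow: number of advertisements first seen in [start, end]."""
    return sum(start <= first_seen <= end for ad_id, first_seen, removed in ads)


print(stock_on(ADS, date(2017, 1, 15)))                         # live ads mid-January
print(inflow_between(ADS, date(2017, 1, 1), date(2017, 1, 31)))  # new ads in January
```

The removal date is the weak point of any such rule: an ad may stay on-line after the post is filled, or disappear and be re-posted, so the live-ad stock is only an approximation of the vacancy stock.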
Assessment against survey aggregates: by industry sector
Conclusions/Questions
Agreed access arrangements are generally better than direct web scraping.
OJV data cannot replace the Job Vacancy Survey (in EU).
OJV data does not correspond to target concepts and only measures part of the labour market. How useful are these measures? If useful, how should these measures be presented alongside the official estimates?
(EU) Collaboration with CEDEFOP is essential.
How do we get the best possible quality data for official statistics purposes?
Other possibilities
- Time series analysis, leading to flash estimates
- Data-driven analysis: new insights into existing statistics, e.g. timing of advertisements being placed
- International/Overseas jobs as indicators of labour market tightness
- Identification of new job titles assisting development of standard statistical classifications
Drivers of Cedefop RLMI work
Better labour market information for better policies
Lack of comparable data and systematic analysis
Complement skills intelligence toolkit