Exploring Enterprise Characteristics Through Web Data Analysis


This presentation covers the use of webscraping, text mining, and inference techniques to collect and improve information on enterprises from their websites. It presents the objectives, example use cases, pilot projects, and results of the ESSnet Big Data projects, whose enterprise-characteristics work involved Italy, Bulgaria, the Netherlands, Poland, Sweden, and the UK. The focus is on using web data for official statistics and on new methods of data processing.


Uploaded on Sep 24, 2024 | 0 Views



Presentation Transcript


  1. Estimating Enterprise Characteristics from Web Data: Achievements and Future Developments. Monica Scannapieco (Istat, Italy), Galya Stateva (BNSI, Bulgaria), Peter Struijs (CBS, Netherlands). NTTS (New Techniques and Technologies for Statistics), 12-14 March 2019.

  2. Outline: Background; ESSnet Big Data I: Workpackage 2, Webscraping Enterprise Characteristics; ESSnet Big Data II: Workpackage C, Enterprise Characteristics; Final Remarks.

  3. Enterprise Websites as a Source for Official Statistics. Objective: to investigate whether webscraping, text mining and inference techniques can be used to collect, process and improve general information about enterprises. (Diagram: Enterprise Websites, National Business Register, Business Statistics Surveys.)

  4. Examples of Use Cases. Use case 1: URLs Inventory. Use case 2: Web sales (E-commerce). Use case 3: Social Media Presence. Use case 4: Job Advertisements. Use case 5: Economic Activity Classification (NACE).

  5. Pilots of the ESSnet Big Data I. List of pilot projects: Webscraping (2 work packages): job vacancies; enterprise characteristics. Smart meters: electricity consumption; temporary vacant dwellings. Automatic Identification System (AIS): vessel identification data. Mobile phone data: preparing for access to data. Early estimates: various domains. Multiple domains: population, tourism / border crossing, agriculture.

  6. WP2: Webscraping / Enterprise Characteristics. WP leader: Italy. Partners: Bulgaria, Netherlands, Poland, Sweden, UK.

  7. Big Data Pilots I - WP2 Results (1). A pipeline for processing data scraped from enterprise websites was defined in detail and shared among the participants.

  8. Big Data Pilots I - WP2 Results (2). Methods: webscraping methods (including URLs retrieval when necessary); text representation and mining methods for processing webscraped texts; deterministic and machine learning methods tested for unit-level prediction of enterprise characteristics.
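As an illustration of the text representation and deterministic methods named on slide 8, the following Python sketch turns scraped page text into a bag of words and applies a keyword rule to flag web sales. It is a minimal sketch and not the WP2 implementation: the keyword list `ECOMMERCE_TERMS`, the threshold `min_hits`, and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical keyword list (assumption); a real pilot would tune it per language.
ECOMMERCE_TERMS = {"cart", "checkout", "basket", "shipping", "payment"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Represent a scraped page as a mapping from token to count."""
    return Counter(tokenize(text))


def has_web_sales(text, min_hits=2):
    """Deterministic rule: flag the page if at least min_hits distinct
    e-commerce terms occur in it."""
    bow = bag_of_words(text)
    hits = sum(1 for term in ECOMMERCE_TERMS if bow[term] > 0)
    return hits >= min_hits


if __name__ == "__main__":
    print(has_web_sales("Add to cart and proceed to checkout"))
    print(has_web_sales("About our company and its history"))
```

A machine learning variant would feed the same bag-of-words counts to a classifier trained on enterprises whose web sales status is already known, for example from survey data.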

  9. Big Data Pilots I - WP2 Results (3). IT solutions: BNSI/Bulgaria: URLs retrieval, scraping, deterministic analysis. Istat/Italy: generalized scraping, URLs retrieval, ML analysis (used by Poland and Bulgaria). GUS/Poland: social media scraping and analysis (used by Italy, Netherlands, Sweden, Bulgaria). Developed software available at: https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/WP2_Links

  10. Big Data Pilots I - WP2 Results (4). First Experimental Statistics: https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/Category:WP2_Experimental_statistics1

  11. ESSnet Big Data II: Overview of Workpackages
      WP | WP name | WP leader | Country
      WPA | Coordination and Communication | Peter Struijs; Marc Debusschere (deputy) | NL; BE
      WPB | Online Job Vacancies | Tomaž Špeh | SI
      WPC | Enterprise Characteristics | Galya Stateva | BG
      WPD | Smart Energy | Arko Kesküla | EE
      WPE | Tracking Ships | Anke Consten | NL
      WPF | Process and Architecture | Monica Scannapieco | IT
      WPG | Financial Transactions Data | Johan Fosen | NO
      WPH | Earth Observation | Marek Morze | PL
      WPI | Mobile Networks Data | David Salgado | ES
      WPJ | Innovative Tourism Statistics | Marek Cierpiał-Wolan | PL
      WPK | Methodology and Quality | Alexander Kowarik | AT
      WPL | Preparing Smart Statistics | Natalie Rosenski | DE

  12. WPC: Enterprise Characteristics. WP leader: Bulgaria. Partners: Austria, Germany, Finland, Ireland, Italy, Netherlands, Poland, UK.

  13. From WP2 to WPC. The results of WP2 are the starting point for WPC, in particular: (1) the URLs retrieval methodology, i.e. a process and software implementations for detecting websites of enterprises based on search engines and machine learning techniques; (2) methodologies, processes and software implementations for detecting characteristics of enterprises such as e-commerce activities, social media presence, job advertisements, NACE code, etc.
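The URLs retrieval step on slide 13 can be pictured with a simplified Python sketch: given candidate URLs returned by a search engine for an enterprise name, score each candidate and keep the best one. This sketch replaces the machine learning step with a plain token-overlap score, so it is a stand-in and not the WP2 method; the function names, the `threshold` value, and the `.example` URLs are assumptions made for illustration.

```python
import re
from urllib.parse import urlparse


def name_tokens(name):
    """Split an enterprise name into lowercase tokens, dropping very short ones."""
    return {t for t in re.findall(r"[a-z0-9]+", name.lower()) if len(t) > 2}


def score_candidate(enterprise_name, url):
    """Fraction of name tokens that appear in the host name of the URL."""
    host = urlparse(url).netloc.lower()
    tokens = name_tokens(enterprise_name)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in host) / len(tokens)


def best_url(enterprise_name, candidates, threshold=0.5):
    """Return the highest-scoring candidate, or None if no candidate
    reaches the threshold (hypothetical cut-off)."""
    scored = [(score_candidate(enterprise_name, u), u) for u in candidates]
    if not scored:
        return None
    score, url = max(scored)
    return url if score >= threshold else None


if __name__ == "__main__":
    # Made-up search results for a made-up enterprise.
    results = ["https://www.acmebakery.example/", "https://directory.example/acme"]
    print(best_url("Acme Bakery", results))
```

In a setup closer to the one the slide describes, the overlap score would be one feature among several (for instance, whether the scraped page contains the enterprise's address or VAT number) passed to a trained classifier.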

  14. WPC Objective and Tasks. Objective: from piloting to implementation. Five tasks: ESS webscraping policies; methodological framework/guidelines; experimental statistics, including reference metadata; starter kit for NSIs; quality template for statistical outputs.

  15. WPC: Updated Use Cases. Use case 1: URLs Inventory. Use case 2: Variables from the survey on ICT usage in enterprises. Use case 3: Validation of BR and NACE classification. Use case 4: Experimental Language Statistics.

  16. Final Remarks. The work done within ESSnet Big Data I and the initial work of ESSnet Big Data II mark milestones on the way to using the Internet as a data source for official statistics. The work on using enterprise websites to support business statistics addresses the whole production pipeline, from data collection to data dissemination, and has an impact at the technical, legal and organizational levels. The promising results achieved, together with the concrete actions planned to address implementation issues, are expected to move this pipeline towards full-fledged statistical production in the short to medium term for several countries of the ESS.

  17. Questions? https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata Thank you for your attention! Contacts: scannapi@istat.it, GStateva@NSI.bg, p.struijs@cbs.nl
