Leveraging Data Science for Improving Survey Operations
Carla Medalia discusses the competing challenges statistical agencies face, including declining response rates and reduced or static budgets. Her team's mission is to leverage data science methodologies and novel data sources to improve how the Census Bureau collects, produces, and disseminates data. The team uses machine learning, natural language processing, and other tools, collaborates with subject matter experts, and works through policy, legal, and administrative hurdles. The session's presentations highlight the challenges of innovating and offer a roadmap for statistical agencies to modernize their own operations.
Presentation Transcript
Leveraging Data Science to Improve Survey Operations. Carla Medalia. Federal Committee on Statistical Methodology (FCSM) Research and Policy Conference, 11/4/21. Disclaimer: Any views expressed are those of the author and not those of the U.S. Census Bureau.
Challenge: Statistical agencies are faced with competing challenges: declining response rates, reduced (or static) budgets, demand for more data, and demand for more timely or frequent data. We cannot address these competing challenges without innovating!
Our mission: Leverage data science methodologies and novel data sources to improve the way the Census Bureau collects, produces, and disseminates data. Who we are: Business Development Staff, housed within the Economic Reimbursable Surveys Division at the Census Bureau; statisticians, economists, data scientists, and software engineers; a mix of federal employees and contractors.
Some methods we use: machine learning, natural language processing, graphical user interface (GUI) development, web scraping, PDF extraction, and regular expressions.
But it's so much more than that. In order to leverage novel data sources and data science methodologies, we need to: (1) Collaborate with subject matter experts (SMEs), working hand-in-hand with the SMEs, production teams, and survey sponsors. (2) Navigate policy, legal, and administrative hurdles: acquire data, software, and API calls, and seek approval from the disclosure avoidance review board, web scraping board, software working group, policy office, and acquisitions. (3) Work through IT challenges in using the cloud, setting up new environments, etc.
Today's goal: The presentations and discussion will highlight the challenges faced while innovating and offer a roadmap for statistical agencies to leverage data science to modernize their own operations.
Session overview: Machine Learning and the Commodity Flow Survey (Christian Moscardi); Parsing the Code of Federal Regulations for the Commodity Flow Survey's Hazardous Materials Supplement (Krista Chan and Christian Moscardi); Using PDF Extraction and Web Scraping Tools to Collect Government Health Insurance Plan Information (Virginia Gwengi); Using Open Source Tools to Build a Custom Data Entry Application for Creating Truth Data (Cecile Murray and Katie Genadek); Discussion (Kevin Deardorff).
Contact information. Session organizer: carla.medalia@census.gov. Presenters: krista.c.chan@census.gov, katie.r.genadek@census.gov, martha.v.gwengi@census.gov, christian.l.moscardi@census.gov, cecile.m.murray@census.gov. Discussant: kevin.e.deardorff@census.gov.