Progress Towards a Quality-Controlled Historical Weather Data Platform
Development progress towards a comprehensive, quality-controlled historical weather data platform, led by Dr. Grace Di Cecco. The presentation covers the challenges of accessing and aggregating weather observations from many sources, and the project's goals of cleaning, aggregating, quality-controlling, and standardizing hourly weather observations across WECC. The approach involves finding publicly available weather station networks, developing a standardized QA/QC methodology, and providing open access to a continuously updated database through cloud-based storage.
Presentation Transcript
Development Progress towards a Comprehensive Quality Controlled Historical Weather Data Platform
Grace Di Cecco, PhD
9/14/2023
PIR-19-006
Open weather observations database
Problem statement
- Hourly weather station observations are key inputs for many energy sector applications
- Organizations tend to use different sources with QA/QC methods that are not necessarily comparable
- Accessing publicly available data requires navigating and aggregating observations from many different data sources with differing standards
Weather station observations vary in quality
- ASOS: airport, regularly maintained
- RAWS: automated, often at treelines for fire weather monitoring
- CWOP: citizen weather observers; oftentimes near homes
Project goals
- Clean, aggregate, quality control, and standardize hourly weather observations across WECC for key variables
- Provide easy access to a continuously updating database of observations through AWS
- Integrate with the Cal-Adapt enterprise, where users access climate projections data
Our approach
- Conduct an extensive search to find publicly available weather station networks across WECC
- Programmatically pull and clean observations into a standard format
- Develop a standardized QA/QC methodology to remove suspect data
- Provide open access to the cleaned database of observations through cloud-based storage
Variables in our database
Station information:
- Latitude
- Longitude
- Elevation
- Time of observation
Weather variables:
- Air pressure (ps)
- Air temperature (tas)
- Dew-point temperature (tdps)
- Precipitation (pr)
- Relative humidity (hurs)
- Wind speed at 10 m (sfcWind)
- Wind direction at 10 m (sfcWind_dir)
- Solar radiation (rsds)
Data sources over time
Data coverage: January 1, 1980 to December 31, 2022 (UTC)
Spatial coverage of dataset
Includes observations from buoys within WECC marine boundaries
Data flow from source to finished product
Raw data -> 1) Pull data -> 2) Clean data -> 3) QA/QC Part 1 -> 3) QA/QC Part 2 -> 4) Standardize to hourly -> Finished data product!
Data flow, step 1) Pull data. Status: complete! Data pulled by network through FTP/API; drop unneeded variables; save in our bucket
Data flow, step 2) Clean data. Status: complete! Standardize variable names and units; convert missing data to NaNs
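The cleaning step can be illustrated with a minimal sketch. It assumes a raw feed with invented column names (`temp_f`, `dewpoint_f`, `wind_mph`), invented missing-value sentinels, and target units of kelvin and m/s; none of these details are stated in the talk, and this is not the project's code.

```python
import math

# Hypothetical mapping from one network's raw column names to the
# standard variable names listed in the talk (raw names are invented).
RENAME = {"temp_f": "tas", "dewpoint_f": "tdps", "wind_mph": "sfcWind"}

# Assumed target units (K and m/s); the talk does not state the units.
CONVERT = {
    "temp_f": lambda f: (f - 32.0) * 5.0 / 9.0 + 273.15,
    "dewpoint_f": lambda f: (f - 32.0) * 5.0 / 9.0 + 273.15,
    "wind_mph": lambda mph: mph * 0.44704,
}

# Hypothetical sentinel values a raw feed might use for "missing".
MISSING = {-9999, -999, None, ""}


def clean_record(raw):
    """Return a record with standard names, converted units, and NaN for missing."""
    out = {}
    for key, value in raw.items():
        if key not in RENAME:
            continue  # columns without a mapping are ignored
        if value in MISSING:
            out[RENAME[key]] = math.nan
        else:
            out[RENAME[key]] = CONVERT[key](float(value))
    return out
```

Keeping the rename and conversion tables separate from the loop means a new network only needs new table entries, which is one plausible way to handle many sources with differing standards.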
Data flow, step 3) QA/QC Part 1. Status: Methodology & functions complete! Whole station checks: remove stations missing key metadata
Data flow, step 3) QA/QC Part 2. Status: Methodology complete! Function development in progress. Check and flag individual variables to identify suspect observations
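A whole-station check of the kind described for QA/QC Part 1 might look like the sketch below. The metadata field names and the rectangular bounding box are assumptions for illustration; the box is a crude stand-in and not the actual WECC boundary.

```python
import math


def _missing(value):
    """True for None or NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def station_passes(meta, bbox):
    """Return True if a station has coordinates and lies inside bbox.

    meta: dict with hypothetical keys "latitude" and "longitude".
    bbox: (lat_min, lat_max, lon_min, lon_max), an illustrative rectangle.
    """
    lat, lon = meta.get("latitude"), meta.get("longitude")
    if _missing(lat) or _missing(lon):
        return False  # station is missing key metadata
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
```

For example, `station_passes({"latitude": 38.5, "longitude": -121.5}, (30.0, 60.0, -140.0, -100.0))` keeps the station, while a station with no latitude is dropped before any per-variable checks run.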
Data flow, step 4) Standardize to hourly. Status: Methodology in progress. Derive any missing variables; deduplicate stations; aggregate to hourly observation
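Since the hourly methodology is described as in progress, the following is only a sketch of one possible aggregation rule: sub-hourly values are grouped by the start of their UTC hour and averaged, ignoring NaNs. Averaging is an assumption that suits a variable like air temperature; precipitation would likely need a different rule.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def to_hourly(obs):
    """Aggregate (datetime, value) pairs to {hour_start: mean of valid values}."""
    buckets = defaultdict(list)
    for when, value in obs:
        hour = when.replace(minute=0, second=0, microsecond=0)
        valid = buckets[hour]  # creates the hour even if every value is NaN
        if not math.isnan(value):
            valid.append(value)
    # Hours with no valid values are kept and reported as NaN.
    return {h: (fmean(v) if v else math.nan) for h, v in sorted(buckets.items())}
```

Keeping all-NaN hours in the output, rather than dropping them, preserves the record that an observation was attempted in that hour.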
QA/QC methodology
QA/QC part one: Whole station checks
QA/QC part two: Data flags (done for each variable unless otherwise stated)
Checks across the two parts:
- Missing lat-lon coordinates
- Station outside of WECC
- Elevation missing (fill w/ DEM) or out of range
- All station missing values to NA
- Sensor height within tolerance (wind)
- Sensor height within tolerance (air temp)
- Values outside North American world records
- Cross variable logic: if wind speed 0, then wind dir 0
- Cross variable logic: dewpoint < air temperature
- Precip logic checks: not negative
- Precip logic checks: accum in shorter window < accum in longer
- Monthly distribution: unusual gap between bins
- Monthly distribution: flag extremes (tails)
- Monthly distribution: unusually frequent values
- Timeseries: unusually large jumps
- Timeseries: unusual repeated value sequence
The slide is shown a second time with two groups of checks marked "In progress".
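A few of the listed data flags can be sketched as follows. This is an illustrative reading of the slide and not the project's functions: it interprets the dewpoint check as flagging dewpoint above air temperature, the calm-wind check as flagging a nonzero direction when speed is zero, and it uses a caller-supplied jump threshold. Flag names are invented, and observations are flagged instead of deleted.

```python
import math


def _ok(value):
    """True if value is present and not NaN."""
    return value is not None and not math.isnan(value)


def flag_record(rec):
    """Return a list of (hypothetical) flag names for one observation dict."""
    flags = []
    tas, tdps = rec.get("tas"), rec.get("tdps")
    if _ok(tas) and _ok(tdps) and tdps > tas:
        flags.append("tdps_gt_tas")  # dewpoint should not exceed air temperature
    speed, direction = rec.get("sfcWind"), rec.get("sfcWind_dir")
    if _ok(speed) and _ok(direction) and speed == 0 and direction != 0:
        flags.append("calm_wind_nonzero_dir")  # if wind speed 0, then wind dir 0
    pr = rec.get("pr")
    if _ok(pr) and pr < 0:
        flags.append("negative_pr")  # precipitation cannot be negative
    return flags


def flag_jumps(values, max_step):
    """Return indices whose value differs from the previous one by more than max_step."""
    flagged = []
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if _ok(prev) and _ok(cur) and abs(cur - prev) > max_step:
            flagged.append(i)
    return flagged
```

Note that a single spike produces two flagged indices in `flag_jumps`, one for the jump into the spike and one for the jump back out, so a real procedure would need a rule for deciding which observation is suspect.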
Suspect observations examples
Figure from Pierce & Cayan at CEC Workshop: Hourly Temperature Data on Cal-Adapt, Dec. 2019
Implications for the energy sector
- Extreme outliers may be sensor malfunctions or real meteorological events
- Conduct case studies of specific extreme weather events in California to target and tune our procedure for stakeholders
Image: NASA VIIRS, January 3, 2023
Automation and updating frequency
- Our data pipeline will pull, clean, QC, and standardize new observations automatically
- High accuracy, low false positive rate
- Need feedback from stakeholders on what frequency of updates they would like to see: Quarterly? Monthly? Sub-monthly?
How will users access the data?
- Observations are stored in NetCDF format in an AWS S3 bucket
- Organized by network and station
- Plan to integrate with the Analytics Engine and provide a Jupyter notebook (executable code and guidance text) to allow users to filter observations by variable, space, and time to produce tabular data objects
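The planned notebook is not shown in the talk, so the following is only a stand-in for the idea of filtering by variable, space, and time into tabular rows. It works on plain Python dicts with hypothetical keys (`station`, `lat`, `lon`, `time`) and does not use the Analytics Engine or read NetCDF files.

```python
from datetime import datetime


def filter_obs(records, variable, bbox, start, end):
    """Return (station, time, value) rows matching a variable, box, and time window.

    records: list of dicts with hypothetical keys "station", "lat", "lon", "time",
             plus one key per weather variable (for example "tas").
    bbox: (lat_min, lat_max, lon_min, lon_max); start is inclusive, end is exclusive.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    rows = []
    for rec in records:
        if variable not in rec:
            continue
        if not (lat_min <= rec["lat"] <= lat_max and lon_min <= rec["lon"] <= lon_max):
            continue
        if not (start <= rec["time"] < end):
            continue
        rows.append((rec["station"], rec["time"], rec[variable]))
    return rows
```

The returned list of tuples can be written straight to CSV or handed to a dataframe library, which is roughly what "tabular data objects" suggests.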
Integration into the Cal-Adapt Enterprise
[Diagram. Labels: Weather, Historical, Projections; PIR-19-006, PIR-19-007, EPC-20-006; Cal-Adapt database (2.5 PB+); Data catalogue; Jupyter Hub; Visualize & Download; Direct Grab; Heavy user, General user; Your code, Our code; Climate, Historical, Future]
Questions? Feedback?
Contact us: owen@eaglerockanalytics.com, grace.dicecco@eaglerockanalytics.com