Data Pipeline and Workflow for OOI Systems Engineer/Architect
Ed Chapman (OOI Chief Systems Engineer) and Steve Gaul (OOI Systems Engineer/Architect) address the OOI data pipeline and workflow: sensing, ingest, data versioning, data schema and storage, and data product delivery. The slides carry the date 9/17/2024. They summarize the deployed platforms, instruments, algorithms, and data products, then cover the classes of data, the data acquisition paths, the system hardware, and the integration components across cabled and uncabled marine assets, including the control interfaces for engineering and science users.
Presentation Transcript
Data Pipeline & Workflow. Ed Chapman, OOI Chief Systems Engineer; Steve Gaul, OOI Systems Engineer/Architect. 9/17/2024
Goal
Address Areas for Recommendations #1 (Data Pipeline and Workflow) and #5 (Data Products and Data Product Algorithms).
Specific topics: sensing; ingest (all data types: instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.); data versioning; data schema/storage; and data product delivery.
By the Numbers
- Number of Deployed Platforms = 89***
- Number of Deployed Instruments = 814***
- Number of Instrument Types = 47
- Number of Instrument Models = 77
- Number of Uncabled dataset agent drivers = 71
- Number of Cabled instrument agent drivers = 42
- Number of Algorithms = 89 (52 L1 and 37 L2)
- Number of Data Product Types = 203
- Number of Unique products = 3928 (L0, L1, L2)
- Number of Unique L0 products = 1640
- Number of Unique L1 products = 1533
- Number of Unique L2 products = 755
***As of ECR 1300-00419
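The subtotals on this slide can be cross-checked against the stated totals. The short sketch below only re-adds the numbers quoted above; the variable names are illustrative and not part of any OOI system.

```python
# Counts copied from the "By the Numbers" slide.
algorithms = {"L1": 52, "L2": 37}
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}

total_algorithms = sum(algorithms.values())
total_products = sum(unique_products.values())

print(total_algorithms)  # 89, matches "Number of Algorithms"
print(total_products)    # 3928, matches "Number of Unique products"
```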
Sense and Ingest Data
Several classes of data:
- Instrument samples/profiles
- Platform engineering
- Specialized data streams: video, tier 1
- Physical samples
- Logs, photos
- Metadata: calibration sheets, as-built lists
Several data acquisition paths:
- Live streaming data to shore processing (RSN)
- Remote automated collection; data telemetered to shore (CG/EA)
  - Generally sub-sampled or otherwise simplified
- Post-recovery data collection from recovered platforms
- Physical samples processed post cruise/recovery
- Manual collection of logs, photos, etc.; associated via metadata
Physical Flow Sensing and Ingest
System Hardware Summary
[Block diagram; only the labels are recoverable from the transcript.]
- Regions: Deployed; Ashore; User Defined
- Owners: COTS; RSN/EA; CGSN; CI; MIO
- Deployed items: Instruments; Profilers; CGSN Platforms; COTS Platforms (AUV, Glider, Profiler, etc.); Cabled Infrastructure; Cable
- Recovered items: Recovered Instruments; Recovered Memory Devices
- Shore systems: CG Mooring Platform Shore Servers; COTS Platform Shore Servers (COTS Glider, AUV, CSPP, GSPP, WFP); Shore Station; Network Switch; Network; OMC; Observation Management System (OMS)
- Marine Integration Components: Dataset Agents, Uncabled Assets (CGSN & EA); Instrument Agents, Cabled Assets (RSN & EA); OMS Platform Agent; CI I/F
- CI functions: Data Server; Raw Data (L0/1); Data Process and Q/A; Processed Data (L1/2); Capture/Manage MetaData; MetaData; Data Product Spec; Search; Display/Output to User; Link; Non-OOI System Interfaces; CI / COTS GUIs (status, control, Resource Life Cycle Management, etc.)
- Users: User/Operator; Public, EPE, and Science Users
- Flow labels: CG Platform Data; Instrument Data/Metadata/Command and Control; Engineering Data/System Status; Engineering Data/Infrastructure Command and Control; Data/Metadata; Control/Status/Data/Metadata; Control/Status
Generic Data Flows
[Block diagram; only the labels are recoverable from the transcript.]
- Regions: Deployed; Ashore; User Defined
- Owners: COTS; CG; CI
- Deployed items: COTS Platforms (AUV, Glider, Profiler, etc.); CGSN Platforms (No CI Presence)
- Shore systems: Platform Shore Servers (CGSN Devices: Moorings, etc.); Platform Shore Servers (COTS Devices: Gliders, AUVs, etc.); Recovered Memory Devices; Network; OMC
- CI functions: Dataset Agent; Data Server; Raw Data Store (L0/1?); Data Process and Q/A; Processed Data Store (L1/2); MetaData Store; Create/Manage MetaData; Data Process Spec; Search; Dump/Display; Link; CG CI I/F
- User interfaces: CI GUI (status, control, Resource Life Cycle Management, etc.); COTS Tool GUI (Jira, etc.); Users
- Flow labels: Live Data; Recovered Data; Commands (to platform); Commands (copy to CI); Metadata; Data/Metadata; Control/Metadata; Control/Data; Control; Alert
Functional Flow Sensing and Ingest
[Diagram labels: Instrument Driver and Agent; Permanent storage; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values (POLYVAL Algorithm); Secondary Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables (range, spike, stuck, gradient, trend, combined); Human in the loop; User]
Ingest for Instruments, Including Tier1 and HD video
[Diagram labels: Instrument Driver and Agent; Permanent storage; User]
Ingest for Engineering data
[Diagram labels: Engineering System Driver and Agent; Permanent storage; User]
Ingest for other items
- Cruise documents
- Algorithms
Ingest flow for Calibration
L0
[Diagram labels: Instrument Driver and Agent; Raw; L0; Permanent storage; User]
L1a
[Diagram labels: Instrument Driver and Agent; Calibration Values; Permanent storage; User]
[Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; User]
[Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b (Post Deployment); User]
[Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b(PD); Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b(PR); Interpolation; L1b(Intrp); User]
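The chain on this slide (L1a, then L1b(PD) and L1b(PR) via POLYVAL, then L1b(Intrp) via interpolation) can be illustrated with a minimal sketch. The slides do not define the coefficient ordering or the interpolation scheme, so two assumptions are made here: coefficients are given highest order first, and the interpolated product is a linear blend in time between the post-deployment and post-recovery corrections. The function names are hypothetical.

```python
from typing import Sequence


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial by Horner's method.

    Assumption: coefficients are ordered highest power first.
    """
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def l1b_interpolated(
    l1a: float,
    t: float,
    t_deploy: float,
    t_recover: float,
    pd_coeffs: Sequence[float],
    pr_coeffs: Sequence[float],
) -> float:
    """Blend the two secondary corrections linearly in time (assumed scheme)."""
    if t_recover <= t_deploy:
        raise ValueError("recovery time must be after deployment time")
    # Weight is 0 at deployment, 1 at recovery, clamped outside that window.
    w = min(1.0, max(0.0, (t - t_deploy) / (t_recover - t_deploy)))
    l1b_pd = polyval(pd_coeffs, l1a)  # L1b(PD)
    l1b_pr = polyval(pr_coeffs, l1a)  # L1b(PR)
    return (1.0 - w) * l1b_pd + w * l1b_pr


# Identity correction at deployment, +0.5 offset at recovery, sampled halfway.
print(l1b_interpolated(10.0, 5.0, 0.0, 10.0, [1.0, 0.0], [1.0, 0.5]))  # 10.25
```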
When?
- Primary Calibration is applied whenever a user or process requests a product that requires it.
- Secondary Calibration is applied whenever a user or process requests a product that requires it.
- L1 and L2 products are produced on demand.
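On-demand production means that only L0 needs to be stored and the calibration is applied at request time. A minimal sketch, assuming a hypothetical linear primary calibration (gain and offset) and a hypothetical in-memory store:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OnDemandProducts:
    """Holds L0 samples and calibration values; L1a is computed per request."""

    l0: Dict[str, List[float]]                  # instrument id -> raw samples
    primary_cal: Dict[str, Tuple[float, float]]  # instrument id -> (gain, offset), assumed form
    requests_served: int = 0

    def get_l1a(self, instrument: str) -> List[float]:
        # Nothing is cached: the calibration in force now is the one applied.
        gain, offset = self.primary_cal[instrument]
        self.requests_served += 1
        return [gain * raw + offset for raw in self.l0[instrument]]


store = OnDemandProducts(l0={"ctd-1": [1.0, 2.0]}, primary_cal={"ctd-1": (2.0, 0.5)})
print(store.get_l1a("ctd-1"))  # [2.5, 4.5]
```

One consequence of this design is that a corrected calibration takes effect on the next request without reprocessing stored products.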
Calibration actions
- Someone (PS or Marine Operator) creates Primary Calibration Values.
- Someone (PS or Marine Operator) creates Secondary Post-Deployment Calibration Values.
- Someone (PS or Marine Operator) creates Secondary Post-Recovery Calibration Values.
- Values are uploaded through the UI as CSV files (exact format, content, and UI dialog are TBD).
- Calibration Values are associated with a specific instrument for a specific period of time.
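Since the CSV format is still TBD, the following is only a sketch of how uploaded values could be tied to an instrument and a validity period. The column names, the semicolon-separated coefficient field, and the `kind` values are all hypothetical.

```python
import csv
import io
from datetime import datetime
from typing import List, NamedTuple, Optional


class CalRecord(NamedTuple):
    instrument_id: str
    kind: str                  # e.g. "primary", "post_deployment", "post_recovery"
    valid_from: datetime
    valid_to: datetime
    coefficients: List[float]


def parse_cal_csv(text: str) -> List[CalRecord]:
    """Parse a calibration upload in the hypothetical column layout above."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append(CalRecord(
            row["instrument_id"],
            row["kind"],
            datetime.fromisoformat(row["valid_from"]),
            datetime.fromisoformat(row["valid_to"]),
            [float(c) for c in row["coefficients"].split(";")],
        ))
    return records


def find_cal(records: List[CalRecord], instrument_id: str, kind: str,
             when: datetime) -> Optional[CalRecord]:
    """Return the record covering `when` (half-open interval), or None."""
    for rec in records:
        if (rec.instrument_id == instrument_id and rec.kind == kind
                and rec.valid_from <= when < rec.valid_to):
            return rec
    return None


SAMPLE = """instrument_id,kind,valid_from,valid_to,coefficients
ctd-1,primary,2024-01-01,2024-07-01,1.0;0.0
ctd-1,post_deployment,2024-01-01,2024-07-01,1.0;0.02
"""

records = parse_cal_csv(SAMPLE)
print(find_cal(records, "ctd-1", "post_deployment", datetime(2024, 3, 1)).coefficients)
```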
Calibration Updates
- If new values are uploaded for any of the three, the new values overwrite the prior values.
- The assumption is that we will only upload new values if there was a mistake with the old ones.
- We don't want to allow errors to propagate, so we delete the old values.
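The overwrite rule amounts to keeping exactly one set of values per key and no history. A minimal sketch, assuming a hypothetical key of (instrument id, calibration kind, deployment id); the slides do not specify how the validity period is identified.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical key: (instrument id, calibration kind, deployment id).
CalKey = Tuple[str, str, str]


class CalStore:
    """Keeps only the latest uploaded values per key; older values are discarded."""

    def __init__(self) -> None:
        self._values: Dict[CalKey, List[float]] = {}

    def upload(self, key: CalKey, coefficients: Sequence[float]) -> bool:
        """Store the values and report whether prior values were overwritten."""
        replaced = key in self._values
        self._values[key] = list(coefficients)
        return replaced

    def get(self, key: CalKey) -> List[float]:
        return self._values[key]


store = CalStore()
key = ("ctd-1", "primary", "deployment-1")
store.upload(key, [1.0, 0.0])
store.upload(key, [1.1, 0.0])  # corrected values replace the mistaken ones
print(store.get(key))          # [1.1, 0.0]
```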
Ingest for etc. Anything you want to know about?
Versioning
[Diagram repeated from the Functional Flow section: Instrument Driver and Agent; Permanent storage; Data Product Algorithm; Calibration Table; Secondary Post-Deployment and Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables; Human in the loop; User]
Storage
Data Volume Per Year
Data Product Delivery
[Diagram repeated from the Functional Flow section: Instrument Driver and Agent; Permanent storage; Data Product Algorithm; Calibration Table; Secondary Post-Deployment and Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables; Human in the loop; User]
[Diagram labels: Database; L0; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function; Secondary Calibration Functions; QC Algorithms (shown twice); Human In The Loop (shown twice); products L1a, L1b, L1c, L2b, L2c, and QC flags; GUI; User]
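The QC step in this diagram draws on the lookup-table tests named earlier in the deck (range, spike, stuck, gradient, trend, combined). The sketch below shows simplified versions of three of them. The flag convention (1 = pass, 0 = fail), the thresholds, and the function names are assumptions, not the OOI QC algorithm specifications.

```python
from typing import List, Sequence


def range_flags(values: Sequence[float], lo: float, hi: float) -> List[int]:
    """Flag each sample 1 if lo <= value <= hi, else 0."""
    return [1 if lo <= v <= hi else 0 for v in values]


def stuck_flags(values: Sequence[float], resolution: float, run_length: int) -> List[int]:
    """Flag 0 for every sample in a run of >= run_length near-identical values.

    A run continues while a sample stays within `resolution` of the run's first value.
    """
    flags = [1] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        run_ended = i == len(values) or abs(values[i] - values[run_start]) >= resolution
        if run_ended:
            if i - run_start >= run_length:
                for j in range(run_start, i):
                    flags[j] = 0
            run_start = i
    return flags


def combined_flags(*flag_lists: Sequence[int]) -> List[int]:
    """A sample passes the combined test only if it passes every individual test."""
    return [min(per_sample) for per_sample in zip(*flag_lists)]


data = [1.0, 1.0, 1.0, 2.0, 30.0]
r = range_flags(data, 0.0, 10.0)   # [1, 1, 1, 1, 0]
s = stuck_flags(data, 0.01, 3)     # [0, 0, 0, 1, 1]
print(combined_flags(r, s))        # [0, 0, 0, 1, 0]
```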
Output Data Product Variables
Single L1 data product, with the following variables (i.e., columns in the time series):
- <measurement>_L1a (e.g., Conductivity_L1a)
- <measurement>_L1b_Post_Deployment_Cal
- <measurement>_L1b_Post_Recovery_Cal
- <measurement>_L1b_Interpolated
- <measurement>_L1c
- QC_Flag_GlobalRange
- QC_Flag_LocalRange
- <additional QC flags>
Single L2 data product, similar to above.
Single Parsed (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and maybe other stuff.
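The naming pattern for the L1 columns can be generated mechanically from the measurement name. A small sketch; the helper name is hypothetical, and only the suffixes listed on the slide are used.

```python
from typing import List, Sequence


def l1_variable_names(measurement: str, qc_tests: Sequence[str]) -> List[str]:
    """Build the L1 column names for one measurement, following the slide's pattern."""
    names = [
        f"{measurement}_L1a",
        f"{measurement}_L1b_Post_Deployment_Cal",
        f"{measurement}_L1b_Post_Recovery_Cal",
        f"{measurement}_L1b_Interpolated",
        f"{measurement}_L1c",
    ]
    names += [f"QC_Flag_{test}" for test in qc_tests]
    return names


for name in l1_variable_names("Conductivity", ["GlobalRange", "LocalRange"]):
    print(name)
```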
Output Data Product Metadata
In the metadata (i.e., the Metadata link from the ERDDAP page, AND the metadata on the Data Product facepage on the OOINet UI):
- Calibration coefficients (as a comma-separated list)
- QC Look Up Table (as a URL, or possibly as values in a TBD format)
- Data Product Algorithm (as a URL)
- DPS for Data Product Algorithm (as a URL)
- QC Algorithms (as URLs)
- DPSs for QC Algorithms (as URLs)
- POLYVAL Algorithm (as a URL)
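As an illustration of the metadata record described above, the sketch below serializes the coefficients as a comma-separated list and attaches the link fields. The key names and the `example.org` URLs are placeholders, not real OOI endpoints or ERDDAP attribute names.

```python
from typing import Dict, Sequence


def build_metadata(coefficients: Sequence[float], links: Dict[str, str]) -> Dict[str, str]:
    """Combine calibration coefficients and reference URLs into one flat record."""
    metadata = {"calibration_coefficients": ",".join(str(c) for c in coefficients)}
    metadata.update(links)
    return metadata


record = build_metadata(
    [1.0, 0.02],
    {
        # Placeholder URLs for illustration only.
        "data_product_algorithm": "https://example.org/algorithms/dpa",
        "data_product_algorithm_dps": "https://example.org/dps/dpa",
        "polyval_algorithm": "https://example.org/algorithms/polyval",
    },
)
print(record["calibration_coefficients"])  # 1.0,0.02
```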
Questions?
Specific topics: sensing; ingest (all data types: instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.); data versioning; data schema/storage; and data product delivery.