Data Pipeline and Workflow Overview for OOI Systems Engineering
This presentation by Ed Chapman and Steve Gaul covers the data pipeline, workflow, and system architecture for OOI Systems Engineering. Topics include sensing, data ingest, data versioning, data schema/storage, and data product delivery. It summarizes the deployed platforms, instruments, algorithms, and unique data products; describes the data acquisition paths (live streaming, remote automated collection, and post-recovery processing); and gives a hardware summary covering marine integration components, instrument agents, and network assets.
Presentation Transcript
Data Pipeline & Workflow
Ed Chapman, OOI Chief Systems Engineer
Steve Gaul, OOI Systems Engineer/Architect
12/14/2024
Goal
Address Areas for Recommendations #1 (Data Pipeline and Workflow) and #5 (Data Products and Data Product Algorithms).
Specific topics: sensing; ingest (all data types: instruments, engineering, Tier 1, video, cruise, algorithms, calibration tables, etc.); data versioning; data schema/storage; and data product delivery.
By the Numbers
- Deployed platforms: 89***
- Deployed instruments: 814***
- Instrument types: 47
- Instrument models: 77
- Uncabled dataset agent drivers: 71
- Cabled instrument agent drivers: 42
- Algorithms: 89 (52 L1 and 37 L2)
- Data product types: 203
- Unique products: 3928 (L0, L1, L2)
  - Unique L0 products: 1640
  - Unique L1 products: 1533
  - Unique L2 products: 755
***As of ECR 1300-00419
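The totals on this slide can be cross-checked against their stated breakdowns. The short sketch below only re-adds the numbers quoted above; the variable names are illustrative and not part of any OOI software.

```python
# Counts copied from the "By the Numbers" slide.
algorithms = {"L1": 52, "L2": 37}
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}

total_algorithms = sum(algorithms.values())
total_products = sum(unique_products.values())

print(total_algorithms)  # 89
print(total_products)    # 3928

# Share of unique products at each level, as a percentage.
for level, count in unique_products.items():
    print(level, round(100.0 * count / total_products, 1))
```

Both sums agree with the totals given on the slide (89 algorithms, 3928 unique products).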
Sense and Ingest Data
Several classes of data:
- Instrument samples/profiles
- Platform engineering
- Specialized data streams: video, Tier 1
- Physical samples
- Logs, photos
- Metadata: calibration sheets, as-built lists
Several data acquisition paths:
- Live streaming data to shore processing (RSN)
- Remote automated collection; data telemetered to shore (CG/EA)
  - Generally sub-sampled or otherwise simplified
- Post-recovery data collection from recovered platforms
- Physical samples processed post cruise/recovery
- Manual collection of logs, photos, etc.; associated via metadata
Physical Flow: Sensing and Ingest
System Hardware Summary
[Block diagram; only its labels survive in the transcript.]
- Asset classes: Uncabled Assets (CGSN & EA); Cabled Assets (RSN & EA)
- Deployed: CGSN Platforms; COTS Platforms (AUV, Glider, Profiler, etc.); Cabled Infrastructure; Instruments; Profilers; Network Switch
- Ashore: CG Mooring Platform Shore Servers; COTS Platform Shore Servers (COTS Glider, AUV, CSPP, GSPP, WFP); Cable; Shore Station; Observation Management System (OMS); OMS Platform Agent; Data Server
- Agents and integration: CG Platform Data Dataset Agents; Instrument Agents; Marine Integration Components; Network; CI I/F; Non-OOI System Interfaces
- Recovered items: Recovered Instruments; Recovered Memory Devices
- Data functions: Search; Raw Data (L0/1); Data Process and Q/A; Processed Data (L1/2); MetaData; Capture/Manage MetaData; Data Product Spec; Display/Output to User; Link
- Flows: Instrument Data/Metadata/Command and Control; Engineering Data/System Status; Engineering Data/Infrastructure Command and Control; Data/Metadata; Control/Status; Control/Status/Data/Metadata
- User interfaces and users: CI/COTS GUIs (status, control, Resource Life Cycle Management, etc.); Public, EPE, and Science Users; OMC User/Operator; Shore Station User/Operator
- Legend entries: COTS; RSN/EA; CGSN; CI; MIO; User Defined
Generic Data Flows
[Block diagram; only its labels survive in the transcript.]
- Deployed: COTS Platforms (AUV, Glider, Profiler, etc.); CGSN Platforms (No CI Presence)
- Ashore: Platform Shore Servers (CGSN devices: moorings, etc.); Platform Shore Servers (COTS devices: gliders, AUVs, etc.); Dataset Agent; Data Server; Network; CG CI I/F
- Recovered items: Recovered Memory Devices
- Data functions: Search; Dump/Display; Raw Data Store (L0/1?); Data Process and Q/A; Processed Data Store (L1/2); MetaData Store; Create/Manage MetaData; Data Process Spec; Link
- Flows: Live Data; Recovered Data; Commands (to platform); Commands (copy to CI); Metadata; Data/Metadata; Control/Metadata; Control/Data; Control; Alert
- User interfaces and users: CI GUI (status, control, Resource Life Cycle Management, etc.); COTS Tool GUI (Jira, etc.); OMC Users
- Legend entries: COTS; CG; CI; User Defined
Functional Flow: Sensing and Ingest
[Functional flow diagram; labels: Instrument Driver and Agent; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values (POLYVAL Algorithm); Secondary Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables (range, spike, stuck, gradient, trend, combined); Human in the loop; Permanent storage; User]
Ingest for Instruments (including Tier 1 and HD video)
[Diagram labels: Instrument Driver and Agent; Permanent storage; User]
Ingest for Engineering data
[Diagram labels: Engineering System Driver and Agent; Permanent storage; User]
Ingest for other items
- Cruise documents
- Algorithms
Calibration
Uncalibrated Raw Instrument Data
[Diagram labels: Instrument Driver and Agent; L0; Permanent storage; User]
Internally Calibrated Raw Instrument Data
[Diagram labels: Instrument Driver and Agent; Calibration Values; L1a; Permanent storage; User]
Primary Calibration of Uncalibrated data
[Diagram labels: Instrument Driver and Agent; L0; Data Product Algorithm; Calibration Values; L1a; Permanent storage; User]
Secondary calibration (Post Deployment)
[Diagram labels: Instrument Driver and Agent; L0; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b (Post Deployment); Permanent storage; User]
Secondary calibration (Post Deployment, Post Recovery, and Interpolated)
[Diagram labels: Instrument Driver and Agent; L0; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b(PD); Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b(PR); Interpolation; L1b(Intrp); Permanent storage; User]
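The slides name a POLYVAL algorithm and an interpolation step but give no formulas. The sketch below is one plausible reading, under two assumptions that do not come from the slides: POLYVAL evaluates a polynomial with coefficients ordered highest power first, and the interpolated product is a linear blend in time between the post-deployment and post-recovery corrections. The function names, coefficients, and times are all hypothetical.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial at x with Horner's method (highest power first)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def secondary_calibration(l1a, times, pd_coeffs, pr_coeffs, t_deploy, t_recover):
    """Return (L1b_PD, L1b_PR, L1b_interpolated) lists for an L1a series.

    Assumption: the interpolated value moves linearly from the
    post-deployment correction at t_deploy to the post-recovery
    correction at t_recover.
    """
    l1b_pd = [polyval(pd_coeffs, v) for v in l1a]
    l1b_pr = [polyval(pr_coeffs, v) for v in l1a]
    l1b_interp = []
    for t, pd_val, pr_val in zip(times, l1b_pd, l1b_pr):
        w = (t - t_deploy) / (t_recover - t_deploy)
        w = min(max(w, 0.0), 1.0)  # clamp samples outside the deployment
        l1b_interp.append((1.0 - w) * pd_val + w * pr_val)
    return l1b_pd, l1b_pr, l1b_interp


# Hypothetical values: a constant reading and a sensor that drifted by 0.2.
times = [0.0, 50.0, 100.0]
l1a = [10.0, 10.0, 10.0]
pd, pr, interp = secondary_calibration(
    l1a, times, pd_coeffs=[1.0, 0.0], pr_coeffs=[1.0, 0.2],
    t_deploy=0.0, t_recover=100.0,
)
print(interp)
```

In this example the interpolated series starts at the post-deployment value and ends at the post-recovery value, with the midpoint sample halfway between the two.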
Calibration actions
- PS or Marine Operator creates Primary Calibration Values
- PS or Marine Operator creates Secondary Post-Deployment Calibration Values
- PS or Marine Operator creates Secondary Post-Recovery Calibration Values
- Values are uploaded through the UI as CSV files
- Calibration Values are associated with a specific instrument for a specific period of time
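As a rough illustration of associating calibration values with an instrument and a time period, the sketch below parses a CSV and looks up the coefficients valid at a given time. The slides do not specify the CSV layout, so the column names, the semicolon-separated coefficient field, the half-open validity window, and the instrument ID are all assumptions made for this example.

```python
import csv
import io
from datetime import datetime

# Hypothetical file contents; the real upload format is not given in the slides.
CSV_TEXT = """instrument_id,cal_type,start,end,coefficients
CTD-001,primary,2014-01-01,2014-07-01,1.0;0.0
CTD-001,primary,2014-07-01,2015-01-01,1.01;-0.02
"""


def load_calibrations(text):
    """Parse calibration rows into dictionaries with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "instrument_id": row["instrument_id"],
            "cal_type": row["cal_type"],
            "start": datetime.fromisoformat(row["start"]),
            "end": datetime.fromisoformat(row["end"]),
            "coefficients": [float(c) for c in row["coefficients"].split(";")],
        })
    return rows


def find_calibration(rows, instrument_id, cal_type, when):
    """Return the coefficients valid for the instrument at `when`, or None."""
    for r in rows:
        if (r["instrument_id"] == instrument_id
                and r["cal_type"] == cal_type
                and r["start"] <= when < r["end"]):
            return r["coefficients"]
    return None


rows = load_calibrations(CSV_TEXT)
print(find_calibration(rows, "CTD-001", "primary", datetime(2014, 8, 1)))
```

A lookup outside every validity window returns None, which a caller could treat as "no calibration available for this period".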
Calibration Updates
If new values are uploaded for any of the three calibration sets, the new values overwrite the prior values. The assumption is that new values will only be uploaded if there was a mistake in the old ones. We don't want errors to propagate, so the old values are deleted.
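The overwrite behavior described above can be sketched as a store that keeps exactly one set of values per key. The class name, method names, and the choice of key (instrument, calibration type, and validity period) are assumptions for illustration only; the slides do not describe the actual storage design.

```python
class CalibrationStore:
    """Keeps only the latest values per (instrument, cal_type, period)."""

    def __init__(self):
        self._values = {}

    def upload(self, instrument_id, cal_type, period, coefficients):
        # A new upload replaces the prior values; no history is kept.
        self._values[(instrument_id, cal_type, period)] = list(coefficients)

    def get(self, instrument_id, cal_type, period):
        return self._values.get((instrument_id, cal_type, period))


store = CalibrationStore()
period = ("2014-01-01", "2014-07-01")  # hypothetical validity period
store.upload("CTD-001", "post_deployment", period, [1.0, 0.5])
store.upload("CTD-001", "post_deployment", period, [1.0, 0.05])  # correction
print(store.get("CTD-001", "post_deployment", period))  # [1.0, 0.05]
```

Because the earlier values are discarded, anything computed after the second upload uses only the corrected coefficients, which matches the stated intent of not letting errors propagate.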
Ingest for etc.
Is there anything you want to know about?
Versioning
[Functional flow diagram repeated; labels: Instrument Driver and Agent; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values (POLYVAL Algorithm); Secondary Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables (range, spike, stuck, gradient, trend, combined); Human in the loop; Permanent storage; User]
Storage
Storage
Intent: for most instruments, all science and engineering data are retained in OOI storage for the life of the program. (External archiving will be covered in a later presentation.)
Planning assumed an order-of-magnitude difference between three groups of instruments (instrument counts in parentheses):
1. Video camera (1)
2. Hydrophones (11), still cameras (10), seismometers (13)
3. Everything else (779)
Data Volume Per Year
[Chart not recoverable from the transcript; surviving labels: "138", "And Seismometer", "60", "HD Video", "Kept for"]
Balancing intent & cost
HD Video Camera (L2-SR-RQ-3402): Buffering for not less than six months of all video imagery shall be provided.
NSF-approved Data Use Policy (DCN 1102-00010): minimum period of time for OOI storage before data and data products are moved to long-term storage/national archives.

Instrument type and information | Minimum period in OOI storage
HD cameras | 30 days
Broadband hydrophones, still cameras, seismometers, low-frequency hydrophones | 60 days
All other instruments | 90 days
Documentation and algorithms | Life of program
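The minimum storage periods above can be expressed as a simple lookup. This is a sketch only: the instrument-type keys and function names are invented for the example, and the code encodes just the 30/60/90-day minimums from the Data Use Policy table, not the separate six-month video buffering requirement or the life-of-program rule for documentation and algorithms.

```python
from datetime import date, timedelta

# Minimum days in OOI storage before data may move to long-term archives.
MIN_RETENTION_DAYS = {
    "hd_camera": 30,
    "broadband_hydrophone": 60,
    "still_camera": 60,
    "seismometer": 60,
    "low_frequency_hydrophone": 60,
}
DEFAULT_RETENTION_DAYS = 90  # "All other instruments"


def min_retention_days(instrument_type):
    """Return the minimum retention period for an instrument type."""
    return MIN_RETENTION_DAYS.get(instrument_type, DEFAULT_RETENTION_DAYS)


def earliest_archive_date(collected_on, instrument_type):
    """Earliest date the data could leave OOI storage under these minimums."""
    return collected_on + timedelta(days=min_retention_days(instrument_type))


print(min_retention_days("hd_camera"))  # 30
print(min_retention_days("ctd"))        # 90
print(earliest_archive_date(date(2024, 1, 1), "seismometer"))
```

Keeping the default separate from the table mirrors the slide, where only the high-volume instrument types are listed explicitly.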
Data Product Delivery
[Functional flow diagram repeated; labels: Instrument Driver and Agent; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values (POLYVAL Algorithm); Secondary Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables (range, spike, stuck, gradient, trend, combined); Human in the loop; Permanent storage; User]
[Data product delivery diagram; labels: Database; L0; L1 Data Product Algorithm; Primary Calibration Function; L1a; Secondary Calibration Functions; L1b; QC Algorithms; Human In The Loop; L1a, L1b and QC flags; L1c; L2 Data Product Algorithm; L2b; QC Algorithms; Human In The Loop; QC flags; L2c; GUI; User]
Output Data Product Variables
Single L1 data product, with the following variables (i.e., columns in the time series):
- <measurement>_L1a (e.g., Conductivity_L1a)
- <measurement>_L1b_Post_Deployment_Cal
- <measurement>_L1b_Post_Recovery_Cal
- <measurement>_L1b_Interpolated
- <measurement>_L1c
- QC_Flag_GlobalRange
- QC_Flag_LocalRange
- <additional QC flags>
Single L2 data product, similar to above.
Single Parsed (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and other variables.
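The naming pattern above can be generated mechanically for any measurement. The sketch below builds the L1 column names and adds a minimal global range check; the function names, the range limits, and the flag convention (1 for pass, 0 for fail) are assumptions for illustration, since the slides do not define the flag values or the contents of the QC lookup tables.

```python
def l1_variable_names(measurement, qc_tests=("GlobalRange", "LocalRange")):
    """Return the L1 column names for one measurement, per the slide's pattern."""
    names = [
        f"{measurement}_L1a",
        f"{measurement}_L1b_Post_Deployment_Cal",
        f"{measurement}_L1b_Post_Recovery_Cal",
        f"{measurement}_L1b_Interpolated",
        f"{measurement}_L1c",
    ]
    names += [f"QC_Flag_{test}" for test in qc_tests]
    return names


def global_range_flag(value, low, high):
    """Assumed convention: 1 if low <= value <= high, otherwise 0."""
    return 1 if low <= value <= high else 0


columns = l1_variable_names("Conductivity")
print(columns)

# Hypothetical limits standing in for a range lookup-table entry.
print(global_range_flag(3.5, low=0.0, high=9.0))   # 1
print(global_range_flag(12.0, low=0.0, high=9.0))  # 0
```

Additional QC tests would extend the `qc_tests` tuple, producing one flag column per test alongside the calibrated values.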
Output Data Product Metadata
In the metadata (i.e., the Metadata link from the ERDDAP page, and the metadata on the Data Product facepage on the OOINet UI):
- Calibration coefficients (as a comma-separated list)
- QC Look Up Table (as a URL, or possibly as values in a TBD format)
- Data Product Algorithm (as a URL)
- DPS for Data Product Algorithm (as a URL)
- QC Algorithms (as URLs)
- DPSs for QC Algorithms (as URLs)
- POLYVAL Algorithm (as a URL)
OOI is about getting data to the users!
Must maintain a balance between data quality, data quantity, and budget.
Questions?
Specific topics: sensing; ingest (all data types: instruments, engineering, Tier 1, video, cruise, algorithms, calibration tables, etc.); data versioning; data schema/storage; and data product delivery.