Data Pipeline and Workflow Overview for OOI Systems Engineering


This presentation by Ed Chapman and Steve Gaul covers the data pipeline, workflow, and system architecture for OOI Systems Engineering. Topics include sensing, data ingest, data versioning, data schema/storage, and data product delivery. It summarizes the deployed platforms, instruments, algorithms, and unique data products; describes the data acquisition paths (live streaming, remote automated collection, and post-recovery processing); and includes a hardware summary covering marine integration components, instrument agents, and network assets.


Uploaded on Dec 14, 2024



Presentation Transcript


  1. Data Pipeline & Workflow
     Ed Chapman, OOI Chief Systems Engineer
     Steve Gaul, OOI Systems Engineer/Architect
     12/14/2024

  2. Goal
     Address Areas for Recommendations #1 (Data Pipeline and Workflow) and #5 (Data Products and Data Product Algorithms).
     Specific topics: sensing; ingest (all data types: instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.); data versioning; data schema/storage; and data product delivery.

  3. By the Numbers
     - Number of deployed platforms: 89***
     - Number of deployed instruments: 814***
     - Number of instrument types: 47
     - Number of instrument models: 77
     - Number of uncabled dataset agent drivers: 71
     - Number of cabled instrument agent drivers: 42
     - Number of algorithms: 89 (52 L1 and 37 L2)
     - Number of data product types: 203
     - Number of unique products: 3928 (L0, L1, L2)
     - Number of unique L0 products: 1640
     - Number of unique L1 products: 1533
     - Number of unique L2 products: 755
     *** As of ECR 1300-00419
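
The totals on this slide can be cross-checked against their breakdowns. The sketch below only re-adds the figures quoted above; the variable names are illustrative and not part of the presentation.

```python
# Counts copied from the "By the Numbers" slide.
unique_products = {"L0": 1640, "L1": 1533, "L2": 755}
algorithms = {"L1": 52, "L2": 37}

# Sum each breakdown so it can be compared with the stated total.
total_products = sum(unique_products.values())
total_algorithms = sum(algorithms.values())

print(total_products)    # stated total on the slide: 3928
print(total_algorithms)  # stated total on the slide: 89
```

Both sums agree with the stated totals (3928 unique products and 89 algorithms).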

  4. Sense and Ingest Data
     Several classes of data:
     - Instrument samples/profiles
     - Platform engineering
     - Specialized data streams: video, tier 1
     - Physical samples
     - Logs, photos
     - Metadata: calibration sheets, as-built lists
     Several data acquisition paths:
     - Live streaming data to shore processing (RSN)
     - Remote automated collection; data telemetered to shore (CG/EA)
       - Generally sub-sampled or otherwise simplified
     - Post-recovery data collection from recovered platforms
     - Physical samples processed post cruise/recovery
     - Manual collection of logs, photos, etc.; associated via metadata

  5. Physical Flow Sensing and Ingest

  6. [Diagram slide; no text in transcript]

  7. System Hardware Summary
     [Block diagram; labels recovered from the slide]
     Asset groups: Uncabled Assets (CGSN & EA); Cabled Assets (RSN & EA)
     Regions: Deployed; Ashore
     Components: CG Platform Data; Dataset Agents; Marine Integration Components; Instrument Agents; Network; OMS Platform Agent; Observation Management System (OMS); Data Server; Network Switch; COTS Platform Shore Servers (COTS Glider, AUV, CSPP, GSPP, WFP); CG Mooring Platform Shore Servers; Non-OOI System Interfaces; Data Process and Q/A; Recovered Instruments; Recovered Memory Devices; Cable; Cabled Infrastructure; COTS Platforms (AUV, Glider, Profiler, etc.); CGSN Platforms; Instruments; Profilers; CI / COTS GUIs (status, control, Resource Life Cycle Management, etc.)
     Data stores: Raw Data (L0/1); Processed Data (L1/2); MetaData
     Users: Public, EPE, and Science Users; Shore Station User/Operator; OMC User/Operator
     Flow labels: Instrument Data/Metadata/Command and Control; Engineering Data/System Status; Data/Metadata; Control/Status/Data/Metadata; Control/Status; Engineering Data/Infrastructure Command and Control; Display/Output to User; Search; Link; Capture/Manage MetaData; Data Product Spec; User Defined
     Legend: COTS; RSN/EA; CGSN; CI; MIO; CI I/F

  8. Generic Data Flows
     [Block diagram; labels recovered from the slide]
     Regions: Deployed; Ashore
     Components: Network; CI GUI (status, control, Resource Life Cycle Management, etc.); COTS Tool GUI (Jira, etc.); Dataset Agent; Data Server; Platform Shore Servers (CGSN Devices: Moorings, etc.); Platform Shore Servers (COTS Devices: Gliders, AUVs, etc.); Data Process and Q/A; Recovered Memory Devices; COTS Platforms (AUV, Glider, Profiler, etc.); CGSN Platforms (No CI Presence)
     Data stores: Raw Data Store (L0/1?); Processed Data Store (L1/2); MetaData Store
     Users: OMC Users
     Flow labels: Data/Metadata; Control/Metadata; Control/Data; Control; Alert; Dump/Display; Search; Live Data; Recovered Data; Commands (to platform); Commands (copy to CI); Metadata; Link; Create/Manage MetaData; Data Process Spec; User Defined
     Legend: COTS; CG; CI; CG CI I/F

  9. Functional Flow Sensing and Ingest

  10. [Diagram; labels recovered from the slide]
      Instrument Driver and Agent; Permanent storage; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values (POLYVAL Algorithm); Secondary Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables (range, spike, stuck, gradient, trend, combined); Human in the loop; User
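
The slide names the QC lookup tables (range, spike, stuck, gradient, trend, combined) but does not define the tests. As a minimal sketch of what two of them might look like, the code below implements a range test and a stuck-value test. The function names, the parameters, and the flag convention (1 = pass, 0 = fail) are assumptions made for illustration, not the OOI implementation.

```python
from typing import List, Sequence


def range_flags(values: Sequence[float], lo: float, hi: float) -> List[int]:
    """Flag each sample 1 if lo <= value <= hi, else 0 (assumed convention)."""
    return [1 if lo <= v <= hi else 0 for v in values]


def stuck_flags(values: Sequence[float], resolution: float,
                num_repeats: int) -> List[int]:
    """Flag 0 every sample in a run of at least num_repeats consecutive
    values that stay within `resolution` of the first value of the run."""
    flags = [1] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        # A run ends at the end of the data or when the value moves
        # by at least `resolution` away from the run's first value.
        if i == len(values) or abs(values[i] - values[run_start]) >= resolution:
            if i - run_start >= num_repeats:
                for j in range(run_start, i):
                    flags[j] = 0
            run_start = i
    return flags


# Hypothetical limits, standing in for one row of a lookup table.
in_range = range_flags([5.0, 50.0, -1.0], lo=0.0, hi=40.0)
stuck = stuck_flags([1.0, 2.0, 2.0, 2.0, 2.0, 3.0], resolution=0.001, num_repeats=4)
```

In this sketch the limits are passed in directly; in a lookup-table design they would be read per instrument and parameter.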

  11. Ingest for Instruments, Including Tier 1 and HD video
      [Diagram labels: Instrument Driver and Agent; Permanent storage; User]

  12. Ingest for Engineering data
      [Diagram labels: Engineering System Driver and Agent; Permanent storage; User]

  13. Ingest for other items
      - Cruise documents
      - Algorithms

  14. Calibration

  15. Uncalibrated Raw Instrument Data
      [Diagram labels: Instrument Driver and Agent; L0; Permanent storage; User]

  16. Internally Calibrated Raw Instrument Data
      [Diagram labels: Instrument Driver and Agent; Calibration Values; L1a; Permanent storage; User]

  17. Primary Calibration of Uncalibrated data
      [Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; User]

  18. Secondary calibration
      [Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b (Post Deployment); User]

  19. Secondary calibration
      [Diagram labels: Instrument Driver and Agent; L0; Permanent storage; Data Product Algorithm; Calibration Values; L1a; Secondary Post-Deployment calibration values; POLYVAL Algorithm; L1b(PD); Secondary Post-Recovery calibration values; POLYVAL Algorithm; L1b(PR); Interpolation; L1b(Intrp); User]
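
The secondary calibration chain on this slide (L1a, then POLYVAL with post-deployment and post-recovery values, then interpolation) can be sketched as follows. This is an illustration under stated assumptions, not the OOI code: it assumes POLYVAL is ordinary polynomial evaluation with coefficients ordered from the highest power down, and that the interpolated product is a linear-in-time blend between the post-deployment and post-recovery results. The function and parameter names are hypothetical.

```python
from typing import Sequence, Tuple


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial at x using Horner's rule.
    Coefficients run from the highest power to the constant term (assumed)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def secondary_calibration(l1a: float, t: float, t_deploy: float, t_recover: float,
                          pd_coeffs: Sequence[float],
                          pr_coeffs: Sequence[float]) -> Tuple[float, float, float]:
    """Return (L1b post-deployment, L1b post-recovery, L1b interpolated)."""
    l1b_pd = polyval(pd_coeffs, l1a)
    l1b_pr = polyval(pr_coeffs, l1a)
    # Assumed weighting: 0 at deployment, 1 at recovery, clamped in between.
    w = (t - t_deploy) / (t_recover - t_deploy)
    w = min(1.0, max(0.0, w))
    l1b_interp = (1.0 - w) * l1b_pd + w * l1b_pr
    return l1b_pd, l1b_pr, l1b_interp


# Hypothetical values: identity correction at deployment, +0.2 offset at recovery,
# sample taken halfway through the deployment.
pd, pr, interp = secondary_calibration(
    l1a=10.0, t=50.0, t_deploy=0.0, t_recover=100.0,
    pd_coeffs=[1.0, 0.0], pr_coeffs=[1.0, 0.2])
```

With these inputs the interpolated value falls midway between the two secondary results, which is the behavior one would expect if sensor drift is treated as linear over the deployment.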

  20. Calibration actions
      - PS or Marine Operator creates Primary Calibration Values
      - PS or Marine Operator creates Secondary Post-Deployment Calibration Values
      - PS or Marine Operator creates Secondary Post-Recovery Calibration Values
      - Values are uploaded through the UI as CSV files
      - Calibration Values are associated with a specific instrument for a specific period of time

  21. Calibration Updates
      If new values are uploaded for any of the three, the new values overwrite the prior values. The assumption is that we will only upload new values if there was a mistake with the old ones. We don't want to allow errors to propagate, so we delete the old values.
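
The overwrite policy described here can be illustrated with a small in-memory store. This is a sketch only: the class, the key layout (instrument, calibration kind, start and end of the period), and the example instrument ID are assumptions, and the real system presumably persists values rather than holding them in a dictionary.

```python
from typing import Dict, List, Sequence, Tuple

# The three calibration kinds named on the slides (identifiers are illustrative).
CAL_KINDS = ("primary", "post_deployment", "post_recovery")

# Assumed key: (instrument_id, kind, period_start, period_end).
CalKey = Tuple[str, str, str, str]


class CalibrationStore:
    """Keeps exactly one set of values per instrument, kind, and period."""

    def __init__(self) -> None:
        self._values: Dict[CalKey, List[float]] = {}

    def upload(self, instrument_id: str, kind: str, start: str, end: str,
               coefficients: Sequence[float]) -> None:
        if kind not in CAL_KINDS:
            raise ValueError(f"unknown calibration kind: {kind}")
        # A new upload replaces the prior values; nothing is versioned.
        self._values[(instrument_id, kind, start, end)] = list(coefficients)

    def get(self, instrument_id: str, kind: str, start: str, end: str) -> List[float]:
        return self._values[(instrument_id, kind, start, end)]

    def __len__(self) -> int:
        return len(self._values)


store = CalibrationStore()
store.upload("CTD-001", "primary", "2014-01-01", "2014-07-01", [1.0, 0.0])
# Corrected values for the same instrument and period overwrite the old ones.
store.upload("CTD-001", "primary", "2014-01-01", "2014-07-01", [1.01, 0.0])
```

The trade-off of this design is that the earlier values cannot be recovered after a correction, which matches the stated intent of not letting erroneous values propagate.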

  22. Ingest for etc.
      Is there anything you want to know about?

  23. Versioning

  24. [Diagram; labels recovered from the slide]
      Instrument Driver and Agent; Permanent storage; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values (POLYVAL Algorithm); Secondary Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables (range, spike, stuck, gradient, trend, combined); Human in the loop; User

  25. Storage

  26. Storage
      Intent: for most instruments, all science and engineering data is retained in OOI storage for the life of the program. (External archiving will be covered in a later presentation.)
      Planned on an order of magnitude difference between:
      1. Video camera (1)
      2. Hydrophones (11), still cameras (10), seismometers (13)
      3. Everything else (779)
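
The instrument counts in parentheses can be checked against the 814 deployed instruments reported on the "By the Numbers" slide. The sketch below only adds up the figures from the two slides; the dictionary keys are illustrative.

```python
# Instrument counts per storage tier, copied from the slide.
storage_tiers = {
    "video camera": 1,
    "hydrophones": 11,
    "still cameras": 10,
    "seismometers": 13,
    "everything else": 779,
}

# Deployed instrument count from the "By the Numbers" slide.
deployed_instruments = 814

tier_total = sum(storage_tiers.values())
print(tier_total == deployed_instruments)
```

The tiers account for every deployed instrument, so the three storage classes partition the whole inventory.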

  27. Data Volume Per Year
      [Chart; text fragments recovered from the slide: "138", "And Seismometer", "60", "HD Video Kept for"]

  28. Balancing intent & cost
      HD Video Camera, L2-SR-RQ-3402: Buffering for not less than six months of all video imagery shall be provided.
      NSF-approved Data Use Policy (DCN 1102-00010): minimum period of time for OOI storage before data and data products are moved to long-term storage/national archives, by instrument type and information:
      - HD Cameras: 30 days
      - Broadband Hydrophones, Still Cameras, Seismometers, Low Frequency Hydrophones: 60 days
      - All other Instruments: 90 days
      - Documentation and algorithms: Life of program
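
The minimum storage periods from the Data Use Policy lend themselves to a simple lookup. The sketch below encodes only the instrument rows of the table above; the function names and the idea of computing an "earliest move date" are assumptions added for illustration.

```python
from datetime import date, timedelta

# Minimum OOI storage periods in days, from the slide.
MIN_RETENTION_DAYS = {
    "HD Cameras": 30,
    "Broadband Hydrophones": 60,
    "Still Cameras": 60,
    "Seismometers": 60,
    "Low Frequency Hydrophones": 60,
}
# "All other Instruments" on the slide.
DEFAULT_RETENTION_DAYS = 90


def min_retention_days(instrument_type: str) -> int:
    """Return the minimum number of days data stays in OOI storage."""
    return MIN_RETENTION_DAYS.get(instrument_type, DEFAULT_RETENTION_DAYS)


def earliest_move_date(instrument_type: str, collected: date) -> date:
    """Earliest date the data could move to long-term storage (hypothetical helper)."""
    return collected + timedelta(days=min_retention_days(instrument_type))


hd_move = earliest_move_date("HD Cameras", date(2024, 1, 1))
ctd_days = min_retention_days("CTD")  # not listed, so it falls under "all other"
```

Documentation and algorithms are kept for the life of the program, so they are left out of the day-based lookup.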

  29. Data Product Delivery

  30. [Diagram; labels recovered from the slide]
      Instrument Driver and Agent; Permanent storage; Data Product Algorithm; Calibration Table; Secondary Post-Deployment calibration values (POLYVAL Algorithm); Secondary Post-Recovery calibration values (POLYVAL Algorithm); Interpolation; QC algorithms; Lookup Tables (range, spike, stuck, gradient, trend, combined); Human in the loop; User

  31. [Diagram; labels recovered from the slide]
      Components: Database; L1 Data Product Algorithm; L2 Data Product Algorithm; Primary Calibration Function; Secondary Calibration Functions; QC Algorithms; Human In The Loop; GUI; User
      Products and flows: L0; L1a; L1b; L1b and QC flags; L1c; L2b; L2c; QC flags

  32. Output Data Product Variables
      Single L1 data product, with the following variables (i.e., columns in the time series):
      - <measurement>_L1a (e.g., Conductivity_L1a)
      - <measurement>_L1b_Post_Deployment_Cal
      - <measurement>_L1b_Post_Recovery_Cal
      - <measurement>_L1b_Interpolated
      - <measurement>_L1c
      - QC_Flag_GlobalRange
      - QC_Flag_LocalRange
      - <additional QC flags>
      Single L2 data product, similar to above.
      Single Parsed (Combined) product per instrument, with all variables for applicable L1 and L2 products, additional time stamps, and other variables.
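
The naming pattern for the L1 product's columns can be generated mechanically from the measurement name. A minimal sketch follows; the helper function is hypothetical and covers only the variables and the two QC flags named on the slide.

```python
from typing import List, Sequence


def l1_variable_names(measurement: str,
                      qc_flags: Sequence[str] = ("GlobalRange", "LocalRange")) -> List[str]:
    """Build the column names of a single L1 data product for one measurement."""
    names = [
        f"{measurement}_L1a",
        f"{measurement}_L1b_Post_Deployment_Cal",
        f"{measurement}_L1b_Post_Recovery_Cal",
        f"{measurement}_L1b_Interpolated",
        f"{measurement}_L1c",
    ]
    # QC flag columns are not prefixed with the measurement name on the slide.
    names.extend(f"QC_Flag_{flag}" for flag in qc_flags)
    return names


columns = l1_variable_names("Conductivity")
```

Additional QC flags can be passed through the `qc_flags` argument, following the "<additional QC flags>" placeholder on the slide.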

  33. Output Data Product Metadata
      In the metadata (i.e., Metadata link from ERDDAP page, AND metadata on Data Product facepage on OOINet UI):
      - Calibration coefficients (as a comma-separated list)
      - QC Look Up Table (as a URL, or possibly as values in a TBD format)
      - Data Product Algorithm (as a URL)
      - DPS for Data Product Algorithm (as a URL)
      - QC Algorithms (as URLs)
      - DPSs for QC Algorithms (as URLs)
      - POLYVAL Algorithm (as a URL)

  34. OOI is about getting data to the users! Must maintain a balance between data quality, data quantity, and budget.

  35. Questions?
      Specific topics: sensing; ingest (all data types: instruments, engineering, tier 1, video, cruise, algorithms, calibration tables, etc.); data versioning; data schema/storage; and data product delivery.
