Challenges in Accessing IHO DCDB's CSB Data Assessment

Challenges in Accessing IHO DCDB's CSB Data
Initial Assessment on Programmatic Retrieval of CSB Data through IHO DCDB Services
v4
CSBWG14, Stavanger, Norway, 16th – 18th August 2023
Submitted by Denmark on behalf of the HO subWG (Canada, Sweden, UK, USA)
INTRODUCTION
1. Hydrographic Offices (as well as other organizations) may have interest in monitoring the CSB data collected by IHO DCDB to:
   a. Identify potential areas for navigational warnings
   b. Evaluate possible chart discrepancies
   c. Explore an internal QA workflow for CSB data
   d. Eventually update nautical documentation (after the QA workflow).
2. The monitoring of IHO DCDB’s CSB data should be automated to the greatest possible extent to:
   a. Ease the work of HO analysts and reduce the time required for the analysis
   b. Minimize the reaction time to ensure safety of navigation.
HOW TO ACCESS IHO DCDB’S CSB DATA?
3. CSB Data can be programmatically accessed through:
   a. Web Interface + email → Fragile/cumbersome to script (web scraping)
   b. OGC Services → Lack of depth values
   c. ArcGIS REST + S3 Bucket:
      i. Is this combination a temporary solution?
DCDB answer on 31/05/2023:
Using the ArcGIS REST service to discover the file’s uuid and then constructing the S3 object key is still the best suggestion we have to offer for file-based access. We recognize this approach is less than ideal and see it as a temporary solution while we are investigating better alternatives.
Next steps:
Investigate alternatives to make it easier to identify S3 objects of interest based on platform and provider names. A second alternative is to use the pre-release version of the CSB pointstore API.
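The two-step approach in the DCDB answer (discover a file's uuid through ArcGIS REST, then construct the S3 object key) might be sketched as follows. This is only an illustration: the service URL is a placeholder, the `PLATFORM_NAME` attribute name on the layer is assumed, and the key template is an assumption that should be checked against the bucket's readme. Only the generic ArcGIS REST `query` parameters (`where`, `outFields`, `returnGeometry`, `f`) and the bucket host name are taken as given.

```python
from datetime import date
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the real feature-layer URL.
SERVICE_URL = "https://example.invalid/arcgis/rest/services/csb/MapServer/0"
BUCKET_URL = "https://noaa-dcdb-bathymetry-pds.s3.amazonaws.com"
# Assumed key layout; verify against the bucket documentation before use.
KEY_TEMPLATE = "csb/csv/{d:%Y/%m/%d}/{d:%Y%m%d}_{uuid}_pointData.csv"


def build_query_url(platform_name, service_url=SERVICE_URL):
    """Build an ArcGIS REST query URL asking for the records of one platform."""
    escaped = platform_name.replace("'", "''")  # escape quotes for the SQL-like filter
    params = {
        "where": f"PLATFORM_NAME = '{escaped}'",  # assumed attribute name
        "outFields": "*",
        "returnGeometry": "false",
        "f": "json",
    }
    return f"{service_url}/query?{urlencode(params)}"


def build_object_url(file_uuid, day, template=KEY_TEMPLATE):
    """Combine a file uuid and its date into a candidate S3 object URL."""
    key = template.format(d=day, uuid=file_uuid)
    return f"{BUCKET_URL}/{key}"


query_url = build_query_url("O'Neil")
object_url = build_object_url("abc", date(2023, 5, 18))
```

Keeping the template as a parameter means a change in the bucket layout only requires a new string, not new code.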
 
HOW TO ORGANIZE THE RETRIEVED CSB DATA?
Data retrieved on May 18, 2023
4. CSB Data from S3 Bucket are in CSV format with 8 (undocumented?) fields:
   a. UNIQUE_ID
   b. FILE_UUID
   c. LON
   d. LAT
   e. DEPTH
   f. TIME
   g. PLATFORM_NAME
   h. PROVIDER
 
Timestamped x, y, z (fields c to f):
   c. LON → always DD (decimal degrees)? which accuracy?
   d. LAT → always DD (decimal degrees)? which accuracy?
   e. DEPTH → in meters? which accuracy?
   f. TIME → always ISO 8601? which accuracy?
DCDB answer on 31/05/2023:
Language will be updated to reflect that all fields are from the original, as-provided files, which are consistent with B-12 guidance related to units, formats, etc.
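Until the units and formats are documented, a consumer could verify the questions above row by row. The sketch below assumes decimal degrees, numeric depths, and ISO 8601 timestamps. The sample rows are made up for illustration and only reuse the eight field names from the slide.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows; not real DCDB data.
SAMPLE = """UNIQUE_ID,FILE_UUID,LON,LAT,DEPTH,TIME,PLATFORM_NAME,PROVIDER
ID-1,uuid-1,10.123456,57.654321,12.5,2023-05-18T10:00:00Z,Anonymous,NodeA
ID-1,uuid-1,190.0,57.654,12,18/05/2023 10:00,Anonymous,NodeA
"""


def check_row(row):
    """Return a list of problems found in one CSV row (empty if none)."""
    problems = []
    try:
        lon, lat = float(row["LON"]), float(row["LAT"])
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            problems.append("position outside decimal-degree range")
    except ValueError:
        problems.append("LON/LAT not numeric")
    try:
        float(row["DEPTH"])
    except ValueError:
        problems.append("DEPTH not numeric")
    try:
        # fromisoformat() rejects a trailing 'Z' before Python 3.11.
        datetime.fromisoformat(row["TIME"].replace("Z", "+00:00"))
    except ValueError:
        problems.append("TIME not ISO 8601")
    return problems


report = [check_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

A check like this cannot tell meters from feet or fathoms; that still depends on the documentation the DCDB answer promises.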
 
How to use the 4 remaining fields (UNIQUE_ID, FILE_UUID, PLATFORM_NAME, PROVIDER) to:
- Retrieve all the CSB data for a specific vessel? Vessel names are not unique!
- Group CSB data by vessel journey? Journey provides context for data validation!
- Retrieve the corresponding journey metadata?
DCDB answer on 31/05/2023:
The CSB objects (i.e. files) in the S3 bucket are not optimized for access by criteria other than date but we are looking at options to enhance the flexibility (see previous response).
Regarding the issue of unique vessel names, the uniqueID is included in the CSV files available for download via the NODD bucket. Will include this information in a future FAQ document.
Metadata associated with a given file is not currently available via web-based access. Metadata for a “journey” would first require identifying the individual files associated with a “journey”.
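Because the files carry no journey concept, any grouping has to be rebuilt on the consumer side. One possible heuristic, sketched below, groups rows by provider and UNIQUE_ID and starts a new segment whenever consecutive soundings are separated by a long time gap. The 6-hour threshold and the pairing of PROVIDER with UNIQUE_ID are assumptions made for illustration, not DCDB rules, and the rows are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_into_segments(rows, max_gap=timedelta(hours=6)):
    """Group rows by (PROVIDER, UNIQUE_ID), then split on large time gaps.

    Each row is a dict whose TIME value is already a datetime.
    """
    by_vessel = defaultdict(list)
    for row in rows:
        by_vessel[(row["PROVIDER"], row["UNIQUE_ID"])].append(row)

    segments = []
    for key, vessel_rows in by_vessel.items():
        vessel_rows.sort(key=lambda r: r["TIME"])
        current = [vessel_rows[0]]
        for prev, row in zip(vessel_rows, vessel_rows[1:]):
            if row["TIME"] - prev["TIME"] > max_gap:
                # Gap too long: close the current segment and start a new one.
                segments.append((key, current))
                current = []
            current.append(row)
        segments.append((key, current))
    return segments


# Made-up rows: one vessel with a gap of almost a day, and a second provider.
rows = [
    {"PROVIDER": "NodeA", "UNIQUE_ID": "ID-1", "TIME": datetime(2023, 5, 18, 10, 0)},
    {"PROVIDER": "NodeA", "UNIQUE_ID": "ID-1", "TIME": datetime(2023, 5, 18, 10, 5)},
    {"PROVIDER": "NodeA", "UNIQUE_ID": "ID-1", "TIME": datetime(2023, 5, 19, 9, 0)},
    {"PROVIDER": "NodeB", "UNIQUE_ID": "ID-1", "TIME": datetime(2023, 5, 18, 10, 0)},
]
segments = split_into_segments(rows)
sizes = [len(segment) for _, segment in segments]
```

A time-gap split is only a proxy for a journey: a vessel that stops logging at sea would be split, and a quick port turnaround would not.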
 
   h. PROVIDER → Is this field actually required?
 
WHAT DOES ‘ANONYMOUS PLATFORM’ MEAN?
Data retrieved on May 18, 2023
   g. PLATFORM_NAME → Anonymous
A NEED FOR ANONYMITY CHECKS?
Data retrieved on May 18, 2023
DCDB answer on 31/05/2023:
In the example shown, “AIDACARA” is included in the unique_ID, which our system then includes in the FILE_UUID. This unique_ID is set by trusted nodes and is intentionally beyond the control of the DCDB. We leverage whatever is provided to us.
It is important to protect the trust that the DCDB and Trusted Nodes have earned.
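An anonymity check of the kind the slide title asks about could look for known vessel names inside the identifiers of rows marked as anonymous. The sketch below is one simple way to do that, assuming a list of known names is available (for instance, from non-anonymous rows). The identifier strings are invented for illustration; only the name "AIDACARA" comes from the example discussed above.

```python
def find_name_leaks(rows, known_names):
    """Return (FILE_UUID, names) for anonymous rows whose IDs contain a known name."""
    names = [n.upper() for n in known_names if n]
    leaks = []
    for row in rows:
        if row["PLATFORM_NAME"].strip().lower() != "anonymous":
            continue  # only rows that claim anonymity are of interest
        identifiers = (row["UNIQUE_ID"] + " " + row["FILE_UUID"]).upper()
        hits = [n for n in names if n in identifiers]
        if hits:
            leaks.append((row["FILE_UUID"], hits))
    return leaks


# Invented identifier formats, for illustration only.
rows = [
    {"UNIQUE_ID": "XX-AIDACARA", "FILE_UUID": "f1", "PLATFORM_NAME": "Anonymous"},
    {"UNIQUE_ID": "XX-0042", "FILE_UUID": "f2", "PLATFORM_NAME": "Anonymous"},
    {"UNIQUE_ID": "XX-AIDACARA", "FILE_UUID": "f3", "PLATFORM_NAME": "AIDACARA"},
]
leaks = find_name_leaks(rows, ["aidacara"])
```

Substring matching will miss abbreviated or misspelled names and may flag short names by coincidence, so the result is a list of candidates for review rather than a verdict.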
 
HOW DO WE INTERPRET ZERO DEPTH?
Data retrieved on May 23, 2023
There may be different reasons for zero depth: e.g., the sonar has lost the bottom in deep waters.
DCDB answer on 31/05/2023:
B-12 defines depth as 'The distance from the vertical reference point to the seafloor.' With this, zero depth would mean the vertical reference point is at the seafloor.
What is the percentage of zero-depth entries in DCDB's CSB database?
Should entire tracklines with only zero-depth values be removed?
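Both questions can be explored on any downloaded subset. The sketch below computes the share of zero-depth rows and lists the files that contain nothing but zero depths. It uses FILE_UUID as a stand-in for a trackline, which is an assumption, and the rows are made up.

```python
from collections import defaultdict


def zero_depth_summary(rows):
    """Return (percentage of zero-depth rows, FILE_UUIDs with only zero depths)."""
    per_file = defaultdict(lambda: [0, 0])  # FILE_UUID -> [zero count, total count]
    for row in rows:
        counts = per_file[row["FILE_UUID"]]
        if float(row["DEPTH"]) == 0.0:
            counts[0] += 1
        counts[1] += 1
    total = sum(c[1] for c in per_file.values())
    zeros = sum(c[0] for c in per_file.values())
    percentage = 100.0 * zeros / total if total else 0.0
    all_zero_files = sorted(f for f, c in per_file.items() if c[0] == c[1])
    return percentage, all_zero_files


# Made-up rows: file f1 holds only zeros, file f2 holds one zero among three.
rows = [
    {"FILE_UUID": "f1", "DEPTH": "0"},
    {"FILE_UUID": "f1", "DEPTH": "0"},
    {"FILE_UUID": "f2", "DEPTH": "12.5"},
    {"FILE_UUID": "f2", "DEPTH": "0"},
    {"FILE_UUID": "f2", "DEPTH": "8"},
]
percentage, all_zero_files = zero_depth_summary(rows)
```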
 
SHOULD SIGNIFICANT DIGITS BE ENFORCED?
Data retrieved on May 23, 2023
Having 3 decimal digits for latitude and 0 decimal digits for depth is not ideal.
What is the submission policy about significant digits?
Is rounding and/or truncation applied?
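To see why 3 decimal digits of latitude is coarse, note that one degree of latitude spans roughly 111 km, so 0.001 degrees is on the order of 111 m. A minimal sketch for counting the decimal digits actually present in the CSV text, and for converting that count into an approximate ground resolution, is given below. The 111 km figure is a rounded approximation and the function names are illustrative.

```python
from decimal import Decimal

METRES_PER_DEGREE_LAT = 111_000  # rounded approximation


def decimal_digits(text):
    """Count the digits after the decimal point in a finite numeric string."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


def approx_lat_resolution_m(digits):
    """Approximate ground distance of one unit in the last latitude digit."""
    return METRES_PER_DEGREE_LAT / 10 ** digits


lat_digits = decimal_digits("57.123")
depth_digits = decimal_digits("12")
resolution = approx_lat_resolution_m(lat_digits)
```

Reading the value as text matters here: converting to `float` first would lose trailing zeros such as those in `12.50`.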
ADDITIONAL INFO FROM DCDB'S CSB AWS
5. AWS S3 Explorer: https://noaa-dcdb-bathymetry-pds.s3.amazonaws.com/index.html (accessed 25 July 2023)
[Screenshots: Landing Page, Docs Page, Example of daily CSV Page]
DCDB answer on 31/05/2023:
parquet, h3, and mb in the README.md are forward-looking and will be removed from documentation until those additional formats are ready.
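What the S3 Explorer shows in a browser can also be obtained from a script through the standard S3 `ListObjectsV2` REST call, provided the bucket permits anonymous listing. The sketch below builds such a request URL and parses the XML answer. The daily prefix and the sample response are assumptions made for illustration; only the bucket host name comes from the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BUCKET_URL = "https://noaa-dcdb-bathymetry-pds.s3.amazonaws.com"
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def list_url(prefix):
    """Build a ListObjectsV2 URL for all keys under the given prefix."""
    return f"{BUCKET_URL}/?{urlencode({'list-type': '2', 'prefix': prefix})}"


def parse_keys(xml_text):
    """Extract the object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


# Made-up response body with an assumed daily prefix layout.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>noaa-dcdb-bathymetry-pds</Name>
  <Contents><Key>csb/csv/2023/05/18/example_a.csv</Key></Contents>
  <Contents><Key>csb/csv/2023/05/18/example_b.csv</Key></Contents>
</ListBucketResult>"""

url = list_url("csb/csv/2023/05/18/")
keys = parse_keys(SAMPLE_RESPONSE)
```

A real listing may be paginated. In that case the response carries a `NextContinuationToken` that has to be sent back as `continuation-token` to fetch the next page.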
AWS S3 README.HTML
6. https://noaa-dcdb-bathymetry-pds.s3.amazonaws.com/docs/readme.html (accessed 25 July 2023)
   a. S3 CSV format description
      i. Should B-12 be mentioned?
      ii. Is UNIQUE_ID unique for a vessel across trusted nodes?
      iii. Is FILE_UUID unique for a journey? If not, how to retrieve all the files related to the same journey?
      iv. What to use the PROVIDER for?
      v. Are ‘platform’ and ‘ship’ used interchangeably? 'Vessel Name' is recommended metadata in B-12 3.3.3.
DCDB answer on 31/05/2023:
The unique_ID is set by the trusted node and is intentionally outside the scope of the DCDB. If the same vessel were to contribute via multiple trusted nodes, it would likely have a different unique_ID from each.
The concept of a “cruise” or “journey” is not inherent in the data submissions and what appears on the map as a single continuous track may consist of multiple independent files with separate FILE_IDs.
PROVIDER allows filtering to select data from a given trusted node.
ADDITIONAL INFO FROM THE REGISTRY OF OPEN DATA
7. https://registry.opendata.aws/noaa-dcdb-bathymetry-pds/ (accessed 25 July 2023)
   a. Update Frequency
      i. Syncing delay with web interface (?)
   b. License
      i. Does "There are no restrictions on the use of this data" map to a well-known license? Is it the same as CC0?
      ii. What about previous CC-BY data? How to honor the attribution requirement?
      How CSB data are licensed is critical for their use in HO products.
   c. Tutorials
      i. Does the tutorial provide the suggested way to retrieve CSB data?
DCDB answer on 31/05/2023:
DCDB will update noaa-dcdb-bathymetry-pds documentation as able to reference B-12, standardize the terms “ship” and “platform”, and clarify update frequencies.
DCDB answer on 31/05/2023:
Language on the registry relating to licensing will be updated, referencing B-12. Conversation needed to discuss prior CCBY license.
CSB VISUALIZATION NOTEBOOK
8. https://github.com/dneufeldcu/notebooks/blob/main/esipCSBfinal.ipynb (accessed 25 July 2023)
   a. Outdated or future design?
DCDB answer on 31/05/2023:
The referenced example was contributed and has not been updated to reflect the most recent changes in the archive. It still may be of interest to those building their own access tools for the archive.
RECOMMENDATIONS
9. The IHO DCDB services for CSB data may be improved by focusing on:
   a. Easing the retrieval of a complete set of CSB data information:
      i. Position and depth (i.e., x, y, z)
      ii. Timestamp
      iii. Metadata
      iv. Auxiliary measurements (if available)
      v. Data license
   b. Publishing a page collecting official documentation:
      i. ArcGIS REST & S3 .csv format description
      ii. Example scripts (e.g., filters by area, time and platform) in an IHO GitHub repository.
DCDB answer on 31/05/2023:
The files available via S3 access do not contain metadata, auxiliary measurements, or data license.
DCDB answer on 31/05/2023:
https://noaa-dcdb-bathymetry-pds.s3.amazonaws.com/docs/readme.html contains information on, and examples for, programmatic access of CSV files based on date. Contributed tutorials and examples are welcome.
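As an indication of what such example scripts could contain, the sketch below filters already-downloaded CSV rows by area, time and platform. It assumes decimal-degree positions, ISO 8601 timestamps, and a bounding box that does not cross the antimeridian. The rows are made up and the function name is illustrative.

```python
from datetime import datetime, timezone


def filter_rows(rows, bbox=None, start=None, end=None, platform=None):
    """Keep rows inside bbox=(min_lon, min_lat, max_lon, max_lat), within
    [start, end], and matching the platform name (case-insensitive)."""
    kept = []
    for row in rows:
        lon, lat = float(row["LON"]), float(row["LAT"])
        when = datetime.fromisoformat(row["TIME"].replace("Z", "+00:00"))
        if bbox and not (bbox[0] <= lon <= bbox[2] and bbox[1] <= lat <= bbox[3]):
            continue
        if start and when < start:
            continue
        if end and when > end:
            continue
        if platform and row["PLATFORM_NAME"].lower() != platform.lower():
            continue
        kept.append(row)
    return kept


# Made-up rows for illustration.
ROWS = [
    {"LON": "10.5", "LAT": "57.2", "TIME": "2023-05-18T10:00:00Z", "PLATFORM_NAME": "Anonymous"},
    {"LON": "10.6", "LAT": "57.3", "TIME": "2023-05-10T10:00:00Z", "PLATFORM_NAME": "Anonymous"},
    {"LON": "-60.0", "LAT": "45.0", "TIME": "2023-05-18T11:00:00Z", "PLATFORM_NAME": "Example Vessel"},
]
selected = filter_rows(
    ROWS,
    bbox=(9.0, 56.0, 11.0, 58.0),
    start=datetime(2023, 5, 15, tzinfo=timezone.utc),
    platform="anonymous",
)
```

The `start` and `end` values need to be timezone-aware when the timestamps carry a UTC designator, otherwise Python refuses the comparison.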
RECOMMENDATIONS
10. The CSBWG is requested to:
   a. Note the information provided.
11. The IHO DCDB is requested to:
   a. Take actions, as appropriate, to improve accessibility of the CSB data through improved data services and related documentation/example scripts.
   b. Engage HOs in the testing and development of enhancements to the DCDB's CSB data interface.
DCDB answer on 31/05/2023:
Although the pointstore API is in a pre-release state and not intended for any production workflow, we are working on documentation to make it easier to use outside the DCDB Map Viewer. Suggestions for additional filtering options or other enhancements are welcome.