Data Quality Reporting in Distribution Chain
Data quality reporting in the distribution chain is crucial for evaluating and ensuring the quality of exchanged data sets. This paper by Raphael Malyankar, sponsored by NOAA, discusses the importance of reporting data quality results, the stages in the production/distribution chain where reporting is needed, appropriate levels of detail in quality reports, the location and format of quality reports, and the impact of quality reporting on different providers and recipients within the distribution stream. It also proposes conventions for data quality report formats. The paper addresses when and how data quality results should be reported and examines the implications of uncertainties at various stages of data exchange.
Presentation Transcript
Reporting Data Quality
DQWG15-04.4B
Paper by Raphael Malyankar
Work sponsored by NOAA
Data Quality reporting
DQWG15-04.4A discusses how to evaluate S-1xx exchange set data quality. This paper is about reporting the results.
Preliminary question: under what circumstances are the results reported?
- At which stages in the production/distribution chain is data quality reporting needed?
- What is the appropriate level of detail in quality reports at different stages in the chain?
- Where should the data quality report(s) be located?
- What is the format of the quality report?
These questions apply to all data transfer modes: exchange sets as well as online data exchange. This paper addresses the question of format for exchange sets. Reporting for other modes is an open question at this time.
The answers may depend on where in the distribution chain the data exchange occurs.
DQWG15, Monaco, 4-7 February 2020
Quality Reporting and the distribution chain
[Figure: diagram of the distribution chain. Stage labels: Producer, 1st-level Intermediary, 2nd-level Intermediary, Distributor, End user. Other labels in the diagram: Hydrographic Office, Hydrographic office distribution, RENC, Chart agent, Value added reseller, Value added reseller distribution, Equipment manufacturer, Onboard users, Commercial shipping, Pilots, Recreational, Other end users, Governmental, Scientific, VTS.]
How is quality reporting affected by answers to the following questions?
1) Which providers in the distribution stream should publish (or forward) full data quality reports?
2) Which recipients in the distribution stream use complete data and which recipients use only summary reports?
Uncertainties are probably needed at all stages. End users may only need to know that the dataset has satisfactory results for the other measures (including dataset validation checks listed in the product specification).
Are the answers different for online data exchange?
Data Quality report format
[Figure: annotated excerpt of an XML data quality report. Callouts: XML report (ISO format); File reference; Optional standalone report (ISO or other format), not included in this example; Cites S-97 Part C; Code from S-97 C; Codespace as partial MRN; Name from S-97 C; Evaluation method; Value of evaluation result.]
This format conforms to the ISO 19115-3 and 19157 XML schemas. The complete report is in the S-100 4.0.0 samples on the S-100 GitHub site: https://github.com/IHO-S100WG
This paper proposes conventions for:
- Report content: codes, codespace, name, measure reference citation.
- Report file location in exchange sets.
Recommendation 1
Recommendation 1: S-97 should include guidelines for reporting format. Add the following text (elaborated or adapted as necessary) to S-97 Part C, along with Table 1 from the paper.
If the results of testing of quality measures specified in S-97 Part C or a Product Specification are included in (or accompany) an exchange set:
- The report must use the ISO XML format for the data quality report.
- Additional or more detailed information in non-ISO format may be included as a standalone report in a separate file referenced by a DQ_StandaloneQualityReportInformation element.
- There must be a separate report file for each dataset or series whose quality is reported.
- The name of the detailed data quality report must be similar to the name of the ISO 19115 metadata file, except that the MD_ prefix must be replaced by the DQ_ prefix.
- The content of the report must conform to the structure and conventions described in Table 1.
Justification: Conventions are needed to ensure consistent quality reporting across various types of data products.
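The file-naming rule in Recommendation 1 can be illustrated with a minimal Python sketch. The function name dq_report_name and the sample file name are hypothetical. The only behaviour taken from the slide is that the MD_ prefix of the metadata file name is replaced by DQ_. Rejecting names without an MD_ prefix is an assumption of this sketch.

```python
from pathlib import PurePosixPath


def dq_report_name(metadata_file: str) -> str:
    """Derive a data quality report name from an ISO 19115 metadata file name.

    Assumption: the metadata file name starts with the MD_ prefix.
    Any other name is rejected rather than guessed at.
    """
    path = PurePosixPath(metadata_file)
    if not path.name.startswith("MD_"):
        raise ValueError(f"expected an MD_ prefix: {path.name}")
    # Keep the directory part unchanged and swap only the prefix.
    return str(path.with_name("DQ_" + path.name[len("MD_"):]))


# Hypothetical file name, for illustration only.
print(dq_report_name("MD_EXAMPLE0001.XML"))
```

Keeping the rest of the name unchanged means a reader can pair each quality report with its metadata file by name alone.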
Recommendations 2, 3, 4
Recommendation 2: Update S-100 Part 4c to make it consistent with S-97 Part C.
Justification: S-97 Part C is based on ISO 19157, but S-100 Part 4c references older ISO standards which have since been withdrawn.
Recommendation 3: Extend the ISO codelist MD_ScopeCode with codes distinguishing different spatial types (as listed in S100_FC_SpatialPrimitiveType), feature types in general, and information types in general.
Justification: S-97 Part C mentions spatial types as evaluation scopes (S-97 Part C Table 7-1), but the ISO MD_ScopeCode codelist does not mention spatial types as possible scopes. Nor does it allow distinguishing S-100 feature types in general from S-100 information types in general. S-122, S-123, S-127, and other data products make heavy use of information types and will need to make this distinction.
Recommendation 4: Extend the ISO 19157 model of DQ_MeasureReference and/or DQ_Result to indicate the importance of quality results.
Justification: S-100 products classify validation checks as Critical/Error/Warning. Non-ENC data products in particular may still be usable with some issues that raise warnings.
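The motivation for Recommendation 4 can be made concrete with a small Python sketch. The three severity names come from the slide. The acceptance policy (a dataset is treated as usable when every failed check is only a warning) is an assumption for illustration, not a rule from S-97 or S-100, and the function name is hypothetical.

```python
from enum import Enum
from typing import Iterable


class Severity(Enum):
    """Classification of validation checks named on the slide."""
    CRITICAL = "Critical"
    ERROR = "Error"
    WARNING = "Warning"


def usable_with_warnings(failed_checks: Iterable[Severity]) -> bool:
    """Return True when every failed check is only a warning.

    Assumed policy for illustration: Critical and Error failures
    make the dataset unusable, Warning failures do not.
    """
    return all(severity is Severity.WARNING for severity in failed_checks)


print(usable_with_warnings([Severity.WARNING, Severity.WARNING]))
print(usable_with_warnings([Severity.WARNING, Severity.ERROR]))
```

A decision like this is only possible if the quality report carries the importance of each result, which is what the proposed extension would provide.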
Recommendation 5
Recommendation 5: Determine if quality reporting should be included in S-1xx exchange catalogues (CATALOG.XML files).
Exchange catalogue conceptual structure is defined in S-100 Figure 4a-D-4 and its implementation structure is defined by an XSD file in the S-100 4.0.0 schemas on the S-100 GitHub site.
Considerations:
- Reporting summary measures serves as explicit verification of passing quality checks.
- Reporting summary measures gives applications a fast way to verify fitness for use.
- Reporting results down the production/supply chain helps localize points where discrepancies may have been introduced, for both value-added data and data reuse scenarios.
- Volume reduction, especially for restricted bandwidth situations.
- Product specifications may define meta-features with quality information.
- Signing and release of a dataset indicates passage of quality checks (a defined subset of checks?).
If yes, indicate preferred methods (see slide "Revisions to exchange catalogue" for details):
- additional attributes in S100_DatasetDiscoveryMetadata, or
- a new S100_DatasetQualityInformation class in the exchange catalogue model.
Online data exchange
Factors and strategies
Info. exchange pattern: One-way, Request-response, Request-callback, Publish-subscribe, Broadcast
Communication management: Session-oriented, Session-less, Streaming
Differences vs. exchange sets: Variations in units of data, Packaging of report, Transfer volume
Potential data quality communication strategies:
- As metadata with each feature
- Providers publish quality information for data service(s) via a metadata service
- Publish quality information for service levels, and tag individual features with level ID metadata tags
- Specialized strategies?
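The strategy of publishing quality information per service level and tagging features with a level ID could be sketched as follows. All class, field, and function names are hypothetical, and the slide does not prescribe any data model. The sketch only shows the lookup that such tagging would make possible.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServiceLevelQuality:
    """Quality information published once per service level (hypothetical shape)."""
    level_id: str
    summary: str


@dataclass(frozen=True)
class Feature:
    """A feature delivered online, tagged with a service level ID (hypothetical shape)."""
    feature_id: str
    quality_level_id: str


def quality_for(feature: Feature,
                published: Dict[str, ServiceLevelQuality]) -> ServiceLevelQuality:
    """Resolve a feature's level tag against the published service levels."""
    return published[feature.quality_level_id]


# Illustrative values only.
published = {"L1": ServiceLevelQuality("L1", "all validation checks passed")}
print(quality_for(Feature("f-001", "L1"), published).summary)
```

Compared with attaching full quality metadata to every feature, the tag keeps per-feature transfer volume small, at the cost of a separate lookup against the published levels.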
Actions Requested
1. Endorse the request to revise S-100 Part 4c to ensure its conformance with updates to the relevant ISO standards.
2. Draft guidelines addressing the circumstances and scope of data quality reporting in the context of the data production/supply chain from original producer to end user.
3. Discuss what quality information needs to be reported with exchange sets and under what circumstances. Suggest appropriate revisions to the models of S-100 exchange sets and exchange catalogues.
4. Endorse the proposed extensions to S-97 Part C to add conventions for quality reporting, and to the ISO model of data quality (in S-100 Part C), subject to decisions on Actions 2 and 3 in this list.
5. Discuss the question of quality information in connection with online data exchange, and strategies for communicating quality information in online data exchange.
Supplemental slides
Recommendation 5 revisions to exchange catalogue
[Figure: proposed changes to S-100 4.0.0 Figure 4a-D-4.]
New attributes:
+dataProductSpecificationPassed : Boolean
+dataProductSpecificationFailRate : Real
OR
New class (multiplicity 0..1): S100_DatasetQualityInformation
+dataProductSpecificationPassed : Boolean
+dataProductSpecificationFailRate : Real
... add other measures as needed ...
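The "new class" option can be sketched as a small Python data structure. The two attribute names mirror those on the slide. The CatalogueDatasetEntry class, its file_name field, and the summary_check function are hypothetical, and attaching the quality information to a catalogue dataset entry is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetQualityInformation:
    """Sketch of the proposed S100_DatasetQualityInformation class."""
    data_product_specification_passed: bool       # dataProductSpecificationPassed
    data_product_specification_fail_rate: float   # dataProductSpecificationFailRate


@dataclass
class CatalogueDatasetEntry:
    """Hypothetical stand-in for a dataset entry in the exchange catalogue."""
    file_name: str
    quality: Optional[DatasetQualityInformation] = None  # multiplicity 0..1


def summary_check(entry: CatalogueDatasetEntry) -> Optional[bool]:
    """Return the reported pass/fail flag, or None when no summary is present."""
    if entry.quality is None:
        return None
    return entry.quality.data_product_specification_passed


# Illustrative values only.
entry = CatalogueDatasetEntry("EXAMPLE0001.000", DatasetQualityInformation(True, 0.0))
print(summary_check(entry))
```

Because the multiplicity is 0..1, a consumer has to treat a missing summary as "not reported" rather than as a failure.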
Recommendation 1 revisions to exchange set structure
[Figure: proposed changes to S-100 4.0.0 Figure 4a-D-2. New elements: S100_19115QualityInformation (multiplicity 0..1) and S100_StandaloneQualityInformation (multiplicity 0..1).]
Table 1
XML element/attribute in quality report | Value | Remarks (including both ISO and S-97 constraints)
DQ_DataQuality | -- | One DQ_DataQuality element is required for each quality result reported. (ISO format rule.)
>scope | -- | Mandatory. The ISO attributes extent and levelDescription should not be used. Data quality reports in S-100 should apply to the dataset as a whole over its entire coverage.
>>MD_Scope | -- | --
>>>level | Codelist MD_ScopeCode (ISO 19115). | Mandatory. (Recommend extending the codelist to distinguish spatial types and feature/information types.)
>report | -- | Mandatory. One report element for each quality result reported in this file.
>>(XML content), e.g., DQ_FormatConsistency, DQ_CompletenessCommission | -- | (XML content) is one of the quality elements defined in ISO 19157 (corresponding to column 1 of S-97 Part C, Table 7-1). Product Specifications may define additional quality elements. The XML tag names are defined in ISO 19157 and S-100 Part 4c (App. 4c-C).
>>>measure | -- | Container element for DQ_MeasureReference.
Table 1 (contd.)
XML element/attribute in quality report | Value | Remarks (including both ISO and S-97 constraints)
>>>>DQ_MeasureReference | -- | If measureIdentification is not provided, then nameOfMeasure shall be provided. (ISO rule) measureIdentification must be used if the measure is listed in S-97 Part C or the product specification.
>>>>>measureIdentification | -- | Container element for MD_Identifier.
>>>>>>MD_Identifier | -- | Container element for identification information.
>>>>>>>authority | Citation of either S-97 Part C (for common measures) or Product Specification (product-specific measures), in CI_Citation format. | Required if codeSpace is not populated. Container element for CI_Citation (ISO 19115-1/3).
>>>>>>code | Camel-case code of measure, from S-97 Part C (for common measures) or the Product Specification (product-specific measures or validation checks). | Mandatory. For S-97 Part C, this code is given in column 3 of Table 7-1 (Recommended data quality measures). Conventions for identifying measures or validation checks defined in Product Specifications are TBD (perhaps the check number).
>>>>>>codeSpace | Partial or complete MRN that identifies the source of the measure: IHO quality measure register, S-97 Part C, or a Product Specification. | Recommended. Required if authority is not populated. Format: partial URN, structured as described in Note 1.
Table 1 (contd., 2)
XML element/attribute in quality report | Value | Remarks (including both ISO and S-97 constraints)
>>>evaluationMethod | | Optional container element; the ISO schemas require that the content must be one of DQ_AggregationDerivation, DQ_DataInspection, DQ_EvaluationMethod, DQ_FullInspection, DQ_SampleBasedInspection, DQ_IndirectEvaluation.
>>>>DQ_AggregationDerivation, >>>>DQ_DataInspection, >>>>DQ_EvaluationMethod, >>>>DQ_FullInspection, >>>>DQ_SampleBasedInspection, >>>>DQ_IndirectEvaluation | | Container elements for XML content describing evaluation method. (The use of DQ_CoverageResult, introduced by ISO 19157 Amdt. 1 (2018), should also be investigated when this table is finalized for S-97 Part C.)
>>>>>(XML content) | (As specified in ISO 19157 and S-100 4.x Part 4c Appendix B.) | (XML content) is one of the evaluation method descriptions specified in ISO 19157 and S-100 4.x Part 4c Appendix B. At least one element other than dateTime must be populated.
>>>result | | Container element. ISO schemas require that the content must be one of DQ_QuantitativeResult, DQ_ConformanceResult, DQ_DescriptiveResult, (DQ_CoverageResult(?))
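The conditional rules that Table 1 states for DQ_MeasureReference and its identifier can be expressed as a small checker. This is a minimal sketch under stated assumptions: the classes are simplified stand-ins for the ISO elements (plain strings replace CI_Citation and the XML wrappers), and the class names, field names, and message texts are hypothetical. Only the three rules checked come from the table: nameOfMeasure is needed when measureIdentification is absent, code is mandatory, and authority is required when codeSpace is not populated.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MeasureIdentifier:
    """Simplified stand-in for MD_Identifier inside measureIdentification."""
    code: Optional[str] = None        # camel-case measure code
    code_space: Optional[str] = None  # partial or complete MRN
    authority: Optional[str] = None   # plain string standing in for CI_Citation


@dataclass
class MeasureReference:
    """Simplified stand-in for DQ_MeasureReference."""
    name_of_measure: Optional[str] = None
    identifier: Optional[MeasureIdentifier] = None


def check_measure_reference(ref: MeasureReference) -> List[str]:
    """Return a list of rule violations (empty when the reference is acceptable)."""
    if ref.identifier is None:
        # ISO rule: without measureIdentification, nameOfMeasure is needed.
        if not ref.name_of_measure:
            return ["nameOfMeasure is required when measureIdentification is absent"]
        return []
    problems = []
    if not ref.identifier.code:
        problems.append("code is mandatory")
    if not ref.identifier.code_space and not ref.identifier.authority:
        problems.append("authority is required when codeSpace is not populated")
    return problems


# Placeholder values, not real S-97 codes or MRNs.
print(check_measure_reference(
    MeasureReference(identifier=MeasureIdentifier(code="exampleMeasureCode"))))
```

The checker does not test whether a measure is actually listed in S-97 Part C or a product specification, so the rule that such measures must use measureIdentification is left to the caller.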