Understanding SDMX Dataflows and Content Constraints


Explore the structure and significance of SDMX dataflows and content constraints in managing and reporting data effectively. Learn about global dataflows, content constraints, and types of constraints to ensure accurate and compliant reporting of datasets.


Uploaded on Sep 13, 2024



Presentation Transcript


  1. SDG Dataflows and Content Constraints. 15 Nov 2021. Abdulla Gozalov, United Nations Statistics Division.

  2. SDMX Dataflows. A dataflow is a structure that helps describe, categorize, and constrain datasets. It can be constrained to a subset of codes in any dimension, and it can be categorized, i.e. have categories attached. In its simplest form, a dataflow defines any data valid according to a DSD (Data Structure Definition). Each dataflow is linked to one DSD, while each DSD may have one or more dataflows linked to it. Multiple dataflows over one DSD help compartmentalize the data to simplify data reporting and dissemination. In the case of dissemination, a dataflow can be thought of as a view on the DSD; in the case of reporting, it can be thought of as a data transmission channel.
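The relationship between a DSD and its dataflows can be sketched in a few lines of Python. This is only an illustrative model, not an SDMX library: the class names, the identifiers DSD_DEMO, DF_ALL and DF_REGION, and the toy code lists are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DataStructureDefinition:
    """Hypothetical, simplified DSD: maps each dimension id to its valid codes."""
    dsd_id: str
    dimensions: dict


@dataclass
class Dataflow:
    """Hypothetical dataflow: linked to exactly one DSD, optionally constrained."""
    flow_id: str
    dsd: DataStructureDefinition
    allowed: dict = field(default_factory=dict)     # per-dimension code subsets
    categories: list = field(default_factory=list)  # attached categories

    def accepts(self, key: dict) -> bool:
        """True if the key is valid for the DSD and within this dataflow's subset."""
        for dim, codes in self.dsd.dimensions.items():
            if key.get(dim) not in codes:
                return False  # missing dimension or code not in the DSD code list
            if dim in self.allowed and key[dim] not in self.allowed[dim]:
                return False  # valid for the DSD, but outside this dataflow
        return True


# Toy code lists, invented for illustration only.
dsd = DataStructureDefinition(
    "DSD_DEMO",
    {"REF_AREA": {"LB", "JO", "FR"}, "SEX": {"_T", "F", "M"}},
)

# Two dataflows over the same DSD: one unconstrained, one restricted to two areas.
df_all = Dataflow("DF_ALL", dsd)
df_region = Dataflow("DF_REGION", dsd, allowed={"REF_AREA": {"LB", "JO"}})

key = {"REF_AREA": "FR", "SEX": "_T"}
print(df_all.accepts(key), df_region.accepts(key))
```

The unconstrained dataflow accepts anything the DSD allows, while the constrained one behaves like a narrower view over the same structure.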

  3. SDG Global Dataflows. DF_SDG_GLH: Harmonized Global Dataflow. This dataflow is used by the Custodian Agencies to report SDG indicators that are part of the global dataset, regardless of how the data was obtained. It is also used to disseminate the global dataset through the SDMX API. DF_SDG_GLC: Country Global Dataflow. This dataflow is used by countries to report data to UNSD, as well as to disseminate national data in compliance with the SDG Global DSD.

  4. SDMX Content Constraints. Content constraints define restrictions on code lists or series. They provide more powerful, granular validation than the DSD alone, and are also used to report data availability. They are often attached to the Dataflow but can also be attached to the DSD, a Provision Agreement, or a Data Provider. Wherever content constraints are attached, the code lists that they restrict are defined in the underlying DSD.

  5. Content constraints and related SDMX artefacts. [Diagram showing a Content Constraint together with the DSD, Dataflow, Provision Agreement, Data Provider, and three Codelists.]

  6. Types of Content Constraints. Cube Region: defines allowed (or disallowed) codes from DSD code lists. Series: defines allowed (or disallowed) combinations of codes from DSD code lists. Cube Region constraints define which codes are allowed (or disallowed); they do not define any relationship between the codes. For example, a Cube Region constraint can be used to constrain a dataflow to ESCWA Member States in the Reference Area code list. Series constraints list all combinations of dimension values that are allowed (or disallowed). For example, a Series constraint can be used to state that for series AG_LND_FRST ("Forest area as a proportion of total land area [15.1.1]"), the only valid sex code is _T ("Both sexes or no breakdown by sex").
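The difference between the two constraint types can be illustrated with a minimal Python sketch. The function names and data layout are hypothetical, and the sex codes F and M are assumed for the example; only AG_LND_FRST and _T come from the slide.

```python
def passes_cube_region(key, allowed_codes):
    """Cube Region check: each dimension is tested on its own.

    allowed_codes maps a dimension id to the set of codes allowed for it.
    No relationship between dimensions is considered.
    """
    return all(key[dim] in codes for dim, codes in allowed_codes.items())


def passes_series_constraint(key, dims, allowed_combinations):
    """Series check: the combination of values across dims must be listed."""
    return tuple(key[dim] for dim in dims) in allowed_combinations


# Cube Region: any of these sex codes is acceptable, whatever the series.
cube_region = {"SEX": {"_T", "F", "M"}}

# Series constraint: for AG_LND_FRST only the total (_T) is a valid sex code.
series_dims = ("SERIES", "SEX")
series_allowed = {("AG_LND_FRST", "_T")}

key = {"SERIES": "AG_LND_FRST", "SEX": "F"}
print(passes_cube_region(key, cube_region))                    # code F exists in the list
print(passes_series_constraint(key, series_dims, series_allowed))  # but the pair is not listed
```

A key can therefore satisfy the Cube Region constraint and still be rejected by the Series constraint, because only the latter looks at combinations.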

  7. SDG Cube Region Content Constraints. CN_SDG_GLC, attached to dataflow DF_SDG_GLC, restricts the dimension REPORTING_TYPE to code N ("National"). This ensures that data from countries always have REPORTING_TYPE=N, i.e. the countries always use the correct Reporting Type for the national dataset. CN_SDG_GLH, attached to dataflow DF_SDG_GLH, restricts the dimension REPORTING_TYPE to code G ("Global"). This ensures that data from custodian agencies always have REPORTING_TYPE=G, i.e. the agencies always use the correct Reporting Type for the global dataset.
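As a rough sketch of how these two Cube Region constraints behave, the snippet below encodes them as a plain Python dictionary and flags records whose REPORTING_TYPE does not match the target dataflow. The dictionary layout and the function name are assumptions for illustration; the artefact ids and codes are those named on the slide.

```python
# Hand-written stand-in for the two constraints; not loaded from a registry.
CUBE_REGION_CONSTRAINTS = {
    "DF_SDG_GLC": {"constraint": "CN_SDG_GLC", "REPORTING_TYPE": {"N"}},
    "DF_SDG_GLH": {"constraint": "CN_SDG_GLH", "REPORTING_TYPE": {"G"}},
}


def reporting_type_violations(dataflow_id, records):
    """Return the positions of records whose REPORTING_TYPE is not allowed."""
    allowed = CUBE_REGION_CONSTRAINTS[dataflow_id]["REPORTING_TYPE"]
    return [
        index
        for index, record in enumerate(records)
        if record.get("REPORTING_TYPE") not in allowed
    ]


records = [
    {"SERIES": "AG_LND_FRST", "REPORTING_TYPE": "N"},
    {"SERIES": "AG_LND_FRST", "REPORTING_TYPE": "G"},
]

# The same records are judged differently depending on the dataflow they target.
print(reporting_type_violations("DF_SDG_GLC", records))
print(reporting_type_violations("DF_SDG_GLH", records))
```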

  8. SDG Series Content Constraints. CN_SERIES_SDG_GLC is attached to dataflow DF_SDG_GLC, and CN_SERIES_SDG_GLH is attached to dataflow DF_SDG_GLH. The two constraints are kept separate for practical reasons and to make them future-proof, although they are identical in terms of content. They provide all valid combinations of SDG dimensions. They can be downloaded from the SDMX Global Registry or the SDMX-SDG page. An Excel matrix representing the series content constraints can also be downloaded from the SDMX-SDG page.

  9. Diagram of SDG artefacts. [Diagram] Codelists: CL_FREQ, CL_PRODUCT. Concept Scheme: SDG_CONCEPTS. DSD: SDG. Dataflows: DF_SDG_GLC, DF_SDG_GLH. Content Constraints: CN_SDG_GLC, CN_SERIES_SDG_GLC, CN_SDG_GLH, CN_SERIES_SDG_GLH.

  10. Validation of SDG datasets. Validation against the DSD verifies that all dimensions and mandatory attributes are in place. It does not verify relationships between the dimension values: if you have the series "Forest area as a proportion of total land area" with Sex=Female, it will pass the validation. Validation against a dataflow, in addition, verifies relationships among dimension values. It will flag invalid combinations such as the above, gender indicators with Sex=Male, and similar coding errors. This helps validate the dataset before submitting it to the SDG Lab, which will reject invalid datasets. Countries should always validate against dataflow DF_SDG_GLC.
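The two levels of validation can be sketched as follows. This is a simplified stand-in, not the actual validation performed by the SDG Lab or any SDMX tool: the code lists are reduced to a few toy values, the mandatory attribute UNIT_MEASURE and its value PT are assumed for illustration, and the function names are hypothetical.

```python
# Toy stand-ins for the DSD and for the constraints attached to DF_SDG_GLC.
DSD_DIMENSIONS = {
    "SERIES": {"AG_LND_FRST"},
    "SEX": {"_T", "F", "M"},
    "REPORTING_TYPE": {"N", "G"},
}
MANDATORY_ATTRIBUTES = {"UNIT_MEASURE"}          # assumed for the example
ALLOWED_REPORTING_TYPES = {"N"}                  # Cube Region constraint
ALLOWED_SERIES_SEX = {("AG_LND_FRST", "_T")}     # Series constraint


def validate_against_dsd(record):
    """Check that dimensions and mandatory attributes are present and coded."""
    errors = []
    for dim, codes in DSD_DIMENSIONS.items():
        if dim not in record:
            errors.append(f"missing dimension {dim}")
        elif record[dim] not in codes:
            errors.append(f"invalid code {record[dim]!r} for {dim}")
    for attr in sorted(MANDATORY_ATTRIBUTES):
        if attr not in record:
            errors.append(f"missing mandatory attribute {attr}")
    return errors


def validate_against_dataflow(record):
    """DSD checks plus the constraints that relate dimension values."""
    errors = validate_against_dsd(record)
    if errors:
        return errors  # constraint checks need a structurally valid record
    if record["REPORTING_TYPE"] not in ALLOWED_REPORTING_TYPES:
        errors.append("REPORTING_TYPE not allowed in this dataflow")
    if (record["SERIES"], record["SEX"]) not in ALLOWED_SERIES_SEX:
        errors.append("SERIES/SEX combination not allowed in this dataflow")
    return errors


record = {
    "SERIES": "AG_LND_FRST",
    "SEX": "F",
    "REPORTING_TYPE": "N",
    "UNIT_MEASURE": "PT",
}
print(validate_against_dsd(record))       # structurally valid
print(validate_against_dataflow(record))  # the SERIES/SEX pair is flagged
```

The forest-area record with Sex=Female passes the DSD-level check because every code exists, and is caught only at the dataflow level, where the combination is tested.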

  11. Thank you!
