Metadata Cleansing Using SPARQL Update Queries

Learn how to transform and cleanse RDF metadata using SPARQL Update queries to conform to the ADMS-AP for Joinup. This tutorial provides essential knowledge on converting metadata for interoperability solutions and the main queries involved. Discover how to ensure your metadata is compliant and ready for upload on Joinup.


Uploaded on Sep 25, 2024


Presentation Transcript


  1. Introduction to metadata cleansing using SPARQL Update queries. April 2014. PwC EU Services.

  2. Learning objectives
     By the end of this module, you will understand:
     - How to transform your metadata using simple SPARQL Update queries.
     - How to conform to the ADMS-AP so that your interoperability solutions are ready to be shared on Joinup.
     - The main types of errors that you could face when uploading metadata of interoperability solutions to Joinup.

  3. How can this tutorial help you?
     Since its launch in 2011, Joinup has been steadily growing in popularity. It currently receives more than 60,000 visits per month and hosts some 130 online communities.
     Owners of interoperability solutions may be able to generate the descriptive metadata of their solutions automatically in RDF. Sometimes this metadata does not conform to the ADMS Application Profile for Joinup (ADMS-AP), which prevents it from being uploaded to Joinup.
     This tutorial provides basic knowledge on how to transform and cleanse RDF metadata using SPARQL Update queries so that it conforms to the ADMS-AP. SPARQL is the query language for RDF and also allows for creating, updating and deleting RDF triples.
     ADMS-AP: https://joinup.ec.europa.eu/asset/adms/asset_release/adms-application-profile-joinup

  4. Outline
     1. The context: ADMS-AP for describing your interoperability solutions; about SPARQL; about RDF
     2. Construct ADMS-AP compliant RDF: why?; CONSTRUCT queries
     3. Metadata cleansing: why?; the main queries; 3 examples
     4. Metadata upload to Joinup

  5. What is the ADMS Application Profile for Joinup (ADMS-AP)?
     The Asset Description Metadata Schema Application Profile is a common vocabulary used for all types of interoperability solutions.
     - It allows providers of interoperability solutions to describe their solutions and easily upload the descriptions to Joinup.
     - It allows users to easily discover and re-use interoperability solutions coming from Joinup using a common vocabulary.

  6. ADMS-AP for describing your interoperability solutions on Joinup
     [Figure: repositories (public administrations, academic, standardisation bodies, businesses, and your own ADMS-AP repository) described using the ADMS Application Profile; user actions on Joinup: explore, find, select, obtain.]

  7. Automatic or manual path to generate ADMS-AP
     [Figure labels: Interoperability solutions; Transformation with Open Refine; Cleansing with SPARQL.]
     This tutorial focuses on the automatic path to generate ADMS-AP compliant RDF.
     See how to transform with Open Refine: https://joinup.ec.europa.eu/svn/adms/trainings/Introduction_to_Open_Refine_RDF_tool.pptx

  8. SPARQL Protocol and RDF Query Language (SPARQL)
     SPARQL is the standard language to query graph data represented as RDF triples.
     - One of the three core standards of the Semantic Web, along with RDF and OWL.
     - Became a W3C standard in January 2008.
     - SPARQL 1.1 has been the standard since 2013.

  9. The Resource Description Framework (RDF)
     RDF represents data as (subject, predicate, object) triples. A set of triples is an RDF graph.
     [Figure: the node http://myasset.eu/ with two edges: rdf:type to adms:Asset, and dct:title to "My asset name".]
     - Resources (URIs), often abbreviated (e.g. adms:Asset)
     - Plain literals: "Text", "Text"@en
     - Typed literals: "42"^^xsd:integer, "2014-01-01"^^xsd:date
     NB: subjects and objects may also be blank nodes.

  10. A graph can be represented with different syntaxes
     RDF/XML (required by Joinup):
         <rdf:Description rdf:about="http://myasset.eu/">
           <rdf:type rdf:resource="http://www.w3.org/ns/adms#Asset"/>
           <dct:title>My asset name</dct:title>
           <dct:description>Description of the asset</dct:description>
           <dct:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2014-01-01T00:00:00Z</dct:modified>
         </rdf:Description>
     Turtle (used in SPARQL and in this tutorial):
         <http://myasset.eu/> a adms:Asset ;
           dct:title "My asset name" ;
           dct:description "Description of the asset" ;
           dct:modified "2014-01-01T00:00:00Z"^^xsd:dateTime .
     The syntaxes are equivalent, and it is easy to transform one into another.

  11. SPARQL is a query language for RDF data
     Data:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "My asset name" ;
           dct:description "Description of the asset" ;
           dct:modified "2014-01-01T00:00:00Z"^^xsd:dateTime .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your asset name" ;
           dct:description "Another asset" .
     Query:
         SELECT * WHERE {
           ?asset a adms:Asset ;
             dct:title ?title .
         }
     Graph pattern: an RDF graph with placeholder variables (e.g., ?asset).
     Results:
         ?asset                    ?title
         <http://myasset.eu/>      "My asset name"
         <http://yourasset.eu/>    "Your asset name"
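
     The query on slide 11 only runs if the adms: and dct: prefixes are declared. A minimal sketch of a self-contained version follows. The adms: namespace is taken from slide 10; the dct: namespace (Dublin Core Terms) is not spelled out in the slides and is assumed here.

```sparql
# Prefix declarations (dct: is an assumption; the slides do not state it)
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

# List every asset together with its title
SELECT ?asset ?title
WHERE {
  ?asset a adms:Asset ;
         dct:title ?title .
}
ORDER BY ?asset
```

     The same kind of declarations would be needed at the top of the CONSTRUCT and Update queries in the rest of the tutorial, unless the tool in use predefines them.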

  12. SPARQL queries have many forms
     - SPARQL SELECT: to query data from a graph (not used in this tutorial).
     - SPARQL CONSTRUCT: to transform one graph into another (used for creating ADMS-AP from existing RDF).
     - SPARQL Update: to modify a graph in place (used to cleanse ADMS-AP metadata).

  13. A useful tool to transform RDF files: TopBraid Composer
     "TopBraid Composer is the leading industrial-strength RDF editor and OWL ontology editor, as well as the best SPARQL tool on the market." Source: http://semanticweb.org/
     It is used to create and edit RDF files and run SPARQL queries over them. A free version is also available.
     For download: http://www.topquadrant.com/downloads/

  14. Outline
     1. The context: ADMS-AP for describing your interoperability solutions; about SPARQL; about RDF
     2. Construct ADMS-AP compliant RDF: why?; CONSTRUCT queries
     3. Cleanse metadata: why?; the main queries; 3 examples
     4. Metadata upload to Joinup

  15. Construct ADMS-AP from existing RDF: why?
     You may already have the metadata description of your interoperability solutions in an RDF file that is not compliant with ADMS-AP (e.g. it lacks mandatory properties or does not use the recommended controlled vocabularies).
     The following slides help you to create a compliant ADMS-AP RDF graph from your initial RDF.

  16. Construct ADMS-AP from existing RDF using a SPARQL CONSTRUCT query
         # Result graph to construct
         CONSTRUCT {
           ?asset a adms:Asset ;
             dct:title ?title ;
             dct:description ?description ;
             dct:modified ?modified ;
             dct:type <http://purl.org/adms/assettype/Ontology> ;
             dct:relation ?related ;
             dcat:distribution ?d .
           ?d a adms:AssetDistribution ;
             dcat:accessURL ?asset .
         }
         WHERE {
           # Graph pattern to query
           ?asset a voaf:Vocabulary ;
             dct:title ?title ;
             dct:description ?description ;
             dct:modified ?modified .
           # Recommended and optional fields
           OPTIONAL { ?asset voaf:similar ?related }
           # Construct new URIs using expressions
           BIND(IRI(CONCAT(STR(?asset), "?type=distribution")) AS ?d)
         }
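
     Before running the CONSTRUCT query, it can help to preview the distribution URIs that the BIND expression would mint. The following is only a sketch: the voaf: namespace IRI is an assumption (it is not given in the slides), and the SELECT form is used here purely for inspection.

```sparql
# voaf: namespace is assumed; adjust it to match your source data
PREFIX voaf: <http://purl.org/vocommons/voaf#>

# Show each vocabulary next to the distribution URI derived from it
SELECT ?asset ?d
WHERE {
  ?asset a voaf:Vocabulary .
  # Same expression as in the CONSTRUCT query on slide 16
  BIND(IRI(CONCAT(STR(?asset), "?type=distribution")) AS ?d)
}
```

     Each row pairs an asset with the URI built by appending "?type=distribution" to it, which makes it easy to spot source URIs that already contain a query string.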

  17. Construct ADMS-AP from existing RDF: the result is a new RDF graph
     Input graph:
         <http://data.lirmm.fr/ontologies/food> a voaf:Vocabulary ;
           dct:title "Food Ontology"@en ;
           dct:description "This ontology"@en ;
           dct:modified "2013-09-24" ;
           voaf:similar <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> .
         <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> a voaf:Vocabulary ;
           dct:title "Food Ontology in OWL"@en ;
           dct:description "Along with"@en ;
           dct:modified "2003-12-15" .
     Result graph:
         <http://data.lirmm.fr/ontologies/food> a adms:Asset ;
           dct:title "Food Ontology"@en ;
           dct:description "This ontology"@en ;
           dct:modified "2013-09-24" ;
           dct:type <http://purl.org/adms/assettype/Ontology> ;
           dct:relation <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> ;
           dcat:distribution <http://data.lirmm.fr/ontologies/food?type=distribution> .
         <http://data.lirmm.fr/ontologies/food?type=distribution> a adms:AssetDistribution ;
           dcat:accessURL <http://data.lirmm.fr/ontologies/food> .
         <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> a adms:Asset ;
           dct:title "Food Ontology in OWL"@en ;
           dct:description "Along with"@en ;
           dct:modified "2003-12-15" ;
           dct:type <http://purl.org/adms/assettype/Ontology> ;
           dcat:distribution <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food?type=distribution> .
         <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food?type=distribution> a adms:AssetDistribution ;
           dcat:accessURL <http://www.w3.org/TR/2003/PR-owl-guide-20031215/food> .
     Example from the Linked Open Vocabularies (LOV) repository.

  18. Outline
     1. The context: ADMS-AP for describing your interoperability solutions; about SPARQL; about RDF
     2. Construct ADMS-AP compliant RDF: why?; CONSTRUCT queries
     3. Metadata cleansing: why?; the main queries; 3 examples
     4. Metadata upload to Joinup

  19. Metadata cleansing: why?
     - You may need to make some small modifications to your RDF graph to make it fully compliant with ADMS-AP.
     - Only ADMS-AP compliant descriptive metadata can be uploaded to Joinup.
     - Joinup has a built-in ADMS-AP validation feature to help you pinpoint inconsistencies with the standard.
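
     A quick local check can reveal gaps before any cleansing is attempted. The sketch below lists assets that lack a title or a description. These two properties are chosen for illustration only; consult the ADMS-AP specification for the actual list of mandatory properties. The dct: namespace IRI is an assumption not stated in the slides.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

# Assets that lack a title or a description (illustrative checks only)
SELECT DISTINCT ?asset
WHERE {
  ?asset a adms:Asset .
  FILTER (
    NOT EXISTS { ?asset dct:title ?title } ||
    NOT EXISTS { ?asset dct:description ?description }
  )
}
```

     An empty result suggests that every asset carries both properties; it does not replace the validation that Joinup performs on upload.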

  20. Metadata cleansing with SPARQL Update queries
     - Add static triples (INSERT DATA)
     - Remove static triples (DELETE DATA)
     - Modify static triples (combine INSERT DATA and DELETE DATA)
     - Add triples based on query results (INSERT)
     - Remove triples based on query results (DELETE)
     - Modify triples based on query results (DELETE/INSERT)
     For more info:
     http://www.w3.org/TR/sparql11-update/#graphUpdate
     https://joinup.ec.europa.eu/community/ods/document/tm13-introduction-rdf-sparql-en

  21. Metadata cleansing: add static triples
     Example: add the title of a specific interoperability solution (modelled as an adms:Asset).
     Query:
         INSERT DATA {
           <http://myasset.eu/> dct:title "Asset name"@en .
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:description "Description" .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:description "Description" .

  22. Metadata cleansing: remove static triples
     Example: remove an erroneous date of a specific asset.
     Query:
         DELETE DATA {
           <http://myasset.eu/> dct:issued "2242-01-01"^^xsd:date .
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:description "Description" ;
           dct:issued "2242-01-01"^^xsd:date .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:description "Description" .

  23. Metadata cleansing: modify static triples
     Example: modify the title of a specific asset. The two operations are sent as one request, separated by a semicolon.
     Query:
         DELETE DATA {
           <http://myasset.eu/> dct:title "Asset name"@en .
         } ;
         INSERT DATA {
           <http://myasset.eu/> dct:title "My asset name"@en .
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:description "Description" .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "My asset name"@en ;
           dct:description "Description" .
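
     The static variant on slide 23 requires the old title to be known exactly. As an alternative sketch (not taken from the slides), a DELETE/INSERT with a WHERE clause can replace whatever English title the asset currently has. The dct: namespace IRI is an assumption.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>

# Replace whatever English title the asset currently has
DELETE { <http://myasset.eu/> dct:title ?old . }
INSERT { <http://myasset.eu/> dct:title "My asset name"@en . }
WHERE {
  <http://myasset.eu/> dct:title ?old .
  # Only titles tagged exactly "en" are replaced
  FILTER (LANG(?old) = "en")
}
```

     If the asset has no English title, the WHERE clause matches nothing and no title is inserted; in that case the INSERT DATA form from slide 21 is the simpler choice.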

  24. Metadata cleansing: add triples based on query results
     Example: add the asset type for all assets whose title contains "Schema".
     Query:
         INSERT {
           ?asset dct:type <http://purl.org/adms/assettype/Schema> .
         }
         WHERE {
           ?asset a adms:Asset ;
             dct:title ?title .
           FILTER(CONTAINS(?title, "Schema"))
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "My Asset Schema" ;
           dct:description "Description" .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your Asset Vocabulary" .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "My Asset Schema" ;
           dct:description "Description" ;
           dct:type <http://purl.org/adms/assettype/Schema> .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your Asset Vocabulary" .
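
     The match on slide 24 is case-sensitive and also fires for assets that already have a type. A slightly more defensive sketch is shown below; treating an existing dct:type as a reason to skip the asset is an assumption about the desired behaviour, as is the dct: namespace IRI.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

INSERT { ?asset dct:type <http://purl.org/adms/assettype/Schema> . }
WHERE {
  ?asset a adms:Asset ;
         dct:title ?title .
  # Case-insensitive match on the title text
  FILTER (CONTAINS(LCASE(STR(?title)), "schema"))
  # Skip assets that already declare any asset type
  FILTER NOT EXISTS { ?asset dct:type ?existing }
}
```

     STR() drops any language tag before the comparison, so titles such as "XML schema"@en are matched as well.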

  25. Metadata cleansing: remove triples based on query results
     Example: remove all asset modification dates that lie in the future.
     Query:
         DELETE {
           ?asset dct:modified ?date .
         }
         WHERE {
           ?asset a adms:Asset ;
             dct:modified ?date .
           FILTER(?date > NOW())
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:modified "2242-01-01T00:00:00Z"^^xsd:dateTime .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your Asset Vocabulary" ;
           dct:modified "2000-08-12T11:42:22Z"^^xsd:dateTime .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your Asset Vocabulary" ;
           dct:modified "2000-08-12T11:42:22Z"^^xsd:dateTime .
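
     Because a DELETE cannot be undone once applied, it may be worth doing a dry run first. The sketch below reuses the WHERE clause of slide 25 in a SELECT so that the affected triples can be inspected; the dct: namespace IRI is an assumption.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

# Dry run: list the modification dates the DELETE on slide 25 would remove
SELECT ?asset ?date
WHERE {
  ?asset a adms:Asset ;
         dct:modified ?date .
  FILTER (?date > NOW())
}
```

     Values that are not typed as xsd:dateTime, such as plain strings, generally do not satisfy the comparison with NOW() and are therefore neither listed here nor removed by the update.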

  26. Metadata cleansing: modify triples based on query results
     Example: replace a word in all asset titles.
     Query:
         DELETE {
           ?asset dct:title ?title .
         }
         INSERT {
           ?asset dct:title ?newtitle .
         }
         WHERE {
           ?asset a adms:Asset ;
             dct:title ?title .
           BIND(REPLACE(?title, "grt", "great") AS ?newtitle)
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "My grt asset" .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your asset" .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "My great asset" .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your asset" .

  27. Metadata cleansing: proposed fixes for 3 common issues
     - Ensure all text fields have a language tag.
     - Transform date strings into xsd:dateTime values.
     - Add missing asset modification dates.

  28. Metadata cleansing: ensure all text fields have a language tag
     Query:
         DELETE {
           ?s ?p ?o .
         }
         INSERT {
           ?s ?p ?olang .
         }
         WHERE {
           ?s ?p ?o .
           FILTER(?p IN (foaf:name, dct:title, dct:description))
           FILTER(LANG(?o) = "")
           BIND(STRLANG(?o, "en") AS ?olang)
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name" ;
           dct:description "Description"@en .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:description "Description"@en .
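
     To see how many text values still lack a language tag, before or after the update on slide 28, a count per property and tag can be useful. This is a sketch only; the dct: and foaf: namespace IRIs are assumptions not stated in the slides.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Count literals per property and language tag ("" means no tag)
SELECT ?p ?lang (COUNT(*) AS ?n)
WHERE {
  ?s ?p ?o .
  FILTER (?p IN (foaf:name, dct:title, dct:description))
  FILTER (isLiteral(?o))
  BIND (LANG(?o) AS ?lang)
}
GROUP BY ?p ?lang
ORDER BY ?p ?lang
```

     After a successful cleansing run, no row should remain in which ?lang is the empty string.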

  29. Metadata cleansing: transform YYYY-MM-DD strings into xsd:dateTime values
     Query:
         DELETE {
           ?s dct:modified ?str .
         }
         INSERT {
           ?s dct:modified ?date .
         }
         WHERE {
           ?s dct:modified ?str .
           BIND(xsd:dateTime(CONCAT(?str, "T00:00:00Z")) AS ?date)
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:description "Description"@en ;
           dct:modified "2014-02-24" .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:description "Description"@en ;
           dct:modified "2014-02-24T00:00:00Z"^^xsd:dateTime .
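
     The query on slide 29 matches every dct:modified value, including ones that are already typed or that are not in YYYY-MM-DD form. When the cast fails, ?date stays unbound, so depending on the SPARQL engine the old value may be deleted without a replacement being inserted. A more defensive sketch, with assumed dct: and xsd: namespace IRIs, restricts the update to plain strings of the expected shape:

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

DELETE { ?s dct:modified ?str . }
INSERT { ?s dct:modified ?date . }
WHERE {
  ?s dct:modified ?str .
  # Only touch untyped strings of the exact form YYYY-MM-DD
  FILTER (DATATYPE(?str) = xsd:string)
  FILTER (REGEX(?str, "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"))
  BIND (xsd:dateTime(CONCAT(?str, "T00:00:00Z")) AS ?date)
  # Skip values that do not cast to a valid dateTime (e.g. month 13)
  FILTER (BOUND(?date))
}
```

     Values that fail any of the filters are left exactly as they were, so they can be reviewed and fixed by hand.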

  30. Metadata cleansing: add missing asset modification dates, copying the creation date
     Query:
         INSERT {
           ?asset dct:modified ?date .
         }
         WHERE {
           ?asset a adms:Asset ;
             dct:issued ?date .
           FILTER NOT EXISTS { ?asset dct:modified ?modified }
         }
     Before:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:issued "2014-02-24T00:00:00Z"^^xsd:dateTime .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your asset"@en ;
           dct:issued "2012-01-01T00:00:00Z"^^xsd:dateTime ;
           dct:modified "2014-03-04T00:00:00Z"^^xsd:dateTime .
     After:
         <http://myasset.eu/> a adms:Asset ;
           dct:title "Asset name"@en ;
           dct:issued "2014-02-24T00:00:00Z"^^xsd:dateTime ;
           dct:modified "2014-02-24T00:00:00Z"^^xsd:dateTime .
         <http://yourasset.eu/> a adms:Asset ;
           dct:title "Your asset"@en ;
           dct:issued "2012-01-01T00:00:00Z"^^xsd:dateTime ;
           dct:modified "2014-03-04T00:00:00Z"^^xsd:dateTime .
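
     The query on slide 30 does nothing for assets that have neither a modification date nor a creation date. One possible extension, sketched below, falls back to the time of the update in that case. Whether the current time is an acceptable modification date is an assumption that depends on your data; the dct: namespace IRI is assumed as well.

```sparql
PREFIX adms: <http://www.w3.org/ns/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>

INSERT { ?asset dct:modified ?date . }
WHERE {
  ?asset a adms:Asset .
  OPTIONAL { ?asset dct:issued ?issued }
  FILTER NOT EXISTS { ?asset dct:modified ?modified }
  # Use the creation date when present, otherwise the time of the update
  BIND (COALESCE(?issued, NOW()) AS ?date)
}
```

     Assets that already have a dct:modified value are excluded by the FILTER NOT EXISTS clause and remain unchanged.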

  31. Outline
     1. The context: ADMS-AP for describing your interoperability solutions; about SPARQL; about RDF
     2. Construct ADMS-AP compliant RDF: why?; CONSTRUCT queries
     3. Metadata cleansing: why?; the main queries; 3 examples
     4. Metadata upload to Joinup

  32. Metadata upload to Joinup: upload an RDF/XML file to Joinup
     1. On your repository page, click on "Upload metadata".
     2. Select the RDF/XML file.
     3. Click on "Upload the metadata file".

  33. Metadata upload to Joinup: get the upload status
     1. Log in with your account.
     2. Go to the repository page.
     3. Click on "Report file".

  34. Metadata upload to Joinup: reading the upload log
     Lines have the format "Timestamp Level - Message", for example:
         2013-08-30 17:36:02 INFO - Treatment of the repository
     Levels:
         INFO   Information message
         WARN   Warning (you may ignore it)
         ERROR  Error (you should fix it)

  35. Related learning resources
     - Introduction to ADMS-AP
     - How to import and export ADMS-AP conform metadata of interoperability solutions on Joinup
     - Introduction to the Open Refine RDF tool
     - Using Joinup as catalogue for interoperability solutions
     - Introduction to the advanced search functionality of EFIR

  36. Disclaimers 1. The views expressed in this presentation are purely those of the authors and may not, in any circumstances, be interpreted as stating an official position of the European Commission. The European Commission does not guarantee the accuracy of the information included in this presentation, nor does it accept any responsibility for any use thereof. Reference herein to any specific products, specifications, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favouring by the European Commission. All care has been taken by the author to ensure that s/he has obtained, where necessary, permission to use any parts of manuscripts including illustrations, maps, and graphs, on which intellectual property rights already exist from the titular holder(s) of such rights or from her/his or their legal representative. 2. This presentation has been carefully compiled by PwC, but no representation is made or warranty given (either express or implied) as to the completeness or accuracy of the information it contains. PwC is not liable for the information in this presentation or any decision or consequence based on the use of it. PwC will not be liable for any damages arising from the use of the information contained in this presentation. The information contained in this presentation is of a general nature and is solely for guidance on matters of general interest. This presentation is not a substitute for professional advice on any particular matter. No reader should act on the basis of any matter contained in this publication without considering appropriate professional advice.

  37. Contact
     Project Officer: Szabolcs.SZEKACS@ec.europa.eu
     Contractors: Nikolaos.Loutas@be.pwc.com, Joan.Bremers@be.pwc.com
     Get involved:
     - Visit our initiatives: ADMS.SW
     - Follow @Joinup_EU on Twitter
     - Join the CISR (Community of Interoperability Solution Repositories) community on Joinup
     Joinup and ADMS are funded by the ISA Programme.
