Metadata and RDF in Data Management

 
RDF and triplestores
 
CMSC 461
Michael Wilson
 
Reasoning
 
Relational databases allow us to reason about data that is organized in a specific way
Data that models specific relationships
Data that is very cleanly structured
What other reasoning methods are available to us?
 
Metadata
 
“Data about data”
Data that describes other data
Gives context
Example metadata:
Image EXIF data (geolocation, rotation, etc.)
User statistics
Last saved information in a file
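File metadata is the easiest kind to poke at. Below is a minimal sketch using Python's standard library (`os.stat`). It assumes a throwaway temporary file stands in for a real document, and the dictionary keys are just illustrative names.

```python
import os
import tempfile
import time


def file_metadata(path):
    """Collect a few pieces of metadata the operating system keeps about a file."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        # st_mtime: time of last modification, in seconds since the epoch
        "last_modified": time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(st.st_mtime)),
        "is_directory": os.path.isdir(path),
    }


# A throwaway file keeps the example self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello, metadata")
    path = f.name

meta = file_metadata(path)
print(meta["size_bytes"])  # 15
print(meta["last_modified"])
os.remove(path)
```

None of these values are part of the file's contents. They are data about the file, which is exactly what the slide means by metadata.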
 
What’s so important?
 
The context we gather from metadata often lets us see a much bigger picture
Can correlate and tie metadata together
Calculate statistics on metadata
Understand trends
Infinite possibilities
 
The depth of metadata
 
Many systems have their own way of storing metadata
Database tables may be organized to house specific metadata
This does not lend itself well to adding new types of metadata as we discover them
A person may have an age and DOB
Later we may want to add new types (friends, Facebook ID, Twitter ID, etc.)
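Here is a sketch that makes the problem concrete, using Python's built-in `sqlite3` module. It assumes a hypothetical `person` table designed only for age and DOB. Storing a Twitter ID means changing the schema first. A plain list of (subject, predicate, object) rows absorbs the new attribute with no change at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table designed up front for the metadata we knew about: age and DOB
conn.execute("CREATE TABLE person (name TEXT PRIMARY KEY, age INTEGER, dob TEXT)")
conn.execute("INSERT INTO person VALUES ('Alice', 30, '1994-05-01')")

# A new kind of metadata (a Twitter ID) has nowhere to go yet
schema_error = None
try:
    conn.execute("UPDATE person SET twitter_id = '@alice' WHERE name = 'Alice'")
except sqlite3.OperationalError as err:
    schema_error = str(err)
    print("schema error:", schema_error)

# ...so the table itself has to be reorganized first
conn.execute("ALTER TABLE person ADD COLUMN twitter_id TEXT")
conn.execute("UPDATE person SET twitter_id = '@alice' WHERE name = 'Alice'")

# With triples, a new kind of metadata is just another row of the same shape
triples = [
    ("Alice", "age", 30),
    ("Alice", "dob", "1994-05-01"),
]
triples.append(("Alice", "twitterId", "@alice"))  # no schema change needed
```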
 
Metadata structures
 
RDF
Resource Description Framework
OWL
Web Ontology Language
Ontology – an established vocabulary for describing knowledge within a domain
RDF is more widely used
 
Schemas
 
RDF and other structured metadata formats allow us to establish a common language to describe different sorts of metadata
We can make schemas that describe
Social media
Physical location
Job details
Moreover, we can tie them all to one subject
Doesn’t require database reorganization
 
Why is that cool?
 
What this means is that we can tie arbitrary sets of data together with very little work on our part
We make a schema that describes a new domain and staple that information onto an existing subject
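Below is a sketch of "stapling" new domains onto an existing subject, with triples as plain Python tuples. The `foaf:` (Friend of a Friend) and `geo:` (W3C Basic Geo) prefixes name real vocabularies. The `ex:` and `job:` prefixes, the subject, and all values are hypothetical. Prefixed names are kept as plain strings rather than expanded to full IRIs.

```python
ALICE = "ex:alice"  # hypothetical subject

# Social data, using FOAF terms
graph = {
    (ALICE, "foaf:name", "Alice"),
    (ALICE, "foaf:knows", "ex:bob"),
}


def staple(graph, new_triples):
    """Attach facts from a new domain; existing triples are left untouched."""
    return graph | set(new_triples)


# Later, a physical-location domain (W3C Basic Geo terms, made-up coordinates)...
graph = staple(graph, [(ALICE, "geo:lat", "40.0"), (ALICE, "geo:long", "-75.0")])
# ...and a job-details domain (hypothetical "job:" vocabulary)
graph = staple(graph, [(ALICE, "job:title", "Data Engineer"), (ALICE, "job:employer", "ex:acme")])


def describe(graph, subject):
    """Everything known about a subject, whichever vocabulary it came from."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(graph, ALICE):
    print(predicate, obj)
```

Nothing about the social-media facts had to change when location and job details arrived. Each new domain is just more triples about the same subject.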
 
Triples
 
Within these schemas, data is conceptually organized as
<subject> <predicate> <object>
Subject
The subject of the expression
Predicate
The relationship between the subject and object
Object
The direct object of the expression
These expressions are called “triples”
 
Triple examples
 
Examples?
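Here are a few example triples, written as Python tuples and printed in N-Triples. N-Triples is a line-based RDF syntax where each line is `<subject> <predicate> <object> .`. In the first triple, Alice is the subject, `foaf:name` is the predicate, and the literal "Alice" is the object. The `http://example.org/people/` namespace is hypothetical. Telling IRIs from literals by their `http` prefix is a simplification for this sketch, not how RDF libraries do it.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/people/"  # hypothetical namespace for our own subjects

# (subject, predicate, object): "Alice knows Bob" becomes the second triple
triples = [
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "name", "Bob"),
]


def term(value):
    """Render an IRI as <...> and anything else as a quoted, escaped literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """One N-Triples line per triple, each terminated by ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


print(to_ntriples(triples))
```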
 
Storing triples
 
Since we are often interested in large amounts of data, we need to think about how to store these
Triplestores
The name is pretty self-explanatory: a store for triples
What do these give us over storing the same information in something like a relational database?
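Here is one thing a triplestore buys us. Every fact has the same three-part shape, so the store can index all three positions up front. It can then answer a pattern with any combination of positions fixed, with no table design involved. The toy sketch below shows that idea. The class name is hypothetical. Real triplestores add persistence, proper IRIs, typed literals, and much smarter indexing.

```python
from collections import defaultdict


class TinyTripleStore:
    """A toy in-memory triplestore that indexes every triple by each position."""

    def __init__(self):
        self.triples = set()
        # index[pos][value] -> triples with `value` at position pos (0=s, 1=p, 2=o)
        self.index = [defaultdict(set) for _ in range(3)]

    def add(self, s, p, o):
        triple = (s, p, o)
        if triple in self.triples:
            return
        self.triples.add(triple)
        for pos, value in enumerate(triple):
            self.index[pos][value].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        bound = [(pos, v) for pos, v in enumerate((s, p, o)) if v is not None]
        if not bound:
            return set(self.triples)
        # Start from the smallest index bucket, then check the other bound positions
        buckets = [self.index[pos].get(v, set()) for pos, v in bound]
        smallest = min(buckets, key=len)
        return {t for t in smallest if all(t[pos] == v for pos, v in bound)}


store = TinyTripleStore()
store.add("ex:alice", "foaf:knows", "ex:bob")
store.add("ex:alice", "foaf:name", "Alice")
store.add("ex:bob", "foaf:knows", "ex:carol")

print(sorted(store.match(p="foaf:knows")))  # who knows whom
print(sorted(store.match(s="ex:alice")))    # everything about Alice
```

Many real triplestores keep several permutation indexes (SPO, POS, OSP, and so on) for the same reason. A lookup on any position, or combination of positions, then stays cheap.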
 
Triplestore querying
 
Triplestores can also be queried
SQL is more limited for the kinds of queries we’d like to be able to make
SPARQL
The acronym stands for:
SPARQL Protocol and RDF Query Language
 
SPARQL
 
SPARQL is a SQL-like query language
Allows us to query across the various schemas we have assigned to our subjects
SPARQL queries can look surprisingly readable
 
SPARQL example
 
PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
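To see what the `WHERE` clause asks for, here is a sketch that evaluates its four triple patterns against some hypothetical data. The data uses the example's made-up `abc:` vocabulary, and `;` is shorthand for "same subject as the previous pattern". Each pattern is matched in turn and joined on shared variables. This naive nested-loop join is only for illustration. Real SPARQL engines use indexes and reorder joins.

```python
# Hypothetical data in the example's abc: vocabulary
triples = {
    ("ex:nairobi", "abc:cityname", "Nairobi"),
    ("ex:nairobi", "abc:isCapitalOf", "ex:kenya"),
    ("ex:kenya", "abc:countryname", "Kenya"),
    ("ex:kenya", "abc:isInContinent", "abc:Africa"),
    ("ex:cairo", "abc:cityname", "Cairo"),
    ("ex:cairo", "abc:isCapitalOf", "ex:egypt"),
    ("ex:egypt", "abc:countryname", "Egypt"),
    ("ex:egypt", "abc:isInContinent", "abc:Africa"),
    ("ex:paris", "abc:cityname", "Paris"),
    ("ex:paris", "abc:isCapitalOf", "ex:france"),
    ("ex:france", "abc:countryname", "France"),
    ("ex:france", "abc:isInContinent", "abc:Europe"),
}

# The WHERE clause with ";" expanded; terms starting with "?" are variables
patterns = [
    ("?x", "abc:cityname", "?capital"),
    ("?x", "abc:isCapitalOf", "?y"),
    ("?y", "abc:countryname", "?country"),
    ("?y", "abc:isInContinent", "abc:Africa"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind the variable, or check it agrees with an earlier binding
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, triples):
    """Join the patterns one at a time, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match_pattern(pattern, t, b)) is not None]
    return bindings


results = sorted((b["?capital"], b["?country"]) for b in evaluate(patterns, triples))
print(results)  # [('Cairo', 'Egypt'), ('Nairobi', 'Kenya')]
```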
 
Querying power
 
Using SPARQL, you can write deep queries that follow long chains of relationships and reason intuitively about the data in a triplestore
Organizing data this way also lets computers reason about the data themselves
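Here is a small illustration of machine reasoning. The sketch forward-chains two rules modeled on RDFS entailment. First, `rdfs:subClassOf` is transitive. Second, anything typed with a class is also typed with that class's superclasses. The `ex:` classes and facts are hypothetical, and a real RDFS or OWL reasoner covers many more rules.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical facts
facts = {
    ("ex:Capital", SUBCLASS, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Place"),
    ("ex:nairobi", TYPE, "ex:Capital"),
}


def infer(facts):
    """Apply both rules repeatedly until no new triples appear (forward chaining)."""
    closure = set(facts)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        typed = [(s, o) for s, p, o in closure if p == TYPE]
        new = set()
        # Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # Rule 2: (x type A), (A subClassOf B) => (x type B)
        for x, a in typed:
            for a2, b in subclass:
                if a == a2:
                    new.add((x, TYPE, b))
        if new <= closure:
            return closure
        closure |= new


inferred = infer(facts) - facts
for triple in sorted(inferred):
    print(triple)
```

No one stated that Nairobi is a place, but the computer derives it from the existing triples.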
 
Caveats
 
All this tech is SUPER new
All of it is tied very heavily to the Semantic Web
The idea is to introduce a system like this into the web at large
Metadata is stored about web pages so that computers can reason about them
Much of this is a moving target
Not a whole lot of production applications using this stuff yet
 
Tools
 
There are a few triplestore servers and other tools you can use
Jena
Apache project
Java framework for building Semantic Web applications
Can query using SPARQL
Jena can use Postgres in the background
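One simple way to keep triples in a relational database is a single generic `(s, p, o)` table. The sketch below uses Python's `sqlite3` as a stand-in for Postgres. The table and index names are hypothetical, and this is not necessarily the layout Jena itself uses. The sketch also shows why SQL feels limited here. The earlier capital-city SPARQL query can still be expressed, but each triple pattern becomes another self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table holds every triple
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:nairobi", "abc:cityname", "Nairobi"),
    ("ex:nairobi", "abc:isCapitalOf", "ex:kenya"),
    ("ex:kenya", "abc:countryname", "Kenya"),
    ("ex:kenya", "abc:isInContinent", "abc:Africa"),
    ("ex:paris", "abc:cityname", "Paris"),
    ("ex:paris", "abc:isCapitalOf", "ex:france"),
    ("ex:france", "abc:countryname", "France"),
    ("ex:france", "abc:isInContinent", "abc:Europe"),
])
# Index a couple of common access paths
conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")

# Four triple patterns in SPARQL become four copies of the table in SQL
rows = conn.execute("""
    SELECT t1.o AS capital, t3.o AS country
    FROM triples t1
    JOIN triples t2 ON t2.s = t1.s AND t2.p = 'abc:isCapitalOf'
    JOIN triples t3 ON t3.s = t2.o AND t3.p = 'abc:countryname'
    JOIN triples t4 ON t4.s = t2.o AND t4.p = 'abc:isInContinent'
    WHERE t1.p = 'abc:cityname' AND t4.o = 'abc:Africa'
""").fetchall()
print(rows)  # [('Nairobi', 'Kenya')]
```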
 
More tools
 
RDFLib
https://github.com/RDFLib
Python library for RDF
Can run entirely in memory
Good for experimentation and more

Uploaded on Sep 24, 2024

