Metadata and RDF in Data Management

RDF and triplestores

CMSC 461

Michael Wilson

Reasoning



Relational databases allow us to reason about data that is organized in a specific way



Data that models specific relationships



Data that is very cleanly structured



What other reasoning methods are available to us?

Metadata



“Data about data”



Data that describes other data



Gives context



Example metadata:



Image EXIF data (geolocation, rotation, etc.)



User statistics



Last saved information in a file

What’s so important?



The context that we gather from metadata often allows us to understand a much greater picture



Can correlate and tie metadata together



Calculate statistics on metadata



Understand trends



Infinite possibilities

The depth of metadata



Many systems have their own way of storing metadata

Database tables may be organized to house specific metadata

This does not lend itself well to discovering new types of metadata

A person record may start with age and DOB

Later we may want to add new types (friends, Facebook ID, Twitter ID, etc.)

Metadata structures



RDF



Resource Description Framework



OWL



Web Ontology Language



Ontology – established vocabulary to describe knowledge within a domain



RDF is more widely used

Schemas



RDF and other structured metadata formats allow us to establish a common language to describe different sorts of metadata



We can make schemas that describe



Social media



Physical location



Job details



Moreover, we can tie them all to one subject



Doesn’t require database reorganization

Why is that cool?



What this means is that we can tie any arbitrary sets of data together with very little work on our part

We make a schema that describes a new domain, and staple that information onto an existing subject
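
A minimal sketch of "stapling" a new domain onto an existing subject, using plain Python tuples rather than a real RDF library. The subject identifier, the `person:` and `social:` prefixes, and all values are hypothetical and chosen only for illustration.

```python
# Hypothetical subject identifier (not from the slides).
ALICE = "person:alice"

# Facts we already hold, described with a basic "person" vocabulary.
# The values are toy data.
triples = {
    (ALICE, "person:age", 30),
    (ALICE, "person:dateOfBirth", "1994-05-01"),
}

# Later, a new "social" vocabulary is introduced. Nothing is migrated or
# restructured: the new facts are simply added against the same subject.
triples |= {
    (ALICE, "social:twitterId", "@alice"),
    (ALICE, "social:friendOf", "person:bob"),
}


def describe(subject, store):
    """Return every (predicate, object) pair recorded for a subject."""
    return sorted((p, o) for s, p, o in store if s == subject)


for predicate, obj in describe(ALICE, triples):
    print(predicate, obj)
```

In a relational design, the second step would typically mean adding columns or a new table; here the existing triples are left untouched.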

Triples



Within these schemas, data is conceptually organized as



<subject> <predicate> <object>



Subject



The subject of the expression



Predicate



The relationship between the subject and object



Object



The direct object of the expression



These expressions are called “triples”

Triple examples



Examples?
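
A few illustrative triples, written as Python named tuples. The `ex:` identifiers are made up for this sketch; in real RDF, subjects and predicates would normally be full IRIs.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One <subject> <predicate> <object> statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical example data.
examples = [
    Triple("ex:Cairo", "ex:isCapitalOf", "ex:Egypt"),
    Triple("ex:Egypt", "ex:isInContinent", "ex:Africa"),
    Triple("ex:Cairo", "ex:cityname", "Cairo"),
]

for t in examples:
    print(f"<{t.subject}> <{t.predicate}> <{t.obj}>")
```

Note that the object of one triple (`ex:Egypt`) is the subject of another, which is how triples link up into a graph.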

Storing triples



Since we are often interested in large amounts of data, we need to think about how to store these



Triplestores

As the name suggests, databases built to store triples



What do these give us over storing the information in a relational database?
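
To make the idea concrete, here is a toy in-memory triplestore in plain Python. It is only a sketch: the class name `TinyTripleStore`, its methods, and the `ex:` data are all hypothetical, and real triplestores add persistence, several indexes, and a query language on top.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy in-memory store for (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)  # subject -> its triples

    def add(self, s, p, o):
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_subject[s].add(triple)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        if s is not None:
            candidates = self._by_subject.get(s, set())
        else:
            candidates = self._triples
        for ts, tp, to in candidates:
            if (p is None or tp == p) and (o is None or to == o):
                yield (ts, tp, to)


store = TinyTripleStore()
store.add("ex:Cairo", "ex:isCapitalOf", "ex:Egypt")
store.add("ex:Egypt", "ex:isInContinent", "ex:Africa")
store.add("ex:Paris", "ex:isCapitalOf", "ex:France")

# Every "is capital of" statement, whatever the subject or object.
capitals = sorted(store.match(p="ex:isCapitalOf"))
print(capitals)
```

One point this illustrates: there is a single uniform shape for every fact, so new kinds of statements need no new tables or columns.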

Triplestore querying



Triplestores can also be queried



SQL is more limited for the kinds of queries we’d like to be able to make



SPARQL



The acronym stands for:



SPARQL Protocol and RDF Query Language

SPARQL



SPARQL is a SQL-like query language



Allows us to query on the various schemas we have assigned to our subjects

SPARQL queries can look surprisingly readable

SPARQL example

PREFIX abc: <http://example.com/exampleOntology#>
SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
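
To see what the query's graph pattern is asking for, the following plain-Python sketch evaluates the same pattern by hand over a small list of tuples. The data (`abc:city1`, `abc:country1`, and so on) and the helper names are hypothetical; a real SPARQL engine would do this matching for you.

```python
# Hypothetical toy data using the predicates from the query.
triples = [
    ("abc:city1", "abc:cityname", "Cairo"),
    ("abc:city1", "abc:isCapitalOf", "abc:country1"),
    ("abc:country1", "abc:countryname", "Egypt"),
    ("abc:country1", "abc:isInContinent", "abc:Africa"),
    ("abc:city2", "abc:cityname", "Paris"),
    ("abc:city2", "abc:isCapitalOf", "abc:country2"),
    ("abc:country2", "abc:countryname", "France"),
    ("abc:country2", "abc:isInContinent", "abc:Europe"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def african_capitals():
    """Bind ?x and ?y as in the query, returning (?capital, ?country) pairs."""
    results = []
    for x, p, y in triples:
        if p != "abc:isCapitalOf":
            continue  # ?x abc:isCapitalOf ?y
        if "abc:Africa" not in objects(y, "abc:isInContinent"):
            continue  # ?y abc:isInContinent abc:Africa
        for capital in objects(x, "abc:cityname"):
            for country in objects(y, "abc:countryname"):
                results.append((capital, country))
    return results


print(african_capitals())
```

The variables `?x` and `?y` in the query play the role of `x` and `y` here: they are placeholders that must be bound consistently across all four triple patterns.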

Querying power



Using SPARQL, you can write deep, powerful queries and reason intuitively about the data present in a triplestore

Organizing data this way also allows computers to reason about the data
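
One way to picture machine reasoning over triples is rule-based inference: deriving new triples from existing ones. The sketch below applies two hand-written rules until nothing new appears. The rules, the `ex:` vocabulary, and the function name `infer` are assumptions made for illustration; they are not part of RDF or any standard ontology.

```python
def infer(triples):
    """Apply two illustrative rules repeatedly until no new facts appear.

    Rule 1: x isCapitalOf y                     =>  x locatedIn y
    Rule 2: x locatedIn y, y isInContinent z    =>  x isInContinent z
    """
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "ex:isCapitalOf":
                new.add((s, "ex:locatedIn", o))
            if p == "ex:locatedIn":
                for s2, p2, o2 in facts:
                    if s2 == o and p2 == "ex:isInContinent":
                        new.add((s, "ex:isInContinent", o2))
        if new <= facts:  # nothing new was derived, so we are done
            return facts
        facts |= new


# Hypothetical starting facts.
base = {
    ("ex:Cairo", "ex:isCapitalOf", "ex:Egypt"),
    ("ex:Egypt", "ex:isInContinent", "ex:Africa"),
}

derived = infer(base) - base
print(sorted(derived))
```

Neither derived fact was stated explicitly; both follow from the stored triples plus the rules, which is the kind of reasoning that a uniform triple structure makes straightforward to automate.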

Caveats



All this tech is SUPER new



All tied very heavily into the Semantic Web



The idea is to introduce a system like this into the web at large

Metadata is stored about web pages so that computers can reason about them



Much of this is a moving target



Not a whole lot of production applications using this stuff yet

Tools



There are a few triplestore servers and other tools you can use



Jena



Apache project



Framework that allows for Semantic Web concepts to be employed



Can query using SPARQL



Jena can use Postgres in the background

More tools



RDFLib



https://github.com/RDFLib



Python library for RDF



Can run entirely in memory



Good for experimentation purposes and more


Explore the significance of metadata in data management, the use of RDF and triple stores in organizing data, different reasoning methods available, and the importance of metadata structures. Learn how schemas in RDF allow for easy integration of diverse data types without requiring database reorganization.

Uploaded on Sep 24, 2024
