Enhancing Relevance with Global Scores for DBpedia Facts

Assigning Global Relevance Scores to DBpedia Facts
Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci
DESWeb 03/31/2014
Structured Data
Advantages of structured data over unstructured data:
Search for explicit facts
Summarization of possibly interesting information
Automated knowledge discovery
Google Knowledge Graph
A handful of salient facts about the query entity.
RDF Knowledge bases
DBpedia, YAGO/NAGA
Querying YAGO
Asking for classes to which Albert Einstein belongs
Querying DBpedia
Asking for classes to which Albert Einstein belongs
Challenge
select distinct ?p ?o where
{ dbpedia:Barack_Obama ?p ?o }
Web Documents
Challenges
Big Data: DBpedia 3.8, ClueWeb corpus
Architecture: Text extraction, score computation/ranking, query processing
Evaluation: Conducting user studies
Ranking Strategies: Improve the ranking results
Overview
Languages: Python, Java, SPARQL, JavaScript
Frameworks: Django, Lucene
Architecture diagram labels: Web application (Django); DBpedia Endpoint (Apache Jena); Application Data (Postgres); Web corpus (Lucene Index); User Studies; Querying; Ranking strategies (Intra DBpedia strategies, Web Corpus strategies)
Ranking Facts
Query types:
Subject queries - return all physicists
SELECT ?s { ?s type Physicist }
Property queries - return all facts related to Einstein
SELECT ?p ?o { Albert_Einstein ?p ?o }
Ranking strategies
Ranking by frequency and document frequency
Ranking by information diversity
Random walk
Web-based co-occurrence statistics
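The two query types can be mimicked over a small in-memory triple list. This is only a sketch: the triples are toy data loosely based on the slides, and the function names `subject_query` and `property_query` are hypothetical, not part of the presented system.

```python
# Toy triples loosely based on the slides; illustrative data, not a DBpedia extract.
TRIPLES = [
    ("Albert_Einstein", "type", "Physicist"),
    ("Albert_Einstein", "spouse", "Mileva Maric"),
    ("Albert_Einstein", "residence", "Switzerland"),
    ("Isaac_Newton", "type", "Physicist"),
    ("Isaac_Newton", "topic", "Mathematicians"),
]


def subject_query(triples, predicate, obj):
    """Mimic SELECT ?s { ?s predicate obj }: subjects for a fixed predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def property_query(triples, subject):
    """Mimic SELECT ?p ?o { subject ?p ?o }: all (predicate, object) pairs of a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


physicists = subject_query(TRIPLES, "type", "Physicist")
einstein_facts = property_query(TRIPLES, "Albert_Einstein")
```

Both functions return results in storage order; the ranking strategies listed on the slide would reorder exactly these result lists.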
Ranking by frequency and document frequency [Shady et al ESWC’11]
subject document of „Albert Einstein“
<Albert_Einstein>
    <topic> <Nobel_laureates>;
    <topic> <Theoretical_physicists>;
    <topic> <German_physicists>;
    <topic> <American_inventors>;
    <type> <Scientist>;
    <type> <Person>;
    <type> <Thing>;
    <residence> "Switzerland";
    <residence> "Austria-Hungary";
    <residence> "German Empire";
    <spouse> "Mileva Maric";
    ...
predicate document of „topic“
<Newton> <topic> <Theoretical_physicists>.
<Newton> <topic> <Nobel_laureates>.
<Newton> <topic> <Mathematicians>.
<Newton> <topic> <Optical_physicists>.
<Newton> <topic> <History_of_calculus>.
<Newton> <topic> <English_alchemists>.
<Einstein> <topic> <Theoretical_physicists>.
<Einstein> <topic> <Nobel_laureates>.
<Einstein> <topic> <German_physicists>.
<Einstein> <topic> <American_inventors>.
object document of „Theoretical physicists“
<Isaac_Newton> <topic> <Theoretical_physicists>.
<Albert_Einstein> <topic> <Theoretical_physicists>.
<Bruno_Coppi> <topic> <Theoretical_physicists>.
<Ravi_Gomatam> <topic> <Theoretical_physicists>.
...
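The transcript of this slide defines the subject document d_s as all triples with subject s, freq(s;p) as the number of triples (s, p, x), and dfreq(s) as the number of distinct predicates plus the number of distinct objects in d_s. A minimal sketch of these three definitions follows; the set-of-tuples representation and the function names are assumptions made for illustration.

```python
# Knowledge base as a set of (subject, predicate, object) tuples, copied from the
# Albert Einstein example on the slide plus one triple for another subject.
GRAPH = {
    ("Albert_Einstein", "topic", "Nobel_laureates"),
    ("Albert_Einstein", "topic", "Theoretical_physicists"),
    ("Albert_Einstein", "topic", "German_physicists"),
    ("Albert_Einstein", "topic", "American_inventors"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "type", "Thing"),
    ("Albert_Einstein", "residence", "Switzerland"),
    ("Albert_Einstein", "residence", "Austria-Hungary"),
    ("Albert_Einstein", "residence", "German Empire"),
    ("Albert_Einstein", "spouse", "Mileva Maric"),
    ("Isaac_Newton", "topic", "Theoretical_physicists"),
}


def subject_document(graph, s):
    """d_s: all triples of the graph whose subject is s."""
    return {t for t in graph if t[0] == s}


def freq(graph, s, p):
    """freq(s;p): number of triples (s, p, x) in the graph."""
    return sum(1 for subj, pred, _ in graph if subj == s and pred == p)


def dfreq(graph, s):
    """dfreq(s): distinct predicates plus distinct objects in d_s."""
    d_s = subject_document(graph, s)
    return len({p for _, p, _ in d_s}) + len({o for _, _, o in d_s})


einstein_doc = subject_document(GRAPH, "Albert_Einstein")
topic_freq = freq(GRAPH, "Albert_Einstein", "topic")
einstein_dfreq = dfreq(GRAPH, "Albert_Einstein")
```

For the example data, the subject document holds 11 triples with 4 distinct predicates and 11 distinct objects. Predicate and object documents could be built the same way by filtering on the second or third tuple position.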
Ranking by frequency and document frequency
Subject queries: Global relevance
Isaac Newton
academicAdvisor ...;
birthDate ...;
birthPlace ...;
comment ...;
ethnicity ...;
field ...;
influenced ...;
influencedBy ...;
knownFor ...;
label ...;
notableStudent ...;
subject ...;
subject ...;
type ...;
Ravi Gomatam
subject  ...;
subject  ...;
subject  ...;
subject  ...;
subject  ...;
Limitations for Property Queries
Property queries:
Globally relevant but distinctive to the given subject
type Person vs. type Scientist
Ranking by diversity
Following a probabilistic model
Property queries:
Properties and objects that are as discriminative as possible
Subject queries:
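For property queries, the transcript of this slide lists the mixture a·P(o|p)·P(p|s) + (1-a)·P(p|o)·P(o|s). The sketch below evaluates that mixture on toy data. Estimating each probability by relative frequency in the graph, the default weight `alpha=0.5`, and the function name `mixture_probability` are all assumptions; how the resulting value is turned into a ranking is not stated on the slide and is not modelled here.

```python
from collections import Counter

# Toy graph; illustrative data only.
GRAPH = [
    ("Einstein", "type", "Person"),
    ("Einstein", "type", "Scientist"),
    ("Newton", "type", "Person"),
    ("Newton", "type", "Scientist"),
    ("Obama", "type", "Person"),
]


def mixture_probability(graph, s, p, o, alpha=0.5):
    """Evaluate alpha*P(o|p)*P(p|s) + (1-alpha)*P(p|o)*P(o|s).

    Probabilities are relative frequencies (an assumption). The triple
    (s, p, o) is expected to occur in the graph, so no count is zero.
    """
    n_s = Counter(t[0] for t in graph)
    n_p = Counter(t[1] for t in graph)
    n_o = Counter(t[2] for t in graph)
    n_sp = Counter((t[0], t[1]) for t in graph)
    n_so = Counter((t[0], t[2]) for t in graph)
    n_po = Counter((t[1], t[2]) for t in graph)

    p_o_given_p = n_po[(p, o)] / n_p[p]
    p_p_given_s = n_sp[(s, p)] / n_s[s]
    p_p_given_o = n_po[(p, o)] / n_o[o]
    p_o_given_s = n_so[(s, o)] / n_s[s]
    return alpha * p_o_given_p * p_p_given_s + (1 - alpha) * p_p_given_o * p_o_given_s


person_value = mixture_probability(GRAPH, "Einstein", "type", "Person")
scientist_value = mixture_probability(GRAPH, "Einstein", "type", "Scientist")
```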
Random Walk Model
 
 
Consider the knowledge base as a directed graph
Already applied in [Kasneci CIKM’09]
Problem: literals have no outgoing link
Use Wiki Pagelinks and Infobox Property Mappings
Entities with high indegree, such as countries, are favored
Good for subject queries
Bad for property queries
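The effect of indegree on a random walk can be illustrated with a PageRank-style power iteration. This is a generic sketch, not the model from [Kasneci CIKM’09]: the damping factor, the iteration count, and the uniform redistribution of score from nodes without outgoing links are assumptions (the slide instead adds links from Wiki pagelinks and infobox property mappings).

```python
def random_walk_scores(edges, damping=0.85, iterations=50):
    """PageRank-style scores for a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links is spread uniformly (assumption).
        dangling = sum(score[n] for n in nodes if not out[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        nxt = {n: base for n in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += damping * score[src] / len(out[src])
        score = nxt
    return score


# Toy graph: three physicists link to one country, which links onward.
EDGES = [
    ("Albert_Einstein", "Germany"),
    ("Max_Planck", "Germany"),
    ("Werner_Heisenberg", "Germany"),
    ("Germany", "Europe"),
]
scores = random_walk_scores(EDGES)
```

In this toy graph the country node collects score from every physicist, which mirrors the slide's remark that entities with high indegree are favored.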
Co-occurrence statistics
Web Documents
Lemur Project ClueWeb09 Category-B web corpus
50 million web documents (1.5 TB)
Only English-language documents
Includes approx. 2.7 million Wikipedia articles
Create an inverted index
Consider different word distance limits as documents
Rank subject-object pairs
„Albert Einstein“ and „Physicist“
Store only pairwise co-occurrence:
Compute frequency of s:
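Pairwise co-occurrence counting can be sketched as follows, assuming "documents" are short text windows and that the frequency of s is the sum of its pairwise counts over the objects of s, as the transcript of this slide suggests. Plain substring matching stands in for the Lucene inverted index, and the function names are hypothetical.

```python
def cooccurrence_counts(windows, pairs):
    """Count, for each (subject, object) label pair, the windows mentioning both."""
    lowered = [w.lower() for w in windows]
    counts = {}
    for s, o in pairs:
        counts[(s, o)] = sum(
            1 for w in lowered if s.lower() in w and o.lower() in w
        )
    return counts


def subject_frequency(pair_counts, s):
    """Sum the pairwise counts of subject s over all of its objects."""
    return sum(c for (subj, _), c in pair_counts.items() if subj == s)


# Toy text windows; illustrative data, not taken from ClueWeb09.
WINDOWS = [
    "Albert Einstein was a German-born theoretical physicist.",
    "The physicist Albert Einstein lived in Switzerland.",
    "Switzerland is known for its mountains.",
]
PAIRS = [("Albert Einstein", "Physicist"), ("Albert Einstein", "Switzerland")]

pair_counts = cooccurrence_counts(WINDOWS, PAIRS)
einstein_frequency = subject_frequency(pair_counts, "Albert Einstein")
```

Storing only the pairwise counts keeps the statistics small, since the per-subject frequency can be derived from them on demand.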
Evaluation
User study 1
8 queries
all results
12 users
19 approaches/configurations
Rating scale 1-4: irrelevant to highly relevant
User study 2
8+20 queries
top-10 results of best 4 approaches side-by-side
10 users
Best 3 approaches from user study 1
Top 4 Approaches in User study 1
User study 2
Results Example: Theoretical Physicists
Columns: DBpedia, Random Walk Model
Results Example: Albert Einstein
Columns: DBpedia, Co-occurrence statistics
Conclusions
Investigated multiple approaches to rank DBpedia facts
Information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents
DBpedia Knowledge base already provides enough information to improve the ranking of results
Improvement of property queries through web-based co-occurrence statistics
We provide the annotated datasets at
https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/

Explore the advantages of structured data over unstructured data, learn about querying YAGO and DBpedia for specific classes, and discover challenges and strategies in text extraction and ranking within the DBpedia framework. The overview delves into web applications, ranking strategies, and user studies using diverse technologies.

  • DBpedia
  • Structured Data
  • Relevance Scores
  • YAGO
  • Querying


Presentation Transcript


  1. Assigning Global Relevance Scores to DBpedia Facts Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci DESWeb 03/31/2014

  2. Structured Data. Advantages of structured data over unstructured data: search for explicit facts; summarization of possibly interesting information; automated knowledge discovery. Google Knowledge Graph: a handful of salient facts about the query entity. RDF knowledge bases: DBpedia, YAGO/NAGA.

  3. Querying YAGO. Asking for classes to which Albert Einstein belongs.

  4. Querying DBpedia. Asking for classes to which Albert Einstein belongs:
     predicate   object
     rdf:type    owl:Thing
     rdf:type    dbpedia:Agent
     rdf:type    dbpedia:Person
     rdf:type    dbpedia:Scientist
     rdf:type    umbel:Scientist
     rdf:type    schema:Person
     rdf:type    yago:Astronomer109818343
     rdf:type    foaf:Person
     rdf:type    19th-centuryAmericanPeople
     rdf:type    19th-centuryGermanPeople

  5. Challenge. select distinct ?p ?o where { dbpedia:Barack_Obama ?p ?o } Web Documents. Result rows from two tables (columns p, c): owl:orderInOffice, President of the United States; rdf:type, owl:Thing; rdf:type, dbpedia:Person; dbpedia:type, dbpedia:Politician; rdf:type, yago:Person100007846; dbpedia:spouse, dbpedia:Michelle_Obama; ...; owl:birthPlace, dbpedia:Honolulu; rdf:type, dbpedia:Politician; dbpprop:residence, dbpedia:White_House; ...; dbpedia:spouse, dbpedia:Michelle_Obama; rdf:type, owl:Thing.

  6. Challenges. Architecture: text extraction, score computation/ranking, query processing. Big Data: DBpedia 3.8, ClueWeb corpus. Ranking Strategies: improve the ranking results. Evaluation: conducting user studies.

  7. Overview. Languages: Python, Java, SPARQL, JavaScript. Frameworks: Django, Lucene. Architecture diagram labels: Web application (Django); DBpedia Endpoint (Apache Jena); Application Data (Postgres); Web corpus (Lucene Index); User Studies; Querying; Ranking strategies (Intra DBpedia strategies, Web Corpus strategies).

  8. Ranking Facts. Query types: subject queries - return all physicists: SELECT ?s { ?s type Physicist }; property queries - return all facts related to Einstein: SELECT ?p ?o { Albert_Einstein ?p ?o }. Ranking strategies: ranking by frequency and document frequency; ranking by information diversity; random walk; web-based co-occurrence statistics.

  9. Ranking by frequency and document frequency [Shady et al ESWC’11]. Definitions over the knowledge base graph $G_{KB}$:
     $d_s = \{(s,x,y) \mid (s,x,y) \in G_{KB}\}$
     $d_p = \{(x,p,y) \mid (x,p,y) \in G_{KB}\}$
     $d_o = \{(x,y,o) \mid (x,y,o) \in G_{KB}\}$
     $\mathrm{freq}(s;p) = |\{(s,p,x) \mid (s,p,x) \in G_{KB}\}|$
     $\mathrm{dfreq}(s) = |\{p \mid (s,p,x) \in d_s\}| + |\{o \mid (s,x,o) \in d_s\}|$
     The slide also shows the sizes $|d_p|$ and $|D|$.
     Subject document of Albert Einstein: <Albert_Einstein> <topic> <Nobel_laureates>; <topic> <Theoretical_physicists>; <topic> <German_physicists>; <topic> <American_inventors>; <type> <Scientist>; <type> <Person>; <type> <Thing>; <residence> "Switzerland"; <residence> "Austria-Hungary"; <residence> "German Empire"; <spouse> "Mileva Maric"; ...
     Predicate document of topic: <Newton> <topic> <Theoretical_physicists>. <Newton> <topic> <Nobel_laureates>. <Newton> <topic> <Mathematicians>. <Newton> <topic> <Optical_physicists>. <Newton> <topic> <History_of_calculus>. <Newton> <topic> <English_alchemists>. <Einstein> <topic> <Theoretical_physicists>. <Einstein> <topic> <Nobel_laureates>. <Einstein> <topic> <German_physicists>. <Einstein> <topic> <American_inventors>.
     Object document of Theoretical physicists: <Isaac_Newton> <topic> <Theoretical_physicists>. <Albert_Einstein> <topic> <Theoretical_physicists>. <Bruno_Coppi> <topic> <Theoretical_physicists>. <Ravi_Gomatam> <topic> <Theoretical_physicists>. ...

  10. Ranking by frequency and document frequency. Subject queries: global relevance. Score(s; p, o) combines freq(s; p), freq(s; o), and log(1 + dfreq(s)).
      Isaac Newton: academicAdvisor ...; birthDate ...; birthPlace ...; comment ...; ethnicity ...; field ...; influenced ...; influencedBy ...; knownFor ...; label ...; notableStudent ...; subject ...; subject ...; type ...;
      Ravi Gomatam: subject ...; subject ...; subject ...; subject ...; subject ...;

  11. Limitations for Property Queries. Property queries: globally relevant but distinctive to the given subject, e.g. type Person vs. type Scientist. Score(p, o; s) combines freq(p; s) and freq(o; s) with log(1 + dfreq(p)) and log(1 + dfreq(o)).

  12. Ranking by diversity. Following a probabilistic model. Property queries: properties and objects that are as discriminative as possible, $P(p,o \mid s) \approx \alpha\, P(o \mid p)\, P(p \mid s) + (1-\alpha)\, P(p \mid o)\, P(o \mid s)$. Subject queries: $P(s \mid p,o) \propto P(s)\, P(p,o \mid s)$.

  13. Random Walk Model. Consider the knowledge base as a directed graph. Already applied in [Kasneci CIKM’09]. Problem: literals have no outgoing link. Use Wiki Pagelinks and Infobox Property Mappings. Entities with high indegree, such as countries, are favored. Good for subject queries. Bad for property queries.

  14. Co-occurrence statistics. Web Documents: Lemur Project ClueWeb09 Category-B web corpus; 50 million web documents (1.5 TB); only English-language documents; includes approx. 2.7 million Wikipedia articles. Create an inverted index. Consider different word distance limits as documents. Rank subject-object pairs, e.g. Albert Einstein and Physicist. Store only pairwise co-occurrence: $\#\mathrm{par}(s,o)$. Compute frequency of s: $\#\mathrm{par}(s) = \sum_{(s,x,o) \in G} \#\mathrm{par}(s,o)$.

  15. Evaluation. User study 1: 8 queries; all results; 12 users; 19 approaches/configurations; rating scale 1-4: irrelevant to highly relevant. User study 2: 8+20 queries; top-10 results of best 4 approaches side-by-side; 10 users; best 3 approaches from user study 1.

  16. Top 4 Approaches in User study 1.

  17. User study 2.

  18. Results Example: Theoretical Physicists. Columns: DBpedia, Random Walk Model. Subject: Albert Einstein, Isaac Newton, Galileo Galilei, James Clerk Maxwell, Richard Feynman, Stephen Hawking, Max Planck, Enrico Fermi, Werner Heisenberg, Pierre-Simon Laplace.

  19. Results Example: Albert Einstein.
      DBpedia                                  Co-occurrence statistics
      predicate  object                        predicate         object
      rdf:type   owl:Thing                     fields            Physics
      rdf:type   dbpedia:Agent                 field             Physics
      rdf:type   dbpedia:Person                deathPlace        United States
      rdf:type   dbpedia:Scientist             placeOfDeath      United States
      rdf:type   umbel:Scientist               shortDescription  Physicists
      rdf:type   schema:Person                 description       Physicist
      rdf:type   yago:Astronomer109818343      type              Scientist
      rdf:type   foaf:Person                   ethnicity         Jewish
      rdf:type   19th-centuryAmericanPeople    subject           Einstein family
      rdf:type   19th-centuryGermanPeople      residence         Switzerland

  20. Conclusions. Investigated multiple approaches to rank DBpedia facts: information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents. DBpedia Knowledge base already provides enough information to improve the ranking of results. Improvement of property queries through web-based co-occurrence statistics. We provide the annotated datasets at https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/
