Enhancing Relevance with Global Scores for DBpedia Facts
Explore the advantages of structured data over unstructured data, learn how YAGO and DBpedia can be queried for the classes an entity belongs to, and discover the challenges and strategies for text extraction and fact ranking on DBpedia. The overview covers the web application, the ranking strategies, and the user studies, along with the technologies used to build them.
Presentation Transcript
Assigning Global Relevance Scores to DBpedia Facts
Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci
DESWeb, 03/31/2014
Structured Data
Advantages of structured data over unstructured data:
- Search for explicit facts
- Summarization of possibly interesting information
- Automated knowledge discovery
Google Knowledge Graph: a handful of salient facts about the query entity.
RDF knowledge bases: DBpedia, YAGO/NAGA.
Querying YAGO
Asking for the classes to which Albert Einstein belongs.
Querying DBpedia
Asking for the classes to which Albert Einstein belongs (predicate, object):
rdf:type owl:Thing
rdf:type dbpedia:Agent
rdf:type dbpedia:Person
rdf:type dbpedia:Scientist
rdf:type umbel:Scientist
rdf:type schema:Person
rdf:type yago:Astronomer109818343
rdf:type foaf:Person
rdf:type 19th-centuryAmericanPeople
rdf:type 19th-centuryGermanPeople
Challenge
Query: select distinct ?p ?o where { dbpedia:Barack_Obama ?p ?o }
Sample (predicate, object) results, alongside evidence from web documents:
owl:orderInOffice President of the United States
rdf:type owl:Thing
rdf:type dbpedia:Person
dbpedia:type dbpedia:Politician
rdf:type yago:Person100007846
dbpedia:spouse dbpedia:Michelle_Obama
...
owl:birthPlace dbpedia:Honolulu
rdf:type dbpedia:Politician
dbpprop:residence dbpedia:White_House
...
dbpedia:spouse dbpedia:Michelle_Obama
rdf:type owl:Thing
Challenges
- Architecture: text extraction, score computation/ranking, query processing
- Big data: DBpedia 3.8, ClueWeb corpus
- Ranking strategies: improve the ranking results
- Evaluation: conducting user studies
Overview
Web application (Django):
- Ranking strategies: intra-DBpedia strategies, web corpus strategies
- User studies
- Querying
Data sources: DBpedia endpoint (Apache Jena), web corpus (Lucene index), application data (Postgres)
Languages: Python, Java, SPARQL, JavaScript
Frameworks: Django, Lucene
Ranking Facts
Query types:
- Subject queries, e.g. return all physicists: SELECT ?s { ?s type Physicist }
- Property queries, e.g. return all facts related to Einstein: SELECT ?p ?o { Albert_Einstein ?p ?o }
Ranking strategies:
- Ranking by frequency and document frequency
- Ranking by information diversity
- Random walk
- Web-based co-occurrence statistics
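Both query types reduce to a single triple pattern: a subject query fixes the predicate and object, a property query fixes the subject. The minimal sketch below assumes a hypothetical in-memory list of (subject, predicate, object) tuples in place of a real SPARQL endpoint; the helper names `match`, `subject_query`, and `property_query` are illustrative only. Both functions return their results unranked, which is the gap the ranking strategies address.

```python
# Hypothetical toy knowledge base of (subject, predicate, object) triples,
# using the abbreviated names from the slide instead of full URIs.
Triple = tuple[str, str, str]

KB: list[Triple] = [
    ("Albert_Einstein", "type", "Physicist"),
    ("Albert_Einstein", "spouse", "Mileva_Maric"),
    ("Albert_Einstein", "residence", "Switzerland"),
    ("Isaac_Newton", "type", "Physicist"),
    ("Isaac_Newton", "topic", "Theoretical_physicists"),
]


def match(kb: list[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Evaluate one triple pattern; a None position acts as a variable."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in kb
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def subject_query(kb: list[Triple], p: str, o: str) -> list[str]:
    """SELECT ?s { ?s p o }: distinct subjects in knowledge-base order (unranked)."""
    return list(dict.fromkeys(ts for (ts, _, _) in match(kb, p=p, o=o)))


def property_query(kb: list[Triple], s: str) -> list[tuple[str, str]]:
    """SELECT ?p ?o { s ?p ?o }: (predicate, object) pairs of s (unranked)."""
    return [(tp, to) for (_, tp, to) in match(kb, s=s)]


print(subject_query(KB, "type", "Physicist"))  # ['Albert_Einstein', 'Isaac_Newton']
print(property_query(KB, "Isaac_Newton"))  # [('type', 'Physicist'), ('topic', 'Theoretical_physicists')]
```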
Ranking by frequency and document frequency [Shady et al ESWC 11]
Each subject, predicate, and object defines a document over the knowledge base $G_{KB}$:
- subject document: $d_s = \{(s,x,y) \mid (s,x,y) \in G_{KB}\}$
- predicate document: $d_p = \{(x,p,y) \mid (x,p,y) \in G_{KB}\}$
- object document: $d_o = \{(x,y,o) \mid (x,y,o) \in G_{KB}\}$
Frequency: $\mathrm{freq}(s;p) = |\{(s,p,x) \mid (s,p,x) \in G_{KB}\}|$
Document frequency: $\mathrm{dfreq}(s) = |\{p \mid (s,p,x) \in d_s\}| + |\{o \mid (s,x,o) \in d_s\}|$
Subject document of Albert Einstein:
<Albert_Einstein> <residence> "Switzerland";
<residence> "Austria-Hungary";
<residence> "German Empire";
<spouse> "Mileva Maric";
...
<topic> <Nobel_laureates>;
<topic> <Theoretical_physicists>;
<topic> <German_physicists>;
<topic> <American_inventors>;
<type> <Scientist>;
<type> <Person>;
<type> <Thing>.
Predicate document of <topic>:
<Newton> <topic> <Theoretical_physicists>.
<Newton> <topic> <Nobel_laureates>.
<Newton> <topic> <Mathematicians>.
<Newton> <topic> <Optical_physicists>.
<Newton> <topic> <History_of_calculus>.
<Newton> <topic> <English_alchemists>.
...
<Einstein> <topic> <Theoretical_physicists>.
<Einstein> <topic> <Nobel_laureates>.
<Einstein> <topic> <German_physicists>.
<Einstein> <topic> <American_inventors>.
Object document of <Theoretical_physicists>:
<Isaac_Newton> <topic> <Theoretical_physicists>.
<Albert_Einstein> <topic> <Theoretical_physicists>.
<Bruno_Coppi> <topic> <Theoretical_physicists>.
<Ravi_Gomatam> <topic> <Theoretical_physicists>.
...
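The document and frequency definitions translate directly into filters and counts. The minimal sketch below assumes the knowledge base is a plain Python list of triples; the sample triples follow the Einstein/Newton example (literals written as plain strings), and all function names are hypothetical. dfreq(s) is computed exactly as defined, from the distinct predicates and distinct objects in the subject document of s.

```python
# Hypothetical excerpt of G_KB built from the slide's example triples.
G_KB = [
    ("Albert_Einstein", "residence", "Switzerland"),
    ("Albert_Einstein", "residence", "Austria-Hungary"),
    ("Albert_Einstein", "residence", "German_Empire"),
    ("Albert_Einstein", "spouse", "Mileva_Maric"),
    ("Albert_Einstein", "topic", "Nobel_laureates"),
    ("Albert_Einstein", "topic", "Theoretical_physicists"),
    ("Albert_Einstein", "topic", "German_physicists"),
    ("Albert_Einstein", "topic", "American_inventors"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "type", "Thing"),
    ("Isaac_Newton", "topic", "Theoretical_physicists"),
    ("Isaac_Newton", "topic", "Mathematicians"),
]


def subject_document(kb, s):
    """d_s: all triples whose subject is s."""
    return [t for t in kb if t[0] == s]


def predicate_document(kb, p):
    """d_p: all triples whose predicate is p."""
    return [t for t in kb if t[1] == p]


def object_document(kb, o):
    """d_o: all triples whose object is o."""
    return [t for t in kb if t[2] == o]


def freq(kb, s, p):
    """freq(s;p): number of triples (s, p, x), i.e. occurrences of s in d_p."""
    return len([t for t in predicate_document(kb, p) if t[0] == s])


def dfreq(kb, s):
    """dfreq(s): distinct predicates plus distinct objects in d_s."""
    d_s = subject_document(kb, s)
    return len({p for (_, p, _) in d_s}) + len({o for (_, _, o) in d_s})


print(freq(G_KB, "Albert_Einstein", "topic"))  # 4
print(dfreq(G_KB, "Albert_Einstein"))  # 15 (4 predicates + 11 objects)
print(len(object_document(G_KB, "Theoretical_physicists")))  # 2
```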
Ranking by frequency and document frequency: subject queries
Subject document of Isaac Newton: academicAdvisor ...; birthDate ...; birthPlace ...; comment ...; ethnicity ...; field ...; influenced ...; influencedBy ...; knownFor ...; label ...; notableStudent ...; subject ...; subject ...; type ...
Subject document of Ravi Gomatam: subject ...; subject ...; subject ...; subject ...; subject ...
Global relevance score for subject queries:
$\mathrm{Score}(s;p,o) = \mathrm{freq}(s;p) \cdot \mathrm{freq}(s;o) \cdot \log(1+\mathrm{dfreq}(s))$
The $\log(1+\mathrm{dfreq}(s))$ factor favors richly described subjects such as Isaac Newton over sparsely described ones such as Ravi Gomatam.
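A sketch of ranking the answers of a subject query with this score. It assumes freq(s;o) counts triples (s, x, o) by analogy with freq(s;p), and it uses the natural logarithm; the slide does not fix the base, and changing it only rescales every score by the same constant, so the ranking is unaffected. The toy data is hypothetical, with placeholder objects such as `advisor_1` standing in for the elided values.

```python
import math

# Hypothetical mini knowledge base: Newton is described by many facts,
# Gomatam by only a few (objects like "advisor_1" are placeholders).
KB = [
    ("Isaac_Newton", "subject", "Theoretical_physicists"),
    ("Isaac_Newton", "subject", "Mathematicians"),
    ("Isaac_Newton", "academicAdvisor", "advisor_1"),
    ("Isaac_Newton", "birthDate", "date_1"),
    ("Isaac_Newton", "knownFor", "work_1"),
    ("Ravi_Gomatam", "subject", "Theoretical_physicists"),
    ("Ravi_Gomatam", "subject", "category_1"),
]


def freq_sp(kb, s, p):
    """freq(s;p): number of triples (s, p, x)."""
    return sum(1 for (ts, tp, _) in kb if ts == s and tp == p)


def freq_so(kb, s, o):
    """freq(s;o): number of triples (s, x, o); assumed analogue of freq(s;p)."""
    return sum(1 for (ts, _, to) in kb if ts == s and to == o)


def dfreq(kb, s):
    """dfreq(s): distinct predicates plus distinct objects in the subject document of s."""
    preds = {tp for (ts, tp, _) in kb if ts == s}
    objs = {to for (ts, _, to) in kb if ts == s}
    return len(preds) + len(objs)


def score_subject(kb, s, p, o):
    """Score(s;p,o) = freq(s;p) * freq(s;o) * log(1 + dfreq(s))."""
    return freq_sp(kb, s, p) * freq_so(kb, s, o) * math.log1p(dfreq(kb, s))


def rank_subjects(kb, p, o):
    """Answer the subject query ?s p o, best-scored subject first (ties broken by name)."""
    candidates = {ts for (ts, tp, to) in kb if tp == p and to == o}
    return sorted(candidates, key=lambda s: (-score_subject(kb, s, p, o), s))


print(rank_subjects(KB, "subject", "Theoretical_physicists"))
# ['Isaac_Newton', 'Ravi_Gomatam']
```

Both subjects have identical freq values here, so the log(1+dfreq(s)) factor alone decides the order: dfreq is 9 for Newton and 3 for Gomatam.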
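Limitations for Property Queries
For property queries, facts should be globally relevant but also distinctive for the given subject; for example, type Scientist is more informative than type Person. The score therefore divides by the document frequencies of the predicate and the object:
$\mathrm{Score}(p,o;s) = \dfrac{\mathrm{freq}(p;s) \cdot \mathrm{freq}(o;s)}{\log(1+\mathrm{dfreq}(p)) \cdot \log(1+\mathrm{dfreq}(o))}$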
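A sketch of this distinctiveness-aware score for a property query. The slides define dfreq only for subjects, so `dfreq_p` and `dfreq_o` below are assumed analogues: the number of distinct subjects plus distinct objects seen with p, and the number of distinct subjects plus distinct predicates seen with o. The toy knowledge base is hypothetical.

```python
import math

# Hypothetical toy knowledge base of (subject, predicate, object) triples.
KB = [
    ("Albert_Einstein", "type", "Scientist"),
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "spouse", "Mileva_Maric"),
    ("Isaac_Newton", "type", "Scientist"),
    ("Isaac_Newton", "type", "Person"),
    ("Mileva_Maric", "type", "Person"),
    ("Barack_Obama", "type", "Person"),
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Michelle_Obama", "type", "Person"),
]


def dfreq_p(kb, p):
    """Assumed dfreq(p): distinct subjects plus distinct objects occurring with p."""
    return len({s for (s, tp, _) in kb if tp == p}) + len({o for (_, tp, o) in kb if tp == p})


def dfreq_o(kb, o):
    """Assumed dfreq(o): distinct subjects plus distinct predicates occurring with o."""
    return len({s for (s, _, to) in kb if to == o}) + len({p for (_, p, to) in kb if to == o})


def score_property(kb, s, p, o):
    """Score(p,o;s) = freq(p;s) * freq(o;s) / (log(1+dfreq(p)) * log(1+dfreq(o)))."""
    freq_p = sum(1 for (ts, tp, _) in kb if ts == s and tp == p)  # triples (s, p, x)
    freq_o = sum(1 for (ts, _, to) in kb if ts == s and to == o)  # triples (s, x, o)
    return freq_p * freq_o / (math.log1p(dfreq_p(kb, p)) * math.log1p(dfreq_o(kb, o)))


def rank_facts(kb, s):
    """Answer the property query s ?p ?o, highest score first."""
    facts = list(dict.fromkeys((p, o) for (ts, p, o) in kb if ts == s))
    return sorted(facts, key=lambda f: score_property(kb, s, *f), reverse=True)


print(rank_facts(KB, "Albert_Einstein"))
# [('type', 'Scientist'), ('spouse', 'Mileva_Maric'), ('type', 'Person')]
```

Person occurs with more subjects than Scientist in the toy data, so its larger dfreq(o) pushes type Person below type Scientist, mirroring the Person vs. Scientist contrast.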
Ranking by diversity
Follows a probabilistic model.
Property queries: prefer properties and objects that are as discriminative as possible:
$P(p,o \mid s) \propto \alpha\,P(o \mid p)\,P(p \mid s) + (1-\alpha)\,P(p \mid o)\,P(o \mid s)$
Subject queries:
$P(s \mid p,o) \propto P(s)\,P(p,o \mid s)$
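The slides give the model but not how its probabilities are estimated. The sketch below assumes plain maximum-likelihood estimates from triple counts, α = 0.5, and P(s) taken as the share of triples about s; all three are illustrative choices, not the authors' settings, and the toy data is hypothetical. Scores are left unnormalized, which suffices for ranking.

```python
# Hypothetical toy knowledge base of (subject, predicate, object) triples.
KB = [
    ("Albert_Einstein", "type", "Scientist"),
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "spouse", "Mileva_Maric"),
    ("Isaac_Newton", "type", "Scientist"),
    ("Isaac_Newton", "type", "Person"),
    ("Mileva_Maric", "type", "Person"),
]


def count(kb, s=None, p=None, o=None):
    """Number of triples matching the pattern; None acts as a wildcard."""
    return sum(
        1
        for (ts, tp, to) in kb
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    )


def p_po_given_s(kb, s, p, o, alpha=0.5):
    """alpha*P(o|p)P(p|s) + (1-alpha)*P(p|o)P(o|s), using count-based (MLE) estimates.

    Assumes (s, p, o) is a fact in kb, so no denominator is zero.
    """
    n_s = count(kb, s=s)
    p_o_given_p = count(kb, p=p, o=o) / count(kb, p=p)
    p_p_given_s = count(kb, s=s, p=p) / n_s
    p_p_given_o = count(kb, p=p, o=o) / count(kb, o=o)
    p_o_given_s = count(kb, s=s, o=o) / n_s
    return alpha * p_o_given_p * p_p_given_s + (1 - alpha) * p_p_given_o * p_o_given_s


def p_s_given_po(kb, s, p, o, alpha=0.5):
    """Unnormalized P(s|p,o) = P(s) * P(p,o|s), with P(s) = share of triples about s."""
    return count(kb, s=s) / len(kb) * p_po_given_s(kb, s, p, o, alpha)


for p, o in [("type", "Scientist"), ("type", "Person"), ("spouse", "Mileva_Maric")]:
    print(p, o, round(p_po_given_s(KB, "Albert_Einstein", p, o), 3))
```

With these naive count-based estimates, nothing penalizes very common objects: type Person scores above type Scientist in the toy data. Whatever makes the model discriminative in practice depends on estimation details the slides do not show.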
Random Walk Model
- Consider the knowledge base as a directed graph; already applied in [Kasneci CIKM 09]
- Problem: literals have no outgoing links
- Solution: use Wiki pagelinks and infobox property mappings
- Entities with high indegree, such as countries, are favored
- Good for subject queries, bad for property queries
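The slide does not spell out the walk itself. The sketch below assumes a standard PageRank-style random walk with damping factor 0.85; this is a swapped-in formulation that may differ from the model of [Kasneci CIKM 09]. Nodes without outgoing edges, such as literals, redistribute their score uniformly so the scores keep summing to 1. The edge list is hypothetical and stands in for edges derived from page links and infobox mappings.

```python
def random_walk_scores(edges, damping=0.85, iterations=50):
    """Approximate stationary scores of a PageRank-style random walk on a directed graph.

    Nodes without outgoing edges (e.g. literals) spread their mass uniformly,
    so the scores always sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Total score currently held by dangling nodes.
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Teleport share plus uniformly redistributed dangling mass.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its damped score evenly along its outgoing edges.
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


# Hypothetical edges, e.g. derived from page links or infobox properties.
EDGES = [
    ("Albert_Einstein", "Germany"),
    ("Albert_Einstein", "Max_Planck"),
    ("Max_Planck", "Germany"),
    ("Werner_Heisenberg", "Germany"),
]
scores = random_walk_scores(EDGES)
print(max(scores, key=scores.get))  # Germany
```

Germany has the highest indegree in the toy graph and ends up with the highest score. This illustrates why hub entities such as countries dominate: useful when ranking subjects, less so when ranking the facts of one entity.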
Co-occurrence statistics
Web documents: Lemur Project ClueWeb09 Category B web corpus
- 50 million web documents (1.5 TB)
- Only English-language documents
- Includes approx. 2.7 million Wikipedia articles
Approach:
- Create an inverted index
- Consider text windows with different word-distance limits as documents
- Rank subject-object pairs, e.g. Albert Einstein and Physicist, by $\#par(s,o)$
- Store only pairwise co-occurrences; compute the frequency of s as $\#par(s) = \sum_{(s,x,o) \in G} \#par(s,o)$
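A sketch of the pairwise count #par(s,o), of #par(s), and of ranking a subject's facts by co-occurrence. It assumes each document is already tokenized and that every entity or object is represented by a single surface token; a real pipeline over ClueWeb09 would use the inverted index and entity-name matching, both omitted here. The `window` parameter plays the role of the word-distance limit; all names and data are hypothetical.

```python
from itertools import product


def co_occur(tokens, a, b, window):
    """True if tokens a and b appear within `window` words of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i, j in product(pos_a, pos_b))


def par_so(docs, s, o, window):
    """#par(s,o): number of documents in which s and o co-occur within the window."""
    return sum(1 for doc in docs if co_occur(doc, s, o, window))


def par_s(docs, kb, s, window):
    """#par(s): sum of #par(s,o) over all facts (s, x, o) in the knowledge base."""
    return sum(par_so(docs, s, o, window) for (ts, _, o) in kb if ts == s)


def rank_facts(docs, kb, s, window):
    """Rank the facts of s by #par(s,o), most frequently co-occurring object first."""
    facts = [(p, o) for (ts, p, o) in kb if ts == s]
    return sorted(facts, key=lambda f: par_so(docs, s, f[1], window), reverse=True)


# Hypothetical tokenized documents and facts.
DOCS = [
    "einstein was a physicist".split(),
    "the physicist einstein lived in switzerland".split(),
    "einstein and his wife mileva".split(),
]
KB = [
    ("einstein", "type", "physicist"),
    ("einstein", "residence", "switzerland"),
    ("einstein", "spouse", "mileva"),
]

print(rank_facts(DOCS, KB, "einstein", window=3))
# [('type', 'physicist'), ('residence', 'switzerland'), ('spouse', 'mileva')]
```

Raising the limit changes the counts: with `window=5`, "einstein" and "mileva" also co-occur in the third document.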
Evaluation
User study 1: 8 queries; all results; 10 users; 19 approaches/configurations
User study 2: 8+20 queries; top-10 results of the best 4 approaches, side by side; 12 users; best 3 approaches from user study 1
Rating scale: 1 (irrelevant) to 4 (highly relevant)
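The slides do not say how the 1 to 4 ratings were aggregated per approach. One standard option for graded judgements is NDCG@10, sketched below as an assumption: a swapped-in measure, not necessarily the one behind the reported results. A rating r is mapped to graded relevance r - 1, so results rated irrelevant contribute no gain; the function name is hypothetical.

```python
import math


def ndcg_at_k(ratings, k=10):
    """NDCG@k for a ranked list of 1-4 user ratings (1 = irrelevant, 4 = highly relevant).

    Rating r becomes graded relevance r - 1; the ideal ordering is computed
    over the same judged results.
    """

    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    rels = [r - 1 for r in ratings]
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0


print(ndcg_at_k([4, 3, 2, 1]))  # 1.0
print(ndcg_at_k([1, 4]))
```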
Top 4 Approaches in User Study 1
User Study 2
Results Example: Theoretical Physicists
Columns: DBpedia, Random Walk Model
Subjects:
Albert Einstein
Isaac Newton
Galileo Galilei
James Clerk Maxwell
Richard Feynman
Stephen Hawking
Max Planck
Enrico Fermi
Werner Heisenberg
Pierre-Simon Laplace
Results Example: Albert Einstein
DBpedia (predicate, object):
rdf:type owl:Thing
rdf:type dbpedia:Agent
rdf:type dbpedia:Person
rdf:type dbpedia:Scientist
rdf:type umbel:Scientist
rdf:type schema:Person
rdf:type yago:Astronomer109818343
rdf:type foaf:Person
rdf:type 19th-centuryAmericanPeople
rdf:type 19th-centuryGermanPeople
Co-occurrence statistics (predicate, object):
fields Physics
field Physics
deathPlace United States
placeOfDeath United States
shortDescription Physicists
description Physicist
type Scientist
ethnicity Jewish
subject Einstein family
residence Switzerland
Conclusions
- Investigated multiple approaches to ranking DBpedia facts: information theory, statistical reasoning, random walks, and co-occurrence statistics in web documents
- The DBpedia knowledge base already provides enough information to improve the ranking of results
- Web-based co-occurrence statistics improve the results of property queries
- The annotated datasets are available at https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/