Classifying Entities into an Incomplete Ontology: Exploratory EM Approach


The research discusses methods for hierarchical classification of entities into incomplete ontologies. It explores the challenges of evolving web-scale datasets and the need for classifying entities in an incomplete ontology structure. The Hierarchical Exploratory EM model is detailed, providing insights into initializing models, prediction steps, creating new classes, and updating model parameters. By iteratively discovering new classes and adhering to class constraints, the model assists in labeling all data points effectively.


Uploaded on Sep 24, 2024 | 0 Views



Presentation Transcript


  1. CLASSIFYING ENTITIES INTO AN INCOMPLETE ONTOLOGY. Bhavana Dalvi, William W. Cohen, Jamie Callan. School of Computer Science, Carnegie Mellon University.

  2. Motivation. Existing techniques: semi-supervised hierarchical classification (Carlson, WSDM 10); extending knowledge bases by finding new relations or attributes of existing concepts (Mohamed et al., EMNLP 11); unsupervised ontology discovery (Adams et al., NIPS 10; Blei et al., JACM 10; Reisinger et al., ACL 09). Evolving Web-scale datasets contain billions of entities and hundreds of thousands of concepts, so it is difficult to create a complete ontology. Hierarchical classification of entities into incomplete ontologies is needed.

  3. Contributions. Hierarchical Exploratory EM adds new instances to the existing classes, discovers new classes, and adds them at appropriate places in the ontology. Class constraints: Inclusion: every entity that is a Mammal is also an Animal. Mutual exclusion: if an entity is an Electronic Device, then it is not a Mammal.
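
The two constraint types can be checked mechanically on a set of assigned labels. The following is a minimal sketch, assuming a hypothetical representation in which inclusion is stored as a child-to-parent map and mutual exclusion as a list of pairs; the names `SUBSET_OF`, `MUTEX`, and `is_consistent` are illustrative and do not come from the slides.

```python
# Hypothetical toy constraints; class names follow the slide's examples.
SUBSET_OF = {"Mammal": "Animal"}            # inclusion: child -> parent
MUTEX = [("ElectronicDevice", "Mammal")]    # mutual exclusion pairs


def is_consistent(labels, subset_of=SUBSET_OF, mutex=MUTEX):
    """Return True if the assigned class names violate no constraint."""
    labels = set(labels)
    # Inclusion: an assigned class must come together with its parent.
    for child, parent in subset_of.items():
        if child in labels and parent not in labels:
            return False
    # Mutual exclusion: two exclusive classes may not both be assigned.
    for a, b in mutex:
        if a in labels and b in labels:
            return False
    return True
```

For example, `{"Mammal", "Animal"}` passes the check, while `{"Mammal"}` alone fails the inclusion constraint.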

  4. Problem Definition. Input: a large set of data-points $X_1, \dots, X_n$; some known classes $C_1, \dots, C_k$; class constraints $Z_k$ between the $k$ classes; a small number of seeds per known class, $|\text{seeds}| \ll n$. Output: labels for all data-points $X_i$; new classes discovered from the data, $C_{k+1}, \dots, C_{k+m}$ (giving $k+m$ classes in total); updated class constraints $Z_{k+m}$ that extend $Z_k$.

  5. Review: Exploratory EM [Dalvi et al. ECML 2013]. Initialize the model with a few seeds per class, then iterate till convergence (data likelihood and # classes). E step: predict labels for unlabeled points; if $P(C_j \mid X_i)$, $j = 1 \dots k$, is nearly uniform for a data-point $X_i$, create a new class $C_{k+1}$ and assign $X_i$ to it. M step: recompute model parameters using seeds + predicted labels for unlabeled points; the number of classes might increase in each iteration. Check whether the model selection criterion is satisfied; if not, revert to the model from iteration $t-1$. Options named on the slide: classification/clustering with KMeans, NBayes, or VMF; near-uniformity via Max/Min ratio or JS divergence; model selection via AIC, BIC, or AICc.
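
The outer loop described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the three hooks `e_step`, `m_step`, and `criterion` are hypothetical callables supplied by the caller, the criterion is treated as "lower is better" (as with AIC, BIC, or AICc), and the sketch simply stops the first time a candidate model fails the criterion, which is a simplification of the convergence test on the slide.

```python
def exploratory_em(model, e_step, m_step, criterion, max_iters=20):
    """Generic Exploratory-EM-style outer loop (illustrative only).

    e_step(model)          -> labels; may introduce new class ids
    m_step(model, labels)  -> candidate model re-estimated from the labels
    criterion(model)       -> float score, lower is better
    """
    best_score = criterion(model)
    for _ in range(max_iters):
        labels = e_step(model)               # predict labels, maybe add classes
        candidate = m_step(model, labels)    # recompute parameters
        score = criterion(candidate)
        if score >= best_score:
            # Criterion not satisfied: keep the model from iteration t-1.
            break
        model, best_score = candidate, score
    return model
```

Passing the hooks in as arguments keeps the loop independent of the underlying model (KMeans, Naive Bayes, or von Mises-Fisher in the slide's list).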

  6. Hierarchical Exploratory EM. Initialize the model with a few seeds per class, then iterate till convergence (data likelihood and # classes). E step: predict labels for unlabeled points by assigning a consistent bit vector of labels to each unlabeled data-point; if $P(C_j \mid X_i)$ is nearly uniform for a data-point $X_i$, create a new class $C_{k+1}$, assign $X_i$ to it, and update the class constraints accordingly. M step: recompute model parameters using seeds + predicted labels for unlabeled points; the number of classes might increase in each iteration. Since the E step follows the class constraints, this step need not be modified. Check whether the model selection criterion is satisfied; if not, revert to the model from iteration $t-1$.

  7. Divide-And-Conquer Exploratory EM. Assumptions: classes are arranged in a tree-structured hierarchy, and classes at any level of the hierarchy are mutually exclusive. [Figure: example ontology. Level 1: Root. Level 2: Location, Food. Level 3: Country, State (under Location); Condiment, Vegetable (under Food). Example entities: Spinach, Potato, Pepper. The diagram is annotated with Inclusion and Mutual Exclusion.]

  8. Divide-And-Conquer Exploratory EM. [Figure: the entity California enters the ontology at Root with score 1.0; below Root are Location (Country, State) and Food (Condiment, Vegetable).]

  9. Divide-And-Conquer Exploratory EM. [Figure: California at level 2: Location 0.9, Food 0.1.]

  10. Divide-And-Conquer Exploratory EM. [Figure: California at level 3: the children of Location (Country, State) score 0.8 and 0.2. Bit vector of labels shown: 1 0 0 1 1 0 0.]
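
The top-down walk in the California example can be sketched as below. This is a minimal illustration, not the authors' implementation: the dictionary layout, the function names, and the bit order in `CLASS_ORDER` are assumptions, and because the transcript does not preserve which child of Location received 0.8, the sketch assumes it was State.

```python
# Toy ontology from the slides; the dictionary layout is an assumption.
CHILDREN = {
    "Root": ["Location", "Food"],
    "Location": ["Country", "State"],
    "Food": ["Condiment", "Vegetable"],
}
# Assumed bit order; the slides do not state the order they use.
CLASS_ORDER = ["Root", "Location", "Food", "Country", "State",
               "Condiment", "Vegetable"]


def assign_top_down(scores, children=CHILDREN, root="Root"):
    """Follow the highest-scoring child at each level; return the class path."""
    path = [root]
    node = root
    while node in children:
        node = max(children[node], key=lambda c: scores[c])
        path.append(node)
    return path


def to_bit_vector(path, order=CLASS_ORDER):
    """One bit per class; every ancestor of the chosen leaf is set."""
    return [1 if c in path else 0 for c in order]


# Scores for "California" (State = 0.8 is an assumption, see above).
california = {"Location": 0.9, "Food": 0.1, "Country": 0.2, "State": 0.8}
path = assign_top_down(california)
bits = to_bit_vector(path)
```

Because only one child is followed at each level and all ancestors are set, the resulting bit vector satisfies both inclusion and mutual exclusion by construction.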

  11. Divide-And-Conquer Exploratory EM. [Figure: the entity Coke enters the ontology at Root with score 1.0.]

  12. Divide-And-Conquer Exploratory EM. [Figure: Coke at level 2: Location 0.1, Food 0.9.]

  13. Divide-And-Conquer Exploratory EM. [Figure: Coke at level 3: the children of Food (Condiment, Vegetable) score 0.55 and 0.45, which is nearly uniform.]

  14. Divide-And-Conquer Exploratory EM. [Figure: a new class C8 is created under Food and Coke is assigned to it. Bit vector of labels shown: 1 1 0 0 0 0 0 1.]

  15. Divide-And-Conquer Exploratory EM. [Figure: same state as the previous slide. Adds to class constraints: $C_8 \subseteq \text{Food}$, $C_8 \cap \text{Condiment} = \emptyset$. Bit vector of labels shown: 1 1 1 0 0 0 0 0.]

  16. Divide-And-Conquer Exploratory EM. [Figure: the entity Cat scores Location 0.45 and Food 0.55 at level 2, which is nearly uniform, so a new class C9 is created under Root and Cat is assigned to it. Adds to class constraints: $C_9 \subseteq \text{Root}$, $C_9 \cap \text{Food} = \emptyset$. Bit vector of labels shown: 0 0 0 1 0 0 0 0 1.]
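
The Coke and Cat examples can be sketched as a top-down walk that opens a new class whenever the children of the current node score nearly uniformly. This is an illustrative sketch under several assumptions: the function name and data layout are hypothetical, near-uniformity is tested with the max/min ratio below 2, and the new class is declared disjoint from every existing sibling (the slides show only one disjointness constraint per example).

```python
def descend_or_create(scores, children, new_name, root="Root", ratio=2.0):
    """Walk down the tree; create `new_name` where child scores are near-uniform.

    Returns (path, constraints). `children` is updated in place when a class
    is created. Constraints are tuples: ("subset", child, parent) or
    ("disjoint", a, b).
    """
    path = [root]
    node = root
    while children.get(node):
        kids = children[node]
        probs = [scores[c] for c in kids]
        if max(probs) / max(min(probs), 1e-12) < ratio:
            # Nearly uniform: no existing child fits, so open a new class here.
            children[node] = kids + [new_name]
            path.append(new_name)
            constraints = [("subset", new_name, node)]
            constraints += [("disjoint", new_name, s) for s in kids]
            return path, constraints
        node = max(kids, key=lambda c: scores[c])
        path.append(node)
    return path, []


# Toy ontology from the slides (layout assumed).
ontology = {
    "Root": ["Location", "Food"],
    "Location": ["Country", "State"],
    "Food": ["Condiment", "Vegetable"],
}
coke = {"Location": 0.1, "Food": 0.9, "Condiment": 0.55, "Vegetable": 0.45}
coke_path, coke_constraints = descend_or_create(coke, ontology, "C8")
```

With these scores the walk descends to Food (0.9 against 0.1 is far from uniform) and then creates C8 there, since 0.55 against 0.45 gives a ratio of about 1.2.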

  17. What are we trying to optimize? Objective function: maximize { log data likelihood - model penalty } over m (the number of clusters) and the parameters of $\{C_1, \dots, C_m\}$, subject to the class constraints $Z_m$.

  18. Datasets. ClueWeb09 corpus + subsets of NELL. [Figure: Ontology 1 and Ontology 2.]
      Dataset  #Classes  #Levels  #NELL entities  #Contexts
      DS-1     11        3        2.5K            3.4M
      DS-2     39        4        12.9K           6.7M

  19. Results
      Dataset  #Train/Test points
      DS-1     335/2.2K
      DS-2     1.5K/11.4K

  20. Results
      Dataset  #Train/Test points  Level  #Seed/#Ideal classes
      DS-1     335/2.2K            2      2/3
      DS-1     335/2.2K            3      4/7
      DS-2     1.5K/11.4K          2      3.9/4
      DS-2     1.5K/11.4K          3      9.4/24
      DS-2     1.5K/11.4K          4      2.4/10

  21. Results. Macro-averaged seed class F1, FLAT setting.
      Dataset  #Train/Test points  Level  #Seed/#Ideal classes  SemisupEM  ExploratoryEM
      DS-1     335/2.2K            2      2/3                   43.2       78.7 *
      DS-1     335/2.2K            3      4/7                   34.4       42.6 *
      DS-2     1.5K/11.4K          2      3.9/4                 64.3       53.40
      DS-2     1.5K/11.4K          3      9.4/24                31.3       33.7 *
      DS-2     1.5K/11.4K          4      2.4/10                27.5       38.9 *

  22. Results. Macro-averaged seed class F1, FLAT and DAC settings.
      Dataset  #Train/Test points  Level  #Seed/#Ideal classes  FLAT SemisupEM  FLAT ExploratoryEM  DAC SemisupEM  DAC ExploratoryEM
      DS-1     335/2.2K            2      2/3                   43.2            78.7 *              69.5           77.2 *
      DS-1     335/2.2K            3      4/7                   34.4            42.6 *              31.3           44.4 *
      DS-2     1.5K/11.4K          2      3.9/4                 64.3            53.40               65.4           68.9 *
      DS-2     1.5K/11.4K          3      9.4/24                31.3            33.7 *              34.9           41.7 *
      DS-2     1.5K/11.4K          4      2.4/10                27.5            38.9 *              43.2           42.40
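
The metric reported above, macro-averaged F1 over seed classes, averages per-class F1 so that each class counts equally regardless of size. A minimal sketch follows, assuming gold and predicted labels are given as parallel lists and that only the classes passed in `classes` (for instance the seed classes) are averaged. The slides do not say how points assigned to newly created classes are scored, so that detail is left to the caller.

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 over the given classes."""
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```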

  23. Conclusions. Hierarchical Exploratory EM works with an incomplete class hierarchy and a few seed instances to extend the existing knowledge base. Encouraging preliminary results: hierarchical classification > flat classification; exploratory learning > semi-supervised learning. Future work: incorporate arbitrary class constraints; evaluate the newly added clusters.

  24. Thank You Questions?

  25. Extra Slides

  26. Class Creation Criterion. Given $P(C_j \mid X_i)$, $j = 1 \dots k$, and $\text{Uniform} = [1/k, \dots, 1/k]$. MinMax ratio: $\max_j P(C_j \mid X_i) / \min_j P(C_j \mid X_i) < 2$. Jensen-Shannon divergence: $\text{JS-Div}(P(C_j \mid X_i), \text{Uniform}) < 1/k$.
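
Both near-uniformity tests are easy to state in code. The sketch below is illustrative: the function names are hypothetical, the posterior is passed as a plain list of probabilities, and the natural logarithm is assumed for the Jensen-Shannon divergence since the slide does not state the base.

```python
import math


def minmax_nearly_uniform(posterior, threshold=2.0):
    """MinMax criterion: max probability / min probability below the threshold."""
    return max(posterior) / max(min(posterior), 1e-12) < threshold


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions (natural log)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with zero probability contribute nothing to the sum.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_nearly_uniform(posterior):
    """JS criterion: divergence from the uniform distribution below 1/k."""
    k = len(posterior)
    uniform = [1.0 / k] * k
    return js_divergence(posterior, uniform) < 1.0 / k
```

For the Coke example, `minmax_nearly_uniform([0.55, 0.45])` holds (ratio about 1.2), while `[0.9, 0.1]` does not (ratio 9).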

  27. Model Selection. Extended Akaike Information Criterion: AICc(g) = -2*L(g) + 2*v + 2*v*(v+1)/(n - v - 1). Here g: model being evaluated, L(g): log-likelihood of data given g, v: number of free parameters of the model, n: number of data-points.
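
The AICc formula translates directly into a small helper. This is a sketch with a hypothetical function name; the guard for n - v - 1 <= 0 is an added assumption, since the correction term is undefined there.

```python
def aicc(log_likelihood, num_params, num_points):
    """Corrected AIC: -2*L + 2*v + 2*v*(v+1)/(n - v - 1); lower is better."""
    v, n = num_params, num_points
    if n - v - 1 <= 0:
        raise ValueError("AICc requires n > v + 1")
    return -2.0 * log_likelihood + 2.0 * v + 2.0 * v * (v + 1) / (n - v - 1)
```

Since v grows with the number of classes, adding a class is accepted only when the gain in log-likelihood outweighs the larger penalty.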
