Exploring Product and Knowledge Graphs for Enhanced Information Retrieval

Dive into the world of product and knowledge graphs, uncovering the journey to a rich product graph, examples of knowledge graphs for songs, and the mission to provide comprehensive information on products and related knowledge. Discover use cases ranging from information provision to enhancing search and recommendation systems. Understand the differences and applications between knowledge graphs and product graphs for effective data organization and retrieval.


Uploaded on Mar 20, 2024


Presentation Transcript


  1. Zero to One Billion: The Journey to a Rich Product Graph. Xin Luna Dong, Amazon, 6/2021

  2. Knowledge Graph Example for 2 Songs [Figure: the songs "Shake It Off" and "Love Story" (entity type Recording) are linked to Taylor Swift (name Taylor Alison Swift, birth_date 12/13/1989) through relationships such as artist and song_writer, and to the genres Pop, Dance-pop, and Country pop; entities carry machine IDs such as mid127 and mid345.]
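
A knowledge graph like the one on this slide is, at bottom, a set of (subject, predicate, object) triples. The Python sketch below stores a few such triples and answers "which recordings have Taylor Swift as artist?" by indexing on both subject and object. The entity IDs (`e:taylor_swift` and so on) and the chosen edges are hypothetical simplifications of the figure, not the slide's actual `mid` assignments.

```python
from collections import defaultdict

# Hypothetical triples simplified from the slide's figure; the IDs are
# illustrative stand-ins for machine IDs such as mid127.
TRIPLES = [
    ("e:taylor_swift", "name", "Taylor Swift"),
    ("e:taylor_swift", "name", "Taylor Alison Swift"),
    ("e:taylor_swift", "birth_date", "12/13/1989"),
    ("e:shake_it_off", "name", "Shake It Off"),
    ("e:shake_it_off", "type", "Recording"),
    ("e:shake_it_off", "artist", "e:taylor_swift"),
    ("e:love_story", "name", "Love Story"),
    ("e:love_story", "type", "Recording"),
    ("e:love_story", "artist", "e:taylor_swift"),
]


class TripleStore:
    """Minimal in-memory triple store indexed by subject and by object."""

    def __init__(self, triples):
        self.by_subject = defaultdict(list)
        self.by_object = defaultdict(list)
        for s, p, o in triples:
            self.by_subject[s].append((p, o))
            self.by_object[o].append((s, p))

    def objects(self, subject, predicate):
        """All o such that (subject, predicate, o) is in the store."""
        return [o for p, o in self.by_subject[subject] if p == predicate]

    def subjects(self, predicate, obj):
        """All s such that (s, predicate, obj) is in the store."""
        return [s for s, p in self.by_object[obj] if p == predicate]


kg = TripleStore(TRIPLES)
songs = [kg.objects(s, "name")[0] for s in kg.subjects("artist", "e:taylor_swift")]
print(songs)  # ['Shake It Off', 'Love Story']
```

The two indexes are the minimum needed to follow edges in both directions; the outgoing `artist` edge from a song and the incoming one at the artist are the same triple.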

  3. Product Graph Example for 2 Products Amazon Confidential

  4. Product Graph Mission: To answer any question about products and related knowledge in the world

  5. Use Case I: Providing Information

  6. Use Case II: Providing Choices

  7. Use Case III: Improving Search

  8. Use Case III: Improving Search

  9. Use Case III: Improving Search

  10. Use Case IV: Improving Recommendation

  11. Product Graph vs. Knowledge Graph [Figure: three panels, (A), (B), and (C), each relating a generic KG to the product graph (PG).]

  12. Knowledge Graph vs. Product Graph [Figure: panels (A), (B), and (C) relating a generic KG, covering movies, music, books, etc., to the product graph (PG), covering hardline, softline, consumables, etc.]

  13. AutoKnow: Self-Driving Product Knowledge Collection [Figure: AutoKnow takes the catalog, a product-type taxonomy (Grocery → Drinks, Snacks; Snacks → Candy, Pretzels), and user logs, and produces a product KG in which each product has a type (hasType) and attributes such as Flavor and Color. Example: Product 1 (Snacks, flavor Cherry); Products 2 and 3 (Candy), with values such as Gold and Choc./Chocolate (synonyms), and some missing values ("?").] Dong et al., AutoKnow: Self-driving knowledge collection for products of thousands of types, SIGKDD 2020.
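
The product KG on this slide ties each product to a node in a type taxonomy via hasType and merges synonymous values. The sketch below shows only that data model, not AutoKnow's learned components: a child-to-parent taxonomy, a query that returns every product under a type (so Candy products also count as Snacks), and a synonym table that maps "Choc." to "Chocolate". All names and values are illustrative assumptions.

```python
# Hypothetical taxonomy read off the slide: child type -> parent type.
TAXONOMY = {
    "Drinks": "Grocery",
    "Snacks": "Grocery",
    "Candy": "Snacks",
    "Pretzels": "Snacks",
}

# Hypothetical synonym table: surface form -> canonical value.
SYNONYMS = {"Choc.": "Chocolate"}

# Illustrative product records; attribute values are not the slide's exact data.
PRODUCTS = {
    "Product 1": {"hasType": "Snacks", "flavor": "Cherry"},
    "Product 2": {"hasType": "Candy", "color": "Gold"},
    "Product 3": {"hasType": "Candy", "flavor": "Choc.", "color": "Gold"},
}


def ancestors(product_type):
    """Return the path from product_type up to the taxonomy root."""
    path = [product_type]
    while path[-1] in TAXONOMY:
        path.append(TAXONOMY[path[-1]])
    return path


def products_of_type(target_type):
    """Products typed as target_type or as any of its descendants."""
    return sorted(
        name for name, attrs in PRODUCTS.items()
        if target_type in ancestors(attrs["hasType"])
    )


def canonical(value):
    """Map a value to its canonical synonym, if one is known."""
    return SYNONYMS.get(value, value)


print(ancestors("Candy"))                          # ['Candy', 'Snacks', 'Grocery']
print(products_of_type("Snacks"))                  # ['Product 1', 'Product 2', 'Product 3']
print(canonical(PRODUCTS["Product 3"]["flavor"]))  # Chocolate
```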

  14. AutoKnow: Self-Driving Product Knowledge Collection [Figure: build on slide 13; AutoKnow adds a new type to the taxonomy, a corrected value, and a new value to the product KG.] Dong et al., SIGKDD 2020.

  15. AutoKnow: Self-Driving Product Knowledge Collection [Figure: build on slide 13, with results: 3X as many product types; defect rate reduced by up to 68 percentage points.] Dong et al., SIGKDD 2020.

  16. But, Is the Problem Harder? [Figure: generic KG (movie, music, book, etc.) vs. product graph (hardline, softline, consumables, etc.)]

  17. Challenges in Building a Product Graph I: Sparse and noisy structured data

  18. Challenges in Building a Product Graph II: Extremely complex domains. How to identify the millions of product types? How to organize types into a taxonomy tree? Buyers' view vs. sellers' view.

  19. Challenges in Building a Product Graph III: Big variety across product types. Different attributes apply to different product types, with different value vocabularies and different patterns.

  20. Challenges in Building a Product Graph III: Big variety across product types. Different attributes apply to different product types, with different value vocabularies and different patterns.
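
One concrete consequence of "different attributes apply to different product types" is that an attribute can be valid for one type and meaningless for another. The sketch below encodes that as a per-type schema and flags attributes that do not apply, such as a flavor on a baby blanket. The product types and attribute names are hypothetical, not Amazon's catalog schema.

```python
# Hypothetical schema: which attributes apply to which product type.
APPLICABLE_ATTRIBUTES = {
    "Candy": {"flavor", "color", "item_weight"},
    "Ground Coffee": {"flavor", "roast_level", "item_weight"},
    "Body Scrub": {"scent", "skin_type", "item_volume"},
    "Baby Blanket": {"color", "material", "size"},
}


def inapplicable_attributes(product_type, attributes):
    """Names in `attributes` that the schema does not allow for product_type.

    Unknown product types have no allowed attributes, so every name is flagged.
    """
    allowed = APPLICABLE_ATTRIBUTES.get(product_type, set())
    return sorted(set(attributes) - allowed)


blanket = {"flavor": "color 1", "color": "pink", "material": "coral fleece"}
print(inapplicable_attributes("Baby Blanket", blanket))  # ['flavor']
```

A fixed schema like this only catches attributes that are wrong for the type; values that are wrong for an applicable attribute need a learned validator.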

  21. Scale Up in 3 Dimensions: millions of categories, thousands of attributes, hundreds of languages. Big challenge: limited training labels for large-scale, rich data.

  22. A 100-Year Project

  23. Deliver the Data Business: 1,000,000,000

  24. Deliver the Data Business. 1: high-precision models

  25. Deliver the Data Business. 1: high-precision models; 1,000: E2E pipeline + AutoML to reduce modeling cost

  26. Deliver the Data Business. 1: high-precision models; 1,000: E2E pipeline + AutoML to reduce modeling cost; 1,000,000: scale-up across 1000s of categories, 100s of attributes, and 10s of languages to reduce #models
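
One way to read the 1,000,000 on this slide (our interpretation; the slide gives only orders of magnitude) is as the number of (category, attribute, language) combinations that a one-model-per-combination approach would have to cover:

```latex
N_{\text{models}} \approx \underbrace{10^{3}}_{\text{categories}} \times \underbrace{10^{2}}_{\text{attributes}} \times \underbrace{10^{1}}_{\text{languages}} = 10^{6}
```

Under this reading, a single model that serves many categories, attributes, or languages divides $N_{\text{models}}$ by the number of combinations it covers, which is the sense of "scale-up to reduce #models".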

  27. Deliver the Data Business. 1: high-precision models; 1,000: E2E pipeline + AutoML to reduce modeling cost; 1,000,000: scale-up across 1000s of categories, 100s of attributes, and 10s of languages to reduce #models; 1,000,000,000: higher yield from multi-modal models

  28. From 0 to 1: High-Quality Data

  29. From Zero to One: A Core Algorithm. 1: high-precision models

  30. OpenTag Extraction from Product Profiles. Zheng et al., OpenTag: Open attribute value extraction from product profiles, KDD 2018.

  31. OpenTag Extraction from Product Profiles
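
As we understand the OpenTag paper, it frames attribute-value extraction as sequence tagging: each title token gets a tag from {B, I, O, E} (beginning, inside, outside, end of a value), and values are read back from the tags. The sketch below performs only that read-back step. The tags are supplied by hand rather than predicted, and the sketch emits only spans that open with B and close with E; how single-token values are tagged is left out here.

```python
def decode_bioe(tokens, tags):
    """Return the attribute values marked by a BIOE tag sequence.

    Only spans of the form B (I)* E are emitted; a span that is
    interrupted by O or by a new B before reaching E is discarded.
    """
    values, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            current = [token]  # start a new span, dropping any unfinished one
        elif tag == "I" and current:
            current.append(token)
        elif tag == "E" and current:
            current.append(token)
            values.append(" ".join(current))
            current = []
        else:
            current = []  # "O", or an I/E with no open span
    return values


title = "Love of Candy Bulk Candy - Pink Mint Chocolate Lentils"
tokens = title.split()
tags = ["O"] * 7 + ["B", "E", "O"]  # hand-labeled: flavor = "Mint Chocolate"
print(decode_bioe(tokens, tags))  # ['Mint Chocolate']
```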

  32. OpenTag Extraction from Product Profiles [Figure: model architecture, bottom to top: word embedding → Bi-LSTM → attention → CRF.]
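
In a BiLSTM-CRF tagger like this one, the lower layers produce a score for every (token, tag) pair and the CRF adds scores for tag-to-tag transitions; at inference the best-scoring tag sequence is found with the Viterbi algorithm. The sketch below implements only that decoding step in plain Python. The emission and transition scores are made-up numbers, not learned weights, chosen so that picking each token's best tag independently would give the invalid sequence B, I, O.

```python
def viterbi(emissions, transitions, labels):
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions:   list with one {label: score} dict per token
    transitions: {(prev_label, label): score}; missing pairs score 0.0
    Returns (best_path, best_score).
    """
    best = {y: emissions[0][y] for y in labels}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for y in labels:
            # Best previous label to transition from into y.
            prev, score = max(
                ((p, best[p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            new_best[y] = score + scores[y]
            pointers[y] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(labels, key=lambda y: best[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path)), best[last]


LABELS = ["B", "I", "O", "E"]
# Made-up scores for the tokens "Mint Chocolate Lentils".
emissions = [
    {"B": 2.0, "I": 0.0, "O": 1.0, "E": 0.0},
    {"B": 0.0, "I": 1.0, "O": 0.5, "E": 0.9},
    {"B": 0.0, "I": 0.0, "O": 2.0, "E": 0.0},
]
# Penalize transitions that break the B (I)* E structure.
transitions = {("B", "O"): -5.0, ("I", "O"): -5.0, ("O", "I"): -5.0,
               ("O", "E"): -5.0, ("B", "E"): 0.5}
path, score = viterbi(emissions, transitions, LABELS)
print(path)  # ['B', 'E', 'O']
```

The transition penalties are what let the CRF override the locally better I at the second token: B, E, O scores about 5.4, while any path ending I, O pays the -5 penalty.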

  33. OpenTag Extraction from Product Profiles. Evaluated on unknown values and random values: extraction quality on new values is comparable to that on already known values, and BiLSTM+CRF+Attention obtains the best results. Zheng et al., KDD 2018.

  34. OpenTag in Practice: understand the domain and attributes, and generate LOTS OF training data; train and fine-tune OpenTag models; post-process extraction results to further improve data quality; run pre-publish evaluation as a gatekeeper to guarantee high-quality data.
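
The four practices on this slide can be read as stages of one release loop. The sketch below wires them together with hypothetical callables (none of these names come from the talk) and treats pre-publish evaluation as a hard gate: if the estimated precision on an audited sample falls below a threshold, nothing is published. The 0.9 threshold is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Extraction = Tuple[str, str, str]  # (product_id, attribute, value)
Model = Callable[[str], List[Extraction]]


@dataclass
class ExtractionPipeline:
    # Each stage is a hypothetical, pluggable callable.
    generate_training_data: Callable[[], list]
    train: Callable[[list], Model]
    postprocess: Callable[[List[Extraction]], List[Extraction]]
    estimate_precision: Callable[[List[Extraction]], float]
    min_precision: float = 0.9  # assumed publishing threshold

    def run(self, product_ids: Sequence[str]) -> List[Extraction]:
        model = self.train(self.generate_training_data())
        extracted = [e for pid in product_ids for e in model(pid)]
        cleaned = self.postprocess(extracted)
        # Gatekeeper: publish nothing unless the audit clears the bar.
        if self.estimate_precision(cleaned) < self.min_precision:
            return []
        return cleaned


toy = ExtractionPipeline(
    generate_training_data=lambda: [("p0", "flavor", "cherry")],
    train=lambda data: (lambda pid: [(pid, "flavor", " Cherry ")]),
    postprocess=lambda xs: [(p, a, v.strip().lower()) for p, a, v in xs],
    estimate_precision=lambda xs: 0.95,
)
print(toy.run(["p1", "p2"]))  # [('p1', 'flavor', 'cherry'), ('p2', 'flavor', 'cherry')]
```

Keeping each stage a separate callable is what makes the later automation steps possible: any one stage can be replaced (for example, manual labeling by automatic training-data generation) without touching the others.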

  35. Still the Origin Point: millions of categories, thousands of attributes, hundreds of languages

  36. From 1 to 1K: Reducing Modeling Cost

  37. From 1 to 1K: E2E AutoML Pipeline. 1: high-precision models; 1,000: E2E pipeline + AutoML to reduce modeling cost

  38. An End-to-End Pipeline: understand the domain and attributes, and generate LOTS OF training data; train and fine-tune OpenTag models; post-process extraction results to further improve data quality; run pre-publish evaluation as a gatekeeper to guarantee high-quality data.

  39. An End-to-End Pipeline: automatic training data generation (distant supervision, data programming); train and fine-tune OpenTag models; post-process extraction results to further improve data quality; pre-publish evaluation as a gatekeeper to guarantee high-quality data; benchmarking.
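
Distant supervision, one of the two training-data sources named on this slide, labels text automatically by matching it against values already known to be correct. The sketch below is a minimal version of that idea for tagging: it marks occurrences of known multi-token values in a title with BIOE tags, producing noisy training labels with no human annotation. The value list and the matching rules (case-insensitive, whitespace tokens, longest match first, no overlaps) are assumptions; data programming, the other technique named, is not shown.

```python
def distant_labels(title, known_values):
    """Weakly tag a title by matching known attribute values.

    Returns (tokens, tags) with one BIOE tag per whitespace token.
    Longer values are matched first and matches never overlap. Single-token
    values are skipped, since this sketch only emits B (I)* E spans.
    """
    tokens = title.split()
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    for value in sorted(known_values, key=lambda v: -len(v.split())):
        target = value.lower().split()
        n = len(target)
        if n < 2:
            continue
        for i in range(len(tokens) - n + 1):
            if lowered[i:i + n] == target and tags[i:i + n] == ["O"] * n:
                tags[i] = "B"
                tags[i + 1:i + n - 1] = ["I"] * (n - 2)
                tags[i + n - 1] = "E"
    return tokens, tags


# Hypothetical known Flavor values, e.g. taken from existing catalog entries.
known_flavors = {"mint chocolate", "dark chocolate", "cherry"}
tokens, tags = distant_labels(
    "Love of Candy Bulk Candy - Pink Mint Chocolate Lentils", known_flavors)
print(list(zip(tokens, tags))[7:9])  # [('Mint', 'B'), ('Chocolate', 'E')]
```

Labels produced this way are noisy (a known value can appear in a title without being that product's attribute), which is why the pipeline pairs automatic training data with cleaning and evaluation stages.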

  40. An End-to-End Pipeline: automatic training data generation (distant supervision, data programming); train and fine-tune OpenTag models; deep-learning data cleaning (post-process extraction results to further improve data quality); pre-publish evaluation as a gatekeeper to guarantee high-quality data; benchmarking.

  41. An End-to-End Pipeline: automatic training data generation (distant supervision, data programming); train and fine-tune OpenTag models; deep-learning data cleaning (post-process extraction results to further improve data quality); scale-up pre-publish evaluation with lower labeling needs; benchmarking.
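
The slide does not say how evaluation is scaled with fewer labels. A standard tool for reasoning about how many audited labels a precision claim needs is a binomial confidence interval; the sketch below uses the Wilson score interval (our choice, not necessarily the talk's) and publishes only when the interval's lower bound clears a threshold. The 0.9 threshold is hypothetical.

```python
import math


def wilson_lower_bound(correct, n, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion.

    correct: audited extractions judged correct; n: audited sample size;
    z=1.96 gives a two-sided ~95% interval.
    """
    if n == 0:
        return 0.0
    p_hat = correct / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half_width


THRESHOLD = 0.9  # hypothetical publishing bar
for correct, n in [(95, 100), (380, 400)]:
    lb = wilson_lower_bound(correct, n)
    print(n, round(lb, 3), lb >= THRESHOLD)
# 100 0.888 False
# 400 0.924 True
```

The same observed precision of 95% fails the bar with 100 labels but passes with 400, so reducing labeling needs amounts to getting a tight enough bound from fewer audited samples.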

  42. An End-to-End Pipeline: automatic training data generation (distant supervision, data programming); AutoML (replacing manual training and fine-tuning of OpenTag models); deep-learning data cleaning; scale-up pre-publish evaluation with lower labeling needs; benchmarking.

  43. Transformer-Based Anomaly Detection. Wang et al., Automatic validation of textual attribute values in eCommerce Catalog by learning with limited labeled data, KDD 2020.

  44. Transformer-Based Anomaly Detection: Is the flavor "Pink"? Wang et al., KDD 2020.

  45. Transformer-Based Anomaly Detection. Wang et al., KDD 2020.

  46. Transformer-Based Anomaly Detection: category as input for model training. Wang et al., KDD 2020.
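
The slide feeds the product category into the model alongside the attribute and value, so one Transformer can judge values across many product types. The sketch below shows one way to lay out such inputs, plus one assumed way to get training signal with few labels: treat catalog values as positives and create negatives by moving a value into the wrong attribute slot (the "Pink as flavor" pattern). The text format, field names, and corruption scheme are assumptions for illustration, not the method of Wang et al.; the classifier itself is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    text: str   # model input: category, title, attribute, candidate value
    label: int  # 1 = value taken from the catalog, 0 = synthetic anomaly


def make_input(category, title, attribute, value):
    # Putting the category in the input lets one model condition on product type.
    return f"category: {category} | title: {title} | {attribute}: {value}"


def build_examples(records, seed=0):
    """Pair each catalog value with one corrupted copy of the same record.

    A corrupted copy borrows a value from a record with a different attribute,
    e.g. a color word placed in the flavor slot.
    """
    rng = random.Random(seed)
    examples = []
    for r in records:
        examples.append(Example(make_input(r["category"], r["title"], r["attribute"], r["value"]), 1))
        donors = [d["value"] for d in records
                  if d["attribute"] != r["attribute"] and d["value"] != r["value"]]
        if donors:
            fake = rng.choice(donors)
            examples.append(Example(make_input(r["category"], r["title"], r["attribute"], fake), 0))
    return examples


# Hypothetical catalog records.
records = [
    {"category": "Candy", "title": "Pink Mint Chocolate Lentils", "attribute": "flavor", "value": "mint chocolate"},
    {"category": "Candy", "title": "Pink Mint Chocolate Lentils", "attribute": "color", "value": "pink"},
    {"category": "Coffee", "title": "Ground Coffee, Medium Roast", "attribute": "roast", "value": "medium roast"},
]
examples = build_examples(records)
print(examples[0].text)  # category: Candy | title: Pink Mint Chocolate Lentils | flavor: mint chocolate
```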

  47. Transformer-Based Anomaly Detection: Identify 1.77MM incorrect values for Flavor and Scent for Consumables with 90% precision. Examples:

| Product | Attr | Value |
|---|---|---|
| Love of Candy Bulk Candy - Pink Mint Chocolate Lentils - 6lb Bag | Flavor | Pink |
| Scott's Cakes Dark Chocolate Fruit & Nut Cream Filling Candies with Burgandy Foils in a 1 Pound Snowflake Box | Flavor | 1 lb. snowflake box |
| Lucky Baby - Baby Blanket Envelope Swaddle Winter Wrap Coral Fleece Newborn Blanket Sleeper Infant Stroller Wrap Toddlers Baby Sleeping Bag (color 1) | Flavor | color 1 |
| ASUTRA Himalayan Sea Salt Body Scrub Exfoliator + Body Brush (Vitamin C), 12 oz \| Ultra Hydrating, Gentle, Moisturizing \| All Natural & Organic Jojoba, Sweet Almond, Argan Oils | Scent | vitamin c body scrub - 12oz & body brush |
| Folgers Simply Smooth Ground Coffee, 2 Count (Medium Roast), 31.1 Ounce | Scent | 2Packages (Breakfast Blend, 31.1 oz) |

  48. From 1K to 1M: Scaling Up

  49. From 1K to 1M: One Size Fits All. 1: high-precision models; 1,000: E2E pipeline + AutoML to reduce modeling cost; 1,000,000: scale-up across 1000s of categories, 100s of attributes, and 10s of languages to reduce #models

  50. Scale Up for Millions of Categories: millions of categories, thousands of attributes, hundreds of languages
