Product and Knowledge Graphs for Enhanced Information Retrieval

Zero to One Billion: The Journey to a Rich Product Graph
 
XIN LUNA DONG
AMAZON
6/2021
 
Knowledge Graph Example for 2 Songs
[Figure: a knowledge graph of two recordings. Entities: mid345, mid346, mid127, mid128, mid129. Entity types: Recording, Genre. Relationships: type, name, artist, song_writer, genre, birth_date. Values: “Shake it off”, “Love Story”, “Taylor Swift”, “Taylor Alison Swift”, “Dance-pop”, “Country pop”, “Pop”, 12/13/1989.]
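A knowledge graph like the one in this example is commonly stored as (subject, predicate, object) triples. The sketch below is only an illustration: which mid identifier carries which name is an assumption read off the flattened figure text, and the helper names `build_index` and `songs_by_artist` are hypothetical.

```python
from collections import defaultdict

# (subject, predicate, object) triples. The mapping of mid identifiers to
# names is ASSUMED from the figure labels, not confirmed by the slides.
TRIPLES = [
    ("mid345", "type", "Recording"),
    ("mid345", "name", "Shake it off"),
    ("mid345", "artist", "mid128"),
    ("mid345", "genre", "mid127"),
    ("mid346", "type", "Recording"),
    ("mid346", "name", "Love Story"),
    ("mid346", "artist", "mid128"),
    ("mid346", "genre", "mid129"),
    ("mid128", "name", "Taylor Swift"),
    ("mid128", "name", "Taylor Alison Swift"),
    ("mid128", "birth_date", "12/13/1989"),
    ("mid127", "type", "Genre"),
    ("mid127", "name", "Dance-pop"),
    ("mid129", "type", "Genre"),
    ("mid129", "name", "Country pop"),
]


def build_index(triples):
    """Group objects by subject and predicate for quick lookup."""
    index = defaultdict(lambda: defaultdict(list))
    for subj, pred, obj in triples:
        index[subj][pred].append(obj)
    return index


def songs_by_artist(index, artist_name):
    """Return the names of recordings whose artist entity has the given name."""
    result = []
    for subj, preds in index.items():
        if "Recording" not in preds.get("type", []):
            continue
        for artist_id in preds.get("artist", []):
            if artist_name in index.get(artist_id, {}).get("name", []):
                result.extend(preds.get("name", []))
    return sorted(result)


if __name__ == "__main__":
    print(songs_by_artist(build_index(TRIPLES), "Taylor Swift"))
```

Because the artist entity holds two name values, a lookup by either “Taylor Swift” or “Taylor Alison Swift” reaches the same recordings, which is the point of keeping names as attributes of an entity rather than as keys.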
 
Product Graph Example for 2 Products
 
Amazon Confidential
 
Product Graph
 
Mission: To answer any question about products and related knowledge in
the world
 
Use Case I: Providing Information
 
Use Case II: Providing Choices
 
Use Case III: Improving Search
 
Use Case IV: Improving Recommendation
 
Product Graph vs. Knowledge Graph
[Figure: three diagrams, (A), (B), and (C), each relating a Generic KG and the PG.]
Generic KG: Movie, Music, Book, etc.
Product Graph (PG): Hardline, softline, consumables, etc.
 
AutoKnow: Self-Driving Product Knowledge Collection
[Figure: AutoKnow takes the Catalog, the Taxonomy (Grocery: Snacks, Drinks, Candy), and User logs as input and outputs a Product KG (Grocery: Snacks, Drinks, Candy, Pretzels). In the Product KG, Prod. 1, Prod. 2, and Prod. 3 carry hasType, flavor, and color edges to values such as Choc. and Gold, and Choc. is linked to Chocolate by a synonym edge.]
Dong et al., AutoKnow: Self-driving knowledge collection for products of thousands of types, SigKDD, 2020.
 
[Figure: the AutoKnow diagram repeated with callouts marking a New Type, a New Value, and a Corrected Value.]
 
AutoKnow results:
● #Types ↑ 3X
● Defect rate ↓ by up to 68 percentage points
But, Is The Problem Harder?
 
Challenges in Building Product Graph I
Sparse and noisy structured data

Challenges in Building Product Graph II
Extremely complex domains
How to identify the millions of product types?
How to organize types into a taxonomy tree?
Sellers’ view vs. buyers’ view

Challenges in Building Product Graph III
Big variety across product types
Different attributes apply to different product types
Different value vocabularies and different patterns
Scale Up in 3 Dimensions
Millions of categories, thousands of attributes, hundreds of languages

Big challenge: Limited training labels for large-scale, rich data
 
A 100-Year Project
 
Deliver the Data Business
1: High precision models
1,000: E2E pipeline + AutoML to reduce modeling cost
1,000,000: Scale-up to reduce #models (1000s categories, 100s attributes, 10s languages)
1,000,000,000: Higher yield from multi-modal models
From 0 to 1: High-Quality Data
 
From Zero to One: A Core Algorithm
 
High precision models
OpenTag Extraction from Product Profiles
Zheng et al., OpenTag: Open attribute value extraction from product profiles, KDD 2018.
 
[Figure: model layers. Word Embedding, Bi-LSTM, Attention, CRF.]

[Figure: results on unknown values and on random values.]
BiLSTM+CRF+Attention obtains best results
Extraction on new values is comparable to already known values
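Attribute value extraction of this kind is usually framed as sequence tagging: the model assigns one tag per token and the values are read back from the tag sequence. The sketch below shows only that read-back step, assuming a B/I/O/E tag scheme (begin, inside, outside, end); the function name `decode_bioe`, the example title, and the tags are hypothetical, and in practice the tags would come from the BiLSTM+CRF+Attention model rather than being written by hand.

```python
def decode_bioe(tokens, tags):
    """Collect attribute values from a B/I/O/E tag sequence.

    A single-token value is tagged "B" alone; a multi-token value
    starts with "B", continues with "I", and closes with "E".
    """
    values, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:  # previous value ended without an explicit "E"
                values.append(" ".join(current))
            current = [token]
        elif tag in ("I", "E") and current:
            current.append(token)
            if tag == "E":
                values.append(" ".join(current))
                current = []
        else:  # "O", or a stray "I"/"E" with no open value
            if current:
                values.append(" ".join(current))
            current = []
    if current:
        values.append(" ".join(current))
    return values


if __name__ == "__main__":
    # Hypothetical product title and hand-written tags for the attribute "flavor".
    tokens = ["beef", "and", "ranch", "raised", "lamb", "flavor", "dog", "food"]
    tags = ["B", "O", "B", "I", "E", "O", "O", "O"]
    print(decode_bioe(tokens, tags))
```

Because the tagger predicts positions rather than choosing from a fixed vocabulary, a value never seen in training can still be extracted, which matches the slide's observation about new values.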
 
OpenTag in Practice
Understand domain and attributes, and generate LOTS OF training data
Train and fine-tune models
Postprocess extraction results to further improve data quality
Pre-publish evaluation as gatekeeper to guarantee high quality data
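One way to picture the pre-publish gatekeeper is as a check on a manually labeled sample of extractions: publish only if the sample supports a minimum precision. The sketch below is an assumption about how such a gate could look, not the process described in the talk. It uses the Wilson score lower bound so that small samples are treated conservatively; the 0.9 threshold and the names `wilson_lower_bound` and `may_publish` are hypothetical.

```python
import math


def wilson_lower_bound(correct, total, z=1.96):
    """Lower end of the Wilson score interval for a proportion (z=1.96 ~ 95%)."""
    if total == 0:
        return 0.0
    p = correct / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)


def may_publish(correct, total, min_precision=0.9):
    """Gate: publish only if the conservative precision estimate clears the bar."""
    return wilson_lower_bound(correct, total) >= min_precision


if __name__ == "__main__":
    # 95 correct out of 100 labeled samples: observed precision 0.95,
    # but the lower bound falls below 0.9, so the gate stays closed.
    print(may_publish(95, 100))
    # 990 correct out of 1000: the larger sample clears the gate.
    print(may_publish(990, 1000))
```

The design choice here is to gate on a lower confidence bound rather than the raw sample precision, so a lucky small sample does not let low-quality data through.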
 
Still the Origin Point
From 1 to 1K: Reducing Modeling Cost
 
From 1 to 1K: E2E AutoML Pipeline
 
High precision models
E2E pipeline + AutoML to reduce modeling cost
 
An End-to-End Pipeline
Each manual step of the OpenTag workflow is automated in turn:
Understand domain and attributes, and generate LOTS OF training data -> Automatic Training Data Generation (distant supervision, data programming)
Train and fine-tune models -> AutoML
Postprocess extraction results to further improve data quality -> Deep Learning Data Cleaning
Pre-publish evaluation as gatekeeper to guarantee high quality data -> Scale-up pre-publish evaluation w. lower labeling needs
Benchmarking
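Distant supervision, one of the techniques named for automatic training data generation, can be sketched as follows: attribute values already present in the catalog are matched against product titles, and the matches become (noisy) token-level training labels. This is a minimal illustration under assumed inputs; the function name `distant_labels`, the B/I/O/E tag scheme, and the example title and values are hypothetical.

```python
def distant_labels(tokens, known_values):
    """Tag title tokens by exact, case-insensitive match against known values.

    Returns one B/I/O/E tag per token. Longer values are matched first so
    that "dark chocolate" wins over "chocolate" on the same tokens.
    """
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for value in sorted(known_values, key=lambda v: -len(v.split())):
        words = value.lower().split()
        n = len(words)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            if lowered[start:start + n] != words:
                continue
            if any(tags[i] != "O" for i in span):
                continue  # already covered by a longer value
            tags[start] = "B"
            for i in span[1:-1]:
                tags[i] = "I"
            if n > 1:
                tags[start + n - 1] = "E"
    return tags


if __name__ == "__main__":
    # Hypothetical title and catalog flavor values.
    title = "Organic Dark Chocolate Bar with Mint".split()
    flavors = ["chocolate", "dark chocolate", "mint"]
    print(list(zip(title, distant_labels(title, flavors))))
```

Labels produced this way are noisy (a catalog value can be wrong, or a matching word can mean something else in the title), which is why the pipeline pairs them with data cleaning and a pre-publish evaluation step.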
 
Transformer-Based Anomaly Detection
Wang et al., Automatic validation of textual attribute values in eCommerce Catalog by learning with limited labeled data, KDD’20

Is the flavor “Pink”?
Category as input for model training
Identify 1.77MM incorrect values for Flavor and Scent for Consumables with 90% precision
From 1K to 1M: Scaling Up
 
From 1K to 1M: One Size Fits All
 
High precision models
E2E pipeline + AutoML to reduce modeling cost
Scale-up to reduce #models
 
Scale up for Millions of Categories
 
Scale-up I: Millions of Categories
Karamanolakis et al., TXtract: Taxonomy-aware knowledge extraction for thousands of product categories, ACL 2020.

Option 1. Train a single model?
Train/Test Distribution shift -> Invalid predictions

Option 2. Train a model for each category?
Most categories are very sparse
Store/orchestrate 100K+ OpenTag models

TXtract: Attention conditioned on category representation

Train one model on 4K categories, and improve state-of-the-art by 10.4% in F1, and by 11.7% in coverage
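The idea of attention conditioned on a category representation can be illustrated with a simplified scaled dot-product version: each token is weighted by how well its vector aligns with the category embedding, so the same title is read differently under different categories. This is a sketch under stated assumptions, not the formulation in the TXtract paper; the function names, the toy 2-dimensional vectors, and the use of plain dot-product scoring are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def category_conditioned_attention(token_vecs, category_vec):
    """Weight tokens by similarity to the category embedding.

    Returns (weights, context), where context is the weighted sum of the
    token vectors. Assumes all vectors share one dimensionality.
    """
    scale = math.sqrt(len(category_vec))
    weights = softmax([dot(h, category_vec) / scale for h in token_vecs])
    dim = len(token_vecs[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, token_vecs)) for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy token vectors and two toy category embeddings (all hypothetical).
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    for name, cat in (("category A", [2.0, 0.0]), ("category B", [0.0, 2.0])):
        weights, _ = category_conditioned_attention(tokens, cat)
        print(name, [round(w, 3) for w in weights])
```

With one shared set of model parameters, switching the category embedding shifts which tokens receive attention, which is what lets a single model serve many categories instead of training one model per category.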
 
Scale up for Thousands of Attributes
 
Scale-up II: Thousands of Attributes
Yan et al., AdaTag: Multi-Attribute Value Extraction from Product Profiles with Adaptive Decoding, ACL 2021.

Train one model for 32 attributes, obtaining higher quality than single-attribute models
One model for multiple attributes
Higher F1 than one model per attribute
From 1M to 1B: Increasing the Yield
 
From 1M to 1B: Multi-Modal Extraction
 
High precision models
E2E pipeline + AutoML to reduce modeling cost
Scale-up to reduce #models
Higher yield from multi-modal models
 
Multi-Modal Signals
Lin et al., PAM: Understanding product images in cross product category attribute extraction, KDD 2021.

Multi-Modal Extraction
Improve over OpenTag (text-only) by 11% on F-measure
Text still plays the most important role
 
Web Extraction
 
ItemForm
 
Scent
 
Dong, Ceres: Harvesting Knowledge from Semi-Structured Web, Keynote at CIKM, 2020.
Successful vs. Not-Yet-Successful Fields in Industry

SUCCESS
Entity linkage
Knowledge extraction (ClosedIE)
Knowledge cleaning
Knowledge-based QA

NOT-YET SUCCESS
Schema mapping
  Small scale: Manual
  Large scale: “Replaced” by ClosedIE
OpenIE
  “Replaced” by Reading comprehension
Knowledge fusion
  Not fully needed yet
Knowledge inference
  Quality not high enough yet

1. One cannot live without it
2. The techniques are ready
 
Take Aways
 
We are building an authoritative product knowledge graph for millions of categories, thousands of attributes, and hundreds of languages
High-accuracy modeling is the first step for building an authoritative knowledge graph
AutoML E2E pipelines, one-size-fits-all solutions, and multi-modal models are critical for enriching the knowledge
Thank You!

Dive into the world of product and knowledge graphs, uncovering the journey to a rich product graph, examples of knowledge graphs for songs, and the mission to provide comprehensive information on products and related knowledge. Discover use cases ranging from information provision to enhancing search and recommendation systems. Understand the differences and applications between knowledge graphs and product graphs for effective data organization and retrieval.

  • Product Graphs
  • Knowledge Graphs
  • Information Retrieval
  • Search Improvement
  • Recommendation Systems

Uploaded on Mar 20, 2024 | 3 Views



Presentation Transcript


  1. Zero to One Billion: The Journey to a Rich Product Graph. XIN LUNA DONG, AMAZON, 6/2021

  2. Knowledge Graph Example for 2 Songs. [Figure labels: entities mid127, mid128, mid129, mid345, mid346; entity types Recording, Genre; relationships type, name, artist, song_writer, genre, birth_date; values Shake it off, Love Story, Taylor Swift, Taylor Alison Swift, Dance-pop, Country pop, Pop, 12/13/1989.]

  3. Product Graph Example for 2 Products Amazon Confidential

  4. Product Graph Mission: To answer any question about products and related knowledge in the world

  5. Use Case I: Providing Information

  6. Use Case II: Providing Choices

  7. Use Case III: Improving Search

  8. Use Case III: Improving Search

  9. Use Case III: Improving Search

  10. Use Case IV: Improving Recommendation AMAZON CONFIDENTIAL

  11. Product Graph vs. Knowledge Graph. [Figure: three diagrams, (A), (B), and (C), each relating a Generic KG and the PG.]

  12. Knowledge Graph vs. Product Graph. [Figure: diagrams (A), (B), (C).] Generic KG: Movie, Music, Book, etc. Product Graph (PG): Hardline, softline, consumables, etc.

  13. AutoKnow: Self-Driving Product Knowledge Collection. Inputs: Catalog, Taxonomy (Grocery: Snacks, Drinks, Candy), User logs. Output: Product KG (Grocery: Snacks, Drinks, Candy, Pretzels). Catalog table (Product, Type, Flavor, Color): Product 1: Snacks, Cherry; Product 2: Candy, ?, ?; Product 3: Candy, Choc., Gold. Product KG edges: hasType, flavor, color, synonym (Choc., Chocolate). Dong et al., AutoKnow: Self-driving knowledge collection for products of thousands of types, SigKDD, 2020.

  14. AutoKnow: Self-Driving Product Knowledge Collection. Same diagram with callouts: New Type, Corrected Value, New Value.

  15. AutoKnow: Self-Driving Product Knowledge Collection. Same diagram with results: #Types ↑ 3X; defect rate ↓ by up to 68 percentage points.

  16. But, Is The Problem Harder? Generic KG (Movie, Music, Book, etc.) vs. Product Graph (Hardline, softline, consumables, etc.)

  17. Challenges in Building Product Graph I Sparse and noisy structured data

  18. Challenges in Building Product Graph II Extremely complex domains How to identify the millions of product types? How to organize types into a taxonomy tree? Buyers’ view Sellers’ view

  19. Challenges in Building Product Graph III Big variety across product types Different attributes apply to different product types Different value vocabularies and different patterns

  20. Challenges in Building Product Graph III Big variety across product types Different attributes apply to different product types Different value vocabularies and different patterns

  21. Scale Up in 3 Dimensions Millions of categories Thousands of attributes Hundreds of languages Big challenge: Limited training labels for large-scale, rich data

  22. A 100-Year Project

  23. Deliver the Data Business 1,000,000,000

  24. Deliver the Data Business 1: High precision models

  25. Deliver the Data Business 1,000: High precision models; E2E pipeline + AutoML to reduce modeling cost

  26. Deliver the Data Business 1,000,000: High precision models; E2E pipeline + AutoML to reduce modeling cost; Scale-up to reduce #models (1000s categories, 100s attributes, 10s languages)

  27. Deliver the Data Business 1,000,000,000: High precision models; E2E pipeline + AutoML to reduce modeling cost; Scale-up to reduce #models (1000s categories, 100s attributes, 10s languages); Higher yield from multi-modal models

  28. From 0 to 1: High-Quality Data

  29. From Zero to One: A Core Algorithm 1 High precision models

  30. OpenTag Extraction from Product Profiles Zheng et al., OpenTag: Open attribute value extraction from product profiles, KDD 2018.

  31. OpenTag Extraction from Product Profiles

  32. OpenTag Extraction from Product Profiles CRF Attention Bi-LSTM Word Embedding

  33. OpenTag Extraction from Product Profiles Unknown values Random values Extraction on new values is comparable to already known values BiLSTM+CRF+Attention obtains best results Zheng et al., OpenTag: Open attribute value extraction from product profiles, KDD 2018.

  34. OpenTag in Practice Train and fine-tune models OpenTag Understand domain and attributes, and generate LOTS OF training data Postprocess extraction results to further improve data quality Pre-publish evaluation as gatekeeper to guarantee high quality data

  35. Still the Origin Point Millions of categories Thousands of attributes Hundreds of languages

  36. From 1 to 1K: Reducing Modeling Cost

  37. From 1 to 1K: E2E AutoML Pipeline 1,000: High precision models; E2E pipeline + AutoML to reduce modeling cost

  38. An End-to-End Pipeline Train and fine-tune models OpenTag Understand domain and attributes, and generate LOTS OF training data Postprocess extraction results to further improve data quality Pre-publish evaluation as gatekeeper to guarantee high quality data

  39. An End-to-End Pipeline Train and fine-tune models Automatic Training Data Generation OpenTag Postprocess extraction results to further improve data quality Pre-publish evaluation as gatekeeper to guarantee high quality data Distant supervision, Data programming Benchmarking

  40. An End-to-End Pipeline Train and fine-tune models Automatic Training Data Generation Deep Learning Data Cleaning OpenTag Pre-publish evaluation as gatekeeper to guarantee high quality data Distant supervision, Data programming Postprocess extraction results to further improve data quality Benchmarking

  41. An End-to-End Pipeline Train and fine-tune models Automatic Training Data Generation Deep Learning Data Cleaning OpenTag Scale-up pre-publish evaluation w. lower labeling needs Distant supervision, Data programming Postprocess extraction results to further improve data quality Benchmarking

  42. An End-to-End Pipeline AutoML Automatic Training Data Generation Deep Learning Data Cleaning OpenTag Scale-up pre-publish evaluation w. lower labeling needs Distant supervision, Data programming Benchmarking

  43. Transformer-Based Anomaly Detection Wang et al., Automatic validation of textual attribute values in eCommerce Catalog by learning with limited labeled data, KDD’20

  44. Transformer-Based Anomaly Detection Is the flavor “Pink”? Wang et al., Automatic validation of textual attribute values in eCommerce Catalog by learning with limited labeled data, KDD’20

  45. Transformer-Based Anomaly Detection Wang et al., Automatic validation of textual attribute values in eCommerce Catalog by learning with limited labeled data, KDD’20

  46. Transformer-Based Anomaly Detection Category as input for model training Wang et al., Automatic validation of textual attribute values in eCommerce Catalog by learning with limited labeled data, KDD’20

  47. Transformer-Based Anomaly Detection Identify 1.77MM incorrect values for Flavor and Scent for Consumables with 90% precision
      Examples (Product; Attr; Value):
      Product: Love of Candy Bulk Candy - Pink Mint Chocolate Lentils - 6lb Bag; Attr: Flavor; Value: Pink
      Product: Scott's Cakes Dark Chocolate Fruit & Nut Cream Filling Candies with Burgandy Foils in a 1 Pound Snowflake Box; Attr: Flavor; Value: 1 lb. snowflake box
      Product: Lucky Baby - Baby Blanket Envelope Swaddle Winter Wrap Coral Fleece Newborn Blanket Sleeper Infant Stroller Wrap Toddlers Baby Sleeping Bag (color 1); Attr: Flavor; Value: color 1
      Product: ASUTRA Himalayan Sea Salt Body Scrub Exfoliator + Body Brush (Vitamin C), 12 oz | Ultra Hydrating, Gentle, Moisturizing | All Natural & Organic Jojoba, Sweet Almond, Argan Oils; Attr: Scent; Value: vitamin c body scrub - 12oz & body brush
      Product: Folgers Simply Smooth Ground Coffee, 2 Count (Medium Roast), 31.1 Ounce; Attr: Scent; Value: 2Packages (Breakfast Blend, 31.1 oz)

  48. From 1K to 1M: Scaling Up

  49. From 1K to 1M: One Size Fits All 1,000,000: High precision models; E2E pipeline + AutoML to reduce modeling cost; Scale-up to reduce #models (1000s categories, 100s attributes, 10s languages)

  50. Scale up for Millions of Categories Millions of categories Thousands of attributes Hundreds of languages
