WEB-SOBA: Word Embeddings-Based Semi-automatic
Ontology Building for Aspect-Based Sentiment Classification
Fenna ten Haaf, Christopher Claassen, Ruben Eschauzier, Joanne Tjan, Daniel Buijs,
Flavius Frasincar, and Kim Schouten
Erasmus University Rotterdam, the Netherlands
 
Content
Motivation
Related Work
Data
Methodology
Evaluation
Conclusion
 
Motivation
Growing amount of text data available
More specifically, online reviews
Growing importance of reviews:
80% of consumers read online reviews
75% of consumers consider reviews important
Businesses can use the information to improve
Identify key strengths and weaknesses
Yelp alone features more than 200 million reviews
 
Motivation
Automatic analysis of data required to generate insights
We focus on sentiment within reviews
Sentiment mining is defined as the automatic assessment of the sentiment expressed in text (in our case by consumers in product reviews)
Different types of sentiment mining
Review-level
Sentence-level
Aspect-level (for products, aspects are also referred to as product features)
 
Motivation
Aspect-Based Sentiment Analysis (ABSA) aims to determine sentiment
polarity regarding aspects of products
ABSA has two phases
Aspect mining attempts to extract aspects mentioned in text
There are two different aspect types
Explicit aspect mentions, where the aspect is explicitly mentioned
Implicit aspect mentions, where the aspect is implied by the sentence
Three main approaches
Knowledge Representation
Machine Learning
Hybrid: current state-of-the-art, e.g., A Hybrid Approach for Aspect-Based Sentiment Analysis (HAABSA) proposed by Wallaart and Frasincar (2019) at ESWC 2019
 
Motivation
HAABSA uses an ontology to predict sentiment
If that fails, a machine learning model is used as back-up
The ontology is hand-crafted
Semi-automatic ontology building reduces user time spent
Word embeddings are a promising candidate word representation
We propose a method for building ontologies semi-automatically based on word embeddings
 
Related Work
The Ontology+BoW approach is introduced in Schouten and Frasincar (2017)
Only ontology achieves 74.2% accuracy on SemEval 2016
Only BoW SVM achieves 82.0% accuracy
Ontology + BoW SVM achieves 86.0% accuracy
HAABSA combines an ontology with an attentional neural model
The Left-Center-Right separated neural network with Rotatory attention and multi-hops (LCR-Rot-hop) by Wallaart and Frasincar (2019) uses the left, target, and right contexts to compute attention scores in an iterative approach to mine sentiment
Ontology + LCR-Rot-hop achieves state-of-the-art 88.0% accuracy on SemEval 2016
 
Related Work
Buitelaar et al. (2005) propose a framework for creating ontologies
Three ingredients are needed
Terms: Words and their synonyms that portray either sentiment or aspects, and that are either specific to our domain or in general use
Concepts: Terms that are related to each other need to be linked within the ontology to form concepts
Concept Hierarchies: Use concept hierarchy learning to form hierarchies within concepts
We adapt these ingredients to fit in a word embedding-based method
 
Related Work
Zhuang, Schouten and Frasincar (2020) introduce SOBA
Semi-automated ontology builder
Terms and their associated synsets produce concepts
Uses word frequencies in domain corpora
Dera et al. (2020) propose SASOBUS
Semi-automatic sentiment domain ontology building using synsets
Synsets are also used during concept hierarchy learning
Achieves better accuracy on the SemEval 2016 dataset
 
Data
Domain corpus
Yelp Dataset Challenge dataset
Keep only restaurant reviews
5,508,394 domain-specific reviews of over 500,000 restaurants
Contrasting corpus
Pre-trained word2vec model
Google-news-300
 
Data
SemEval 2016 Task 5 Restaurant data for sentence-level analysis
Training contains 2,000 sentences; test contains 676 sentences
Each sentence contains opinions related to specific targets within the sentence
The category of the target is also annotated
It consists of an entity E (e.g., restaurant) and an attribute A (e.g., prices)
Each pair E#A is annotated with a polarity p ∈ {Negative, Neutral, Positive}
Implicit aspects are removed, leaving 1,879 explicit aspects

Data
Example of a sentence in the SemEval 2016 dataset in XML format
Aspect categories: FOOD, AMBIENCE, DRINKS, LOCATION, RESTAURANT, and SERVICE
Aspect attributes: PRICES, QUALITY, STYLE&OPTIONS, GENERAL, and MISCELLANEOUS
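The XML example itself did not survive extraction; a SemEval-2016-style annotated sentence looks roughly as follows (the sentence id, text, and character offsets are illustrative, not taken from the dataset):

```xml
<sentence id="1032695:1">
  <text>The fish was fresh but the waiter was rude.</text>
  <Opinions>
    <Opinion target="fish" category="FOOD#QUALITY" polarity="positive" from="4" to="8"/>
    <Opinion target="waiter" category="SERVICE#GENERAL" polarity="negative" from="27" to="33"/>
  </Opinions>
</sentence>
```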
 
Data
Positive sentiment occurs most often
Food quality is mentioned most often

Methodology
WEB-SOBA: Word Embeddings-Based Semi-automatic Ontology Building for Aspect-Based Sentiment Classification
Implementation in Java: 
https://github.com/RubenEschauzier/WEB-SOBA
Gensim (Python) to train the word2vec model on the Yelp dataset
Word Embedding Refine (using the Python implementation at https://github.com/wangjin0818/word_embedding_refine) to make word embeddings sentiment-aware, as introduced by Yu et al. (2017)
Stanford CoreNLP (Java) for tokenization, lemmatization, and part-of-speech tagging
The domain ontology is represented in OWL
 
Methodology: Ontology Structure
The ontology consists of two main classes:
Mention
SentimentValue
SentimentValue consists of two subclasses
Positive
Negative
There are three types of sentiment words
Type-1: Generic sentiment that always has the same polarity regardless of context; always a subclass of the GenericPositive or GenericNegative class
Type-2: Sentiment word that only applies to a specific category of aspects (e.g., delicious applies to food and drinks, but not to service)
Type-3: Sentiment word that changes its polarity based on the aspect it belongs to (e.g., cold beer is positive, while cold food is negative)
 
Methodology: Ontology Structure
The Mention class has three subclasses:
ActionMention: represents verbs
EntityMention: represents nouns
PropertyMention: represents adjectives
The skeletal ontology structure is defined with a number of Mention
subclasses based on entities and attributes within a domain
Predefined entities: Ambience, Location, Service, Experience, Restaurants, Food, Drinks
Predefined attributes: General, Misc, Prices, Quality, Style&Options
Entity#Attribute pairs make up categories, like Food#Quality or Drinks#Prices
 
Methodology: Ontology Structure
Each Mention class has two subclasses (where <Type> denotes Action, Entity, or Property):
GenericPositive<Type>: also a subclass of Positive
GenericNegative<Type>: also a subclass of Negative
Two types of mentions (both linked to lexical representations):
Aspect mentions (do not have associated sentiment)
Sentiment mentions (do have associated sentiment)
Sentiment mentions have associated aspects
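As a sketch of the structure described above, the two class hierarchies could be written in OWL (Turtle syntax) along the following lines; the namespace and exact class names are assumptions for illustration, not the authors' published ontology:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix :     <http://example.org/restaurant-ontology#> .  # hypothetical namespace

:Mention        a owl:Class .
:SentimentValue a owl:Class .
:Positive       rdfs:subClassOf :SentimentValue .
:Negative       rdfs:subClassOf :SentimentValue .

:ActionMention   rdfs:subClassOf :Mention .  # verbs
:EntityMention   rdfs:subClassOf :Mention .  # nouns
:PropertyMention rdfs:subClassOf :Mention .  # adjectives

# Generic (Type-1) sentiment classes inherit from both hierarchies;
# the Entity and Property variants follow the same pattern.
:GenericPositiveAction rdfs:subClassOf :ActionMention , :Positive .
:GenericNegativeAction rdfs:subClassOf :ActionMention , :Negative .
```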
 
Methodology: Word Embeddings
We create our word embeddings using Word2Vec with the Continuous Bag of Words (CBOW) architecture
Predict the target word based on the context of the word
During training the model learns semantic information
Leverage the model weights to create a dense vector representation of an input word
An alternative is Skip-Gram, which performs similarly
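The CBOW idea can be sketched with a toy implementation: average the context vectors, predict the centre word with a softmax over the vocabulary, and update both weight matrices by gradient descent. The corpus, dimensions, and learning rate below are invented for illustration; the actual ontology builder trains word2vec with Gensim on the Yelp corpus.

```python
import numpy as np

# Tiny made-up corpus; each sentence yields (context, centre-word) pairs.
corpus = ["the food was great", "the service was slow", "great food and drinks"]
tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 2, 0.1

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # context (input) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # prediction (output) weights

def pairs():
    for sent in tokens:
        for pos, target in enumerate(sent):
            ctx = [idx[sent[j]] for j in range(max(0, pos - window),
                                               min(len(sent), pos + window + 1)) if j != pos]
            yield ctx, idx[target]

def train_epoch():
    global W_in, W_out
    total = 0.0
    for ctx, tgt in pairs():
        h = W_in[ctx].mean(axis=0)        # average the context vectors
        scores = h @ W_out
        p = np.exp(scores - scores.max())
        p /= p.sum()                      # softmax over the vocabulary
        total += -np.log(p[tgt])
        grad = p.copy()
        grad[tgt] -= 1.0                  # dLoss/dScores for cross-entropy
        dh = W_out @ grad
        W_out -= lr * np.outer(h, grad)
        W_in[ctx] -= lr * dh / len(ctx)
    return total

losses = [train_epoch() for _ in range(50)]
```

After training, the rows of `W_in` serve as the dense word vectors the slide describes.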
 
Methodology: Word Embeddings
Yu et al. (2017) refine the vectors to include sentiment by using the Extended version of the Affective Norms of English Words (E-ANEW) as a sentiment lexicon
For each word, rank its nearest neighbours by closeness in sentiment to the target word
Also rank those neighbours by cosine similarity to the target word
Use the two rankings to adjust the target word's vector
 
Methodology: Term Selection
Extract all adjectives, nouns and verbs from Yelp data
We calculate a TermScore (TS) for each word, which is composed of a DomainSimilarity (DS) and a MentionClassSimilarity (MCS) score
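A sketch of this scoring, assuming toy three-dimensional embeddings (all vectors invented for illustration): DS compares a word's domain vector against its general-model vector, MCS takes the best cosine match against the base aspect mention classes, and TS combines the two with a harmonic mean.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Invented 3-D stand-ins for the two embedding models.
domain_vec = {"waiter": [0.9, 0.1, 0.0]}   # Yelp-trained model
general_vec = {"waiter": [0.2, 0.8, 0.3]}  # general (Google News) model
# Invented embeddings of the base aspect mention classes in the domain model.
base_classes = {"service": [0.8, 0.2, 0.1], "food": [0.1, 0.9, 0.2]}

def term_score(word):
    # DS: similarity between the word's domain and general vectors
    # (a low DS suggests the word shifted meaning in the domain corpus).
    ds = cosine(domain_vec[word], general_vec[word])
    # MCS: best similarity to any base aspect mention class (higher is better).
    mcs = max(cosine(domain_vec[word], c) for c in base_classes.values())
    # TS: harmonic mean of the two scores, as stated on the slides.
    return 2 / (1 / ds + 1 / mcs)
```

Words whose TS exceeds the tuned threshold would then be suggested to the user.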
 
Methodology: Term Selection
DS represents how domain-specific a given word is; we compute it using cosine similarity (lower is better):
DS_i = (v_{i,d} · v_{i,g}) / (||v_{i,d}|| ||v_{i,g}||)
where v_{i,d} is the vector of word i in the domain-related word embedding model and v_{i,g} is the vector of word i in the general word embedding model
 
Methodology: Term Selection
MCS represents whether a certain word is related to one of the base aspect mention classes of our skeletal ontology (higher is better):
MCS_i = max_{c ∈ C} (v_i · v_c) / (||v_i|| ||v_c||)
where C is the set of base aspect mention classes
 
Methodology: Term Selection
The TS is the harmonic mean of the DS and MCS:
TS_i = 2 / (1/DS_i + 1/MCS_i)
We select all words with a TS above a certain threshold; the threshold is adjusted using the fraction of suggested terms that the user accepts
 
Methodology: Term Selection
When a term exceeds the threshold, it is suggested to the user
If the term is accepted and is a noun or verb, the user decides whether it refers to an aspect or a sentiment
If the term is a sentiment term, the user must decide whether it is a Type-1 sentiment mention; if so, the user also decides the polarity
We select all words that have a cosine similarity of 0.7 or higher to the accepted term
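The 0.7-similarity expansion step can be sketched as follows; the embedding table is a made-up stand-in for the domain-trained word2vec model:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Invented 3-D vectors standing in for the domain embedding model.
embeddings = {
    "tasty":     [0.90, 0.20, 0.10],
    "delicious": [0.85, 0.25, 0.10],
    "flavorful": [0.80, 0.30, 0.15],
    "slow":      [0.10, 0.90, 0.40],
}

def expand(accepted_term, threshold=0.7):
    """Collect every vocabulary word whose cosine similarity to the
    accepted term meets the threshold (0.7 on the slides)."""
    base = embeddings[accepted_term]
    return sorted(w for w, v in embeddings.items()
                  if w != accepted_term and cosine(base, v) >= threshold)
```

With these toy vectors, `expand("tasty")` picks up "delicious" and "flavorful" but not "slow".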
 
Methodology: Sentiment Term Clustering
We want to determine both the polarity of our sentiment terms and which base aspect mention class(es) each term belongs to
Calculate the cosine similarity of the term to the base aspect mention classes and rank them in descending order
 
Methodology: Sentiment Term Clustering
Predict the polarity of sentiment words by calculating both a positive and a negative score:
POS_i = max_{p ∈ P} (v_i · v_p) / (||v_i|| ||v_p||), NEG_i = max_{n ∈ N} (v_i · v_n) / (||v_i|| ||v_n||)
where N is a collection of negative words and P a collection of positive words with different intensities
N contains "bad", "awful", "horrible", "terrible", "poor", "lousy", "shitty", "horrid"
P contains "good", "decent", "great", "tasty", "fantastic", "solid", "yummy", "terrific"
 
Methodology: Sentiment Term Clustering
The user is recommended the highest-ranked base aspect mention class for each term
If accepted, the user confirms whether the predicted polarity is correct; otherwise the opposite polarity is selected
Base aspect mention classes keep being recommended until one is declined; the process then moves to the next term
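The polarity prediction behind this recommendation step can be sketched with seed-word cosine scores: the best match against a list of positive seed words is compared with the best match against a list of negative seed words. The two-dimensional seed vectors here are invented for illustration; in WEB-SOBA the seeds live in the trained embedding space.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Invented 2-D embeddings for a few seed words of varying intensity.
positive_seeds = {"good": [0.90, 0.10], "great": [0.95, 0.05]}
negative_seeds = {"bad": [0.10, 0.90], "awful": [0.05, 0.95]}

def predict_polarity(word_vec):
    # POS and NEG are the best cosine matches against each seed list;
    # the larger of the two decides the suggested polarity.
    pos = max(cosine(word_vec, s) for s in positive_seeds.values())
    neg = max(cosine(word_vec, s) for s in negative_seeds.values())
    return "positive" if pos >= neg else "negative"
```

The user then only has to confirm or flip the suggested polarity, as described above.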
 
Methodology: Hierarchical Clustering
First cluster aspect mentions based on the base aspect mention class they belong to
This is done using adjusted k-means, where the k means are fixed to the word embeddings of the base aspect mention classes
We then build a hierarchy for each subclass using agglomerative hierarchical clustering
Every term starts in its own cluster, and two clusters are merged in each step
Merging is based on the Average Linkage Clustering (ALC) between clusters C1 and C2:
ALC(C1, C2) = (1 / (|C1| |C2|)) Σ_{x ∈ C1} Σ_{y ∈ C2} d(x, y), where d(x, y) is the Euclidean distance
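The average-linkage merge criterion used in this hierarchy-building step can be sketched as follows; the 2-D points stand in for term embeddings within one base aspect mention class (all values invented):

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def alc(c1, c2):
    # Average Linkage Clustering: mean pairwise Euclidean distance.
    return sum(euclidean(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))

def agglomerate(points, n_clusters):
    """Repeatedly merge the two ALC-closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # every term starts in its own cluster
    while len(clusters) > n_clusters:
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: alc(clusters[ab[0]], clusters[ab[1]]))
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

# Invented 2-D stand-ins for term embeddings of one base aspect class.
result = agglomerate([(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)], 2)
```

With these points, the two nearby pairs end up in two separate clusters, mirroring how related terms would group under one subclass.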
 
Evaluation
                  Manual   SOBA   WEB-SOBA
Classes              365    470        376
Lexicalizations      750   2175        348
Synonyms             422   1766         37
Quality Phrases       16      6          0
Both the Manual and SOBA ontologies are more extensive than our ontology
Possibly due to a stricter requirement of relevance
 
Evaluation
                          Manual   SOBA   WEB-SOBA
User time (minutes)          420     90         40
Computing time (minutes)       0     90  30 (+300)
WEB-SOBA requires significantly less human time than both the SOBA and Manual methods
Computing time is higher because our datasets are orders of magnitude larger
Computing time is front-loaded
 
Evaluation
                           Manual   SASOBUS    SOBA   WEB-SOBA
Out-of-Sample Accuracy     78.31%    76.62%  77.08%     77.08%
In-Sample Accuracy         75.31%    73.82%  74.56%     72.11%
Cross-Validation Accuracy  74.10%    70.69%  71.71%     70.05%
St. dev.                    0.044     0.049   0.061      0.050

With the LCR-Rot-hop back-up model:
                           Manual   SASOBUS    SOBA   WEB-SOBA
Out-of-Sample Accuracy     86.65%    84.76%  86.23%     87.16%
In-Sample Accuracy         87.96%    83.38%  85.93%     88.87%
Cross-Validation Accuracy  82.76%    80.20%  80.15%     84.72%
St. dev.                    0.022     0.031   0.039      0.043
Accuracy for the Manual ontology is highest without a back-up model
With the LCR-Rot-hop back-up model, WEB-SOBA performs best

Conclusion
We proposed WEB-SOBA: Word Embeddings-Based Semi-automatic Ontology Building for Aspect-Based Sentiment Classification
Embedding Learning
Term Selection
Sentiment Clustering
Hierarchical Clustering
We have used two restaurant datasets in our evaluation:
Yelp Dataset Challenge data for ontology building
SemEval 2016, Task 5, Subtask 1, Slot 3 data for measuring performance
 
Conclusion
Similar performance to other semi-automatically built ontologies when used
without back-up model
Increased accuracy by ~2% when used with the state-of-the-art LCR-Rot-hop model compared to the manually built ontology (cross-validation)
Increased accuracy by ~4% when used with the state-of-the-art LCR-Rot-hop model compared to other semi-automatically built ontologies (cross-validation)
Significantly reduces human time spent
More efficient in computing than previous works, but requires more data
Future work:
Use deep contextualized word embeddings (e.g., BERT)
Exploit both synsets and word embeddings
 
References
Buitelaar, P., Cimiano, P., & Magnini, B. (2005). Ontology learning from text: An overview. In Ontology Learning from Text: Methods, Evaluation and Applications. IOS Press
Dera, E., Frasincar, F., Schouten, K., & Zhuang, L. (2020). SASOBUS: Semi-automatic sentiment domain ontology building using synsets. In 17th Extended Semantic Web Conference (pp. 105-120). Springer
Schouten, K., Frasincar, F., & de Jong, F. (2017). Ontology-enhanced aspect-based sentiment analysis. In International Conference on Web Engineering (pp. 302-320). Springer
Truşcǎ, M. M., Wassenberg, D., Frasincar, F., & Dekker, R. (2020). A hybrid approach for aspect-based sentiment analysis using deep contextual word embeddings and hierarchical attention. In International Conference on Web Engineering (pp. 365-380). Springer
 
References
Yu, L. C., Wang, J., Lai, K. R., & Zhang, X. (2017). Refining word embeddings for sentiment analysis. In 2017 Conference on Empirical Methods in Natural Language Processing (pp. 534-539). ACL
Zhuang, L., Schouten, K., & Frasincar, F. (2020). SOBA: Semi-automated ontology builder for aspect-based sentiment analysis. Journal of Web Semantics, 60, 100544
 
Slide Note
My name is Ruben Eschauzier and I will present today WEB-SOBA, a Word Embeddings-Based Ontology Building Approach for Aspect-Based Sentiment Classification. This is joint work with Fenna ten Haaf, Christopher Claassen, Joanne Tjan, Daniel Buijs, Flavius Frasincar, and Kim Schouten, all from Erasmus University Rotterdam, in the Netherlands.
