Conceptual grounding
Nisheeth
26th March 2019
The similarity function
Objects represented as points in some coordinate space
Metric distances between points reflect observed similarities
But what reason do we have to believe the similarity space is endowed with a metric distance?
(Figure: points in a similarity space, with pairs labeled Near and Far)
https://pdfs.semanticscholar.org/9bc7/f2f2dd8ca2d5bdc03c82db264cbe0f2c0449.pdf
What makes a measure metric?
Minimality: D(a,b) ≥ D(a,a) = 0
Symmetry: D(a,b) = D(b,a)
Triangle inequality: D(a,b) + D(b,c) ≥ D(c,a)
Do similarity judgments satisfy any of these properties?
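To make the axioms concrete, here is a minimal Python sketch (not from the slides; the distance matrix and tolerance are illustrative assumptions) that checks all three properties for a matrix of pairwise distances:

```python
import numpy as np

def is_metric(D, tol=1e-9):
    """Check minimality, symmetry, and the triangle inequality
    for a square matrix D of pairwise distances."""
    n = D.shape[0]
    minimality = np.all(np.diag(D) == 0) and np.all(D >= -tol)
    symmetry = np.allclose(D, D.T)
    triangle = all(
        D[a, b] + D[b, c] >= D[c, a] - tol
        for a in range(n) for b in range(n) for c in range(n)
    )
    return minimality, symmetry, triangle

# Distances derived from asymmetric similarity judgments can fail
# symmetry and the triangle inequality, the worry raised above.
D = np.array([[0.0, 1.0, 4.0],
              [2.0, 0.0, 1.0],
              [4.0, 1.0, 0.0]])
print(is_metric(D))  # (True, False, False)
```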
Tversky's set-theoretic similarity
Assumptions
Matching: s(a,b) = f(a∩b, a−b, b−a)
Monotonicity: s(a,b) ≥ s(a,c) whenever a∩c is a subset of a∩b, a−b is a subset of a−c, and b−a is a subset of c−a
Independence: the joint effect on similarity of any two feature components is unaffected by the impact of other components
Satisfying model = add up matching features, subtract out distinct features (the contrast model: s(a,b) = θf(a∩b) − αf(a−b) − βf(b−a))
Feature definition unspecified
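A minimal sketch of the contrast model with f taken as set cardinality; the feature sets and the weights θ, α, β below are illustrative assumptions, not values from the slides:

```python
def tversky_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model with f = set cardinality:
    add up matching features, subtract out distinct features."""
    return (theta * len(a & b)
            - alpha * len(a - b)
            - beta * len(b - a))

korea = {"asian", "country", "small"}
china = {"asian", "country", "large", "communist"}

# With alpha != beta the measure is asymmetric, unlike a metric distance.
print(tversky_similarity(korea, china, alpha=0.8, beta=0.2))  # 0.8
print(tversky_similarity(china, korea, alpha=0.8, beta=0.2))  # 0.2
```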
What Gives Concepts Their Meaning?
Goldstone and Rogosky (2002)
External grounding: a concept's meaning comes from its connection to the external world
Conceptual web: a concept's meaning comes from its connections to other concepts in the same conceptual system
Examples of "conceptual web" approach:
Semantic Networks
Probabilistic language models
Semantic Networks
(Figure: a semantic network linking concepts such as NORTH, SOUTH, EAST, WEST, HOSPITAL, PATIENT, NURSE, DOCTOR, LAWYER, WEALTHY; from Hofstadter, Gödel, Escher, Bach)
Language Model
Unigram language model
probability distribution over the words in a language
generation of text consists of pulling words out of a "bucket" according to the probability distribution and replacing them
N-gram language model
some applications use bigram and trigram language models, where probabilities depend on previous words
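A minimal sketch of the unigram "bucket" model: estimate word probabilities from text, then generate by sampling with replacement (the toy corpus is an illustrative assumption):

```python
import random
from collections import Counter

def unigram_model(text):
    """Maximum likelihood unigram model: P(w) = count(w) / total words."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

corpus = "the president spoke and the president left"
model = unigram_model(corpus)

# Generate text by pulling words out of the "bucket" with replacement.
words, probs = zip(*model.items())
print(random.choices(words, weights=probs, k=5))
```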
Language Model
A topic in a document or query can be represented as a language model
i.e., words that tend to occur often when discussing a topic will have high probabilities in the corresponding language model
The basic assumption is that words cluster in semantic space
Multinomial distribution over words
text is modeled as a finite sequence of words, where there are t possible words at each point in the sequence
commonly used, but not the only possibility
doesn't model burstiness
Language models in information retrieval
3 possibilities:
probability of generating the query text from a document language model
probability of generating the document text from a query language model
comparing the language models representing the query and document topics
Commonly used in NLP applications
Implicit psychological premise
(Figure: a query surrounded by documents in semantic space)
Rank documents by the closeness of the topics they represent in semantic space to the topic represented by the search query
Query-Likelihood Model
Rank documents by the probability that the query could be generated by the document model (i.e. same topic)
Given query Q, start with P(D|Q)
Using Bayes' Rule: P(D|Q) ∝ P(Q|D)P(D)
Assuming the prior P(D) is uniform and a unigram model, rank by P(Q|D) = ∏_i P(q_i|D)
Estimating Probabilities
Obvious estimate for unigram probabilities is the maximum likelihood estimate: P(q_i|D) = f_{q_i,D} / |D|
makes the observed value of f_{q_i,D} most likely
If query words are missing from the document, the score will be zero (the product has a zero factor)
Missing 1 out of 4 query words is the same as missing 3 out of 4
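A minimal sketch of this unsmoothed scorer, showing how a single missing query word zeroes the score (the toy document and queries are illustrative assumptions):

```python
import math
from collections import Counter

def ml_query_likelihood(query, doc_words):
    """log P(Q|D) under the MLE unigram model P(q|D) = f_{q,D} / |D|."""
    counts = Counter(doc_words)
    score = 0.0
    for q in query.split():
        p = counts[q] / len(doc_words)
        if p == 0:
            return float("-inf")  # one missing word zeroes the product
        score += math.log(p)
    return score

doc = "president lincoln was president of the union".split()
print(ml_query_likelihood("president lincoln", doc))     # finite score
print(ml_query_likelihood("president gettysburg", doc))  # -inf
```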
Smoothing
Document texts are a sample from the language model
Missing words should not have zero probability of occurring
Smoothing is a technique for estimating probabilities for missing (or unseen) words
lower (or discount) the probability estimates for words that are seen in the document text
assign that "left-over" probability to the estimates for the words that are not seen in the text
Estimating Probabilities
Estimate for unseen words is α_D P(q_i|C)
P(q_i|C) is the probability for query word i in the collection language model for collection C (background probability)
α_D is a parameter
Estimate for words that occur is (1 − α_D) P(q_i|D) + α_D P(q_i|C)
Different forms of estimation come from different α_D
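A minimal sketch of this interpolated estimate with a fixed α_D (the value α_D = 0.5 and the counts are illustrative assumptions):

```python
def smoothed_prob(f_qd, doc_len, c_q, coll_len, alpha=0.5):
    """(1 - alpha) * P(q|D) + alpha * P(q|C): seen words mix the
    document and collection models; unseen words (f_qd = 0) fall
    back to alpha times the background probability."""
    return (1 - alpha) * (f_qd / doc_len) + alpha * (c_q / coll_len)

# An unseen word now gets a small but nonzero probability.
print(smoothed_prob(f_qd=0, doc_len=1800, c_q=2400, coll_len=10**9))
```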
Dirichlet Smoothing
α_D depends on document length: α_D = μ / (|D| + μ)
Gives probability estimation of P(q_i|D) = (f_{q_i,D} + μ c_{q_i} / |C|) / (|D| + μ)
and document score log P(Q|D) = Σ_i log[(f_{q_i,D} + μ c_{q_i} / |C|) / (|D| + μ)]
Take home question: what is Dirichlet about this smoothing method?
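A minimal sketch of a Dirichlet-smoothed query-likelihood scorer implementing the formula above (variable names are my own; it assumes every query term occurs at least once in the collection):

```python
import math

def dirichlet_score(query_terms, doc_counts, doc_len,
                    coll_counts, coll_len, mu=2000):
    """log P(Q|D) with Dirichlet smoothing:
    P(q|D) = (f_{q,D} + mu * c_q / |C|) / (|D| + mu)."""
    score = 0.0
    for q in query_terms:
        f_qd = doc_counts.get(q, 0)
        c_q = coll_counts.get(q, 0)
        score += math.log((f_qd + mu * c_q / coll_len) / (doc_len + mu))
    return score
```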
Query Likelihood Example
For the term "president": f_{q_i,D} = 15, c_{q_i} = 160,000
For the term "lincoln": f_{q_i,D} = 25, c_{q_i} = 2,400
document |D| is assumed to be 1,800 words long
collection is assumed to be 10^9 words long (500,000 documents times an average of 2,000 words)
μ = 2,000
Query Likelihood Example
log P(Q|D) = log[(15 + 2000 × 160,000/10^9) / 3800] + log[(25 + 2000 × 2,400/10^9) / 3800] ≈ −5.514 − 5.024 ≈ −10.54 (natural logs)
Negative number because we are summing logs of small numbers
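Reusing the dirichlet_score sketch from the Dirichlet Smoothing slide, the slide's numbers can be reproduced directly (assuming natural logs):

```python
doc_counts = {"president": 15, "lincoln": 25}
coll_counts = {"president": 160_000, "lincoln": 2_400}

score = dirichlet_score(["president", "lincoln"], doc_counts,
                        doc_len=1800, coll_counts=coll_counts,
                        coll_len=10**9, mu=2000)
print(round(score, 2))  # -10.54
```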
Extension: Google Distance
Normalized Google Distance: NGD(x,y) = [max(log f(x), log f(y)) − log f(x,y)] / [log N − min(log f(x), log f(y))], where f(x) is the number of pages containing x, f(x,y) the number containing both, and N the total number of pages indexed
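A minimal sketch of NGD computed from page-hit counts (the counts below are illustrative stand-ins for search-engine hits, not real data):

```python
import math

def ngd(fx, fy, fxy, N):
    """Normalized Google Distance from page counts: near 0 when
    x and y almost always co-occur, large when they rarely do."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(N) - min(lx, ly))

# Illustrative counts for two related terms and the index size.
print(ngd(fx=46_700_000, fy=12_200_000, fxy=2_630_000,
          N=8_058_044_651))  # ≈ 0.44
```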
Strong correlation with human similarity ratings
(Figure: scatter plot of scaled NGD against human similarity ratings)
Can operationalize creativity