Machine Translation Concluded

David Kauchak
CS159 – Fall 2020

Some slides adapted from:
Philipp Koehn, School of Informatics, University of Edinburgh
Kevin Knight, USC/Information Sciences Institute, USC/Computer Science Department
Dan Klein, Computer Science Department, UC Berkeley
Admin
Assignment 5b
Assignment 6 available
Quiz 3: 11/10
Language translation

¡Hola!
Word models: IBM Model 1

Mary did not slap the green witch
Maria no dió una botefada a la bruja verde
Each foreign word is aligned to exactly one English word.
This is the ONLY thing we model!

p(f1 f2 ... f|F|, a1 a2 ... a|F| | e1 e2 ... e|E|) = ∏ (i=1 to |F|) p(fi | e_ai)

NULL
p(verde | green)
Training without alignments
Initially assume all p(f|e) are equally probable
Repeat:
  Enumerate all possible alignments
  Calculate how probable the alignments are under the current model (i.e. p(f|e))
  Recalculate p(f|e) using counts from all alignments, weighted by how probable they are
(Note: theoretical algorithm)
EM alignment
E-step
  Enumerate all possible alignments
  Calculate how probable the alignments are under the current model (i.e. p(f|e))
M-step
  Recalculate p(f|e) using counts from all alignments, weighted by how probable they are
(Note: theoretical algorithm)
E-step: Given p(F|E), calculate p(A,F|E)

Current model:
p(casa | green) = 1/2    p(verde | green) = 1/2    p(la | green) = 0
p(casa | house) = 1/2    p(verde | house) = 1/4    p(la | house) = 1/4
p(casa | the) = 1/2      p(verde | the) = 0        p(la | the) = 1/2

Calculate unnormalized alignment probabilities (four possible alignments per sentence pair):
green house / casa verde:  1/8, 1/4, 1/4, 1/8    sum = (3/4)
the house / la casa:       1/4, 1/4, 1/8, 1/8    sum = (3/4)

Normalize by the sum:
green house / casa verde:  1/6, 1/3, 1/3, 1/6
the house / la casa:       1/3, 1/3, 1/6, 1/6
M-step: calculate unnormalized counts for p(f|e) given the alignments

c(casa,green) = 1/6+1/3 = 3/6
c(verde,green) = 1/3+1/3 = 4/6
c(la,green) = 0
c(casa,house) = 1/3+1/6+1/3+1/6 = 6/6
c(verde,house) = 1/6+1/6 = 2/6
c(la,house) = 1/6+1/6 = 2/6
c(casa,the) = 1/6+1/3 = 3/6
c(verde,the) = 0
c(la,the) = 1/3+1/3 = 4/6

M-step: normalize
Then, calculate the probabilities by normalizing the counts:
p(casa | green) = 3/7    p(verde | green) = 4/7    p(la | green) = 0
p(casa | house) = 3/5    p(verde | house) = 1/5    p(la | house) = 1/5
p(casa | the) = 3/7      p(verde | the) = 0        p(la | the) = 4/7
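The E- and M-step numbers above can be reproduced with a few lines of code. For IBM Model 1, summing over all alignments factorizes into a per-word normalization, so we never have to enumerate alignments explicitly. A minimal sketch of one iteration on the toy corpus, starting from the current model shown above:

```python
from collections import defaultdict

# Toy corpus from the slides: (English, Foreign) sentence pairs
corpus = [(["green", "house"], ["casa", "verde"]),
          (["the", "house"], ["la", "casa"])]

# Current model p(f|e) from the slide
p = {("casa", "green"): 1/2, ("verde", "green"): 1/2, ("la", "green"): 0,
     ("casa", "house"): 1/2, ("verde", "house"): 1/4, ("la", "house"): 1/4,
     ("casa", "the"): 1/2, ("verde", "the"): 0, ("la", "the"): 1/2}

# E-step: expected counts; for Model 1 the sum over all alignments
# reduces to normalizing p(f|e) over the English words in the pair
count = defaultdict(float)
total = defaultdict(float)
for E, F in corpus:
    for f in F:
        denom = sum(p[(f, e)] for e in E)
        for e in E:
            delta = p[(f, e)] / denom  # p(f aligned-to e) in this pair
            count[(f, e)] += delta
            total[e] += delta

# M-step: renormalize the expected counts to get the new p(f|e)
p_new = {(f, e): c / total[e] for (f, e), c in count.items()}

print(round(p_new[("casa", "house")], 3))   # 0.6  (= 3/5)
print(round(p_new[("verde", "green")], 3))  # 0.571 (= 4/7)
```

This matches the normalized probabilities on the slide (3/5, 4/7, etc.).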
Implementation details
Repeat:
  E-step
    Enumerate all possible alignments
    Calculate how probable the alignments are under the current model (i.e. p(f|e))
  M-step
    Recalculate p(f|e) using counts from all alignments, weighted by how probable they are

For |E| English words and |F| foreign words, how many alignments are there?
Implementation details
Repeat:
  E-step
    Enumerate all possible alignments
    Calculate how probable the alignments are under the current model (i.e. p(f|e))
  M-step
    Recalculate p(f|e) using counts from all alignments, weighted by how probable they are

Each foreign word can be aligned to any of the English words (or NULL):
(|E| + 1)^|F| alignments
Thought experiment
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.
His wife talks to him.
Su mujer habla con él.
The sharks await.
Los tiburones esperan.

p(el | the) = 0.5
p(Los | the) = 0.5
If we had the alignments
Input: corpus of English/Foreign sentence pairs along with alignment

for (E, F) in corpus:
    for aligned words (e, f) in pair (E, F):
        count(e,f) += 1
        count(e) += 1

for all (e,f) in count:
    p(f|e) = count(e,f) / count(e)
If we had the alignments
Input: corpus of English/Foreign sentence pairs along with alignment

for (E, F) in corpus:
    for e in E:
        for f in F:
            if f aligned-to e:
                count(e,f) += 1
                count(e) += 1

for all (e,f) in count:
    p(f|e) = count(e,f) / count(e)
If we had the alignments
Input: corpus of English/Foreign sentence pairs along with alignment

for (E, F) in corpus:
    for aligned words (e, f) in pair (E, F):
        count(e,f) += 1
        count(e) += 1

for (E, F) in corpus:
    for e in E:
        for f in F:
            if f aligned-to e:
                count(e,f) += 1
                count(e) += 1

for all (e,f) in count:
    p(f|e) = count(e,f) / count(e)

Are these equivalent?
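The two counting loops can be checked directly in code. A minimal sketch, using a hypothetical toy corpus where the alignment is given as a set of (English index, foreign index) pairs:

```python
from collections import defaultdict

# Hypothetical toy corpus: (English, Foreign, alignment) triples,
# where the alignment is a set of (e_index, f_index) link pairs
corpus = [(["green", "house"], ["casa", "verde"], {(1, 0), (0, 1)}),
          (["the", "house"], ["la", "casa"], {(0, 0), (1, 1)})]

def counts_over_aligned_pairs(corpus):
    # Version 1: loop directly over the aligned word pairs
    count = defaultdict(int)
    for E, F, A in corpus:
        for (i, j) in A:
            count[(E[i], F[j])] += 1
    return count

def counts_over_all_pairs(corpus):
    # Version 2: loop over every (e, f) pair and check the alignment
    count = defaultdict(int)
    for E, F, A in corpus:
        for i, e in enumerate(E):
            for j, f in enumerate(F):
                if (i, j) in A:
                    count[(e, f)] += 1
    return count

# Both loops visit exactly the aligned pairs, so the counts are identical
assert counts_over_aligned_pairs(corpus) == counts_over_all_pairs(corpus)
```

The second version does extra work (it inspects every pair), but increments the counts in exactly the same cases, which is why the rewrite in the slide is legitimate.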
Thought experiment #2
The old man is happy. He has fished many times.        (80 annotators)
El viejo está feliz porque ha pescado muchos veces.

The old man is happy. He has fished many times.        (20 annotators)
El viejo está feliz porque ha pescado muchos veces.

Use partial counts:
- count(viejo | man) 0.8
- count(viejo | old) 0.2
Without the alignments
Key: use expected counts (i.e., how likely based on the current model), rather than actual counts

Instead of:
if f aligned-to e:
    count(e,f) += 1
    count(e) += 1

use:
count(e,f) += p(f aligned-to e)
count(e) += p(f aligned-to e)
Without alignments
a b c
y z
Without alignments
a b c
y z

Of all things that y could align to, how likely is it to be a: p(y | a)?

Does that do it?
 
No! p(y | a) is how likely y is to align to a over the whole data set.
Without alignments
a b c
y z

Of all things that y could align to, how likely is it to be a:

p(y | a) / ( p(y | a) + p(y | b) + p(y | c) )
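That per-sentence normalization is one line of code. A tiny sketch, with illustrative (made-up) p(f|e) values:

```python
def align_prob(f, e, E, p):
    """p(f aligned-to e) within one sentence pair: normalize p(f|e)
    over all English words e' in this sentence."""
    return p[(f, e)] / sum(p[(f, e2)] for e2 in E)

# Illustrative (made-up) global translation probabilities
p = {("y", "a"): 0.4, ("y", "b"): 0.4, ("y", "c"): 0.2}
E = ["a", "b", "c"]

print(align_prob("y", "a", E, p))  # 0.4 / (0.4 + 0.4 + 0.2) = 0.4
```

By construction the alignment probabilities for y sum to 1 over the English words of the sentence, which is exactly what "of all things y could align to" means.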
Without the alignments

EM: without the alignments

Input: corpus of English/Foreign sentence pairs

for some number of iterations:
    for (E, F) in corpus:
        for e in E:
            for f in F:
                p(f aligned-to e) = p(f|e) / sum over e' in E of p(f|e')
                count(e,f) += p(f aligned-to e)
                count(e) += p(f aligned-to e)
    for all (e,f) in count:
        p(f|e) = count(e,f) / count(e)

Where are the E and M steps?
E-step: calculate how probable the alignments are under the current model (i.e. p(f|e))
M-step: recalculate p(f|e) using counts from all alignments, weighted by how probable they are
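The full EM loop described above can be written as a short runnable sketch (assuming the two-sentence toy corpus from the earlier slides, uniform initialization, and counts reset each iteration):

```python
from collections import defaultdict

def train_model1(corpus, iterations=100):
    """EM training of IBM Model 1 word-translation probabilities p(f|e)."""
    # Collect co-occurring (f, e) pairs and initialize p(f|e) uniformly
    pairs = {(f, e) for E, F in corpus for e in E for f in F}
    f_vocab = {f for f, e in pairs}
    p = {pair: 1.0 / len(f_vocab) for pair in pairs}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count(e, f)
        total = defaultdict(float)  # expected count(e)
        for E, F in corpus:
            for f in F:
                # E-step: p(f aligned-to e) = p(f|e) / sum_{e' in E} p(f|e')
                denom = sum(p[(f, e)] for e in E)
                for e in E:
                    delta = p[(f, e)] / denom
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts
        p = {(f, e): count[(f, e)] / total[e] for (f, e) in p}
    return p

corpus = [(["green", "house"], ["casa", "verde"]),
          (["the", "house"], ["la", "casa"])]
p = train_model1(corpus, iterations=100)
# After many iterations, casa locks on to house, verde to green, la to the
```

Running this reproduces the behavior of the "100 iterations" slide that follows: p(verde|green) and p(la|the) approach 1, and casa (which co-occurs with everything) is claimed by house, the one word present in both pairs.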
NULL
Sometimes foreign words don't have a direct correspondence to an English word.

Adding a NULL word allows for p(f | NULL), i.e. words that appear, but are not associated explicitly with an English word.

Implementation: add "NULL" (or some unique string representing NULL) to each of the English sentences, often at the beginning of the sentence.
Benefits of word-level model
Rarely used in practice for modern MT systems

Mary did not slap the green witch
Maria no dió una botefada a la bruja verde

Two key side effects of training a word-level model:
- Word-level alignment
- p(f | e): translation dictionary

How do I get this?
Word alignment

100 iterations:

green house        p(casa | green) = 0.005
casa verde         p(verde | green) = 0.995
                   p(la | green) = 0

the house          p(casa | house) ≈ 1.0
la casa            p(verde | house) ≈ 0.0
                   p(la | house) ≈ 0.0

                   p(casa | the) = 0.005
                   p(verde | the) = 0
                   p(la | the) = 0.995

How should these be aligned? Why?
Word-level alignment
alignment(E, F) = argmax_A p(A, F | E)

Which for IBM model 1 is:

alignment(E, F) = argmax_A ∏ (i=1 to |F|) p(fi | e_ai)

Given a model (i.e. trained p(f|e)), how do we find this?

Align each foreign word (f in F) to the English word (e in E) with highest p(f|e):

ai = argmax (j: 1..|E|) p(fi | ej)
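Because Model 1 factorizes per word, that argmax decode is one line per foreign word. A sketch, using the approximate probabilities from the 100-iteration example as an assumed trained model:

```python
def align(F, E, p):
    """For each foreign word, pick the index of the English word
    with the highest p(f|e)."""
    return [max(range(len(E)), key=lambda j: p.get((f, E[j]), 0.0)) for f in F]

# Assumed trained model (approximate values from the 100-iteration slide)
p = {("casa", "green"): 0.005, ("verde", "green"): 0.995, ("la", "green"): 0.0,
     ("casa", "house"): 1.0, ("verde", "house"): 0.0, ("la", "house"): 0.0,
     ("casa", "the"): 0.005, ("verde", "the"): 0.0, ("la", "the"): 0.995}

print(align(["casa", "verde"], ["green", "house"], p))  # [1, 0]
print(align(["la", "casa"], ["the", "house"], p))       # [0, 1]
```

Each foreign word is decoded independently; nothing forces the alignment to be one-to-one, which is one of Model 1's limitations.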
Word-alignment Evaluation
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.

How good of an alignment is this? How can we quantify this?
Word-alignment Evaluation
System:
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.

Human:
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.

How can we quantify this?
Word-alignment Evaluation
System:
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.

Human:
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.

Precision and recall!
Word-alignment Evaluation
System:
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.

Human:
The old man is happy. He has fished many times.
El viejo está feliz porque ha pescado muchos veces.

Precision: 6/7
Recall: 6/10
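Alignment precision and recall are just set operations over the alignment links. A sketch, using illustrative link sets chosen to reproduce the slide's 6/7 and 6/10 (the actual links were drawn in the slide figure and are not recoverable here):

```python
def alignment_precision_recall(system, gold):
    """Precision/recall over alignment links, each an (e_index, f_index) pair."""
    correct = len(system & gold)
    return correct / len(system), correct / len(gold)

# Illustrative link sets: 10 gold links, 7 system links, 6 in common
gold = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4),
        (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)}
system = {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 7)}

precision, recall = alignment_precision_recall(system, gold)
print(precision)  # 6/7 ≈ 0.857
print(recall)     # 6/10 = 0.6
```

Precision asks "of the links the system proposed, how many are right?"; recall asks "of the links the human annotated, how many did the system find?".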
Problems for Statistical MT
Preprocessing
Language modeling
Translation modeling
Decoding
Parameter optimization
Evaluation
What kind of Translation Model?
Mary did not slap the green witch
Maria no dió una botefada a la bruja verde

Word-level models
Phrasal models
Syntactic models
Semantic models
Phrasal translation model
The models define probabilities over inputs
Morgen
fliege
ich
nach Kanada
zur Konferenz
1. Sentence is divided into phrases
Phrasal translation model
The models define probabilities over inputs
Morgen
fliege
ich
nach Kanada
zur Konferenz
1.
Sentence is divided into phrases
2.
Phrases are translated (avoids a lot of weirdness from
word-level model)
Tomorrow
I
will fly
to the conference
In Canada
Phrasal translation model
The models define probabilities over inputs
Morgen
fliege
ich
nach Kanada
zur Konferenz
Tomorrow
I
will fly
to the conference
In Canada
1.
Sentence is divided into phrases
2.
Phrases are translated (avoids a lot of weirdness from the word-level model)
3.
Phrases are reordered
Phrase table: natuerlich

Phrase table: den Vorschlag
Phrasal translation model
The models define probabilities over inputs
Morgen
fliege
ich
nach Kanada
zur Konferenz
Tomorrow
I
will fly
to the conference
In Canada
Advantages?
Advantages of Phrase-Based
Many-to-many mappings can handle non-compositional phrases

Easy to understand

Local context is very useful for disambiguating
  "Interest rate" …
  "Interest in" …

The more data, the longer the learned phrases
  Sometimes whole sentences!
(Parse tree figure: "These 7 people include astronauts coming from France and Russia ." with POS tags DT CD VBP NNS NNS VBG IN NNP CC NNP PUNC and constituents NP, VP, PP, S)
Syntax-based models
Benefits?
Syntax-based models
Benefits
  Can use syntax to motivate word/phrase movement
  Could ensure grammaticality

Two main types:
  p(foreign string | English parse tree)
  p(foreign parse tree | English parse tree)
Tree to string rule
Tree to tree example
Problems for Statistical MT
Preprocessing
Language modeling
Translation modeling
Decoding
Parameter optimization
Evaluation
MT Evaluation
How do we do it?
What data might be useful?
MT Evaluation
Source only
Manual:
SSER (subjective sentence error rate)
Correct/Incorrect
Error categorization
Extrinsic:
Objective usage testing
Automatic:
WER (word error rate)
BLEU (Bilingual Evaluation Understudy)
NIST
Automatic Evaluation
Common NLP/machine learning/AI approach
All sentence pairs are split into training sentence pairs and testing sentence pairs
Automatic Evaluation
Reference (human) translation:
The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation:
The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance .

Machine translation 2:
United States Office of the Guam International Airport and were received by a man claiming to be Saudi Arabian businessman Osama bin Laden, sent emails, threats to airports and other public places will launch a biological or chemical attack, remain on high alert in Guam.
Ideas?
BLEU Evaluation Metric
(Papineni et al, ACL-2002)
Basic idea: combination of n-gram precisions of varying size

What percentage of machine n-grams can be found in the reference translation?
Multiple Reference Translations
N-gram precision example

Candidate 1: It is a guide to action which ensures that the military always obey the commands of the party.

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed directions of the party.

What percentage of machine n-grams can be found in the reference translations? Do unigrams, bigrams and trigrams.

Unigrams: 17/18
Bigrams: 10/17
Trigrams: 7/16
N-gram precision example 2

Candidate 2: It is to ensure the army forever hearing the directions guide that party commands.

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed directions of the party.

Unigrams: 12/14
Bigrams: 4/13
Trigrams: 1/12
N-gram precision
Candidate 1: It is a guide to action which ensures that the military always obey the commands of the party.
Unigrams: 17/18, Bigrams: 10/17, Trigrams: 7/16

Candidate 2: It is to ensure the army forever hearing the directions guide that party commands.
Unigrams: 12/14, Bigrams: 4/13, Trigrams: 1/12

Any problems/concerns?
N-gram precision example
Candidate 3: the
Candidate 4: It is a

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.

Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.

Reference 3: It is the practical guide for the army always to heed directions of the party.

What percentage of machine n-grams can be found in the reference translations? Do unigrams, bigrams and trigrams.
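The n-gram precision being computed in these examples can be sketched as follows (simple membership counting with punctuation stripped; this is not yet full BLEU, which is exactly why the degenerate candidates score so well):

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, references, n):
    """(matched, total): fraction of candidate n-grams appearing in any reference."""
    tok = lambda s: s.replace(".", "").split()
    cand = ngrams(tok(candidate), n)
    ref_set = {g for r in references for g in ngrams(tok(r), n)}
    matched = sum(1 for g in cand if g in ref_set)
    return matched, len(cand)

refs = ["It is a guide to action that ensures that the military will forever heed Party commands.",
        "It is the guiding principle which guarantees the military forces always being under the command of the Party.",
        "It is the practical guide for the army always to heed directions of the party."]

cand1 = "It is a guide to action which ensures that the military always obey the commands of the party."
print(ngram_precision(cand1, refs, 1))  # (17, 18)
print(ngram_precision(cand1, refs, 2))  # (10, 17)
print(ngram_precision("the", refs, 1))  # (1, 1) -- degenerate candidate 3
```

Candidate 3 ("the") gets perfect unigram precision under this naive scheme, which motivates the clipping and brevity penalty in BLEU below.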
BLEU Evaluation Metric
(Papineni et al, ACL-2002)
N-gram precision (score is between 0 & 1)
  What percentage of machine n-grams can be found in the reference translation?
  Not allowed to use same portion of reference translation twice (can't cheat by typing out "the the the the the")

Brevity penalty
  Can't just type out single word "the" (precision 1.0!)

*** Amazingly hard to "game" the system (i.e., find a way to change machine output so that BLEU goes up, but quality doesn't)
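Those two fixes, clipped counts and the brevity penalty, can be sketched as follows (a simplified single-reference version, not the full multi-reference BLEU):

```python
import math
from collections import Counter

def clipped_precision(candidate, reference, n=1):
    """Each reference n-gram can only be 'used' as many times as it occurs."""
    grams = lambda t: [tuple(t[i:i + n]) for i in range(len(t) - n + 1)]
    cand, ref = Counter(grams(candidate)), Counter(grams(reference))
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

def brevity_penalty(cand_len, ref_len):
    """Penalize candidates shorter than the reference."""
    return 1.0 if cand_len >= ref_len else math.exp(1 - ref_len / cand_len)

ref = "the cat is on the mat".split()

# "the the the the the": unclipped precision would be 5/5;
# clipping limits "the" to its 2 occurrences in the reference
print(clipped_precision("the the the the the".split(), ref))  # 0.4

# A one-word candidate gets unigram precision 1.0 but a harsh brevity penalty
print(brevity_penalty(1, 6))  # e^(1-6) ≈ 0.0067
```

Full BLEU then combines the clipped precisions for n = 1..4 as a geometric mean and multiplies by the brevity penalty.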
BLEU Tends to Predict Human Judgments
slide from G. Doddington (NIST)
(variant of BLEU)
BLEU: 
Problems?

Doesn't care if an incorrectly translated word is a name or a preposition:
  gave it to Albright   (reference)
  gave it at Albright   (translation #1)
  gave it to altar      (translation #2)

What happens when a program reaches human-level performance in BLEU but the translations are still bad?
  maybe sooner than you think …
  1. Machine Translation Concluded David Kauchak CS159 Fall 2020 Some slides adapted from Kevin Knight Philipp Koehn Dan Klein School of Informatics University of Edinburgh USC/Information Sciences Institute USC/Computer Science Department Computer Science Department UC Berkeley

  2. Admin Assignment 5b Assignment 6 available Quiz 3: 11/10

  3. Language translation Hola!

  4. Word models: IBM Model 1 Mary did not slap the green witch NULL p(verde | green) Maria no di una botefada a la bruja verde Each foreign word is aligned to exactly one English word This is the ONLY thing we model! |F| p(f1f2...fF,a1a2...a|F||e1e2...eE)= p(fi|eai) i=1

  5. Training without alignments Initially assume a p(f|e) are equally probable Repeat: Enumerate all possible alignments Calculate how probable the alignments are under the current model (i.e. p(f|e)) Recalculate p(f|e) using counts from all alignments, weighted by how probable they are (Note: theoretical algorithm)

  6. EM alignment E-step Enumerate all possible alignments Calculate how probable the alignments are under the current model (i.e. p(f|e)) M-step Recalculate p(f|e) using counts from all alignments, weighted by how probable they are (Note: theoretical algorithm)

  7. the house the house green house green house la casa la casa casa verde casa verde the house the house green house green house la casa la casa casa verde casa verde p( casa | green) 1/2 p( casa | house) 1/2 p( casa | the) 1/2 p( verde | green) 1/2 p( verde | house) 1/4 p( verde | the) 0 p( la | green ) 0 p( la | house ) 1/4 p( la | the ) 1/2 E-step: Given p(F|E), calculate p(A,F|E)

  8. the house the house green house green house 1/8 1/4 1/4 1/8 la casa la casa casa verde casa verde the house the house green house green house 1/4 1/4 1/8 1/8 la casa la casa casa verde casa verde p( casa | green) 1/2 p( casa | house) 1/2 p( casa | the) 1/2 p( verde | green) 1/2 p( verde | house) 1/4 p( verde | the) 0 p( la | green ) 0 p( la | house ) 1/4 p( la | the ) 1/2 E-step: Given p(F|E), calculate p(A,F|E) Calculate unnormalized counts

  9. the house the house green house green house 1/8 1/4 1/4 1/8 la casa la casa casa verde casa verde the house the house green house green house 1/4 1/4 1/8 1/8 la casa la casa casa verde casa verde sum = (3/4) sum = (3/4) p( casa | green) 1/2 p( casa | house) 1/2 p( casa | the) 1/2 p( verde | green) 1/2 p( verde | house) 1/4 p( verde | the) 0 p( la | green ) 0 p( la | house ) 1/4 p( la | the ) 1/2 ?(?,??|?) ?=1 ?(?,??|?) ? ???,? = 4 E-step: Given p(F|E), calculate p(A,F|E) Normalize by the sum

  10. the house the house green house green house 1/6 1/3 1/3 1/6 la casa la casa casa verde casa verde the house the house green house green house 1/3 1/3 1/6 1/6 la casa la casa casa verde casa verde sum = (3/4) sum = (3/4) p( casa | green) 1/2 p( casa | house) 1/2 p( casa | the) 1/2 p( verde | green) 1/2 p( verde | house) 1/4 p( verde | the) 0 p( la | green ) 0 p( la | house ) 1/4 p( la | the ) 1/2 ?(?,??|?) ?=1 ?(?,??|?) ? ???,? = 4 E-step: Given p(F|E), calculate p(A,F|E) Normalize by the sum

  11. the house the house green house green house 1/6 1/3 1/3 1/6 la casa la casa casa verde casa verde the house the house green house green house 1/3 1/3 1/6 1/6 la casa la casa casa verde casa verde M-step: calculate unnormalized counts for p(f|e) given the alignments p( casa | green) p( casa | house) p( casa | the) p( verde | green) p( verde | house) p( verde | the) p( la | green ) ( la | house ) p( la | the ) c(casa,the) = 1/6+1/3 = 3/6 c(verde,the) = 0 c(la,the) = 1/3+1/3 = 4/6 c(casa,green) = 1/6+1/3 = 3/6 c(verde,green) = 1/3+1/3 = 4/6 c(la, green) = 0 c(casa,house) = 1/3+1/6+ 1/3+1/6 = 6/6 c(verde,house) = 1/6+1/6 = 2/6 c(la,house) = 1/6+1/6 = 2/6

  12. the house the house green house green house 1/6 1/3 1/3 1/6 la casa la casa casa verde casa verde the house the house green house green house 1/3 1/3 1/6 1/6 la casa la casa casa verde casa verde M-step: normalize p( casa | green) 3/7 p( casa | house) 3/5 p( casa | the) 3/7 p( verde | green) 4/7 p( verde | house) 1/5 p( verde | the) 0 p( la | green ) 0 p( la | house ) 1/5 p( la | the ) 4/7 c(casa,the) = 1/6+1/3 = 3/6 c(verde,the) = 0 c(la,the) = 1/3+1/3 = 4/6 c(casa,green) = 1/6+1/3 = 3/6 c(verde,green) = 1/3+1/3 = 4/6 c(la, green) = 0 c(casa,house) = 1/3+1/6+ 1/3+1/6 = 6/6 c(verde,house) = 1/6+1/6 = 2/6 c(la,house) = 1/6+1/6 = 2/6 Then, calculate the probabilities by normalizing the counts

  13. Implementation details For |E| English words and |F| foreign words, how many alignments are there? Repeat: E-step Enumerate all possible alignments Calculate how probable the alignments are under the current model (i.e. p(f|e)) M-step Recalculate p(f|e) using counts from all alignments, weighted by how probable they are

  14. Implementation details Each foreign word can be aligned to any of the English words (or NULL) (|E|+1)|F| Repeat: E-step Enumerate all possible alignments Calculate how probable the alignments are under the current model (i.e. p(f|e)) M-step Recalculate p(f|e) using counts from all alignments, weighted by how probable they are

  15. Thought experiment The old man is happy. He has fished many times. El viejo est feliz porque ha pescado muchos veces. The sharks await. His wife talks to him. Los tiburones esperan. Su mujer habla con l. p(fi|eai)=count(faligned-toe) p(el | the) = 0.5 p(Los | the) = 0.5 count(e)

  16. If we had the alignments Input: corpus of English/Foreign sentence pairs along with alignment for (E, F) in corpus: for aligned words (e, f) in pair (E,F): count(e,f) += 1 count(e) += 1 for all (e,f) in count: p(f|e) = count(e,f) / count(e)

  17. If we had the alignments Input: corpus of English/Foreign sentence pairs along with alignment for (E, F) in corpus: for e in E: for f in F: if f aligned-to e: count(e,f) += 1 count(e) += 1 for all (e,f) in count: p(f|e) = count(e,f) / count(e)

  18. If we had the alignments Input: corpus of English/Foreign sentence pairs along with alignment for (E, F) in corpus: for aligned words (e, f) in pair (E,F): count(e,f) += 1 count(e) += 1 for (E, F) in corpus for e in E: for f in F: if f aligned-to e: count(e,f) += 1 count(e) += 1 Are these equivalent? for all (e,f) in count: p(f|e) = count(e,f) / count(e)

  19. Thought experiment #2 The old man is happy. He has fished many times. 80 annotators El viejo est feliz porque ha pescado muchos veces. The old man is happy. He has fished many times. 20 annotators El viejo est feliz porque ha pescado muchos veces. Use partial counts: - count(viejo | man) 0.8 - count(viejo | old) 0.2 p(fi|eai)=count(faligned-toe) count(e)

  20. Without the alignments if f aligned-to e: count(e,f) += 1 count(e) += 1 ? ? e : probability that f is aligned to e in this pair count(e,f) += ? ? e count(e) += ? ? e Key: use expected counts (i.e., how likely based on the current model), rather than actual counts

  21. Without alignments ? ? e : probability that f is aligned to e in this pair a b c y z What is ? ? a ? Put another way, of all things that y could align to in this sentence, how likely is it to be a?

  22. Without alignments ? ? e : probability that f is aligned to e in this pair a b c y z Of all things that y could align to, how likely is it to be a: p(y | a) Does that do it? No! p(y | a) is how likely y is to align to a over the whole data set.

  23. Without alignments ? ? e : probability that f is aligned to e in this pair a b c y z Of all things that y could align to, how likely is it to be a: p(y | a) p(y | a) + p(y | b) + p(y | c)

  24. Without the alignments Input: corpus of English/Foreign sentence pairs along with alignment for (E, F) in corpus: for e in E: for f in F: ? ? e = p f e)/ ? ?? ??(?|?) count(e,f) += ? ? e count(e) += ? ? e for all (e,f) in count: p(f|e) = count(e,f) / count(e)

  25. EM: without the alignments Input: corpus of English/Foreign sentence pairs along with alignment for some number of iterations: for (E, F) in corpus: for e in E: for f in F: ? ? e = p f e)/ ? ?? ??(?|?) count(e,f) += ? ? e count(e) += ? ? e for all (e,f) in count: p(f|e) = count(e,f) / count(e)

  26. EM: without the alignments Input: corpus of English/Foreign sentence pairs along with alignment for some number of iterations: for (E, F) in corpus: for e in E: for f in F: ? ? e = p f e)/ ? ?? ??(?|?) count(e,f) += ? ? e count(e) += ? ? e for all (e,f) in count: p(f|e) = count(e,f) / count(e)

  27. EM: without the alignments Input: corpus of English/Foreign sentence pairs along with alignment for some number of iterations: for (E, F) in corpus: for e in E: for f in F: ? ? e = p f e)/ ? ?? ??(?|?) count(e,f) += ? ? e count(e) += ? ? e for all (e,f) in count: p(f|e) = count(e,f) / count(e) Where are the E and M steps?

  28. EM: without the alignments Input: corpus of English/Foreign sentence pairs along with alignment for some number of iterations: for (E, F) in corpus: for e in E: for f in F: ? ? e = p f e)/ ? ?? ??(?|?) count(e,f) += ? ? e count(e) += ? ? e for all (e,f) in count: p(f|e) = count(e,f) / count(e) Calculate how probable the alignments are under the current model (i.e. p(f|e))

  29. EM: without the alignments Input: corpus of English/Foreign sentence pairs along with alignment for some number of iterations: for (E, F) in corpus: for e in E: for f in F: ? ? e = p f e)/ ? ?? ??(?|?) count(e,f) += ? ? e count(e) += ? ? e for all (e,f) in count: p(f|e) = count(e,f) / count(e) Recalculate p(f|e) using counts from all alignments, weighted by how probable they are

  30. NULL Sometimes foreign words don't have a direct correspondence to an English word. Adding a NULL word allows for p(f | NULL), i.e. words that appear but are not associated explicitly with an English word. Implementation: add NULL (or some unique string representing NULL) to each of the English sentences, often at the beginning of the sentence. p(casa | NULL) = 1/3 p(verde | NULL) = 1/3 p(la | NULL) = 1/3

  31. Benefits of word-level model Rarely used in practice for modern MT systems. Mary did not slap the green witch (e0 = NULL, e1 ... e7) Maria no dió una bofetada a la bruja verde (f1 ... f9) Two key side effects of training a word-level model: a word-level alignment, and p(f | e): a translation dictionary. How do I get this?

  32. Word alignment: 100 iterations green house / casa verde the house / la casa How should these be aligned? p(casa | green) = 0.005 p(verde | green) = 0.995 p(la | green) = 0 p(casa | house) ≈ 1.0 p(verde | house) ≈ 0.0 p(la | house) ≈ 0.0 p(casa | the) = 0.005 p(verde | the) = 0 p(la | the) = 0.995

  33. Word alignment: 100 iterations green house / casa verde the house / la casa Why? p(casa | green) = 0.005 p(verde | green) = 0.995 p(la | green) = 0 p(casa | house) ≈ 1.0 p(verde | house) ≈ 0.0 p(la | house) ≈ 0.0 p(casa | the) = 0.005 p(verde | the) = 0 p(la | the) = 0.995

  34. Word-level alignment alignment(E, F) = argmax_A p(A, F | E) Which for IBM Model 1 is: alignment(E, F) = argmax_A Π_{i=1}^{|F|} p(f_i | e_{a_i}) Given a model (i.e. trained p(f|e)), how do we find this? Align each foreign word (f in F) to the English word (e in E) with highest p(f|e): a_i = argmax_{j = 1..|E|} p(f_i | e_j)
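Because IBM Model 1 scores each foreign word independently, this argmax decomposes into one lookup per foreign word. A minimal sketch, assuming a hypothetical helper `align` and a table `t` of trained probabilities keyed as (foreign_word, english_word):

```python
def align(E, F, t):
    """For each foreign word f_i, return the index a_i of the English
    word e_j that maximizes p(f_i | e_j) -- the best IBM Model 1
    alignment for the sentence pair.

    t: dict mapping (foreign_word, english_word) -> probability.
    """
    return [max(range(len(E)), key=lambda j: t.get((f, E[j]), 0.0))
            for f in F]
```

For example, with probabilities like those on the previous slide, `align(["the", "house"], ["la", "casa"], t)` links "la" to "the" and "casa" to "house".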

  35. Word-alignment Evaluation The old man is happy. He has fished many times. El viejo está feliz porque ha pescado muchas veces. How good of an alignment is this? How can we quantify this?

  36. Word-alignment Evaluation System: The old man is happy. He has fished many times. El viejo está feliz porque ha pescado muchas veces. Human: The old man is happy. He has fished many times. El viejo está feliz porque ha pescado muchas veces. How can we quantify this?

  37. Word-alignment Evaluation System: The old man is happy. He has fished many times. El viejo está feliz porque ha pescado muchas veces. Human: The old man is happy. He has fished many times. El viejo está feliz porque ha pescado muchas veces. Precision and recall!

  38. Word-alignment Evaluation System: The old man is happy. He has fished many times. El viejo está feliz porque ha pescado muchas veces. Human: The old man is happy. He has fished many times. El viejo está feliz porque ha pescado muchas veces. Precision: 6/7 Recall: 6/10
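Treating each alignment as a set of (English index, foreign index) links, precision and recall reduce to set intersection. A minimal sketch, with `alignment_prf` a hypothetical helper name:

```python
def alignment_prf(system, gold):
    """Precision and recall of a system word alignment against a
    human (gold) alignment; each alignment is a set of
    (e_index, f_index) links.

    Precision: fraction of system links that are in the gold set.
    Recall: fraction of gold links that the system found.
    """
    correct = len(system & gold)
    precision = correct / len(system)
    recall = correct / len(gold)
    return precision, recall
```

On the slide's example, 6 of the system's 7 links are correct and the human alignment has 10 links, giving precision 6/7 and recall 6/10.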

  39. Problems for Statistical MT Preprocessing Language modeling Translation modeling Decoding Parameter optimization Evaluation

  40. What kind of Translation Model? Mary did not slap the green witch Word-level models Phrasal models Syntactic models Semantic models Maria no dió una bofetada a la bruja verde

  41. Phrasal translation model The models define probabilities over inputs p(f | e) Morgen fliege ich nach Kanada zur Konferenz 1. Sentence is divided into phrases

  42. Phrasal translation model The models define probabilities over inputs p(f | e) Morgen fliege ich nach Kanada zur Konferenz Tomorrow will fly I In Canada to the conference 1. Sentence is divided into phrases 2. Phrases are translated (avoids a lot of weirdness from word-level model)

  43. Phrasal translation model The models define probabilities over inputs p(f | e) Morgen fliege ich nach Kanada zur Konferenz Tomorrow I will fly to the conference In Canada 1. Sentence is divided into phrases 2. Phrases are translated (avoids a lot of weirdness from word-level model) 3. Phrases are reordered

  44. Phrase table natuerlich Translation Probability of course 0.5 naturally 0.3 of course , 0.15 , of course , 0.05

  45. Phrase table den Vorschlag Translation Probability the proposal 0.6227 's proposal 0.1068 a proposal 0.0341 the idea 0.0250 this proposal 0.0227 proposal 0.0205 of the proposal 0.0159 the proposals 0.0159 the suggestions 0.0114
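A phrase table like the ones above maps each source phrase to candidate translations with probabilities. A minimal in-memory sketch using the slides' example entries (the structure and the helper name `best_translation` are my assumptions, not a real toolkit's format):

```python
# Hypothetical in-memory phrase table: each source phrase maps to a
# list of (translation, probability) candidates, as on the slides.
phrase_table = {
    "natuerlich": [("of course", 0.5), ("naturally", 0.3),
                   ("of course ,", 0.15), (", of course ,", 0.05)],
    "den Vorschlag": [("the proposal", 0.6227), ("'s proposal", 0.1068),
                      ("a proposal", 0.0341), ("the idea", 0.0250)],
}

def best_translation(phrase):
    """Return the highest-probability translation for a source phrase,
    or None if the phrase is not in the table."""
    options = phrase_table.get(phrase)
    if not options:
        return None
    return max(options, key=lambda pair: pair[1])[0]
```

A real decoder keeps all candidates and scores them together with a language model; picking only the single best entry per phrase is a simplification for illustration.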

  46. Phrasal translation model The models define probabilities over inputs p(f |e) Morgen fliege ich nach Kanada zur Konferenz Tomorrow I will fly to the conference In Canada Advantages?

  47. Advantages of Phrase-Based Many-to-many mappings can handle non-compositional phrases Easy to understand Local context is very useful for disambiguating: "interest rate" vs. "interest in" The more data, the longer the learned phrases Sometimes whole sentences!

  48. Syntax-based models Benefits? [Parse tree over the sentence, with constituents S, NP, VP, PP and POS tags DT, CD, NNS, VBP, VBG, NNP, CC, IN, PUNC] These 7 people include astronauts coming from France and Russia

  49. Syntax-based models Benefits Can use syntax to motivate word/phrase movement Could ensure grammaticality Two main types: p(foreign string | English parse tree) p(foreign parse tree | English parse tree)

  50. Tree-to-string rule S( ADVP(RB therefore) , x0:NP x1:VP ) -> x0:NP * x1:VP
