Probabilistic Database Model

 
A Temporal
Probabilistic Database Model
for Information Extraction
 
Maximilian Dylla
Max Planck Institute for Informatics
Saarbr¨ucken, Germany
 
 
Iris Miliaraki
Max Planck Institute for Informatics
Saarbr¨ucken, Germany
 
Martin Theobald
University of Antwerp
Antwerp, Belgium
 
Motivation
 
For cleaning uncertain temporal facts obtained from information extraction
methods
To develop a query engine which is capable of scaling to very large temporal
knowledge bases with nearly interactive query response times over millions of
uncertain facts and hundreds of thousands of grounded rules
To develop a temporal-probabilistic database model that supports both time and
probability as first class citizens
 
Problem Statement
 
As information extraction techniques inherently deliver uncertain data, they have
often been considered as one of the main motivating applications for
PDBs(Probabilistic DataBases)
TDBs(Temporal DataBases) have been an active field of research for many
years, but so far there exists no TDB system that also supports probabilistic
data or vice versa
Not aware of any unified temporal and probabilistic database approach that
natively supports the kinds of data and constraints they aim to apply to their
setting
 
Preliminaries
 
Temporal Database:
It can be defined as a triple (S, F, Ω
T
 ) consisting of a database schema S, a finite set of base
tuples F, and the time domain Ω
T
B
a
s
e
 
t
u
p
l
e
s
 
F
(
s
e
t
 
o
f
 
b
a
s
e
 
f
a
c
t
s
)
 
c
a
p
t
u
r
e
s
 
i
n
 
a
l
l
 
t
h
e
 
i
n
p
u
t
 
i
n
s
t
a
n
c
e
s
 
R
i
T
 
o
v
e
r
 
S
Time Domain(Ω
T
) is the finite sequence of time points
Database Schema S is the collection of all relation schemas (R
1
T
,R
2
T
 ,..........,R
N
T
)
A temporal relation R
T
 is represented as R
T
 (A
1
, . . . ,A
m
 )
[Tb, Te)
,
where R
T
 denotes the relation name, A
1
, . . . ,A
m
 is a finite set of attributes of R
T
 with
domains Ω
i
, and T
b
, T
e
 are attributes denoting the begin and end time points of the
time intervals that are attached to each tuple
temporal relations employs tuple timestamping, i.e., each tuple is annotated by a single
time interval that specifies its valid-time
 
Probabilistic Database:
Informally, a probabilistic database can be defined as a discrete probability distribution over a
finite number of database instances (the “possible worlds”)
Formally, we thus define a tuple-independent probabilistic database DB
p
 = (S,F, p) as a triple
consisting of
a database schema S,
 a finite set of base facts F, and
a probability measure p : F → (0, 1], which assigns a probability value p(f) to each uncertain
base fact f 
 F.
A probabilistic relation R
p
 is represented as R
p
(A
1
, . . . ,A
m
, p), where
R
p
 denotes the relation name, A1, . . . ,Am denote a finite set of attributes of R
p
 with
domains Ω
i
,
p
 
i
s
 
a
n
 
a
t
t
r
i
b
u
t
e
 
w
h
i
c
h
 
d
e
n
o
t
e
s
 
t
h
e
 
p
r
o
b
a
b
i
l
i
t
y
 
v
a
l
u
e
 
t
h
a
t
 
i
s
 
a
t
t
a
c
h
e
d
 
t
o
 
e
a
c
h
 
f
a
c
t
 
i
n
 
a
r
e
l
a
t
i
o
n
 
i
n
s
t
a
n
c
e
 
R
p
 
o
f
 
R
p
p(f) denotes the confidence in the existence of the fact in the database
 
Conditioning a PDB
TDB uses time interval and PDB uses confidence for the validity of tuples
Now, the third concept is 
CONDITIONING
 which allows for additionally incorporating consistency
constraints into a possible-worlds-based representation formalism
Formally, we can compute the marginal probability of any Boolean formula ϕ over variables in F
as the sum of the probabilities of all the possible worlds W 
 F that entail ϕ, which is denoted
as W |= ϕ.
P(ϕ) := 
Σ
W|=ϕ,W
F
P(W)
Conditioning a formula ϕ on another formula ψ, which also ranges over variables in F, then simply
denotes the process of computing the following conditional probability:
P(ϕ | ψ) :=P(ϕ 
 ψ)/P(ψ)
Intuitively, 
conditioning removes the possible worlds from a PDB, in which the constraints are not
satisfied, and thus re-weighs the remaining worlds to again form a probability distribution
 
TEMPORAL-PROBABILISTIC DATABASE MODEL
 
This model is the combination and extension of the TDB and PDB, into a unified
temporal-probabilistic  database model, in order to facilitate both rule-based
and probability inference over uncertain temporal knowledge
Extension employs two key concepts:
a.
Temporal deduction rules
:
It allows to represent all of the core operations known from relational algebra
over these temporal and probabilistic relations;
b.
Temporal consistency constraints:
It allows to constrain the possible worlds at which query answers are considered
to be valid.
 
Temporal Deduction Rules:
Temporal deduction rule over a database schema S is a first-order logical rule of the form
R
T
0
(X̅
0
)
[Tb0,Te0 )
 ← (
i =1,...,n 
R
T
i
 ( X̅
i
)
[Tbi,Tei)
j=1,...,m 
¬R
T
j
 ( X̅
j
)
[Tbj,Tej)
 
 Φ(X̅))
Where:
1.
R
T
0
 denotes the 
head literal intensional temporal relation
 in S, while 
R
T
i
 , 
R
T
j
refer to extensional or intensional temporal relations in S;
2.
n ≥ 1, m ≥ 0, thus requiring at least one positive relational literal;
3.
Φ( X̅) is a 
conjunction of literals
 over the arithmetic predicates =, ≠ and ≤
T
 , whose
arguments are in X̅ ;
4.
0
, X̅
i
, X̅
j
 denote 
tuples of non-temporal variables and constants
, such that Var(X̅
0
)
 
i
 Var( X̅
i
) and Var(X̅
j
)
 
i
 Var( X̅
i
);
5.
X̅ denotes a tuple of
 both temporal and non-temporal variables and constants
,
such that Var( X̅ ) ⊆ ∪
i
 Var( X̅
i
)∪ ∪i{T
bi
 , T
ei
};
6.
T
b0
 , T
e0
 are 
head's temporal arguments,
 such that T
b0
,T
e0 
∈ ∪
i
{T
bi
 , T
ei
} ∪ {t
min
, t
max
}.
 
 
Example 1
 
Rule 1: 
 
AreMarried(X, Y )
[Tb1,tmax )
←(Wedding(X, Y )
[Tb1,Te1 ) 
 ¬ Divorce(X, Y )
[Tb2,Te2 )
)
-
couple stays married from the begin time point of their wedding until to the last possible time point we consider (tmax) unless there is a
Divorce fact at any other time interval.
Rule 2:
 
AreMarried(X, Y )
[Tb1,Te2 )
←Wedding(X, Y)
[Tb1,Te1 ) 
 Divorce(X, Y )
[Tb2,Te2 )
 
 (T
e1
T
T
b2
))
-
couple stays married from the begin time point of their wedding till the end time point of their divorce
Rule 3: 
 
¬ (BornIn(X, Y )
[Tb1,Te1)
 
 BornIn(X,Z)
[Tb2,Te2)
 
 Y ≠ Z)
-
This constraint tells that if two facts are mutually exclusive(such as two birth facts of same person), at least one of the two BornIn facts has to
be set to false in all the possible instances
Rule 4: 
 
¬ (BornIn(X, Y )
[Tb1,Te1) 
 AreMarried(X,Z)
[Tb2,Te2 )
 T
b2
T
T
e1
)
-
This consistency constraint uses a temporal condition to express that a marriage should begin after the respective person was born
 
 
Grounding:
Definition 2. 
Let Ψ be a conjunctive rst-order formula and F a set of facts. Then, the grounding function G(Ψ,F)
instantiates Ψ over F into a set of propositional lineage formulas {ψ0, . . . , ψn}.
 
Example 2.
 Applying Grounding(G) to the body of Rule (2) and the set of facts {f3, f4, f5} yields the lineage
formulas {f3
f5, f4
f5}. By further instantiating Rule (2)'s
head with the arguments of f4
f5, we obtain the new fact
 f′= AreMarried(DeNiro,Abbott)
[1976-07-29,1988-12-01)
. Regarding
 the lineage, we have ϕ(f′) = f4
f5. All facts which can be
 deduced from Rules (1) and (2) are shown together with
 their lineages and time intervals in the middle part of Fig. 1.
 
 
 
Deduplicating Facts
Definition 4. L
et a temporal relation RT , non-temporal constants ¯a, a time point t 
 ΩT , and a set of facts F be given.
Then, L is dened as the set of lineages of fact RT (¯a) that are valid at time point t:
L(RT, a̅, t,F) := {ϕ(f) | f = RT ( a̅)
[tb,te)
 
 F, t
b
 ≤ t < t
e
}
We create duplicate free facts f′= RT (¯a)[tb,te), such that for any pair of time points t0, t1 
 [tb, te) it holds that:
 
L(R
T
, a̅ t
0
,F) = L(R
T
, a̅, t
1
,F)
Furthermore, we dene the new facts' lineage to be:
ϕ(f′) :=
ψi
L(RT,a,tb,F)
ψi 
  
Example 3. Applying Definition 4 to the facts in the middle of Fig. 1 yields the facts shown at the top of the figure. For
instance, if we inspect the time points 1976- 07-28 and 1976-07-29, we notice that {f
3
f
5
, f
3
¬f
5
} ≠ {f
3
f
5
, f
3
¬f
5
, f
4
f
5
, f
4
¬f
5
},
so two differing facts f
6
 and f
7
 have to be kept in the relation. Thus, the resulting duplicate-free facts are the following.
Id 
 
Fact 
     
T 
   
  
p
f6 
 
AreMarried(DeNiro,Abbott) 
 
[1936-11-01, 1976-07-29) 
  
0.3
f7 
 
AreMarried(DeNiro,Abbott) 
 
[1976-07-29, 1988-12-01) 
  
0.79
f8 
 
AreMarried(DeNiro,Abbott) 
 
[1988-12-01, tmax ) 
  
0.158
Following Eq. (10), their lineages are as follows.
ϕ(f
6
) = (f
3
f
5
) 
 (f3 
 ¬f5)
ϕ(f
7
) = (f
3
f
5
) 
 (f
3
¬f
5
) 
 (f
4
f
5
) 
 (f
4
 
 ¬f
5
)
ϕ(f
8
) = (f
3
¬f
5
) 
 (f
4
¬f
5
)
 
Temporal Consistency Constraints:
A temporal consistency constraint over a database schema S is a rst-order logical rule of the form
ㄱ(
i 
R
T
i
 ㄱ(X̅
i
)
[Tbi,Tei)
 
 Φ(X̅) )
 
where:
i.
R
T
i
 denote extensional or intensional temporal relations in S;
ii.
Φ( X̅) is a conjunction of literals relating to the arithmetic predicates =, ≠, and ≤
T
 with
arguments in X̅ ;
iii.
The X̅
i
 denote tuples of non-temporal variables and constants, and T
bi
 , T
ei
 are temporal
variables and constants, such that Var( X̅) 
 
i
(Var( X̅
i
) 
 {T
bi
 , T
ei
}
Rule 3: 
 
¬ (BornIn(X, Y )
[Tb1,Te1)
 
 BornIn(X,Z)
[Tb2,Te2)
 
 Y ≠ Z)
 
-
This constraint tells that if two facts are mutually exclusive(such as two birth facts of same person), at least one of the two BornIn facts has to
be set to false in all the possible instances
 
Rule 4: 
 
¬ (BornIn(X, Y )
[Tb1,Te1) 
 AreMarried(X,Z)
[Tb2,Te2 )
 T
b2
T
T
e1
)
 
-
This consistency constraint uses a temporal condition to express that a marriage should begin after the respective person was born
 
 
Conditioning:
Let C = {C
0
, . . . , C
n
} be a set of temporal consistency constraints, and let F
+
 be a set containing both
base facts F and facts deducible from a set of temporal deduction rules D. Then, we dene C to be
the conjunction of all grounded constraints over F
+
.
C := 
ψi
G(Ci,F+),Ci
C
ψi
 
The confidence of a fact f 
 F
+ 
with lineage ϕ(f) and with respect to the grounded constraints C is then
calculated as:
 
  
P(ϕ(f) | C) := P(ϕ(f) 
 C) / P(C)
 
Example 4:
 
Grounding the Constraint (3) against the facts of Example 1 yields ¬(f1 
 f2), where
the arithmetic literal is omitted from the grounded formula, since it evaluates to true. Altogether,
grounding Constraints (3) and (4) results in
 
C = ¬(f1 
 f2) 
 ¬(f1 
 ϕ(f6)) 
 ¬(f2 
 ϕ(f6)) 
 ¬(f2 
 ϕ(f7))
,
where all arithmetic predicates have already been evaluated correspondingly.
 
Temporal-Probabilistic Database
:
Definition: A temporal-probabilistic database DBTp =(S,F,D, C,ΩT , p) is a six-tuple consisting of a
database schema S, a set of base facts F, a set of temporal deduction rules D, a set of temporal
consistency constraints C, a time domain ΩT , and a probability measure p.
Queries over TPDP
Conjunction of literals
Similar conditions as for the deduction rules regarding safeness hold.
That is, every non-temporal variable that occurs in a negated relational literal or in an
arithmetic literal must also occur in at least one of the positive relational literals.
Eg: ¬BornIn(X, Y )
[Tb1,Te1) 
should have a positive relational literal BornIn(X,Y)
[Tb1,Te1)
Every temporal variable that occurs in an arithmetic literal must also occur in at least one of the
relational literals
¬ (BornIn(X, Y )
[Tb1,Te1) 
 AreMarried(X,Z)
[Tb2,Te2 )
 T
b2
T
T
e1
)
 
Algorithms
 
The basic concept of the algorithm for the proposed TPDB is:
1.
Ground the deduction rules and constraints
2.
Deduplicate the facts with overlapping time intervals
3.
Lineage Decompositions to reduce shannon’s expansion
4.
Final Confidence Computation
1.
Grounding:
a.
The process of grounding of deduction rules and constraints is divided into two phase:
i.
First phase
:
 
Ground all the deduction rules in a bottom-up manner while tracing the
lineage of derived facts
ii.
Second phase
: instantiates constraints against all facts (i.e., both the base and the
deduced facts).
Example 6. 
We initialize Algorithm 1 using F := {f
1
 ,. . . , f
5
 }, D comprising Rules (1) and (2), and C
containing Constraints (3) and (4) of Example 1.
First, we initialize Plan with [ AreMarried→ {(1), (2)}], and we set F
+
 = {f
1
 , . . . , f
5
 }. The loop in Line 3
performs only one iteration where R
T
 = AreMarried .
Assuming that the loop in Line 5 selects (2) first, we instantiate the head to two new facts AreMarried
(DeNiro,Abbott) 
[1936−11−01,1988−12−01)
 and Marriage (DeNiro, Abbott)
[1976−07−29,1988−12−01)
 with lineages
f
3
f
5
 and f
4
f
5
 ,respectively. Their lineage and that of the two facts deduced by Rule (1) are shown in
Example 3. Fig. 1’s top part corresponds to the output of Deduplicate (Line 8).
When we reach Line 9, F
+
 = {f
1
 , . . ,f
8
}. Next, the constraints are grounded as shown already in Example
4
 
2. Deduplicating Facts:
Example 7. We apply Algorithm 2 to the facts deduced by Rules (1) and (2),
hence constructing the lineage for the facts presented in Example 3.
First, we initialize Limits as {1936-11-01, 1976-07-29, 1988-12-01, t max }, set Begin to
[1936-11-01 → {f
3
f
5
 , f
3
¬f
5
 }, 1976-07-29 → {f
4
f
5
 , f
4
¬f
5
 }], and assign [1988-12-01 →
{f
3
f
5
 , f
4
f
5
 }, t
max
 →{f
3
¬f
5
 , f
4
¬f
5
 }] to End.
In the first iteration of the loop, Active is empty. In the next iteration, we add f 6 =
AreMarried (DeNiro, Abbott)
 [1936-11-01,1976-07-29)
 to Result and set its lineage to φ(f 6 ) =
(f
3
f
5
 ) 
 (f
3
¬f
5
 ).
The last two iterations produce f 7 , f 8 as shown in Example 3
 
3. Confidence Computation
Example 8.
 In Example 3 the lineage of f 6 was defined as (f 3 
 f 5 ) 
 (f 3 
 ¬f 5 ). Now, let us compute
its confidence:
P ((f
3
f
5
) 
 (f
3
¬f
5
 ))
= P (f
3
f
5
 ) + P(f
3
¬f
5
 )
= P(f
3
 ) · P(f
5
 ) + P(f
3
) · (1 − P(f
5
 ))
= 0.3·0.8 + 0.3·0.2 = 0.3
 
4. Lineage Decomposition
Definition: Let φ ≡ φ 0 op . . . op φ n be a lineage formula with operator op 
 {
, 
}, and let φ 0 to φ n be
sub-formulas of φ. Then, we define the function S to return the number of necessary Shannon expansions
as follows:
S(φ) = |{f | 
i≠j
 : f 
 F (φ
i
 ), f 
 F (φ
j
 )}|
Example 10. 
As an example, we apply S to φ(f 7 ) (see Example 3). That is op = 
 and S((f 3 
 f 5 ) 
 (f 3
 ¬f 5 ) 
 (f 4 
 f 5 ) 
 (f 4 
 ¬f 5 )) = |{f 3 , f 4 , f 5 }| = 3. Thus, a naive computation of P (φ(f 7 )) takes 2
3
iterations.
Example 11.
 We decompose Example 3’s φ(f 7 ) = (f
3
f
5
)
(f
3
¬f
5
) 
 (f
4
 
f
5
) 
 (f
4
 
¬f
5
 ) by Algorithm 3 with
θ = 0. As discussed in Example 10, S(φ(f 7 )) = 3, so we set φ ′ = φ(f 7 ). In Line 4, we receive m = 0
because f 5 occurs in all lineage subformulas. Hence, in Line 9, we choose f 5 and rewrite φ(f 7 ) to (f 5 
(f 3 
 f 4 )) 
 (¬f 5 
 (f 3 
 f 4 )) in Line 10. Now, all Shannon expansions are gone, yielding P (φ(f 7 )) =
(0.8 · (1 − (1 − 0.3) · (1 − 0.7))) + (0.2 · (1 − (1 −0.3) · (1 − 0.7))) = 0.8 which can be computed via Eq. (13)
 
Experimental Results
 
 
Conclusion
 
They developed a temporal-probabilistic database model that supports both
time and probability as first class citizens
They introduced an expressive class of temporal deduction rules and temporal
consistency constraints allowing for a closed and complete representation
formalism for temporal uncertain data, and analyzed the properties of their
data model from a theoretical perspective
Also, we proposed efficient algorithms, which we evaluated on various
information-extraction tasks
As future work
,
They aim to investigate the support for periodical temporal data and background knowledge (e.g.,
presidential elections)
And to extend their data model to non-independent (i.e., correlated) base facts
Slide Note
Embed
Share

This research explores the development of a temporal-probabilistic database model to handle uncertain temporal facts obtained from information extraction methods. Motivated by the need for scalable query engines and a lack of unified approaches supporting both time and probability aspects, the study addresses challenges in integrating probabilistic and temporal data. Preliminary concepts such as temporal databases and probabilistic databases are discussed, highlighting the essential components and representations involved.

  • Database Model
  • Information Extraction
  • Temporal Data
  • Probabilistic Database
  • Scalable Query Engine

Uploaded on Feb 25, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. A Temporal Probabilistic Database Model for Information Extraction Martin Theobald University of Antwerp Antwerp, Belgium Maximilian Dylla Iris Miliaraki Max Planck Institute for Informatics Saarbr ucken, Germany Max Planck Institute for Informatics Saarbr ucken, Germany

  2. Motivation For cleaning uncertain temporal facts obtained from information extraction methods To develop a query engine which is capable of scaling to very large temporal knowledge bases with nearly interactive query response times over millions of uncertain facts and hundreds of thousands of grounded rules To develop a temporal-probabilistic database model that supports both time and probability as first class citizens

  3. Problem Statement As information extraction techniques inherently deliver uncertain data, they have often been considered as one of the main motivating applications for PDBs(Probabilistic DataBases) TDBs(Temporal DataBases) have been an active field of research for many years, but so far there exists no TDB system that also supports probabilistic data or vice versa Not aware of any unified temporal and probabilistic database approach that natively supports the kinds of data and constraints they aim to apply to their setting

  4. Preliminaries Temporal Database: It can be defined as a triple (S, F, T) consisting of a database schema S, a finite set of base tuples F, and the time domain T Base tuples F(set of base facts) captures in all the input instances RiTover S Time Domain( T) is the finite sequence of time points Database Schema S is the collection of all relation schemas (R1T,R2T,..........,RNT) A temporal relation RTis represented as RT(A1, . . . ,Am)[Tb, Te), where RTdenotes the relation name, A1, . . . ,Amis a finite set of attributes of RTwith domains i, and Tb, Teare attributes denoting the begin and end time points of the time intervals that are attached to each tuple

  5. Probabilistic Database: Informally, a probabilistic database can be defined as a discrete probability distribution over a finite number of database instances (the possible worlds ) Formally, we thus define a tuple-independent probabilistic database DBp= (S,F, p) as a triple consisting of a database schema S, a finite set of base facts F, and a probability measure p : F (0, 1], which assigns a probability value p(f) to each uncertain base fact f F. A probabilistic relation Rpis represented as Rp(A1, . . . ,Am, p), where Rpdenotes the relation name, A1, . . . ,Am denote a finite set of attributes of Rpwith domains i, p is an attribute which denotes the probability value that is attached to each fact in a

  6. Conditioning a PDB TDB uses time interval and PDB uses confidence for the validity of tuples Now, the third concept is CONDITIONING which allows for additionally incorporating consistency constraints into a possible-worlds-based representation formalism Formally, we can compute the marginal probability of any Boolean formula over variables in F as the sum of the probabilities of all the possible worlds W F that entail , which is denoted as W |= . P( ) := W|= ,W FP(W) Conditioning a formula on another formula , which also ranges over variables in F, then simply denotes the process of computing the following conditional probability: P( | ) :=P( )/P( ) Intuitively, conditioning removes the possible worlds from a PDB, in which the constraints are not satisfied, and thus re-weighs the remaining worlds to again form a probability distribution

  7. TEMPORAL-PROBABILISTIC DATABASE MODEL This model is the combination and extension of the TDB and PDB, into a unified temporal-probabilistic database model, in order to facilitate both rule-based and probability inference over uncertain temporal knowledge Extension employs two key concepts: a. Temporal deduction rules: It allows to represent all of the core operations known from relational algebra over these temporal and probabilistic relations; b. Temporal consistency constraints: It allows to constrain the possible worlds at which query answers are considered to be valid.

  8. Temporal Deduction Rules: Temporal deduction rule over a database schema S is a first-order logical rule of the form RT0(X 0)[Tb0,Te0 ) ( i =1,...,n RTi( X i)[Tbi,Tei) j=1,...,m RTj( X j)[Tbj,Tej) (X )) Where: 1.RT0denotes the head literal intensional temporal relation in S, while RTi, RTj refer to extensional or intensional temporal relations in S; 2. n 1, m 0, thus requiring at least one positive relational literal; 3. ( X ) is a conjunction of literals over the arithmetic predicates =, and T, whose arguments are in X ; 4. X 0, X i, X jdenote tuples of non-temporal variables and constants, such that Var(X 0) iVar( X i) and Var(X j) iVar( X i); 5. X denotes a tuple of both temporal and non-temporal variables and constants,

  9. Example 1 Rule 1: AreMarried(X, Y )[Tb1,tmax ) (Wedding(X, Y )[Tb1,Te1 ) Divorce(X, Y )[Tb2,Te2 )) - couple stays married from the begin time point of their wedding until to the last possible time point we consider (tmax) unless there is a Divorce fact at any other time interval. Rule 2: AreMarried(X, Y )[Tb1,Te2 ) Wedding(X, Y)[Tb1,Te1 ) Divorce(X, Y )[Tb2,Te2 ) (Te1 TTb2)) - couple stays married from the begin time point of their wedding till the end time point of their divorce Rule 3: (BornIn(X, Y )[Tb1,Te1) BornIn(X,Z)[Tb2,Te2) Y Z) - This constraint tells that if two facts are mutually exclusive(such as two birth facts of same person), at least one of the two BornIn facts has to be set to false in all the possible instances Rule 4: (BornIn(X, Y )[Tb1,Te1) AreMarried(X,Z)[Tb2,Te2 ) Tb2 TTe1) - This consistency constraint uses a temporal condition to express that a marriage should begin after the respective person was born

  10. Grounding: Definition 2. Let be a conjunctive rst-order formula and F a set of facts. Then, the grounding function G( ,F) instantiates over F into a set of propositional lineage formulas { 0, . . . , n}. Example 2. Applying Grounding(G) to the body of Rule (2) and the set of facts {f3, f4, f5} yields the lineage formulas {f3 f5, f4 f5}. By further instantiating Rule (2)'s head with the arguments of f4 f5, we obtain the new fact f = AreMarried(DeNiro,Abbott)[1976-07-29,1988-12-01). Regarding the lineage, we have (f ) = f4 f5. All facts which can be deduced from Rules (1) and (2) are shown together with their lineages and time intervals in the middle part of Fig. 1.

  11. Deduplicating Facts Definition 4. Let a temporal relation RT , non-temporal constants a, a time point t T , and a set of facts F be given. Then, L is dened as the set of lineages of fact RT ( a) that are valid at time point t: L(RT, a , t,F) := { (f) | f = RT ( a )[tb,te) F, tb t < te} We create duplicate free facts f = RT ( a)[tb,te), such that for any pair of time points t0, t1 [tb, te) it holds that: L(RT, a t0,F) = L(RT, a , t1,F) Furthermore, we dene the new facts' lineage to be: (f ) := i L(RT,a,tb,F) i Example 3. Applying Definition 4 to the facts in the middle of Fig. 1 yields the facts shown at the top of the figure. For instance, if we inspect the time points 1976- 07-28 and 1976-07-29, we notice that {f3 f5, f3 f5} {f3 f5, f3 f5, f4 f5, f4 f5}, so two differing facts f6and f7have to be kept in the relation. Thus, the resulting duplicate-free facts are the following. Id Fact p f6 AreMarried(DeNiro,Abbott) [1936-11-01, 1976-07-29) f7 AreMarried(DeNiro,Abbott) [1976-07-29, 1988-12-01) f8 AreMarried(DeNiro,Abbott) [1988-12-01, tmax ) T 0.3 0.79 0.158 Following Eq. (10), their lineages are as follows. (f6) = (f3 f5) (f3 f5) (f7) = (f3 f5) (f3 f5) (f4 f5) (f4 f5) (f8) = (f3 f5) (f4 f5)

  12. Temporal Consistency Constraints: A temporal consistency constraint over a database schema S is a rst-order logical rule of the form ( i RTi (X i)[Tbi,Tei) (X ) ) where: i. RTidenote extensional or intensional temporal relations in S; ii. ( X ) is a conjunction of literals relating to the arithmetic predicates =, , and Twith arguments in X ; iii. The X idenote tuples of non-temporal variables and constants, and Tbi, Teiare temporal variables and constants, such that Var( X ) i(Var( X i) {Tbi, Tei} Rule 3: (BornIn(X, Y )[Tb1,Te1) BornIn(X,Z)[Tb2,Te2) Y Z) - This constraint tells that if two facts are mutually exclusive(such as two birth facts of same person), at least one of the two BornIn facts has to be set to false in all the possible instances Rule 4: (BornIn(X, Y )[Tb1,Te1) AreMarried(X,Z)[Tb2,Te2 ) Tb2 TTe1) - This consistency constraint uses a temporal condition to express that a marriage should begin after the respective person was born

  13. Conditioning: Let C = {C0, . . . , Cn} be a set of temporal consistency constraints, and let F+be a set containing both base facts F and facts deducible from a set of temporal deduction rules D. Then, we dene C to be the conjunction of all grounded constraints over F+. C := i G(Ci,F+),Ci C i The confidence of a fact f F+ with lineage (f) and with respect to the grounded constraints C is then calculated as: P( (f) | C) := P( (f) C) / P(C) Example 4: Grounding the Constraint (3) against the facts of Example 1 yields (f1 f2), where the arithmetic literal is omitted from the grounded formula, since it evaluates to true. Altogether, grounding Constraints (3) and (4) results in C = (f1 f2) (f1 (f6)) (f2 (f6)) (f2 (f7)), where all arithmetic predicates have already been evaluated correspondingly.

  14. Temporal-Probabilistic Database: Definition: A temporal-probabilistic database DBTp =(S,F,D, C, T , p) is a six-tuple consisting of a database schema S, a set of base facts F, a set of temporal deduction rules D, a set of temporal consistency constraints C, a time domain T , and a probability measure p. Queries over TPDP Conjunction of literals Similar conditions as for the deduction rules regarding safeness hold. That is, every non-temporal variable that occurs in a negated relational literal or in an arithmetic literal must also occur in at least one of the positive relational literals. Eg: BornIn(X, Y )[Tb1,Te1) should have a positive relational literal BornIn(X,Y)[Tb1,Te1) Every temporal variable that occurs in an arithmetic literal must also occur in at least one of the relational literals (BornIn(X, Y ) AreMarried(X,Z) T TT )

  15. Algorithms The basic concept of the algorithm for the proposed TPDB is: 1.Ground the deduction rules and constraints 2.Deduplicate the facts with overlapping time intervals 3.Lineage Decompositions to reduce shannon s expansion 4.Final Confidence Computation

  16. 1. Grounding: a. The process of grounding of deduction rules and constraints is divided into two phase: i. First phase: lineage of derived facts Ground all the deduction rules in a bottom-up manner while tracing the ii. Second phase: instantiates constraints against all facts (i.e., both the base and the deduced facts). Example 6. We initialize Algorithm 1 using F := {f1,. . . , f5}, D comprising Rules (1) and (2), and C containing Constraints (3) and (4) of Example 1. First, we initialize Plan with [ AreMarried {(1), (2)}], and we set F+= {f1, . . . , f5}. The loop in Line 3 performs only one iteration where RT= AreMarried . Assuming that the loop in Line 5 selects (2) first, we instantiate the head to two new facts AreMarried (DeNiro,Abbott) [1936 11 01,1988 12 01)and Marriage (DeNiro, Abbott)[1976 07 29,1988 12 01)with lineages f3 f5and f4 f5,respectively. Their lineage and that of the two facts deduced by Rule (1) are shown in Example 3. Fig. 1 s top part corresponds to the output of Deduplicate (Line 8). When we reach Line 9, F+= {f , . . ,f }. Next, the constraints are grounded as shown already in Example

  17. 2. Deduplicating Facts: Example 7. We apply Algorithm 2 to the facts deduced by Rules (1) and (2), hence constructing the lineage for the facts presented in Example 3. First, we initialize Limits as {1936-11-01, 1976-07-29, 1988-12-01, t max }, set Begin to [1936-11-01 {f3 f5, f3 f5}, 1976-07-29 {f4 f5, f4 f5}], and assign [1988-12-01 {f3 f5, f4 f5}, tmax {f3 f5, f4 f5}] to End. In the first iteration of the loop, Active is empty. In the next iteration, we add f 6 = AreMarried (DeNiro, Abbott)[1936-11-01,1976-07-29)to Result and set its lineage to (f 6 ) = (f3 f5) (f3 f5). The last two iterations produce f 7 , f 8 as shown in Example 3

  18. 3. Confidence Computation Example 8. In Example 3 the lineage of f 6 was defined as (f 3 f 5 ) (f 3 f 5 ). Now, let us compute its confidence: P ((f3 f5) (f3 f5)) = P (f3 f5) + P(f3 f5) = P(f3) P(f5) + P(f3) (1 P(f5)) = 0.3 0.8 + 0.3 0.2 = 0.3

  19. 4. Lineage Decomposition Definition: Let 0 op . . . op n be a lineage formula with operator op { , }, and let 0 to n be sub-formulas of . Then, we define the function S to return the number of necessary Shannon expansions as follows: S( ) = |{f | i j: f F ( i), f F ( j)}| Example 10. As an example, we apply S to (f 7 ) (see Example 3). That is op = and S((f 3 f 5 ) (f 3 f 5 ) (f 4 f 5 ) (f 4 f 5 )) = |{f 3 , f 4 , f 5 }| = 3. Thus, a naive computation of P ( (f 7 )) takes 23 iterations. Example 11. We decompose Example 3 s (f 7 ) = (f3 f5) (f3 f5) (f4 f5) (f4 f5) by Algorithm 3 with = 0. As discussed in Example 10, S( (f 7 )) = 3, so we set = (f 7 ). In Line 4, we receive m = 0 because f 5 occurs in all lineage subformulas. Hence, in Line 9, we choose f 5 and rewrite (f 7 ) to (f 5 (f 3 f 4 )) ( f 5 (f 3 f 4 )) in Line 10. Now, all Shannon expansions are gone, yielding P ( (f 7 )) = (0.8 (1 (1 0.3) (1 0.7))) + (0.2 (1 (1 0.3) (1 0.7))) = 0.8 which can be computed via Eq. (13)

  20. Experimental Results

  21. Conclusion They developed a temporal-probabilistic database model that supports both time and probability as first class citizens They introduced an expressive class of temporal deduction rules and temporal consistency constraints allowing for a closed and complete representation formalism for temporal uncertain data, and analyzed the properties of their data model from a theoretical perspective Also, we proposed efficient algorithms, which we evaluated on various information-extraction tasks As future work, They aim to investigate the support for periodical temporal data and background knowledge (e.g., presidential elections)

Related


More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#