Refining Coarse-Grained Labels based on Time

Niek Tax, MSc.

Mining human life
Motivating example
Time between "Building permit application received" and "Application reviewed": 2d 17h 43m, 15d 22h 47m, 1d 21h 0m, 24d 18h 48m
Application reviewed -> {Application reviewed_1, Application reviewed_2}
Steps
1. Generate a set of potential label splits based on the log
2. Evaluate each potential label split
3. Select a set of label splits from the potential label splits

Potential splits
Step 2) Evaluate each potential split
When is a split of an activity label a good split?
 
Ordering-based log statistics:
Direct follows
Eventually follows
Direct precedes
Eventually precedes
Length-two loop
Etc.

We can regard a log statistic as a statistic on two activities:
“Activity a eventually follows activity b n times”

Splitting activity label a
[Diagram: a is split into a1 and a2; "Eventually Follows" and "Eventually Precedes" relations with b]
We calculate a transformed version of the log in which the label split under consideration is performed.

Statistical intermezzo: Fisher’s Exact Test
Knowing that 10 of these 24 teenagers are dieters, and that 12 of the 24 are female, and assuming the null hypothesis that men and women are equally likely to diet, what is the probability that these 10 dieters would be so unevenly distributed between the women and the men?
Uses the hypergeometric distribution
Assumes independence of the binary data
Assumes fixed row and column totals
Found to be slightly conservative in case of non-fixed row or column totals

Let’s examine split a -> {a1, a2}

Let p(a,b) be:
a eventually follows b
a eventually precedes b

Is the difference between a1 and a2 with regard to property p statistically significant?
Fisher’s Exact Test: Null hypothesis: the chance that an occurrence of a eventually follows b is the same for a1 and a2

What is the effect size of the difference between a1 and a2 with regard to property p?
Odds Ratio: p(a1,b) (|a2| - p(a2,b)) / (p(a2,b) (|a1| - p(a1,b)))

First example: P-value: 0.0002, Odds ratio: 1.125
Second example: Odds ratio: 4, P-value: 1.0

Correcting for multiple splits
Bonferroni correction: divide the significance level by the number of tests performed

Step 3) Selecting a set of splits from the set of potential splits
When multiple splits that meet the thresholds are found for the same activity, choose the one with the highest effect size.

Multiple Split Problem

Time Relation Patterns

Step 1) Generating potential splits
First we calculate time relations
2d 17h 43m
15d 22h 47m

Generating the splits
Quantiles
OPTICS (density-based clustering algorithm)

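The quantile option for generating candidate splits can be sketched as follows. This is a minimal illustration, not the author's implementation: the helper names `quantile_splits` and `assign_bin` are hypothetical, the durations are the four time relations from the motivating example converted to hours, and the choice of two bins is an assumption.

```python
from statistics import quantiles


def quantile_splits(durations_hours, n_bins):
    """Return the cut points that divide the durations into n_bins quantile bins."""
    return quantiles(durations_hours, n=n_bins, method="inclusive")


def assign_bin(value, cut_points):
    """Return the 1-based index of the bin that a duration falls into."""
    return 1 + sum(value > cut for cut in cut_points)


# Time relations from the motivating example, converted to hours:
# 2d 17h 43m, 15d 22h 47m, 1d 21h 0m, 24d 18h 48m.
durations = [65.72, 382.78, 45.0, 594.8]

cuts = quantile_splits(durations, 2)  # one cut point: the median
labels = [f"Application reviewed_{assign_bin(d, cuts)}" for d in durations]
print(cuts, labels)
```

Each number of bins yields one candidate split, which would then be evaluated in Step 2.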
False Positive Type I: Parallel activities
False Positive Type II: Intermediate parallel activity
False Positive Type III: Unfolded loop


A presentation on refining coarse-grained activity labels based on time. A motivating example with timestamped building permit applications shows how the time between events can be used to split one activity label into several.


Uploaded on Feb 22, 2025

Presentation Transcript


  1. Refining Coarse-Grained Labels based on Time. Niek Tax, MSc.

  2. Mining human life

  3. Motivating example

     | Case | Activity | Timestamp |
     |------|----------|-----------|
     | 1 | Building permit application received | 2014/02/28 15:31 |
     | 1 | Application reviewed | 2014/03/03 09:14 |
     | 1 | Application rejected | 2014/03/04 16:23 |
     | 2 | Building permit application received | 2014/03/03 13:45 |
     | 2 | Application reviewed | 2014/03/19 08:59 |
     | 2 | Application accepted | 2014/03/20 12:32 |
     | 3 | Building permit application received | 2014/03/16 14:52 |
     | 3 | Application reviewed | 2014/03/18 11:52 |
     | 3 | Application rejected | 2014/03/19 12:33 |
     | 3 | Building permit application received | 2014/03/19 16:37 |
     | 3 | Application reviewed | 2014/04/13 11:25 |
     | 3 | Application accepted | 2014/04/14 12:11 |


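  6. The same log, annotated with the time between receipt and review of each application:

     | Case | Activity | Timestamp | Time since application received |
     |------|----------|-----------|---------------------------------|
     | 1 | Building permit application received | 2014/02/28 15:31 | |
     | 1 | Application reviewed | 2014/03/03 09:14 | 2d 17h 43m |
     | 1 | Application rejected | 2014/03/04 16:23 | |
     | 2 | Building permit application received | 2014/03/03 13:45 | |
     | 2 | Application reviewed | 2014/03/19 08:59 | 15d 22h 47m |
     | 2 | Application accepted | 2014/03/20 12:32 | |
     | 3 | Building permit application received | 2014/03/16 14:52 | |
     | 3 | Application reviewed | 2014/03/18 11:52 | 1d 21h 0m |
     | 3 | Application rejected | 2014/03/19 12:33 | |
     | 3 | Building permit application received | 2014/03/19 16:37 | |
     | 3 | Application reviewed | 2014/04/13 11:25 | 24d 18h 48m |
     | 3 | Application accepted | 2014/04/14 12:11 | |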


  10. Application reviewed -> {Application reviewed_1, Application reviewed_2}

      | Case | Activity | Timestamp | Time since application received |
      |------|----------|-----------|---------------------------------|
      | 1 | Building permit application received | 2014/02/28 15:31 | |
      | 1 | Application reviewed_1 | 2014/03/03 09:14 | 2d 17h 43m |
      | 1 | Application rejected | 2014/03/04 16:23 | |
      | 2 | Building permit application received | 2014/03/03 13:45 | |
      | 2 | Application reviewed_2 | 2014/03/19 08:59 | 15d 22h 47m |
      | 2 | Application accepted | 2014/03/20 12:32 | |
      | 3 | Building permit application received | 2014/03/16 14:52 | |
      | 3 | Application reviewed_1 | 2014/03/18 11:52 | 1d 21h 0m |
      | 3 | Application rejected | 2014/03/19 12:33 | |
      | 3 | Building permit application received | 2014/03/19 16:37 | |
      | 3 | Application reviewed_2 | 2014/04/13 11:25 | 24d 18h 48m |
      | 3 | Application accepted | 2014/04/14 12:11 | |
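The relabeling on this slide can be sketched in a few lines of Python. The sketch assumes that the time relation is the time since the most recent "Building permit application received" event in the same case, and it uses a hypothetical threshold of 7 days; the function name `refine_labels` and the threshold are illustrative and do not come from the slides.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M"

# Event log from the motivating example: (case, activity, timestamp).
LOG = [
    (1, "Building permit application received", "2014/02/28 15:31"),
    (1, "Application reviewed", "2014/03/03 09:14"),
    (1, "Application rejected", "2014/03/04 16:23"),
    (2, "Building permit application received", "2014/03/03 13:45"),
    (2, "Application reviewed", "2014/03/19 08:59"),
    (2, "Application accepted", "2014/03/20 12:32"),
    (3, "Building permit application received", "2014/03/16 14:52"),
    (3, "Application reviewed", "2014/03/18 11:52"),
    (3, "Application rejected", "2014/03/19 12:33"),
    (3, "Building permit application received", "2014/03/19 16:37"),
    (3, "Application reviewed", "2014/04/13 11:25"),
    (3, "Application accepted", "2014/04/14 12:11"),
]


def refine_labels(log, target, reference, threshold):
    """Relabel `target` events by the time since the last `reference` event in the same case.

    Assumes the events of each case appear in chronological order.
    """
    last_seen = {}  # case -> timestamp of the most recent reference event
    refined = []
    for case, activity, stamp in log:
        when = datetime.strptime(stamp, FMT)
        if activity == reference:
            last_seen[case] = when
        elif activity == target and case in last_seen:
            suffix = "_1" if when - last_seen[case] <= threshold else "_2"
            activity = target + suffix
        refined.append((case, activity, stamp))
    return refined


refined = refine_labels(
    LOG,
    target="Application reviewed",
    reference="Building permit application received",
    threshold=timedelta(days=7),  # hypothetical cut-off
)
for row in refined:
    print(row)
```

With this cut-off, quick reviews become "Application reviewed_1" and slow reviews become "Application reviewed_2", as in the table above.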

  11. Steps
      1. Generate a set of potential label splits based on the log
      2. Evaluate each potential label split
      3. Select a set of label splits from the potential label splits

  12. Potential splits

  13. Step 2) Evaluate each potential split. When is a split of an activity label a good split?

  17. Ordering-based log statistics: direct follows, eventually follows, direct precedes, eventually precedes, length-two loop, etc. We can regard a log statistic as a statistic on two activities: "Activity a eventually follows activity b n times".
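As an illustration of one such statistic, the sketch below counts "eventually follows" over a small set of traces. It assumes one particular reading of the statistic: an occurrence of a counts once for b if some b appears earlier in the same trace. The function name and the example traces are hypothetical.

```python
from collections import Counter


def eventually_follows(traces):
    """Count, per pair (a, b), the occurrences of a that have an earlier b in the same trace."""
    counts = Counter()
    for trace in traces:
        seen = set()  # activities observed earlier in this trace
        for a in trace:
            for b in seen:
                counts[(a, b)] += 1
            seen.add(a)
    return counts


# Hypothetical traces, each a sequence of activity labels.
traces = [["b", "a", "c"], ["a", "b", "a"], ["c", "a"]]
counts = eventually_follows(traces)
print(counts[("a", "b")])  # occurrences of a that eventually follow b
```

Counting per occurrence of a keeps the statistic bounded by the number of occurrences of a, which is what a 2x2 contingency table over occurrences requires.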

  19. Splitting activity label a. We calculate a transformed version of the log in which the label split under consideration is performed. [Diagram: a is split into a1 and a2; "Eventually Follows" and "Eventually Precedes" relations with b]

  20. Statistical intermezzo: Fisher's Exact Test

      |              | Men    | Women | Row total |
      |--------------|--------|-------|-----------|
      | Dieting      | 1 (5)  | 9 (5) | 10        |
      | Non-dieting  | 11 (7) | 3 (7) | 14        |
      | Column total | 12     | 12    | 24        |

      Values in parentheses are the counts expected under the null hypothesis.

      Knowing that 10 of these 24 teenagers are dieters, and that 12 of the 24 are female, and assuming the null hypothesis that men and women are equally likely to diet, what is the probability that these 10 dieters would be so unevenly distributed between the women and the men?

      Uses the hypergeometric distribution. Assumes independence of the binary data. Assumes fixed row and column totals. Found to be slightly conservative in case of non-fixed row or column totals.
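The test on the dieting table can be reproduced with the standard library alone. The sketch below is one common way to compute a two-sided p-value (summing the hypergeometric probabilities of all tables that are at most as likely as the observed one); the function name is hypothetical, and a statistics library would normally be used instead.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Dieting example: rows are dieting / non-dieting, columns are men / women.
p_value = fisher_exact_two_sided(1, 9, 11, 3)
print(round(p_value, 5))
```

For this table the p-value is about 0.0028, so the null hypothesis that men and women are equally likely to diet would be rejected at the usual 0.05 level.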

  22. Let's examine the split $a \to \{a_1, a_2\}$. Let $p : A \times A \to \mathbb{N}$ be a log statistic on two activities, where $p(a, b)$ counts, for example, how often a eventually follows b, or how often a eventually precedes b. The split gives the contingency table

      $$\begin{array}{c|c} a_1 & a_2 \\ \hline p(a_1, b) & p(a_2, b) \\ |a_1| - p(a_1, b) & |a_2| - p(a_2, b) \end{array}$$

      Is the difference between a1 and a2 with regard to property p statistically significant? Fisher's Exact Test, with the null hypothesis that the chance that an occurrence of a eventually follows b is the same for a1 and a2.

      What is the effect size of the difference between a1 and a2 with regard to property p? Odds ratio:

      $$\frac{p(a_1, b)\,\bigl(|a_2| - p(a_2, b)\bigr)}{p(a_2, b)\,\bigl(|a_1| - p(a_1, b)\bigr)}$$

      Example (counts shown on the slide: 5000 and 3200 for a1, 5000 and 3600 for a2): P-value: 0.0002, Odds ratio: 1.125. The difference is statistically significant, but the effect size is small.
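The effect size can be computed directly from the four cells of the 2x2 table for a split. The sketch below assumes the usual definition of the odds ratio, the odds of the property holding for a1 divided by the odds for a2; the function name, argument names and counts are hypothetical and are not taken from the slides.

```python
def odds_ratio(p_a1_b, n_a1, p_a2_b, n_a2):
    """Odds ratio of property p between a1 and a2.

    p_a1_b, p_a2_b: occurrences of a1 / a2 for which the property holds with b.
    n_a1, n_a2: total occurrences |a1| and |a2|.
    """
    numerator = p_a1_b * (n_a2 - p_a2_b)
    denominator = p_a2_b * (n_a1 - p_a1_b)
    if denominator == 0:
        # The ratio is undefined when a cell is empty; reported as unbounded here.
        return float("inf")
    return numerator / denominator


# Hypothetical counts: 8 of 10 occurrences of a1 and 2 of 10 occurrences
# of a2 eventually follow b.
print(odds_ratio(8, 10, 2, 10))
```

An odds ratio of 1 means the property is equally likely for a1 and a2; values far from 1 in either direction indicate a larger difference between the two new labels.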

  28. Second example for the split $a \to \{a_1, a_2\}$ (counts shown on the slide: 4 and 1 for a1, 1 and 1 for a2): Odds ratio: 4, P-value: 1.0. The effect size is large, but the difference is not statistically significant.

  29. Correcting for multiple splits: Bonferroni correction. Divide the significance level by the number of tests performed.

  30. Step 3) Selecting a set of splits from the set of potential splits. When multiple splits that meet the thresholds are found for the same activity, choose the one with the highest effect size.
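The correction and the selection rule can be combined in a short sketch. It assumes one statistical test per candidate split, a significance level of 0.05 and a minimum effect size of 1.0; the class, the function and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateSplit:
    activity: str       # label being split
    name: str           # identifier of the candidate split
    p_value: float      # from the significance test
    effect_size: float  # oriented so that larger means a stronger effect


def select_splits(candidates, alpha=0.05, min_effect=1.0):
    """Pick, per activity, the significant candidate with the largest effect size."""
    if not candidates:
        return {}
    corrected_alpha = alpha / len(candidates)  # Bonferroni correction
    best = {}
    for cand in candidates:
        if cand.p_value > corrected_alpha or cand.effect_size < min_effect:
            continue  # does not meet the thresholds
        current = best.get(cand.activity)
        if current is None or cand.effect_size > current.effect_size:
            best[cand.activity] = cand
    return best


candidates = [
    CandidateSplit("a", "split_1", p_value=0.001, effect_size=3.0),
    CandidateSplit("a", "split_2", p_value=0.0005, effect_size=5.0),
    CandidateSplit("c", "split_1", p_value=0.04, effect_size=9.0),
]
selected = select_splits(candidates)
print({activity: cand.name for activity, cand in selected.items()})
```

With three tests the corrected significance level is 0.05 / 3, so the candidate for activity c is rejected despite its large effect size, and split_2 is chosen for activity a.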

  31. Multiple Split Problem

  32. Time Relation Patterns
