Refining Coarse-Grained Labels based on Time
This presentation covers the process of refining coarse-grained activity labels based on time. A motivating example with timestamped building permit applications illustrates the decision-making timeline, followed by a statistical procedure for evaluating candidate label splits.
Presentation Transcript
Refining Coarse-Grained Labels based on Time
Niek Tax, MSc.
Motivating example

| Case | Activity | Timestamp |
|---|---|---|
| 1 | Building permit application received | 2014/02/28 15:31 |
| 1 | Application reviewed | 2014/03/03 09:14 |
| 1 | Application rejected | 2014/03/04 16:23 |
| 2 | Building permit application received | 2014/03/03 13:45 |
| 2 | Application reviewed | 2014/03/19 08:59 |
| 2 | Application accepted | 2014/03/20 12:32 |
| 3 | Building permit application received | 2014/03/16 14:52 |
| 3 | Application reviewed | 2014/03/18 11:52 |
| 3 | Application rejected | 2014/03/19 12:33 |
| 3 | Building permit application received | 2014/03/19 16:37 |
| 3 | Application reviewed | 2014/04/13 11:25 |
| 3 | Application accepted | 2014/04/14 12:11 |
| Case | Activity | Timestamp | Time since application received |
|---|---|---|---|
| 1 | Building permit application received | 2014/02/28 15:31 | |
| 1 | Application reviewed | 2014/03/03 09:14 | 2d 17h 43m |
| 1 | Application rejected | 2014/03/04 16:23 | |
| 2 | Building permit application received | 2014/03/03 13:45 | |
| 2 | Application reviewed | 2014/03/19 08:59 | 15d 19h 14m |
| 2 | Application accepted | 2014/03/20 12:32 | |
| 3 | Building permit application received | 2014/03/16 14:52 | |
| 3 | Application reviewed | 2014/03/18 11:52 | 1d 21h 0m |
| 3 | Application rejected | 2014/03/19 12:33 | |
| 3 | Building permit application received | 2014/03/19 16:37 | |
| 3 | Application reviewed | 2014/04/13 11:25 | 24d 18h 48m |
| 3 | Application accepted | 2014/04/14 12:11 | |
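The waiting times between receipt and review can be recomputed from the timestamps. The following is a minimal sketch, assuming each review is paired with the most recent "Building permit application received" event in the same case; the names `EVENTS`, `review_waits`, and `format_wait` are illustrative and not part of the presentation.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"
RECEIVED = "Building permit application received"
REVIEWED = "Application reviewed"

# (case, activity, timestamp) rows copied from the motivating example.
EVENTS = [
    (1, RECEIVED, "2014/02/28 15:31"),
    (1, REVIEWED, "2014/03/03 09:14"),
    (1, "Application rejected", "2014/03/04 16:23"),
    (2, RECEIVED, "2014/03/03 13:45"),
    (2, REVIEWED, "2014/03/19 08:59"),
    (2, "Application accepted", "2014/03/20 12:32"),
    (3, RECEIVED, "2014/03/16 14:52"),
    (3, REVIEWED, "2014/03/18 11:52"),
    (3, "Application rejected", "2014/03/19 12:33"),
    (3, RECEIVED, "2014/03/19 16:37"),
    (3, REVIEWED, "2014/04/13 11:25"),
    (3, "Application accepted", "2014/04/14 12:11"),
]


def review_waits(events):
    """Return (case, timedelta) for every review event.

    Assumption: the wait is measured from the most recent receipt
    in the same case.
    """
    last_received = {}
    waits = []
    for case, activity, stamp in events:
        moment = datetime.strptime(stamp, FMT)
        if activity == RECEIVED:
            last_received[case] = moment
        elif activity == REVIEWED and case in last_received:
            waits.append((case, moment - last_received[case]))
    return waits


def format_wait(delta):
    """Format a timedelta as 'Xd Yh Zm'."""
    total_minutes = int(delta.total_seconds()) // 60
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m"


for case, wait in review_waits(EVENTS):
    print(case, format_wait(wait))
```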
Application reviewed -> {Application reviewed_1, Application reviewed_2}

| Case | Activity | Timestamp | Time since application received |
|---|---|---|---|
| 1 | Building permit application received | 2014/02/28 15:31 | |
| 1 | Application reviewed_1 | 2014/03/03 09:14 | 2d 17h 43m |
| 1 | Application rejected | 2014/03/04 16:23 | |
| 2 | Building permit application received | 2014/03/03 13:45 | |
| 2 | Application reviewed_2 | 2014/03/19 08:59 | 15d 19h 14m |
| 2 | Application accepted | 2014/03/20 12:32 | |
| 3 | Building permit application received | 2014/03/16 14:52 | |
| 3 | Application reviewed_1 | 2014/03/18 11:52 | 1d 21h 0m |
| 3 | Application rejected | 2014/03/19 12:33 | |
| 3 | Building permit application received | 2014/03/19 16:37 | |
| 3 | Application reviewed_2 | 2014/04/13 11:25 | 24d 18h 48m |
| 3 | Application accepted | 2014/04/14 12:11 | |
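The relabelling can be illustrated with a small sketch. The slides do not say where the boundary between the two refined labels lies, so the 7-day `CUTOFF` below is purely a hypothetical value chosen to separate the short waits from the long ones; `refine_reviewed` is likewise an illustrative name.

```python
from datetime import datetime, timedelta

FMT = "%Y/%m/%d %H:%M"
RECEIVED = "Building permit application received"
REVIEWED = "Application reviewed"
CUTOFF = timedelta(days=7)  # hypothetical boundary, not given in the slides

EVENTS = [
    (1, RECEIVED, "2014/02/28 15:31"),
    (1, REVIEWED, "2014/03/03 09:14"),
    (1, "Application rejected", "2014/03/04 16:23"),
    (2, RECEIVED, "2014/03/03 13:45"),
    (2, REVIEWED, "2014/03/19 08:59"),
    (2, "Application accepted", "2014/03/20 12:32"),
    (3, RECEIVED, "2014/03/16 14:52"),
    (3, REVIEWED, "2014/03/18 11:52"),
    (3, "Application rejected", "2014/03/19 12:33"),
    (3, RECEIVED, "2014/03/19 16:37"),
    (3, REVIEWED, "2014/04/13 11:25"),
    (3, "Application accepted", "2014/04/14 12:11"),
]


def refine_reviewed(events, cutoff=CUTOFF):
    """Return a copy of the log with review events relabelled.

    A review that happens sooner than `cutoff` after the latest receipt
    in its case gets the suffix _1, any other review gets _2.
    """
    last_received = {}
    refined = []
    for case, activity, stamp in events:
        moment = datetime.strptime(stamp, FMT)
        if activity == RECEIVED:
            last_received[case] = moment
        elif activity == REVIEWED and case in last_received:
            wait = moment - last_received[case]
            activity = f"{REVIEWED}_1" if wait < cutoff else f"{REVIEWED}_2"
        refined.append((case, activity, stamp))
    return refined


for row in refine_reviewed(EVENTS):
    print(row)
```

With this cutoff the sketch reproduces the labels shown in the table; a different cutoff could assign them differently.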
Steps
1. Generate a set of potential label splits based on the log
2. Evaluate each potential label split
3. Select a set of label splits from the potential label splits
Step 2) Evaluate each potential split
When is a split of an activity label a good split?
Ordering-based log statistics:
- Direct follows
- Eventually follows
- Direct precedes
- Eventually precedes
- Length-two loop
- Etc.

We can regard a log statistic as a statistic on two activities, for example: activity a eventually follows activity b n times.
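As an illustration of such statistics, the sketch below counts two of them over a list of traces. The counting convention is an assumption: `eventually[(a, b)]` counts the occurrences of a that have at least one b earlier in the same trace, and `directly[(a, b)]` counts the occurrences of a whose immediate predecessor is b. The function name `ordering_counts` and the shortened activity names are illustrative.

```python
from collections import Counter


def ordering_counts(traces):
    """Count 'directly follows' and 'eventually follows' per activity pair.

    Keys are (a, b) pairs read as 'a follows b'.
    """
    directly = Counter()
    eventually = Counter()
    for trace in traces:
        seen = set()  # activities observed earlier in this trace
        for index, activity in enumerate(trace):
            if index > 0:
                directly[(activity, trace[index - 1])] += 1
            for earlier in seen:
                eventually[(activity, earlier)] += 1
            seen.add(activity)
    return directly, eventually


# The three cases of the motivating example, with shortened activity names.
TRACES = [
    ["received", "reviewed", "rejected"],
    ["received", "reviewed", "accepted"],
    ["received", "reviewed", "rejected", "received", "reviewed", "accepted"],
]

directly, eventually = ordering_counts(TRACES)
print(eventually[("reviewed", "received")])
print(eventually[("accepted", "rejected")], directly[("accepted", "rejected")])
```

The "precedes" variants follow by swapping the roles of the two activities in each pair.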
Splitting activity label a
We calculate a transformed version of the log in which the label split under consideration is performed.
[Figure: activity a is split into a1 and a2; the "Eventually Follows" and "Eventually Precedes" relations with activity b are shown for a, a1, and a2.]
Statistical intermezzo: Fisher's Exact Test

| | Men | Women | Row total |
|---|---|---|---|
| Dieting | 1 (5) | 9 (5) | 10 |
| Non-dieting | 11 (7) | 3 (7) | 14 |
| Column total | 12 | 12 | 24 |

Each cell shows the observed count, with the count expected under the null hypothesis in parentheses.

Knowing that 10 of these 24 teenagers are dieters, and that 12 of the 24 are female, and assuming the null hypothesis that men and women are equally likely to diet, what is the probability that these 10 dieters would be so unevenly distributed between the women and the men?

- Uses the hypergeometric distribution
- Assumes independence of the binary data
- Assumes fixed row and column totals
- Found to be slightly conservative in case of non-fixed row or column totals
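The dieting example can be worked through directly from the hypergeometric distribution. This is a minimal sketch using only the standard library; the two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, which is one common convention and an assumption about what the slides intend. The function names are illustrative.

```python
from math import comb


def hypergeom_pmf(x, row1, row2, col1):
    """Probability that the top-left cell equals x, given fixed margins."""
    return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = hypergeom_pmf(a, row1, row2, col1)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p = hypergeom_pmf(x, row1, row2, col1)
        # Small relative tolerance so equally likely tables are included.
        if p <= p_observed * (1 + 1e-9):
            total += p
    return total


def expected_counts(a, b, c, d):
    """Counts expected under independence: row total * column total / n."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


# Rows: dieting, non-dieting. Columns: men, women.
print(expected_counts(1, 9, 11, 3))
print(hypergeom_pmf(1, 10, 14, 12))
print(fisher_two_sided(1, 9, 11, 3))
```

The expected counts reproduce the parenthesised values (5, 5, 7, 7), and the two-sided p-value comes out at roughly 0.0028.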
Let's examine the split $a \rightarrow \{a_1, a_2\}$.

Let $p$ be a log statistic on pairs of activities, where $p(a, b)$ is one of:
- a eventually follows b
- a eventually precedes b

The split gives a 2×2 contingency table:

$$
\begin{array}{l|cc}
 & a_1 & a_2 \\ \hline
p \text{ holds} & p(a_1, b) & p(a_2, b) \\
p \text{ does not hold} & |a_1| - p(a_1, b) & |a_2| - p(a_2, b)
\end{array}
$$

Is the difference between a1 and a2 with regard to property p statistically significant?
Fisher's Exact Test. Null hypothesis: the chance that an occurrence of a eventually follows b is the same for a1 and a2.

What is the effect size of the difference between a1 and a2 with regard to property p?
Odds ratio:

$$
\mathrm{OR} = \frac{p(a_1, b)\,\bigl(|a_2| - p(a_2, b)\bigr)}{p(a_2, b)\,\bigl(|a_1| - p(a_1, b)\bigr)}
$$

Example 1 (columns a1, a2): first row 5000, 5000; second row 3200, 3600. P-value: 0.0002. Odds ratio: 1.125.

Example 2 (columns a1, a2): first row 4, 1; second row 1, 1. P-value: 1.0. Odds ratio: 4.
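Both numeric examples can be checked with a short sketch. It assumes the example counts form a 2×2 table with columns a1 and a2 (first row 5000, 5000 and second row 3200, 3600; then first row 4, 1 and second row 1, 1), and it uses a two-sided Fisher's exact test that sums all tables no more likely than the observed one. Log-gamma is used so that the large counts do not overflow; the function names are illustrative.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def log_pmf(x):
        return (log_comb(row1, x) + log_comb(row2, col1 - x)
                - log_comb(row1 + row2, col1))

    log_observed = log_pmf(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        log_p = log_pmf(x)
        # Tolerance keeps tables that tie with the observed one.
        if log_p <= log_observed + 1e-7:
            total += exp(log_p)
    return min(1.0, total)


def odds_ratio(a, b, c, d):
    """Odds ratio (a * d) / (b * c); undefined when b * c is zero."""
    return (a * d) / (b * c)


for table in [(5000, 5000, 3200, 3600), (4, 1, 1, 1)]:
    print(table, odds_ratio(*table), fisher_exact_two_sided(*table))
```

The first table gives a small odds ratio with a very small p-value, while the second gives a large odds ratio with a p-value of 1.0, which is why both significance and effect size are considered.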
Correcting for multiple splits
Bonferroni correction: divide the significance level by the number of tests performed.
[Figure: a split into a1 and a2, with the "Eventually Follows" and "Eventually Precedes" relations to activity b.]
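The correction itself is a single division, sketched below. The significance level of 0.05 and the list of p-values are hypothetical inputs chosen for illustration.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Flag each p-value that passes the corrected threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical p-values from four tests on candidate splits.
P_VALUES = [0.0002, 1.0, 0.03, 0.01]
print(bonferroni_alpha(0.05, len(P_VALUES)))
print(significant_after_correction(P_VALUES))
```

Note that 0.03 would pass an uncorrected 0.05 level but fails once four tests are accounted for.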
Step 3) Selecting set of splits from set of potential splits
When multiple splits that meet the thresholds were found for the same activity, choose the one with the highest effect size.
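A possible sketch of this selection step follows. It assumes the thresholds are a maximum p-value and a minimum effect size, and that each candidate split is a record with `activity`, `p_value`, and `effect_size` fields; these field names, the candidate data, and the threshold values are all hypothetical.

```python
def select_splits(candidates, max_p_value, min_effect_size):
    """Keep, per activity, the qualifying split with the largest effect size."""
    best = {}
    for candidate in candidates:
        if candidate["p_value"] > max_p_value:
            continue
        if candidate["effect_size"] < min_effect_size:
            continue
        current = best.get(candidate["activity"])
        if current is None or candidate["effect_size"] > current["effect_size"]:
            best[candidate["activity"]] = candidate
    return best


# Hypothetical candidate splits for illustration only.
CANDIDATES = [
    {"activity": "Application reviewed", "split": "A", "p_value": 0.0002, "effect_size": 1.125},
    {"activity": "Application reviewed", "split": "B", "p_value": 0.0001, "effect_size": 2.5},
    {"activity": "Application reviewed", "split": "C", "p_value": 1.0, "effect_size": 4.0},
    {"activity": "Application accepted", "split": "D", "p_value": 0.2, "effect_size": 3.0},
]

chosen = select_splits(CANDIDATES, max_p_value=0.0125, min_effect_size=1.1)
print({activity: split["split"] for activity, split in chosen.items()})
```

Split C has the largest effect size but is not significant, so the sketch picks split B for "Application reviewed" and selects nothing for "Application accepted".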