Instant Travel Time Estimation with Sparse Trajectories
This research by Dr. Yu Zheng aims to estimate travel time on road networks instantly using historical and current trajectories generated by vehicles. The methodology involves a context-aware tensor decomposition approach, optimal concatenation, and frequent trajectory pattern mining to address challenges such as data sparsity and scalability.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Travel Time Estimation of a Path using Sparse Trajectories Dr. Yu Zheng yuzheng@microsoft.com Lead Researcher, Microsoft Research Chair Professor at Shanghai Jiao Tong University Tr1 Tr2 Tr3 Tr4 r7 r1 r2 r3 r5 r4 r6
Goal Estimate the travel time of any given path on road network instantly using historical and current trajectories generated by a sample of vehicles D S
Challenges Data sparsity Trajectory concatenation Multiple ways to combine sub-trajectories Length of a sub-trajectory and its support Scalability and efficiency A citywide estimation E.g. Beijing has over 100,000 road segments ?1 ?2 ?3 ?4 S Tr1 Tr2 Tr3 Tr4 r7 r1 ?1+ ?2+ ?3 (?1 ?2) + ?3 ?1+ (?2 ?3) r2 r3 r5 r4 r6 D
Methodology Framework Context-Aware Tensor Decomposition (CATD) Optimal Concatenation (OC) Tensor Map- Matching Tensor Construction Trajectories Decomposition Ar Context Feature Extraction Road Networks Features Trajectory Database Arec Path cost Optimal Concatenation Frequent Trajectory Pattern Mining Patterns
Methodology Supplementing missing values g1 g2 g3 21 0 6 17 g16 0 14 p1 9 p2 p3 v3 v1 v2 ti ti+1 tj 0 0 g1 g2 g3 g4 MG = 0 r1 r1 r3 r4 r4 r6 0 0 0 0 42 v4 v6 r1 v4 g6 g7 g8 g5 r2 g1 g2 g3 22 0 16 8 g16 0 27 d1 r2 t1 t2 r2 v5 0 0 v5 r5 0 35 g11 R10 g9 g12 ti 0 0 0 0 0 0 r3 r3 11 15 0 MG= v7 42 0 0 31 tj r4 0 0 g13 g14 g16 g15 tn 0 0 0 0 7 Tr1 Tr2 Tr3 Y Ar t'k r1: (v1, v2, v4) r2: (v4, v5, v6) tk r3: (v5, v7) r4: (v2, v3) t'j g2 g16 g1 tj f1 f2 fq tj tk t'j t'k uM r1 r2 rN Xh fr fp u2 Xr u1 r1 r2 rN r1 r2 rN
Methodology Supplementing missing values t'k Ah t'j X Y Ar tk g2 g16 g1 tj tk t'j t'k tj f1 f2 fq Xh uM r1 r2 rN fr fp Xr u2 u1 A = Ar || Ah r1 r2 rN ?,?,?,?,?,? =1 2+?1 2+?2 2+ ? ? ?? ?? ?? ? ?? ? ?? 2 2 2 ?3 2 2+ ? 2+ ? 2+ ? 2+ ? 2+ ? 2 ?
Methodology Framework Tensor Map- Matching Tensor Construction Trajectories Decomposition Ar Context Feature Extraction Road Networks Features Trajectory Database Arec Path cost Optimal Concatenation Frequent Trajectory Pattern Mining Patterns
Methodology Optimal Concatenation Suppose ? is decomposed as ?1||?2|| ||?? true travel time: ??. estimated travel time: ??1+ ??2+ + ??? 2 ????,?1,?2, ,?? ? ?? ??1 ??2 ??? argmin?1,?2, ,?? ????,?1,?2, ,??, subject to ?1||?2|| ||??= ? ??1 ??? ??2 D S ?1 ?2 ?? ??
Methodology Optimal Concatenation 2 ????,?1,?2, ,??= ? ?? ??1 ??2 ??? 2 = ? ??1+ ??2+ ??? ??1 ??2 ??? 2+ ?=1 ? ? ? (??? ???)(??? ???) ? ? (??? ???)(??? ???) ??? ??? ? ?(??? ???)2+ = ? ?=1 ?=1 ? = ?=1 ?=1 ?=1 assuming ??? and ??? are independent ? (??? ???)(??? ???) = ? ??? ????(??? ???)=0 ? ?(??? ???)2 ????,?1,?2, ,??= ?=1 ??? ??? 1 1 2? ?(??? ???)2= ?(??? ???,?)2= (??? ???,?)2 ??? ??? ?=1 ? ??????(???,?) ?=1 ????(??? ???,?)2= 1 2 ?=1 = ???
Methodology Support ??? vs. Variance ??? ???,? The bigger the support is, the smaller the error is The bigger the variance the bigger the error 1 ? ??????(???,?) argmin?1,?2, ,?? ?=1 subject to ?1||?2|| ||??= ? Solved by dynamic programming 1 Denote ? ?? = ??? ???(???,?) ? ? ?? argmin?1,?2, ,?? ?=1 subject to ?1||?2|| ||??= ? . (????+ ?(???+1||??+2 ||??) ????= min 1 ?<?
Methodology Make optimal concatenation more efficient Frequent trajectory pattern mining Not necessary to check every combination Using suffix-tree-based method tr3r4,u2,k= tr3,u2,k+tr4,u2,k P: r1r2r3r4 (1) Root tr4,u2,ktr4,u3,k tr4,u4,k , , 3 2 1 3 1 1 (3) Tr1 Tr2 Tr3 Tr4 r7 tk r1 2 r2 2 r3 1 r5 r6 r7 r1 r2 r3 tj uM 1 1 r5 r4 r6 (1) Patterns: r2 r2 1 r3 r6 r7 u2 r2r3, r3r4 u1 1 1 r1 r2 rN Arec Tr2,Tr3,Tr4 r3 r6 r3 (3) (u2, u3, u4) A) An example of suffix-tree B) Filling in the missing time for a pattern
Methodology Combining the suffix tree with tensors Searching for frequent trajectory pattern from the suffix tree Find the travel time of a particular user from the tensor tr3r4,u2,k= tr3,u2,k+tr4,u2,k P: r1r2r3r4 (1) Root tr4,u2,k tr4,u3,k tr4,u4,k , , 3 2 1 3 1 1 (3) tk r1 2 r2 2 r3 1 r5 r6 r7 tj uM 1 1 (1) Patterns: r2 r2 1 r3 r6 r7 u2 r2r3, r3r4 Tr1 Tr2 Tr3 Tr4 u1 1 1 r1 r2 rN r7 Arec r1 Tr2,Tr3,Tr4 r2 r3 r3 r6 r3 (3) (u2, u3, u4) r5 r4 r6 A) An example of suffix-tree B) Filling in the missing time for a pattern
Methodology Deal with efficiency and scalability data-driven space partition an element-wise optimization algorithm Use trajectory patterns as concatenation candidates Indexing recent trajectories for a fast online retrieval Root tk tj r1 r2 r6 tr1 tr1 tr2 tr2 r3 Tr1 g1 g2 Tr2 Tr2 g2 g1 Tr2 uM r6 tr3 r3 tr6 Tr1 Tr1 r2 tr1r2 tr1r2 Tr1 Tr2 u2 tr2r3 Tr1 tr2r6 Tr2 r3 r6 g4 g3 g4 g3 u1 tr1r2r3 tr1r2r6 Tr2 Tr1 r1 r2 rN
Experiments Download data here Datasets Taxi Trajectories: Generated by 32,670 taxicabs in Beijing From Sept. 1 to Oct. 31, 2013. 673+ million GPS points Total length: over 26 million km. Sampling rate: 96 seconds per point. Road networks: 148,110 nodes and 196,307 edges. Covers a 40 50km spatial range Total length of road segments: 21,985km. POIs: Th273,165 POIs of Beijing 195 tier two categories. Chose the top 10 categories that occur around road segments
Experiments Performance of CATD ?r/(5 5): 4,736 12,674 4 0.09% 0.74 300 MAE Time Cost 250 0.73 Time Cost (minute) 200 MAE (min) 0.72 150 100 MAE (min) RMSE 0.71 50 ?? 0.747 1.646 0.70 0 ?? + ? 2 3 4 5 0.732 1.629 Number of time slices L ????(?? + ? + ?) 0.714 1.613 1.0 180 160 MAE (min) Time cost (min) 0.9 140 Metrics 120 Time cost (min) tk 0.8 ?|?? ? ?|?? ? ??| 100 MAE (min) ??? = tj g1 g2 80 g2 g1 uM 0.7 ??| 60 ??? = ?? 40 0.6 u2 20 g4 g3 g4 g3 ??)2 ?(?? u1 0 ???? = 0.5 r1 r2 rN ? 1 2 3 4 5 The number of partitions (hxh)
Experiments Query paths From taxi trajectories 12,384 queries, 50 paths per hour/day Traveled by at least two drivers Total length 76,412.6km Effective time span: 2,734 hours In the field study 114 queries from Sept. 1 to Oct. 30, 2013 Total length 999km Effective time span: 62 hours B) Distribution of the length A) Geographical distribution C) Distribution of time length
Experiments Baselines Speed-Constraint-based (SC) method Trajectory-based Simple Concatenation (TSC) method. Optimal Concatenation with Historical Travel Time (OC+H) method. Optimal Concatenation with Nonnegative Matrix Factorization (OC+MF) Query paths from taxi data MAE (min) 8.808 5.244 3.245 3.061 2.545 MRE 0.665 0.396 0.245 0.231 0.192 MAE/L (min/km) 1.428 0.850 0.526 0.496 0.412 ?? ??? ?? + ? ?? + ?? ???? In-the-field study MAE/L (min/km) MAE (min) 18.193 11.300 4.990 4.052 3.771 MRE 0.561 0.349 0.154 0.125 0.116 ?? ??? ?? + ? ?? + ?? ???? 2.075 1.289 0.569 0.462 0.430
Experiments Efficiency Components Time Memory (MB) Building matrix ?,? 34ms 9 ?? ? 44ms 4.4 Tensor construction Deal with missing values (????) 233ms 14.6 Decomposition 5 5 6.31min 6.4min 2.3ms 116 144 995 Total Best OC Optimal w/o trajectory patterns 8.6ms 877 Concatenation (OC) w/o index 12.2s 714 800 3.0 tr3r4,u2,k= tr3,u2,k+tr4,u2,k P: r1r2r3r4 (1) 9 Root 700 tr4,u2,ktr4,u3,k tr4,u4,k , , Time/Query (ms) MAE 8 3 2 1 3 1 1 2.8 (3) 600 tk r1 2 r2 2 r3 1 r5 r6 r7 7 tj Size of Index (MB) uM 1 500 1 Time/Query (ms) (1) Patterns: 2.6 6 r2 r2 1 r3 r6 r7 u2 r2r3, r3r4 400 u1 1 1 MAE 5 r1 r2 rN Arec Tr2,Tr3,Tr4 300 r3 r6 r3 (3) 2.4 (u2, u3, u4) 4 200 A) An example of suffix-tree B) Filling in the missing time for a pattern 3 100 2.2 2 0 1 2.0 0 200 400 600 800 1000 0 200 400 600 800 1000 Support Support
Conclusion A very fundamental but challenging task Data sparsity Trade off between length of a path and support of trajectories Efficiency and scalability Our method Context-Aware Tensor Decomposition Optimal Concatenation Results Effectiveness 12,384 query paths and 114 in the field study Relative error ratio: 19% and 11.6% Efficiency Partition a city into grids Suffix-tree-based pattern mining and index Infer the travel time of a path in 2.3ms Download data and codes
Search for Urban Computing Thanks! Yu Zheng yuzheng@microsoft.com Homepage
Experiments Download data here 10.0k Datasets Taxi Trajectories: Generated by 32,670 taxicabs in Beijing From Sept. 1 to Oct. 31, 2013. 673+ million GPS points Total length: over 26 million km. Sampling rate: 96 seconds per point. Max. Num. of Trajectories (Support) 8.0k 6.0k 4.0k 2.0k 0.0 0 10 20 30 40 50 Length of a Path Road networks: 148,110 nodes and 196,307 edges. Covers a 40 50km spatial range Total length of road segments: 21,985km. 1-2 3-4 5-6 7-8 >8 50k Num. of Road Segments 40k 30k POIs: 20k Th273,165 POIs of Beijing 195 tier two categories. Chose the top 10 categories that occur around road segments 10k 0 0 4 8 12 16 20 Time of Day