Flexible Spatio-temporal Indexing Scheme for Large Scale GPS Tracks Retrieval
This research paper discusses a novel spatio-temporal indexing scheme optimized for managing large-scale GPS data. The study introduces a stochastic process model to simulate user behavior in uploading GPS tracks, leading to a more efficient indexing scheme with smaller size, minimal update efforts, and improved retrieval performance. Various aspects such as user behavior modeling, index design, and experimental results are explored in detail.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
A Flexible Spatio-temporal indexing Scheme for Large Scale GPS Tracks Retrieval Yu Zheng, Longhao Wang, Xing Xie yuzheng@microsoft.com Microsoft Research Asia
Outline Introduction Modeling user behavior Index design Experimental results Conclusion
Outline Introduction Modeling user behavior Index design Experimental results Conclusion
Introduction Background GPS-enabled devices become prevalent Large amount of GPS logs have been accumulated Quite a few GPS-data-sharing applications appeared Spatio-temporal index is necessary For system: to manage the potentially large-scale data For users: to explore the GPS data interested them
Introduction Problem Definition Retrieve the GPS trajectories across a given region and intersecting a given time span Present techniques are not optimized to these applications Spatial query Temporal query
Introduction Our contributions A stochastic process model: simulating user behavior of uploading GPS tracks Users prefer to upload data they created recently The insert frequency of different parts of index are skewed A novel indexing scheme: optimized to the user behavior of uploading GPS tracks Smaller index size Minimal update efforts Satisfactory retrieval performance
Outline Introduction Modeling user behavior Index design Experimental results Conclusion
Modeling User Behavior A GPS track end time: Te Latitude, longitude, Time P1: Lat1, long1, T1 P2: Lat2, long2, T2 ... Pn: Latn, longn, Tn start time: Ts Pn-1 Pn P2 P3 P1 Duration of a GPS track Interval between trajectory created and uploaded
Modeling User Behavior Upload log file to server at time Tup GPS Log File Users arrival can be modeled as Poisson process Te The interval between uploading time and end time of trajectory Tint = Tup -Te Can be modeled as Rayleigh distribution Ts Tdur = Te -Ts 0.25 0.6 0.5 0.2 0.4 Probability Probability 0.15 0.3 0.1 0.2 0.1 0.05 0 0 0 1 2 3 4 5 6 7 8 22 23 24 25 26 5 10 15 20 25 30 35 40 45 50 55 60 Interval between date image uploaded and date image taken (day) Summarized from photos uploaded by multiple users over a period of 3 months on Flickr Duration of GPS track (Minutes) Tdur follows Gaussian distribution
Modeling User Behavior A (Ts, Te) represents a GPS track
Outline Introduction Modeling user behavior Index design Experimental results Conclusion
Index Design Architecture Partition space into disjoint grids Maintain a temporal index for each grid The temporal index (CSE-Tree) is special Spatial index track Segment 1 Segment 2 Segment 3 Temporal index
Temporal Index (CSE-Tree) A GPS segment can be represented by a pair (Ts, Te) A point on two dimensional plane A temporal query is a time span (Timemin , Timemax)
Temporal index Structure Partition the points into groups by Te Build a start time index (B+ Tree)to index points of each group Build a end time index (B+ Tree) to index groups Te Te ti+1 ti+1 Start Time Index Si Si index points with Ti <= Te < Ti+1 End time index ti ti ... t2 t2 S1 t1 t1 S0 Ts Ts t0
Temporal Index (CSE-Tree) Three operations Insert Compress Search
Temporal Index (CSE-Tree) Compress operation Occur when update frequency drops to some extent Convert B+ tree to dynamic array Te insert frequency now Sj+1 B+ Tree Sj ... dynamic array Si Ts now
Temporal Index (CSE-Tree) Search operation Te>Timemin: Search End Time index to get the corresponding start time indexes Ts< Timemax: Look up each start time index candidate to find the correct points Te Tree3 t3 Tree2 t2 Tree1 Timemin t1 Tree0 Ts t0 Timemax
Outline Introduction Modeling user behavior Index design Experimental results Conclusion
Experimental Settings Platform PC with 3.00 GHz Intel Pentium 4 CPU, Windows XP SP2 platform, and 0.99 GB RAM Parameters B+ tree: Inner node size is 64 bytes Leaf size 1024 bytes Poisson process: 100, 300, 500 and 700 Total duration of the process is 2400 hours (100 days) Rayleigh distribution: Tintis 1.07. Normal distributionof Tdur: mean (0.42), variance (0.98).
Experimental Results The compress operation saves index size No overlap between nodes B+ tree Dynamic array 80 70 60 index size (MB) 50 40 R-tree 30 SEB-tree 20 CSE-tree 10 0 100 300 500 700 intensity of upload Index size comparison
Experimental Results Te Te Insert efforts Less node access than both SEB-tree and R-tree Most inserts occur in the area surrounded by the broken line Few node access in End Time Tree Ts Ts 140 120 number of node access 100 80 R-tree 60 SEB-tree 40 CSE-tree 20 0 100 300 500 700 intensity of upload Mean number of node access in one insertion
Experimental Results Query performance 1600 1400 number of node access 1200 1000 800 R-tree 600 SEB-tree 400 CSE-tree 200 0 100 300 500 700 intensity of upload Mean number of node access in one query
Conclusion A model simulating user behavior of upload data Based on stochastic process theory statistical analysis on the data collection in real world CSE-Tree Smaller index size Less node access in insertion Slightly more node access than SEB-tree in query
Thanks! yuzheng@microsoft.com Q&A