Selectivity Estimation on Streaming Spatio-Textual Data Using Local Correlations

Slide Note
Embed
Share

This research focuses on efficient selectivity estimation on streaming spatio-textual data by incorporating local correlations. It addresses the challenge of estimating the number of objects in a query region with specific keywords in real-time. The study proposes novel approaches like ASP-tree and KMV-based estimation techniques to handle the massive influx of spatio-textual objects while considering spatial and keyword constraints.


Uploaded on Sep 15, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Selectivity Estimation on Streaming Spatio-Textual Data Using Local Correlations XIAOYANG WANG1, YING ZHANG2, WENJIE ZHANG1, XUEMIN LIN1, WEI WANG1 1. UNIVERSITY OF NEW SOUTH WALES, AUSTRALIA 2. UNIVERSITY OF TECHNOLOGY, SYDNEY, AUSTRALIA

  2. Outline Introduction Preliminaries ASP-tree based Estimation KMV based Estimation A2SP-tree based Estimation Experiments Conclusion 2

  3. Introduction An enormous amount of spatio-textual objects available in many applications Online local search Social network services text location 3

  4. Introduction Spatial keyword search Apple Watch Apple Watch Apple Watch Apple Apple Watch Watch 4

  5. Introduction Spatial keyword search Apple Watch Apple Watch Apple Watch Apple Apple Watch Watch 5

  6. Introduction Selectivity estimation of spatial keyword search How many people talk about Apple Watch Apple Watch Apple Watch Apple Apple Watch Watch 6

  7. Problem Definition Data A spatio-textual object stream Each object consists of a location and a set of keywords Query Query region (q.R) A set of query keywords (q.T) Task Estimate the number of objects that fall in query region q.R and contain the query keywords q.T by using a small summary. 7

  8. Challenges Massive spatio-textual objects continuously arrive at a rapid rate in many applications. It is challenging for the summary to support both spatial and keyword constraints. No existing work for this problem. Query keywords typically have correlations among them. 8

  9. ASP-tree Each node has a counter Each object is only counted in one node Maintain a threshold = n to merge or subdivide node = 0.5, n = 9 = 4.5 r 2 a b e f g h 2 1 2 2 c d e f g h 9

  10. ASP-tree Each node has a counter Each object is only counted in one node Maintain a threshold = n to merge or subdivide node = 0.5, n = 9 = 4.5 r 2 a b e f g h 2 1 2 2 ? ??.? ?.? ? = 2 1 2+ 1 + 2 + 2 + 2 = 8 ? = ?.? c d e f g h 10

  11. K Minimum Value (KMV) a1 a2 a3 0.4 a4 S 0.2 0.8 0.6 ? = 2 a1 a3 a1 a3 a4 0.6 a2 S 0.2 0.4 0.2 0.4 0.8 ? + 1=1 1 5 ? ? =? 1 ? 11

  12. K Minimum Value (KMV) ?1 = 2 a1 a2 a3 0.4 a4 a1 a3 S1 0.2 0.8 0.45 0.2 0.4 ?2 = 3 a1 a3 a5 0.5 a6 a1 a3 a5 0.5 S2 0.2 0.4 0.7 0.2 0.4 ? ?1 ?2 =? 1 =2 1 0.4 ? = min(?1,?2) = 2.5 ? ? ?1,?2 =????? ????? =2 a1 a3 ? 2 0.2 0.4 ? ?1 ?2 = ?(?1,?2) ? ?1 ?2 = 2.5 12

  13. Bayesian network Bayesian network to capture the correlation among keywords T2 P(T2) T1 T2 P(T1|T2) 1 10/15 1 1 8/10 t1 t2 0 5/15 0 1 2/10 1 0 2/5 0 0 3/5 Network Structure Network Parameters ? ?1 ?2 = ? ?1 = 1,?2 = 1 ? = ? ?1 ?2 ? ?2 ? 13

  14. ASP-tree based Estimation r 1 t1 a b t2 t2 ASP-tree (t1) t1,t2 t1,t2 t1,t2 t2 a b c d t2 2 0 3 2 c d t2 t1 r t1,t2 1 t1,t2 ASP-tree (t2) t1 t2 a b c d 3 3 3 1 14

  15. ASP-tree based Estimation r 1 t1 a b t2 t2 ASP-tree (t1) t1,t2 t1,t2 t1,t2 t2 a b c d t2 2 0 3 2 c d t2 t1 r t1,t2 1 t1,t2 ASP-tree (t2) t1 t2 ? ?1 = 1 1 ?(?2) = 1 1 a b c d 2+ 3 + 2 = 5.5 3 3 3 1 2+ 3 + 3 = 6.4 ?? ?=5.5 Pros: good when only one query keyword Cons: Keywords usually have correlation 14 6.5 ? = ? 14 14 = 2.5 ?? ?.? 15

  16. KMV based Estimation t1 a b t2 t2 t1,t2 t1,t2 t1,t2 t2 t2 c d t2 t1 t1,t2 t1,t2 t1 t2 16

  17. KMV based Estimation a b t2 0.4 0.2 t1,t2 0.3 t1: 0.35, 0.5, 0.15 t2 t2 0.35 c d 0.15 t1,t2 0.5 t2: 0.2, 0.3, 0.4, 0.35, 0.1, 0.15 0.1 t1 t2 17

  18. KMV based Estimation a b t2 0.4 0.2 t1,t2 0.3 t1: 0.35, 0.15 t2 t2 0.35 c d 0.15 t1,t2 0.5 t2: 0.4, 0.35, 0.1, 0.15 0.1 t1 t2 ? = ? ?1 ?2 ? ?1,?2 =4 1 0.4 2 4= 3.75 18

  19. KMV based Estimation a b t2 0.4 0.2 t1,t2 0.3 t1: 0.35, 0.15 t2 t2 0.35 c d 0.15 t1,t2 0.5 t2: 0.4, 0.35, 0.1, 0.15 0.1 t1 t2 Pros: good for multiple keyword query Cons: single keyword query is worse than ASP-tree big variance when survived samples size is small ? = ? ?1 ?2 ? ?1,?2 =4 1 0.4 2 4= 3.75 19

  20. Observations Observation 1: Many terms are highly correlated in spatio-textual objects. Moreover, correlations of these terms exhibit the locality property. Observation 2: The local correlations among the terms are likely to be preserved if the search region is enlarged within the same local area. APEC Blue 20

  21. Observations Observation 1: Many terms are highly correlated in spatio-textual objects. Moreover, correlations of these terms exhibit the locality property. Observation 2: The local correlations among the terms are likely to be preserved if the search region is enlarged within the same local area. t1 a b t2 t2 P(T1|T2, R1) = 2/3 t1,t2 t1,t2 t1,t2 t2 t2 R1 c d t2 t1 t1,t2 t1,t2 t1 t2 21

  22. Observations Observation 1: Many terms are highly correlated in spatio-textual objects. Moreover, correlations of these terms exhibit the locality property. Observation 2: The local correlations among the terms are likely to be preserved if the search region is enlarged within the same local area. R2 t1 a b t2 t2 P(T1|T2, R1) = 2/3 t1,t2 t1,t2 t1,t2 t2 P(T1|T2, R2) = 2/3 t2 c d t2 t1 t1,t2 t1,t2 t1 t2 22

  23. Observations Observation 1: Many terms are highly correlated in spatio-textual objects. Moreover, correlations of these terms exhibit the locality property. Observation 2: The local correlations among the terms are likely to be preserved if the search region is enlarged within the same local area. R3 t1 a b t2 t2 P(T1|T2, R1) = 2/3 t1,t2 t1,t2 t1,t2 t2 P(T1|T2, R2) = 2/3 t2 c d t2 P(T1|T2, R3) = 5/11 t1 t1,t2 t1,t2 t1 t2 23

  24. A2SP-tree r 1 a b t2 0.4 a b c d 0.2 t1,t2 0.3 t2 2 0 3 2 t2 0.35 c d 0.15 r 1 t1,t2 0.5 0.1 t1 t2 a b c d 3 3 3 1 24

  25. Experiments Algorithms Range counting (ASP-tree) based estimation : RC-E. DVs (KMV) based estimation : DVs-E. Local correlation based estimation : BN-E. Dataset Property GN TW Cars # objects 2.2M 11.5M 2.25M vocabulary size 208K 1.02M 81K avg. # keywords 6.75 9.3 26 25

  26. Experiments (a) Different Burst Strategies 26

  27. Experiments (b) Vary Space Budget 27

  28. Conclusion Investigate the problem of selectivity estimation for spatial keyword search. Propose ASP-tree and KMV based estimation. Propose the local correlation based estimation. 28

  29. Thanks! Q&A 29

Related