Filling Missing Values in Geo-Sensory Time Series Data
Filling missing values in spatio-temporal sensor data is a prerequisite for reliable monitoring and further data analytics. This work infers missing entries from collective information: a sensor's own readings and those of its neighbors. It addresses two difficulties: data that go missing both at random and in blocks, and readings that vary non-linearly over time and location. The method approaches the problem from spatial and temporal perspectives, combining local recent context with global long-term patterns.
Presentation Transcript
ST-MVL: Filling Missing Values in Geo-sensory Time Series Data
Xiuwen Yi, Yu Zheng, Junbo Zhang, Tianrui Li
Microsoft Research Asia, Southwest Jiaotong University
Filling Missing Values in Spatio-Temporal Data
Missing data is very common in IoT data: readings that should have been recorded are lost because of communication or device errors.
[Figure A) Missing situation: sensors s1 to s5 at time intervals t1, t2, ..., ti, ti+1, with observed and missing readings marked]
Missing rate: PM2.5 13.3%, NO2 16.0%, Humidity 21.5%, Wind Speed 30.3%
Xiuwen Yi, Yu Zheng, et al. ST-MVL: Filling Missing Values in Geo-sensory Time Series Data. IJCAI 2016
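A missing rate like the ones quoted on this slide can be computed as the share of absent entries in a sensors-by-time matrix. The sketch below is only an illustration under assumptions not stated in the slides: missing readings are encoded as `None`, and the toy matrix and the name `missing_rate` are hypothetical.

```python
def missing_rate(readings):
    """Fraction of missing (None) entries in a sensors-by-time matrix."""
    total = sum(len(row) for row in readings)
    missing = sum(1 for row in readings for value in row if value is None)
    return missing / total if total else 0.0


# Hypothetical toy data: 3 sensors (rows) by 4 time intervals (columns).
toy = [
    [35.0, None, 41.0, 38.0],
    [None, None, 50.0, 47.0],
    [30.0, 33.0, 36.0, None],
]
print(f"missing rate: {missing_rate(toy):.1%}")
```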
Filling Missing Values in Spatio-Temporal Data
Goal: infer the values of the missing entries using collective information, namely the data of a sensor and of its neighbors.
This is a very fundamental problem, important for monitoring and for further data analytics.
[Figure A) Missing situation: sensors s1 to s5 at time intervals t1, t2, ..., ti, ti+1, with observed and missing readings marked]
Filling Missing Values in Spatio-Temporal Data
Difficulties:
Random missing and block missing: not handled by fixed learning models.
Readings change over time and location non-linearly: not handled by simple interpolation.
[Figure A) Missing situation: sensors s1 to s5 at time intervals t1, t2, ..., ti, ti+1, with observed and missing readings marked]
[Figure: A) Geo-location of sensors s1 to s4; B) Air quality index over time (hour) for sensors S1, S2, S3]
Fill Missing Values in Spatio-Temporal Datasets
Achieve this goal from different perspectives.
Spatial and temporal perspectives: spatial neighbors and temporally adjacent time intervals.
Local and global perspectives: local is the recent context; global is the long-term patterns.
[Figure: sensors-by-time matrix (sensors s1 to sm, time intervals t1 to tn) with a missing entry X, annotated with the spatial, temporal, local and global views]
Global: long-term knowledge (data: Beijing, May 2014 to May 2015)
Spatial view: Inverse Distance Weighting (IDW)
$\hat{v}_{gs} = \frac{\sum_{i=1}^{m} v_i \, d_i^{-\alpha}}{\sum_{i=1}^{m} d_i^{-\alpha}}$
[Figure: ratio vs. distance (km) for A) air quality data and B) humidity]
Temporal view: Simple Exponential Smoothing (SES)
$\hat{v}_{gt} = \beta v_t + \beta (1-\beta) v_{t-1} + \cdots + \beta (1-\beta)^{t-1} v_1$
[Figure: ratio vs. time interval (hour) for A) air quality data and B) humidity]
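The two global views can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `idw_estimate` and `ses_estimate`, the default `alpha` and `beta`, and the sample readings are all hypothetical. The SES weights are normalised here so that readings from an arbitrary set of available time intervals can be combined, which is an assumption that goes beyond the plain SES series on the slide.

```python
def idw_estimate(neighbor_values, distances, alpha=1.0):
    """Global spatial view: weight each neighbor's reading by distance**(-alpha).

    Distances must be positive; closer sensors get larger weights.
    """
    weights = [d ** (-alpha) for d in distances]
    weighted_sum = sum(w * v for w, v in zip(weights, neighbor_values))
    return weighted_sum / sum(weights)


def ses_estimate(readings, intervals, beta=0.5):
    """Global temporal view: weight each reading by beta * (1 - beta)**(t - 1),
    where t is its distance in time intervals from the missing entry.

    The weights are divided by their sum (an assumption of this sketch).
    """
    weights = [beta * (1 - beta) ** (t - 1) for t in intervals]
    weighted_sum = sum(w * v for w, v in zip(weights, readings))
    return weighted_sum / sum(weights)


# Hypothetical readings: two neighbors at 1 km and 3 km, and two readings
# taken 1 and 2 time intervals away from the missing entry.
spatial = idw_estimate([100.0, 200.0], distances=[1.0, 3.0], alpha=1.0)
temporal = ses_estimate([80.0, 120.0], intervals=[1, 2], beta=0.5)
print(spatial, temporal)
```

In both functions a larger exponent base (higher `alpha`, or `beta` closer to 1) makes the estimate depend more heavily on the nearest sensor or the nearest time interval.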
Local: Recent Context
Some situations break the long-term patterns.
[Figure: A) Geo-location of sensors s1 to s4; B) Air quality index over time (hour) for sensors S1, S2, S3]
Collaborative filtering: sensors play the role of users, and time intervals play the role of items.
[Figure: sensors-by-time matrix (sensors s1 to sm, time intervals t1 to tn), annotated with the spatial, temporal, local and global views]
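The sensors-as-users analogy can be illustrated with a small user-based collaborative filtering sketch over a recent window of readings. Everything specific here is an assumption for illustration: the similarity measure (inverse root-mean-square difference over co-observed intervals), the `None` encoding for missing readings, and the names `sensor_similarity` and `ucf_estimate` do not come from the slides.

```python
import math


def sensor_similarity(a, b):
    """Inverse RMS difference between two sensors over the time intervals
    where both have readings (None marks a missing reading)."""
    diffs = [(x - y) ** 2 for x, y in zip(a, b)
             if x is not None and y is not None]
    if not diffs:
        return 0.0
    rms = math.sqrt(sum(diffs) / len(diffs))
    return 1.0 / (rms + 1e-6)  # small constant avoids division by zero


def ucf_estimate(window, target, t):
    """Estimate window[target][t] as the similarity-weighted average of the
    other sensors' readings at time interval t."""
    numerator = denominator = 0.0
    for i, row in enumerate(window):
        if i == target or row[t] is None:
            continue
        sim = sensor_similarity(window[target], row)
        numerator += sim * row[t]
        denominator += sim
    return numerator / denominator if denominator else None


# Hypothetical recent window: 3 sensors by 4 time intervals.
window = [
    [100.0, 110.0, None, 130.0],   # target sensor, reading at t=2 missing
    [102.0, 108.0, 118.0, 131.0],  # behaves like the target
    [60.0, 65.0, 70.0, 75.0],      # behaves differently
]
estimate = ucf_estimate(window, target=0, t=2)
print(estimate)
```

An item-based variant follows the same pattern with the roles swapped: transpose the window so that time intervals become the rows, and weight the target sensor's own readings at similar time intervals.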
Fill Missing Values in Spatio-Temporal Datasets
Result: a multi-view-based method
IDW: Inverse Distance Weighting
SES: Simple Exponential Smoothing
UCF: User-based Collaborative Filtering
ICF: Item-based Collaborative Filtering
[Figure: multi-view learning framework. Global view: IDW (spatial view) and SES (temporal view). Local view: UCF (spatial view) and ICF (temporal view).]
$\hat{v}_{mvl} = w_1 \hat{v}_{gs} + w_2 \hat{v}_{gt} + w_3 \hat{v}_{ls} + w_4 \hat{v}_{lt} + b$
[Figure: sensors-by-time matrix (sensors s1 to sm, time intervals t1 to tn), annotated with the spatial, temporal, local and global views]
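The multi-view combination is a linear model over the four view estimates. The sketch below shows one way such weights could be fitted, using ordinary least squares via the normal equations. The slides do not say how the weights are learned, so the fitting procedure, the function names `combine_views` and `fit_weights`, and the training numbers are all assumptions made for illustration.

```python
def combine_views(views, weights, bias):
    """Linear combination of the four view estimates (gs, gt, ls, lt) plus a bias."""
    return sum(w * v for w, v in zip(weights, views)) + bias


def fit_weights(view_rows, targets):
    """Least-squares fit of the weights and bias via the normal equations.

    Assumes the views are linearly independent; otherwise a pivot is zero
    and the division below fails.
    """
    rows = [list(r) + [1.0] for r in view_rows]  # extra column for the bias
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    coeffs = [a[i][n] / a[i][i] for i in range(n)]
    return coeffs[:-1], coeffs[-1]


# Hypothetical training data: each row holds the four view estimates for an
# entry whose true reading is known.
view_rows = [
    [10.0, 12.0, 9.0, 11.0],
    [20.0, 18.0, 25.0, 21.0],
    [30.0, 33.0, 28.0, 35.0],
    [40.0, 35.0, 44.0, 38.0],
    [50.0, 56.0, 47.0, 52.0],
    [60.0, 52.0, 66.0, 59.0],
]
truth = [combine_views(r, [0.4, 0.3, 0.2, 0.1], 2.0) for r in view_rows]
weights, bias = fit_weights(view_rows, truth)
print(weights, bias)
```

Training on entries whose true readings are known, and then applying the fitted weights to the four estimates of a missing entry, is the usual way to use such a model.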
Experiments

Baselines
Method        Spatial   Temporal    Spatial + Temporal
Global        IDW       SES         IDW+SES
Local         UCF       ICF, ARMA   CF, NMF, stKNN
Global+Local  Kriging   SARIMA      AKE, DESM, NMF-MVL

Missing rates
Missing type              PM2.5   NO2     Humidity   Wind Speed
Block missing (spatial)   2.2%    3.9%    9.8%       11.8%
Block missing (temporal)  3.5%    6.5%    9.6%       19.5%
General missing           8.2%    6.8%    4.6%       4.0%
Overall                   13.3%   16.0%   21.5%      30.3%

Comparison among different methods (based on PM2.5); each cell is MAE / MRE
Method    General Missing   Spatial Block Missing   Temporal Block Missing   Sudden Change   Overall
ARMA      22.61 / 0.331     29.26 / 0.369           \ / \                    51.11 / 0.567   27.47 / 0.394
Kriging   15.53 / 0.221     \ / \                   15.62 / 0.222            42.32 / 0.407   16.59 / 0.234
SARIMA    14.69 / 0.220     23.92 / 0.319           31.20 / 0.561            52.80 / 0.586   18.76 / 0.278
stKNN     12.84 / 0.188     19.91 / 0.235           12.72 / 0.226            35.13 / 0.390   14.00 / 0.201
DESM      13.65 / 0.191     19.24 / 0.233           12.66 / 0.224            42.87 / 0.425   15.59 / 0.228
AKE       13.34 / 0.195     19.08 / 0.229           12.14 / 0.22             41.54 / 0.403   14.27 / 0.211
IDW+SES   11.64 / 0.171     18.25 / 0.215           11.95 / 0.213            34.33 / 0.381   12.70 / 0.183
CF        12.20 / 0.178     19.27 / 0.234           12.25 / 0.218            34.91 / 0.388   13.40 / 0.193
NMF       11.21 / 0.163     18.98 / 0.239           12.73 / 0.217            34.37 / 0.381   13.08 / 0.188
NMF-MVL   11.16 / 0.162     18.97 / 0.238           12.66 / 0.217            34.33 / 0.380   13.06 / 0.187
ST-MVL    10.81 / 0.158     17.85 / 0.217           11.71 / 0.208            33.15 / 0.368   12.12 / 0.174
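The two error measures in the comparison can be computed as below. The slides do not define them, so this sketch assumes the common definitions: MAE as the mean absolute error, and MRE as the sum of absolute errors divided by the sum of the true readings. The sample values are hypothetical.

```python
def mae(truth, estimate):
    """Mean absolute error between true and estimated readings."""
    return sum(abs(v - e) for v, e in zip(truth, estimate)) / len(truth)


def mre(truth, estimate):
    """Mean relative error: total absolute error over the total of true readings.

    This definition is an assumption; the slides report MRE without defining it.
    """
    return sum(abs(v - e) for v, e in zip(truth, estimate)) / sum(truth)


# Hypothetical held-out readings and their filled-in estimates.
true_values = [100.0, 50.0, 150.0]
filled_values = [90.0, 55.0, 160.0]
print(mae(true_values, filled_values), mre(true_values, filled_values))
```

In an evaluation of this kind, observed readings are typically hidden on purpose, filled in by each method, and then compared with the hidden values.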
Thanks!
Yu Zheng, yuzheng@microsoft.com
Search for "Urban Computing"
Zheng, Y., et al. Urban Computing: Concepts, Methodologies, and Applications. ACM Transactions on Intelligent Systems and Technology, 2015.
Yu Zheng. Methodologies for Cross-Domain Data Fusion: An Overview. IEEE Transactions on Big Data, 1(1), 2015.