Spatio-Temporal Analytics and Data Management Strategies

Explore the world of extreme-scale analytics on spatio-temporal datasets, morphometric image analysis pipeline, satellite data analysis, and subsurface reservoir management. Gain insights into core operation categories, data access patterns, computational complexity, and more for efficient data handling and analysis.

  • Analytics
  • Spatio-Temporal
  • Data Management
  • Morphometric Analysis
  • Satellite Data

Presentation Transcript


  1. Extreme Scale Analytics on Spatio-Temporal Datasets
     Joel Saltz
     Center for Comprehensive Informatics & Biomedical Informatics Department, Emory University

  2. Morphometric Image Analysis Pipeline
       • Preprocessing: normalization, tiling, etc.
       • Segmentation: identify nuclei as objects
       • Feature Extraction: compute morphometric features
       • Classification: unsupervised learning (k-means) after patient-level aggregation and analysis
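
The last two pipeline stages can be illustrated with a minimal sketch. It assumes segmentation and feature extraction have already produced one feature vector per nucleus; the record layout, the feature names (nuclear area, eccentricity), the sample values, and the helper names are all hypothetical and are not taken from the slides. The k-means shown is plain Lloyd's algorithm seeded with the first k points, chosen only to keep the example deterministic.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_patient(nuclei):
    """Average per-nucleus feature vectors into one vector per patient."""
    grouped = defaultdict(list)
    for patient_id, features in nuclei:
        grouped[patient_id].append(features)
    return {
        pid: tuple(mean(col) for col in zip(*rows))
        for pid, rows in grouped.items()
    }


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


# Hypothetical (patient_id, (nuclear_area, eccentricity)) records.
nuclei = [
    ("P1", (40.0, 0.20)), ("P1", (44.0, 0.30)),
    ("P2", (42.0, 0.25)), ("P2", (46.0, 0.25)),
    ("P3", (90.0, 0.80)), ("P3", (94.0, 0.70)),
    ("P4", (88.0, 0.75)), ("P4", (92.0, 0.85)),
]
per_patient = aggregate_by_patient(nuclei)
patient_ids = sorted(per_patient)
labels, centroids = kmeans([per_patient[p] for p in patient_ids], k=2)
clusters = dict(zip(patient_ids, labels))
```

The features are left unscaled here, so the area dominates the distance; a real pipeline would normally standardize features before clustering.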

  3. Satellite Data Analysis for Monitoring and Change Analysis

  4. Subsurface Reservoir Management
       • Numerical models of porous media
       • Fluids flow from one region of reservoir to another region
       • Rock and sediment properties change over time
       • Simulate multiple realizations of multiple models and management strategies
       • Evaluate geologic uncertainty and management strategies simultaneously
       • Enable on-demand exploration and comparison of multiple scenarios

  5. Core Operation Categories and Patterns

     Data Cleaning and Low Level Transformations
       • Operations: Transformations to reduce effects of sensor/measurement artifacts. Transform sensor acquired measurements to domain specific variables.
       • Data access patterns and computational complexity: Mainly local and regular data access patterns. Moderate computational complexity.

     Data Subsetting, Filtering, Subsampling
       • Operations: Select portions of a dataset corresponding to regions in atlas and/or time intervals. Select portions of a dataset based on value ranges (e.g., regions with temperature larger than X degrees). Subsample data to reduce resolution and data size.
       • Data access patterns and computational complexity: Local data access patterns as well as indexed access. Low to moderate, mainly data intensive computations.

     Spatio-temporal Mapping and Registration
       • Operations: Map datasets to an atlas. Resolve data redundancy at tile boundaries to form mosaics. Create composite dataset from multiple spatially co-incident datasets. Create derived dataset from spatially co-incident datasets obtained at different times.
       • Data access patterns and computational complexity: Irregular local and global data access patterns. Moderate to high computational complexity.

     Object Segmentation
       • Operations: Segment base level objects such as nuclei, buildings, lakes. Extract features from base level objects.
       • Data access patterns and computational complexity: Irregular, but primarily local, data access patterns. High computational complexity.

     Object Classification
       • Operations: Classify base level objects through possibly iterative combination of clustering, machine learning and human input (active learning).
       • Data access patterns and computational complexity: Irregular and global data access patterns. High computational complexity.

     Spatio-temporal Aggregation
       • Operations: Construct high level objects composed of classified base level object aggregates, e.g., residential areas vs industrial complexes. Compute time-series aggregates over a given imaged area.
       • Data access patterns and computational complexity: Primarily local with a crucial global component for aggregation. Moderate/high computational complexity.

     Change Detection, Comparison, and Quantification
       • Operations: Quantify changes over time in domain specific low level variables, base level objects and high level objects. Construct change objects to describe changes in low level domain specific variables, base level and high level objects. Spatial queries for selecting and comparing segmented regions and objects.
       • Data access patterns and computational complexity: Compute and data-intensive computations. Mixture of local and global data access patterns as well as indexed access.
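
As a rough illustration of the subsetting, filtering, and subsampling category, the sketch below applies the three kinds of selection to a small in-memory grid. The grid values, the threshold, and the function names are hypothetical; a real system would operate on disk-resident, indexed datasets rather than nested lists.

```python
def subset_region(grid, row_range, col_range):
    """Select the cells inside a rectangular region (half-open ranges)."""
    (r0, r1), (c0, c1) = row_range, col_range
    return [row[c0:c1] for row in grid[r0:r1]]


def filter_by_value(grid, threshold):
    """Return (row, col, value) for every cell whose value exceeds the threshold."""
    return [
        (r, c, v)
        for r, row in enumerate(grid)
        for c, v in enumerate(row)
        if v > threshold
    ]


def subsample(grid, step):
    """Keep every step-th row and column to reduce resolution and data size."""
    return [row[::step] for row in grid[::step]]


# Hypothetical 4x4 temperature grid.
temperature = [
    [10, 12, 14, 16],
    [11, 13, 15, 17],
    [20, 22, 24, 26],
    [21, 23, 25, 27],
]

region = subset_region(temperature, (2, 4), (0, 2))  # [[20, 22], [21, 23]]
hot = filter_by_value(temperature, 25)               # [(2, 3, 26), (3, 3, 27)]
coarse = subsample(temperature, 2)                   # [[10, 14], [20, 24]]
```

The region and subsampling selections touch contiguous or strided cells, while the value filter scans every cell; with an index on values it could skip most of them, which is the "indexed access" noted for this category.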

  6. Challenges
       • Spatial-temporal disk-resident, on-the-fly, dynamically updated datasets
       • Access and manipulate multiple datasets generated and stored on multiple, distributed systems
       • Analysis of raw data can generate millions to trillions of features (e.g., millions of cells and nuclei in high resolution tissue images) to be mined and compared
       • Take advantage of hardware platforms for analysis:
           • Clusters containing hybrid CPU-GPU nodes
           • Extreme scale machines consisting of hundreds of thousands of CPU cores
           • Systems with deep memory and storage hierarchies
           • Cloud computing platforms

  7. Using Hybrid CPU-GPU Systems

  8. Data Structures: Region Templates
       • Describe 2D/3D static and temporal regions.
       • Provides a container for points, arrays, regions, and object sets within a spatial and temporal bounding box.
       • A region template can represent collections of spatial areas and objects where these entities vary from one another in size and shape; e.g. regions generated by segmenting cells in microscopy images, man-made structures or hurricanes in satellite imagery.
       • Primary datasets are defined as point data elements and arrays, and derived datasets as sets of regions and objects.
       • Region templates may be related to one another in a defined manner.
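
One way to picture a region template is as a container keyed by a spatio-temporal bounding box. The sketch below is an assumption about how such a container could look, not the authors' implementation: the class names, field names, and the `add_object` method are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent plus a time interval (all bounds inclusive)."""
    lower: Tuple[float, ...]  # e.g. (x_min, y_min) or (x_min, y_min, z_min)
    upper: Tuple[float, ...]
    t_start: float
    t_end: float

    def contains(self, point, t):
        in_space = all(
            lo <= p <= hi for lo, p, hi in zip(self.lower, point, self.upper)
        )
        return in_space and self.t_start <= t <= self.t_end


@dataclass
class RegionTemplate:
    """Hypothetical container for primary and derived data in one bounding box."""
    name: str
    bounds: BoundingBox
    primary: Dict[str, Any] = field(default_factory=dict)        # points and arrays
    derived: Dict[str, List[Any]] = field(default_factory=dict)  # regions and object sets
    related: List["RegionTemplate"] = field(default_factory=list)

    def add_object(self, set_name, centroid, t, payload):
        """Store a derived object, rejecting anything outside the bounding box."""
        if not self.bounds.contains(centroid, t):
            raise ValueError("object lies outside the template's bounding box")
        self.derived.setdefault(set_name, []).append((centroid, t, payload))


# Hypothetical usage: one image tile holding a segmented nucleus.
tile = RegionTemplate(
    name="tile_0_0",
    bounds=BoundingBox((0.0, 0.0), (1024.0, 1024.0), 0.0, 0.0),
)
tile.add_object("nuclei", (100.5, 200.0), 0.0, {"area": 42.0})
```

Objects in `derived` may differ in size and shape because the container stores them as opaque payloads; only their position and time are checked against the box.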

  9. Programming Abstractions and Runtime Middleware Services
       • Programming abstractions
           • Multi-level dataflow pipelines
           • MapReduce style programs
           • Spatial query capabilities
       • I/O and Storage Services
           • Indexing and metadata management for ensembles of datasets
           • I/O support for retrieving data from multiple storage systems and for streaming data
           • Query capabilities
       • Memory Management
           • Careful management and staging of large data structures across memory hierarchies
           • Masking data movement costs with computation
       • Execution Services
           • Distributing and rearranging computations and data to minimize data movement
           • Coordinated scheduling and mapping of analysis operations to heterogeneous and hybrid (CPU cores and GPUs) systems to increase overall application throughput
           • Quality of service/data requirements
           • Function variants
       • Provenance Tracking, Fault-detection and tolerance
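
The idea of mapping operations with CPU and GPU function variants onto a hybrid node can be sketched with a simple greedy scheduler. This is an illustrative stand-in, not the scheduling policy of the middleware described in the slides: the task names and runtime estimates are hypothetical, and tasks are treated as independent, so dependencies between pipeline stages and data movement costs are ignored.

```python
def schedule(tasks, devices):
    """Greedy mapping: each task goes to the device that would finish it earliest."""
    busy_until = {d: 0.0 for d in devices}
    plan = []
    for name, cost in tasks:
        # cost[d] is the estimated runtime of this task's variant for device d.
        device = min(devices, key=lambda d: busy_until[d] + cost[d])
        busy_until[device] += cost[device]
        plan.append((name, device))
    return plan, max(busy_until.values())


# Hypothetical runtime estimates (seconds) for the CPU and GPU variants.
tasks = [
    ("segment_tile_0", {"cpu": 8.0, "gpu": 1.0}),
    ("segment_tile_1", {"cpu": 8.0, "gpu": 1.0}),
    ("extract_features_0", {"cpu": 2.0, "gpu": 1.5}),
    ("extract_features_1", {"cpu": 2.0, "gpu": 1.5}),
]
plan, makespan = schedule(tasks, ["cpu", "gpu"])
```

With these estimates the GPU takes both segmentation tasks, where its speedup is largest, and one feature-extraction task runs on the otherwise idle CPU, giving a makespan of 3.5 seconds.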

  10. End
