Introduction to Spatial Data Mining: Discovering Patterns in Large Datasets

 
A Brief Introduction  to Spatial Data Mining
inspired by Shashi Shekhar (UMN)
 
Spatial data mining is the process of discovering interesting, useful, non-trivial patterns from large spatial datasets
 
Reading Material: 
http://en.wikipedia.org/wiki/Spatial_analysis
Spatial Statistics Software: 
http://www.spatial-statistics.com/
 
COSC 3337
 
Shashi Shekhar
 
Examples of Spatial Patterns
 
Historic Examples
1854 Asiatic cholera in London: a water pump identified as the source
Fluoride and healthy gums near Colorado river
Theory of  Gondwanaland - continents fit like pieces of a jigsaw puzzle
https://www.nps.gov/subjects/geology/plate-tectonics-the-unifying-theory-of-geology.htm#:~:text=The%20idea%20of%20continental%20drift%2C%20inspired%20by%20the,the%20theory%20that%20later%20developed%20as%20plate%20tectonics.
Modern Examples
Crime hotspots for planning police patrol routes
Bald eagles nest on tall trees near open water
West Nile virus spreading from the northeastern USA to the south and west
Unusual warming of the Pacific Ocean (El Niño) affects weather in the USA
 
http://en.wikipedia.org/wiki/Spatial_analysis
 
London Cholera in 1854
 
Why Learn about Spatial Data Mining?
 
Two basic reasons for new work
Consideration of use in certain application domains
Provide fundamental new understanding
 
Application domains
Scale up secondary spatial (statistical) analysis to very large datasets
Describe/explain locations of human settlements in last 5000 years
Find cancer clusters (CDC: https://www.cdc.gov/nceh/clusters/default.htm) to locate hazardous environments
Prepare land-use maps from satellite imagery (Satellite images help create land use maps in villages - Sub-Saharan Africa (scidev.net))
Predict habitat suitable for endangered species (Feds Seek to Protect Lizard Habitat in the Permian Basin (sanangelolive.com))
Find new spatial patterns
Find groups of co-located geographic features
 
Why Learn about Spatial Data Mining? - 2
 
New understanding of geographic processes for critical questions
Ex. How is the health of planet Earth?
Ex. Characterize effects of human activity on environment and ecology
Ex. Predict effect of El Niño on weather and economy
Traditional approach: manually generate and test hypotheses
But spatial data is growing too fast to analyze manually
Satellite imagery, GPS tracks, sensors on highways, …
Number of possible geographic hypotheses is too large to explore manually
Large number of geographic features and locations
Number of interacting subsets of features grows exponentially
Ex. Find teleconnections between weather events across ocean and land areas
SDM may reduce the set of plausible hypotheses
Identify hypotheses supported by the data
For further exploration using traditional statistical methods
 
 
Autocorrelation
 
Items in a traditional dataset are assumed to be independent of each other, whereas properties of locations in a map are often “auto-correlated”.
First law of geography [Tobler]: Everything is related to everything else, but near things are more related than distant things.
People with similar backgrounds tend to live in the same area
Economies of nearby regions tend to be similar
Changes in temperature occur gradually over space (and time)
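
One way to put a number on autocorrelation is Moran's I. The slides do not cover it, so the sketch below is only an illustration, assuming binary distance-threshold weights (two locations count as neighbors if they lie within `radius` of each other). The function name `morans_i` and the toy grids are hypothetical. Values near +1 suggest that neighbors carry similar values, values near -1 that neighbors carry dissimilar values.

```python
import math

def morans_i(points, values, radius):
    """Moran's I with binary weights: w_ij = 1 if i != j and dist(i, j) <= radius."""
    n = len(points)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0    # sum over neighbor pairs of dev_i * dev_j
    w_sum = 0.0  # total weight (number of ordered neighbor pairs)
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)

# Toy 5x5 grid (hypothetical data).
grid = [(x, y) for x in range(5) for y in range(5)]
# An attribute that changes gradually from west to east.
smooth = [float(x) for x, _ in grid]
# A checkerboard attribute: every direct neighbor has the other value.
checker = [float((x + y) % 2) for x, y in grid]

print("gradual change:", morans_i(grid, smooth, 1.0))
print("checkerboard:  ", morans_i(grid, checker, 1.0))
```

The gradually changing attribute yields a positive value and the checkerboard a negative one, which matches the intuition behind Tobler's law.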
 
Waldo Tobler in 2000
 
 
Papers on “Laws in Geography”:
 
http://www.geog.ucsb.edu/~good/papers/393.pdf
http://www.cs.uh.edu/~ceick/DM/GOO10.pdf
 
Characteristics of Spatial Data Mining
 
Autocorrelation
Patterns usually have to be defined in the spatial attribute subspace and not in the complete attribute space
Longitude and latitude (or other coordinate systems) are the glue that links different data collections together
People are used to maps in GIS; therefore, data mining results have to be summarized on top of maps
Patterns not only refer to points, but can also refer to lines, polygons, or other higher-order geometrical objects
Patterns exist at different levels of granularity
Large number of patterns, large dataset sizes
Spatial patterns, e.g. spatial clusters, can have arbitrary shapes
Regional knowledge is of particular importance due to the lack of global knowledge in geography (spatial heterogeneity)
 
News November 9, 2023
 
Task5 is due in Kritik November 14/15!
Course exams will not be returned to students, but you can view yours.
Today’s Class
Finish Discussion Spatial Data Mining
Graduate Research Opportunities
GHC Presentations groups J and K
Watch and Discuss parts of MIT Deep Learning Bootcamp video; the discussion of Deep Learning will be continued in lectures on November 14 or 16 by Mahin.
 
Why Is Regional Knowledge Important in Spatial Data Mining?

A special challenge in spatial data mining is that information is usually not uniformly distributed in spatial datasets.
It has been pointed out in the literature that “whole map statistics are seldom useful”, that “most relationships in spatial data sets are geographically regional, rather than global”, and that “there is no average place on the Earth’s surface” [Goodchild03, Openshaw99].
Therefore, it is not surprising that domain experts are mostly interested in discovering hidden patterns at a regional scale rather than a global scale.
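
A small made-up example may help show why a whole-map statistic can hide regional relationships. In the sketch below, the region names, the two attributes and all values are hypothetical. The two attributes are perfectly positively related in one region and perfectly negatively related in the other, so the correlation over the whole map comes out as zero.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical observations: (region, attribute_a, attribute_b).
observations = (
    [("north", float(a), 2.0 * a) for a in range(1, 6)]
    + [("south", float(a), 12.0 - 2.0 * a) for a in range(1, 6)]
)

def correlation_by_region(obs):
    """Compute the correlation separately for each region label."""
    result = {}
    for region in sorted({r for r, _, _ in obs}):
        xs = [a for r, a, _ in obs if r == region]
        ys = [b for r, _, b in obs if r == region]
        result[region] = pearson(xs, ys)
    return result

global_r = pearson([a for _, a, _ in observations],
                   [b for _, _, b in observations])
regional_r = correlation_by_region(observations)

print("whole map:", global_r)
print("by region:", regional_r)
```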
 
Michael Frank Goodchild
 
Spatial Autocorrelation: Distance-based measure
 
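K-function definition (https://www3.nd.edu/~mhaenggi/ee87021/Dixon-K-Function.pdf)
Test against randomness for a point pattern
$K(h) = \lambda^{-1}\, E[\text{number of other events within distance } h \text{ of an arbitrary event}]$
$\lambda$ is the intensity of events
Models departure from randomness over a wide range of scales
Inference
For Poisson complete spatial randomness (CSR): $K(h) = \pi h^2$
Plot $\hat{K}(h)$ against $h$ and compare to Poisson CSR
$\hat{K}(h) > \pi h^2$: clustering
$\hat{K}(h) < \pi h^2$: declustering/regularity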
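
The comparison against CSR can be sketched in a few lines. This is a naive estimator, assuming no edge correction and an intensity estimated as the number of points divided by the study area. The function name `k_hat` and the synthetic point sets are hypothetical, and practical analyses normally use an edge-corrected estimator from a spatial statistics package.

```python
import math
import random

def k_hat(points, h, area):
    """Naive estimate of K(h): no edge correction, intensity estimated as n / area."""
    n = len(points)
    lam = n / area
    # Count ordered pairs (i, j), i != j, that lie within distance h of each other.
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= h
    )
    # Average number of other events within h of an event, divided by intensity.
    return pairs / (n * lam)

random.seed(0)
# Hypothetical point patterns on the unit square.
uniform = [(random.random(), random.random()) for _ in range(200)]
clustered = [(0.5 + random.gauss(0, 0.03), 0.5 + random.gauss(0, 0.03))
             for _ in range(200)]

for h in (0.05, 0.10):
    print(h, k_hat(uniform, h, 1.0), k_hat(clustered, h, 1.0), math.pi * h * h)
```

For the clustered pattern the estimate lies far above the CSR value, while the uniform pattern stays close to it (slightly below, since points near the border lose neighbors without edge correction).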
 
K-Function based Spatial Autocorrelation
 
 
Basic Approach Using K-Functions
 
Example: Collocation Red and Green Objects
 
FOR radii r_1, …, r_n DO
   FOR all green objects g DO
      Compute #-of-red objects within radius r_j of g
   ENDDO
   Compute average ro_j of values observed in previous loop
   Put entry (r_j, (ro_j/total_number_of_red_objects)) into Curve
ENDDO
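
The loop above translates almost line by line into Python. The following is a minimal sketch, assuming objects are given as (x, y) tuples and that Euclidean distance is used. The function name `colocation_curve` and the sample coordinates are hypothetical.

```python
import math

def colocation_curve(green, red, radii):
    """For each radius r_j, return (r_j, ro_j / total number of red objects), where
    ro_j is the average number of red objects within r_j of a green object."""
    curve = []
    for r in radii:
        # Number of red objects within radius r of each green object.
        counts = [sum(1 for q in red if math.dist(g, q) <= r) for g in green]
        avg = sum(counts) / len(green)
        curve.append((r, avg / len(red)))
    return curve

# Hypothetical example data.
green = [(0.0, 0.0), (10.0, 0.0)]
red = [(1.0, 0.0), (0.0, 2.0), (10.0, 1.0), (50.0, 50.0)]

curve = colocation_curve(green, red, [1.0, 2.0, 5.0])
for radius, fraction in curve:
    print(radius, fraction)
```

A curve that rises quickly at small radii indicates that red objects tend to occur close to green objects.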
 
 
 
 
 
 
Associations, Spatial associations, Co-location
 
Find patterns from the following sample dataset?
[Figures: sample dataset and answers]
 
Illustration of Cross-Correlation
 
Illustration of Cross K-function for Example Data
 
Cross-K Function for Example Data
 
Colocation Rules – Spatial Interest Measures
 
http://www.youtube.com/watch?v=RPyJwYqyBuI
 
Spatial Association Rules
 
A special reference spatial feature
Transactions are defined around instances of the special reference spatial feature
Item-types = spatial predicates
Example: Table 7.5 (p. 204)
 
Participation index = $\min_i \{pr(f_i, c)\}$
where $pr(f_i, c)$, the participation ratio of feature $f_i$ in co-location $c = \{f_1, f_2, \ldots, f_k\}$, is the fraction of instances of $f_i$ with features $\{f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_k\}$ nearby
$N(L)$ = neighborhood of location $L$
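
The participation index can be illustrated with a short sketch. It assumes that "nearby" means Euclidean distance of at most `radius`, and it counts an instance as participating when every other feature of the co-location has at least one instance within that radius of it. For co-locations with more than two features, the co-location literature usually also requires the participating instances to be neighbors of each other, which this simplified check does not enforce. The function name and the sample data are hypothetical.

```python
import math

def participation_index(instances, colocation, radius):
    """instances maps a feature name to a list of (x, y) locations.
    Returns (participation index, participation ratio per feature)."""
    ratios = {}
    for f in colocation:
        others = [g for g in colocation if g != f]
        participating = 0
        for p in instances[f]:
            # p participates if each other feature has an instance within radius of p.
            if all(any(math.dist(p, q) <= radius for q in instances[g])
                   for g in others):
                participating += 1
        ratios[f] = participating / len(instances[f])
    return min(ratios.values()), ratios

# Hypothetical instances of two features A and B.
instances = {
    "A": [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0)],
    "B": [(0.0, 1.0), (5.0, 6.0)],
}

pi_value, ratios = participation_index(instances, ["A", "B"], radius=1.5)
print(pi_value, ratios)
```

Here two of the three A instances have a B instance nearby and both B instances have an A instance nearby, so the index is the smaller of the two ratios.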
 
 
Co-location rules vs. traditional association rules
 
Skip in 2023
 
Spatial Regression
 
 
Will cover a few slides from this slideshow next:
http://www2.cs.uh.edu/~ceick/DM/Spatial Regression.pptx
 
Conclusions Spatial Data Mining
 
Spatial patterns are the opposite of random
Common spatial patterns: location prediction, feature interaction, hot spots, geographically referenced statistical patterns, co-location, emergent patterns, …
SDM = search for unexpected, interesting patterns in large spatial databases
Spatial patterns may be discovered using
Techniques like classification, associations, clustering and outlier detection
New techniques are needed for SDM due to
Spatial autocorrelation
Importance of non-point data types (e.g. polygons)
Continuity of space
Regional knowledge; also establishes a need for scoping
Separation between spatial and non-spatial subspace: in traditional approaches, clusters are usually defined over the complete attribute space
Knowledge sources are available now
Raw knowledge to perform spatial data mining is mostly available online now
GIS tools are available that facilitate integrating knowledge from different sources (Google Earth, ArcGIS)
 
 
A Few Links
 
ACM SIGSPATIAL 2023 Home
Geographically weighted regression:
https://www.bing.com/videos/search?q=Geographically+weighted+regression+video&docid=608004448652240120&mid=25C7E9AF4767BA3EE2EC25C7E9AF4767BA3EE2EC&view=detail&FORM=VIRE
 
 
 
 
Dr. Eick will give a short report about the ACM SIGSPATIAL
Conference in the November 21 or November 28 lecture!

Spatial data mining involves uncovering valuable patterns from extensive spatial datasets, offering insights into historical events, environmental phenomena, and predictive analytics. Examples range from analyzing disease outbreaks to predicting habitat suitability for endangered species. The application of spatial data mining provides a new understanding of geographic processes, aiding in addressing critical questions related to Earth's health, human impacts on the environment, and forecasting weather patterns.

  • Spatial Data Mining
  • Geographic Analysis
  • Data Patterns
  • Predictive Analytics
  • Environmental Science

Uploaded on May 16, 2024 | 2 Views



Presentation Transcript


Ch. Eick: Spatial Data Mining (inspired by a talk given at UH by Shashi Shekhar (UMN))
  17. Co-location rules vs. traditional association rules (Skip in 2023)

                                   Association rules          Co-location rules
  Underlying space                 discrete sets              continuous space
  Item-types                       item-types                 events / Boolean spatial features
  Collection                       Transaction (T)            Neighborhood (N)
  Prevalence measure               support                    participation index
  Conditional probability metric   Pr.[ A in T | B in T ]     Pr.[ A in N(L) | B at location L ]

  N(L) = neighborhood of location L
