Data Mining Anomaly/Outlier Detection

Data Mining

Anomaly/Outlier Detection

Lecture Notes for Chapter 9 (10 first Edition)

Introduction to Data Mining 2

nd

 Edition

by

Tan, Steinbach,

Karpatne,

Kumar

New slides have been added and the original slides have

been significantly modified by

Christoph F. Eick

Lecture Organization COSC 3337

Anomaly/Outlier Detection

1.

Graphic-based Approaches

2.

Model-based

3.

One-Class SVM Approach

only briefly covered

4.

Distance-Based Approaches

5.

Reconstuction Error-based approach

0. Anomaly/Outlier Detection

What are anomalies/outliers?

–

The set of data points that are considerably different than the

remainder of the data

Variants of Anomaly/Outlier Detection Problems

–



–



–

Applications:

–

Credit card fraud detection, telecommunication fraud detection,

network intrusion detection, fault detection, data cleaning, sensor

fusion,…

Anomaly Detection

Assumption:

–

There are considerably more “normal” observations

than “abnormal” observations (outliers/anomalies) in

the dataset

Challenges

–

How many outliers are there in the data?

–

Method is unsupervised



 Validation can be quite challenging (just like for clustering)

–

Finding needle in a haystack

Anomaly Detection Schemes

General Steps

–

Build a profile of the “normal” behavior



Profile can be patterns or summary statistics for the overall population

–

Use the “normal” profile to detect anomalies



Anomalies are observations whose characteristics

differ significantly from the normal profile

Types of anomaly detection

schemes

1.

Graphical

2.

Model-based, relying on parametric models

3.

One-Class SVM Approach

4.

Distance-based

5.

Profile-based Approaches

1. Graphical Approaches

Idea: user identifies outliers by visual inspection

Scatter plot (2-D), Spin plot (3-D)

Limitations

–

Time consuming

–

Subjective

–

Data with higher

   dimensions

Box-Plot Approach for Outlier Detection

Mixture of a graphical and a statistical approach

Observations that are more than



IQR (e.g.



 =1.5) above

or below the inter-quantile range are outliers.

Decent approach for 1D/single attribute outlier detection!

Sad news

: Cannot be used for multi-variate data!

Outlier Detection Example1

Anomaly/Outlier Detection (Second Introduction)

What are anomalies/outliers?

–

The set of data points that are

considerably different than the

remainder of the data

Natural implication is that anomalies are

relatively rare

–

One in a thousand occurs often if you have lots of data

–

Context is important, e.g., freezing temps in July

Can be important or a nuisance

–

10 foot tall 2 year old

–

Unusually high blood pressure

Causes of Anomalies

Data from different classes

–

Measuring the weights of oranges, but a few grapefruit

are mixed in

Natural variation

–

Unusually tall people

Data errors

–

200 pound 2 year old

Object vs. Attribute Anomalies

Many anomalies are defined in terms of a single attribute

–

Height

–

Shape

–

Color

Object anomalies are harder to identify as objects are

usually described by multiple attributes

Can be hard to find an anomaly using all attributes

–

Noisy or irrelevant attributes

–

Object is only anomalous with respect to some

attributes

However, an object may not be anomalous in any one

attribute

General Issues: Anomaly Scoring

Many anomaly detection techniques provide only a binary

categorization

–

An object is an anomaly or it isn’t

–

This is especially true of classification-based approaches

Other approaches assign a score to each object/pont

–

This score measures the degree to which an object is an anomaly

–

This allows objects to be ranked

–

In general, this is the “

preferable approach

”

However, in the end, you often need a binary decision

–

Should this credit card transaction be flagged?

–

Still useful to have a score

How many anomalies are there?

2. Model-Based Anomaly Detection

Build a model for the data and see

–

Unsupervised



Anomalies are those points that don’t fit well



Anomalies are those points that distort the model



Examples:

–

Statistical distribution

–

Clusters

–

Regression

–

Geometric

–

Graph

–

Supervised



Anomalies are regarded as a rare class



Need to have training data

Additional Anomaly Detection Techniques

Proximity-based

–

Anomalies are points far away from other points

–

Can detect this graphically in some cases

Density-based

–

Low density points are outliers

Pattern matching

–

Create profiles or templates of atypical but important

events or objects

–

Algorithms to detect these patterns are usually simple

and efficient

Model-based Statistical Approaches

Fit a parametric model M to the data, capturing the distribution

of the data (e.g., normal distribution)

Apply a statistical test that depends on

–

Data distribution

–

Parameter of distribution (e.g., mean, variance)

–

Number of expected outliers (confidence limit)

Alternatively, rank points by their likelihood with respect to M

Normal Distributions

Statistical Approaches: GMM

1.

Fit a model M to the dataset D; e.g.

–

A Bivariate Gaussian Model

–

A Bivariate Gaussian Mixture Model

Mixture model –

Wikipedia

by running the EM clustering algorithm;

see:

Expectation–maximization algorithm - Wikipedia

2.

Plug each point p into the density function d

of

model M and compute d

(p) or preferably log(d

(p)),

called the

log likelihood

 of p, and add this value as in

a new column ols (“

outlier score”

) to D obtaining D’—

the smaller this value is the more likely p is an outlier

with respect M.

3.

Sort D’ in ascending order—the first record is the

record with the smallest value for log(d

(p))

4.

Remove the top k records from D’

General Idea EM Algorithm

Parameter of K-Means/GMM Models

K-means models are characterized by k centroids

EM/Gaussian Mixture Models are characterized by k Gaussian

with each Gaussian characterized by:

–

Weight of the particular Gaussian

–

Mean value

–

Covariance Matrix

EM-style algorithms:

–

E-Step: Assign objects to clusters (deterministic in the case

of K-means; probabilistic in the case of EM)

–

M-Step: updates the model parameters (e.g. centroids in the

case of K-means; the mixture parameters in the case of EM)

–

Repeat sequences of E-M steps until there is some

convergence

–

Start with an initial assignment of objects to clusters

Statistical/Model-based Approaches:

1.

Fit a model M to the dataset D; e.g.

–

A Bivariate Gaussian Model

–

A Bivariate Gaussian Mixture Model by running

the EM clustering algorithm; see:

Expectation–maximization algorithm -

Wikipedia

2.

Plug each point p into the density function d

of

model M and compute d

(p) or preferably log(d

(p)),

called the

log likelihood

 of p, and add this value as in

a new column ols (“

outlier score”

) to D obtaining

D’—the smaller this value is the more likely p is an

outlier with respect M.

3.

Sort D’ in ascending order—the first record is the

record with the smallest value for log(d

(p))

4.

Perform the remaining tasks using D’

Density-based: LOF approach

For each point, compute the density of its local neighborhood; e.g. use

DBSCAN’s approach

ö

Outliers are points with largest LOF value (measured as point-

density/neighbor densities)

In the NN approach, p

is

not considered as outlier,

while LOF approach find

both p

 and p

as outliers;

moreover, some/all points in

cluster C

 might be

considered as outliers!

Relative Density Outlier Scores

Limitations of Model-based Approaches

Most of the statistical tests are for a single attributes

In many cases, data distribution/model may not be known

For high dimensional data, it may be difficult to estimate the

“true” density function. However, mixtures of Gaussians and

conjunction with EM have been successfully used in practice for

some outlier detection tasks that involve multi-variate data.

However, very high dimensional data are still a challenge!

As alternative to parametric density estimation,

non-parametric

density-based approaches, such as kernel density estimation

have shown some promise; see:

https://en.wikipedia.org/wiki/Kernel_density_estimation

However, these approaches

just provide you with estimated

densities

but not with a ‘true density function” density function;

therefore, they are not truly model based, but rather just density-

based approaches.

Model-based vs  Model-free

Model-based Approaches



Model can be parametric or non-parametric



Anomalies are those points that don’t fit well



Anomalies are those points that distort the model

Model-free Approaches



Anomalies are identified directly from the data without

building a model

Often the underlying assumption is that the

most of the points in the data are normal

General Issues: Label vs Score

Some anomaly detection techniques provide only a

binary categorization

Other approaches  measure the degree to which an

object is an anomaly

–

This allows objects to be ranked

–

Scores can also have associated meaning (e.g., statistical

significance)

3.

One-Class

SVM Approach

for Outlier

Detection

Minimize R and the error resulting from points outside

the sphere—their error is their distance to the sphere.

‘

’

4. Distance-based Approaches

Approach:

–

Compute the distance between every pair of data

points

–

There are various ways to define outliers:



Data points for which there are fewer than

 neighboring

points within a distance



The top n data points whose distance to the kth nearest

neighbor is greatest



The top n data points whose average distance to the k

nearest neighbors is greatest

One Nearest Neighbor - One Outlier

One Nearest Neighbor - Two Outliers

Five Nearest Neighbors - Small Cluster

Five Nearest Neighbors - Differing Density

Reconstruction-Error Based Approaches

Based on assumptions there are patterns in the

distribution of the normal class that can be

captured using lower-dimensional

representations

Reduce data to lower dimensional data

–

E.g. Use Principal Components Analysis (PCA) or

Auto-encoders

Measure the reconstruction error for each object

–

The difference between original and reduced

dimensionality version

Reconstruction Error

Basic Architecture of an Autoencoder

An autoencoder is a multi-layer neural network

The number of input and output neurons is equal

to the number of original attributes.

Strengths and Weaknesses

Does not require assumptions about distribution

of normal class

Can use many dimensionality reduction

approaches

The reconstruction error is computed in the

original space

–

This can be a problem if dimensionality is high

Slide Note

Embed Share

Download

Uncover insights on anomaly and outlier detection in data mining through lecture notes from Chapter 9 of the book "Introduction to Data Mining 2nd Edition" by Tan, Steinbach, Karpatne, Kumar. These notes have been revised by Christoph F, featuring new slides. Explore the updated content to enhance your knowledge in this specialized area of data analysis.

rroge Follow

Uploaded on Mar 09, 2025 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Data Mining Anomaly/Outlier Detection Lecture Notes for Chapter 9 (10 first Edition) Introduction to Data Mining 2nd Edition by Tan, Steinbach, Karpatne, Kumar New slides have been added and the original slides have been significantly modified by Christoph F. Eick

Lecture Organization COSC 3337 Anomaly/Outlier Detection Graphic-based Approaches Model-based One-Class SVM Approach only briefly covered Distance-Based Approaches Reconstuction Error-based approach 1. 2. 3. 4. 5. 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

0. Anomaly/Outlier Detection What are anomalies/outliers? The set of data points that are considerably different than the remainder of the data Variants of Anomaly/Outlier Detection Problems Given a database D, find all the data points x D with anomaly scores greater than some threshold t Given a database D, find all the data points x D having the top-n largest anomaly scores f(x) Given a database D, containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D Applications: Credit card fraud detection, telecommunication fraud detection, network intrusion detection, fault detection, data cleaning, sensor fusion, 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Anomaly Detection Assumption: There are considerably more normal observations than abnormal observations (outliers/anomalies) in the dataset Challenges How many outliers are there in the data? Method is unsupervised Validation can be quite challenging (just like for clustering) Finding needle in a haystack 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Anomaly Detection Schemes General Steps Build a profile of the normal behavior Profile can be patterns or summary statistics for the overall population Use the normal profile to detect anomalies Anomalies are observations whose characteristics differ significantly from the normal profile Types of anomaly detection schemes 1. Graphical 2. Model-based, relying on parametric models 3. One-Class SVM Approach 4. Distance-based 5. Profile-based Approaches 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

1. Graphical Approaches Idea: user identifies outliers by visual inspection Scatter plot (2-D), Spin plot (3-D) Limitations Time consuming Subjective Data with higher dimensions 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Box-Plot Approach for Outlier Detection outlier 1.5*IQR IQR Mixture of a graphical and a statistical approach Observations that are more than IQR (e.g. =1.5) above or below the inter-quantile range are outliers. Decent approach for 1D/single attribute outlier detection! Sad news: Cannot be used for multi-variate data! 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Outlier Detection Example1 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Anomaly/Outlier Detection (Second Introduction) What are anomalies/outliers? The set of data points that are considerably different than the remainder of the data Natural implication is that anomalies are relatively rare One in a thousand occurs often if you have lots of data Context is important, e.g., freezing temps in July Can be important or a nuisance 10 foot tall 2 year old Unusually high blood pressure 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Causes of Anomalies Data from different classes Measuring the weights of oranges, but a few grapefruit are mixed in Natural variation Unusually tall people Data errors 200 pound 2 year old 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Object vs. Attribute Anomalies Many anomalies are defined in terms of a single attribute Height Shape Color Object anomalies are harder to identify as objects are usually described by multiple attributes Can be hard to find an anomaly using all attributes Noisy or irrelevant attributes Object is only anomalous with respect to some attributes However, an object may not be anomalous in any one attribute 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

General Issues: Anomaly Scoring Many anomaly detection techniques provide only a binary categorization An object is an anomaly or it isn t This is especially true of classification-based approaches Other approaches assign a score to each object/pont This score measures the degree to which an object is an anomaly This allows objects to be ranked In general, this is the preferable approach However, in the end, you often need a binary decision Should this credit card transaction be flagged? Still useful to have a score How many anomalies are there? 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

2. Model-Based Anomaly Detection Build a model for the data and see Unsupervised Anomalies are those points that don t fit well Anomalies are those points that distort the model Examples: Statistical distribution Clusters Regression Geometric Graph Supervised Anomalies are regarded as a rare class Need to have training data 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Additional Anomaly Detection Techniques Proximity-based Anomalies are points far away from other points Can detect this graphically in some cases Density-based Low density points are outliers Pattern matching Create profiles or templates of atypical but important events or objects Algorithms to detect these patterns are usually simple and efficient 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Model-based Statistical Approaches Fit a parametric model M to the data, capturing the distribution of the data (e.g., normal distribution) Apply a statistical test that depends on Data distribution Parameter of distribution (e.g., mean, variance) Number of expected outliers (confidence limit) Alternatively, rank points by their likelihood with respect to M Data Density Function 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Normal Distributions One-dimensional Gaussian 8 7 0.1 6 0.09 5 0.08 4 Two-dimensional Gaussian 0.07 3 0.06 2 0.05 1 y 0.04 0 0.03 -1 0.02 -2 -3 0.01 -4 probability density -5 -4 -3 -2 -1 0 1 2 3 4 5 x 16 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Statistical Approaches: GMM 1. Fit a model M to the dataset D; e.g. A Bivariate Gaussian Model A Bivariate Gaussian Mixture Model Mixture model Wikipedia by running the EM clustering algorithm; see:Expectation maximization algorithm - Wikipedia 2. Plug each point p into the density function dM of model M and compute dM(p) or preferably log(dM(p)), called the log likelihood of p, and add this value as in a new column ols ( outlier score ) to D obtaining D the smaller this value is the more likely p is an outlier with respect M. 3. Sort D in ascending order the first record is the record with the smallest value for log(dM(p)) 4. Remove the top k records from D GMM-Model 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

General Idea EM Algorithm EM Algorithm Gaussian Mixture Models: http://pypr.sourceforge.net/mog.html http://scikit-learn.org/stable/modules/mixture.html Works like K-means 18 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Parameter of K-Means/GMM Models K-means models are characterized by k centroids EM/Gaussian Mixture Models are characterized by k Gaussian with each Gaussian characterized by: Weight of the particular Gaussian Mean value Covariance Matrix EM-style algorithms: E-Step: Assign objects to clusters (deterministic in the case of K-means; probabilistic in the case of EM) M-Step: updates the model parameters (e.g. centroids in the case of K-means; the mixture parameters in the case of EM) Repeat sequences of E-M steps until there is some convergence Start with an initial assignment of objects to clusters 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Statistical/Model-based Approaches: 1. Fit a model M to the dataset D; e.g. A Bivariate Gaussian Model A Bivariate Gaussian Mixture Model by running the EM clustering algorithm; see:Expectation maximization algorithm - Wikipedia 2. Plug each point p into the density function dM of model M and compute dM(p) or preferably log(dM(p)), called the log likelihood of p, and add this value as in a new column ols ( outlier score ) to D obtaining D the smaller this value is the more likely p is an outlier with respect M. 3. Sort D in ascending order the first record is the record with the smallest value for log(dM(p)) 4. Perform the remaining tasks using D GMM-Model 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Density-based: LOF approach Not covered For each point, compute the density of its local neighborhood; e.g. use DBSCAN s approach In anomaly detection, the local outlier factor (LOF) is an algorithm proposed by Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng and J rg Sander in 2000 for finding anomalous data points by measuring the local deviation of a given data point with respect to its neighbours.[1] Outliers are points with largest LOF value (measured as point- density/neighbor densities) In the NN approach, p2 is not considered as outlier, while LOF approach find both p1 and p2 as outliers; moreover, some/all points in cluster C1 might be considered as outliers! p2 p1 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Relative Density Outlier Scores Not covered 6.85 6 C 5 4 1.40 D 3 1.33 2 A 1 Outlier Score 22 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Limitations of Model-based Approaches Most of the statistical tests are for a single attributes In many cases, data distribution/model may not be known For high dimensional data, it may be difficult to estimate the true density function. However, mixtures of Gaussians and conjunction with EM have been successfully used in practice for some outlier detection tasks that involve multi-variate data. However, very high dimensional data are still a challenge! As alternative to parametric density estimation, non-parametric density-based approaches, such as kernel density estimation have shown some promise; see: https://en.wikipedia.org/wiki/Kernel_density_estimation However, these approaches just provide you with estimated densities but not with a true density function density function; therefore, they are not truly model based, but rather just density- based approaches. 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Model-based vs Model-free Model-based Approaches Model can be parametric or non-parametric Anomalies are those points that don t fit well Anomalies are those points that distort the model Model-free Approaches Anomalies are identified directly from the data without building a model Often the underlying assumption is that the most of the points in the data are normal Introduction to Data Mining, 2nd Edition Tan, Steinbach, Karpatne, Kumar 4/12/2021 24 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

General Issues: Label vs Score Some anomaly detection techniques provide only a binary categorization Other approaches measure the degree to which an object is an anomaly This allows objects to be ranked Scores can also have associated meaning (e.g., statistical significance) Introduction to Data Mining, 2nd Edition Tan, Steinbach, Karpatne, Kumar 4/12/2021 25 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

3. One-Class SVM Approachfor Outlier Detection Consider a sphere with center a and radius R Minimize R and the error resulting from points outside the sphere their error is their distance to the sphere. Lowercase greek xiletter, pronounced ksi C + 2 t min R t subject to + 2 t t t x , 0 a R error More information: http://rvlasveld.github.io/blog/2013/07/12/introduction-to-one-class-support-vector-machines/ 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

One Class SVM with Kernel Functions Again kernel functions/mapping to a higher dimensional space can be employed in which case the class boundary shapes change as depicted. 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

4. Distance-based Approaches Approach: Compute the distance between every pair of data points There are various ways to define outliers: Data points for which there are fewer than p neighboring points within a distance r The top n data points whose distance to the kth nearest neighbor is greatest The top n data points whose average distance to the k nearest neighbors is greatest 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

One Nearest Neighbor - One Outlier D 2 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

One Nearest Neighbor - Two Outliers 0.55 D 0.5 0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Five Nearest Neighbors - Small Cluster 2 D 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Five Nearest Neighbors - Differing Density D 1.8 1.6 1.4 1.2 1 0.8 0.6 0.4 0.2 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Reconstruction-Error Based Approaches Based on assumptions there are patterns in the distribution of the normal class that can be captured using lower-dimensional representations Reduce data to lower dimensional data E.g. Use Principal Components Analysis (PCA) or Auto-encoders Measure the reconstruction error for each object The difference between original and reduced dimensionality version Covered in more detail in November 2023! Introduction to Data Mining, 2nd Edition Tan, Steinbach, Karpatne, Kumar 4/12/2021 33 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Reconstruction Error Let ? be the original data object Find the representation of the object in a lower dimensional space Project the object back to the original space Call this object ? Reconstruction Error(x x)= x x x x Objects with large reconstruction errors are anomalies Introduction to Data Mining, 2nd Edition Tan, Steinbach, Karpatne, Kumar 4/12/2021 34 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Basic Architecture of an Autoencoder An autoencoder is a multi-layer neural network The number of input and output neurons is equal to the number of original attributes. Introduction to Data Mining, 2nd Edition Tan, Steinbach, Karpatne, Kumar 4/12/2021 35 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Strengths and Weaknesses Does not require assumptions about distribution of normal class Can use many dimensionality reduction approaches The reconstruction error is computed in the original space This can be a problem if dimensionality is high Introduction to Data Mining, 2nd Edition Tan, Steinbach, Karpatne, Kumar 4/12/2021 36 10/10/2023 Eick, Tan,Steinbach,Kapatne, Kumar COSC 3337: Data Science I

Data Mining Anomaly/Outlier Detection

Download Presentation

Presentation Transcript

Related

More Related Content