Anomaly Detection Using Process Behavior Charts

Jeff LaMar
2023 Iowa & Nebraska SAS Users Group
May 22, 23
Topics Covered
Philosophy of Understanding Variation & Key Principles
Process Behavior Chart Overview
Why three-sigma limits
How to prepare data for charting in SAS
How to utilize SAS for Process Behavior Charts - example charts
Proc Shewhart SAS options
Creating output and input SAS data sets from charts
Run tests for special causes
Proc Shewhart code snippet cheat sheets
How to execute a plan for Anomaly Detection and Reporting
Processes with multiple phases
Philosophy of Understanding Variation & Key Principles
Notes: Much of this material can be attributed to Walter Shewhart. He is often known as the "father of statistical quality control" and introduced the concept that data can contain both signal and noise.
Walter A. Shewhart - Wikipedia
In addition, many discussion points were obtained from the book "Understanding Variation: The Key to Managing Chaos", 2nd Ed., by Donald J. Wheeler.
Concepts
Before information can be useful, it must be analyzed, interpreted, and assimilated. In short, raw data has to be digested before it can be useful.
No comparison between two values can be global. A simple comparison between the current figure and some previous value cannot fully capture and convey the behavior of any time series. Yet comparisons of the current value to a previous value are the most common type of comparison encountered.
See a snapshot from a typical Production Report on the next page:
Can you tell what’s going on here with just a glance?
On 7/2, shift 1 produced 2,701 good sticks and then produced 2,498 on 7/3.  Can you really make an accurate
assessment based on just those two data points?  Should shift 1 get “scolded” for their performance on 7/2?
Good?
Bad?
Concepts
It is very difficult to digest information in a table of numbers (see table above). Numbers that are not easily assimilated are generally hard to communicate to others.
Time series plots communicate the content of a data set more quickly and completely than tables of values do. See below:
Potential Issues:
Scale could show skewed results.
Could misinterpret ups and downs as exceptional variation.
Concepts
Walter Shewhart came up with two principles for understanding data.
Principle #1: No data have meaning apart from their context.
What does this mean?
Stop reporting comparisons between pairs of values except as part of a broader comparison.
Start using graphs to present current values in context.
No matter what the data look like, you must always use some method of analysis to come up with an interpretation of the data.
That is, you can't look at the data and say something like "well, this week appears to be worse than last week, don't just stand there, do something!" or "hey, the numbers look good this week, let's take everyone out for a pizza!"
This on-again, off-again approach is detrimental to continual improvement.
Concepts
Principle #1 (continued): No data have meaning apart from their context.
What does this mean?
One of the most dangerous things to do is to compare values to averages.
Whenever values are compared to averages, managers are pushed to explain why a value is, say, 5% below the average.
When comparing to an average, you will always find the current value to be either "above average" or "below average". Roughly speaking, you will be above average about half the time and below average about half the time. Did you know that HALF the doctors each year fall in the bottom 50% of their class?
So what should we do?
We analyze numbers to know when a change has occurred in our processes/systems.
However, the KEY POINT here is that numbers can change while the process does not (i.e., there is always some kind of variation in the data).
Some variation is routine, run-of-the-mill, and expected, even when the process is stable.
The key is to detect exceptional variation, which is outside the bounds of routine and can be interpreted as a process change.
Walter Shewhart solved the problem of separating noise from exceptional process variation (true anomalies) by using process behavior charts.
Concepts
Process Behavior Chart
How do we separate noise from exceptional variation?
The Process Behavior Chart begins with the data plotted as a time series.
A central line is added as a visual reference for detecting shifts/trends.
Upper and lower control limits are computed from the data.
These lines are placed symmetrically on either side of the central line.
The distance from the central line is 3 standard deviations.
This makes it possible to filter out the routine variation.
See example chart of call center data for reference.
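To make the arithmetic behind the central line and limits concrete, here is a minimal sketch that computes individuals-chart limits by hand. It assumes a hypothetical data set work.daily_metric with one row per period, already in time order, and a numeric column named value; neither name comes from the slides. It follows the usual XmR convention of estimating sigma from the average two-point moving range (average moving range divided by 1.128), so three sigma works out to roughly 2.66 times the average moving range.

```sas
/* Hypothetical input: work.daily_metric, one row per period, in time order, */
/* with a numeric column VALUE. Both names are assumptions for illustration. */
data work.moving_ranges;
    set work.daily_metric;
    prev_value = lag(value);                        /* previous point in the series */
    if _n_ > 1 then mr = abs(value - prev_value);   /* two-point moving range       */
run;

proc sql;
    create table work.xmr_limits as
    select mean(value)                          as center_line,
           mean(mr)                             as avg_moving_range,  /* missing first mr is ignored */
           mean(mr) / 1.128                     as sigma_hat,         /* sigma estimated from moving ranges */
           mean(value) + 3 * mean(mr) / 1.128   as ucl,
           mean(value) - 3 * mean(mr) / 1.128   as lcl
    from work.moving_ranges;
quit;
```

This only illustrates the calculation and assumes no missing values in the series; in practice Proc Shewhart computes the limits for you.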
Concepts
Principle #2: While every data set contains noise, some data sets contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise.
Signals of exceptional variation are indicated by points which fall outside the limits (or by obvious non-random patterns of variation around the central line).
This distinction between signals and noise is the foundation for every meaningful analysis of time series data.
There are two common mistakes people make when analyzing data:
Mistake 1: Interpreting routine variation as an issue/problem, i.e., interpreting noise as if it were a signal.
This mistake can lead to actions which are at best inappropriate and at worst contrary to the correct course of action. It leads to waste and loss and creates non-value activities and inefficiencies.
Mistake 2: Not recognizing when an issue/change has occurred in the process, i.e., failing to detect a signal when it is present.
This mistake happens a lot when you apply arbitrary "specifications" to the process. The process changes, but the values are still within some specification limits, so no one notices.
Concepts
Unless you make a distinction between signals and noise, you will remain unable to properly analyze and interpret data.
The 2nd Principle of Understanding Data shows why every effective data analysis begins by separating the potential signals from the random noise.
Process Behavior Charts are the simplest method ever invented to separate potential signals from probable noise.
Nobody tunes in and listens to static on the radio, so why should you try to gain insights by listening to, and trying to interpret, static?
Why Three-Sigma limits?
Three-sigma limits provide an economic balance between two mistakes:
Interpreting routine variation as an issue
Not recognizing exceptional variation within the process
Three-sigma limits have been shown empirically to work well in practice: they provide the sensitivity needed without causing an unacceptable number of false alarms.
The empirical rule also displays robustness: the underlying data do NOT have to be normally distributed.
Empirical Rule given a homogeneous set of data:
1. Roughly 60% to 75% of the data will be located within one standard deviation on either side of the mean
2. Usually 90% to 98% of the data will be located within two standard deviations on either side of the mean
3. Approximately 99% to 100% of the data will be located within three standard deviations on either side of the mean
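One way to see the empirical rule on your own data is to count how many points fall within one, two, and three standard deviations of the mean. The sketch below assumes a hypothetical data set work.daily_metric with a numeric column named value (both names are placeholders, not from the slides). It standardizes the column with Proc Standard and then relies on PROC SQL treating a true comparison as 1 and a false one as 0.

```sas
/* Hypothetical input: work.daily_metric with a numeric column VALUE (assumed names). */
/* Standardize to mean 0 and standard deviation 1.                                    */
proc standard data=work.daily_metric mean=0 std=1 out=work.standardized;
    var value;
run;

/* A comparison evaluates to 1 (true) or 0 (false), so its mean is a proportion. */
proc sql;
    select count(value)                 as n_points,
           100 * mean(abs(value) <= 1)  as pct_within_1_sigma,
           100 * mean(abs(value) <= 2)  as pct_within_2_sigma,
           100 * mean(abs(value) <= 3)  as pct_within_3_sigma
    from work.standardized
    where value is not missing;
quit;
```

Note that this uses the ordinary sample standard deviation of all the values, which is not the same sigma estimate a process behavior chart derives from moving ranges, so treat the output as a rough check of the rule rather than as chart limits.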
+/- one sigma limits across six different standardized distributions
Distribution    % Outside Limits
Uniform         42.3%
Triangle        37.1%
Normal          31.7%
Weibull         27.4%
Gamma           about 26%
Exponential     13.5%
+/- two sigma limits across six different standardized distributions
Distribution    % Outside Limits
Uniform         0.0%
Triangle        3.8%
Normal          4.5%
Weibull         4.8%
Gamma           4.7%
Exponential     5.0%
+/- three sigma limits across six different standardized distributions
[Figure: % outside limits for each standardized distribution]
Again, this demonstrates that three-sigma limits cover non-normal distributions and can provide effective action limits when applied to real-world data.
Notes:
Process Behavior Charts for Individual Values and a Moving Range (called an XmR chart)
Can also do a straight "X chart", which does not include the moving range
Charts are created in SAS with the Proc Shewhart procedure (detailed code provided for each chart)
SAS product needed for chart creation: SAS/QC
/* To find what SAS products are installed on your system */
proc product_status;
run;
You need to have this installed:
For Base SAS Software ...
For SAS/QC ...
Custom version information: 15.2 (Note: version not important)
Practical Visualizations of Process Behavior Charts - Examples
How to Prepare Data for charting:
Preparing data is fairly straightforward.
Create a data set that, at minimum, has two columns:
1. One column needs to be the x-axis time series variable (e.g., date, day, week, month, year, sequence number, timestamp). It just needs to be in time series order, and you can name it whatever you like.
2. The other column holds the values of the metric you want to plot on the y-axis (e.g., volume of applications, total widgets produced, table load time, defect rate).
The key is to summarize your metric at the x-axis level (you might have to summarize volumes, calculate the defect rate, etc.).
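As a sketch of that summarization step, the query below rolls shift-level rows up to one row per day. The input data set work.prodhist_shift and its Total_Sticks column are assumptions for illustration; Date, Good_Sticks, and Defect_pct follow the column names used in the example data set on the slides.

```sas
/* Hypothetical input: work.prodhist_shift, one row per Date and Shift, with */
/* numeric Total_Sticks and Good_Sticks columns (assumed names).             */
proc sql;
    create table work.prodhist_day_total as
    select Date,
           sum(Total_Sticks)                                 as Total_Sticks,
           sum(Good_Sticks)                                  as Good_Sticks,
           100 * (1 - sum(Good_Sticks) / sum(Total_Sticks))  as Defect_pct
    from work.prodhist_shift
    group by Date
    order by Date;   /* keep rows in time series order for charting */
quit;
```

The defect rate is computed from the daily totals rather than by averaging the shift-level percentages, so busy shifts carry more weight than quiet ones.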
How to Prepare Data for charting:
Example data set for "Good Drumstick Production Totals"
Note: The ONLY columns I really need are Date and Good_Sticks.
However, this data set will allow me to plot other metrics like Defect_pct, Down_time_pct, etc.
ods graphics on;
title 'Process Behavior Chart for';
title2 'Good Drum Stick Production';
proc shewhart data = prodhist_day_total;
   irchart Good_Sticks * Date /
      totpanels=1
      markers;
run;
Process running within "Natural Variation"
All data points are within the control limits
This looks better than the earlier plot!
The Moving Range (lower chart) shows the differences between consecutive data points
Individual Measures Chart for "Good Drum Stick" Production
Moving Range (mR) chart removed
Suppressing the mR chart is optional
Added odstitle (note: odstitle = none, written without quotes, removes any title within the chart)
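The code for this chart is not shown on the slide. A plausible version, assuming the same prodhist_day_total data set as the previous example, combines the nochart2 and odstitle= options that appear in the cheat sheet later in the deck; the title text used here is only a placeholder.

```sas
ods graphics on;
title 'Process Behavior Chart for';
title2 'Good Drum Stick Production';
proc shewhart data = prodhist_day_total;
   irchart Good_Sticks * Date /
      totpanels=1
      markers
      nochart2      /* suppress the moving range (mR) chart */
      odstitle='Individual Measures Chart for Good Drum Stick Production';  /* placeholder title */
run;
```

Replacing the quoted string with odstitle=none would drop the title inside the chart altogether.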
Defect Rate with data point outside control limits
ods graphics on;
title 'Process Behavior Chart for';
title2 'Drum Stick Defect Rate';
proc shewhart data = prodhist_day_total2;
   irchart defect_pct * Date /
      totpanels=1
      markers
      odstitle=none;
run;
Process showing exceptional variation
Moving range also shows exceptional variation
Chart options - Part 1
ods graphics on;
title 'Process Behavior Chart for';
title2 'Drum Stick Defect Rate';
proc shewhart data = prodhist_day_total2;
   irchart defect_pct * Date /
      totpanels=1
      markers
      odstitle=none
      zonelabels
      cout
      outfill
      tableall;
run;
Zonelabels - Adds labels A, B, C to zone lines
Cout - Colors markers and lines if data points are outside the process limits
Outfill - Shades areas between control limits and connected points if outside process limits
Tableall - Creates a basic table of the data being charted (see next page)
Chart options - Part 2
Table produced with the tableall option
Chart options - Part 3a
Creating Output Data Sets - Outlimits
Outlimits = <dataset> - Saves control limits and control limit parameters (1 row produced)
Note that the outlimits= option follows the "/" and is part of the option choices
The code below creates the outlimits data set:
ods graphics on;
title 'Process Behavior Chart for';
title2 'Drum Stick Defect Rate';
proc shewhart data = prodhist_day_total;
   irchart defect_pct * Date /
      totpanels=1
      outlimits = outlimits_data   /* creates the outlimits data set */
      markers
      odstitle='Defect Rate'
      zonelabels;
run;
Chart options - Part 3b
Reading in the data set created from the Outlimits option
In this example, the outlimits data set holds limits calculated from the defect rates of 12 historical days of "in control" data
These limits are now going to be applied to future results (LCL=5.82, mean=9.08, UCL=13.8)
Note that the limits= option follows the "data =" portion of the procedure statement and precedes the semicolon
ods graphics on;
title 'Process Behavior Chart for';
title2 'Good Drum Stick Production';
proc shewhart data = prodhist_day_total2
              limits = outlimits_data;   /* reads in the outlimits data set */
   irchart defect_pct * Date /
      totpanels=1
      markers
      odstitle='Defect Rate'
      zonelabels
      cout
      outfill;
run;
Chart options - Part 4
Creating Output Data Sets - Outtable
The second useful output data set choice
Outtable = <dataset> - Saves individual measurements, moving ranges, control limits and "exceeds limits" flags
The output data set will contain multiple rows (one row for each time series data point)
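As an illustration of how the outtable data set can be used, the sketch below writes it without drawing a chart and then lists only the flagged points. The data set names follow the earlier examples; the filter relies on the _EXLIM_ flag variable used in the example plan later in the deck, which is blank when a point is inside the limits.

```sas
/* Write the outtable data set only; nochart suppresses the charts. */
proc shewhart data = prodhist_day_total2;
   irchart defect_pct * Date /
      nochart
      outtable = outtable_data;
run;

/* List only the points flagged as outside the control limits. */
proc print data = outtable_data;
   where _EXLIM_ ne ' ';
run;
```

If no points fall outside the limits, the Proc Print step simply reports that no observations were selected.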
Perform tests for special causes based on non-random run patterns
These tests detect particular nonrandom patterns in the points plotted on the chart.
The tests can provide greater sensitivity and useful diagnostic information while incurring a reasonable probability of a false signal.
You can request any combination of the eight tests by specifying the test indexes with the TESTS= option.
1. TEST2RUN=run-length specifies the length of the pattern for Test 2. The run-length values allowed are 7, 8, 9, 11, 14, or 20. The default run-length is 9.
2. TEST3RUN=run-length specifies the length of the pattern for Test 3. The run-length values allowed are 6, 7, and 8. The default run-length is 6.
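To show what a run test is looking for, here is a hand-rolled sketch of the idea behind Test 3 at its default run length of 6: flag a point when it is the sixth (or later) point in a row that is steadily increasing or steadily decreasing. It assumes the prodhist_day_total3 data set used in the next example is sorted by Date and has no missing defect_pct values. It is only an illustration and may not match Proc Shewhart's flags in every edge case.

```sas
data work.trend_check;
    set prodhist_day_total3;           /* assumed sorted by Date, no missing values */
    retain up_run down_run 1;          /* points in the current rising / falling run */
    prev = lag(defect_pct);            /* previous point in the series */
    if _n_ = 1 then do;
        up_run = 1;
        down_run = 1;
    end;
    else do;
        if defect_pct > prev then up_run = up_run + 1;
        else up_run = 1;
        if defect_pct < prev then down_run = down_run + 1;
        else down_run = 1;
    end;
    /* 1 when six or more consecutive points move steadily in one direction */
    trend_signal = (up_run >= 6 or down_run >= 6);
    drop prev;
run;
```

In day-to-day use the TESTS= option does this work for you and marks the signals directly on the chart.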
Run tests for special causes
ods graphics on;
title 'Process Behavior Chart for';
title2 'Drum Stick Defect Rate';
proc shewhart data = prodhist_day_total3;
   irchart defect_pct * Date /
      tests=1 to 4
      totpanels=1
      markers
      odstitle='Defect Rate with Run Tests'
      zonelabels
      cout
      outfill;
run;
Tests = 1 to 4 - Specifies the run tests to apply to the data
Test 3 signaled with 6 data points in a row steadily increasing
Test 1 signaled with a data point outside of the control limits
Proc Shewhart Code Snippet Cheat Sheet - All options discussed
*** Use as SPC template;
%let spc_dataset = prodhist_day_total;
%let metric = Defect_Pct;
%let x_axis = Date;
ods graphics on;
title 'Process Behavior Chart for';
title2 'Good Drum Stick Production';
/* Keep only the options you need: nochart suppresses all charts, and only one odstitle= setting should be used */
proc shewhart data = &spc_dataset. /* limits = outlimits_data */ ;
   irchart &metric. * &x_axis. /
      nochart                       /* Does not show either the Individual Measures or Moving Range charts */
      nochart2                      /* Does not show the secondary Moving Range chart */
      tableall                      /* Outputs data table to screen */
      totpanels=1                   /* Outputs chart on one graph */
      outlimits = outlimits_data    /* Outputs limits (1 row) that can be read in later */
      outtable = outtable_data      /* Outputs data table data set */
      markers                       /* Turn on markers - default = circle */
      odstitle='Defect Rate'        /* Add title inside the chart */
      odstitle=none                 /* Turn off the automatic title */
      zonelabels                    /* Add A, B, C zone lines */
      cout                          /* Colors markers and lines if data points are outside the process limits */
      outfill                       /* Shades areas between control limits and connected points if outside process limits */
      tests=1 to 4                  /* Specifies run tests (8 of them); could say tests=1,3,8 to specify certain combinations */
      test2run=7                    /* Run length for Test 2 - values allowed are 7, 8, 9, 11, 14, or 20 (default is 9) */
      test3run=8                    /* Run length for Test 3 - values allowed are 6, 7, or 8 (default is 6) */
   ;
run;
Proc Shewhart Code Snippet Cheat Sheet
Minimal options but a great start
*** Use as SPC template;
%let spc_dataset = prodhist_day_total;
%let metric = Defect_Pct;
%let x_axis = Date;
ods graphics on;
title 'Process Behavior Chart for';
title2 'Good Drum Stick Production';
proc shewhart data = &spc_dataset.;
   irchart &metric. * &x_axis. /
      totpanels=1
      markers
      cout
      outfill;
run;
Note:
You really don't need the macro variables.
You can insert what you want straight into the procedure.
How to Execute a plan for Anomaly Detection and reporting
Example plan - pg. 1
************************************************************************************************;
*** Create a dataset with a specific "by variable" and a date (or any other time series variable);
*** Shift is the "by variable" in this case;
************************************************************************************************;
Example screen shot:
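The screen shot is not reproduced in this text. A data set of that shape could be built along the following lines; the input data set work.prodhist_detail and its Total_Sticks and Good_Sticks columns are assumptions for illustration, while prodhist_total3, Shift, Date, and defect_pct match the names used in the rest of the plan.

```sas
/* Hypothetical input: work.prodhist_detail, one row per Date, Shift, and Machine */
/* (no pre-computed "Total" rows), with numeric Total_Sticks and Good_Sticks.     */
proc sql;
    create table prodhist_total3 as
    select Shift,
           Date,
           100 * (1 - sum(Good_Sticks) / sum(Total_Sticks)) as defect_pct
    from work.prodhist_detail
    group by Shift, Date
    order by Shift, Date;   /* BY-group processing expects rows grouped by Shift */
quit;
```

Sorting by Shift and then Date keeps each shift's series in time order, which is what the by-variable run of Proc Shewhart on the next page relies on.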
Example plan - pg. 2
************************************************************************************************;
*** Run the Proc Shewhart with the "by variable" and produce output table for later processing;
*** Note: No charts produced at this step, just the output table;
************************************************************************************************;
proc shewhart data = prodhist_total3;
   by shift;
   irchart defect_pct * Date /
      outtable = prodhist_out
      nochart;
run;
Example plan - pg. 3
************************************************************************************************;
*** Identify records with upper limit violations (in latest day) and then create a new dataset;
************************************************************************************************;
data upper_violations;
   set prodhist_out;
   by shift;
   if last.shift and _EXLIM_ = 'UPPER';
run;
************************************************************************************************;
*** Create dataset that has only those shifts (by-variable values) with upper limit violations;
************************************************************************************************;
proc sql;
   create table prodhist_upper as
   select a.*
   from prodhist_total3 a
   join upper_violations u
     on a.shift = u.shift
   order by a.shift, a.date;
quit;
Example plan - pg. 4
************************************************************************************************;
*** Create SPC charts for the shifts that hit the upper limit in the latest day;
************************************************************************************************;
ods graphics on;
proc shewhart data = prodhist_upper;
   by shift;
   irchart defect_pct * Date /
      totpanels=1
      markers
      zonelabels
      cout
      outfill;
run;
Processes with multiple phases
************************************************************************************************;
*** After plotting a chart, you identified a shift in defect percent;
*** In this example, the Engineer for the Grinder Machine made process improvements on 7/13;
************************************************************************************************;
[Chart annotation: Machine Improvement]
************************************************************************************************;
*** Modify input data set to include a phase variable due to the machine improvement;
*** NOTE: Column Needs to be named _PHASE_;
************************************************************************************************;
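The slide shows this step as a screen shot rather than code. A minimal sketch of adding the phase column might look like the data step below. The input data set work.prodhist_daily, the phase labels, the year 2022 (taken from the production report dates earlier in the deck), and the choice to start the second phase on 7/13 itself are all assumptions; only the column name _PHASE_ and the output name prodhist_total4 come from the slides.

```sas
/* Hypothetical input: work.prodhist_daily, one row per Date (a SAS date value) */
/* with defect_pct. The labels are chosen so they sort in time order.           */
data prodhist_total4;
    set work.prodhist_daily;
    length _PHASE_ $ 16;                       /* _PHASE_ is a character variable */
    if Date < '13JUL2022'd then _PHASE_ = 'Phase 1';
    else _PHASE_ = 'Phase 2';                  /* on or after the machine improvement */
run;
```

Labels that sort in time order matter here because the next step runs Proc Shewhart with by _PHASE_, which expects the rows to be in ascending order of the BY variable.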
************************************************************************************************;
*** Run proc shewhart to get limits by phase;
*** Need to rename _PHASE_ to _INDEX_ in the output dataset (SAS reads in phase limits by _INDEX_);
*** Note: Objective here is to create an outlimits table for each of the phases;
************************************************************************************************;
proc shewhart data = prodhist_total4;
   by _PHASE_;
   irchart defect_pct * Date /
      nochart
      outlimits = phase_limits (rename=(_PHASE_=_INDEX_));
run;
************************************************************************************************;
*** Use the phase limits dataset for input and run the proc shewhart chart
*** to create chart showing phases with distinct changes in control limits;
************************************************************************************************;
ods graphics on;
title 'Process Behavior Chart for';
title2 'Drum Stick Defect Rate';
proc shewhart data = prodhist_total4
              limits = phase_limits;
   irchart defect_pct * Date /
      totpanels=1
      markers
      odstitle='Phases Shown in Chart'
      readphase = all     /* Reads all the phases from the input data set */
      readindex = all     /* Reads all the control limits from the LIMITS= data set */
      phaselegend         /* Displays a legend with the phase values */
   ;
run;
Resources
Wheeler, Donald J. "Understanding Variation: The Key to Managing Chaos", 2nd Ed. Knoxville, TN, SPC Press, 2000
Wheeler, Donald J. "Advanced Topics in Statistical Process Control: The Power of Shewhart's Charts". Knoxville, TN, SPC Press, 1995
Wheeler, Donald J. "Understanding Statistical Process Control", 2nd Ed. Knoxville, TN, SPC Press, 1992
SAS/QC Documentation (see Chap 19. The SHEWHART Procedure) - SAS/QC 15.1 User's Guide
Questions?
Contact Info:
Jeff LaMar
Jeffrey.c.lamar@wellsfargo.com

Explore the philosophy behind understanding variation and key principles in anomaly detection using process behavior charts. Learn about preparing data, creating SAS charts, running tests for special causes, and executing a plan for anomaly detection and reporting processes. The material draws heavily from Walter Shewhart's work and emphasizes the importance of analyzing and interpreting data for effective decision-making.

  • Anomaly Detection
  • Process Behavior Charts
  • SAS
  • Understanding Variation
  • Statistical Quality Control

Uploaded on Oct 01, 2024 | 0 Views



Presentation Transcript



  5. Philosophy of Understanding Variation & Key Principles
  Production History - LaMar Drum Sticks
  Time Base: Shift, All Shifts
  Part: 5A Nylon Tip
  Start Date: 07/01/2022  End Date: 07/03/2022

  Date      Shift    Part Number   Machine    Total Sticks  Good Sticks  Good Stick %  Defect %  Up time %  Down time %
  7/1/2022  Shift 1  5A Nylon Tip  Grinder 1  1,267         1,155        91.2          8.8       77.2       22.8
  7/1/2022  Shift 1  5A Nylon Tip  Grinder 2  1,291         1,148        88.9          11.1      85.8       14.2
  7/1/2022  Shift 1  5A Nylon Tip  Total      2,558         2,303        90.0          10.0      81.5       18.5
  7/1/2022  Shift 2  5A Nylon Tip  Grinder 1  843           769          91.2          8.8       65.0       35.0
  7/1/2022  Shift 2  5A Nylon Tip  Grinder 2  1,460         1,391        95.3          4.7       94.2       5.8
  7/1/2022  Shift 2  5A Nylon Tip  Total      2,303         2,160        93.8          6.2       79.6       20.4
  7/1/2022  Shift 3  5A Nylon Tip  Grinder 1  811           759          93.6          6.4       63.4       36.6
  7/1/2022  Shift 3  5A Nylon Tip  Grinder 2  1,101         960          87.0          13.0      77.0       23.0
  7/1/2022  Shift 3  5A Nylon Tip  Total      1,912         1,719        89.9          10.1      70.2       29.8
  7/2/2022  Shift 1  5A Nylon Tip  Grinder 1  1,273         1,201        94.3          5.7       78.3       21.7
  7/2/2022  Shift 1  5A Nylon Tip  Grinder 2  1,580         1,500        94.9          5.1       87.6       12.4
  7/2/2022  Shift 1  5A Nylon Tip  Total      2,853         2,701        94.7          5.3       83.0       17.1
  7/2/2022  Shift 2  5A Nylon Tip  Grinder 1  251           208          82.9          17.1      43.3       56.7
  7/2/2022  Shift 2  5A Nylon Tip  Grinder 2  570           513          90.0          10.0      55.1       44.9
  7/2/2022  Shift 2  5A Nylon Tip  Total      821           721          87.8          12.2      49.2       50.8
  7/2/2022  Shift 3  5A Nylon Tip  Grinder 1  1,114         988          88.7          11.3      69.9       30.1
  7/2/2022  Shift 3  5A Nylon Tip  Grinder 2  1,430         1,332        93.1          6.9       87.7       12.3
  7/2/2022  Shift 3  5A Nylon Tip  Total      2,544         2,320        91.2          8.8       78.8       21.2
  7/3/2022  Shift 1  5A Nylon Tip  Grinder 1  1,351         1,252        92.7          7.3       79.1       20.9
  7/3/2022  Shift 1  5A Nylon Tip  Grinder 2  1,348         1,246        92.2          7.8       87.9       12.1
  7/3/2022  Shift 1  5A Nylon Tip  Total      2,699         2,498        92.6          7.4       83.5       16.5

  Good? Bad?
  Can you tell what's going on here with just a glance? On 7/2, shift 1 produced 2,701 good sticks and then produced 2,498 on 7/3. Can you really make an accurate assessment based on just those two data points?
  Should shift 1 get "scolded" for their performance on 7/2?

  6. Philosophy of Understanding Variation & Key Principles
  Concepts
  Good Drumsticks Production Totals

  Day         Totals
  7/1/2022    6,182
  7/2/2022    5,742
  7/3/2022    6,419
  7/4/2022    6,200
  7/5/2022    6,393
  7/6/2022    6,608
  7/7/2022    6,684
  7/8/2022    6,015
  7/9/2022    6,167
  7/10/2022   6,002
  7/11/2022   6,079
  7/12/2022   6,354

  It is very difficult to digest information in a table of numbers (see table above). Numbers that are not easily assimilated are generally hard to communicate to others.
  Time series plots communicate the content of a data set more quickly and completely than tables of values do. See below:
  Potential Issues: Scale could show skewed results. Could misinterpret ups and downs as exceptional variation.

  7. Philosophy of Understanding Variation & Key Principals Concepts Walter Shewhart came up with two principles for understanding data: Principle # 1 No data have meaning apart from their context What does this mean? Stop reporting comparisons between pairs of values except as part of a broader comparison Start using graphs to present current values in context No matter what the data looks like, you must always use some method of analysis to come up with an interpretation of the data i.e. you can t look at the data and say something like well, this week appears to be worse than last week, don t just stand there, do something! or hey, the numbers look good this week, let s take everyone out for a pizza! This off and on-again approach is detrimental to continual improvement 7

  8. Philosophy of Understanding Variation & Key Principals Concepts Principle # 1 No data have meaning apart from their context What does this mean? One of the most dangerous things to do is a comparison to averages Whenever values are compared to averages, it causes difficulties for managers to come up with explanations of why a value say is like 5% below the average When comparing to averages, you will find the current value to be either above average or below average . Basically you will be above average half the time and below average half the time. Did you know that HALF the doctors each year fall in bottom 50% of their class! So what should we do? Basically, we analyze numbers to know when a change has occurred in our processes/systems. However, the KEY POINT here is that numbers can change while the process does not (i.e. there is always some kind of variation in the data) Some variation is routine, run-of-the-mill and is expected, even when the process is stable The key is to detect exceptional variation, which is outside of the bounds of routine and can be interpreted as a process change Walter Shewhart solved the problem of separating out noise from exceptional process variation (true anomalies) by using process behavior charts 8

  9. Philosophy of Understanding Variation & Key Principals Concepts Process Behavior Chart Signal How do separate out noise from exceptional variation The Process Behavior Chart begins with the data plotted in a time Series A Central line is added as a visual reference for detecting shifts/trends Upper and Lower Control limits are computed from the data These lines are placed symmetrically on either side of the central line The distance from center line is 3 standard deviations This allows the ability to filter out the routine variation See example chart of call center data for reference 9

  10. Philosophy of Understanding Variation & Key Principals Concepts Principle # 2 While every data set contains noise, some data sets contain signals. Therefore, before you can detect a signal within any given data set, You must first filter out the noise Signals of exceptional variation are indicated by points which fall outside the limits (or by obvious non-random patterns of variation around the central line) This distinction between signals and noise is the foundation for every meaningful analysis of time series data There are two common mistakes people make when analyzing data Mistake 1: Interpreting routine variation as an issue/problem i.e., Interpreting noise as if it were a signal This mistake can lead to actions which at best inappropriate or worse, contrary to the correct course of action. This mistake leads to waste and loss and creates non-value activities and inefficiencies. Mistake 2: Not recognizing when an issue/change has occurred in the process i.e., Failing to detect a signal when it is present This mistake happens a lot when you apply arbitrary specifications to the process. The process changes but the values are still within some specification limits so no one notices 10

  11. Philosophy of Understanding Variation & Key Principles: Concepts
     Principle #2: While every data set contains noise, some data sets contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise.
     - Unless you make a distinction between signals and noise, you will remain unable to properly analyze and interpret data.
     - The second principle of understanding data shows why every effective data analysis begins by separating the potential signals from the random noise.
     - Process behavior charts are the simplest method ever invented to separate potential signals from probable noise.
     - Nobody tunes in and listens to static on the radio, so why should you try to gain insights by listening to, and trying to interpret, static?

  12. Philosophy of Understanding Variation & Key Principles: Why Three-Sigma Limits?
     Empirical Rule, given a homogeneous set of data:
     1. Roughly 60% to 75% of the data will be located within a distance of one standard deviation on either side of the mean.
     2. Usually 90% to 98% of the data will be located within a distance of two standard deviations on either side of the mean.
     3. Approximately 99% to 100% of the data will be located within a distance of three standard deviations on either side of the mean.
     Three-sigma limits provide an economic balance between:
     - Interpreting routine variation as an issue
     - Not recognizing an exceptional variation within the process
     Three-sigma limits have been empirically proven to work well in practice: they provide the sensitivity needed without causing an unacceptable number of false alarms.
     The empirical rule is also robust: the underlying data does NOT have to be normally distributed.

  13. Philosophy of Understanding Variation & Key Principles
     Standardized Distributions: +/- one sigma limits across six different distributions
     Distribution    % Outside Limits
     Uniform         42.3%
     Triangle        37.1%
     Normal          31.7%
     Weibull         27.4%
     Gamma           approx. 26%
     Exponential     13.5%

  14. Philosophy of Understanding Variation & Key Principles
     Standardized Distributions: +/- two sigma limits across six different distributions
     Distribution    % Outside Limits
     Uniform         0.0%
     Triangle        3.8%
     Normal          4.5%
     Weibull         4.8%
     Gamma           4.7%
     Exponential     5.0%

  15. Philosophy of Understanding Variation & Key Principles
     Standardized Distributions: +/- three sigma limits across six different distributions
     Distribution    % Outside Limits
     Uniform         0.0%
     Triangle        0.0%
     Normal          0.3%
     Weibull         0.9%
     Gamma           1.4%
     Exponential     1.8%
     Again, this demonstrates that three-sigma limits cover non-normal distributions and can provide effective action limits when applied to real-world data.
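
     Three of the rows in the tables above have simple closed forms, so they can be checked with a few lines of SAS. The sketch below is an illustration rather than part of the original deck: it assumes standardized distributions, with the Exponential taken to have mean 1 and standard deviation 1 and the Uniform taken to span the mean plus or minus sqrt(3) standard deviations. The data set and variable names are hypothetical. The Triangle, Weibull and Gamma rows depend on shape choices the slides do not state, so they are left out.

```sas
/* Sketch only: percent of a distribution falling outside mean +/- k sigma. */
/* ASSUMPTION: standardized Normal, Uniform and Exponential distributions.  */
data work.tail_check;
    do k = 1 to 3;
        /* Normal: both tails beyond k standard deviations */
        normal_pct  = 100 * 2 * (1 - probnorm(k));
        /* Uniform: half-width is sqrt(3) sigma, so nothing lies outside once k exceeds it */
        uniform_pct = 100 * max(0, 1 - k / sqrt(3));
        /* Exponential (mean 1, sigma 1): only the upper tail beyond 1 + k is possible */
        expon_pct   = 100 * exp(-(1 + k));
        output;
    end;
run;

proc print data=work.tail_check noobs;
    format normal_pct uniform_pct expon_pct 6.2;
run;
```

     The printed percentages should come out close to the Normal, Uniform and Exponential rows shown for one, two and three sigma.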

  16. Practical Visualizations of Process Behavior Charts - Examples
     Notes:
     - The examples are Process Behavior Charts for Individual Values and a Moving Range (called an XmR chart).
     - You can also produce a plain X chart, which does not include the moving range.
     - The charts are created in SAS with the Proc Shewhart procedure (detailed code is provided for each chart).
     - SAS product needed for chart creation: SAS/QC.
     /* To find what SAS products are installed on your system */
     proc product_status;
     run;
     You need to have SAS/QC listed in the log output:
     For Base SAS Software ...
     For SAS/QC ...
        Custom version information: 15.2
     (Note: the version is not important.)

  17. Practical Visualizations of Process Behavior Charts - Examples
     How to Prepare Data for Charting:
     Preparing data is fairly straightforward. Create a data set that, at minimum, has two columns:
     1. One column is the x-axis time series value (e.g., date, day, week, month, year, number, timestamp). It just needs to be in time series order, and you can name it whatever you like.
     2. The other column holds the values of the metric you want to plot on the y-axis (e.g., volume of applications, total widgets produced, table load time, defect rate).
     The key is to summarize your metric at the x-axis level (you might have to summarize volumes, calculate the defect rate, etc.).
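
     As a sketch of the summarization step, the query below rolls detail rows up to one row per date. Everything about the input is an assumption made for illustration: a hypothetical detail table work.prod_detail with columns Date, Good_Sticks and Defect_Sticks (for example, one row per shift or per production run), and a defect rate defined as defective sticks as a percentage of all sticks. Adjust the names and the rate definition to your own data.

```sas
/* Sketch only: summarize detail rows to one row per Date for charting.      */
/* ASSUMPTION: work.prod_detail, Good_Sticks and Defect_Sticks are           */
/* hypothetical names; the defect rate definition is also an assumption.     */
proc sql;
    create table work.prodhist_by_day as
    select Date,
           sum(Good_Sticks)   as Good_Sticks,
           sum(Defect_Sticks) as Defect_Sticks,
           /* Defective sticks as a percentage of all sticks produced that day */
           100 * sum(Defect_Sticks)
               / (sum(Good_Sticks) + sum(Defect_Sticks)) as Defect_pct
    from work.prod_detail
    group by Date
    order by Date;   /* keep the rows in time series order */
quit;
```

     The result has the two required columns (Date plus a metric) and can be passed to the charting procedure as is.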

  18. Practical Visualizations of Process Behavior Charts - Examples
     How to Prepare Data for Charting:
     Example data set for Good Drumstick Production Totals.
     Note: The ONLY columns I really need are Date and Good_Sticks. However, this data set will also allow me to plot other metrics such as Defect_pct, Down_time_pct, etc.

  19. Practical Visualizations of Process Behavior Charts - Examples
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Good Drum Stick Production';
     proc shewhart data = prodhist_day_total;
       irchart Good_Sticks * Date / totpanels=1 markers;
     run;
     Chart notes:
     - The process is running within natural variation: all data points are within the control limits.
     - This looks better than the earlier plot!
     - The Moving Range (lower chart) shows the differences between consecutive data points.

  20. Practical Visualizations of Process Behavior Charts - Examples
     Individual Measures Chart for Good Drum Stick Production (Moving Range Chart removed)
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Good Drum Stick Production';
     proc shewhart data = prodhist_day_total;
       irchart Good_Sticks * Date / nochart2   /* Removes the Moving Range chart */
                                    totpanels=1
                                    markers
                                    odstitle='Individuals Measures Chart';
     run;
     Chart notes:
     - The Moving Range (mR) chart has been removed. Suppressing the mR chart is optional.
     - odstitle was added. Note: odstitle=none (without quotes) removes any title within the chart.

  21. Practical Visualizations of Process Behavior Charts - Examples
     Defect Rate with a data point outside the control limits
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Drum Stick Defect Rate';
     proc shewhart data = prodhist_day_total2;
       irchart defect_pct * Date / totpanels=1 markers odstitle=none;
     run;
     Chart notes:
     - The process is showing exceptional variation.
     - The moving range also shows exceptional variation.

  22. Practical Visualizations of Process Behavior Charts - Examples
     Chart options, Part 1
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Drum Stick Defect Rate';
     proc shewhart data = prodhist_day_total2;
       irchart defect_pct * Date / totpanels=1 markers odstitle=none
                                   zonelabels cout outfill tableall;
     run;
     Options:
     - Zonelabels: adds labels A, B, C to the zone lines
     - Cout: colors markers and lines if data points are outside the process limits
     - Outfill: shades the areas between the control limits and the connected points if they are outside the process limits
     - Tableall: creates a basic table of the data being charted (see next page)

  23. Practical Visualizations of Process Behavior Charts - Examples
     Chart options, Part 2
     Table produced with the Tableall option.

  24. Practical Visualizations of Process Behavior Charts - Examples
     Chart options, Part 3a: Creating Output Data Sets - Outlimits
     Outlimits = <dataset> saves the control limits and control limit parameters (1 row produced).
     Note: the outlimits= option follows the / and is one of the chart options.
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Drum Stick Defect Rate';
     proc shewhart data = prodhist_day_total;
       irchart defect_pct * Date / totpanels=1
                                   outlimits = outlimits_data   /* Creates outlimits data set */
                                   markers
                                   odstitle='Defect Rate'
                                   zonelabels;
     run;
     SAS data set results for the code above (one row, shown here as variable and value):
     _VAR_       Defect_pct
     _SUBGRP_    Date
     _TYPE_      ESTIMATE
     _LIMITN_    2
     _ALPHA_     0.002699796
     _SIGMAS_    3
     _LCLI_      4.337714253
     _MEAN_      9.075
     _UCLI_      13.81228575
     _LCLR_      0
     _R_         1.781818182
     _UCLR_      5.820365965
     _STDDEV_    1.579095249

  25. Practical Visualizations of Process Behavior Charts - Examples
     Chart options, Part 3b: Reading in the data set created by the Outlimits option
     - In this example, the outlimits data set holds limits calculated from the defect rates of 12 historical days of in-control data.
     - These limits are now applied to future results (LCL=4.34, mean=9.08, UCL=13.81).
     - Note: the limits= option follows the data= portion of the procedure statement and precedes the semicolon.
     Original outlimits output (one row, shown here as variable and value):
     _VAR_       Defect_pct
     _SUBGRP_    Date
     _TYPE_      ESTIMATE
     _LIMITN_    2
     _ALPHA_     0.002699796
     _SIGMAS_    3
     _LCLI_      4.337714253
     _MEAN_      9.075
     _UCLI_      13.81228575
     _LCLR_      0
     _R_         1.781818182
     _UCLR_      5.820365965
     _STDDEV_    1.579095249
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Drum Stick Defect Rate';
     proc shewhart data = prodhist_day_total2
                   limits = outlimits_data;   /* Reads in outlimits data set */
       irchart defect_pct * Date / totpanels=1 markers odstitle='Defect Rate'
                                   zonelabels cout outfill;
     run;

  26. Practical Visualizations of Process Behavior Charts - Examples
     Chart options, Part 4: Creating Output Data Sets - Outtable
     - Outtable is the second useful output data set choice.
     - Outtable = <dataset> saves the individual measurements, moving ranges, control limits, and exceeds-limits flags.
     - The output data set contains multiple rows (one row for each time series data point).
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Drum Stick Defect Rate';
     proc shewhart data = prodhist_day_total2;
       irchart defect_pct * Date / outtable = outtable_data   /* Creates outtable data set */
                                   totpanels=1
                                   markers
                                   odstitle='Defect Rate';
     run;
     SAS results below:
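
     Once the outtable data set exists, the flagged points can be listed directly. The short sketch below assumes the outtable_data data set created above and relies on the _EXLIM_ flag holding 'UPPER' or 'LOWER' for points beyond the limits on the individuals chart, as it is used later in this deck.

```sas
/* Sketch only: list the rows flagged as outside the individuals chart limits. */
/* ASSUMPTION: outtable_data was created by the OUTTABLE= option above.        */
proc print data=outtable_data noobs;
    where _EXLIM_ in ('UPPER', 'LOWER');
run;
```

     An empty listing simply means no point fell outside the limits for the data charted.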

  27. Practical Visualizations of Process Behavior Charts - Examples
     Perform tests for special causes based on non-random run patterns
     - These tests detect particular non-random patterns in the points plotted on the chart.
     - The tests can provide greater sensitivity and useful diagnostic information while incurring a reasonable probability of a false signal.
     - You can request any combination of the eight tests by specifying the test indexes with the TESTS= option.
     Test Index   Pattern Description
     1            One point beyond Zone A (outside the control limits)
     2            Nine points in a row in Zone C or beyond on one side of the central line (see Note 1)
     3            Six points in a row steadily increasing or steadily decreasing (see Note 2)
     4            Fourteen points in a row alternating up and down
     5            Two out of three points in a row in Zone A or beyond
     6            Four out of five points in a row in Zone B or beyond
     7            Fifteen points in a row in Zone C on either or both sides of the central line
     8            Eight points in a row on either or both sides of the central line with no points in Zone C
     Note 1: TEST2RUN=run-length specifies the length of the pattern for Test 2. The run-length values allowed are 7, 8, 9, 11, 14, or 20. The default run-length is 9.
     Note 2: TEST3RUN=run-length specifies the length of the pattern for Test 3. The run-length values allowed are 6, 7, and 8. The default run-length is 6.

  28. Practical Visualizations of Process Behavior Charts - Examples
     Run tests for special causes
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Drum Stick Defect Rate';
     proc shewhart data = prodhist_day_total3;
       irchart defect_pct * Date / tests=1 to 4   /* Specifies the run tests to apply to the data */
                                   totpanels=1
                                   markers
                                   odstitle='Defect Rate with Run Tests'
                                   zonelabels cout outfill;
     run;
     Chart notes:
     - Test 3 signaled with 6 data points in a row steadily increasing.
     - Test 1 signaled with a data point outside of the control limits.

  29. Proc Shewhart Code Snippet Cheat Sheet: All options discussed
     Keep only the options you need. Some of them cancel each other out (nochart suppresses the charts that the display options decorate, and only one odstitle= form should be used).
     *** Use as SPC template;
     %let spc_dataset = prodhist_day_total;
     %let metric = Defect_Pct;
     %let x_axis = Date;
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Good Drum Stick Production';
     proc shewhart data = &spc_dataset. /* limits = outlimits_data */;
       irchart &metric. * &x_axis. /
         nochart                       /* Does not show either the Individuals Measures or the Moving Range chart */
         nochart2                      /* Does not show the secondary Moving Range chart */
         tableall                      /* Outputs data table to screen */
         totpanels=1                   /* Outputs chart on one graph */
         outlimits = outlimits_data    /* Outputs limits (1 row) that can be read in later */
         outtable = outtable_data      /* Outputs data table data set */
         markers                       /* Turns on markers - default = circle */
         odstitle='Defect Rate'        /* Adds title inside the chart */
         odstitle=none                 /* Turns off the automatic title */
         zonelabels                    /* Adds A, B, C zone lines */
         cout                          /* Colors markers and lines if data points are outside the process limits */
         outfill                       /* Shades areas between control limits and connected points if outside process limits */
         tests=1 to 4                  /* Specifies run tests (8 of them); could say tests=1,3,8 to specify certain combinations */
         test2run=7                    /* Specifies run length pattern for Test 2 - values allowed are 7, 8, 9, 11, 14, or 20 (default is 9) */
         test3run=7                    /* Specifies run length pattern for Test 3 - values allowed are 6, 7, or 8 (default is 6) */
       ;
     run;

  30. Proc Shewhart Code Snippet Cheat Sheet: Minimal options but a great start
     *** Use as SPC template;
     %let spc_dataset = prodhist_day_total;
     %let metric = Defect_Pct;
     %let x_axis = Date;
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Good Drum Stick Production';
     proc shewhart data = &spc_dataset.;
       irchart &metric. * &x_axis. / totpanels=1 markers cout outfill;
     run;
     Note: You really don't need the macro variables. You can insert what you want straight into the procedure.

  31. How to Execute a Plan for Anomaly Detection and Reporting - Example plan pg. 1
     ************************************************************************************************;
     *** Create a dataset with a specific "by variable" and a date (or any other time series variable);
     *** Shift is the by variable in this case;
     ************************************************************************************************;
     Example screen shot:
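
     BY-group processing in SAS expects the input to be ordered by the BY variable, so a sort is a reasonable first step when building this data set. The sketch below is an assumption about that preparation, not the presenter's code: it takes the prodhist_total3 data set named in the next step as already containing Shift, Date and defect_pct, and orders it by shift and then by date.

```sas
/* Sketch only: order the data for BY-group processing.                    */
/* ASSUMPTION: prodhist_total3 already holds Shift, Date and defect_pct.   */
proc sort data=prodhist_total3;
    by shift Date;   /* one time-ordered series per shift */
run;
```

     Sorting by Date within shift also keeps each shift's points in time series order on its chart.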

  32. How to Execute a Plan for Anomaly Detection and Reporting - Example plan pg. 2
     ************************************************************************************************;
     *** Run Proc Shewhart with the "by variable" and produce an output table for later processing;
     *** Note: No charts are produced at this step, just the output table;
     ************************************************************************************************;
     proc shewhart data = prodhist_total3;
       by shift;
       irchart defect_pct * Date / outtable = prodhist_out nochart;
     run;

  33. How to Execute a Plan for Anomaly Detection and Reporting - Example plan pg. 3
     ************************************************************************************************;
     *** Identify records with upper limit violations (in the latest day) and then create a new dataset;
     ************************************************************************************************;
     data upper_violations;
       set prodhist_out;
       by shift;
       if last.shift and _EXLIM_ = 'UPPER';
     run;
     ************************************************************************************************;
     *** Create a dataset that has only the shifts with upper limit violations;
     ************************************************************************************************;
     proc sql;
       create table prodhist_upper as
       select a.*
       from prodhist_total3 a
       join upper_violations u
         on a.shift = u.shift
       order by a.shift, a.date;
     quit;

  34. How to Execute a Plan for Anomaly Detection and Reporting - Example plan pg. 4
     ************************************************************************************************;
     *** Create SPC charts for the shifts that hit the upper limit in the latest day;
     ************************************************************************************************;
     ods graphics on;
     proc shewhart data = prodhist_upper;
       by shift;
       irchart defect_pct * Date / totpanels=1 markers zonelabels cout outfill;
     run;

  35. Processes with multiple phases
     ************************************************************************************************;
     *** After plotting a chart, you identified a change based on a shift in defect percent;
     *** In this example, the Engineer for the Grinder Machine made process improvements on 7/13;
     ************************************************************************************************;
     Chart annotation: Machine Improvement

  36. Processes with multiple phases
     ************************************************************************************************;
     *** Modify input data set to include a phase variable due to the machine improvement;
     *** NOTE: Column needs to be named _PHASE_;
     ************************************************************************************************;
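
     A data step along the following lines could add the phase column. It is a sketch under several loudly stated assumptions: the input data set name (prodhist_day_total2), the year in the change date (the slides give only 7/13), the phase labels, and the choice to start the second phase on the improvement date itself are all made up for illustration. The labels are chosen so that they sort in the same order as time, which matters when _PHASE_ is later used as a BY variable.

```sas
/* Sketch only: add a _PHASE_ column that marks the machine improvement.    */
/* ASSUMPTION: the year is not stated on the slides; 2023 is a placeholder. */
%let change_date = '13JUL2023'd;

data prodhist_total4;
    set prodhist_day_total2;      /* ASSUMPTION: day-level input with Date and defect_pct */
    length _PHASE_ $ 16;          /* _PHASE_ must be a character variable */
    /* ASSUMPTION: the improvement date itself belongs to the second phase */
    if Date < &change_date. then _PHASE_ = 'Phase 1';
    else _PHASE_ = 'Phase 2';
run;

/* Labels sort in time order, so BY _PHASE_ processing sees ascending groups */
proc sort data=prodhist_total4;
    by _PHASE_ Date;
run;
```

     With labels such as 'Before' and 'After' the alphabetical order would be the reverse of the time order, so a sort by _PHASE_ would put the later phase first.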

  37. Processes with multiple phases
     ************************************************************************************************;
     *** Run proc shewhart to get limits by phase;
     *** Need to rename _PHASE_ to _INDEX_ in the output dataset (SAS reads in phase limits as _INDEX_);
     *** Note: Objective here is to create an outlimits table for each of the phases;
     ************************************************************************************************;
     proc shewhart data = prodhist_total4;
       by _PHASE_;
       irchart defect_pct * Date / nochart
                                   outlimits = phase_limits (rename=(_PHASE_=_INDEX_));
     run;

  38. Processes with multiple phases
     ************************************************************************************************;
     *** Use the phase limits dataset for input and run proc shewhart;
     *** to create a chart showing phases with distinct changes in control limits;
     ************************************************************************************************;
     ods graphics on;
     title 'Process Behavior Chart for';
     title2 'Drum Stick Defect Rate';
     proc shewhart data = prodhist_total4 limits = phase_limits;
       irchart defect_pct * Date / totpanels=1
                                   markers
                                   odstitle='Phases Shown in Chart'
                                   readphase = all   /* Reads all the phases from the input data set */
                                   readindex = all   /* Reads all the control limits from the LIMITS= data set */
                                   phaselegend       /* Displays a legend with the phase values */
       ;
     run;

  39. Processes with multiple phases

  40. Resources
     - Wheeler, Donald J. Understanding Variation: The Key to Managing Chaos. 2nd Ed. Knoxville, TN: SPC Press, 2000.
     - Wheeler, Donald J. Advanced Topics in Statistical Process Control: The Power of Shewhart's Charts. Knoxville, TN: SPC Press, 1995.
     - Wheeler, Donald J. Understanding Statistical Process Control. 2nd Ed. Knoxville, TN: SPC Press, 1992.
     - SAS/QC Documentation (see Chap. 19, The SHEWHART Procedure): SAS/QC 15.1 User's Guide.

  41. Questions?
     Contact Info: Jeff LaMar, Jeffrey.c.lamar@wellsfargo.com
