Vitrage Project Update: Root Cause Analysis Service in OpenStack

Vitrage is the OpenStack Root Cause Analysis (RCA) service for organizing, analyzing, and expanding alarms and events, providing a holistic view of the system. Founded during the Mitaka release, it became an official OpenStack project in June 2016. This November 2017 project update covers the Pike features and the plans for Queens. The Pike features include integration with Mistral for corrective workflows and first steps toward machine learning for alarm correlation and causality.

Presentation Transcript


  1. Vitrage Project Update, OpenStack Summit Sydney (November 2017)
     Ifat Afek, IRC: ifat_afek
     Queens Virtual PTG: https://etherpad.openstack.org/p/vitrage-ptg-queens

  2. What is Vitrage?
     The OpenStack RCA (Root Cause Analysis) service. Vitrage is used for organizing, analyzing and expanding OpenStack alarms and events.
     • Root Cause Analysis: understand what causes faults to occur
     • Deduced alarms and states: raise alarms and modify states based on system insights
     • Holistic and complete view of the system
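Root cause analysis can be illustrated with a minimal Python sketch. It assumes that alarms are linked by hypothetical cause -> effect edges; the alarm names and edge list are invented for illustration and are not Vitrage's internal data model. Starting from one alarm, the sketch walks the edges backwards and returns the alarms that have no known cause.

```python
from collections import defaultdict

# Hypothetical causal edges between alarms: (cause, effect).
CAUSAL_EDGES = [
    ("nic_down@host-1", "host_unreachable@host-1"),
    ("host_unreachable@host-1", "vm_unreachable@vm-1"),
    ("host_unreachable@host-1", "vm_unreachable@vm-2"),
]


def build_cause_index(edges):
    """Map each alarm to the set of alarms that directly cause it."""
    causes = defaultdict(set)
    for cause, effect in edges:
        causes[effect].add(cause)
    return causes


def root_causes(alarm, causes):
    """Follow cause edges backwards and return the alarms that have no known cause."""
    roots, stack, seen = set(), [alarm], set()
    while stack:
        current = stack.pop()
        if current in seen:  # guards against cycles in the causal graph
            continue
        seen.add(current)
        parents = causes.get(current, set())
        if not parents:
            roots.add(current)
        stack.extend(parents)
    return roots
```

For example, `root_causes("vm_unreachable@vm-2", build_cause_index(CAUSAL_EDGES))` traces the unreachable VM back to the NIC failure on its host.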

  3. Project Background
     • Founded during the Mitaka release of OpenStack
     • Became an official OpenStack project on June 1st, 2016
     • First official release: Newton
     • ~10 contributors in the last release

  4. High Level Architecture
     [Diagram components: API, CLI, UI; Graph; Logic; Templates; Machine Learning; Notifications to External Systems; External Projects and Monitors]

  5. Pike Features

  6. Vitrage Integration with Mistral
     • Vitrage provides insights about the system
     • Mistral is a workflow service
     • Vitrage + Mistral -> analysis and corrective actions
     [Diagram with Zabbix, Vitrage, Mistral, and Nova: NIC is down; raise VM unreachable alarm; execute migrate_vm workflow; VM migrate; VM migrated to another host; clear VM unreachable alarm]
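The flow in the diagram can be sketched in Python with in-memory stand-ins for the four services. All class and method names are hypothetical, and so is the choice of "host-3" as the migration target. Real deployments drive this through Vitrage templates and each project's own APIs. The sketch only shows the order of steps: a NIC failure leads to a deduced alarm, a workflow runs, and the migration clears the alarm.

```python
class Nova:
    """Stand-in for Nova: tracks which host each VM runs on."""

    def __init__(self, placement):
        self.placement = dict(placement)  # vm -> host

    def migrate(self, vm, target_host):
        self.placement[vm] = target_host
        # Notification a compute service would emit after a successful migration.
        return ("vm_migrated", vm, target_host)


class Mistral:
    """Stand-in for Mistral: runs named workflows."""

    def __init__(self, nova):
        self.nova = nova

    def run_workflow(self, name, vm, target_host):
        if name != "migrate_vm":
            raise ValueError(f"unknown workflow: {name}")
        return self.nova.migrate(vm, target_host)


class Vitrage:
    """Stand-in for Vitrage: raises, acts on, and clears deduced alarms."""

    def __init__(self, mistral, nova, spare_host):
        self.mistral = mistral
        self.nova = nova  # used only to read the current VM placement
        self.spare_host = spare_host
        self.active_alarms = set()
        self.log = []

    def on_nic_down(self, host):
        """Handle a 'NIC is down' alarm reported by Zabbix for `host`."""
        affected = [vm for vm, h in self.nova.placement.items() if h == host]
        for vm in affected:
            # Deduce that the VM is unreachable and raise an alarm on it.
            self.active_alarms.add(("vm_unreachable", vm))
            self.log.append(("raise", vm))
            # Corrective action: ask Mistral to run the migrate_vm workflow.
            self.log.append(("execute", "migrate_vm", vm))
            event = self.mistral.run_workflow("migrate_vm", vm, self.spare_host)
            self.on_nova_event(event)

    def on_nova_event(self, event):
        kind, vm, _new_host = event
        if kind == "vm_migrated" and ("vm_unreachable", vm) in self.active_alarms:
            # The VM now runs on a healthy host, so the deduced alarm is cleared.
            self.active_alarms.discard(("vm_unreachable", vm))
            self.log.append(("clear", vm))


nova = Nova({"vm-1": "host-1", "vm-2": "host-2"})
vitrage = Vitrage(Mistral(nova), nova, spare_host="host-3")
vitrage.on_nic_down("host-1")
```

After the call, vm-1 has moved to host-3 and its alarm has been raised and cleared. vm-2, on a healthy host, is untouched.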

  7. Machine Learning First Steps
     • First steps of augmenting Vitrage with machine learning capabilities
     • Implemented the infrastructure
     • Implemented a basic Jaccard correlation algorithm
     • Today: evaluator templates are manually edited by the user
     • Tomorrow: automatically generate evaluator templates based on alarm history
        • Finding alarm correlation (B usually appears right after A)
        • Finding alarm causality (is A the root cause of B?)
        • Algorithm developed by Bell Labs
     [Diagram of events X and Y: the real start time of event Y can be before or after X; the real end time of event Y can be before or after X]
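The slide names only a "basic Jaccard correlation algorithm". The Python sketch below assumes that alarm activity is bucketed into fixed 60-second windows; the bucketing and the window size are assumptions, not Vitrage's documented implementation. The Jaccard index of two alarms is the number of windows in which both were active divided by the number of windows in which either was active.

```python
def active_buckets(intervals, bucket_seconds=60):
    """Turn (start, end) alarm intervals, in seconds, into the set of time buckets they touch."""
    buckets = set()
    for start, end in intervals:
        first = int(start // bucket_seconds)
        last = int(end // bucket_seconds)
        buckets.update(range(first, last + 1))
    return buckets


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two sets (0.0 when both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical alarm histories: (start, end) in seconds.
alarm_a = [(0, 119), (600, 659)]
alarm_b = [(30, 100), (610, 700)]
score = jaccard(active_buckets(alarm_a), active_buckets(alarm_b))
```

The Jaccard index is symmetric. It measures co-occurrence but says nothing about which alarm came first or caused the other, which is why the slide treats correlation and causality as separate goals.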

  8. Other Features
     • Vitrage template language extension: added the 'not' operator
     • SNMP notifications
     • Keycloak support
     • Alarm equivalence
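Adding 'not' lets a template condition match on the absence of an entity or relationship. The Python sketch below illustrates this with a hypothetical tuple-based condition format, not the real template syntax, and with invented fact names. A condition is either a fact name or an ("and" | "or" | "not", ...) tuple, evaluated against the set of facts that currently hold.

```python
def evaluate(condition, facts):
    """Evaluate a condition tree against the set of facts that currently hold.

    A condition is a fact name (str) or a tuple:
    ("not", c), ("and", c1, c2, ...), or ("or", c1, c2, ...).
    """
    if isinstance(condition, str):
        return condition in facts
    op, *operands = condition
    if op == "not":
        (operand,) = operands  # 'not' takes exactly one operand
        return not evaluate(operand, facts)
    if op == "and":
        return all(evaluate(c, facts) for c in operands)
    if op == "or":
        return any(evaluate(c, facts) for c in operands)
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical rule: match a host that has an alarm but runs no instances.
condition = ("and", "alarm_on_host", ("not", "host_contains_instance"))
```

Without 'not', a rule like this would have to be split or approximated, because plain conjunctions can only test for things that exist in the graph.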

  9. Queens Features

  10. High Availability and Alarm History
     • Improve Vitrage high availability support
     • Lay the groundwork for alarm and RCA history
     • Store alarm history using snapshots and events (event sourcing pattern)
     • Implemented in stages:
        • Pike: collector
        • Queens: persister and player
        • Queens/Rocky: alarm history
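The event sourcing idea on this slide works as follows. Alarm raise and clear events are appended to a log, periodic snapshots capture the full set of active alarms, and a "player" rebuilds the state at any past time from the latest earlier snapshot plus the events after it. In the Python sketch below, the class and method names are hypothetical and map only loosely onto the collector, persister, and player stages. It is not Vitrage's storage schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmEvent:
    timestamp: int
    alarm_id: str
    action: str  # "raise" or "clear"


def apply_event(state, event):
    """Return the set of active alarm ids after applying one event to `state`."""
    new_state = set(state)
    if event.action == "raise":
        new_state.add(event.alarm_id)
    elif event.action == "clear":
        new_state.discard(event.alarm_id)
    else:
        raise ValueError(f"unknown action: {event.action}")
    return new_state


class AlarmHistoryStore:
    """Hypothetical persister: an append-only event log plus periodic snapshots."""

    def __init__(self):
        self.events = []     # appended in timestamp order
        self.snapshots = []  # (timestamp, frozenset of active alarm ids), in timestamp order

    def record(self, event):
        self.events.append(event)

    def take_snapshot(self, timestamp, state):
        # A snapshot at time t already reflects every event with timestamp <= t.
        self.snapshots.append((timestamp, frozenset(state)))

    def replay(self, at_time):
        """Player: rebuild the active alarms at `at_time` from a snapshot plus later events."""
        base_time, state = None, frozenset()
        for ts, snapshot in self.snapshots:
            if ts <= at_time:
                base_time, state = ts, snapshot
        for event in self.events:
            after_snapshot = base_time is None or event.timestamp > base_time
            if after_snapshot and event.timestamp <= at_time:
                state = apply_event(state, event)
        return set(state)


# Usage: the collector feeds events in, and a snapshot is taken at t=2.
store = AlarmHistoryStore()
live_state = set()
for ev in [AlarmEvent(1, "a1", "raise"), AlarmEvent(2, "a2", "raise")]:
    store.record(ev)
    live_state = apply_event(live_state, ev)
store.take_snapshot(2, live_state)
store.record(AlarmEvent(3, "a1", "clear"))
```

Snapshots bound the replay cost. Only events after the latest snapshot need to be re-applied, while the full log still allows reconstructing any earlier point in time.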

  11. Configurable Notifications
     • Existing: dedicated notifiers (Nova, SNMP, Mistral)
     • In Queens:
        • API for registering on Vitrage alarms: by resource id, by alarm name, or by regular expression
        • HTTP callback upon alarm
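The Python sketch below illustrates the Queens registration idea. It assumes that a subscription filters by resource id, exact alarm name, or a regular expression on the alarm name, and that all filters set on one subscription must match. The "HTTP callback" is modeled as a plain Python callable so the example runs offline. The Subscription and Notifier names and the alarm dictionary fields are hypothetical, not the Vitrage API.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Subscription:
    """Hypothetical registration; a filter left as None matches every alarm."""
    callback: Callable[[dict], None]  # stands in for an HTTP POST to a registered URL
    resource_id: Optional[str] = None
    alarm_name: Optional[str] = None
    name_regex: Optional[str] = None

    def matches(self, alarm):
        if self.resource_id is not None and alarm["resource_id"] != self.resource_id:
            return False
        if self.alarm_name is not None and alarm["name"] != self.alarm_name:
            return False
        if self.name_regex is not None and not re.search(self.name_regex, alarm["name"]):
            return False
        return True


class Notifier:
    def __init__(self):
        self.subscriptions = []

    def register(self, subscription):
        self.subscriptions.append(subscription)

    def on_alarm(self, alarm):
        """Invoke the callback of every subscription whose filters match the alarm."""
        for sub in self.subscriptions:
            if sub.matches(alarm):
                sub.callback(alarm)


vm_alarms, host_1_alarms = [], []
notifier = Notifier()
notifier.register(Subscription(callback=vm_alarms.append, name_regex=r"^VM "))
notifier.register(Subscription(callback=host_1_alarms.append, resource_id="host-1"))
notifier.on_alarm({"name": "VM unreachable", "resource_id": "vm-1"})
notifier.on_alarm({"name": "Host NIC down", "resource_id": "host-1"})
```

Each alarm reaches only the subscribers whose filters it satisfies, which is what separates generic registration from the dedicated per-system notifiers.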

  12. Equivalence and Aggregation
     • Resource equivalence: two datasources report the same resource
        • How to indicate the equivalency?
        • What if one datasource removes the resource?
     • Aggregation: APIs should return a semi-merged resource
     • Design in progress
     [Diagram labels: Nova, Discovery agent, Equivalent, API call, Aggregated display; compute-0 is reported as AVAILABLE and as SUBOPTIMAL, and the aggregated display shows compute-0 as SUBOPTIMAL]
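The Python sketch below shows one way equivalence and aggregation could fit together. It assumes an explicit mapping from each (datasource, resource id) pair to a shared logical id and a "worst state wins" merge rule. That rule is consistent with the diagram, where AVAILABLE and SUBOPTIMAL aggregate to SUBOPTIMAL, but the severity table, the discovery agent's resource id, and all names are hypothetical. The result is semi-merged: one state per logical resource, plus the list of sources that reported it.

```python
# Hypothetical severity order: a higher number means a worse state.
STATE_SEVERITY = {"AVAILABLE": 0, "SUBOPTIMAL": 1, "ERROR": 2}


def aggregate(reports, equivalences):
    """Merge resource reports from several datasources.

    reports: list of (datasource, resource_id, state)
    equivalences: maps (datasource, resource_id) to a shared logical id;
                  unmapped resources keep their own id.
    Returns {logical_id: {"state": worst_state, "sources": sorted datasource names}}.
    """
    merged = {}
    for source, resource_id, state in reports:
        key = equivalences.get((source, resource_id), resource_id)
        entry = merged.setdefault(key, {"state": state, "sources": []})
        if STATE_SEVERITY[state] > STATE_SEVERITY[entry["state"]]:
            entry["state"] = state  # keep the worst state reported so far
        entry["sources"].append(source)
    for entry in merged.values():
        entry["sources"].sort()
    return merged


reports = [
    ("nova", "compute-0", "AVAILABLE"),
    ("discovery_agent", "10.0.0.5", "SUBOPTIMAL"),  # hypothetical id for the same host
    ("nova", "compute-1", "AVAILABLE"),
]
equivalences = {
    ("nova", "compute-0"): "compute-0",
    ("discovery_agent", "10.0.0.5"): "compute-0",
}
view = aggregate(reports, equivalences)
```

Keeping the source list answers part of the slide's open question. If one datasource later removes the resource, the merged entry can drop that source instead of deleting the resource outright.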

  13. Proactive RCA
     • Host down alarm -> deduce that the instance is down
     • Instance down alarm -> suspect that the host is down
        • There could be more than one suspect
     • Still under design and requirement definition
        • How to verify that a suspect is a real alarm?
        • When to clear a suspect alarm?
     [Diagram: Host Down and Instance Down linked by "Deduce" and "Suspect" arrows, with a "Run Diagnostics" step]
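The two directions on this slide differ in certainty, which the Python sketch below illustrates. Going down the topology, a host failure lets us deduce that its instances are down. Going up, instance failures only make their hosts suspects. The topology and function names are hypothetical, and ranking suspects by the number of down instances is an assumed heuristic, not the slide's design. This simplified model only considers hosts as suspects.

```python
from collections import defaultdict

# Hypothetical topology: host -> instances it runs.
HOST_INSTANCES = {"host-1": ["vm-1", "vm-2"], "host-2": ["vm-3"]}


def deduce_instances_down(down_host, topology):
    """Host down -> deduce that every instance on it is down (a certain conclusion)."""
    return {vm: "DOWN" for vm in topology.get(down_host, [])}


def suspect_hosts(down_instances, topology):
    """Instance down -> suspect the hosts running those instances.

    Returns hosts ranked by how many of their instances are down (ties broken by
    name); several hosts can be suspects at once.
    """
    votes = defaultdict(int)
    for host, instances in topology.items():
        for vm in instances:
            if vm in down_instances:
                votes[host] += 1
    return sorted(votes, key=lambda h: (-votes[h], h))
```

Here, `suspect_hosts({"vm-1", "vm-3"}, HOST_INSTANCES)` returns two suspects, and each would still need verification, for example by running diagnostics, before it is treated as a real alarm.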

  14. Other Features
     • Parallel evaluation of Vitrage templates
     • Integration with OPNFV Doctor
     • SNMP parsing service
     • Templates CRUD
     • Discovery agent

  15. We are looking for contributors!
     • Vitrage wiki page: https://wiki.openstack.org/wiki/Vitrage
     • Vitrage IRC channel: #openstack-vitrage
     • OpenStack mailing list: use the [vitrage] tag

  16. Q&A
     Thank you!
