Fault Localization (Pinpoint) Project Proposal Overview
The Fault Localization (Pinpoint) project proposal aims to pinpoint the exact source of failures within a cloud NFV networking environment by utilizing a set of algorithms and APIs. The proposal includes an overview of the fault localization process, an example scenario highlighting the need for fault identification, details about fault localization APIs, and its application within OpenStack. Additionally, it discusses the relationship with other projects in terms of identifying root causes and correlated failures to enhance fault management and performance monitoring.
Presentation Transcript
Fault Localization (Pinpoint) Project Proposal for OPNFV, September 2015, Version 0.8
Fault Localization Overview
- The process of deducing the exact source of a failure from a set of observed indications
- A set of algorithms
- A set of APIs
- Focus on cloud NFV networking
- Extendable to compute and storage
- Fault localization is also known as fault isolation, alarm/event correlation, and root cause analysis (RCA)
Fault Localization (FL) Example
- VNF #2 indicates that it is not working (no sessions, no network connectivity, etc.)
- Several causes may result in this: iptables, MTU, and NIC failure problems
- The FL process should find the exact source of the problem
[Diagram: VNF 1 and VNF 2 running on VM 1 and VM 2, with vSwitches, hypervisors, NICs, and ToR switches. Failure shown: network function doesn't work. Probable causes annotated: MTU size misconfiguration, iptables not configured, NIC failure.]
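One way to read the example above is as a walk over a dependency model: starting from the component that reports the failure, follow its dependencies and report the deepest components that are themselves unhealthy. The Python sketch below illustrates that idea only. The dependency map, the component names, and the `localize` function are hypothetical and are not defined in the proposal.

```python
# Hypothetical dependency model for the example: each component lists
# the components it depends on. Names are illustrative only.
DEPENDS_ON = {
    "VNF 2": ["VM 2"],
    "VM 2": ["vSwitch 2"],
    "vSwitch 2": ["NIC 2"],
    "NIC 2": [],
}


def localize(failed_component, unhealthy, depends_on):
    """Return probable root causes reachable from failed_component.

    A component is reported when it is in the unhealthy set and none of
    its direct dependencies are unhealthy. This is an assumed heuristic,
    not the proposal's algorithm.
    """
    roots = []
    seen = set()
    stack = [failed_component]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        deps = depends_on.get(node, [])
        bad_deps = [d for d in deps if d in unhealthy]
        if node in unhealthy and not bad_deps:
            roots.append(node)
        stack.extend(deps)  # keep walking every dependency
    return sorted(roots)


# Only the VNF and its VM look unhealthy, e.g. a problem inside the VM.
print(localize("VNF 2", {"VNF 2", "VM 2"}, DEPENDS_ON))  # ['VM 2']
```

If every layer down to the NIC were unhealthy, the same call would report only `NIC 2`, which matches the intent of finding the exact source rather than every symptom.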
Fault Localization APIs
- User/System requests: Find root cause(s), Find correlated failures
- Responses to the User/System: Root cause(s), Correlated failures
- Fault Localization System: a set of analysis methods
- Operations toward the underlying sources: Set test, Get test-info, Get info, Set config
- System OAM tools: active tools such as ping, trace, etc.
- Fault/Performance information sources: events, alarms, statistics, logs
- System configuration: expected/desired configuration as known by the CMS
- System models: layering, dependencies, topology, connectivity, policy
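The two northbound operations named on this slide, "Find root cause(s)" and "Find correlated failures", could be exposed along the following lines. This is a minimal Python sketch under stated assumptions: the `Alarm` type, the `FaultLocalizationSystem` class, the time-window correlation, and the "earliest alarm" root-cause rule are all placeholders invented for illustration. The proposal does not specify signatures or analysis methods.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Alarm:
    source: str       # component that raised the alarm (hypothetical field)
    timestamp: float  # seconds on any common clock
    message: str


class FaultLocalizationSystem:
    """Hypothetical facade over one pluggable information source."""

    def __init__(self, get_alarms: Callable[[], List[Alarm]],
                 window_s: float = 5.0):
        self._get_alarms = get_alarms  # stands in for a "Get info" call
        self._window_s = window_s

    def find_correlated_failures(self, alarm: Alarm) -> List[Alarm]:
        # Placeholder analysis: alarms close in time are treated as correlated.
        return [a for a in self._get_alarms()
                if a != alarm
                and abs(a.timestamp - alarm.timestamp) <= self._window_s]

    def find_root_causes(self, alarm: Alarm) -> List[Alarm]:
        # Placeholder analysis: the earliest alarm(s) in the correlated group.
        group = [alarm] + self.find_correlated_failures(alarm)
        earliest = min(a.timestamp for a in group)
        return [a for a in group if a.timestamp == earliest]


nic = Alarm("NIC", 10.0, "link down")
vswitch = Alarm("vSwitch", 11.0, "port unreachable")
vnf = Alarm("VNF 2", 12.0, "no sessions")
unrelated = Alarm("VNF 9", 100.0, "high latency")

fls = FaultLocalizationSystem(lambda: [nic, vswitch, vnf, unrelated])
print(fls.find_root_causes(vnf))
```

A real system would replace both placeholder rules with the analysis methods the slide refers to, and would draw on configuration and system models as well as alarms.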
Fault Localization in OpenStack
- User/System requests: Find root cause(s), Find correlated failures
- Responses to the User/System: Root cause(s), Correlated failures
- Fault Localization System: a set of analysis methods
- Operations toward the underlying sources: Set test, Get test-info, Get info, Set config
- System OAM tools: active tools such as ping, trace, etc.
- Fault/Performance information sources: events/alarms, statistics, logs, prediction
- System configuration: expected/desired configuration as known by the CMS
- System models: layering, dependencies, topology, connectivity, policy
- Components labeled in the diagram: Ceilometer/Monasca/External; Neutron/Nova/External; Neutron/Nova; SDN Controller
Relationships with other projects (1)
- User/System requests: Find root cause(s), Find correlated failures
- Responses to the User/System: Root cause(s), Correlated failures
- Fault Localization System: a set of analysis methods
- Operations toward the underlying sources: Set test, Get test-info, Get info, Set config
- System OAM tools: active tools such as ping, trace, etc.
- Fault/Performance information sources: events, alarms, statistics, logs
- System configuration: expected/desired configuration as known by the CMS
- System models: layering, dependencies, topology, connectivity, policy
- OPNFV projects shown in the diagram: Yardstick, Bottleneck, Doctor
- Components labeled in the diagram: Ceilometer/Monasca; Neutron/Nova/External; Neutron/Nova/Cinder etc.; Neutron/Nova
Relationships with other projects (2)
Projects underway or being proposed in OPNFV:
- Doctor: The Doctor project is focused on fault notification but also has some notion of event aggregation. In this context, it can be one of the inputs for the Pinpoint project.
- Yardstick: A configuration verification testing project. It provides a testing framework and several basic testing methods. These could be used as a possible OAM tools framework for the Pinpoint project.
- Bottleneck: This project aims to provide an automated testing environment, as part of deployment, to identify system bottlenecks and measure performance in the staging phase before deployment. It is oriented to performance and focuses on the staging phase.
Reference in NFV standard
Fault correlation in NFV: requirement for distributed fault correlation in ETSI GS NFV-REL 001 V1.1.1 (Resiliency Requirements), chapter 10.4
Reference in ONUG RFI Requirements
Requirement for fault correlation in Network State Collection, Correlation and Analytics Product/RFI Requirements, May 2015
Proposed Project Scope
[Diagram: VNF/VNFM, VIM, Fault Localization (marked as the project scope), OpenStack services (Neutron, Ceilometer, others), NFVI, and SDN Controller, joined by interfaces numbered 1 to 7 and labeled Config, OAM, Topology, and Statistics.]
Proposed Project Scope (cont.)
- Focus on networking fault-localization APIs for network connectivity faults
- Use cases: service continuity, network-load-based placement and migration
- In scope:
  - Network fault localization requirements in a virtual environment
  - Gap analysis for the APIs for the above use cases, e.g.:
    - API for root cause of a connectivity problem between VNFs/VMs
    - API for OAM tools for Ethernet/IP technologies
    - API to retrieve network topology information
    - API for fault and performance collection engines
  - Active tests and statistics retrieval required for the above use cases
- Future extensions:
  - Extend the APIs for fault localization requirements for compute and storage
  - Other OAM tools
  - POC that will include simple fault localization analysis logic as a reference implementation
  - Extend for upper layers of NFV (alongside OPNFV evaluation)
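To illustrate how the in-scope APIs might combine for the connectivity use case, the Python sketch below takes a topology (standing in for the topology-retrieval API), computes the path between two VMs, and runs an active test per hop (standing in for an OAM tool such as ping) to find the first hop that fails. The topology, the hop names, and the `find_path`, `first_failing_hop`, and `probe` functions are assumptions made for illustration; none of them come from the proposal.

```python
from collections import deque


def find_path(topology, src, dst):
    """Breadth-first search for a shortest path in an adjacency dict."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in topology.get(node, []):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # no path between src and dst


def first_failing_hop(path, probe):
    """Return the first hop for which probe(hop) is False, else None."""
    for hop in path:
        if not probe(hop):
            return hop
    return None


# Hypothetical topology, as a topology API might return it.
TOPOLOGY = {
    "VM 1": ["vSwitch 1"],
    "vSwitch 1": ["VM 1", "NIC 1"],
    "NIC 1": ["vSwitch 1", "ToR"],
    "ToR": ["NIC 1", "NIC 2"],
    "NIC 2": ["ToR", "vSwitch 2"],
    "vSwitch 2": ["NIC 2", "VM 2"],
    "VM 2": ["vSwitch 2"],
}

# Stand-in for an active OAM test: these hops do not answer.
UNREACHABLE = {"NIC 2", "vSwitch 2", "VM 2"}


def probe(hop):
    return hop not in UNREACHABLE


path = find_path(TOPOLOGY, "VM 1", "VM 2")
print(first_failing_hop(path, probe))  # NIC 2
```

Probing hop by hop keeps the sketch simple. A reference implementation could instead combine active tests with statistics and configuration checks, as the scope list suggests.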
Thank You!