Doctor: Failure Detection and Notification for NFV
This presentation describes the OPNFV Doctor project, which is building a fault management and maintenance framework for NFV with a focus on failure detection and notification. The key requirements on the VIM are consistent resource state awareness, immediate notification, extensible monitoring, and fault correlation. The Doctor demo shows a quick-recovery scenario in a virtualized infrastructure, including the fault management sequence and the service healing process.
Presentation Transcript
OPNFV Summit 2015
Doctor: Failure Detection and Notification for NFV
  Gerald Kunzmann, DOCOMO
  Carlos Goncalves, NEC
  Ryota Mibu, NEC
Doctor Overview
  Goal: Build a fault management and maintenance framework
  Approach:
    Identify requirements
    Gap analysis
    Implementation work upstream (OpenStack)
    Integration and testing
  Status:
    Initial requirement study, architecture design, gap analysis: Done
    Collaborative development: Ongoing (3 merged blueprints in OpenStack Liberty)
    Standardization sync: Ongoing (by NFV member efforts, joint meeting)
Key Requirements as VIM
  Consistent Resource State Awareness
  Immediate Notification
  Extensible Monitoring
  Fault Correlation
Doctor Demo Overview
  Quick recovery: switch active/standby (ACT/SBY)
  Application: Streaming Server, Video Player, Manager
  Virtualized Infrastructure: Virtual Compute, Virtual Storage, Virtual Network; Virtualization Layer; Hardware Resources
  Virtualized Infrastructure Manager (VIM) = OpenStack
  Failure: Host-A down, VM-1 down
  Detection and reaction:
    Detection without Doctor: a few minutes
    Detection with Doctor: 1 second
Fault Management Sequence
  Steps:
    0. Set Alarm
    1. Raw Failure
    2. Find Affected
    3. Update State
    4. Notify all
    5. Notify Error
    6-. Action
  Diagram labels: App Manager + Viewer; Application (Streaming Server, Manager); Controller; Notifier; Nova (Liberty); Ceilometer (Liberty); Alarm Conf.; Resource Map; Monitor; Inspector; Failure Policy; State Reflector; Log; Virtualized Infrastructure (Resource Pool)
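The numbered sequence on this slide can be sketched as a small in-memory pipeline. This is a minimal illustration only, assuming one plausible reading of the diagram: a monitor reports a raw host failure to the inspector, the inspector looks up affected VMs in a resource map, the controller records the new state, and the notifier calls whatever alarm the application manager registered. All class, method, and resource names here (`Controller`, `Notifier`, `Inspector`, `set_alarm`, `host-a`, `vm-1`) are hypothetical and are not Nova or Ceilometer APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Controller:
    """Hypothetical controller: holds the host-to-VM resource map and VM states."""
    resource_map: Dict[str, List[str]]
    vm_state: Dict[str, str] = field(default_factory=dict)

    def update_state(self, vm: str, state: str) -> None:
        # Step 3: Update State
        self.vm_state[vm] = state


@dataclass
class Notifier:
    """Hypothetical notifier: stores alarm callbacks per VM and fires them."""
    alarms: Dict[str, List[Callable[[str, str], None]]] = field(default_factory=dict)

    def set_alarm(self, vm: str, callback: Callable[[str, str], None]) -> None:
        # Step 0: Set Alarm (done by the application manager in advance)
        self.alarms.setdefault(vm, []).append(callback)

    def notify(self, vm: str, state: str) -> None:
        # Steps 4-5: notify every alarm registered for this VM
        for callback in self.alarms.get(vm, []):
            callback(vm, state)


class Inspector:
    """Hypothetical inspector: maps a raw host failure to affected VMs."""

    def __init__(self, controller: Controller, notifier: Notifier) -> None:
        self.controller = controller
        self.notifier = notifier

    def on_raw_failure(self, host: str) -> List[str]:
        # Step 1: Raw Failure arrives from a monitor
        affected = self.controller.resource_map.get(host, [])  # Step 2: Find Affected
        for vm in affected:
            self.controller.update_state(vm, "error")
            self.notifier.notify(vm, "error")
        return affected


# Usage: the application manager registers an alarm, then a host fails.
actions: List[str] = []
controller = Controller(resource_map={"host-a": ["vm-1"]})
notifier = Notifier()
notifier.set_alarm(
    "vm-1",
    # Step 6: Action taken by the application manager (here only recorded)
    lambda vm, state: actions.append(f"switch to standby: {vm} is {state}"),
)
inspector = Inspector(controller, notifier)
affected = inspector.on_raw_failure("host-a")
```

Because the alarm is registered before any failure occurs, the application manager is called as soon as the state changes and does not need to poll the infrastructure.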
Service Healing Process
  Diagram labels: Alarm Notification; Control; Host A, Host B; VM0, VM1, VM9; App Manager; Streaming Server; vNIC, vSwitch, NIC; Switch; Video Player; Data Flow (Before), Data Flow (After)
Doctor Demo Screen
  Demo Operation Console
  Video Player (with Doctor)
  Video Player (without Doctor)
  VM List (Horizon)
  App Manager Service Control
  App Manager Event/Action Log
  VM Egress Stats (Zabbix)
Doctor Blueprints in OpenStack Liberty Cycle
  (Legend: Using in This Demo)
  Project | Blueprint | Spec Drafter | Developer | Status
  Ceilometer | Event Alarm Evaluator | Ryota Mibu (NEC) | Ryota Mibu (NEC) | Completed (Liberty)
  Nova | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) | Roman Dobosz (Intel) | Completed (Liberty)
  Nova | Support forcing service down | Carlos Goncalves (NEC) | Tomi Juvonen (Nokia) | Completed (Liberty)
  Nova | Get valid server state | Tomi Juvonen (Nokia) | | Spec approved (Mitaka)
  Nova | Add notification for service status change | Balazs Gibizer (Ericsson) | Balazs Gibizer (Ericsson) | Waiting for spec approval (Mitaka)
Doctor BP Detail: Nova - Mark Nova-Compute Down
  NEW: Force-down API to update the nova-compute service state, called by an external monitoring service (client)
  EXISTING: periodic update of the service state
  Diagram labels: External Monitoring Service; Client; Monitoring; nova api; nova conductor; nova compute; nova scheduler; queue; nova DB; service state; VM; Hypervisor; vSwitch; BMC; Host / Machine
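As an illustration of how an external monitoring client might use the new call, the sketch below builds (but does not send) the HTTP request. It assumes the force-down call as exposed in the Nova compute API at microversion 2.11, that is, `PUT /os-services/force-down` with a JSON body naming the host and the `nova-compute` binary; later API versions may differ. The endpoint URL, token, and host name are placeholders, and `build_force_down_request` is a hypothetical helper, not part of any OpenStack client library.

```python
import json
import urllib.request


def build_force_down_request(nova_endpoint: str, token: str, host: str,
                             forced_down: bool = True) -> urllib.request.Request:
    """Build a request that marks the nova-compute service on `host` as forced down."""
    body = json.dumps({
        "host": host,
        "binary": "nova-compute",
        "forced_down": forced_down,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{nova_endpoint.rstrip('/')}/os-services/force-down",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
            # Assumption: force-down requires compute API microversion 2.11.
            "X-OpenStack-Nova-API-Version": "2.11",
        },
    )


# Placeholder endpoint, token, and host; nothing is sent over the network here.
req = build_force_down_request("http://controller:8774/v2.1", "example-token", "host-a")
```

A monitoring service would pass `req` to `urllib.request.urlopen` once it has detected the host failure, and could later call the helper again with `forced_down=False` after the host is repaired.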
Doctor BP Detail: Ceilometer - Event Alarm
  EXISTING (polling-based)
  NEW Shortcut (notification-based): notification-driven alarm evaluator
  Diagram labels: Nova; Neutron; Cinder; Manager; notification; event; stats; sample; Audit Service
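The difference between the polling-based path and the notification-based shortcut can be illustrated with a simplified evaluator that runs once per incoming event instead of on a timer. This is a sketch under stated assumptions, not Ceilometer's implementation: the alarm and event dictionaries use a hypothetical layout, and only equality comparisons on event traits are supported.

```python
from typing import Any, Dict, List


def event_matches(rule: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if the event has the rule's type and satisfies every query term."""
    if rule["event_type"] != event["event_type"]:
        return False
    traits = event.get("traits", {})
    return all(traits.get(term["field"]) == term["value"]
               for term in rule.get("query", []))


def on_notification(alarms: List[Dict[str, Any]], event: Dict[str, Any]) -> List[str]:
    """Evaluate all alarms against a single event; return the names of alarms that fire."""
    return [alarm["name"] for alarm in alarms
            if event_matches(alarm["event_rule"], event)]


# Hypothetical alarm: fire when instance vm-1 is reported in the error state.
alarms = [{
    "name": "vm-1-error",
    "event_rule": {
        "event_type": "compute.instance.update",
        "query": [
            {"field": "instance_id", "value": "vm-1"},
            {"field": "state", "value": "error"},
        ],
    },
}]

error_event = {
    "event_type": "compute.instance.update",
    "traits": {"instance_id": "vm-1", "state": "error"},
}
active_event = {
    "event_type": "compute.instance.update",
    "traits": {"instance_id": "vm-1", "state": "active"},
}

fired = on_notification(alarms, error_event)
not_fired = on_notification(alarms, active_event)
```

Since evaluation is triggered by the event itself, the delay between a state change and the alarm is bounded by message delivery rather than by a polling interval.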
Who made this demo?
  Upstream OSS Community & Developer: OpenStack contributors, including Doctor developers
  OPNFV Doctor Team: Doctor contributors who worked on requirement study, gap analysis and implementation design
  Doctor PoC Demo Team:
    NTT DOCOMO
    NEC: Toshiaki Takahashi, Takahiro Suzuki, Ryuji Ishikawa, ...
Visit DOCOMO Booth, PoC Demo Zone