
Persistent Storage Solutions in OpenNebula Infrastructure
Explore different persistent storage solutions in OpenNebula, including shared partitions, transferred partitions, and block devices like iSCSI. Learn how to provision and manage storage for VMs efficiently.
Presentation Transcript
OpenNebula in production - The infrastructure
Stefano Lusso, INFN Torino
OpenNebula infrastructure
- Providing persistent storage
- Storage infrastructure
- Network infrastructure
OpenNebula Storage Provisioning
A Datastore is any storage medium used to store disk images for VMs.

$ onedatastore list
  ID NAME          SIZE   AVAIL CLUSTER  IMAGES TYPE DS    TM
   0 system_servic 1.8T   91%   Services      0 sys  -     shared
   1 default       2T     68%   -            48 img  fs    shared
   2 files         115.1G 22%   -            15 fil  fs    ssh
 100 cached_qcow   2T     68%   Workers      12 img  fs    qcow2
 101 cached_raw    2T     68%   Workers       2 img  fs    shared
 103 persistent_da 115.1G 22%   -             3 img  fs    ssh
 104 persistent_da 2.2T   45%   -             6 img  fs    ssh
 105 persistent_da 2.1T   34%   -            19 img  iscsi iscsi
 106 persistent_da 0M     -     -             7 img  iscsi iscsi
 109 system_worker -      -     Workers       0 sys  -     ssh
OpenNebula Storage Provisioning - Persistent storage
The storage space should survive the VM lifecycle. Different solutions adopted:
- Shared partition (NFS, GlusterFS)
- Transferred partition (ssh)
- Block devices (iSCSI)
OpenNebula Storage Provisioning - Persistent storage from shared partitions
- The space is located on a dedicated server
- A separate infrastructure is needed
- The shared partition can be mounted at contextualization level (see the sketch below)
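As an illustration of mounting the share at contextualization level, the VM template's CONTEXT section can run a mount command at boot. A minimal sketch, reusing the NFS server address and export path shown later in this transcript; the mount point /data is hypothetical:

CONTEXT = [
  START_SCRIPT = "mkdir -p /data; mount -t nfs 172.16.219.100:/disk/cmps-data /data" ]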
OpenNebula Storage Provisioning - Persistent storage transferred via ssh
- The image is copied via ssh: time consuming
- The image is transferred back to the Datastore only after the VM is successfully shut down: inconsistency may occur
OpenNebula Storage Provisioning - Persistent storage via iSCSI
- The iSCSI driver makes it possible to use block devices for VM images instead of the default file form: improved performance
- It works with the Linux tgtd target daemon
- It can be forced to work with a SAN appliance
OpenNebula Storage Provisioning - Datastore example

$ onedatastore show 103
DATASTORE 103 INFORMATION
ID        : 103
NAME      : persistent_data
USER      : oneadmin
GROUP     : oneadmin
CLUSTER   : -
TYPE      : IMAGE
DS_MAD    : fs
TM_MAD    : ssh
BASE PATH : /var/lib/one/datastores/103

$ onedatastore show 105
DATASTORE 105 INFORMATION
ID        : 105
NAME      : persistent_data_iscsi
USER      : oneadmin
GROUP     : oneadmin
CLUSTER   : -
TYPE      : IMAGE
DS_MAD    : iscsi
TM_MAD    : iscsi
BASE PATH : /var/lib/one/datastores/105
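For reference, an iSCSI image datastore like 105 can be registered from a small template file. This is only a sketch assuming the attributes of the old iSCSI datastore driver (ISCSI_HOST, VG_NAME, BASE_IQN); the values are inferred from the IQNs shown later and are not taken from the original configuration:

$ cat iscsi_ds.conf
NAME       = persistent_data_iscsi
DS_MAD     = iscsi
TM_MAD     = iscsi
ISCSI_HOST = "one-iscsi.to.infn.it"
VG_NAME    = "vg-one"
BASE_IQN   = "iqn.2013-10.org.opennebula"
$ onedatastore create iscsi_ds.conf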
OpenNebula Storage Provisioning - Persistent storage example
[Diagram: a Head Node VM and a Client (WN) VM attached to persistent volumes /home (size 10 GB) and /data (size 4 TB)]
OpenNebula Storage Provisioning - Persistent storage example

$ onevm show 18082
VM DISKS
 ID TARGET IMAGE                  TYPE SAVE SAVE_AS
  0 vda    ubuntu-server-14.04-v3 file NO   -
  1 vdb    raw - 117.2G           fs   NO   -
  3 vdd    Giunti-Home            file YES  -
  4 vde    Giunti-Data            file YES  -

$ oneimage show Giunti-Home
IMAGE 446 INFORMATION
ID            : 446
NAME          : Giunti-Home
USER          : giunti
GROUP         : ec2
DATASTORE     : persistent_data_iscsi
TYPE          : DATABLOCK
REGISTER TIME : 07/08 12:05:55
PERSISTENT    : Yes
SOURCE        : iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446
FSTYPE        : xfs
SIZE          : 19.5G
STATE         : used
RUNNING_VMS   : 1
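A persistent datablock like Giunti-Home could be created in that datastore with oneimage. A sketch only: the exact flags depend on the OpenNebula version, and the size here (20 GB, expressed in MB) is taken from the example above:

$ oneimage create --name Giunti-Home --type DATABLOCK --persistent \
    --size 20480 --fstype xfs --datastore persistent_data_iscsi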
OpenNebula Storage Provisioning - About iSCSI
- iSCSI (Internet Small Computer System Interface) is a data transport protocol used to carry block-level data over IP networks
- Initiator: typically a server, host or device driver that initiates (i.e. begins) iSCSI command sequences
- Target: iSCSI targets break down iSCSI command sequences from initiators and process the SCSI commands
- From the hypervisor perspective (KVM) the iSCSI block device is seen as a local disk (e.g. sd*)
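On the hypervisor acting as initiator, target discovery and login are done with open-iscsi. A sketch using the portal and IQN that appear in the next slide; in normal operation the OpenNebula iSCSI transfer driver issues these steps when the VM is deployed:

# iscsiadm -m discovery -t sendtargets -p 192.168.1.202:3260
# iscsiadm -m node -T iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446 \
    -p 192.168.1.202:3260 --login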
OpenNebula Storage Provisioning - iSCSI initiator on KVM

[root@one-kvm-63 ~]# ls -alh /var/lib/one/datastores/0/18082
total 5.1G
-rw-rw-r-- 1 oneadmin oneadmin 1.1K Aug  6 20:02 deployment.0
-rw-rw---- 1 oneadmin oneadmin 3.7G Sep  7 12:32 disk.0
-rw-rw-r-- 1 oneadmin oneadmin 118G Sep  7 12:17 disk.1
-rw-r--r-- 1 oneadmin oneadmin 384K Aug  6 20:02 disk.2
lrwxrwxrwx 1 oneadmin oneadmin   40 Aug  6 20:02 disk.2.iso -> /var/lib/one/datastores/109/18082/disk.2
lrwxrwxrwx 1 oneadmin oneadmin  117 Aug  7 16:37 disk.3 -> /dev/disk/by-path/ip-192.168.1.202:3260-iscsi-iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446-lun-1
lrwxrwxrwx 1 oneadmin oneadmin  104 Aug  7 16:37 disk.4 -> /dev/disk/by-path/ip-192.168.1.215:3260-iscsi-iqn.2004-04.com.qnap:ts-809u:iscsi.giuntidata.c60347-lun-0

Reading the by-path link of disk.3:
- ip-192.168.1.202:3260: IP address of the target and TCP port 3260
- iqn.2013-10.org.opennebula: iSCSI qualified name (yyyy-mm.naming-authority)
- one-iscsi.to.infn.it: hostname of the target
- vg-one.lv-one-446-lun-1: LVM volume group / logical volume and LUN
OpenNebula Storage Provisioning - iSCSI target on Linux
Dedicated server with tgtd running and a storage partition available:

scsi3 : mpp virtual bus adapter :version:09.03.0C05.0652,timestamp:Tue Jan  8 05:52:48 CST 2013
scsi 3:0:0:0: Direct-Access SUN VirtualDisk 0760 PQ: 0 ANSI: 5
scsi(3:0:0:0): Enabled tagged queuing, queue depth 30.
sd 3:0:0:0: Attached scsi generic sg3 type 0
sd 3:0:0:0: [sda] 4294967296 512-byte logical blocks: (2.19 TB/2.00 TiB)

[root@one-iscsi ~]# blkid | grep sda
/dev/sda: UUID="1sebp8-e22F-RraL-tYp2-1e6N-boD3-NqwgYq" TYPE="LVM2_member"

[root@one-iscsi ~]# fdisk -l /dev/mapper/vg--one-lv--one--446
Disk /dev/mapper/vg--one-lv--one--446: 21.0 GB, 20971520000 bytes
255 heads, 63 sectors/track, 2549 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

[root@one-iscsi ~]# blkid | grep vg--one-lv--one--446
/dev/mapper/vg--one-lv--one--446: UUID="59150b9e-d713-4e71-a3e7-a67215083386" TYPE="xfs"
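For completeness, exporting that logical volume with tgtd follows the usual tgtadm sequence. A sketch only: the target ID and the initiator network used in the ACL are illustrative, and the OpenNebula iSCSI driver normally creates the target by itself:

[root@one-iscsi ~]# tgtadm --lld iscsi --op new --mode target --tid 1 \
    -T iqn.2013-10.org.opennebula:one-iscsi.to.infn.it.vg-one.lv-one-446
[root@one-iscsi ~]# tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
    -b /dev/mapper/vg--one-lv--one--446
[root@one-iscsi ~]# tgtadm --lld iscsi --op bind --mode target --tid 1 -I 192.168.1.0/24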
OpenNebula Storage Provisioning - iSCSI target on NAS
- The iSCSI LUNs are provided by a QNAP TS-809U NAS
- Targets are configured by hand
OpenNebula Storage Provisioning - iSCSI target on NAS: network restrictions
This QNAP iSCSI NAS offers poor granularity in network filtering, so network restrictions are needed to ensure that only the KVM hypervisors can reach the targets.
OpenNebula Infrastructure - Storage Infrastructure
OpenNebula Infrastructure - Storage Infrastructure
Storage requirements:
- An available and performant shared area for the service VMs
- Lots of TB for certain applications
Solutions:
- GlusterFS
- NFS
- iSCSI
OpenNebula Infrastructure - GlusterFS
GlusterFS Volume:
- The volume is the collection of bricks, and most Gluster file system operations happen on the volume.
- GlusterFS supports different types of volumes based on the requirements: some volumes are good for scaling storage size, some for improving performance and some for both.
- A volume can be Distributed, Replicated, Striped or Dispersed, in any intelligent combination (see the sketch after this list).
- Each brick is typically an ext4/xfs partition mounted on the server.
- It is also possible to attach a hot tier to a volume (with a promotion/demotion policy).
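As a sketch, a two-way replicated volume like the VmDir volume shown later in this transcript could be created, restricted and started with the standard gluster CLI; the brick paths and the auth.allow network are taken from the Torino setup below:

# gluster volume create VmDir replica 2 transport tcp \
    one-san-01.to.infn.it:/bricks/VmDir01 one-san-02.to.infn.it:/bricks/VmDir02
# gluster volume set VmDir auth.allow 192.168.5.*
# gluster volume start VmDir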
OpenNebula Infrastructure - GlusterFS DHT
DHT stands for Distributed Hash Table. GlusterFS's DHT works according to a few basic principles:
- All operations are driven by clients, which are all equal. There are no special nodes with special knowledge of where files are or should be.
- Directories exist on all subvolumes (bricks or lower-level aggregations of bricks); files exist on only one.
- Files are assigned to subvolumes based on consistent hashing.
Two-brick simplified example:
- Brick A: hash range from 0x00000000 to 0x7fffffff
- Brick B: hash range from 0x80000000 to 0xffffffff
- A new file is created and its hash is 0xabad1dea
- The hash falls within Brick B's range, so the file's hashed location is on Brick B
OpenNebula Infrastructure - GlusterFS Rebalance
As bricks are added or removed, or files are renamed, many files can end up somewhere other than at their hashed locations. When this happens, the volumes need to be rebalanced. This process consists of two parts:
- fix-layout: calculate new layouts, according to the current set of bricks (and possibly their characteristics)
- migrate-data: migrate any "misplaced" files to their correct (hashed) locations
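Both steps map onto the gluster CLI; a sketch for a volume named VmDir:

# recalculate layouts only, without moving data
gluster volume rebalance VmDir fix-layout start
# recalculate layouts and migrate misplaced files
gluster volume rebalance VmDir start
# follow progress
gluster volume rebalance VmDir status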
OpenNebula Infrastructure - GlusterFS self-healing
In a replicated volume (minimum replica count 2) it can happen, due to some failure, that one or more of the replica bricks go down for a while. If a user deletes a file in the meantime, only the online brick is affected; when the offline brick comes back online, the file must be removed from it as well. This synchronization between the replica bricks is called healing. The pro-active self-heal daemon runs in the background, diagnoses issues and periodically initiates self-healing on the files that require it.
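Healing can also be inspected or triggered by hand; a sketch for the same volume:

# list entries that still need healing
gluster volume heal VmDir info
# trigger healing of pending entries
gluster volume heal VmDir
# force a check of every file in the volume
gluster volume heal VmDir full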
OpenNebula Infrastructure - Ceph
OpenNebula can be integrated with Ceph, a distributed object store and file system. The Ceph datastore driver gives OpenNebula users the possibility of using Ceph block devices as their Virtual Images, with some limitations:
- The driver only works with libvirt/KVM; Xen is not (yet) supported.
- The hypervisor nodes using the Ceph driver must be part of a running Ceph cluster.
- The libvirt and QEMU packages on those nodes need to be recent enough to support Ceph.
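A Ceph image datastore is registered much like the other datastores. A minimal sketch assuming the standard Ceph driver attributes; the pool name, monitor host and bridge host are illustrative placeholders, not values from this installation:

$ cat ceph_ds.conf
NAME        = ceph_images
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
CEPH_HOST   = "ceph-mon-01:6789"
BRIDGE_LIST = "one-kvm-63"
$ onedatastore create ceph_ds.conf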
OpenNebula Infrastructure - GlusterFS Volumes
GlusterFS is flexible depending on the layout configuration. It can be used for many purposes:
- Storage backend for running machines
- Storage backend for the image repository
- Remote partition for a VM (or group of VMs)
OpenNebula Infrastructure - Torino GlusterFS storage setup
[Diagram: the OpenNebula master, the KVM-SERVICES and KVM-WORKERS hypervisors, and the SRV-01/SRV-02 storage servers]
The OpenNebula server and the KVM hypervisors use the GlusterFS shared filesystem:

[oneadmin@one-master ~]$ df -h
Filesystem                      Size  Used Avail Use% Mounted on
one-san-01:/VmDir               917G  188G  683G  22% /var/lib/one/datastores/0
one-san-01:/PERSISTENT-STORAGE  2.3T  1.2T  1.1T  53% /var/lib/one/datastores/104
one-san-01:/IMAGEREPO           2.0T  643G  1.4T  32% /var/lib/one/datastores/1
one-san-01:/HOMECLOUD           574G   42G  504G   8% /users
OpenNebula Infrastructure - Torino GlusterFS Volumes

Volume Name: PERSISTENT-STORAGE
Type: Distribute
Volume ID: 1b5751dd-7d21-40dc-a214-15b8e87c6299
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: one-san-02.to.infn.it:/mnt/brick-persistor-ext4
Options Reconfigured:
auth.allow: 192.168.5.*

VmDir is the volume holding the running VMs:

Volume Name: VmDir
Type: Replicate
Volume ID: 8636b8be-89e2-45b9-aabd-03cf8fa33539
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: one-san-01.to.infn.it:/bricks/VmDir01
Brick2: one-san-02.to.infn.it:/bricks/VmDir02
Options Reconfigured:
auth.allow: 192.168.5.*
OpenNebula Infrastructure - Torino NFS storage setup
[Diagram: servers SRV-A and SRV-B exporting storage to a group of VMs and to a single VM]
If a large amount of data has to be exported to a small number of VMs, NFS is the simplest way.
OpenNebula Infrastructure - NFS storage and ebtables
With the ebtables network driver, a MAC address filter is applied at the bridge interface level. The server exports to a specific virtual network:

[root@one-dsrv-98 ~]# cat /etc/exports
/disk/cmps-home 172.16.219.0/24(rw,fsid=0,no_root_squash)
/disk/cmps-data 172.16.219.0/24(rw,no_root_squash)

The server must have an IP address in the client VMs' VNET (172.16.219.0):

[root@one-dsrv-98 ~]# ifconfig eth0:1
eth0:1    Link encap:Ethernet  HWaddr 02:00:AC:10:DB:64
          inet addr:172.16.219.100  Bcast:172.16.219.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

And the MAC address of the interface is forced into the SAME range (02:00:AC:10:DB:**):

[root@one-dsrv-98 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
MACADDR=02:00:AC:10:DB:64
ONBOOT=yes
IPADDR=192.168.1.98
NETMASK=255.255.248.0
NM_CONTROLLED=no

(eth0 carries 192.168.1.98 on the service LAN, eth0:1 carries 172.16.219.100 on the VM VNET)
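The ebtables driver enforces the MAC filter with per-interface rules on the hypervisor bridge. A sketch of the kind of rules it installs; the tap interface name vnet0 is illustrative and the actual rules are generated by OpenNebula:

# drop frames leaving through vnet0 whose source MAC is outside the VNET's MAC prefix
ebtables -A FORWARD -s ! 02:00:ac:10:db:64/ff:ff:ff:ff:ff:00 -o vnet0 -j DROP
# drop frames entering from vnet0 that spoof a MAC other than the VM's own
ebtables -A FORWARD -s ! 02:00:ac:10:db:64 -i vnet0 -j DROP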
OpenNebula Infrastructure - Storage comparison
[Table comparing ssh, NFS, iSCSI and GlusterFS in terms of simplicity, performance, redundancy and capacity; the individual ratings are not reproduced in this transcript]
The above considerations depend on the available hardware.
OpenNebula Infrastructure - Network infrastructure
Network requirements for a production infrastructure:
- Performance: many 10 Gbps servers should work together
- Reliability: modular / redundant switches
- Isolation: VLANs + ebtables, and different physical LANs:
  o Public access (193.205.66.128/25)
  o Private access (192.168.0.0/21, 172.16.*.0/24)
  o Remote administration (10.10.1.0/24)
OpenNebula Infrastructure - Network infrastructure
Network services:
- DNS + DHCP: dnsmasq + cobbler
- NAT: iptables, OpenWrt
- The KVM hypervisors are installed via cobbler and puppet
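The NAT part reduces to the usual masquerading rule on the gateway. A sketch for one of the private tenant networks listed earlier; the outgoing interface name is illustrative:

# enable forwarding and masquerade the tenant network towards the public LAN
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 172.16.219.0/24 -o eth0 -j MASQUERADE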
OpenNebula Infrastructure - Network infrastructure
Although the OpenNebula design and installation guides recommend a dedicated network to manage and monitor the hypervisors and to move image files, in the running installation the service and instance networks coexist. The tenants' VNETs are also on the same physical LAN, and the VRouters share the same KVM network interface.
OpenNebula Infrastructure - Public Network Connection
[Diagram: the public network connected to the KVM hosts and to KVM-SRV; VRouters for UsrA and UsrB and a NAT gateway link the worker nodes (WN) on the private network(s)]
- Network-demanding tenants (WN) have NAT access
- Other tenants have their own VRouter that shares the KVM-SRV network interface