Advanced Cloud Computing Solutions with DIRAC Services
An overview of DIRAC services at IN2P3: service maintenance and operation, VM scheduling and contextualization, dynamic VM spawning, cloud endpoint abstraction, virtual machine monitoring for efficient resource allocation, and the integration of HPC centers.
Services
- FG-DIRAC
  - Maintenance and operation: practically all DIRAC@IN2P3 members are involved
  - How can this be presented to the benefit of the DIRAC@IN2P3 project? As a testing ground?
- DIRAC4EGI
  - CPPM, together with UB and Cyfronet, offered to maintain the service; awaiting the EGI answer
  - Should DIRAC@IN2P3 be involved? It could be a playground for various activities, e.g. cloud management, COMDIRAC, data management
- FG-DIRAC beyond France-Grilles
  - Merge FG-DIRAC and DIRAC4EGI: keep them logically separate but technically a single service
  - Service administration tools should be further developed
  - Part of the DIRAC@IN2P3 contract?
Clouds
- VM scheduler developed for the Belle MC production system
  - Dynamic VM spawning that takes Amazon EC2 spot prices and the Task Queue state into account
  - VMs are discarded automatically when no longer needed
- Through dedicated VM Directors, the DIRAC VM scheduler is interfaced to:
  - OCCI-compliant clouds: OpenStack, OpenNebula
  - Apache-libcloud API compliant clouds
  - Amazon EC2
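The spawning rule described above can be illustrated with a minimal sketch. It assumes a single cloud, a fixed number of payload slots per VM and a flat spot-price ceiling; the names `CloudState` and `plan_vm_changes` are hypothetical and are not part of the DIRAC VM scheduler.

```python
from dataclasses import dataclass


@dataclass
class CloudState:
    """Hypothetical snapshot of what the scheduler sees in one decision cycle."""
    waiting_jobs: int      # jobs waiting in the Task Queue for this cloud
    running_vms: int       # VMs currently alive
    idle_vms: int          # alive VMs with no running payload
    spot_price: float      # current EC2 spot price, assumed to be known
    max_price: float       # highest spot price the community accepts
    jobs_per_vm: int = 1   # payloads one VM can run concurrently (e.g. its cores)
    max_vms: int = 100     # hard cap on the VM pool


def plan_vm_changes(state: CloudState) -> tuple[int, int]:
    """Return (vms_to_start, vms_to_stop) for one scheduling cycle."""
    # No waiting work: discard the idle VMs automatically.
    if state.waiting_jobs == 0:
        return 0, state.idle_vms
    # Do not spawn while the spot price is above the accepted limit.
    if state.spot_price > state.max_price:
        return 0, 0
    # Idle VMs absorb part of the waiting work; ceil-divide the remainder.
    uncovered = max(0, state.waiting_jobs - state.idle_vms * state.jobs_per_vm)
    needed = -(-uncovered // state.jobs_per_vm)
    room = max(0, state.max_vms - state.running_vms)
    return min(needed, room), 0
```

For example, 10 waiting jobs, one idle 4-slot VM and an acceptable price give `(2, 0)`: the idle VM covers 4 jobs and two new VMs cover the remaining 6.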
VMDIRAC 2: VM submission
- Cloud endpoint abstraction
  - Implementations: Apache-libcloud, rOCCI, EC2
- CloudDirector, similar to the SiteDirector
- To do:
  - Cloud endpoint testing/monitoring tools for site debugging
  - Follow the evolution of the endpoint interfaces
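The endpoint abstraction and the director loop can be sketched as an abstract base class with interchangeable implementations. This is an illustration under assumptions. `CloudEndpoint`, `InMemoryEndpoint` and `director_cycle` are hypothetical names. The in-memory class only stands in for a real backend, such as one wrapping an Apache-libcloud driver.

```python
import itertools
from abc import ABC, abstractmethod


class CloudEndpoint(ABC):
    """Hypothetical uniform interface hiding the cloud-specific API
    (Apache-libcloud, rOCCI, EC2, ...) behind the same methods."""

    def __init__(self, name: str, max_instances: int):
        self.name = name
        self.max_instances = max_instances

    @abstractmethod
    def create_instances(self, count: int, user_data: str) -> list[str]:
        """Start `count` VMs with the given contextualization data; return their IDs."""

    @abstractmethod
    def stop_instance(self, instance_id: str) -> None:
        """Terminate one VM."""

    @abstractmethod
    def list_instances(self) -> list[str]:
        """Return the IDs of the VMs currently running on this endpoint."""


class InMemoryEndpoint(CloudEndpoint):
    """Test stand-in; a real implementation would call the cloud's API."""

    _ids = itertools.count(1)  # shared counter so IDs are unique across endpoints

    def __init__(self, name: str, max_instances: int):
        super().__init__(name, max_instances)
        self._running: dict[str, str] = {}

    def create_instances(self, count, user_data):
        new = [f"{self.name}-{next(self._ids)}" for _ in range(count)]
        for vm_id in new:
            self._running[vm_id] = user_data
        return new

    def stop_instance(self, instance_id):
        self._running.pop(instance_id, None)

    def list_instances(self):
        return list(self._running)


def director_cycle(endpoints, vms_wanted: int, user_data: str) -> dict[str, list[str]]:
    """One pass of a hypothetical CloudDirector. Like a SiteDirector filling
    queues, it fills the endpoints up to their free capacity."""
    started = {}
    for ep in endpoints:
        if vms_wanted <= 0:
            break
        free = ep.max_instances - len(ep.list_instances())
        n = min(free, vms_wanted)
        if n > 0:
            started[ep.name] = ep.create_instances(n, user_data)
            vms_wanted -= n
    return started
```

Because the director only talks to `CloudEndpoint`, adding another cloud type means adding one subclass, with no change to the director logic.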
VMDIRAC 2: VM contextualization (current)
- Standard minimal images
  - No DIRAC-specific images and hence no image maintenance costs; relies on the cloud-init mechanism only
- A passwordless certificate passed as user data
  - The mardirac.in2p3.fr host certificate
- Bootstrapping scripts similar to those of LHCb Vac/Vcycle
  - Using pilot 2.0
  - On-the-fly installation of DIRAC and CVMFS; this takes time and can be improved with custom images
- Starting the VirtualMachineMonitorAgent
  - Monitors and reports the VM state and VM heartbeats
  - Halts the VM in case of no activity
  - Gets instructions from the central service, e.g. to halt the VM
- Starting as many pilots as there are cores (single-core jobs)
- Starting one pilot for
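This is a minimal sketch of the contextualization payload, assuming cloud-init's multipart user-data format. One `text/cloud-config` part writes the passwordless host certificate and key to disk with the `write_files` module. One `text/x-shellscript` part starts one single-core pilot per core. The file paths, the `start_pilot.sh` script and its `--slot` option are hypothetical placeholders, not the actual VMDIRAC bootstrap.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(cert_pem: str, key_pem: str, bootstrap: str) -> str:
    """Assemble multipart cloud-init user data: a cloud-config part that writes
    the host certificate and key, and a shell-script part that runs the bootstrap."""
    # Hypothetical target paths; a real deployment chooses its own.
    cloud_config = "\n".join([
        "#cloud-config",
        "write_files:",
        "  - path: /etc/grid-security/hostcert.pem",
        "    permissions: '0644'",
        "    content: |",
        *("      " + line for line in cert_pem.splitlines()),
        "  - path: /etc/grid-security/hostkey.pem",
        "    permissions: '0400'",
        "    content: |",
        *("      " + line for line in key_pem.splitlines()),
    ])
    msg = MIMEMultipart()
    msg.attach(MIMEText(cloud_config, "cloud-config"))  # Content-Type: text/cloud-config
    msg.attach(MIMEText(bootstrap, "x-shellscript"))    # Content-Type: text/x-shellscript
    return msg.as_string()


def bootstrap_script(n_cores: int) -> str:
    """Hypothetical bootstrap: one single-core pilot per core, run in the background."""
    lines = ["#!/bin/bash"]
    lines += [f"/opt/dirac/start_pilot.sh --slot {i} &" for i in range(n_cores)]
    lines.append("wait")  # keep the script alive until all pilots exit
    return "\n".join(lines) + "\n"
```

The returned string would be passed as the user data when the VM is created. Installing DIRAC and CVMFS on the fly would happen inside the pilot script.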
VMDIRAC 2: VM contextualization (in the works)
- Bootstrapping scripts shared with the recently introduced Pilot package
- A single pilot per VM, capable of running multiple payloads, single- or multi-core
  - Same logic as for multi-core queues
- Enhanced VMMonitor agent logic
  - Halting on no activity
  - Signaling pilots to stop
  - Machine/Job Features
- The goal: a fully functional, dynamic cloud resource allocation system that takes group fair shares into account
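The enhanced halting logic can be sketched as a small state machine that the monitor agent would evaluate on every cycle. This is an illustration under assumptions. The thresholds and the `"halt"` instruction string are hypothetical. So is the two-stage drain, which first signals the pilots to stop and then halts the VM. None of this is the VMMonitor agent's actual implementation, and heartbeat reporting is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"
    STOP_PILOTS = "stop_pilots"  # ask pilots to finish and not fetch new payloads
    HALT = "halt"                # shut the VM down


@dataclass
class MonitorState:
    idle_since: Optional[float] = None  # time the VM was first seen without payloads
    pilots_signalled: bool = False


def monitor_step(state: MonitorState, now: float, running_payloads: int,
                 central_instruction: Optional[str] = None,
                 idle_grace: float = 600.0, drain_grace: float = 300.0) -> Action:
    """Decide what the VM should do at time `now` (in seconds)."""
    # An explicit instruction from the central service takes precedence.
    if central_instruction == "halt":
        return Action.HALT
    # Any activity resets the idle clock.
    if running_payloads > 0:
        state.idle_since = None
        state.pilots_signalled = False
        return Action.CONTINUE
    if state.idle_since is None:
        state.idle_since = now
        return Action.CONTINUE
    idle_time = now - state.idle_since
    # Stage 1: idle for too long, so tell the pilots to stop.
    if not state.pilots_signalled and idle_time >= idle_grace:
        state.pilots_signalled = True
        return Action.STOP_PILOTS
    # Stage 2: the pilots had time to exit, so halt the VM.
    if state.pilots_signalled and idle_time >= idle_grace + drain_grace:
        return Action.HALT
    return Action.CONTINUE
```

Separating "signal pilots" from "halt" gives a pilot that is between payloads a chance to exit cleanly before the machine disappears.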
VMDIRAC 2: VM web application
- Enhanced monitoring and accounting
  - No Google tools!
- VM manipulation by administrators
  - Start, halt, and other instructions to the VMMonitor agent
- Possibility to connect to a VM to debug problems
  - Web terminal console
  - On-the-fly public IP assignment
The supercomputer case
- Multiple HPC centers are available to large scientific communities
  - E.g., HEP experiments have started to gain access to a number of HPC centers:
    - Using traditional HTC applications
    - Filling in the gaps of empty slots
    - Including HPC in their data production systems
- Advantages of federating HPC centers
  - More users and applications for each center, hence better efficiency of usage
  - Elastic usage: users can have more resources for a limited time period
- Example: Partnership for Advanced Computing in Europe (PRACE)
  - Common agreements on sharing HPC resources
  - No common interware for uniform access
The supercomputer case
- Unlike grid sites, HPC centers are not uniform:
  - Different access protocols
  - Different user authentication methods
  - Different batch systems
  - Different connectivity to the outside world
- To include HPC centers in a common infrastructure, we have to find a way to overcome these differences
  - Pilot agents can be very helpful here
  - This needs effort from both the interware and the HPC center sides
HPC example
- The pilot is submitted to the batch system through a (GSI)SSH tunnel
- The pilot communicates with the DIRAC service through the Gateway proxy service
- Output is uploaded to the target SE through the SE proxy
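For the submission step, here is a minimal sketch that assumes a Slurm batch system and plain OpenSSH. The pilot script is submitted with `sbatch` on the remote login node, and the job ID is parsed from sbatch's `Submitted batch job <id>` message. The Gateway and SE proxies are not shown. The user, host and script path in the usage are placeholders, and the function names are hypothetical.

```python
import re
import shlex


def ssh_submit_command(user: str, host: str, pilot_script: str,
                       batch_submit: str = "sbatch") -> list[str]:
    """Build the argv that submits a pilot script to a remote batch system over SSH.
    `batch_submit` defaults to Slurm's sbatch as an example."""
    # ssh joins the remote arguments into one shell command line on the
    # remote host, so quote them explicitly.
    remote_cmd = shlex.join([batch_submit, pilot_script])
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{host}", remote_cmd]


def parse_slurm_job_id(stdout: str) -> str:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {stdout!r}")
    return match.group(1)
```

To actually submit, the command could be run with `subprocess.run(cmd, capture_output=True, text=True, check=True)`, passing the result's `stdout` to `parse_slurm_job_id`. Where GSI authentication is required, `gsissh` would presumably take the place of `ssh`, depending on the site's configuration.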
Co-design problem of distributed HPC
- Common requirements for HPC centers
  - Outside-world connectivity
  - User authentication
    - SSO schema with federated identity providers
    - Users representing whole communities
  - Application software provisioning
  - Monitoring, accounting
    - Can be delegated to the interware level
- Support from the interware
  - A common model for describing HPC resources
  - Algorithms for HPC workload management with more complex payload requirement specifications
  - Uniform user interface
- Support from applications
  - Allow running in multiple HPC centers, e.g. standardized MPI libraries
  - Granularity
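A common model for HPC resource description could start as simply as the sketch below. It matches payload requirements (nodes, cores, memory, outbound connectivity, MPI flavour) against per-center descriptions. All class, field and function names are hypothetical, and a real model would need many more attributes. A center without outbound connectivity could still serve network-dependent payloads through a gateway proxy, as in the HPC example, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HPCResource:
    """Hypothetical uniform description of one HPC center's partition."""
    site: str
    batch_system: str          # e.g. "SLURM", "PBS"
    nodes: int
    cores_per_node: int
    mem_per_node_gb: int
    outbound_network: bool     # can jobs reach the outside world directly?
    mpi_flavours: frozenset = frozenset()


@dataclass(frozen=True)
class PayloadRequirements:
    """Hypothetical payload requirement specification (multi-node capable)."""
    nodes: int = 1
    cores_per_node: int = 1
    mem_per_node_gb: int = 1
    needs_outbound_network: bool = False
    mpi: Optional[str] = None


def matches(res: HPCResource, req: PayloadRequirements) -> bool:
    """True if the resource can satisfy every requirement of the payload."""
    return (req.nodes <= res.nodes
            and req.cores_per_node <= res.cores_per_node
            and req.mem_per_node_gb <= res.mem_per_node_gb
            and (res.outbound_network or not req.needs_outbound_network)
            and (req.mpi is None or req.mpi in res.mpi_flavours))


def eligible_sites(resources: list[HPCResource], req: PayloadRequirements) -> list[str]:
    """Sites where the payload could run, in the order given."""
    return [r.site for r in resources if matches(r, req)]
```

Keeping the description declarative lets the same matching code serve every center, whatever its batch system or access protocol.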
Towards Open Distributed Supercomputer Infrastructure
- A common project involving several supercomputer centers:
  - Lobachevsky, NNU
  - HybriLIT, JINR, Dubna
  - CC/IN2P3, Lyon
  - Mesocenter, AMU, Marseille
  - LRZ
- The goal is to provide the components necessary to include supercomputers in a common infrastructure
  - Together with other types of resources
  - Based on the DIRAC interware technology
- Several centers are already connected
  - Simple grid-like applications, multi-core applications
  - Multi-processor, multi-node applications are in the works
Publications
- Workflows
  - High-level workflow treatment
  - Metadata in workflows
- Big Data??
- HPC
  - WMS for HPC (reservation, masonry, multi-core, multi-host)
  - WMS for hybrid HPC/HTC/Cloud systems
- Clouds
  - Managing cloud resources with community policies/shares/quotas
- COMDIRAC
  - Interface to a distributed computer (FSDIRAC included?)