Highly Available ESGF Services for the Copernicus Climate Data Store


The Copernicus Climate Data Store (CDS) is a crucial component of the Copernicus Climate Change Service (C3S), operated by ECMWF for the European Union. It offers a single interface to a wide range of climate-related data and provides key indicators of climate change in support of all sectors. To keep user-facing search and download services highly available, the ESGF partners run a geographically distributed architecture, with DNS load-balancing across multiple sites and data replication between them.



Presentation Transcript


  1. Highly Available ESGF Services for the Copernicus Climate Data Store
  Matt Pryor, Phil Kershaw, Alan Iwi (CEDA); Sebastien Gardoll (IPSL); Carsten Ehbrecht (DKRZ); Luca Cinquini (NASA JPL/UCAR)
  ESGF Container Working Group
  ESGF F2F, Washington D.C., December 2018

  2. Contents
  - Context
    - What is the Copernicus Climate Data Store?
    - Requirements for data discovery and download services
  - Load-balanced Architecture
    - Overview
    - Challenges and Compromises
  - Containerised ESGF services
    - Motivation
    - Current state
    - Challenges and Solutions
    - Future work

  3. CONTEXT

  4. Context: What is the Copernicus Climate Data Store?
  - The Climate Data Store (CDS) is part of the Copernicus Climate Change Service (C3S)
  - C3S is operated by ECMWF on behalf of the European Union
  - It aims to provide key indicators of climate change drivers, supporting all sectors
  - The CDS provides a single, freely available interface to a range of climate-related observations and simulations
  - Data comes from a wide range of sources at many participating organisations: in-situ observations, models, reanalyses and satellite products

  5. Context: Requirements for data discovery and download services
  - CEDA, IPSL and DKRZ are to provide a quality-controlled subset of CMIP5 for use with the CDS, using ESGF services
  - User-facing services (e.g. search and download) must be highly available at a single set of URLs
    - >= 98% uptime, i.e. at most ~7 days of downtime per year
  - Publishing is not subject to this restriction
  - No single site can meet this requirement on its own, so a geographically distributed, load-balanced service is required
  - This is not the same as the traditional federated approach: some inconsistency is accepted as a trade-off for high availability

  6. CURRENT ARCHITECTURE

  7. Load-balanced Architecture: Overview
  - DNS load-balancing across the three sites (CEDA, IPSL, DKRZ), sketched in code below
    - Each DNS query returns an A record for an available site, chosen at random
    - Available sites are determined by a health check
    - A short time-to-live (TTL) means clients perform lookups regularly
    - No need for a proxy server, which would be a single point of failure
    - Uses a cloud-based DNS service (Amazon Route 53)
  - A data node and a slave index node run at each site
  - A separate master index node is used for publishing
    - Publishing does not have to be highly available
    - Replication to the slaves is turned off during publishing
  - Data replication between sites uses Synda
  [Diagram: end users resolve the service through the DNS service, which directs them to the slave index node and data node at one of CEDA, IPSL or DKRZ; the publisher writes to the master index node, which replicates to the slave index nodes; Synda replicates data between the data nodes. Legend: normal operation, publication, data replication.]
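The health-checked, randomised DNS scheme above can be scripted against the Route 53 API. The following is a minimal sketch using boto3; the hosted zone ID, domain name, site addresses and health-check path are illustrative placeholders, not values from the actual deployment.

```python
# Minimal sketch of the DNS load-balancing scheme using boto3 and Amazon
# Route 53. Zone ID, domain, addresses and paths are hypothetical.
import boto3

route53 = boto3.client("route53")

HOSTED_ZONE_ID = "Z0000000EXAMPLE"      # hypothetical hosted zone
SERVICE_NAME = "cds-esgf.example.org."  # the single set of URLs users see
SITES = {
    "ceda": "192.0.2.10",  # placeholder addresses for the three sites
    "ipsl": "192.0.2.20",
    "dkrz": "192.0.2.30",
}

changes = []
for site, address in SITES.items():
    # Health check: Route 53 only serves records whose check is passing,
    # so a failed site drops out of the rotation automatically.
    check = route53.create_health_check(
        CallerReference=f"{site}-esgf-search",
        HealthCheckConfig={
            "IPAddress": address,
            "Port": 443,
            "Type": "HTTPS",
            "ResourcePath": "/esg-search/search",  # assumed health endpoint
            "RequestInterval": 30,
            "FailureThreshold": 3,
        },
    )
    # Equal weights give a random choice among the available sites; the
    # short TTL makes clients re-resolve frequently.
    changes.append({
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": SERVICE_NAME,
            "Type": "A",
            "SetIdentifier": site,
            "Weight": 1,
            "TTL": 60,
            "ResourceRecords": [{"Value": address}],
            "HealthCheckId": check["HealthCheck"]["Id"],
        },
    })

route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={"Changes": changes},
)
```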

  8. Load-balanced Architecture: Challenges and Compromises
  - To maintain high availability while publishing, some consistency must be sacrificed
    - Data may be available for download via THREDDS at one site but not at others
    - Slave indexes may be inconsistent for a period after publication to the master index
  - Data replication via Synda needs to target a specific data node
    - This requires modifying Solr records after the initial publication (see the sketch below)
    - Catalog paths generated during publication are non-deterministic; a patch from Alan Iwi (CEDA) uses the DRS in the path instead of an integer
  - DNS load-balancing is not perfect
    - It relies on clients respecting the TTL for correct behaviour
    - It relies on a third-party service (running your own DNS server is difficult)
    - More sophisticated routing algorithms and health checks are considerably more expensive on cloud-based providers
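Retargeting a replicated record can be done with Solr's atomic update API. Below is a hedged sketch using the requests library; the core name, document id and field name are assumptions for illustration rather than the exact ESGF index schema.

```python
# Sketch of rewriting a Solr record after Synda replication so that it
# points at the local data node. Core, id and field names are assumed.
import requests

SOLR_UPDATE = "http://localhost:8983/solr/datasets/update"

def retarget_dataset(dataset_id: str, new_data_node: str) -> None:
    """Atomically update the data_node field of a single Solr document."""
    doc = {
        "id": dataset_id,
        # Solr atomic update: 'set' replaces the stored value in place,
        # leaving the rest of the document untouched.
        "data_node": {"set": new_data_node},
    }
    resp = requests.post(
        SOLR_UPDATE,
        params={"commit": "true"},
        json=[doc],
        timeout=30,
    )
    resp.raise_for_status()

# Hypothetical dataset id and target node, for illustration only.
retarget_dataset(
    "cmip5.output1.MOHC.HadGEM2-ES.rcp45.mon.atmos.Amon.r1i1p1|esgf.example.org",
    "esgf-data.example.org",
)
```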

  9. CONTAINERISED ESGF SERVICES

  10. Containerised ESGF Services: Motivation for Containers
  - Containers simplify installation: a container encapsulates an application and its dependencies as a single unit, so there is no more dependency hell
  - Containers increase confidence: a container is packaged once and used multiple times, so the same code runs in test and production
  - Containers increase portability: a container can run anywhere there is a Linux kernel
  - Containers encourage modularity: each container runs a single application, and containers work together to provide an integrated system
  - Containers allow better use of resources: higher density than a VM per application, more isolation than processes on a shared host
  [Diagram: comparison of a traditional installation (applications sharing libraries on a host operating system), a virtualised installation (a guest OS and application per VM on a hypervisor) and a containerised installation (isolated process spaces with their own libraries on a shared host OS).]

  11. Containerised ESGF Services: Motivation for Kubernetes
  - Containers excel when used with an orchestrator: automated management of containerised applications across a cluster
  - Kubernetes is now the de facto standard
  - Resilience and scaling are core features of the platform:
    - Zero-downtime rolling upgrades
    - In-cluster service discovery and load-balancing
    - Storage abstraction
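To make the rolling-upgrade behaviour concrete, the sketch below creates a Deployment with the official Kubernetes Python client. The namespace, labels and image are hypothetical; the project's Helm charts express the equivalent in YAML templates.

```python
# Sketch of a zero-downtime rolling upgrade using the official Kubernetes
# Python client. Namespace, labels and image name are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # use the local kubeconfig for cluster access

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "esgf-search", "namespace": "esgf"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "esgf-search"}},
        # Never take a pod down before its replacement is ready, so the
        # service stays available throughout an upgrade.
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
        },
        "template": {
            "metadata": {"labels": {"app": "esgf-search"}},
            "spec": {
                "containers": [{
                    "name": "search",
                    "image": "esgf/search:latest",  # hypothetical image
                    "ports": [{"containerPort": 8080}],
                }],
            },
        },
    },
}

client.AppsV1Api().create_namespaced_deployment(namespace="esgf", body=deployment)
```

Updating the image in this Deployment then triggers the rolling upgrade: Kubernetes replaces pods one at a time while the in-cluster Service keeps load-balancing across whichever pods are ready.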

  12. Containerised ESGF Services: Current State
  - https://github.com/ESGF/esgf-docker
  - All core ESGF services have been containerised
    - Currently no support for GridFTP/Globus, the node manager or the dashboard
    - MyProxy is deprecated in favour of the SLCS
  - Single-node deployment using Docker Compose is working
  - Kubernetes deployment using Helm charts is working
    - Each Tomcat and Django application is fully self-contained
    - SSL termination and client authentication are handled by an Nginx proxy
  - Container images are built, tested and pushed by Jenkins for every commit to master and devel (see the sketch below); thanks to Sebastien Gardoll (IPSL)
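The per-commit flow is driven by Jenkins; purely as an illustration, the sketch below walks the same build, smoke-test and push steps with the Docker SDK for Python. The repository name, tag and test command are placeholders.

```python
# Illustrative build/test/push cycle, mirroring what the Jenkins pipeline
# does for each commit. Image name, tag and the smoke test are placeholders.
import docker

d = docker.from_env()

IMAGE = "esgf/index-node"  # hypothetical repository
TAG = "devel"              # branch being built

# Build the image from the Dockerfile in the current directory.
image, build_log = d.images.build(path=".", tag=f"{IMAGE}:{TAG}")

# Run a throwaway container as a smoke test before publishing.
output = d.containers.run(
    f"{IMAGE}:{TAG}",
    command=["python", "-c", "print('ok')"],  # stand-in for the test suite
    remove=True,
)
assert output.strip() == b"ok"

# Push the tested image to the registry.
d.images.push(IMAGE, tag=TAG)
```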

  13. Containerised ESGF Services: Challenges and Solutions
  - Containerisation is a very different paradigm to the traditional monolithic installer
    - Shared configuration files in the traditional installer are difficult to untangle for each application
    - The initial implementation by Luca made large steps towards addressing this problem
  - The initial implementation closely followed the traditional installer; it has since been refactored to be more cloud-native
    - No need for process managers like supervisord
    - Official base containers are used where possible, reducing container bloat

  14. Containerised ESGF Services: Challenges and Solutions (continued)
  - ESGF applications have multiple responsibilities
    - They could be refactored to better suit a micro-services architecture
    - This would allow better use of the scaling features in Kubernetes
  - SSL client authentication
    - Kubernetes has no native support for SSL client authentication
    - The current solution requires a proxy container to perform the SSL handshake
    - Ideally, we would let Kubernetes handle ingress itself
    - SSL certificates could be replaced with OAuth tokens for authentication (see the sketch below)
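To illustrate the OAuth alternative, the sketch below validates a bearer token against an RFC 7662 token introspection endpoint. The endpoint URL and client credentials are hypothetical; in practice such a check would sit in the ingress or a thin auth service rather than in every application.

```python
# Sketch of bearer-token validation via OAuth 2.0 token introspection
# (RFC 7662), as a possible replacement for SSL client certificates.
# The endpoint and client credentials are hypothetical.
import requests

INTROSPECT_URL = "https://slcs.example.org/oauth/introspect"
CLIENT_ID = "esgf-data-node"    # hypothetical OAuth client
CLIENT_SECRET = "not-a-real-secret"

def token_is_valid(token: str) -> bool:
    """Ask the authorisation server whether the presented token is active."""
    resp = requests.post(
        INTROSPECT_URL,
        data={"token": token},
        auth=(CLIENT_ID, CLIENT_SECRET),  # client authentication
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("active", False)

# A request handler would call token_is_valid() on the Authorization
# header and reject the request with 401 if it returns False.
```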

  15. Containerised ESGF Services: Future Work
  - More flexible deployment: work is currently underway to support partial deployments
  - Build Tomcat applications from source
    - Pre-built WARs are currently included from the ESGF distribution site at build time
    - Tomcat applications should instead be built from source at a particular version
    - Also useful for testing (e.g. building an image from a dev branch)
  - Implement more of the ESGF test suite for the Docker build
  - Feature parity with the traditional installer, subject to specific deprecations
  - Automated publication using Kubernetes jobs (see the sketch below)
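As this item is future work, the following is only a hedged sketch of what publication as a Kubernetes Job might look like, again using the Python client; the publisher image and command arguments are placeholders.

```python
# Sketch of automated publication as a Kubernetes Job. The image and
# command are placeholders: this is future work, not the current setup.
from kubernetes import client, config

config.load_kube_config()

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "publish-cmip5-batch", "namespace": "esgf"},
    "spec": {
        "backoffLimit": 2,  # retry a failed publication at most twice
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "publisher",
                    "image": "esgf/publisher:latest",  # hypothetical image
                    # Placeholder arguments standing in for a real mapfile
                    "command": ["esgpublish", "--map", "/maps/batch.map"],
                }],
            },
        },
    },
}

client.BatchV1Api().create_namespaced_job(namespace="esgf", body=job)
```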

  16. QUESTIONS
