Transitioning Proof of Concept Services to Production at Scale with MMG in ELIXIR

Explore the journey of MMG from proof of concept to production services within ELIXIR, focusing on the challenges faced, such as resource allocation, technology integration, and service stability. Discover the META-pipe analysis service, its architecture, front-end technical solutions, and policies. Gain insights into future plans and other MMG/WP6 activities as they navigate the landscape of service deployment and platform integration in a research infrastructure setting.





Presentation Transcript


1. MMG: from proof-of-concept to production services at scale
Lars Ailo Bongo (ELIXIR-NO, WP6)
WP4 F2F, 8-9 February 2017, Stockholm, Sweden
ELIXIR-EXCELERATE is funded by the European Commission within the Research Infrastructures programme of Horizon 2020, grant agreement number 676559. www.elixir-europe.org/excelerate

2. MMG on ELIXIR compute clouds
Proof-of-concept:
- META-pipe on cPouta
- EMG on Embassy cloud
- Webinar: ELIXIR Compute Platform Roadmap
TODO:
- Test META-pipe and EMG at scale
- Deploy META-pipe and EMG production services on cloud
- Document best practices
- Integrate META-pipe and EMG with other ELIXIR platforms
- Incorporate other MMG pipelines such as BioMaS
Issues:
- Missing policies: who is paying for resources? Which resources can different users use?
- Missing technology: how to do accounting? How to ensure a stable service?

3. Outline
- META-pipe: user feedback
- ELIXIR compute TUCs and other components used (need your help here)
- Future plans
- Other MMG/WP6 activities
- EMG presentation to follow

4. META-pipe: analysis as a service
1. Login
2. Upload data
3. Select analysis tool parameters
4. Execute analysis
5. Download results

5. META-pipe: architecture
[Architecture diagram] Front-ends (web front-end, CLI tool) talk to a Public API, which dispatches jobs to Execution Managers in different execution environments: Stallo, CSC, ICE-2, anywhere else?
Backend services:
- Auth (ELIXIR AAI): tokens, authentication events
- Storage: inputs/uploads, outputs/downloads
- Job Service: job queue, execution status
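The Job Service box in the diagram above reduces to two pieces of bookkeeping: a FIFO job queue and per-job execution status. A minimal sketch of that bookkeeping, with illustrative names only (the actual META-pipe Job Service API and schema are not shown in the slides):

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical fields; the real Job Service schema is not shown in the slides.
    job_id: str
    user: str
    parameters: dict
    status: str = "queued"  # queued -> running -> done / failed


class JobService:
    """Minimal job queue plus execution-status tracking."""

    def __init__(self):
        self._queue = queue.Queue()  # FIFO job queue
        self._jobs = {}              # job_id -> Job (execution status)

    def submit(self, user, parameters):
        """Called by the Public API when a front-end submits an analysis."""
        job = Job(job_id=str(uuid.uuid4()), user=user, parameters=parameters)
        self._jobs[job.job_id] = job
        self._queue.put(job.job_id)
        return job.job_id

    def next_job(self):
        """Called by an Execution Manager to claim the next queued job."""
        job = self._jobs[self._queue.get()]
        job.status = "running"
        return job

    def report(self, job_id, status):
        """Execution Manager reports completion or failure."""
        self._jobs[job_id].status = status

    def status(self, job_id):
        """Front-end polls execution status."""
        return self._jobs[job_id].status
```

A front-end would call `submit` and poll `status`, while each Execution Manager loops on `next_job` and `report`; a production service would back this with persistent storage rather than in-memory state.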

6. META-pipe: front-end technical solutions
1. Login: authorization server integrated with ELIXIR AAI
2. Upload data: Incoming! web app library; META-pipe storage server
3. Select analysis parameters: META-pipe web app
4. Execute analysis: META-pipe job queue; META-pipe execution environment
5. Download result: Incoming! web app library; META-pipe storage server

7. META-pipe: front-end policies
1. Login: all ELIXIR users can log in, which gives (user, home institution)
   - Who can pay for the resources?
   - Who is allowed to use tools and resources (academic vs industry)?
2. Upload data:
   - Data size determines computation requirements
   - Small for free? Medium on pre-allocated? Large as special case?
3. Select analysis parameters / execute analysis:
   - Which resource to use? Who decides? Commercial clouds?
   - Scheduling/prioritization of jobs?
   - Response time guarantees?
   - Who is responsible for maintaining and monitoring resources?
4. Download result:
   - Private vs (eventually) public?

8. META-pipe (and EMG): backend layers
- Pipeline tools & DBs: META-pipe
- Pipeline specification: Spark program
- Analysis engine: Spark, NFS
- Cloud setup: cPouta ansible playbook

9. META-pipe: cloud execution
Pipeline tools & reference DBs:
- Mostly 3rd-party binaries
- Hundreds of GB of reference DBs
- Packaged in META-pipe Jenkins server
- Not in a container/VM (no benefits for now)
- TODO: standardize description/provenance data reporting (WP4?)
- TODO: summarize best practices (WP4/?)
Spark program:
- Regular Spark program + abstractions/interfaces for running 3rd-party binaries
- TODO: better error detection, logging, and handling (WP6)
- TODO: more secure execution (WP6/WP4)
- TODO: accounting and payment (WP4)
- TODO: use our approach for other pipelines? (WP4)
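The "better error detection, logging, and handling" TODO above comes down to wrapping each 3rd-party binary invocation so that exit codes and stderr are captured and logged rather than lost. A minimal stand-alone sketch of that wrapper pattern (META-pipe's real abstractions live inside its Spark program; the function name here is illustrative):

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("metapipe.exec")


def run_tool(cmd, timeout=None):
    """Run a 3rd-party binary, log the invocation, and fail loudly.

    Returns captured stdout; raises RuntimeError carrying the tool's
    stderr on a non-zero exit code instead of silently continuing.
    """
    log.info("running: %s", " ".join(cmd))
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"{cmd[0]} timed out after {timeout}s") from exc
    if proc.returncode != 0:
        log.error("%s failed (exit %d): %s",
                  cmd[0], proc.returncode, proc.stderr)
        raise RuntimeError(f"{cmd[0]} exited {proc.returncode}: {proc.stderr}")
    return proc.stdout
```

In a Spark program this wrapper would run per partition on worker nodes, so a failing tool surfaces as a task failure with the tool's stderr attached instead of producing empty output.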

10. META-pipe: cloud execution
Spark, NFS execution environment:
- Standalone Spark
- NFS, since some tools need a shared file system
- TODO: optimize execution environments (WP6/WP4)
- TODO: test scalability (WP6/WP4)
- ?: integrate META-pipe storage server with ELIXIR storage & transfer
cPouta ansible playbook:
- Sets up the Spark and NFS execution environment on cPouta OpenStack
- Ongoing work: set up execution environment on OpenNebula (CZ)
- TODO: port to other clouds (WP4?)
- TODO: provide best practice guidelines (WP4)
- TODO: long-term maintenance of setup tools (?)
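The cPouta playbook itself is not shown in the slides. As an illustration of the shape such a Spark + NFS setup playbook might take (host group, module, and path names here are assumptions, not the actual META-pipe playbook):

```yaml
# Hypothetical sketch of a Spark + NFS setup playbook; all names illustrative.
- hosts: spark_master
  become: true
  tasks:
    - name: Install the NFS server for the shared work directory
      ansible.builtin.apt:
        name: nfs-kernel-server
        state: present
    - name: Export the shared directory that pipeline tools read and write
      ansible.builtin.lineinfile:
        path: /etc/exports
        line: "/data *(rw,sync,no_subtree_check)"
      notify: reload nfs exports
  handlers:
    - name: reload nfs exports
      ansible.builtin.command: exportfs -ra

- hosts: spark_workers
  become: true
  tasks:
    - name: Mount the shared file system required by some pipeline tools
      ansible.posix.mount:
        src: "{{ hostvars[groups['spark_master'][0]].ansible_host }}:/data"
        path: /data
        fstype: nfs
        state: mounted
```

Porting to another cloud (the TODO above) then mostly means regenerating the inventory for that provider while the roles stay unchanged.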

11. WP6 deliverables
The comprehensive metagenomics standards environment:
- Paper to be submitted on Friday
- Provenance of sampling standard
- Provenance of sequencing standard
- Provenance of analysis best practices
- Archiving of analysis discussion
Marine metagenomics portal (MMP): https://mmp.sfb.uit.no/
- Marine reference databases (MarRef, MarDB, MarCat)
- META-pipe used to process data for MarCat

12. WP6 deliverables
MMG analysis pipelines (August 2018):
- Test META-pipe and MMG at scale
- Deploy META-pipe and MMG on ELIXIR compute clouds
- Evaluation of tools
- Synthetic benchmark metagenomes
- Federated search engine
Training and workshops:
- Metagenomics data analysis, 3-6 April 2017, Helsinki, Finland
- Metagenomics data analysis, ?, ?, Portugal

13. BioMaS pipeline on INDIGO-DataCloud
- BioMaS is a taxonomic classification pipeline (ELIXIR-IT)
- Provided as an on-demand Galaxy instance
- Based on INDIGO-DataCloud

14. Pyttipanna
- Who is the user of cloud services? Pipeline providers? End-users?
- Data transfer vs storage vs AAI: 3 services? 1 distributed file storage?
- EMG cloud proof-of-concept = plant use case: set up VMs, transfer data, allow user to run analyses

15. Summary
- 2 MMG pipelines can be run on ELIXIR clouds
- Need resources to test at scale
- Need policies and TUCs (21 and 22) for production use of clouds

16. TUCs
- TUC1/TUC3 (Federated ID / ELIXIR Identity): give access to service; get information needed for accounting and payment
- TUC2 (Other ID): give access to non-European-academic users
- TUC4 (Cloud IaaS services): cloud providers that can run the execution environment
- TUC5 (HTC/HPC cluster): run batch jobs to produce reference databases
- TUC6 (PRACE cluster): we do not need PRACE-scale resources

17. TUCs
- TUC7 (Network file storage): not provided (we set up NFS as part of the execution environment)
- TUC8 (File transfer): not needed (file transfer time is low)
- TUC9/TUC11 (Infrastructure service directory/registry): not needed
- TUC10 (Credential translation): not needed
- TUC11 (Service access management): needed to maintain user-submitted data

18. TUCs
- TUC12/13 (Virtual machine library / container library): we provide analysis as a service; VMs/containers useful for visualization tools
- TUC14 (Module library): we have META-pipe in a deployment server
- TUC15 (Data set replication): not needed (our datasets are small)
- TUC17 (Endorsed ): user-submitted data management
- TUC18 (Cloud storage): replace META-pipe storage server

19. TUCs
- TUC19 (PID and metadata registry): provide in reference databases?
- TUC20/23 (Federated cloud/HPC/HTC): not exposed to our end-users
- TUC21 (Operational integration): service availability monitoring is needed
- TUC22 (Resource accounting): is very much needed
