Terra Architecture and Design Overview

Terra is a platform driven by the Data Biosphere vision of an ecosystem of interoperable components across organizations. It uses GA4GH standards wherever possible and primarily serves three roles: biomedical researchers, tool developers, and data generators, alongside custom apps and portals built on the Terra Platform API. A rearchitecture is in progress to support multiple clouds and the integration of new resources. Many datasets are co-analyzable on Terra according to research use, and the design considerations center on scale, provenance for reproducible data science, and Authorization Domains that protect derived results.


Uploaded on Sep 11, 2024


Presentation Transcript


  1. Terra Architecture and Design Overview. March 2020. Alex Baumann (abaumann@broadinstitute.org). terra.bio

  2. Driven by the Data Biosphere vision to enable an ecosystem of interoperable components across organizations (DataBiosphere.org)

  3. GA4GH standards used wherever possible:
     - Tool Repository Service (TRS): enables libraries of pre-packaged, shared methods
     - Workflow Execution Service (WES): federated analysis; run reproducible workflows
     - Data Repository Service (DRS): federated storage; access data wherever it's stored
     - Data Use (DURI): data passport; match access rules to researchers
     - Researcher Identity (DURI): data passport; make researcher credentials portable
     - and more ...

  4. Terra's architecture primarily serves 3 roles: Biomedical Researchers, Tool Developers, and Data Generators. Diagram labels: Custom Apps & Portals (All of Us Workbench, Community Workbench, Single Cell Portal); Terra Platform API; Data Mgmt; Tools Mgmt; Data Production & Curation; Tool Creation & Publication; Workspaces; Cloud Services.

  5. Rearchitecture in progress to support multiple clouds and integration of new resources. Architecture diagram labels: Applications (Community Workbench; AoU; Single Cell Portal; Third-party Apps; Data Repo UI; Terra CLI); Workbench App Services (Data Library; Cromwell; Interactive Analysis: RStudio, Bioconductor, Jupyter; Tools, e.g. IGV, Hail; Terra Connector; Billing Reports); Core Services / Terra Platform (Workspace Manager; Data Repo Manager; Analysis Code Repo; User Manager; Spend Profile Manager; Folder Manager); Kernel Access Kernel Policy Manager; Resource Manager; Notification Manager; Spend Tracker; Activity Logger; IAM; Cloud Layer.

  6. Many datasets co-analyzable according to research use, e.g. NHGRI AnVIL

  7. Design principles and considerations:
     - Not involved in billing now: the user brings billing, we manage resources
     - Scale is critical (per month: ~2k active users, ~500 users running workflows and notebooks, ~500k workflows, >10m CPU hours)
     - Provenance is critical to reproducible data science
     - Authorization Domains to protect derived results

  8. Questions?
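
The GA4GH services named on slide 3 each define a standard REST path prefix, which is what lets components from different organizations interoperate. The following is a minimal Python sketch of how a client might build request URLs for TRS, WES, and DRS. The path prefixes come from the public GA4GH specifications; the host names, the helper function names, and the assumption that a given Terra component exposes exactly these paths are illustrative only and do not come from the presentation.

```python
from urllib.parse import quote

# Path prefixes as defined in the GA4GH TRS v2, WES v1, and DRS v1 specifications.
TRS_TOOLS_PATH = "/ga4gh/trs/v2/tools"
WES_RUNS_PATH = "/ga4gh/wes/v1/runs"
DRS_OBJECTS_PATH = "/ga4gh/drs/v1/objects"


def trs_tools_url(base_url: str) -> str:
    """URL for listing tools in a Tool Repository Service (hypothetical helper)."""
    return base_url.rstrip("/") + TRS_TOOLS_PATH


def wes_runs_url(base_url: str) -> str:
    """URL for listing or submitting workflow runs in a WES server (hypothetical helper)."""
    return base_url.rstrip("/") + WES_RUNS_PATH


def drs_object_url(base_url: str, object_id: str) -> str:
    """URL for fetching metadata about one DRS object (hypothetical helper).

    The object ID is percent-encoded so that IDs containing '/' stay a
    single path segment.
    """
    return base_url.rstrip("/") + DRS_OBJECTS_PATH + "/" + quote(object_id, safe="")


if __name__ == "__main__":
    # The hosts below are placeholders, not real Terra endpoints.
    print(trs_tools_url("https://trs.example.org"))
    print(wes_runs_url("https://wes.example.org/"))
    print(drs_object_url("https://drs.example.org", "abc/123"))
```

Because every compliant server shares the same path layout, a client written this way only needs a different base URL to talk to another organization's repository or execution service.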
