Terra Architecture and Design Overview

 
March 2020

Driven by the Data Biosphere vision to enable an ecosystem of interoperable components across organizations (DataBiosphere.org)
 
GA4GH standards used wherever possible

  • Tool Repository Service (TRS): enable libraries of pre-packaged shared methods
  • Workflow Execution Service (WES): federated analysis; run reproducible workflows
  • Data Repository Service (DRS): federated storage; access data, wherever it’s stored
  • Data Use (DURI): data passport; match access rules to researchers
  • Researcher Identity (DURI): data passport; make researcher credentials portable
  • and more ...
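
The Data Repository Service entry above is about reaching data wherever it is stored. As a rough illustration, the GA4GH DRS specification defines hostname-based URIs of the form drs://<hostname>/<id>, which a client resolves to https://<hostname>/ga4gh/drs/v1/objects/<id>. The Python sketch below performs only that string mapping. The function name and the example hostname are hypothetical, compact-identifier DRS URIs are not handled, and nothing here is taken from Terra's own code.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_https(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to its DRS v1 object endpoint URL.

    Hypothetical helper for illustration only; it does no network I/O.
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs":
        raise ValueError(f"not a DRS URI: {drs_uri!r}")

    # The object ID is the path with its leading slash removed.
    object_id = parsed.path.lstrip("/")
    if not parsed.netloc or not object_id:
        raise ValueError(f"expected drs://<hostname>/<id>, got {drs_uri!r}")

    # Percent-encode the ID so it is safe as a single path segment.
    encoded_id = quote(object_id, safe="")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{encoded_id}"


if __name__ == "__main__":
    # drs.example.org is a placeholder host, not a real DRS server.
    print(drs_uri_to_https("drs://drs.example.org/314159"))
```

A client would then issue an authenticated GET against the returned URL to obtain the object's metadata and access methods; that step is omitted because it depends on the server and the credentials in use.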

Terra’s architecture primarily serves 3 roles

  • Roles: Data Generators, Tool Developers, Biomedical Researchers
  • Activities: Data Production & Curation, Tool Creation & Publication
  • Workbenches and portals: Community Workbench, All of Us Workbench, Single Cell Portal, Custom Apps & Portals
  • Terra Platform API: Data Mgmt, Tools Mgmt, Workspaces
  • Cloud Services
 
Rearchitecture in progress to support multiple clouds and integration of new resources

  • Diagram: Terra Platform
  • Layers: Applications, App Services, Core Services, Kernel, Cloud Layer
  • Applications and interfaces: Community Workbench, AoU Workbench, Single Cell Portal, Data Repo UI, Terra CLI, Third-party Apps
  • Services and tools: Cromwell, Interactive Analysis, Terra Connector, Billing Reports, Data Library; tools, e.g. RStudio, Jupyter, IGV, Bioconductor, Hail
  • Managers: Workspace Manager, Data Repo Manager, Analysis Code Repo, User Manager, Folder Manager, Spend Profile Manager, Resource Manager, Notification Manager, Policy Manager
  • Other components: Kernel Access, Spend Tracker, IAM, Activity Logger
 
Many datasets co-analyzable according to research use

Design principles and considerations

  • Not involved in billing now: the user brings billing, we manage resources
  • Scale is critical (per month: ~2k active users, ~500 users running workflows and notebooks, ~500k workflows, >10M CPU hours)
  • Provenance is critical to reproducible data science
  • Authorization Domains to protect derived results
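
To put the monthly scale figures above in perspective, the short Python sketch below derives a few back-of-the-envelope ratios from them. The inputs are the approximate numbers quoted on the slide. The 30-day month is an assumption, and so is attributing every CPU hour to workflows: the slide does not split usage between workflows and notebooks, so the per-workflow figure is only a rough bound.

```python
# Approximate monthly figures quoted on the slide.
ACTIVE_USERS = 2_000          # "~2k active users"
WORKFLOW_USERS = 500          # "~500 users running workflows and notebooks"
WORKFLOWS = 500_000           # "~500k workflows per month"
CPU_HOURS = 10_000_000        # ">10m CPU hours per month" (a lower bound)

# Assumption for illustration: a 30-day month.
HOURS_PER_MONTH = 30 * 24


def scale_summary() -> dict:
    """Derive rough ratios from the slide's monthly figures."""
    return {
        # Share of active users who run workflows or notebooks.
        "compute_user_share": WORKFLOW_USERS / ACTIVE_USERS,
        # Workflows per compute user, if all of them ran workflows.
        "workflows_per_compute_user": WORKFLOWS / WORKFLOW_USERS,
        # CPU hours per workflow, if all CPU hours came from workflows.
        "cpu_hours_per_workflow": CPU_HOURS / WORKFLOWS,
        # Average number of CPUs kept busy around the clock.
        "average_busy_cpus": CPU_HOURS / HOURS_PER_MONTH,
    }


if __name__ == "__main__":
    for name, value in scale_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, about a quarter of active users run compute, each launching on the order of a thousand workflows a month, and the platform keeps the equivalent of roughly fourteen thousand CPUs busy continuously.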
 
Questions?

Terra is a platform driven by the Data Biosphere vision of an ecosystem of interoperable components across organizations. It uses GA4GH standards wherever possible and primarily serves three roles: data generators, tool developers, and biomedical researchers. A rearchitecture is in progress to support multiple clouds and the integration of new resources. Many datasets are co-analyzable on Terra according to research use, and the design considerations emphasize scale, provenance for reproducible data science, and Authorization Domains to protect derived results.

  • Terra Platform
  • Data Biosphere
  • GA4GH Standards
  • Cloud Integration
  • Data Science

Uploaded on Sep 11, 2024



Presentation Transcript


  1. Terra Architecture and Design Overview. March 2020. Alex Baumann (abaumann@broadinstitute.org), terra.bio

  2. Driven by the Data Biosphere vision to enable an ecosystem of interoperable components across organizations. DataBiosphere.org

  3. GA4GH standards used wherever possible. Tool Repository Service (TRS): enable libraries of pre-packaged shared methods. Workflow Execution Service (WES): federated analysis; run reproducible workflows. Data Repository Service (DRS): federated storage; access data, wherever it's stored. Data Use (DURI): data passport; match access rules to researchers. Researcher Identity (DURI): data passport; make researcher credentials portable. And more ...

  4. Terra's architecture primarily serves 3 roles: Data Generators, Tool Developers, and Biomedical Researchers. Diagram labels: Data Production & Curation; Tool Creation & Publication; Community Workbench; All of Us Workbench; Single Cell Portal; Custom Apps & Portals; Terra Platform API; Data Mgmt; Tools Mgmt; Workspaces; Cloud Services.

  5. Rearchitecture in progress to support multiple clouds and integration of new resources. Diagram layers: Applications; App Services; Core Services; Kernel; Cloud Layer. Diagram labels: Terra Platform; Community Workbench; AoU Workbench; Single Cell Portal; Third-party Apps; Data Repo UI; Terra CLI; Data Library; Cromwell; Interactive Analysis; Terra Connector; Billing Reports; tools, e.g. RStudio, Bioconductor, Jupyter, IGV, Hail; Workspace Manager; Data Repo Manager; Analysis Code Repo; User Manager; Spend Profile Manager; Folder Manager; Kernel Access; Policy Manager; Resource Manager; Notification Manager; Spend Tracker; Activity Logger; IAM.

  6. Many datasets co-analyzable according to research use. NHGRI AnVIL

  7. Design principles and considerations. Not involved in billing now: the user brings billing, we manage resources. Scale is critical (per month: ~2k active users, ~500 users running workflows and notebooks, ~500k workflows, >10M CPU hours). Provenance is critical to reproducible data science. Authorization Domains to protect derived results.

  8. Questions?
