Multi-VO Rucio for EGI
Timothy Noble
Service owner and developer for Multi-VO Rucio

What is data management
Data management is the practice of keeping and using data securely,
efficiently, and cost-effectively.
A robust data management solution becomes more necessary as the number
of people accessing, generating, and sharing data across multiple sites
increases.

Rucio
Overview
Data management tool
Integrates with many storage solutions
Data can be stored across multiple sites, with different setups and protocols
Data can be anything: images, text, and so on
Designed with more than a decade of operational experience in very large-scale data management (several billion files, approaching an exabyte of data)
Developed by ATLAS, now used and developed by other communities
Open source
Community-driven development

Rucio
Terms
Rule – relates to and protects a file, and allows it to be replicated
Dataset – a collection of files
Container – a collection of datasets
RSE – Rucio Storage Element, a storage endpoint known to Rucio
Daemon – a program that performs a specific task; there are many kinds

Rucio
Files, Datasets and Containers
Single files can be replicated using rules
Files are grouped together in datasets
A file can belong to multiple datasets
Containers are collections of datasets
Containers and datasets can have properties that protect the data in them
E.g. open/closed – whether data can still be added
[Diagram: Container 1 holds Dataset1 (files 1, 2, 3, 4, 5) and Dataset2 (files 2, 4, 5, 6, 8); Container 2 holds Dataset3 (files 1, 5, 6) and Dataset1 (files 1, 2, 3, 4, 5)]
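
The grouping on this slide can be sketched as a small data model. This is only an illustration of the relationships (files in datasets, datasets in containers, open/closed status); the `Dataset` and `Container` classes are hypothetical and are not Rucio's own classes or API. The file numbers and container layout follow the slide diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a Rucio dataset: a named collection of files."""
    name: str
    files: set = field(default_factory=set)
    is_open: bool = True  # an open dataset can still have files added

    def add_file(self, file_id):
        # A closed dataset is protected from further additions.
        if not self.is_open:
            raise ValueError(f"dataset {self.name} is closed")
        self.files.add(file_id)


@dataclass
class Container:
    """Hypothetical stand-in for a Rucio container: a collection of datasets."""
    name: str
    datasets: list = field(default_factory=list)

    def all_files(self):
        # A file shared by several datasets appears only once in the union.
        result = set()
        for dataset in self.datasets:
            result |= dataset.files
        return result


# Layout taken from the slide diagram.
dataset1 = Dataset("Dataset1", {1, 2, 3, 4, 5})
dataset2 = Dataset("Dataset2", {2, 4, 5, 6, 8})
dataset3 = Dataset("Dataset3", {1, 5, 6})
container1 = Container("Container 1", [dataset1, dataset2])
container2 = Container("Container 2", [dataset3, dataset1])

# Closing a dataset means no more data can be added to it.
dataset3.is_open = False

# File 5 belongs to several datasets at once.
datasets_with_file_5 = [d.name for d in (dataset1, dataset2, dataset3) if 5 in d.files]
```

Note that Dataset1 is attached to both containers without being copied, which mirrors how the diagram shows the same dataset under Container 1 and Container 2.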

Moving Data
Replication Rules
A file from a data source may be uploaded to Rucio, to an RSE
$ rucio upload --rse storageendpoint scope:first_file scope:second_file
This may not be where it is needed
o Add a replication rule
$ rucio add-rule scope:first_dataset scope:second_dataset 2 'country=uk'
o This would add the two datasets to the RSEs in the UK, and have 2 copies of each dataset within the UK
$ rucio add-rule scope:first_dataset scope:second_dataset 2 'country=uk\site=RAL-LCG2'
o This would add the same two copies across the UK but exclude RAL as one of the potential sites to keep this data
RSE expressions: the quoted last argument (e.g. 'country=uk') selects the RSEs a rule may use
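
To make the two RSE expressions above concrete, here is a minimal sketch of how `country=uk` and `country=uk\site=RAL-LCG2` can be read as set operations. It is not Rucio's parser: it handles only `key=value` terms and the `\` (complement) operator, and the RSE names and attributes in `RSES` are invented for illustration, apart from the site name RAL-LCG2 taken from the slide.

```python
# Hypothetical RSE catalogue: RSE name -> attributes. All entries are invented
# for illustration; only the site name RAL-LCG2 appears in the slides.
RSES = {
    "RAL-LCG2-DISK": {"country": "uk", "site": "RAL-LCG2"},
    "SITE-A-DISK": {"country": "uk", "site": "SITE-A"},
    "SITE-B-DISK": {"country": "uk", "site": "SITE-B"},
    "SITE-C-DISK": {"country": "fr", "site": "SITE-C"},
}


def match(term):
    """Return the names of all RSEs whose attribute equals the given value."""
    key, value = term.split("=", 1)
    return {name for name, attrs in RSES.items() if attrs.get(key) == value}


def resolve(expression):
    """Resolve a simplified RSE expression to a set of RSE names."""
    # The first term selects candidates; each term after a backslash
    # removes its matches from the candidates (set complement).
    first, *excluded = expression.split("\\")
    result = match(first)
    for term in excluded:
        result -= match(term)
    return result


uk_rses = resolve("country=uk")
uk_without_ral = resolve("country=uk\\site=RAL-LCG2")
```

With this catalogue, the first expression resolves to the three UK RSEs and the second to the two UK RSEs outside RAL. A rule asking for 2 copies would then place them on RSEs chosen from that set.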

Multi-VO Rucio at RAL
Accounting and Quotas
Each account is given a quota of space
The rules an account holds count towards its quota, but this does not mean that the space is taken up at an endpoint
If two users have a rule on the same file, each has that space allocated to their quota, but only one physical copy is present
If there is a rule on a file, the file will not be deleted
Primary and secondary data
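
The difference between quota usage and physical usage can be shown with a short sketch. This is a simplified illustration under assumed data, not Rucio's accounting code: the account names, file sizes and RSE name are invented, and each rule is reduced to an (account, file, RSE) triple.

```python
from collections import defaultdict

# Invented file sizes in bytes, for illustration only.
FILE_SIZES = {"scope:first_file": 100, "scope:second_file": 250}

# Each rule is simplified to (account, file, RSE). Both accounts hold a rule
# on scope:first_file at the same hypothetical RSE.
rules = [
    ("alice", "scope:first_file", "RSE_A"),
    ("bob", "scope:first_file", "RSE_A"),
    ("bob", "scope:second_file", "RSE_A"),
]


def quota_usage(rules):
    """Space charged to each account: every rule counts in full."""
    usage = defaultdict(int)
    for account, did, _rse in rules:
        usage[account] += FILE_SIZES[did]
    return dict(usage)


def physical_usage(rules):
    """Space actually occupied per RSE: one copy per (file, RSE) pair."""
    replicas = {(did, rse) for _account, did, rse in rules}
    usage = defaultdict(int)
    for did, rse in replicas:
        usage[rse] += FILE_SIZES[did]
    return dict(usage)


def is_protected(did, rules):
    """A file is kept as long as at least one rule still refers to it."""
    return any(rule_did == did for _account, rule_did, _rse in rules)


charged = quota_usage(rules)
stored = physical_usage(rules)
```

Here the two accounts are charged 450 bytes in total while the endpoint stores 350 bytes, because the shared file exists only once. Removing one account's rule on the shared file leaves it protected by the other account's rule.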

Rucio
Architecture
[Figure: Rucio architecture diagram]

Multi-VO Rucio at RAL
Setup and maintenance
Rucio was initially set up at RAL in April 2018 to support SKA development
It has since evolved to support Multi-VO
Developers have worked with the core dev team to develop Multi-VO functionality
Contact point for new users, and provides tutorials: Timothy.Noble@stfc.ac.uk
Test transfers between sites to ensure endpoint functionality
Working on documentation to help users and VO admins use Rucio: https://www.gridpp.ac.uk/wiki/Rucio
To be made available to EGI when complete
Rucio-Support@stfc365.onmicrosoft.com for support

Multi-VO Rucio at RAL
Advantages
Running a Rucio instance that supports multiple VOs is beneficial for small experiments:
Maintained by RAL, not by the smaller experiments
Low levels of load from smaller experiments
One instance to support and maintain
Shared RSE configuration
New VOs are quick to add – work with the VO admin to set up their environment
More contact with developers and larger communities using Rucio, to learn how best to utilise it

Accessing Rucio
WebUI and EGI Check-in / IAM
Work in progress
Next 3-6 months

Accessing Rucio
Clients
Containerised Client
Installed using Docker
Easy to install and set up
Lightweight Client
Easy to install and set up
Fewer dependencies
Runs without a config file
Rucio Desktop
Graphical user interface for Rucio
Developed as a summer of code project
Work in progress
https://github.com/rucio/desktop

Multi-VO Rucio at RAL
RAL Plans
EGI-ACE Developments
WebUI for Multi-VO
Integration with IAM and EGI Check-in
Invite new experiments to use Multi-VO Rucio
Improve documentation
Internal Developments
Multi-VO selection of certificates for daemons
Containerisation
Improve monitoring
Gain experience with onboarding a variety of communities and requirements

Multi-VO Rucio at RAL
Conclusions
Multi-VO Rucio at RAL is available to anyone who wishes to use it.
RAL Multi-VO Rucio is working with EGI to offer communities the opportunity to use Rucio's powerful data management software without needing to invest in its setup and maintenance.
Multi-VO Rucio at RAL is being viewed as a long-term service to support any community that wishes to use it.
RAL Multi-VO Rucio is being deployed and developed with new users in mind, to make it easier for communities that have no experience with Rucio or the grid environment.

Data management is crucial for securely and efficiently handling data across various sites. Rucio is a robust data management tool used for storing data across multiple sites with diverse setups and protocols, supporting various types of data like images and text. Developed with over a decade of operational experience, Rucio integrates with multiple storage solutions and is open source, driven by the community. It utilizes terms like rules, datasets, and containers to manage and protect files, allowing for efficient replication and movement of data. Learn about Rucio's functionalities in data replication rules and how it facilitates storage and accessibility of data across different locations.

  • Data management
  • Rucio
  • Storage solutions
  • Data replication
  • Open source

Uploaded on Feb 17, 2025



Presentation details

EGI: Advanced Computing for Research, www.egi.eu, @EGI_eInfra
Slide footer date: 06/10/2021
The work of the EGI Foundation is partly funded by the European Commission under the H2020 Framework Programme.
This work by the EGI Foundation is licensed under a Creative Commons Attribution 4.0 International License.
