Enhancing Grid Site Computing Resources through BOINC Implementation


This presentation discusses the full utilization of grid site computing resources using BOINC, focusing on NGI_CZ and CESNET cluster resources, along with strategies for better resource utilization and the implementation of BOINC for LHC community support.





Presentation Transcript


  1. FULL UTILIZATION OF GRID SITE COMPUTING RESOURCES USING BOINC. Jiří Chudoba, Alexandr Mikula, Aleš Prchal (CESNET and FZU). EGI Conference 2021, 21.10.2021, virtual.

  2. NGI_CZ GRID RESOURCES: CESNET represents NGI_CZ; it is the national research and education network and part of the eInfra.cz consortium. Three sites are registered as active: praguelcg2, a WLCG Tier-2 center that also serves Fermilab VOs and astroparticle VOs, distributed but mostly located at FZU, with resources contributed by individual projects; prague_cesnet_lcg2, the CESNET contribution to EGI HTC grid resources; and CESNET-MCC, the CESNET contribution to EGI cloud resources. More resources are available to CZ users via the MetaCentrum clusters, which are distributed over many sites and use the PBSPro batch system.

  3. NGI_CZ GRID RESOURCES: CESNET cluster resources. 2068 job slots (hyper-threading on) are provided by 3 subclusters with 29 servers in total; 1024 cores were added in December 2020. Total capacity: 30 kHS06. Services (HTCondor-CE, HTCondor) run on KVM hypervisors. Storage element: DPM, 900 TB on 6 servers, of which 400 TB were added in June 2021. Network: 2x10 Gbps. One VOMS server (out of 2) is hosted on VMware in another location. For comparison, the WLCG Tier-2 (praguelcg2) has 10000 job slots and 7 PB of disk space.

  4. TIER-2 PRAGUELCG2: almost continuous production by the ATLAS and ALICE VOs; high priority for local users.

  5. CESNET SITE: some periods of unused cores.

  6. OPTIONS FOR BETTER UTILISATION: Adding an LHC VO: relatively small number of cores, so a bad effort/benefit ratio. Adding other VOs not connected to CZ groups: increases the load on support. HTCondor job flocking: may be interesting, but with unknown side effects. BOINC for LHC community support: should be easy to operate.

  7. BOINC IMPLEMENTATION: A common account is used for many instances; if its name matches the site name, the contribution is visible in the ATLAS accounting. BOINC is also used for backfilling standard sites (D. Cameron: Adapting ATLAS@Home to trusted and semi-trusted resources, CHEP19).

  8. BOINC IMPLEMENTATION: Standalone clients work well for desktops. They use VirtualBox, which brings issues with kernel modules. Running them on worker nodes caused further issues: long, never-ending jobs, and BOINC usage did not always drop to zero when other workload started, so manual interventions were required (some scripts are available). Another attempt was based on the HTCondor manual for backfilling, which does not support segmentation for cores.

  9. BOINC IMPLEMENTATION: The current implementation is based on the HTCondor wiki manual. BOINC jobs are allowed to run on every 4th job slot, and each BOINC job uses exactly 4 CPU cores. For each job slot in the Unclaimed state, the local startd periodically triggers a fetch_work_boinc script, which in turn generates a ClassAd file for a BOINC job. The startd then executes the job described by this ClassAd file. If the job slot has been unclaimed for more than 10 minutes, the job requirements are met and the boinc-client binary is executed. Because of a RANK statement in the configuration file, BOINC jobs have lower priority and can therefore be evicted whenever a regular grid job is waiting for a free job slot. BOINC runs in Singularity containers and receives SIGTERM when a standard job is executed. 7-12 GB of disk space is used per 4 cores.

  10. BOINC

  11. BOINC

  12. GREAT EFFICIENCIES: efficiency over the last 30 days for praguelcg2 BOINC.

  13. CONTRIBUTION TO ATLAS@HOME: BOINC resources for ATLAS contribute ~5%.

  14. CONCLUSIONS: The ATLAS@HOME application manages to fill unused resources effectively with minimal effort, and no negative influence on standard jobs has been observed.

  15. QUESTIONS?
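The backfill policy described on slide 9 can be summarised as a small state model. The sketch below is a simplified Python illustration, not the site's HTCondor configuration: the names `Slot`, `tick`, `boinc_allowed` and `capacity` are hypothetical, treating slots 1, 5, 9, ... as the BOINC-capable ones is an assumption, and eviction is reduced to a single check rather than HTCondor's RANK-based preemption and SIGTERM delivery. The constants (every 4th slot, 4 cores per BOINC job, a 10-minute idle threshold, 7-12 GB of disk per job) and the 2068 job slots come from slides 3 and 9.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

SLOT_STRIDE = 4             # BOINC allowed on every 4th job slot (slide 9)
BOINC_CORES = 4             # each BOINC job uses exactly 4 CPU cores (slide 9)
IDLE_THRESHOLD_S = 10 * 60  # slot must be Unclaimed for more than 10 minutes
DISK_PER_JOB_GB = (7, 12)   # 7-12 GB of disk per 4-core BOINC job (slide 9)


class State(Enum):
    UNCLAIMED = "Unclaimed"
    GRID = "grid job"
    BOINC = "BOINC job"


@dataclass
class Slot:
    slot_id: int              # 1-based slot number (hypothetical numbering)
    state: State = State.UNCLAIMED
    idle_since: float = 0.0   # time (s) at which the slot became Unclaimed


def boinc_allowed(slot: Slot) -> bool:
    """Assumed rule: the first slot of each group of 4 (1, 5, 9, ...) may run BOINC."""
    return (slot.slot_id - 1) % SLOT_STRIDE == 0


def tick(slot: Slot, now: float, grid_job_waiting: bool) -> str:
    """Make one scheduling decision for a slot and return what happened."""
    if slot.state is State.BOINC and grid_job_waiting:
        # BOINC has the lower rank, so a waiting grid job displaces it
        # (in the real setup the BOINC job receives SIGTERM).
        slot.state = State.GRID
        return "evict BOINC, start grid job"
    if slot.state is State.UNCLAIMED and grid_job_waiting:
        slot.state = State.GRID
        return "start grid job"
    if (slot.state is State.UNCLAIMED and boinc_allowed(slot)
            and now - slot.idle_since > IDLE_THRESHOLD_S):
        slot.state = State.BOINC
        return "start BOINC"
    return "no change"


def capacity(total_slots: int) -> tuple[int, int, tuple[int, int]]:
    """Max concurrent BOINC jobs, cores they use, and disk range (GB) if all run."""
    jobs = total_slots // SLOT_STRIDE
    lo, hi = DISK_PER_JOB_GB
    return jobs, jobs * BOINC_CORES, (jobs * lo, jobs * hi)


if __name__ == "__main__":
    s = Slot(slot_id=5, idle_since=0.0)
    print(tick(s, now=300.0, grid_job_waiting=False))  # no change (idle only 5 min)
    print(tick(s, now=700.0, grid_job_waiting=False))  # start BOINC
    print(tick(s, now=900.0, grid_job_waiting=True))   # evict BOINC, start grid job
    print(capacity(2068))                              # (517, 2068, (3619, 6204))
```

With the 2068 job slots of the CESNET cluster, at most 517 BOINC jobs can run at once; their 4-core allocations add up to the full 2068 job slots, and if all of them were active they would need roughly 3.6 to 6.2 TB of local disk in total. Grouping slots in fours is what gives the "segmentation for cores" that the plain HTCondor backfill manual lacked.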
