Understanding Fermigrid: A Comprehensive Overview

This talk covers selected components of Fermigrid: the batch system GPGrid, the provisioning layer Fifebatch (glideinWMS), and how resources are allocated and used at each layer. It explains how GPGrid is divided between experiments through allocations determined by SPPM and enforced with HTCondor static quotas, how requests above quota are negotiated, and how Fifebatch extends the available resources beyond GPGrid to OSG. Other talks cover how scientists, students, and other users interact with Fermigrid.


Uploaded on Sep 18, 2024


Presentation Transcript


  1. FIFE Workshop. Fermigrid Batch Policy Proposal. Anthony Tiradani.

  2. Fermigrid. Other talks will cover how scientists, students, and others interact with Fermigrid. This talk will concentrate on some of the components of Fermigrid, specifically: the batch system, GPGrid; the provisioning layer, Fifebatch (glideinWMS); and how resources are allocated and used at each layer. (6/4/13, SCD FIFE Workshop)

  3. Fermigrid GPGrid. [Architecture diagram. Labels: Gatekeeper, Job, HTCondor Scheduler, Interactive UI, Central Manager, HTCondor Collector/Negotiator, HTCondor Startd, Worker Node.] (6/1/15, SCD FIFE Workshop)

  4. Fermigrid - GPGrid. GPGrid is divided between experiments via allocations. Allocations are determined by SPPM. The sum of all allocations must equal the available resources. Allocations are enforced via HTCondor static quotas. If the quotas add up to more than the available resources, HTCondor automatically scales the quotas to match the available resources. Experiments are allowed to use resources above their quotas if those resources are available. Such requests are considered only after all other requests have been negotiated.
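The scaling rule on this slide can be illustrated with a minimal sketch. It is not HTCondor code: the slide does not say how the scaling is done, so the proportional scaling below is an assumption, and the function name `scale_quotas` and the quota values are hypothetical.

```python
def scale_quotas(quotas, available):
    """Scale quotas down if their sum exceeds the available slots.

    Assumption: the scaling is proportional. The slide only says that
    quotas are scaled to match the available resources.
    """
    total = sum(quotas.values())
    if total <= available:
        return dict(quotas)  # quotas already fit, nothing to scale
    return {name: q * available / total for name, q in quotas.items()}


# Hypothetical static quotas (in slots) that oversubscribe a 1000-slot pool.
quotas = {"A": 600, "B": 400, "C": 200}
scaled = scale_quotas(quotas, available=1000)
print(scaled)
```

In this example the quotas sum to 1200, so each one is multiplied by 1000/1200 and the scaled quotas again sum to the pool size.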

  5. Fermigrid - GPGrid. GPGrid quota example: Experiments A and B are using their full quotas. Experiment C is not using any of its quota. Both Experiments A and B submit more jobs. The extra jobs submitted by Experiments A and B are automatically regrouped into a generic group and negotiated together as one group. The extra slots will not be counted against the quotas of Experiments A and B. When Experiment C submits jobs, its jobs will be negotiated first.
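The example on this slide can be modeled as a two-pass negotiation. The sketch below is a simplified illustration, not the HTCondor negotiator: it ignores jobs that are already running, and the function name `negotiate`, the quotas, and the demand figures are all hypothetical.

```python
def negotiate(quotas, demand, pool_size):
    """Return (slots matched within quota, slots given to the generic group)."""
    # Pass 1: each experiment is matched up to its own quota.
    in_quota = {exp: min(demand[exp], quotas[exp]) for exp in quotas}
    free = pool_size - sum(in_quota.values())
    # Pass 2: all demand above quota is pooled into one generic group
    # and matched against whatever slots are still free.
    over_quota = sum(demand[exp] - in_quota[exp] for exp in quotas)
    surplus = min(over_quota, free)
    return in_quota, surplus


quotas = {"A": 400, "B": 400, "C": 200}  # hypothetical quotas, 1000-slot pool

# A and B ask for more than their quotas while C is idle.
before = negotiate(quotas, {"A": 500, "B": 450, "C": 0}, pool_size=1000)
# C now submits jobs; its in-quota requests are handled in pass 1.
after = negotiate(quotas, {"A": 500, "B": 450, "C": 200}, pool_size=1000)
print(before)
print(after)
```

While C is idle, the generic group picks up the 150 extra slots that A and B asked for. Once C claims its quota in the first pass, no slots are left over for the generic group.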

  6. Fermigrid Fifebatch (glideinWMS). [Architecture diagram. Labels: glideinWMS, Glidein Factory, WMS Pool, Jobsub Submission Cluster, VO Frontend, HTCondor Scheduler, HTCondor Central Manager, Job, glidein (shown twice), HTCondor Startd (shown twice), Worker Node (shown twice), GPGrid, OSG Grid Site.]

  7. Fermigrid - Fifebatch. Fifebatch pool (a.k.a. glideinWMS): the resources available extend beyond GPGrid, because OSG resources are aggregated as well. Running on OSG is a great way to obtain computing resources well beyond the GPGrid allocations. OSG is on pace for ~2 million opportunistic computing hours this year. Resource allocation: a mix of hierarchical quotas and HTCondor priorities. Quotas should only be used to carve off pieces of your allocated computing cores where you need a constant (DC) level of computing resources over an extended period of time. The overwhelming majority of scheduling decisions should be based on priorities.

  8. Fermigrid - Fifebatch. Fifebatch quota example (1): Assume that Experiment A has a quota of 1000, as determined by SPPM. Experiment A measures the resources needed to run production reconstruction on new data to be an average of 50 cores x 24 h = 1200 core-hours per day. It could then set aside a quota of 50 out of the 1000 for production reconstruction.
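The arithmetic behind this example can be written out as a short sketch. The helper name `cores_needed` is hypothetical; the figures (1200 core-hours per day, a quota of 1000) come from the slide.

```python
import math

HOURS_PER_DAY = 24


def cores_needed(core_hours_per_day):
    """Cores that must run around the clock to deliver a daily core-hour load."""
    return math.ceil(core_hours_per_day / HOURS_PER_DAY)


experiment_quota = 1000           # Experiment A's quota from SPPM
reco_quota = cores_needed(1200)   # measured production reconstruction load
remaining = experiment_quota - reco_quota
print(reco_quota, remaining)
```

A steady load of 1200 core-hours per day corresponds to 50 cores in continuous use, which leaves 950 of the 1000 for everything else.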

  9. Fermigrid - Fifebatch. Fifebatch quota example (2): Assume that Experiment A has a quota of 1000, as determined by SPPM. Experiment A wants to be able to validate new software releases or production scripts quickly, so it could allocate a quota of 5 out of the 1000 to a test queue. This assumes that validation is a fairly regular activity.
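Carving sub-quotas out of an experiment's allocation, as in quota examples (1) and (2), can be sketched as a simple bookkeeping check. The function name `check_subquotas` and the group names are hypothetical; the numbers (1000, 50, and 5) are the ones used on the slides.

```python
def check_subquotas(parent_quota, subquotas):
    """Return the cores left under the parent quota after carving sub-quotas."""
    carved = sum(subquotas.values())
    if carved > parent_quota:
        raise ValueError(
            f"sub-quotas ({carved}) exceed the parent quota ({parent_quota})"
        )
    return parent_quota - carved


experiment_a_quota = 1000
subquotas = {
    "production_reco": 50,  # quota example (1)
    "test": 5,              # quota example (2)
}
left_for_priorities = check_subquotas(experiment_a_quota, subquotas)
print(left_for_priorities)
```

Only 55 of the 1000 cores are fenced off by quotas here, which fits the recommendation that most scheduling decisions be left to priorities.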

  10. Fermigrid - Fifebatch. Fifebatch quota example (3): Assume that Experiment A has a quota of 1000, as determined by SPPM. Once a week, Experiment A needs to run calibration over all of the data, and this needs to happen Monday morning at 9:00 am for the All Experimenters Meeting, which takes place Monday afternoon. It needs 25 cores x 4 hours to complete the calibration before the meeting. This is a boundary case that may be best served by either quotas or priorities. It is a regular activity that must be guaranteed to run. Does Experiment A want all the calibration jobs to run first? Then use priorities. Does Experiment A want to constrain the resources used? Then use quotas.
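One way to see why this is a boundary case is to compare the calibration load with what a standing quota of the same size would cover over a week. This is an illustrative calculation, not part of the talk: the 25 cores and 4 hours come from the slide, while the idea of a permanent 25-core quota and the variable names are assumptions.

```python
HOURS_PER_WEEK = 7 * 24

calibration_cores = 25
calibration_hours = 4

# Core-hours the weekly calibration actually consumes.
needed_core_hours = calibration_cores * calibration_hours

# Core-hours covered by a hypothetical standing 25-core quota over a week.
quota_core_hours = calibration_cores * HOURS_PER_WEEK

# Fraction of that standing quota the calibration itself would use.
fraction_used = needed_core_hours / quota_core_hours
print(needed_core_hours, quota_core_hours, round(fraction_used, 3))
```

The calibration needs 100 core-hours, a small fraction of the 4200 core-hours that a permanent 25-core quota spans in a week. The load is bursty rather than constant, yet it must be guaranteed to run, which is why either mechanism could be argued for.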

  11. Fermigrid - Fifebatch. Fifebatch quota example (4): Assume that Experiment A has a quota of 1000, as determined by SPPM. Experiment A wants to give more resources to the Cross Section Analysis group. This should be done through priorities and subsequent tuning, not through quotas.

  12. Fermigrid - Fifebatch. Plans for the future: allow selected experiment representatives to set quotas and priorities in the Fifebatch layer.
