Evolution and Innovation at Brookhaven National Laboratory's SDCC
The Scientific Data and Computing Center (SDCC) at Brookhaven National Laboratory is a leading computing center supporting high-throughput computing for various experiments and projects. With state-of-the-art facilities and a dedicated team, SDCC plays a crucial role in data collection, processing, and analysis. As the center evolves in 2024 with a new director and organizational changes, it continues to be at the forefront of scientific data management and computational support.
Presentation Transcript
SDCC Update
Alexei Klimentov, Brookhaven National Laboratory
NPPS Weekly Meeting, March 13, 2024
Scientific Data and Computing Center (1/3)
- SDCC was established as the RHIC Computing Facility (RCF) and, with the addition of the ATLAS Tier-1 center, became the RHIC and ATLAS Computing Facility (RACF).
- SDCC today is a leading computing center for high-throughput computing and scientific data. It supports HEP and NP experiments (ATLAS, Belle II, DUNE, STAR, sPHENIX) as well as other BNL, US, and international projects: EIC, LQCD, NSLS II, CFN, NN, BES, WLCG, OSG.
- The research is supported by a state-of-the-art data center, which began operation in FY 2021. SDCC supports experiments throughout the entire data collection and processing cycle, including data analysis. The data center serves more than 2,000 users from more than 20 projects and experiments.
- Today BNL operates one of the top five scientific data centers in the world, with over 255 PB of actively managed data. The HPSS mass-storage system is used in Data Carousel mode, in which data are actively migrated between disk and tape (a minimal sketch of the idea follows this slide). More than 1.2 EB were processed and analyzed in 2023, and data traffic has exceeded 200 PB/year.
- SDCC is the largest Tier-1 center in ATLAS and the most performant and stable Tier-1 center in WLCG, the Tier-0 center for STAR and sPHENIX, and a Tier-1 center and RAW data storage site for Belle II.
- ~42 members work at SDCC [physicists and computing scientists with PhDs, computing professionals, IT engineers]. HEP accounts for ~41% of the SDCC budget and staff, NP for ~40%, and ~19% is non-NPP.
- Year 2024 is a year of evolutionary changes: a new SDCC Director, a new organization, new group leaders, ... [next slide]; an SDCC technical advisory board (with members from other directorates); consolidation of efforts and expertise within the groups.
- sdcc.bnl.gov
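To make the Data Carousel idea above concrete, here is a minimal, hypothetical Python sketch of a carousel-style staging loop: only a bounded window of datasets is resident on disk at any time, tape recalls are issued ahead of processing, and disk copies are released afterwards. The function names and the window size are illustrative placeholders, not the actual HPSS, Rucio, or PanDA interfaces used at SDCC.

```python
# Hypothetical sketch of a Data Carousel staging loop.
# A bounded "window" of datasets lives on disk; everything else stays on tape.
from collections import deque

WINDOW = 3  # max number of datasets resident on disk at once (illustrative)

def request_stage_from_tape(dataset: str) -> None:
    """Placeholder: ask the tape system (e.g. HPSS) to recall a dataset to disk."""
    print(f"staging {dataset} from tape")

def process_dataset(dataset: str) -> None:
    """Placeholder: run reconstruction/analysis on the disk-resident copy."""
    print(f"processing {dataset}")

def release_disk_copy(dataset: str) -> None:
    """Placeholder: free the disk buffer once processing is done."""
    print(f"releasing disk copy of {dataset}")

def carousel(datasets: list[str]) -> None:
    on_disk: deque[str] = deque()
    for ds in datasets:
        # Keep the disk-resident window bounded: release the oldest copy first.
        if len(on_disk) >= WINDOW:
            release_disk_copy(on_disk.popleft())
        request_stage_from_tape(ds)  # in production this recall is asynchronous
        on_disk.append(ds)
        process_dataset(ds)
    while on_disk:
        release_disk_copy(on_disk.popleft())

if __name__ == "__main__":
    carousel([f"run2023_{i:03d}" for i in range(8)])
```

In production the recalls are asynchronous and throttled against tape-drive and disk-buffer capacity; the sketch only shows the bounded-window bookkeeping.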
SDCC Organization (February 27, 2024, v1.01)
Director: Alexei Klimentov (interim)
Technical Advisory Board: Johannes Elmsheuser (Co-Chair) [ATLAS], Stuart Wilkins (Co-Chair) [NSLS II], Adolfy Hoisie [CSI], Sara Mason [CFN], Chris Pinkenburg [sPHENIX], Cedric Serfon [Belle II], STAR [TBD], Jerome Lauret (scientific secretary)
Leadership Team: C. Caramarcu, H. Ito, J. Lauret, J. Liu, O. Rind, J. Smith, A. Wong
SDCC Operations Officer: I. Latif
Groups: Storage & Fabrics Group: Antonio Wong (interim); Services & Tools Group: Ofer Rind; Infrastructure Group: Jason Smith; Admin Support: Leisa McGee (Rosetta Tommasi^)
Group members: Dmitri Arkhipkin, [Doug Benjamin], (Uma Ganapathy+), Ivan Glushkov, Hironori Ito, (Eric Lancon^), Jerome Lauret, Costin Caramarcu, Kevin Casella, (Zhihua Dong+), Ethan Franco [i], [Jerome Lauret], Shigeki Misawa, Michael Poat, Tom Smith
Infrastructure Section: Joseph Frith*, Mark Berry, (Matt Cowan+), Robert Hancock, Tommy Tang
Resources Planning & Management: [Shigeki Misawa]
Users Services Section: John DeStefano*, Saroj Kandasamy, Christian Lepore, Louis Pelosi
HPSS Section: Tim Chou*, Ognian Novakov, Justin Spradley, [Yingzi (Iris) Wu]
Network Section: (Mark Lukasczyk*-), (Frank Burstein-)
Storage Section: Zhengping (Jane) Liu*, Doug Benjamin, Carlos Gamboa, Vincent Garonne, Qiulan Huang, James Leonardi [i], Yingzi (Iris) Wu
SDCC Facilities Operations: (Imran Latif+*), (Naveed Anwer+), Enrique Garcia, Al Maruf, John McCarthy
Contacts: ATLAS: Ofer Rind; Belle II: Hironori Ito; CFN: Shigeki Misawa; DUNE: Doug Benjamin; NSLS II: Jerome Lauret; sPHENIX: Antonio Wong; STAR: Jerome Lauret; EIC: TBD
Legend: * section leader and technical lead; [name] secondary section; [i] student intern; (name) not supervised at SDCC; + CSI supervisor; - ITD supervisor; ^ PO supervisor
Scientific Data and Computing Center (2/3)
- Emphasis: cross-experiment common efforts across HEP and NP. Shared personnel, expertise, software stack, and R&D. The Data Carousel project was initiated by the RHIC experiments and is now one of the most visible R&D projects in HEP; the ATLAS Analysis Object Data volume on disk was reduced by a factor of 2.
- Common efforts across NPP and other scientific domains. The BNL team (NPPS + SDCC + CSI, Lead PI: A. Klimentov) is leading the 5-year, $10M DOE ASCR REDWOOD project, Resilient Federated Workflows in a Heterogeneous Computing Environment. Three national labs (BNL, ORNL, and SLAC) and three universities (CMU, U Massachusetts, and U Pittsburgh) are working together on one of the most challenging topics: how to efficiently manage extremely large volumes of data and complex workflows for many scientific domains, including distributed-system modeling and AI/ML algorithms for payload brokering and data placement in a heterogeneous computing infrastructure.
- DOE ASCR FOA 24-3210, LoI "Capability, Optimization, and Usability of Computing for Extreme Discovery Science" [Feb 27, 2024]: BNL (SDCC) is a co-lead institute (together with ANL, LBNL, and universities; lead institute: FNAL).
- Evaluation of commercial clouds for HEP, NP, and other domains. The BNL team (NPPS) has co-led the (US) ATLAS Google/AWS cloud project since its inception in 2015. We are closely monitoring the possibility of using commercial clouds for peak periods of scientific data processing or for specific R&D, not only for NP and HEP but also for other scientific domains. We are also in contact with other US laboratories and international partners to learn from their experience. Small-scale R&D projects are under evaluation. BNL-Google discussion on Mon March 11 (agenda).
Scientific Data and Computing Center (3/3)
Our highest priorities for research computing development are in those areas with the broadest potential for enabling our future programs at BNL, building on our strengths and addressing the complex challenges that await us [HL-LHC, Belle II, DUNE, EIC]. Considered broadly, there are four levels of engagement and context in these efforts: large-scale science collaborations and programs (HEP and NP); accelerator science; small science collaborations; and individual scientific studies.
NPPS and SDCC
- NPPS and SDCC are important for the BNL NPP science program.
- Joint topical discussions will be organized by both groups: ATLAS SW&C workshop Mar 18-22 (agenda); ACAT 2024 highlights.
- Close collaboration between the groups in NPP experiments:
  - ATLAS (HL-LHC R&D): data popularity, smart writing, Data Carousel, cloud computing
  - Belle II: a very successful story for both groups
  - RHIC experiments: very good collaboration in sPHENIX
  - DUNE: we are actively working on joint plans (thanks, Doug and Mike Kirby)
  - ePIC/EIC: we have a lot to do together; this is one of the Lab priorities; there are many discussions; we need a coherent and consistent plan
- Other joint areas: AI/ML, user services, databases, joint proposals (LDRD, DOE FOA, ...), data preservation
NPPS and SDCC (cont'd)
- SDCC today needs NPPS expertise in Python:
  - CRS: https://git.racf.bnl.gov/gitea/SDCC/CRS - the (C)entral (R)econstruction (S)ervice, a reconstruction workflow manager. The STAR CRS system exists primarily to allow experiments to run reconstruction jobs that require pre-staged input, with the pre-staging done asynchronously to the processing on the farm.
  - Group Quota: https://git.racf.bnl.gov/gitea/SDCC/Group-Quota - HTCondor hierarchical group quotas are used to partition the pool logically into queues, with each sub-group containing a single species of job. The groups are organized into a tree, with leaf nodes having jobs submitted into them and the higher levels serving to classify these jobs (see the sketch after this slide).
  - User Chown: https://git.racf.bnl.gov/gitea/SDCC/userchown - a program that allows a specified user to copy files, as any other user belonging to a specified group, into a specific set of directories.
  - Jupyter spawners: https://git.racf.bnl.gov/gitea/SDCC/sdcc_jupyter_spawners - Jupyter spawners for the configuration at SDCC.
- Other topics: expertise in Java; cloud and distributed storage.
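As a rough illustration of the hierarchical group-quota layout described under Group Quota above (and not the actual code in the Group-Quota repository), the sketch below models a small tree of accounting groups in which only leaf nodes receive jobs and inner nodes merely classify their children and aggregate quota. The group names and slot counts are invented for the example.

```python
# Hypothetical sketch of an HTCondor-style hierarchical group-quota tree.
# Only leaf groups receive jobs; inner nodes classify their children and
# expose an aggregate quota. Names and numbers are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Group:
    name: str
    quota: int = 0                      # slots assigned directly (leaves)
    children: list["Group"] = field(default_factory=list)

    def add(self, child: "Group") -> "Group":
        self.children.append(child)
        return child

    @property
    def is_leaf(self) -> bool:
        return not self.children

    def total_quota(self) -> int:
        """Aggregate quota: a leaf's own quota, or the sum over its subtree."""
        if self.is_leaf:
            return self.quota
        return sum(c.total_quota() for c in self.children)

    def leaves(self):
        """Yield the leaf groups, i.e. the queues jobs are actually submitted to."""
        if self.is_leaf:
            yield self
        else:
            for c in self.children:
                yield from c.leaves()

# Build a toy tree: <pool> / atlas {prod, analysis}, sphenix {reco}
root = Group("<pool>")
atlas = root.add(Group("atlas"))
atlas.add(Group("atlas.prod", quota=500))
atlas.add(Group("atlas.analysis", quota=200))
sphenix = root.add(Group("sphenix"))
sphenix.add(Group("sphenix.reco", quota=300))

for leaf in root.leaves():
    print(f"{leaf.name}: {leaf.quota} slots")
print("total pool quota:", root.total_quota())
```

In a real HTCondor pool the equivalent structure is expressed through the negotiator's group-quota configuration (e.g. GROUP_NAMES and per-group quota settings); the Python tree above only mirrors the partitioning logic described on the slide.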