Overview of Virgo Computing Activities

Virgo computing
Michele Punturo
Computing - VW20180709
1
Computing is a “hot topic”?
Virgo computing has been a hot topic in the last weeks:
15/06/2018 – presentation of ET computing issues and activities in front of the
INFN “Commissione Calcolo e Reti”
18/06/2018 – Computing issues discussed at the VSC
25/06/2018 – discussion on future developments of astroparticle computing in
INFN (Virgo invited together with CTA, KM3 and Euclid to the INFN presidency)
02/07/2018 – External Computing Committee meeting at EGO (ECC, appointed by
the EGO Council)
04/07/2018 – discussion on 2019 funds for all the INFN experiments at the Tier1
08/07/2018 – Talk at the Virgo Week
19/07/2018 – C3S meeting at the INFN presidency on computing challenges
Computing - VW20180709
2
Slides Recycling
Computing - VW20180709
3
[Diagram: Advanced Virgo computing architecture]
EGO/Virgo site: DAQ (50 MB/s, up to 84 MB/s full raw data), temporary storage and DBs, detector characterisation, low-latency analysis and detection
Raw data transfer (45-50 MB/s) to the CC/Tier0-1 centres CCIN2P3 (via iRODS) and CNAF (via GridFTP), which host the offline data analysis with GRID and local access
h(t) data transfer and Reduced Data Set (RDS) transfer (0.87 MB/s) exchanged with Advanced LIGO via LDR
Further offline DA centres: Nikhef, SurfSARA, PolGRID
Computing - VW20180709
4
Storage
Currently the storage capacity at EGO is about 1 PB
50% is devoted to home directories, the archiving of special data, and the output of the local “low-latency analysis”
50% is devoted to the circular buffer used to store the raw data locally
Less than 4 months of data lifetime before overwriting (at 50 MB/s); see the sketch below
Too short a period for commissioning purposes
Unable to keep O3 on disk
This situation is due to a rapid evolution of the requirements, which is making the previous computing-model specifications obsolete:
Increased DAQ requests; data writing rates:
Nominal: ~22 MB/s
O2: ~37 MB/s
Current: ~50 MB/s
Requests by commissioners to keep the data corresponding to “special periods” stored locally for commissioning and noise-hunting purposes
Requests by low-latency analysts for disk space for the outputs of their analyses
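As a back-of-the-envelope check of the buffer lifetime quoted above, a minimal sketch assuming roughly half of the ~1 PB EGO storage (~500 TB) holds the raw-data circular buffer:

```python
# Rough sketch: lifetime of the raw-data circular buffer before overwriting.
# Assumes ~50% of the ~1 PB EGO storage (i.e. ~500 TB) is devoted to the buffer.
buffer_tb = 0.5 * 1000            # circular-buffer size in TB (assumption)
raw_rate_mb_s = 50                # current raw-data writing rate in MB/s
lifetime_days = buffer_tb * 1e6 / raw_rate_mb_s / 86400
print(f"buffer lifetime ~ {lifetime_days:.0f} days")   # ~116 days, i.e. less than 4 months
```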
Computing - VW20180709
5
Storage
The shortage of disk space at the site is also raising the risk of losing scientific
data
Nota bene:
“If the incident at CNAF had occurred during O3, Virgo would have lost science
data”
We had to pass through:
a Council meeting
a STAC meeting
an ECC meeting
Finally we got the green light to purchase the storage, and the order has been
submitted by EGO
Computing - VW20180709
6
O2 experience on computing and DT
A sequence of internal and external issues affected the data transfer towards the
CCs during O2
Computing STAC 23/05/2017
7
O2 issues highlighted by DT
Problem (1): iRODS hanging towards CCIN2P3
Problem (2): unidentified midnight slow-down
Problem (3): Grid certificate expiration
Problem (4): saturation of the disk-to-tape transfer at CCIN2P3
Problem (7): similar issue at CNAF
Problem (5): freezing of the storage due to lack of disk space
Problem (6): firewall freezing
Introduction to Virgo computing
8
Discussion with CCs
We had two meetings with the CCs:
16/01/2018: first meeting; presentation of the problems, discussions,
hypotheses, some solutions suggested for:
Data management (keyword: DIRAC)
Workload management (keyword: DIRAC)
Virgo software replication (keyword: CernVM-FS)
From the CCs to Virgo: a request for requirements
From Virgo to the CCs: a request for common solutions
07/05/2018: second meeting
Solutions for data transfer discussed
A possible common solution proposed by the CCs
Computing - VW20180709
9
Strategy towards O3
Radically reduce the data-loss risk by purchasing a large storage system at EGO
Almost solved
Improve the reliability and availability of the computing architecture
at EGO:
Bandwidth increased from 1 Gb/s to 10 Gb/s
Bandwidth tested towards the GARR PoP
Some issues towards the CCs (see later)
A High-Availability firewall has been installed
New data transfer engine?
Virgo request: use the same protocol for CNAF and CCIN2P3
CCs' answer: WebDAV
Computing - VW20180709
10
Clean solutions to improve the security of critical domains (new firewall):
- separate the ITF control and operation network from the others
- separate the on-line analysis network
- introduce a separate VTF (Virgo Test Facility) network (needed also for reliability)
- introduce a separate R&D network
- introduce two-factor authentication in the most critical domains
- introduce internal intrusion/anomaly detection probes (reactive approach)
- reorganize the remote access procedures in the new topology (VPN, proxies, ...)
11
Computing - VW20180709
12
New firewall (see Stefano's talk):
Fast path (for selected hosts), access control only: 9.32/9.55 Gbps
Normal path: deep inspection
Computing - VW20180709
Federated login and Identity Management: latest news
Recently requested to join the IDEM federation as the EGO IdP
Quick solution: working with GARR to provide the ego-gw.it IdP as a service in the GARR cloud,
connected to our AD database, and enter IDEM/EduGAIN in a few days
Federate the Virgo web applications, starting from the most critical one for collaborative detection-event
management: the VIM (Virgo Interferometer Monitor) web site
need to split the application into an internal instance and an external federated one (ongoing)
federated authentication for ligo.org or ego-gw.it users; direct Virgo user authentication only
when defined in the LV Virtual Organization common database
For the next web applications: discussing with GARR to set up an SP/IdP proxy pilot for a more flexible setup
LSC plans for authorization and IdM: to gradually provide federated services to the LV federated
identities via COmanage (as in the gw-astronomy instance)
Caveats:
ligo.org identities (accounts) are still needed to access LDG computing resources
in addition, Virgo users still need to complement their ligo.org account with their personal certificate
subject
13
Computing - VW20180709
[Diagram: AAI scheme. The EGO SP/IdP behind the firewall provides single sign-on / federated login access to the internal services (VIM, VIM-replica, TDS, WWW) and to the interferometer domain (DAQ, controls, electronics, monitoring, ...). The EGO federation connects to the external identity federations (IDEM, Renater, SIR, UV, SURFnet, EduID, WIGNER, Poland: Pioneer?); LIGO Lab and the LSC universities offer access and services through EduGAIN, with an AA COmanage IdMS.]
VW Apr2018 Computing
14
Bulk data transfer
O3 requirements:
Data writing at 50 MB/s → 100 MB/s sustained (parallel) data transfer per remote site
150-200 MB/s peak data transfer per remote site
Same protocol/solution for the two sites
Reliable login procedure
O2:
iRODS + username/password login @ CCIN2P3
GridFTP + certificate @ CNAF
Solution proposed (previously) by the CCs:
WebDAV
Tests performed:
CNAF:
Login issues (certificate)
Throughput always > 100 MB/s, with peaks of about 200 MB/s
Performance issues @ CCIN2P3:
Easy to log in
Throughput of about 12 MB/s up to 30 MB/s using WebDAV (100 MB/s using iRODS)
Long discussion at the ECC meeting:
Waiting for feedback from Lyon
Test proposed with FTS
Computing - VW20180709
15
Note: 180 MB/s since Friday, thanks to the migration of the iRODS server serving
Virgo at CCIN2P3 to a 10 Gb/s link
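For context, a minimal sketch of what a WebDAV upload of one raw-data frame file to a CC endpoint could look like; the endpoint URL, proxy path and file name are hypothetical placeholders, and production transfers would in any case go through a dedicated engine (e.g. the FTS tests mentioned above) rather than a hand-rolled script:

```python
# Minimal sketch (not the production transfer tool): push one raw-data frame
# file to a CC endpoint over WebDAV, i.e. a plain HTTPS PUT with X.509 proxy auth.
# The endpoint URL and local paths below are hypothetical placeholders.
import os
import requests

webdav_url = "https://storage.example-cc.it/virgo/raw/"   # hypothetical CC endpoint
proxy_cert = "/tmp/x509up_u1000"                          # grid proxy (cert + key in one file)
frame_file = "/data/rawdata/V-raw-1187008882-100.gwf"     # hypothetical frame file

with open(frame_file, "rb") as f:
    # WebDAV upload is an HTTP PUT of the file body to the destination URL.
    r = requests.put(webdav_url + os.path.basename(frame_file),
                     data=f,
                     cert=proxy_cert,
                     verify="/etc/grid-security/certificates")  # usual grid CA directory
r.raise_for_status()
print("uploaded", frame_file, "HTTP status", r.status_code)
```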
Low latency analysis machines
The number of machines devoted to MBTA has been doubled
160 → 320 cores
An additional ~180 cores have been devoted to detchar and cWB
(Condor farm)
The quick investment and installation were made possible by the fact that a
virtual-machine architecture had already been tested and approved for low-latency
analysis
Computing - VW20180709
16
CentOS7 (CL7) + Python (F. Carbognani slides)
ctrlXX and farmnXX machines upgraded to CentOS point release 7.5 (from 7.4) without known impact
A possible minor problem with CMT-built executables was discovered and is under investigation, but the assumption that upgrades of the OS minor version can be done transparently for the Virgo software seems to hold
The transition from tcsh to bash seems stabilised (at least no problems reported by users in the last weeks)
Python installation upgraded (e.g. gracedb client 1.29.dev1) for the OPA challenge
Anaconda distribution + pip-based installations seem to work fine
Computing - VW20180709
17
Software release plans (F. Carbognani slides)
Software release cycles towards O3 have started, following the guidelines defined in the dedicated Virgo Software Release Management Process document
The main driver for the release cycles is to stay synchronised, as far as possible, with the Virgo Commissioning Runs (CR), the OPA challenges, the common LIGO/Virgo Engineering Runs (ER) and the foreseen code freezes
Software release timeline:
VCS-10.0: 18th May (CL7 wrap-up / 1st CR)
VCS-10.1: 29th June (snapshot for the 1st OPA challenge)
VCS-10.2: 10th Sept (snapshot for the 2nd OPA challenge, TBC)
VCS-11.0: 1st October (code freeze in view of ER13, see G1800889)
Computing - VW20180709
18
Software release plans (F. Carbognani slides)
The minor software release VCS-10.1 took a snapshot of the code for the 1st OPA challenge; not much difference with respect to 10.0, since few packages were provided for /virgoApp upload
VCS-11.0 will manage the foreseen 1st October milestone (code freeze): software features frozen, software under formal review; from there:
fixes approved by the common LIGO/Virgo SCCB
new features approved by the Runs committees
Be ready for much more pressing requests for a timely code freeze from your (hated?) Software Manager
Computing - VW20180709
19
Virgo software distributions (F. Carbognani slides)
A CVMFS-based Virgo software release distribution mechanism is being tested
A test setup is up and running in Cascina, from swtest6 (server machine) to swtest3 (client machine)
A production machine equipped with sufficient disk area is being prepared, and interaction with the Virgo computing centres will start soon
Experimentation with container technologies (Docker and Singularity) for the RefOS implementation, to be used as a software distribution and software testing environment, is ongoing
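As a sketch of what the client side of such a CVMFS-based distribution could look like (the repository name and release path are hypothetical placeholders, not the actual Virgo setup):

```python
# Minimal sketch: check from a client node that a CVMFS-distributed Virgo
# software area is reachable. Repository name and paths are hypothetical.
import subprocess

repo = "virgo.example.org"                       # placeholder repository name
# 'cvmfs_config probe' verifies that the repository can be mounted via autofs.
subprocess.run(["cvmfs_config", "probe", repo], check=True)
# Once mounted, a software release appears as a read-only directory tree.
subprocess.run(["ls", f"/cvmfs/{repo}/releases/VCS-10.1"], check=True)
```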
Computing - VW20180709
20
Offline analysis
Unresolved long-standing issue: the under-use of the Virgo computing
resources at the CCs
Only CW makes substantial use of CNAF and, less regularly, of Nikhef
The other pipelines (mainly at CCIN2P3) have a negligible CPU impact
Computing - VW20180709
21
Network of Computing Centres
LIGO Scientific Collaboration:
1263 collaborators (including GEO)
20 countries
9 computing centres
~1.5 G$ of total investment
Virgo Collaboration:
343 collaborators
6 countries
5 computing centres
~0.42 G€ of total investment
KAGRA Collaboration:
260 collaborators
12 countries
5 computing centres
~16.4 G¥ of construction costs
[Pie chart: CPU hours over 52 weeks, Sept 2016 - Sept 2017, by centre: ATLAS-AEI 51%, LIGO-CIT 23%, NEMO-UWM 5%, ARCCA-CDF 4%, VIRGO.CNAF 4%, LIGO-LHO 3%, LIGO.OSG 3%, LIGO-LLO 2%, IUCAA 2%, SUGAR-SU 1%, VIRGO.NL 1%, VIRGO.OSG 1%, VIRGO.CCIN2P3 0%, VIRGO.POLGRAV 0%, other 6%; Virgo total ~6-8%]
Offline Computing
22
Computing load distribution
The use of CNAF is almost mono-analysis
Diversification is given by OSG access
[Pie chart: CNAF job distribution, dominated by CW entries]
Offline Computing
23
Future: Increase of CPU power needs
The O3 run will start in February 2019 and will last for 1 year
We are signing a new agreement with LIGO that is forcing us to
provide about 20-25% of the whole computing power
3 detectors:
Non-linear increase for the coherent pipelines (HL, HV, LV, HLV)
4 detectors:
At the end of O3, KAGRA will probably join the science run
Some of the pipelines will be tested (based) on GPUs
Offline Computing
24
How to fill up the resources
OK, let us suppose we find the money to provide the 25% Virgo quota
Are we able to use it?
Within a parallel investigation made for INFN, I asked the (Italian) DA
people to project their needs/intentions for the next years
With a series of caveats, the projection is shown in the chart
[Chart: projected HS06 needs per year, 2019-2026, for the pyCBC-OSG, cWB, CW and GRB pipelines, with the O3 and O4 data-taking periods marked]
Offline Computing
25
HPC resources
Numerical relativity is now a key element for the production of BNS
templates
In Virgo there are important activities, thanks to the groups in Torino and
Milano Bicocca
They make intensive use of the CINECA resources, within the CSN4 framework and through
some grants
Since they participate “structurally” in the LV data analysis, we need to provide computing resources
Requests:
Offline Computing
26
2018-2019:
2M GPU hours (e.g. on Galileo @ CINECA) [i]
6M CPU hours on Intel BDW/SKL (Marconi A1/A3 @ CINECA) [ii]
50 TB of disk space [iii]
2020-2023:
6M GPU hours per year [i]
6M CPU hours per year [ii]
150 TB of disk space [iii]
Hence …
We need to contribute new resources (MOU constraint)
But we are currently unable to use them or to give access to “LIGO-like” pipelines
We need a solution to this situation:
Replicate the LIGO environment
Local installation of Condor as a wrapper of the local batch system
New workload manager + data manager
DIRAC
Positive WM tests
Data management unclear to me
Data transfer still to be tested
We are progressing too slowly
LIGO is going in the direction of using Rucio as the DM and remaining on Condor as the WM
Future development:
In LIGO, tests of Singularity + CVMFS for virtualisation and distribution
That is a technology pursued at the LHC → supported by our CCs
We need to invest in that
A post-doc is being recruited at INFN-Torino to be engaged in that activity
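As an illustration of the Singularity + CVMFS direction mentioned above, a minimal sketch of how a single analysis job could be launched inside a container distributed via CVMFS; the image path, executable name and options are hypothetical placeholders, not the actual Virgo/LIGO layout:

```python
# Minimal sketch: run one analysis executable inside a Singularity container
# whose image is distributed via CVMFS. All paths are hypothetical placeholders.
import subprocess

cvmfs_image = "/cvmfs/virgo.example.org/containers/refos-cl7.simg"  # hypothetical image
command = [
    "singularity", "exec",
    "--bind", "/data",          # expose the local data area inside the container
    cvmfs_image,
    "my_pipeline_exe", "--gps-start", "1187008882", "--gps-end", "1187008982",
]

# The image is fetched on demand through the CVMFS client cache, so every
# worker node sees the same software environment.
subprocess.run(command, check=True)
```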
Computing - VW20180709
27
Cost model
In the current cost model, EGO reimburses the costs at the French and Italian
CCs; Nikhef and Polgrav contribute in kind
This model balances the costs between Italy and France, but it puts the
largest fraction of the costs on EGO's shoulders
Moving the bill back to the funding agencies doesn't work, because INFN would be by
far the largest contributor without further balancing
We need to find a cost model that shares the computing costs within Virgo in a
smarter way:
It should take into account the number of authors
It should take into account the global investment in computing (DAQ, low-latency
analysis, storage, human resources)
It should force the Virgo members to account for their resources
Offline Computing
28
Cost model
This is not a definitive proposal, but we need to open the discussion, also in front of the EGO
Council and the institutions
We have to define our standards:
Needs and standard cost of storage
Needs and standard cost of CPU
Accessibility requirements (LIGO compliant, Virgo compliant, …)
Accountability requirements (resources need to be accountable)
Human-resource requirements (a collaboration member MUST be the interface)
Compute a “Virgo standard cost” per author (see the sketch below)
Each institution in Virgo has to provide resources, in kind and/or in money, proportional to
its number of authors, according to the standard figures we define
Ghosts are expensive!
Over-contributions can be considered a net contribution to the experiment by the
institution; obviously we must also take into account the direct contributions to EGO
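A purely illustrative sketch of the “Virgo standard cost per author” idea; all monetary figures and institution names are made-up placeholders, not proposed numbers (the 343-author count is taken from the collaboration slide above):

```python
# Illustrative sketch only: turn a "Virgo standard cost" per author into an
# expected per-institution contribution. Figures are placeholders.
total_standard_cost = 1_000_000   # EUR/year: storage + CPU + services (placeholder)
total_authors = 343               # Virgo collaborators (from the slide above)

standard_cost_per_author = total_standard_cost / total_authors

institutions = {                  # hypothetical institutions and author counts
    "Institute A": 40,
    "Institute B": 12,
}
for name, authors in institutions.items():
    expected = authors * standard_cost_per_author
    # The expected share can be covered in kind (CPU/storage/people) and/or in money.
    print(f"{name}: {authors} authors -> expected contribution ~ {expected:,.0f} EUR/year")
```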
Offline Computing
29
2019: who pays?
In these days, INFN is defining the investments at the Tier1 (CNAF)
The decision will be taken in September
We currently have 30 kHS06
In the original plan it was expected to jump to 70-80 kHS06
Difficult in terms of cost and efficiency
What do we ask for 2019?
Tape is defined (1 PB)
Computing - VW20180709
30
Disk:
We have 656 TB (578 TB occupied)
O3: 1 MB/s of Virgo RDS + 2 MB/s from LIGO ≈ 90 TB (see the estimate below)
Disk for DA
Suggestion: request a pledge of 780 TB (+124 TB)
CPU: 30 kHS06, 40 kHS06, ?
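A quick back-of-the-envelope check of the disk figures above (a sketch; a one-year run at full duty cycle is assumed, so the exact figure depends on the actual observing time):

```python
# Sketch: accumulated reduced-data volume for a one-year O3 run.
seconds_per_year = 365 * 24 * 3600            # ~3.15e7 s
rds_rate_mb_s = 1 + 2                         # 1 MB/s Virgo RDS + 2 MB/s from LIGO
total_tb = rds_rate_mb_s * seconds_per_year / 1e6
print(f"~{total_tb:.0f} TB")                  # ~95 TB, consistent with the ~90 TB quoted
print(656 + 124)                              # existing 656 TB + 124 TB increase = 780 TB pledge
```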
Organisation
I was appointed VDAS coordinator, in an emergency, in Sept 2015
Since the beginning of my mandate, which ended in Sept 2017, I have highlighted the
need to improve the organisation of VDAS (also changing its name!)
Recently I proposed a new structure for VDAS, dividing it into work packages
and identifying the main projects
This organisation was shown at the Virgo Week in April and proposed to the
VSC
The decision is pending, and a suggestion from this committee is more than welcome
Offline Computing
31
[Organisation chart: Virgo Computing Coordination, interfacing with the Spokesperson / EGO Director, the DA coordinator, the Commissioning coordinator, the LIGO interface and the Computing Centres. Subsystems («local area» / «wide area»): Online Computing, Offline Computing Management, Local Computing Infrastructure. Work packages: sw management, low-latency architecture, dedicated hw management, bulk data transfer, local data allocation and management & strategy, analysis-pipeline GRID compatibility, offline computation needs evaluation, Data Management System, local computing and storage infrastructure management & strategy, networking infrastructure, cyber-security, local services (federated login, web servers, ...)]
Offline Computing
32
Reference persons at the CCs
In addition, it is crucial to have a MEMBER OF THE VIRGO
COLLABORATION, fully devoted to computing issues and physically or
virtually located at each computing centre, acting as reference person
Post-doc level
He/she participates in the collaboration life (computing meetings, DA
meetings, …), but has the duty to solve (or facilitate the solution of)
all the issues related to the use of that CC by the collaboration
Offline Computing
33
Conclusions
Computing is a crucial part of the detector
Computing is a key element of the LIGO-Virgo agreement
It is time for the collaboration (and for all the funding agencies) to
take it seriously
As stated at the last VSC and reported in the minutes, today I consider
concluded the extra time I have devoted to my appointment as VDAS
coordinator (which officially ended in Sept 2017)
I hope that a “VDAS coordinator” will not be needed anymore (given the proposed
reorganisation), but I wish all the best to my successor
Computing - VW20180709
34