Latest Updates from DUNE Software and Computing


Explore the latest developments in DUNE Software and Computing including new websites, hardware upgrades, Redmine sites, CILogon certificates, and important announcements on AFS shutdown at Fermilab. Stay informed about changes and enhancements within the DUNE project.


Uploaded on Sep 10, 2024



Presentation Transcript


  1. DUNE Software and Computing News and Announcements. Tom Junk, DUNE Software and Computing General Meeting, February 2, 2016.

  2. New Web Sites. dune-data.fnal.gov: Monte Carlo Challenge 5.0 and future MC; MC samples and tiers; data files from the 35-ton prototype. The file list is automatically updated from the file transfer script, and the samweb usage tips tell you how to access the files. dune-young.hep.net: content copied from lbne-young.hep.net (still not up to date). 02.02.16 Tom Junk | DUNE S&C General News 2

  3. New Build Node: dunebuild01.fnal.gov. 16 cores (AMD Opteron 6320), 32 GB of RAM, 5 GB of swap. To be used for building code only (we'll watch for misuse). mrb i -j16 now gives you a big boost in speed. dCache disks are not mounted; /dune/data and /dune/data2, however, are still mounted. /build/<makeyourowndirectory> has 2.8 TB in it; it is not clear how to use this effectively. Let Tom know if you need something different on it. 16 cores was chosen based on Lynn Garren's build speed test: https://indico.fnal.gov/contributionDisplay.py?contribId=9&confId=10257 With builds using BlueArc (/dune/app), more than 16 cores gives diminishing returns in speed due to disk I/O bottlenecks. In addition, machines with more than 16 cores are even less available than the one we got with 16 cores.
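The diminishing returns mentioned on this slide can be illustrated with a toy model in which compile work divides across cores while disk I/O does not. This is only a sketch: the model and both timing constants are assumptions chosen for illustration, not measurements from the build speed test.

```python
def build_time(cores, compile_seconds=3200.0, io_seconds=200.0):
    """Toy estimate of wall-clock build time.

    compile_seconds and io_seconds are hypothetical numbers, not measured
    values: compile work is assumed to scale perfectly with cores, while
    disk I/O is assumed not to scale at all.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return io_seconds + compile_seconds / cores


def speedup(cores, **kwargs):
    """Speedup relative to a single-core build under the same toy model."""
    return build_time(1, **kwargs) / build_time(cores, **kwargs)


if __name__ == "__main__":
    for n in (1, 4, 16, 32, 64):
        print(f"{n:3d} cores: {build_time(n):7.1f} s, speedup {speedup(n):5.2f}x")
```

Under these assumed numbers, doubling from 16 to 32 cores improves the speedup by much less than a factor of two, which is the qualitative behavior the slide describes.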

  4. New Redmine Sites. duneexo (Exotic Physics with DUNE); dunefgt (Fine-Grained Tracker); dunelbl (Long-Baseline Physics WG); dunendk (Nucleon Decay); HighLAND Analysis Tool; WA105 Dual-Phase protoDUNE.

  5. CILogon Certificates Replacing OSG Grid Certificates. DUNE VO user entries with OSG Grid certificates have now been given entries for CILogon certificates. Current OSG Grid certificates remain valid until their expiration, so there is no need to hurry and get a replacement CILogon certificate, but the next time it's refreshed there will be a new procedure. Eileen and Anne have contacted certificate users of the DocDBs and gave instructions for obtaining and using CILogon certificates with the DocDBs. CILogon will replace KCA certificates too: the jobsub client called kx509 to generate short-lived certificates using the user's Kerberos ticket; other uses, like SAM, required the user to execute kx509 or get-cert.sh (which calls kx509) to get a certificate. Jobsub use of CILogon is to be transparent to the users.

  6. AFS at Fermilab is being shut down on Feb. 25, 2016. Web sites at /afs/fnal.gov/files/expww are migrated to the NFS storage area /web/sites/, available on FNALU and dunegpvm01 (but not the other dunegpvms). Home areas in /afs/fnal.gov/home/room[1,2,3]/username are being replaced with other networked storage. I was never fond of our AFS home areas anyhow: very small quotas in the home area, 300 MB (!); an authentication token which expires after 26 hours caused user confusion; it has its own syntax for managing (want to know your quota? fs lq); it is not available on grid workers (wouldn't want that anyhow for the replacement). Professional web sites formerly in ~/public_html are now in /publicweb/<letter>/username, mounted on the dunegpvms, and accessed via http://home.fnal.gov/~username. Not every user has one. Put in a Service Desk ticket to get one if you need your own web site at FNAL.
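As a rough sketch of the new web-area layout, the following helper builds the /publicweb path and the matching URL for a user. It assumes that <letter> is the first character of the username, which the slide does not state; treat that as a guess to verify on a dunegpvm node.

```python
from pathlib import PurePosixPath


def publicweb_dir(username):
    """Return the assumed /publicweb/<letter>/username directory.

    Assumption (not confirmed by the slide): <letter> is the first
    character of the username.
    """
    if not username:
        raise ValueError("username must not be empty")
    return PurePosixPath("/publicweb") / username[0] / username


def home_url(username):
    """Return the URL pattern quoted on the slide for a personal web site."""
    return f"http://home.fnal.gov/~{username}"


if __name__ == "__main__":
    user = "jdoe"  # hypothetical username
    print(publicweb_dir(user))
    print(home_url(user))
```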

  7. lbnegpvm*.fnal.gov to dunegpvm*.fnal.gov. Users were in the lbne group; active or recently active users were given new accounts in the dune group. A new dunegpvm11 was spun up with the new group and new user list. There are no /lbne/data, /lbne/data2, or /lbne/app mounts on the new dune machine; the same areas are mounted under /dune. /pnfs/lbne is still mounted (needed, as some files are accessible only that way). Current status: migrated lbnegpvm06 through lbnegpvm10 to dunegpvm machines and gave back dunegpvm11. lbnegpvm01 through lbnegpvm05 are still old lbne-style machines, giving users some time to remove lbne from their scripts. It's been a while, though, and we should just move ahead. Convenience names: dunegpvm01 through dunegpvm05.
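Removing lbne from scripts mostly means rewriting the three BlueArc mount points while leaving /pnfs/lbne alone, since that path is still needed. A minimal sketch of such a rewrite follows; the function name and the sample script line are hypothetical.

```python
import re

# Match /lbne/data, /lbne/data2, or /lbne/app only when the leading slash
# starts a path, so that /pnfs/lbne/... is left untouched.
_BLUEARC_LBNE = re.compile(r"(?<![\w.-])/lbne/(data2|data|app)\b")


def migrate_paths(script_text):
    """Rewrite old /lbne BlueArc mount points to their /dune equivalents."""
    return _BLUEARC_LBNE.sub(r"/dune/\1", script_text)


if __name__ == "__main__":
    line = "cp /lbne/data2/users/jdoe/a.root /pnfs/lbne/scratch/"  # hypothetical
    print(migrate_paths(line))
```

Reviewing the rewritten script by hand is still advisable, because other uses of the string lbne (group names, host names) are outside what this pattern touches.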

  8. BlueArc Dismount on Grid Workers. Affects us in particular! /lbne/data and /lbne/data2 are not mounted on the dunegpvm6-10 machines, but are still mounted on grid worker nodes. /dune/data and /dune/data2 are not mounted on grid worker nodes (!); these mount points were made after the decision to migrate away from BlueArc on the grid was taken. Two ways to store your data: (1) ifdh cp it to dCache: /pnfs/dune/persistent/users and /pnfs/dune/scratch/users. Ask about tape-backed space! (We prefer SAM so the files won't get lost.) (2) ifdh cp the files to BlueArc (many people still do this). This too will be disabled! End of 2016 shutdown!
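The dCache option can be sketched as a small helper that assembles an ifdh cp command for one of the two areas named on the slide. The per-user subdirectory under users/ and the example file name are assumptions; the command is only built and printed here, not executed.

```python
import shlex

# The two dCache areas listed on the slide.
DCACHE_AREAS = {
    "persistent": "/pnfs/dune/persistent/users",
    "scratch": "/pnfs/dune/scratch/users",
}


def dcache_destination(area, username, filename):
    """Build a destination path, assuming a users/<username>/ subdirectory."""
    if area not in DCACHE_AREAS:
        raise ValueError(f"unknown dCache area: {area!r}")
    return f"{DCACHE_AREAS[area]}/{username}/{filename}"


def ifdh_cp_command(local_path, area, username):
    """Return the argument list for copying one local file to dCache."""
    filename = local_path.rsplit("/", 1)[-1]
    return ["ifdh", "cp", local_path, dcache_destination(area, username, filename)]


if __name__ == "__main__":
    cmd = ifdh_cp_command("out/hist.root", "scratch", "jdoe")  # hypothetical file and user
    print(shlex.join(cmd))
```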

  9. Metadata Changes. Existing data tiers: raw, simulated, detector-simulated, full-reconstructed. New data tier: sliced. The slicer/stitcher input source only works on raw data: there is a limited number of data products it has to know how to slice and stitch. A new problem: the slicer/stitcher reformats events based on a software trigger definition. Do we need to store which trigger definition was used in the metadata? Tack it on the end of the detector type string?
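To make the "tack it on the end" option concrete, here is a minimal sketch of encoding and decoding a trigger definition appended to a detector type string. The separator and the example values are hypothetical; nothing here reflects an agreed DUNE metadata convention.

```python
SEPARATOR = ":"  # hypothetical delimiter, not an agreed convention


def tag_detector_type(detector_type, trigger_def):
    """Append a trigger definition name to a detector type string."""
    if SEPARATOR in trigger_def:
        raise ValueError("trigger definition must not contain the separator")
    return f"{detector_type}{SEPARATOR}{trigger_def}"


def split_detector_type(value):
    """Split a tagged string into (detector_type, trigger_def or None)."""
    base, sep, trigger_def = value.rpartition(SEPARATOR)
    if not sep:
        return value, None
    return base, trigger_def


if __name__ == "__main__":
    tagged = tag_detector_type("35t", "trigA")  # hypothetical values
    print(tagged, split_detector_type(tagged))
```

The sketch also shows the cost of this choice: every consumer must parse the string, and a detector type that already contains the separator becomes ambiguous. A dedicated metadata field would avoid both problems.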

  10. A Good Run List Proposal. So far only the 35-ton has data and thus needs a good-run list. One person's bad data is another person's good data. Alex Himmel suggested it would make SAM dataset queries simpler if good-run status were part of the metadata. We can request a new good-run metadata field: an arbitrary string, so we can encode various kinds of goodness or badness. CDF had good run lists that were distributed as ROOT trees and text files. It didn't make sense to limit public datasets to a particular good-run set because runs would be re-classified and it takes a long time to reprocess everything. The good run list needs curation. Who decides? Shift tool? A Data Quality Team is needed to make judgments. For the 35-ton, we probably want analyzers to be tightly coupled to the data taking. Label special data runs for special analyses and record run numbers and ranges that are intended for subsequent analyses.
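One way an arbitrary-string good-run field could encode several kinds of goodness is as a comma-separated list of tags, so that different analyses can select different runs. The sketch below assumes that encoding; the field name good_run, the tag names, and the run numbers are all hypothetical.

```python
def parse_good_run(value):
    """Split a comma-separated good-run string into a set of tags."""
    return {tag.strip() for tag in value.split(",") if tag.strip()}


def select_runs(run_metadata, required_tag):
    """Return sorted run numbers whose good-run field carries required_tag."""
    return sorted(
        run
        for run, meta in run_metadata.items()
        if required_tag in parse_good_run(meta.get("good_run", ""))
    )


if __name__ == "__main__":
    # Hypothetical run numbers and tags, for illustration only.
    runs = {
        14001: {"good_run": "tpc_ok, purity_ok"},
        14002: {"good_run": "tpc_ok"},
        14003: {"good_run": ""},
    }
    print(select_runs(runs, "tpc_ok"))
    print(select_runs(runs, "purity_ok"))
```

Because each analysis asks for its own tag, the same run can be good for one study and bad for another without reprocessing any data.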

  11. FIFE News. Summer 2016 FIFE Workshop during the week of June 20. Fermilab GPGrid new features (partitionable slots, priority queueing instead of quotas): https://fermipoint.fnal.gov/organization/cs/scd/_layouts/15/WopiFrame.aspx?sourcedoc=/organization/cs/scd/CS%20Liaison%20Meetings%20Library/CSLiaison_01_13_16.pdf&action=default Job efficiency links: http://web1.fnal.gov/scoreboard/daily_reports/fife-efficiency.daily.latest http://web1.fnal.gov/scoreboard/weekly_reports/fife-efficiency.weekly.latest http://web1.fnal.gov/scoreboard/monthly_reports/fife-efficiency.monthly.latest

  12. Job Resource Limits Enforced on FNAL GPGrid. Last year the grid was more forgiving about going over time limits (not CPU; wall-clock time is what counts), virtual memory size, and disk space used. But now these limits are enforced. See the page https://cdcvs.fnal.gov/redmine/projects/dune/wiki/Submitting_Jobs_at_Fermilab for examples of how to ask for resources and links to more documentation. What happens if your job goes over the limit? It doesn't get killed, but rather gets held. To find out what went wrong: jobsub_q --held --user=<username> You can use fifemon.fnal.gov to monitor how many jobs you have in each state. Policy may be different on non-FNAL OSG sites.
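For scripted checks, the held-jobs query quoted on this slide can be assembled as an argument list. This sketch uses only the two options shown on the slide and just prints the command; the wrapper function and the example username are hypothetical.

```python
import shlex


def held_jobs_command(username):
    """Return the jobsub_q argument list for listing a user's held jobs."""
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("username must be a non-empty string without whitespace")
    return ["jobsub_q", "--held", f"--user={username}"]


if __name__ == "__main__":
    # Printed rather than executed, so the sketch runs without a jobsub client.
    print(shlex.join(held_jobs_command("jdoe")))  # hypothetical username
```

On a node with the jobsub client set up, the returned list could be passed to subprocess.run to capture the output.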

  13. Reminder: DAQ Workshop at CERN. Dates: Feb. 25-26 at CERN. https://indico.fnal.gov/conferenceDisplay.py?confId=11372 Topics: DAQ hardware, software, and offline computing infrastructure. Ask Maxine (maxine@fnal.gov) about site access for non-CERN users.
