CROSSJACK
This project builds a platform that helps users optimise their jobs. It aims to give simple, consolidated job feedback and job-specific metrics, while staying maintainable and easy to extend with new features. It uses Grafana, Prometheus, and Jobstats for job metric collection and visualisation.
CROSSJACK: Job-Level Metric Accounting/Visualisation for SCARF
Requirements: The main goal of the new platform is to help users optimise their jobs. Job feedback must be simple and consolidated. The platform must be maintainable, and new features must be easy to implement. There is currently no easy way to get metrics specific to your job. ...
SLURM commands: sacct, seff, scontrol
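As a rough illustration of what pulling per-job numbers out of sacct can look like, here is a minimal Python sketch. It is not taken from the CROSSJACK code base. The field selection and the helper names (build_sacct_command, parse_sacct_output, job_metrics) are assumptions made for this example; the sacct options used (--jobs, --format, --parsable2, --noheader) are standard ones.

```python
import subprocess

# Standard sacct format fields; this particular selection is illustrative only.
SACCT_FIELDS = ["JobID", "State", "Elapsed", "TotalCPU", "MaxRSS"]


def build_sacct_command(job_id, fields=SACCT_FIELDS):
    """Return the sacct argument list for one job, with pipe-delimited output."""
    return [
        "sacct",
        f"--jobs={job_id}",
        f"--format={','.join(fields)}",
        "--parsable2",  # '|' separated columns, no trailing delimiter
        "--noheader",   # omit the header row so every line is a data row
    ]


def parse_sacct_output(text, fields=SACCT_FIELDS):
    """Turn pipe-delimited sacct output into one dict per job step."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append(dict(zip(fields, line.split("|"))))
    return rows


def job_metrics(job_id):
    """Run sacct for a job and return the parsed rows (needs a SLURM cluster)."""
    result = subprocess.run(
        build_sacct_command(job_id),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sacct_output(result.stdout)
```

Even in this small form, the caller has to know which fields to request and which job step carries the memory figure, which is part of why the raw commands are not an easy route to job-specific metrics.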
Chosen Technologies: Grafana for visualisation. Prometheus with cgroup/node exporters for data collection/accounting. Jobstats for job feedback. Demo.
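To make the data path concrete, the sketch below shows one way a script could read a job-level value back out of Prometheus through its HTTP instant-query endpoint (/api/v1/query). It is an assumption-laden example rather than the project's implementation: the server address, the metric name job_memory_used_bytes and the jobid label are hypothetical placeholders, and the real names depend on how the exporters are configured.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the URL for a Prometheus instant query."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(payload):
    """Extract (labels, value) pairs from an instant-query JSON response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample looks like {"metric": {...labels...}, "value": [timestamp, "number"]}
    return [(s["metric"], float(s["value"][1])) for s in data["result"]]


def query_prometheus(base_url, promql, timeout=10):
    """Run an instant query against a live Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_instant_vector(json.load(response))


if __name__ == "__main__":
    # Hypothetical server, metric and label names, for illustration only.
    samples = query_prometheus(
        "http://localhost:9090",
        'job_memory_used_bytes{jobid="123"}',
    )
    for labels, value in samples:
        print(labels, value)
```

Grafana issues the same kind of PromQL queries when it renders a dashboard panel, so a query that works here can usually be pasted into a panel unchanged.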
What's next? Multi-node setup. GPU metrics and more. Productionise.
Links: Jobstats - https://github.com/PrincetonUniversity/jobstats; Node Exporter - https://github.com/prometheus/node_exporter; Cgroups Exporter - https://github.com/treydock/cgroup_exporter; Documentation - Confluence