Hi
I noticed that prometheus-slurm-exporter fails with Slurm 24.11.5 when run with -gpus-acct: it crashes with a parsing error as soon as the metrics endpoint is scraped.
Steps to Reproduce:
1- Run the exporter with GPU accounting enabled and with Slurm 24.11.5:
prometheus-slurm-exporter -gpus-acct
2- Attempt to access the metrics server.
$ curl localhost:8080/metrics
curl: (52) Empty reply from server
3- The server exits:
FATA[0072] exit status 1 source="gpus.go:101"
Cause:
1- Slurm 24.11.5 removed the AllocGRES field from sacct --format in favor of AllocTRES.
Running the same sacct command used in gpus.go therefore fails:
$ sacct -a -X --format=Allocgres --state=RUNNING --noheader --parsable2
sacct: fatal: AllocGRES has been removed, please use AllocTRES
2- I think the code also expects a different format from sinfo -o "%n %G":
it outputs gpu:<type>:X(S:0-1) instead of gpu:X.
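For illustration, here is a minimal Go sketch of parsing that would tolerate both changes. It is not the exporter's actual code, and the function names (gpusFromTRES, gpusFromGres) are hypothetical. It assumes that AllocTRES reports GPUs as an untyped "gres/gpu=N" entry (which requires gres/gpu to be listed in AccountingStorageTRES), possibly followed by typed entries such as "gres/gpu:a100=N" that repeat the same GPUs. It also assumes that sinfo is run with -h so no header line is printed.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// gpusFromTRES returns the GPU count in one AllocTRES value such as
// "billing=4,cpu=4,gres/gpu=2,gres/gpu:a100=2,mem=16G,node=1".
// Assumption: only the untyped "gres/gpu" key is read, because typed keys
// like "gres/gpu:a100" describe the same GPUs again.
func gpusFromTRES(tres string) (int, error) {
	for _, field := range strings.Split(strings.TrimSpace(tres), ",") {
		key, value, ok := strings.Cut(field, "=")
		if !ok || key != "gres/gpu" {
			continue
		}
		return strconv.Atoi(value)
	}
	return 0, nil // no GPUs allocated (or gres/gpu not tracked)
}

// parenthesised matches annotations like "(S:0-1)" and "(null)".
var parenthesised = regexp.MustCompile(`\([^)]*\)`)

// gpusFromGres returns the GPU count in one %G value from sinfo, accepting
// the old "gpu:4" form, the newer "gpu:<type>:4(S:0-1)" form, and
// comma-separated lists of several GRES entries.
func gpusFromGres(gres string) (int, error) {
	total := 0
	cleaned := parenthesised.ReplaceAllString(gres, "")
	for _, entry := range strings.Split(cleaned, ",") {
		parts := strings.Split(strings.TrimSpace(entry), ":")
		if len(parts) < 2 || parts[0] != "gpu" {
			continue // empty, or a non-GPU GRES such as "shard:8"
		}
		// The count is the last colon-separated field in both forms.
		n, err := strconv.Atoi(parts[len(parts)-1])
		if err != nil {
			return 0, fmt.Errorf("cannot parse GPU count in %q: %w", entry, err)
		}
		total += n
	}
	return total, nil
}

func main() {
	// One line per running job, as from:
	// sacct -a -X --format=AllocTRES --state=RUNNING --noheader --parsable2
	sacctOut := "billing=4,cpu=4,gres/gpu=2,gres/gpu:a100=2,mem=16G,node=1\ncpu=1,mem=4G,node=1"
	alloc := 0
	for _, line := range strings.Split(sacctOut, "\n") {
		n, err := gpusFromTRES(line)
		if err != nil {
			panic(err)
		}
		alloc += n
	}

	// One line per node, as from: sinfo -h -o "%n %G"
	sinfoOut := "node01 gpu:a100:4(S:0-1)\nnode02 gpu:2\nnode03 (null)"
	total := 0
	for _, line := range strings.Split(sinfoOut, "\n") {
		fields := strings.Fields(line) // hostname, GRES string
		if len(fields) < 2 {
			continue
		}
		n, err := gpusFromGres(fields[1])
		if err != nil {
			panic(err)
		}
		total += n
	}
	fmt.Printf("allocated=%d total=%d\n", alloc, total) // prints "allocated=2 total=6"
}
```

Stripping the parenthesised socket list before splitting on commas keeps a multi-entry value such as "gpu:a100:2(S:0),gpu:v100:1(S:1)" parseable. Reading only the untyped TRES key avoids double-counting on clusters that also track typed GPUs.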
Let me know if you need any additional details from my side.
Thanks,
Fathi