prometheus-slurm-exporter GPU Accounting Error #124

@ahmedfathymahmoud

Hi

I noticed that prometheus-slurm-exporter fails with Slurm 24.11.5 when run with -gpus-acct. It crashes because of a parsing error.

Steps to Reproduce:

1- Run the exporter with GPU accounting enabled and with Slurm 24.11.5:
prometheus-slurm-exporter -gpus-acct
2- Attempt to access the metrics server.

$ curl localhost:8080/metrics
curl: (52) Empty reply from server

3- The exporter then exits with a fatal error:
FATA[0072] exit status 1 source="gpus.go:101"

Cause:

1- In Slurm 24.11.5, sacct no longer accepts --format=Allocgres; the field to use is AllocTRES.
Running the same sacct command that gpus.go uses results in an error:

$ sacct -a -X --format=Allocgres --state=RUNNING --noheader --parsable2
sacct: fatal: AllocGRES has been removed, please use AllocTRES

2- I think the code also expects a different format from sinfo -o "%n %G":
the command outputs gpu:<type>:X(S:0-1) instead of gpu:X.

Let me know if you need any additional details from my side.

Thanks,
Fathi
