Skip to content

Sensormond changes to support configurable time for logging threshold and better handling of fatal signals#600

Merged
judyjoseph merged 8 commits into
sonic-net:masterfrom
gregoryboudreau:gregoryboudreau/sensormond_enhancements
Jul 22, 2025
Merged

Sensormond changes to support configurable time for logging threshold and better handling of fatal signals#600
judyjoseph merged 8 commits into
sonic-net:masterfrom
gregoryboudreau:gregoryboudreau/sensormond_enhancements

Conversation

@gregoryboudreau

@gregoryboudreau gregoryboudreau commented Mar 21, 2025

Copy link
Copy Markdown
Contributor

Description

Two quality of life improvements for sensormond:

  1. Allow for the user to configure timeout for printing time log warning (keep default at 30), this is set in the the platform_env.conf through the SENSORMOND_WARNING_TIME value
  2. Cascade stop events (from signal handler) into voltage and current updaters to allow for more graceful checking during the update process, current setup has 10 second timeout and given the updaters can take some time to run can result in SIGKILLs getting sent during operations.

Motivation and Context

With a lot of sensors, the logs can be filled with warnings about exceeding the time to read even if it is expected by the vendor. Additionally, with this longer time, the SIGTERM can not be checked in enough time before supervisorctl falls back to using a SIGKILL

How Has This Been Tested?

Tested on Cisco Smartswitch, time configuration now no longer results in logs being triggered every loop and tested killing w/ supervisorctl stop sensormond and no longer see any error logs from system that were being hit when SIGKILL interrupted reads from driver.

Additional Information (Optional)

@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@bmridul bmridul self-requested a review March 21, 2025 21:39
@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@arlakshm

arlakshm commented Apr 2, 2025

Copy link
Copy Markdown
Contributor

update platform specific file with the new timeout value iso of updating the supervisord.conf

@anamehra

anamehra commented Apr 2, 2025

Copy link
Copy Markdown
Contributor

As discussed, please checl platform_env.conf in platform dir to provide any platform config.

@rlhui

rlhui commented Apr 16, 2025

Copy link
Copy Markdown

@kenneth-arista will check as well.

@judyjoseph

Copy link
Copy Markdown
Contributor

@bmridul please check the earlier comment we added, Anand has mentioned it in PR

@judyjoseph

Copy link
Copy Markdown
Contributor

@bmridul @anamehra please can you discuss on this internally and get back

@gregoryboudreau

Copy link
Copy Markdown
Contributor Author

ack, am following up internally, will update this PR by next week

@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@abdosi

abdosi commented May 21, 2025

Copy link
Copy Markdown
Contributor

@judyjoseph : Please review

@gregoryboudreau

Copy link
Copy Markdown
Contributor Author

/azp run

@azure-pipelines

Copy link
Copy Markdown
Commenter does not have sufficient privileges for PR 600 in repo sonic-net/sonic-platform-daemons

@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@abdosi

abdosi commented Jun 4, 2025

Copy link
Copy Markdown
Contributor

@bmridul : code coverage checker failing.

@rlhui

rlhui commented Jun 25, 2025

Copy link
Copy Markdown

@gregoryboudreau , @bmridul , please check code coverage, thanks

@gregoryboudreau

Copy link
Copy Markdown
Contributor Author

@gregoryboudreau , @bmridul , please check code coverage, thanks

Missed this, will update!

@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld

Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 1 pipeline(s).

@abdosi

abdosi commented Jul 16, 2025

Copy link
Copy Markdown
Contributor

@judyjoseph : to help review on this.

Comment thread sonic-sensormond/scripts/sensormond

@judyjoseph judyjoseph left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@judyjoseph judyjoseph merged commit e930a64 into sonic-net:master Jul 22, 2025
5 checks passed
@prabhataravind

Copy link
Copy Markdown

@kperumalbfn could you please approve this for 202411? @rameshraghupathy for viz

@kperumalbfn

Copy link
Copy Markdown

@prabhataravind @rameshraghupathy This is common for all SKUs, do we need this in 202411? Could you please check smartswitch SKUs with this change.

@rameshraghupathy

rameshraghupathy commented Jul 30, 2025

Copy link
Copy Markdown
Contributor

@prabhataravind @rameshraghupathy This is common for all SKUs, do we need this in 202411? Could you please check smartswitch SKUs with this change.

@kperumalbfn @prabhataravind This is not must as the sonic-mgmt already skipping this error message. As long as we have that PR included we don't need this. More over there is a platform counterpart to it and that is not committed to 202411 branch. So, we can skip this.

@prabhataravind

Copy link
Copy Markdown

@prabhataravind @rameshraghupathy This is common for all SKUs, do we need this in 202411? Could you please check smartswitch SKUs with this change.

@kperumalbfn @prabhataravind This is not must as the sonic-mgmt already skipping this error message. As long as we have that PR included we don't need this. More over there is a platform counterpart to it and that is not committed to 202411 branch. So, we can skip this.

@rameshraghupathy what is the other PR in sonic-mgmt to skip this error? Could you please add it here?

@yejianquan

yejianquan commented Jul 30, 2025

Copy link
Copy Markdown
Contributor

@prabhataravind @rameshraghupathy This is common for all SKUs, do we need this in 202411? Could you please check smartswitch SKUs with this change.

@kperumalbfn @prabhataravind This is not must as the sonic-mgmt already skipping this error message. As long as we have that PR included we don't need this. More over there is a platform counterpart to it and that is not committed to 202411 branch. So, we can skip this.

@rameshraghupathy what is the other PR in sonic-mgmt to skip this error? Could you please add it here?

Hi @prabhataravind @rameshraghupathy , please also confirm whether this change is a must to go to 202505

@rameshraghupathy

Copy link
Copy Markdown
Contributor

@prabhataravind @rameshraghupathy This is common for all SKUs, do we need this in 202411? Could you please check smartswitch SKUs with this change.

@kperumalbfn @prabhataravind This is not must as the sonic-mgmt already skipping this error message. As long as we have that PR included we don't need this. More over there is a platform counterpart to it and that is not committed to 202411 branch. So, we can skip this.

@rameshraghupathy what is the other PR in sonic-mgmt to skip this error? Could you please add it here?
@prabhataravind @yejianquan Here is the PR: sonic-net/sonic-mgmt#17263 in sonic-mgmt to skip this error
Hi @prabhataravind @rameshraghupathy , please also confirm whether this change is a must to go to 202505
@yejianquan @prabhataravind We don't need this PR in 202411.

@yejianquan

yejianquan commented Jul 31, 2025

Copy link
Copy Markdown
Contributor

remove the 202505 request label, it's not needed in 202411, suppose 202505 don't need it, please add label again if needed

@gregoryboudreau

Copy link
Copy Markdown
Contributor Author

it addresses the underlying failure that the sonic-mgmt ignore modification covers, if not a major headache to bringing back I'd say it's preferable to have it in 202411 and 202505 but not a must have.

Comment thread sonic-sensormond/scripts/sensormond
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.