Sensormond changes to support configurable time for logging threshold and better handling of fatal signals #600
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Update the platform-specific file with the new timeout value instead of updating supervisord.conf. |
|
As discussed, please check platform_env.conf in the platform dir to provide any platform config. |
|
@kenneth-arista will check as well. |
|
@bmridul please check the earlier comment we added, Anand has mentioned it in PR |
|
ack, am following up internally, will update this PR by next week |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@judyjoseph : Please review |
|
/azp run |
|
Commenter does not have sufficient privileges for PR 600 in repo sonic-net/sonic-platform-daemons |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@bmridul : the code coverage checker is failing. |
|
@gregoryboudreau , @bmridul , please check code coverage, thanks |
Missed this, will update! |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…new stop signal handling
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@judyjoseph : to help review on this. |
|
@kperumalbfn could you please approve this for 202411? @rameshraghupathy for viz |
|
@prabhataravind @rameshraghupathy This is common for all SKUs, do we need this in 202411? Could you please check smartswitch SKUs with this change. |
@kperumalbfn @prabhataravind This is not a must, as sonic-mgmt is already skipping this error message. As long as we have that PR included, we don't need this. Moreover, there is a platform counterpart to it, and that is not committed to the 202411 branch. So, we can skip this. |
@rameshraghupathy what is the other PR in sonic-mgmt to skip this error? Could you please add it here? |
Hi @prabhataravind @rameshraghupathy , please also confirm whether this change is a must to go to 202505 |
|
|
Removing the 202505 request label: it's not needed in 202411, so I suppose 202505 doesn't need it either. Please add the label again if needed. |
|
It addresses the underlying failure that the sonic-mgmt ignore modification covers. If it's not a major headache to bring back, I'd say it's preferable to have it in 202411 and 202505, but it's not a must-have.
Description
Two quality of life improvements for sensormond:
SENSORMOND_WARNING_TIME value
Motivation and Context
With a lot of sensors, the logs can fill up with warnings about exceeding the time to read, even when that read time is expected by the vendor. Additionally, with this longer read time, the SIGTERM cannot be checked soon enough before supervisorctl falls back to using a SIGKILL.
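The two behaviours described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes SENSORMOND_WARNING_TIME is read from the process environment, and the names get_warning_time, poll_once, read_sensor, install_handlers and DEFAULT_WARNING_TIME are hypothetical. The idea is to check a stop flag between individual sensor reads so that a SIGTERM is honoured before the supervisor escalates to SIGKILL, and to warn only when a full pass exceeds the configured threshold.

```python
import os
import signal
import threading
import time

# Illustrative default in seconds; the real default is not stated in this PR text.
DEFAULT_WARNING_TIME = 1.0

# Set by the signal handler; checked between sensor reads.
stop_event = threading.Event()


def get_warning_time(env=os.environ):
    """Return the warning threshold, assuming it comes from an env variable."""
    raw = env.get("SENSORMOND_WARNING_TIME")
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Missing or malformed value: fall back to the default.
        return DEFAULT_WARNING_TIME
    return value if value > 0 else DEFAULT_WARNING_TIME


def handle_stop(signum, frame):
    """Only record the request; the loop exits at the next safe point."""
    stop_event.set()


def install_handlers():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_stop)
    signal.signal(signal.SIGINT, handle_stop)


def poll_once(sensors, read_sensor, warning_time, log=print):
    """Read each sensor, stopping early if a stop was requested.

    Returns the number of sensors actually read in this pass.
    """
    start = time.monotonic()
    done = 0
    for sensor in sensors:
        if stop_event.is_set():
            break  # never start a new driver read after a stop request
        read_sensor(sensor)
        done += 1
    elapsed = time.monotonic() - start
    if elapsed > warning_time:
        log("sensor pass took %.2fs (threshold %.2fs)" % (elapsed, warning_time))
    return done
```

Checking the flag per sensor, rather than once per pass, bounds the shutdown delay by a single read instead of the whole loop, which matters when a pass over many sensors is slow.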
How Has This Been Tested?
Tested on Cisco Smartswitch: with the time configured, warnings are no longer logged on every loop. Also tested stopping with supervisorctl stop sensormond; the system error logs that were hit when SIGKILL interrupted reads from the driver no longer appear.
Additional Information (Optional)