[Smartswitch] Fix incorrect reporting of data plane and control plane by DPU#606
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@vvolam @rameshraghupathy Please review |
| def update_dpu_state(self, key, state): | ||
| """ | ||
| Update specific DPU state fields in chassisStateDB using the given key. | ||
| If state is 'down', delete the table first before setting new values. |
There was a problem hiding this comment.
@gpunathilell @vvolam
The original intent is to "Update specific DPU state fields in chassisStateDB using the given key." and not to delete the table. Doing so, will wipe out the "CP, DP" details that was there around the time of midplane failure. The intent was to preserved that data which has important information that will help debug easily.
There was a problem hiding this comment.
@gpunathilell I think you are deleting only the entry in the table right?
There was a problem hiding this comment.
The state is only changed to down at this point, We are not deleting any entries
There was a problem hiding this comment.
@gpunathilell Then can you fix the comment accordingly? It says you are deleting the table first.
| if not self.chassis_state_db: | ||
| self.chassis_state_db = daemon_base.db_connect("CHASSIS_STATE_DB") | ||
|
|
||
| # If state is down, delete the table first |
There was a problem hiding this comment.
@gpunathilell @vvolam We shouldn't delete the entire table. If there is a need to delete the table (I need to understand the requirement fully), save the CP/DP values and restore them.
There was a problem hiding this comment.
@rameshraghupathy I guess Gagan is deleting the table entry here and not entire table as per my understanding. I think entry is deleted here and added at line 799?
@gpunathilell, But why do we have to delete entry and updating entry is not helping here?
There was a problem hiding this comment.
The state is only changed to down at this point, We are not deleting any entries
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull Request Overview
This PR fixes the incorrect reporting of data plane and control plane states for DPUs by updating the state update logic and enhancing test coverage. Key changes include new tests for various DPU state management scenarios, additions to the mock framework for DB connector retrieval, and modifications to the state update logic in the chassisd script to correctly handle CHASSIS_STATE_DB updates.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sonic-chassisd/tests/test_dpu_chassisd.py | Added tests validating deletion, state tracking, duplicate update avoidance, and handling of state changes for DPUs. |
| sonic-chassisd/tests/test_chassisd.py | Added a test to verify midplane state change updates (module updater). |
| sonic-chassisd/tests/mock_swsscommon.py | Introduced getDbConnector() in the mock DB connector for consistency with production code. |
| sonic-chassisd/scripts/chassisd | Updated DPU state and module updater logic including the use of shared constants and refined state update conditions. |
| update_required = False | ||
|
|
||
| self.dpu_state_updater.update_state() | ||
| for s in selectable: | ||
| result = s.pop() | ||
| update_required = True # If there is any selectable object, we need to update the state |
There was a problem hiding this comment.
The 'update_required' flag is being reinitialized for each selectable within the loop instead of accumulating conditions from all objects. Consider initializing 'update_required' to false before the loop and then OR-ing or accumulating the condition for each selectable to ensure proper state update decision.
There was a problem hiding this comment.
The update_required is set this way, because it shows a clear distinction on when we perform the actual update, and when the update is disabled, I think implementing it this way is more clearer to understand
vvolam
left a comment
There was a problem hiding this comment.
LGTM other than minor comment fix
| def update_dpu_state(self, key, state): | ||
| """ | ||
| Update specific DPU state fields in chassisStateDB using the given key. | ||
| If state is 'down', delete the table first before setting new values. |
There was a problem hiding this comment.
@gpunathilell Then can you fix the comment accordingly? It says you are deleting the table first.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@vvolam Please merge this PR |
|
Cherry-pick PR to 202505: #626 |
Description
Motivation and Context
Change was done so that the DPU states are updated on the following conditions:
midplane_state-> The switch erases theDPU_STATEtable when the midplane state changes to down, So as to remove the stale states of control plane and data planecontrol_plane_stateanddata_plane_stateSubscribe to theDPU_STATEtable, if there is an update, where onlymidplane_stateis updated to the table, (meaning there is no control plane or data plane state present in the table). Then update the states back to the table, (If DPUs are online, this means that the states have been removed by switchHow Has This Been Tested?
Set midplane interface to down on DPU:
ifconfig eth0-midplane down-> Observe that control plane and dataplane states are set to downifconfig eth0-midplane up-> Observe that control plane and dataplane states are added backAdditional Information (Optional)