Skip to content

Commit 9047875

Browse files
committed
Fix MIGRATE REGION falsely reported complete on ConfigNode leader switch
When the ConfigNode leader is gracefully stopped (or loses leadership) while AddRegionPeerProcedure is waiting for the coordinator DataNode's AddRegionPeer task to finish, RegionMaintainHandler.waitTaskFinish() is interrupted and returns PROCESSING. The DO_ADD_REGION_PEER state previously treated PROCESSING as a no-op terminal state (return Flow.NO_MORE_STATE), silently ending the AddRegionPeerProcedure without success or rollback. The parent RegionMigrateProcedure had already persisted at CHECK_ADD_REGION_PEER, so the new leader resumed there directly. Its isDataNodeContainsRegion() check only inspects the partition table's location list, which is written at CREATE_NEW_REGION_PEER (long before the snapshot finishes transferring). It therefore passed, the source replica was removed, and the migration was declared a success while the destination replica was still in Adding state and had not received the snapshot. Queries against the region returned incorrect results during the gap (observed: ~17 min until the destination became active). Fix: in the PROCESSING branch, stay at DO_ADD_REGION_PEER and persist it (HAS_MORE_STATE) instead of ending. After recovery the new leader re-enters DO_ADD_REGION_PEER and re-polls the coordinator task until it truly reaches SUCCESS or FAIL. The re-poll is idempotent: the isStateDeserialized() guard skips re-submitting after a restart, and the coordinator DataNode dedups by taskId (putIfAbsent) even on a same-process re-execute, so the AddRegionPeer task is never submitted twice. If the coordinator crashed and lost its task table, the poll returns TASK_NOT_EXIST and falls through to the existing FAIL/rollback path. Add cnLeaderSwitchDuringDoAddPeerTest for each consensus protocol (IoTConsensus, IoTConsensusV2 batch/stream, Ratis). Existing daily ConfigNode-crash ITs all use stopForcibly() (SIGKILL), which kills the process before it can run the PROCESSING branch; the new test uses a graceful stop() (SIGTERM) of the leader among 3 ConfigNodes so the interrupted PROCESSING path is actually exercised across a real leader switch.
1 parent a725ded commit 9047875

8 files changed

Lines changed: 154 additions & 3 deletions

File tree

integration-test/src/test/java/org/apache/iotdb/confignode/it/regionmigration/IoTDBRegionOperationReliabilityITFramework.java

Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -119,6 +119,27 @@ public class IoTDBRegionOperationReliabilityITFramework {
119119
LOGGER.info("Cluster has been restarted");
120120
};
121121

122+
/**
123+
* Gracefully stop (SIGTERM, not a forcible kill) the ConfigNode that hit the kill point, then
124+
* restart it. A graceful stop lets the ConfigNode run its shutdown hooks, which interrupts the
125+
* in-flight region-operation procedure worker. This reproduces a leader switch / graceful
126+
* shutdown during AddRegionPeer: the interrupted {@code waitTaskFinish()} returns PROCESSING
127+
* while the AddRegionPeerTask is still running on the coordinator DataNode. The procedure must
128+
* NOT silently end here, otherwise the parent RegionMigrateProcedure would falsely treat AddPeer
129+
* as complete and remove the source replica before the destination replica is actually Running.
130+
* See AddRegionPeerProcedure#executeFromState DO_ADD_REGION_PEER PROCESSING branch.
131+
*/
132+
public static Consumer<KillPointContext> actionOfGracefullyRestartConfigNode =
133+
context -> {
134+
Assert.assertTrue(context.getNodeWrapper() instanceof ConfigNodeWrapper);
135+
context.getNodeWrapper().stop();
136+
LOGGER.info("ConfigNode {} gracefully stopped.", context.getNodeWrapper().getId());
137+
Assert.assertFalse(context.getNodeWrapper().isAlive());
138+
context.getNodeWrapper().start();
139+
LOGGER.info("ConfigNode {} restarted.", context.getNodeWrapper().getId());
140+
Assert.assertTrue(context.getNodeWrapper().isAlive());
141+
};
142+
122143
@Before
123144
public void setUp() throws Exception {
124145
EnvFactory.getEnv()
@@ -155,6 +176,28 @@ public void successTest(
155176
killNode);
156177
}
157178

179+
public void successTestWithAction(
180+
final int dataReplicateFactor,
181+
final int schemaReplicationFactor,
182+
final int configNodeNum,
183+
final int dataNodeNum,
184+
KeySetView<String, Boolean> killConfigNodeKeywords,
185+
KeySetView<String, Boolean> killDataNodeKeywords,
186+
Consumer<KillPointContext> actionWhenDetectKeyWords,
187+
KillNode killNode)
188+
throws Exception {
189+
generalTestWithAllOptions(
190+
dataReplicateFactor,
191+
schemaReplicationFactor,
192+
configNodeNum,
193+
dataNodeNum,
194+
killConfigNodeKeywords,
195+
killDataNodeKeywords,
196+
actionWhenDetectKeyWords,
197+
true,
198+
killNode);
199+
}
200+
158201
public void failTest(
159202
final int dataReplicateFactor,
160203
final int schemaReplicationFactor,

integration-test/src/test/java/org/apache/iotdb/confignode/it/regionmigration/pass/daily/iotv1/IoTDBRegionMigrateConfigNodeCrashIoTV1IT.java

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
import org.apache.iotdb.consensus.ConsensusFactory;
2929
import org.apache.iotdb.it.env.EnvFactory;
3030
import org.apache.iotdb.it.framework.IoTDBTestRunner;
31+
import org.apache.iotdb.itbase.category.ClusterIT;
3132
import org.apache.iotdb.itbase.category.DailyIT;
3233

3334
import org.junit.Before;
@@ -92,6 +93,29 @@ public void testCnCrashDuringDoAddPeer() throws Exception {
9293
KillNode.CONFIG_NODE);
9394
}
9495

96+
/**
97+
* Gracefully restart (not forcibly kill) the ConfigNode leader while AddRegionPeer is waiting for
98+
* the coordinator's task to finish (DO_ADD_REGION_PEER). The graceful shutdown interrupts the
99+
* procedure worker, so waitTaskFinish() returns PROCESSING. The migration must still finish
100+
* correctly after a leader switch: previously the AddRegionPeerProcedure silently ended on
101+
* PROCESSING, letting the parent procedure remove the source replica before the destination
102+
* replica was actually Running. Uses 3 ConfigNodes so a real leader switch happens.
103+
*/
104+
@Test
105+
// TODO temporarily also run on per-PR Cluster IT (1C3D) to validate; revert to DailyIT-only later
106+
@Category({DailyIT.class, ClusterIT.class})
107+
public void cnLeaderSwitchDuringDoAddPeerTest() throws Exception {
108+
successTestWithAction(
109+
1,
110+
1,
111+
3,
112+
2,
113+
buildSet(AddRegionPeerState.DO_ADD_REGION_PEER),
114+
noKillPoints(),
115+
actionOfGracefullyRestartConfigNode,
116+
KillNode.CONFIG_NODE);
117+
}
118+
95119
@Test
96120
public void cnCrashDuringUpdateCacheTest() throws Exception {
97121
successTest(

integration-test/src/test/java/org/apache/iotdb/confignode/it/regionmigration/pass/daily/iotv2/batch/IoTDBRegionMigrateConfigNodeCrashIoTV2BatchIT.java

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@
2626
import org.apache.iotdb.confignode.procedure.state.RegionTransitionState;
2727
import org.apache.iotdb.confignode.procedure.state.RemoveRegionPeerState;
2828
import org.apache.iotdb.it.framework.IoTDBTestRunner;
29+
import org.apache.iotdb.itbase.category.ClusterIT;
2930
import org.apache.iotdb.itbase.category.DailyIT;
3031

3132
import org.junit.Ignore;
@@ -79,6 +80,29 @@ public void testCnCrashDuringDoAddPeer() throws Exception {
7980
KillNode.CONFIG_NODE);
8081
}
8182

83+
/**
84+
* Gracefully restart (not forcibly kill) the ConfigNode leader while AddRegionPeer is waiting for
85+
* the coordinator's task to finish (DO_ADD_REGION_PEER). The graceful shutdown interrupts the
86+
* procedure worker, so waitTaskFinish() returns PROCESSING. The migration must still finish
87+
* correctly after a leader switch: previously the AddRegionPeerProcedure silently ended on
88+
* PROCESSING, letting the parent procedure remove the source replica before the destination
89+
* replica was actually Running. Uses 3 ConfigNodes so a real leader switch happens.
90+
*/
91+
@Test
92+
// TODO temporarily also run on per-PR Cluster IT (1C3D) to validate; revert to DailyIT-only later
93+
@Category({DailyIT.class, ClusterIT.class})
94+
public void cnLeaderSwitchDuringDoAddPeerTest() throws Exception {
95+
successTestWithAction(
96+
1,
97+
1,
98+
3,
99+
2,
100+
buildSet(AddRegionPeerState.DO_ADD_REGION_PEER),
101+
noKillPoints(),
102+
actionOfGracefullyRestartConfigNode,
103+
KillNode.CONFIG_NODE);
104+
}
105+
82106
@Test
83107
public void cnCrashDuringUpdateCacheTest() throws Exception {
84108
successTest(

integration-test/src/test/java/org/apache/iotdb/confignode/it/regionmigration/pass/daily/iotv2/stream/IoTDBRegionMigrateConfigNodeCrashIoTV2StreamIT.java

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
import org.apache.iotdb.consensus.ConsensusFactory;
2929
import org.apache.iotdb.it.env.EnvFactory;
3030
import org.apache.iotdb.it.framework.IoTDBTestRunner;
31+
import org.apache.iotdb.itbase.category.ClusterIT;
3132
import org.apache.iotdb.itbase.category.DailyIT;
3233

3334
import org.junit.Before;
@@ -93,6 +94,29 @@ public void testCnCrashDuringDoAddPeer() throws Exception {
9394
KillNode.CONFIG_NODE);
9495
}
9596

97+
/**
98+
* Gracefully restart (not forcibly kill) the ConfigNode leader while AddRegionPeer is waiting for
99+
* the coordinator's task to finish (DO_ADD_REGION_PEER). The graceful shutdown interrupts the
100+
* procedure worker, so waitTaskFinish() returns PROCESSING. The migration must still finish
101+
* correctly after a leader switch: previously the AddRegionPeerProcedure silently ended on
102+
* PROCESSING, letting the parent procedure remove the source replica before the destination
103+
* replica was actually Running. Uses 3 ConfigNodes so a real leader switch happens.
104+
*/
105+
@Test
106+
// TODO temporarily also run on per-PR Cluster IT (1C3D) to validate; revert to DailyIT-only later
107+
@Category({DailyIT.class, ClusterIT.class})
108+
public void cnLeaderSwitchDuringDoAddPeerTest() throws Exception {
109+
successTestWithAction(
110+
1,
111+
1,
112+
3,
113+
2,
114+
buildSet(AddRegionPeerState.DO_ADD_REGION_PEER),
115+
noKillPoints(),
116+
actionOfGracefullyRestartConfigNode,
117+
KillNode.CONFIG_NODE);
118+
}
119+
96120
@Test
97121
public void cnCrashDuringUpdateCacheTest() throws Exception {
98122
successTest(

integration-test/src/test/java/org/apache/iotdb/confignode/it/regionmigration/pass/daily/ratis/IoTDBRegionMigrateConfigNodeCrashForRatisIT.java

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@
2525
import org.apache.iotdb.confignode.procedure.state.AddRegionPeerState;
2626
import org.apache.iotdb.confignode.procedure.state.RemoveRegionPeerState;
2727
import org.apache.iotdb.it.framework.IoTDBTestRunner;
28+
import org.apache.iotdb.itbase.category.ClusterIT;
2829
import org.apache.iotdb.itbase.category.DailyIT;
2930

3031
import org.junit.Test;
@@ -76,6 +77,29 @@ public void cnCrashDuringDoAddPeerTest() throws Exception {
7677
KillNode.CONFIG_NODE);
7778
}
7879

80+
/**
81+
* Gracefully restart (not forcibly kill) the ConfigNode leader while AddRegionPeer is waiting for
82+
* the coordinator's task to finish (DO_ADD_REGION_PEER). The graceful shutdown interrupts the
83+
* procedure worker, so waitTaskFinish() returns PROCESSING. The migration must still finish
84+
* correctly after a leader switch: previously the AddRegionPeerProcedure silently ended on
85+
* PROCESSING, letting the parent procedure remove the source replica before the destination
86+
* replica was actually Running. Uses 3 ConfigNodes so a real leader switch happens.
87+
*/
88+
@Test
89+
// TODO temporarily also run on per-PR Cluster IT (1C3D) to validate; revert to DailyIT-only later
90+
@Category({DailyIT.class, ClusterIT.class})
91+
public void cnLeaderSwitchDuringDoAddPeerTest() throws Exception {
92+
successTestWithAction(
93+
1,
94+
1,
95+
3,
96+
2,
97+
buildSet(AddRegionPeerState.DO_ADD_REGION_PEER),
98+
noKillPoints(),
99+
actionOfGracefullyRestartConfigNode,
100+
KillNode.CONFIG_NODE);
101+
}
102+
79103
@Test
80104
public void cnCrashDuringUpdateCacheTest() throws Exception {
81105
successTest(

iotdb-core/confignode/src/main/i18n/en/org/apache/iotdb/confignode/i18n/ProcedureMessages.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1001,7 +1001,7 @@ public final class ProcedureMessages {
10011001
public static final String VALIDATE_TABLE_FOR_TABLE_WHEN_SETTING_PROPERTIES =
10021002
"Validate table for table {}.{} when setting properties";
10031003
public static final String WAITTASKFINISH_RETURNS_PROCESSING_WHICH_MEANS_THE_WAITING_HAS_BEEN_INTERRUPTED =
1004-
"waitTaskFinish() returns PROCESSING, which means the waiting has been interrupted, this procedure will end without rollback";
1004+
"waitTaskFinish() returns PROCESSING, which means the waiting has been interrupted (ConfigNode shutdown or leader change); the AddRegionPeer task is still running on the coordinator, this procedure will stay at DO_ADD_REGION_PEER and resume polling after recovery";
10051005

10061006
public static final String FAILED_TO_CREATE_DATABASE_THE_TTL_SHOULD_BE_NON_NEGATIVE = "Failed to create database. The TTL should be non-negative.";
10071007
public static final String FAILED_TO_CREATE_DATABASE_THE_DATAREGIONGROUPNUM_SHOULD_BE_POSITIVE = "Failed to create database. The dataRegionGroupNum should be positive.";

iotdb-core/confignode/src/main/i18n/zh/org/apache/iotdb/confignode/i18n/ProcedureMessages.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -999,7 +999,7 @@ public final class ProcedureMessages {
999999
public static final String VALIDATE_TABLE_FOR_TABLE_WHEN_SETTING_PROPERTIES =
10001000
"Validate table for table {}.{} when setting properties";
10011001
public static final String WAITTASKFINISH_RETURNS_PROCESSING_WHICH_MEANS_THE_WAITING_HAS_BEEN_INTERRUPTED =
1002-
"waitTaskFinish() returns PROCESSING, which means the waiting has been interrupted, this procedure will end without rollback";
1002+
"waitTaskFinish() 返回 PROCESSING,表示等待被中断(ConfigNode 关闭或主节点切换);AddRegionPeer 任务仍在协调者上运行,该流程将停留在 DO_ADD_REGION_PEER 状态,恢复后继续轮询";
10031003

10041004
public static final String FAILED_TO_CREATE_DATABASE_THE_TTL_SHOULD_BE_NON_NEGATIVE = "创建数据库失败。TTL 不能为负数。";
10051005
public static final String FAILED_TO_CREATE_DATABASE_THE_DATAREGIONGROUPNUM_SHOULD_BE_POSITIVE = "创建数据库失败。dataRegionGroupNum 应为正数。";

iotdb-core/confignode/src/main/java/org/apache/iotdb/confignode/procedure/impl/region/AddRegionPeerProcedure.java

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -124,10 +124,22 @@ protected Flow executeFromState(ConfigNodeProcedureEnv env, AddRegionPeerState s
124124
return warnAndRollBackAndNoMoreState(
125125
env, handler, String.format("%s result is %s", state, result.getTaskStatus()));
126126
case PROCESSING:
127+
// waitTaskFinish() only returns PROCESSING when its polling loop was interrupted by
128+
// an InterruptedException, i.e. this ConfigNode is shutting down / losing leadership
129+
// (a user CANCEL or a coordinator disconnection both go through the FAIL branch
130+
// above). The AddRegionPeerTask is still running on the coordinator DataNode, so we
131+
// must NOT silently end here: doing so would let the parent RegionMigrateProcedure
132+
// proceed to CHECK_ADD_REGION_PEER / REMOVE_REGION_PEER and remove the source replica
133+
// before the destination replica has actually finished receiving the snapshot.
134+
// Instead, stay in DO_ADD_REGION_PEER and persist it; after recovery the new leader
135+
// re-enters this state and re-polls the still-running coordinator task (the
136+
// isStateDeserialized() guard above prevents re-submitting the task) until it really
137+
// reaches SUCCESS or FAIL.
127138
LOGGER.info(
128139
ProcedureMessages
129140
.WAITTASKFINISH_RETURNS_PROCESSING_WHICH_MEANS_THE_WAITING_HAS_BEEN_INTERRUPTED);
130-
return Flow.NO_MORE_STATE;
141+
setNextState(AddRegionPeerState.DO_ADD_REGION_PEER);
142+
break outerSwitch;
131143
case SUCCESS:
132144
setNextState(UPDATE_REGION_LOCATION_CACHE);
133145
break outerSwitch;

0 commit comments

Comments
 (0)