DAOS-19008 control: Erase formatting after failed format --replace #18446
tanabarr wants to merge 3 commits into
Ticket title is 'Aurora daos_user: PMEM Device should Unmount and revert the --replace operation fully if it fails'
Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18446/1/testReport/
Test stage Functional on EL 9 completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18446/1/execution/node/983/log
    cmd.Debugf("Invoking SystemErase to clean up after failed format operation")

    eraseReq := &control.SystemEraseReq{}
    eraseResp, err := control.SystemErase(ctx, cmd.ctlInvoker, eraseReq)
I don't think this will work... SystemErase doesn't allow you to choose ranks or nodes.
I think you'll need to handle this from the daos_server that owns the engine. If the engine fails to join, and it's a replace operation, blow the storage away. The failure that triggered this request was happening at the join stage.
If the format itself fails, I don't think there's any risk of the engine coming up. If there's a partial failure, it's not a bad idea to clean up, but I think that would have to happen from the server side, too.
Right, this needs a rework.
d8cf409 to 6785032
Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18446/4/testReport/
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-18446/4/execution/node/1394/log
6785032 to 309615f
Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18446/5/testReport/
309615f to a593fcb
Test stage NLT completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18446/6/testReport/
21e8fb7 to 862f9d5
Test stage Unit Test completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-18446/8/testReport/
Signed-off-by: Tom Nabarro <thomas.nabarro@hpe.com>
862f9d5 to 550d853