Implement parallel ItemBlock processing via backup_controller goroutines #8659
ywk253100 merged 1 commit into vmware-tanzu:main
Conversation
Currently in draft, as I have not yet tested the changes in a cluster env.
Force-pushed fa680a9 to 5cee460
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##             main    #8659      +/-   ##
==========================================
+ Coverage   59.39%   59.47%   +0.08%
==========================================
  Files         370      371       +1
  Lines       39988    40107     +119
==========================================
+ Hits        23749    23853     +104
- Misses      14746    14760      +14
- Partials     1493     1494       +1
shawn-hurley left a comment:
I think I have just one concern; otherwise nothing stands out!
kaovilai left a comment:
Some typos; other than that, no objections at the moment.
Note: I am not as familiar with select-case channel statements.
Force-pushed 5cee460 to d08bb75
Updated in response to PR comments. Will test in cluster tomorrow.
Force-pushed dad8770 to 097941e
I've tested this in an AWS cluster with two namespaces, 4 pods, 2 volumes, and additional related resources. Using vmware-tanzu/velero-plugin-example#75 to force each item backup to take 5 seconds allowed me to verify parallel processing. Starting with 1 configured worker and going up to 16, each increase dropped the total backup time, and I was able to verify from the logs that the 5-second BIA actions were running in parallel in the expected numbers. A summary of backup times/results is below. Note that the differing item count on subsequent backups was mainly due to the inclusion of events; if those had been excluded, the time drop from one backup to the next would be more significant. Also note that with a regular real-world backup of this size, the time differences would be much smaller, since I've introduced an artificial per-item delay here. The real-world use case where parallel backup will be most pronounced is a backup with a large number of small PVCs using CSI snapshots or the data mover.
Force-pushed e1ab7c0 to fd2ee86
Rebased after #8664 merged.
func (p *ItemBlockWorkerPool) Stop() {
	p.cancelFunc()
	p.logger.Info("ItemBlock worker stopping")
	p.wg.Wait()
}
Review comment: Add another log to indicate the worker pool has stopped, including the wait time.
Force-pushed 57ea842 to 849b896
@sseago
Oh, yes, that looks like it must have been inadvertently added. That file is sometimes left behind if you cancel in the middle of running unit tests. I'll get rid of that and re-push.
Force-pushed 849b896 to acb6df0
Signed-off-by: Scott Seago <sseago@redhat.com>
Force-pushed acb6df0 to fcfb2fd
Perhaps a .gitignore candidate.
LGTM. Please wait for @ywk253100 to have another look, since issue #8516 depends on this PR.
Thank you for contributing to Velero!
Please add a summary of your change
Parallel ItemBlock processing via backup_controller goroutines as described in phase 2 of https://github.com/vmware-tanzu/velero/blob/main/design/backup-performance-improvements.md
Does your change fix a particular issue?
Fixes #8334
Please indicate you've done the following:
- Created a changelog file (make new-changelog) or commented /kind changelog-not-required on this PR.
- Updated documentation in site/content/docs/main as needed.