Implement parallel ItemBlock processing via backup_controller goroutines #8659
ywk253100 merged 1 commit into vmware-tanzu:main
Conversation
Currently in draft, as I have not yet tested the changes in a cluster env.
Force-pushed fa680a9 to 5cee460
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
##             main    #8659      +/-   ##
==========================================
+ Coverage   59.39%   59.47%   +0.08%
==========================================
  Files         370      371       +1
  Lines       39988    40107     +119
==========================================
+ Hits        23749    23853     +104
- Misses      14746    14760      +14
- Partials     1493     1494       +1
shawn-hurley left a comment:
I think I have just one concern; otherwise nothing stands out!
kaovilai left a comment:
Some typos; other than that, no objections at the moment.
Note: I am not as familiar with select-case channel statements.
Force-pushed 5cee460 to d08bb75
Updated in response to PR comments. Will test in cluster tomorrow.
Force-pushed dad8770 to 097941e
I've tested this in an AWS cluster with two namespaces, 4 pods, 2 volumes, and additional related resources. Using vmware-tanzu/velero-plugin-example#75 to force each item backup to take 5 seconds allowed me to verify parallel processing. Starting with 1 configured worker and going up to 16, each increase dropped the total backup time, and I was able to verify from the logs that the 5-second BIA actions were running in parallel in the expected numbers. A summary of backup times/results is below. Note that the differing item count on subsequent backups was mainly due to the inclusion of events; if those had been excluded, the time drop from one backup to the next would be more significant. Also note that with a regular real-world backup of this size, the time differences would be much smaller, since I've introduced an artificial per-item delay here. The real-world use case where parallel backup will be most pronounced is a backup with a large number of small PVCs using CSI snapshots or the data mover.
Force-pushed e1ab7c0 to fd2ee86
Rebased after #8664 merged.
func (p *ItemBlockWorkerPool) Stop() {
	p.cancelFunc()
	p.logger.Info("ItemBlock worker stopping")
	p.wg.Wait()
}
Review comment: Add another log to indicate the worker pool has stopped, including the wait time.
Force-pushed 57ea842 to 849b896
@sseago
Oh, yes, that looks like it must have been inadvertently added. That file is sometimes left behind if you cancel in the middle of running unit tests. I'll get rid of that and re-push.
Force-pushed 849b896 to acb6df0
Signed-off-by: Scott Seago <sseago@redhat.com>
Force-pushed acb6df0 to fcfb2fd
Perhaps a .gitignore candidate.
LGTM. Please wait for @ywk253100 to have another look, since issue #8516 depends on this PR.
Thank you for contributing to Velero!
Please add a summary of your change
Parallel ItemBlock processing via backup_controller goroutines as described in phase 2 of https://github.com/vmware-tanzu/velero/blob/main/design/backup-performance-improvements.md
Does your change fix a particular issue?
Fixes #8334
Please indicate you've done the following:
- Created a changelog file (make new-changelog) or commented /kind changelog-not-required on this PR.
- Updated documentation in site/content/docs/main as needed.