Skip to content

Do evaluation with gloo backend, and only on process 0#6

Open
Nanne wants to merge 1 commit into
yxgeee:masterfrom
Nanne:cpu_dist
Open

Do evaluation with gloo backend, and only on process 0#6
Nanne wants to merge 1 commit into
yxgeee:masterfrom
Nanne:cpu_dist

Conversation

@Nanne

@Nanne Nanne commented Nov 8, 2020

Copy link
Copy Markdown

Thanks for publishing this great code base! In playing around with it we ran into some issues with being unable to evaluate on large databases because the extract_features() function requires (because of the NCCL backend) that data is send via the GPU.

Additionally, by relying on the NCCL backend (which doesn't support gather() or point-to-point communication), the data was actually shared among all processes, which caused us issues even when the data did fit on GPU (but then redundant copies caused CPU memory issues). It also meant that the evaluation was actually performed N times (where N = world_size).

To fix these issues I've rewritten the extract_features() function to use the gloo backend for gathering all the data. This way it doesn't have to go via the GPU, but once extracted the data can stay in CPU memory, it also makes it possible to gather everything on process 0, and then do the evaluation only once.

I didn't thoroughly compare, but I think its now always better to use --sync_gather. Either way both cases should be faster and use less memory than before.

(I didn't want to take too many GPUs with CVPR deadline so close, so I only tested with 2 GPUs, but it should scale to larger world_size.)

@Nanne

Nanne commented Nov 8, 2020

Copy link
Copy Markdown
Author

Oops, it seems extract_features() is also used in net_vlad_img.py and netvlad_img_sfrs.py, and I think my changes will break the code there.

It looks like the training code can be fairly easily adjusted to use my proposed extract_features(), but I don't have time to do (and test) it now. Seems worthwhile (and easy enough) to make these changes though, and it should also reduce evaluation speed and memory usage during training.

Actually, It also will fail if using PCA during evaluation, but that should probably also only happen on process 0, and not on all processes. Sorry, will update PR to make it work for all places where extract_features is called.

@yxgeee

yxgeee commented Nov 9, 2020

Copy link
Copy Markdown
Owner

Thanks a lot for your contribution! The large memory request of GPU and CPU seems a significant issue in this codebase. You provide quite a good solution. I will look into your code, and adjust the training part to match it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants