- `vkInvalidateMappedMemoryRanges` is not called.
It looks to me like `vkInvalidateMappedMemoryRanges` must be called before the host reads the mapped memory, because that memory is allocated without `VK_MEMORY_PROPERTY_HOST_COHERENT_BIT`. The paragraph from the spec (source: https://docs.vulkan.org/spec/latest/chapters/synchronization.html) I am referring to:
> If a memory object does not have the `VK_MEMORY_PROPERTY_HOST_COHERENT_BIT` property, then `vkFlushMappedMemoryRanges` must be called in order to guarantee that writes to the memory object from the host are made available to the host domain, where they can be further made available to the device domain via a domain operation. Similarly, `vkInvalidateMappedMemoryRanges` must be called to guarantee that writes which are available to the host domain are made visible to host operations.
- Host-cached memory is not requested.
The mapped memory can be allocated with `VK_MEMORY_PROPERTY_HOST_CACHED_BIT` to boost performance. I tested vkcube on an integrated GPU with `VK_LAYER_LUNARG_screenshot` enabled and `VK_LUNARG_SCREENSHOT_FRAMES` set to `0-1000-1`. Waiting for 1000 frames to be queued and saved took about 31 seconds before this optimization and about 17 seconds after it. To confirm there is no performance drawback when screenshots are taken rarely, I also tested with `VK_LUNARG_SCREENSHOT_FRAMES` set to `100-1000-100`: both before and after the optimization, this took about 17 seconds. This also shows that the optimization makes the overhead of taking screenshots very small, since taking 1000 screenshots (`VK_LUNARG_SCREENSHOT_FRAMES` set to `0-1000-1`) takes about as long as taking 10 screenshots (`VK_LUNARG_SCREENSHOT_FRAMES` set to `100-1000-100`).
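The memory-type choice described above could look roughly like the following C sketch. It is not the layer's actual code: the `HOST_*_BIT` macros are stand-ins that mirror the values of the real `VK_MEMORY_PROPERTY_*` flags so the snippet compiles without the Vulkan SDK, and `find_memory_type` and `choose_readback_memory_type` are hypothetical helpers. In real code, the per-type flags would come from `VkPhysicalDeviceMemoryProperties::memoryTypes[i].propertyFlags` (filled in by `vkGetPhysicalDeviceMemoryProperties`), and `type_bits` from `VkMemoryRequirements::memoryTypeBits`.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins mirroring the values of Vulkan's VkMemoryPropertyFlagBits, so the
 * sketch compiles without the Vulkan SDK. Real code would include
 * <vulkan/vulkan.h> and use the VK_MEMORY_PROPERTY_* names directly. */
#define HOST_VISIBLE_BIT  0x00000002u /* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT  */
#define HOST_COHERENT_BIT 0x00000004u /* VK_MEMORY_PROPERTY_HOST_COHERENT_BIT */
#define HOST_CACHED_BIT   0x00000008u /* VK_MEMORY_PROPERTY_HOST_CACHED_BIT   */

/* Hypothetical helper: returns the first memory type index that is allowed by
 * type_bits and whose property flags include every bit in `wanted`, or -1. */
int find_memory_type(const uint32_t *type_flags, uint32_t type_count,
                     uint32_t type_bits, uint32_t wanted) {
    assert(type_count <= 32); /* VK_MAX_MEMORY_TYPES */
    for (uint32_t i = 0; i < type_count; i++) {
        if ((type_bits & (1u << i)) != 0 && (type_flags[i] & wanted) == wanted)
            return (int)i;
    }
    return -1;
}

/* Hypothetical helper: prefer host-visible, host-cached memory for reading
 * screenshots back on the CPU; fall back to any host-visible type. */
int choose_readback_memory_type(const uint32_t *type_flags, uint32_t type_count,
                                uint32_t type_bits) {
    int index = find_memory_type(type_flags, type_count, type_bits,
                                 HOST_VISIBLE_BIT | HOST_CACHED_BIT);
    if (index < 0)
        index = find_memory_type(type_flags, type_count, type_bits,
                                 HOST_VISIBLE_BIT);
    return index;
}
```

For example, with the memory types `DEVICE_LOCAL`, `DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT` and `DEVICE_LOCAL | HOST_VISIBLE | HOST_CACHED`, all allowed by `type_bits`, the sketch picks index 2. A cached type that lacks `HOST_COHERENT_BIT` is exactly the case where `vkInvalidateMappedMemoryRanges` is required before the host reads the data.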
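The invalidation point above can also be sketched in C. This is a sketch under stated assumptions rather than the layer's code: `HOST_COHERENT_BIT` mirrors the value of `VK_MEMORY_PROPERTY_HOST_COHERENT_BIT`, `MappedRange` is a hypothetical stand-in for the `offset`/`size` fields of `VkMappedMemoryRange`, and `needs_invalidate` and `align_mapped_range` are hypothetical helpers. The alignment follows the spec's valid-usage rules for `VkMappedMemoryRange`: `offset` must be a multiple of `VkPhysicalDeviceLimits::nonCoherentAtomSize`, and `size` must be a multiple of it unless the range ends at the end of the allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in so the sketch compiles without the Vulkan SDK; the value matches
 * VK_MEMORY_PROPERTY_HOST_COHERENT_BIT. */
#define HOST_COHERENT_BIT 0x00000004u

/* Hypothetical stand-in for the offset/size fields of VkMappedMemoryRange. */
typedef struct {
    uint64_t offset;
    uint64_t size;
} MappedRange;

/* Host reads of mapped memory need vkInvalidateMappedMemoryRanges only when
 * the memory type lacks HOST_COHERENT. */
bool needs_invalidate(uint32_t memory_property_flags) {
    return (memory_property_flags & HOST_COHERENT_BIT) == 0;
}

/* Expands [offset, offset + size) so that the offset is a multiple of
 * atom_size (nonCoherentAtomSize) and the size is either a multiple of
 * atom_size or reaches exactly the end of the allocation. */
MappedRange align_mapped_range(uint64_t offset, uint64_t size,
                               uint64_t atom_size, uint64_t allocation_size) {
    assert(atom_size > 0);
    assert(offset + size <= allocation_size);
    uint64_t begin = offset - offset % atom_size;          /* round down */
    uint64_t end = offset + size;
    end = ((end + atom_size - 1) / atom_size) * atom_size; /* round up */
    if (end > allocation_size)
        end = allocation_size;                             /* clamp to allocation end */
    MappedRange range = { begin, end - begin };
    return range;
}
```

In the layer, the resulting range would be passed to `vkInvalidateMappedMemoryRanges(device, 1, &range)` after the copy has completed (for example, after waiting on its fence) and before the mapped pointer is read. When the whole allocation is mapped, passing `offset = 0` and `size = VK_WHOLE_SIZE` is a simpler alternative that sidesteps the alignment arithmetic.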