vnc_worker: improve VNC server compatibility and performance#3197
vnc_worker: improve VNC server compatibility and performance#3197bitranox wants to merge 17 commits intomicrosoft:mainfrom
Conversation
|
@microsoft-github-policy-service agree |
f0803dc to
1fea435
Compare
1fea435 to
e93c2de
Compare
Upgrade VNC server from RFB 3.3-only to RFB 3.3/3.7/3.8, fixing compatibility with modern clients like noVNC that expect the 3.8 security handshake. Make DesktopSize pseudo-encoding a soft capability instead of a hard requirement. Allow new VNC clients to take over by disconnecting the previous session. New features: - Tile-based dirty detection (64x64 tiles): only changed regions are sent instead of the full framebuffer, significantly reducing bandwidth - Zlib compression (encoding type 6): when the client advertises zlib support, tile pixel data is compressed using a persistent zlib stream with Sync flush per the RFB spec - Cursor pseudo-encoding (-239): sends a default arrow cursor with white fill and black outline so the pointer is visible in all clients including noVNC, without requiring server-side cursor rendering - Previous framebuffer tracking for incremental updates Protocol changes: - Server now offers RFB 3.8 and falls back to 3.7/3.3 - RFB 3.8 security handshake with SecurityResult - Cursor sent as part of the first FramebufferUpdate (not during SetEncodings) to avoid stream corruption with pipelined clients
e93c2de to
01267b8
Compare
|
Thanks for your contribution! As someone who's very not familiar with the VNC protocols this looks fine to me, but it is failing our format check currently. |
Anything I need to do ? Any linter I can test myself to pass ? |
|
Well right now our CI is busted haha, but we'll fix that. Locally you'll need to pass both clippy (the standard rust linter) and our custom lints, which can be run with |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
i fixed it. would be nice to see that merged, need it to run openvmm on proxmox with the noVMC viewer ... |
|
#3201 will fix our CI, this should just need a rebase after that gets merged. |
There was a problem hiding this comment.
Pull request overview
This PR upgrades the in-tree VNC server to better interoperate with modern VNC clients and to reduce bandwidth/CPU by sending incremental screen updates with optional zlib compression. It also improves UX/compatibility via a default cursor pseudo-encoding and relaxes the DesktopSize requirement, while adding support for new clients taking over an existing session.
Changes:
- Upgrade handshake to offer RFB 3.8 (with fallback to 3.7/3.3) and negotiate security types.
- Implement tile-based dirty detection (64x64), previous-framebuffer tracking, and optional zlib encoding for changed tiles.
- Add cursor pseudo-encoding support and allow client reconnection to preempt an existing session.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| workers/vnc_worker/vnc/src/rfb.rs | Adds the Cursor pseudo-encoding constant used during framebuffer updates. |
| workers/vnc_worker/vnc/src/lib.rs | Implements protocol negotiation updates, incremental/tiled updates, zlib encoding, cursor sending, and makes DesktopSize optional. |
| workers/vnc_worker/vnc/Cargo.toml | Adds flate2 dependency for zlib compression support. |
| workers/vnc_worker/src/lib.rs | Updates server state machine to allow a new client to disconnect/take over from an existing connection. |
| Cargo.lock | Records the new flate2 dependency for the vnc crate. |
| let dest_depth = fmt.bits_per_pixel as usize / 8; | ||
| let shift_r = 24 - fmt.red_max.get().count_ones(); | ||
| let shift_g = 16 - fmt.green_max.get().count_ones(); | ||
| let shift_b = 8 - fmt.red_max.get().count_ones(); | ||
| let no_convert = dest_depth == 4 | ||
| && shift_r == fmt.red_shift as u32 | ||
| && shift_g == fmt.green_shift as u32 | ||
| && shift_b == fmt.blue_shift as u32; |
There was a problem hiding this comment.
shift_b is computed from red_max instead of blue_max, which will produce incorrect blue-channel conversion for non-8-bit blue formats and can also make no_convert detection wrong. Compute shift_b from fmt.blue_max (and ensure the fast-path checks reflect the correct source channel layout).
There was a problem hiding this comment.
this is intentional from the original code. If we change that to blue_max, grey windows decorators get black in RealVNC, but correct in noVNC
# this is intentional !
let shift_b = 8 - fmt.red_max.get().count_ones();
There was a problem hiding this comment.
In virtually all real pixel formats (RGB888, RGB565, RGB555), red and blue have the same number of bits. So red_max.count_ones() == blue_max.count_ones() and it doesn't matter which one you use.
Why changing to blue_max breaks colors:
A specific VNC client (RealVNC) is sending an incorrect or non-standard blue_max value. If a client sends blue_max = 0 or a value with fewer bits set than expected, then shift_b = 8 - 0 = 8, and b >> 8 = 0 for all pixels - zeroing out the blue channel entirely. - or it uses for instance RGB332: red=7, green=7, blue=3 (see below, count_ones() issue)
noVNC is sending an correct blue_max value --> it works
RealVNC is sending an incorrect or non-standard blue_max value --> grey turns into black
So using fmt.red_max.get() is the workaround at the moment.
I investigated this thoroughly again :
Additional issue: count_ones() is the wrong function here entirely. It counts set bits, not bit width. For spec-conforming values (2^N - 1 like 255, 31, 63) it happens to give the right answer, but for any non-conforming value it diverges. The correct approach is 16 - leading_zeros() which gives the actual bit width regardless of bit pattern.
The RFB spec says: max values "should be 2^N - 1 where N is the number of bits used for that colour" - using "should" not "must", but every real implementation assumes this form.
Plan:
- Fix shift_b to use blue_max (correct per spec)
- Switch all three channels from count_ones() to 16 - leading_zeros() (robust)
- Add validation in SetPixelFormat that max values are non-zero and of the form 2^N - 1, rejecting or ignoring non-conforming formats early
Key Points:
- The spec says "should" not "must" - but every real implementation assumes 2^N - 1
- They are independent per channel (red and blue can differ, e.g. RGB332: red=7, green=7, blue=3)
- Sum of bit widths should equal depth
- Shifts + widths must not overlap and must fit within bits_per_pixel
let red_bits = 16 - fmt.red_max.get().leading_zeros();
let green_bits = 16 - fmt.green_max.get().leading_zeros();
let blue_bits = 16 - fmt.blue_max.get().leading_zeros();
let shift_r = 24 - red_bits;
let shift_g = 16 - green_bits;
let shift_b = 8 - blue_bits;
This way we fix the root cause properly rather than relying on red_max as a workaround for buggy client values.
Since this needs a lot of visual testing with different clients, I would tackle that later.
workers/vnc_worker/vnc/src/lib.rs
Outdated
| match dest_depth { | ||
| 1 => out.push(p2 as u8), | ||
| 2 => out.extend_from_slice(&(p2 as u16).to_ne_bytes()), | ||
| 4 => out.extend_from_slice(&p2.to_ne_bytes()), | ||
| _ => unreachable!(), | ||
| } |
There was a problem hiding this comment.
convert_pixels will panic on client-selected pixel formats where bits_per_pixel/8 is not 1, 2, or 4 (e.g., 24bpp is common) due to unreachable!(). Since the client controls SetPixelFormat, this is a remotely-triggerable crash; validate/normalize the pixel format and return an error or support 24bpp instead of panicking.
There was a problem hiding this comment.
this is pre-existing behavior from the upstream code, not something introduced by this PR.
The RFB protocol specification only defines bits_per_pixel values of 8, 16, and 32. A value of 24 would be non-conforming, no standard VNC client sends it. However, since the pixel format is client-controlled via SetPixelFormat, a malicious or buggy client could theoretically trigger the panic.
Fixed in 9f79ef7: we now validate bits_per_pixel at SetPixelFormat time and reject anything other than 8/16/32 with a proper error, so the unreachable!() in convert_pixels is now genuinely unreachable.
| if version.0 == rfb::PROTOCOL_VERSION_38 { | ||
| // RFB 3.8: server sends SecurityResult after negotiation. | ||
| socket | ||
| .write_all( | ||
| rfb::SecurityResult { | ||
| status: rfb::SECURITY_RESULT_STATUS_OK.into(), | ||
| } | ||
| .as_bytes(), | ||
| ) | ||
| .await?; | ||
| } |
There was a problem hiding this comment.
RFB 3.7/3.8 security negotiation: the client’s selected security type is read but never validated/acted on, and SecurityResult is only sent for protocol 3.8. 3.7 clients typically expect a SecurityResult too, and unsupported selections should result in a failure response/connection close.
| if version.0 == rfb::PROTOCOL_VERSION_38 { | |
| // RFB 3.8: server sends SecurityResult after negotiation. | |
| socket | |
| .write_all( | |
| rfb::SecurityResult { | |
| status: rfb::SECURITY_RESULT_STATUS_OK.into(), | |
| } | |
| .as_bytes(), | |
| ) | |
| .await?; | |
| } | |
| if chosen_type != rfb::SECURITY_TYPE_NONE { | |
| socket | |
| .write_all( | |
| rfb::SecurityResult { | |
| status: rfb::SECURITY_RESULT_STATUS_FAILED.into(), | |
| } | |
| .as_bytes(), | |
| ) | |
| .await?; | |
| return Err(std::io::Error::new( | |
| std::io::ErrorKind::InvalidData, | |
| "client selected unsupported RFB security type", | |
| ) | |
| .into()); | |
| } | |
| // Send SecurityResult after successful negotiation. | |
| // Although the requirement is explicit in RFB 3.8, 3.7 clients | |
| // commonly expect the same response before ClientInit. | |
| socket | |
| .write_all( | |
| rfb::SecurityResult { | |
| status: rfb::SECURITY_RESULT_STATUS_OK.into(), | |
| } | |
| .as_bytes(), | |
| ) | |
| .await?; |
There was a problem hiding this comment.
The validation part is legit. fixed in bae9522. We now reject the connection if the client selects a security type we didn't offer.
However, sending SecurityResult for RFB 3.7 as suggested would actually break the protocol. Per the RFB spec, SecurityResult for security type None was only introduced in 3.8. In 3.7, the server proceeds directly to ClientInit after the security type exchange, sending an unexpected 4-byte message would desync the stream. So we keep SecurityResult gated on 3.8 only.
workers/vnc_worker/vnc/src/lib.rs
Outdated
| [(zlib_stream.total_out() - before_out) as usize..], | ||
| FlushCompress::Sync, | ||
| ) | ||
| .unwrap(); |
There was a problem hiding this comment.
The zlib compression path uses .unwrap() on flate2::Compress::compress(...), which can panic and take down the VNC worker. Since compression is client-driven (encoding negotiation), handle errors explicitly and map them into Error/I/O failure rather than panicking.
| .unwrap(); | |
| .map_err(|e| { | |
| std::io::Error::other(format!( | |
| "zlib compression failed: {e}" | |
| )) | |
| })?; |
There was a problem hiding this comment.
fixed in 9c514c6. The .unwrap() is replaced with proper error propagation via std::io::Error, so a compression failure now cleanly disconnects the client instead of panicking the worker
workers/vnc_worker/vnc/src/lib.rs
Outdated
| let cw: u16 = 18; | ||
| let ch: u16 = 18; | ||
| let bpp = fmt.bits_per_pixel as usize / 8; | ||
| let mask_stride = ((cw as usize) + 7) / 8; | ||
| let mut pixels = Vec::with_capacity(cw as usize * ch as usize * bpp); | ||
| for y in 0..ch as usize { | ||
| for x in 0..cw as usize { | ||
| let byte_i = x / 8; | ||
| let bit = 7 - (x % 8); | ||
| let in_mask = | ||
| byte_i < mask_stride && (MASK[y][byte_i] >> bit) & 1 == 1; | ||
| let in_fill = | ||
| byte_i < mask_stride && (FILL[y][byte_i] >> bit) & 1 == 1; | ||
| if in_mask && in_fill { | ||
| // White fill | ||
| pixels.extend_from_slice(&[0xFF, 0xFF, 0xFF, 0x00][..bpp]); | ||
| } else if in_mask { | ||
| // Black outline | ||
| pixels.extend_from_slice(&[0x00, 0x00, 0x00, 0x00][..bpp]); | ||
| } else { | ||
| // Transparent | ||
| pixels.extend_from_slice(&[0x00; 4][..bpp]); | ||
| } |
There was a problem hiding this comment.
Cursor pseudo-encoding pixel data is written as raw [R,G,B,(0)] bytes sliced by bpp, but it isn’t converted according to the negotiated PixelFormat (channel shifts/maxes and endianness). This will render incorrect colors/garbage for common formats like RGB565 or big-endian; generate cursor pixels using the same conversion logic as framebuffer tiles.
There was a problem hiding this comment.
fixed in 452f08d. The cursor pixel data now goes through the same convert_pixels path as framebuffer tiles, so it correctly respects the client's negotiated pixel format (channel shifts, maxes, bit depth, endianness). Previously the raw hardcoded bytes would
have rendered as garbage on anything other than 32bpp little-endian XRGB.
|
|
||
| if no_convert { | ||
| out.extend_from_slice(src.as_bytes()); | ||
| return; | ||
| } | ||
|
|
||
| for &p in src { | ||
| let (r, g, b) = (p & 0xff0000, p & 0xff00, p & 0xff); | ||
| let p2 = r >> shift_r << fmt.red_shift | ||
| | g >> shift_g << fmt.green_shift | ||
| | b >> shift_b << fmt.blue_shift; | ||
| match dest_depth { | ||
| 1 => out.push(p2 as u8), | ||
| 2 => out.extend_from_slice(&(p2 as u16).to_ne_bytes()), | ||
| 4 => out.extend_from_slice(&p2.to_ne_bytes()), | ||
| _ => unreachable!(), |
There was a problem hiding this comment.
Pixel byte order is emitted using to_ne_bytes() / src.as_bytes() regardless of fmt.big_endian_flag. If a client requests big-endian pixel format, the server will send incorrectly ordered bytes; use to_be_bytes/to_le_bytes (and adjust the fast-path) based on big_endian_flag.
There was a problem hiding this comment.
fixed in a67b239. convert_pixels now uses to_be_bytes/to_le_bytes based on big_endian_flag instead of unconditionally using to_ne_bytes. The zero-copy fast-path is also gated on !big_endian since src.as_bytes() is native-endian.
|
Ditto to what @smalis-msft said. Thank you so much for this contribution! I don't think we have true VNC expertise amongst the maintainers/reviewers. That makes your contribution especially valuable, but a bit tricky to build confidence in from code (at least for me). I took a look over this and it also looks reasonable, but I think we'll want a few more eyes on it too. Please review the feedback left by copilot and respond to it. If you don't agree with the feedback, that's fine, but please help us by leaving a brief statement why not. |
Validate bits_per_pixel in SetPixelFormat to only accept 8, 16, and 32 as defined by the RFB spec. Previously a non-conforming client could trigger an unreachable!() panic by sending e.g. 24bpp. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reject the connection if the client selects a security type other than None, since that is the only type we offer. On RFB 3.8, send a SecurityResult failure before disconnecting. On 3.7, SecurityResult is not sent per spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace .unwrap() on flate2 compress with proper error propagation via std::io::Error, since compression is triggered by client-negotiated encoding. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor pseudo-encoding pixel data was written as raw hardcoded bytes, ignoring the client's negotiated PixelFormat (channel shifts, maxes, endianness). This would render incorrect colors on formats like RGB565 or big-endian. Now uses the same convert_pixels path as framebuffer tiles. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pixel bytes were emitted using to_ne_bytes() regardless of the client's big_endian_flag, producing incorrect byte order for big-endian clients. Now uses to_be_bytes/to_le_bytes based on the flag, and the fast-path is disabled for big-endian formats. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Ok, should be good to take one more merge/rebase with main to unblock CI |
| let (new_width, new_height) = self.fb.resolution(); | ||
| if new_width != width || new_height != height { | ||
| // Send the new desktop size. | ||
| width = new_width; | ||
| height = new_height; | ||
| socket | ||
| .write_all( | ||
| rfb::FramebufferUpdate { | ||
| message_type: rfb::SC_MESSAGE_TYPE_FRAMEBUFFER_UPDATE, | ||
| padding: 0, | ||
| rectangle_count: 1.into(), | ||
| } | ||
| .as_bytes(), | ||
| ) | ||
| .await?; | ||
| socket | ||
| .write_all( | ||
| rfb::Rectangle { | ||
| x: 0.into(), | ||
| y: 0.into(), | ||
| width: width.into(), | ||
| height: height.into(), | ||
| encoding_type: rfb::ENCODING_TYPE_DESKTOP_SIZE.into(), | ||
| force_full_update = true; | ||
| if self.supports_desktop_resize { | ||
| // Notify the client of the new desktop size. | ||
| socket | ||
| .write_all( | ||
| rfb::FramebufferUpdate { | ||
| message_type: rfb::SC_MESSAGE_TYPE_FRAMEBUFFER_UPDATE, | ||
| padding: 0, | ||
| rectangle_count: 1.into(), | ||
| } | ||
| .as_bytes(), | ||
| ) | ||
| .await?; | ||
| socket | ||
| .write_all( | ||
| rfb::Rectangle { | ||
| x: 0.into(), | ||
| y: 0.into(), | ||
| width: width.into(), | ||
| height: height.into(), | ||
| encoding_type: rfb::ENCODING_TYPE_DESKTOP_SIZE.into(), | ||
| } | ||
| .as_bytes(), | ||
| ) | ||
| .await?; | ||
| } | ||
| } |
There was a problem hiding this comment.
When the framebuffer resolution changes, width/height are updated even if supports_desktop_resize is false. That means subsequent rectangles are computed in the new coordinate space without informing the client, which can desync protocol state or cause out-of-bounds updates for clients that don't implement the DesktopSize pseudo-encoding. Consider either (a) keeping width/height fixed unless DesktopSize is supported (and cropping/letterboxing), or (b) disconnecting with a clear error when a resize occurs without DesktopSize support.
There was a problem hiding this comment.
fixed in 7871e02. If the framebuffer resolution changes and the client did not advertise DesktopSize pseudo-encoding support, we now cleanly disconnect with an error instead of silently sending frames in the new coordinate space. Clients that support DesktopSize continue to receive the resize notification as before.
Though I havent seen a client that doesnt support that.
workers/vnc_worker/vnc/src/lib.rs
Outdated
| match input.pixel_format.bits_per_pixel { | ||
| 8 | 16 | 32 => {} | ||
| bpp => return Err(Error::UnsupportedPixelFormat(bpp)), | ||
| } |
There was a problem hiding this comment.
SetPixelFormat validation only checks bits_per_pixel, but convert_pixels performs shifts using client-provided red_shift/green_shift/blue_shift and can panic (or behave unpredictably in release) if a malicious client supplies shift values >= 32 or inconsistent max/shift combinations. Add validation/clamping for shift ranges (e.g., < bits_per_pixel and < 32) and consider rejecting non-true-color formats to avoid remotely-triggerable crashes.
| match input.pixel_format.bits_per_pixel { | |
| 8 | 16 | 32 => {} | |
| bpp => return Err(Error::UnsupportedPixelFormat(bpp)), | |
| } | |
| let bpp = input.pixel_format.bits_per_pixel; | |
| match bpp { | |
| 8 | 16 | 32 => {} | |
| _ => return Err(Error::UnsupportedPixelFormat(bpp)), | |
| } | |
| // This implementation only supports true-color formats, and the | |
| // color component shifts and masks must fit within the advertised | |
| // pixel width to avoid panics or invalid conversions on untrusted | |
| // client input. | |
| if input.pixel_format.true_color_flag == 0 { | |
| return Err(Error::UnsupportedPixelFormat(bpp)); | |
| } | |
| let bpp_u32 = u32::from(bpp); | |
| let pixel_mask = (1u64 << bpp_u32) - 1; | |
| let red_shift = u32::from(input.pixel_format.red_shift); | |
| let green_shift = u32::from(input.pixel_format.green_shift); | |
| let blue_shift = u32::from(input.pixel_format.blue_shift); | |
| let red_max = u64::from(input.pixel_format.red_max.get()); | |
| let green_max = u64::from(input.pixel_format.green_max.get()); | |
| let blue_max = u64::from(input.pixel_format.blue_max.get()); | |
| let red_mask = match red_max.checked_shl(red_shift) { | |
| Some(mask) if red_shift < bpp_u32 && red_shift < 32 && mask <= pixel_mask => mask, | |
| _ => return Err(Error::UnsupportedPixelFormat(bpp)), | |
| }; | |
| let green_mask = match green_max.checked_shl(green_shift) { | |
| Some(mask) | |
| if green_shift < bpp_u32 | |
| && green_shift < 32 | |
| && mask <= pixel_mask => | |
| { | |
| mask | |
| } | |
| _ => return Err(Error::UnsupportedPixelFormat(bpp)), | |
| }; | |
| let blue_mask = match blue_max.checked_shl(blue_shift) { | |
| Some(mask) if blue_shift < bpp_u32 && blue_shift < 32 && mask <= pixel_mask => mask, | |
| _ => return Err(Error::UnsupportedPixelFormat(bpp)), | |
| }; | |
| if (red_mask & green_mask) != 0 | |
| || (red_mask & blue_mask) != 0 | |
| || (green_mask & blue_mask) != 0 | |
| { | |
| return Err(Error::UnsupportedPixelFormat(bpp)); | |
| } |
There was a problem hiding this comment.
fixed in f589cb2. We now reject non-true-color formats and any shift >= 32 to prevent panics from out-of-range bit shifts in convert_pixels.The full mask overlap validation in the suggestion is overkill for this case though - overlapping masks would just produce odd colors, not crashes or security issues, and real VNC clients always send sensible formats.
| socket | ||
| .write_all(rfb::ProtocolVersion(rfb::PROTOCOL_VERSION_33).as_bytes()) | ||
| .write_all(rfb::ProtocolVersion(rfb::PROTOCOL_VERSION_38).as_bytes()) | ||
| .await?; | ||
|
|
||
| let mut version = rfb::ProtocolVersion::new_zeroed(); | ||
| socket.read_exact(version.as_mut_bytes()).await?; | ||
|
|
||
| if version.0 != rfb::PROTOCOL_VERSION_33 { | ||
| return Err(Error::UnsupportedVersion(version)); | ||
| } | ||
| match version.0 { | ||
| rfb::PROTOCOL_VERSION_33 => { | ||
| // RFB 3.3: server dictates security type as a u32. | ||
| socket | ||
| .write_all( | ||
| rfb::Security33 { | ||
| padding: [0; 3], | ||
| security_type: rfb::SECURITY_TYPE_NONE, | ||
| } | ||
| .as_bytes(), | ||
| ) | ||
| .await?; | ||
| } | ||
| rfb::PROTOCOL_VERSION_37 | rfb::PROTOCOL_VERSION_38 => { | ||
| // RFB 3.7/3.8: server sends a list of supported security types. | ||
| socket | ||
| .write_all(rfb::Security37 { type_count: 1 }.as_bytes()) | ||
| .await?; | ||
| socket.write_all(&[rfb::SECURITY_TYPE_NONE]).await?; |
There was a problem hiding this comment.
This PR changes user-visible VNC behavior/capabilities (RFB 3.8 negotiation, zlib encoding, cursor pseudo-encoding, reconnection behavior). The OpenVMM Guide has a Graphical Console/VNC page (Guide/src/reference/openvmm/graphical_console.md); consider updating it (or adding a brief note) so the documented client compatibility/features stay accurate.
There was a problem hiding this comment.
updated in 7213e79. The VNC guide page now documents the protocol versions (3.3/3.7/3.8), zlib compression, cursor pseudo-encoding, DesktopSize handling (including the disconnect behavior for clients without support), QEMU extended key events, and client
reconnection behavior. Also added noVNC to the tested clients list.
If the framebuffer resolution changes but the client did not advertise DesktopSize pseudo-encoding, disconnect with an error instead of silently sending frames in the new coordinate space, which would desync the protocol state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reject non-true-color formats and shift values >= 32 in SetPixelFormat to prevent panics in convert_pixels from out-of-range bit shifts on malicious client input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Document RFB 3.3/3.7/3.8 protocol support, zlib compression, cursor pseudo-encoding, DesktopSize handling, QEMU extended key events, client reconnection behavior, and add noVNC to the tested clients list. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
workers/vnc_worker/src/lib.rs
Outdated
| let mut vncserver = vnc::Server::new("OpenVMM VM".into(), socket, view, input); | ||
| let mut timer = PolledTimer::new(driver); | ||
| let (abort_send, abort_recv) = mesh::oneshot(); | ||
| let connection = Box::pin(async move { |
There was a problem hiding this comment.
A lot of this stuff is duplicated with the code that handles the initial connection. Can you factor this out?
There was a problem hiding this comment.
factored out in c2fa3c2. The connection setup (vnc::Server, timer, abort channel, async task) now lives in a single start_connection helper that both the initial-connect and reconnect paths call. Removes ~20 lines of duplication.
workers/vnc_worker/vnc/src/lib.rs
Outdated
| ) | ||
| .await?; | ||
| } | ||
| return Err(std::io::Error::new( |
There was a problem hiding this comment.
Can we have a new error enum variant here instead of this synthetic IO error?
There was a problem hiding this comment.
Done in 0db65e4. Replaced with a proper Error::UnsupportedSecurityType(u8) variant that also captures the rejected type value.
workers/vnc_worker/vnc/src/lib.rs
Outdated
| // Send the new desktop size. | ||
| if !self.supports_desktop_resize { | ||
| return Err(std::io::Error::new( | ||
| std::io::ErrorKind::Unsupported, |
There was a problem hiding this comment.
- No "synthetic" IO errors please.
- Is this the appropriate thing to do, to terminate the connection?
There was a problem hiding this comment.
The synthetic IO error is already replaced with Error::ResizeUnsupported in 0db65e4.
As for disconnecting, yes, this is the standard approach. Both QEMU and TigerVNC disconnect clients that don't support DesktopSize when the resolution changes. The RFB spec doesn't define fallback behavior for this case, and continuing to send framebuffer updates with dimensions the client doesn't expect risks protocol desync or garbled display. Disconnecting is clean, the client can simply reconnect and pick up the new resolution.
The alternative would be cropping/padding to the old resolution, but that adds complexity for a worse user experience.
|
|
||
| let full_update = force_full_update || self.prev_fb.len() != fb_size; | ||
|
|
||
| // Find dirty tiles. |
There was a problem hiding this comment.
This is OK for now if you find it improves performance.
We should actually be able to get dirty tiles from the device itself with a little more work (Windows guests send them, UEFI does, not sure about Linux guests... but we can also enable dirty bit tracking and scan that periodically)...
But we can tackle this separately.
There was a problem hiding this comment.
Agreed, getting dirty regions from the device would be the ideal path since it avoids the per-frame comparison entirely. The tile-based diffing here is a reasonable stopgap that already cuts bandwidth significantly vs full-frame sends, especially for mostly-static screens.
There was a problem hiding this comment.
What already exists:
- The synthetic video protocol (vm/devices/uidevices/src/video/protocol.rs) has full support for dirty regions: DirtMessage, Rectangle, up to 255 dirty rects per message, and a FeatureChangeMessage with is_dirt_needed flag
- The device parses incoming dirty messages and creates Request::Dirt(Vec) in vm/devices/uidevices/src/video/mod.rs
The gap, a single TODO:
- Line 493 of video/mod.rs: // TODO: Support dirt requests, the dirty rectangles are parsed but never forwarded anywhere
What would need to happen to wire it up:
- Forward dirty rects from the synthetic video device through a channel
- Add a dirty region receiver to the VNC worker (alongside or replacing the current Updater signal)
- VNC worker uses device-provided rects instead of full-frame diffing, with diffing as fallback for guests that don't send dirty regions (like some Linux guests).
I will tackle that as soon that PR got merged in
There was a problem hiding this comment.
correction : Linux guests do use the synthetic video device.
- hyperv_fb (older fbdev driver) does NOT send dirty rects. It just sets the framebuffer location and relies on the host to scan/poll for changes. So tile diffing would still be needed as fallback here.
- hyperv_drm (newer DRM driver) does send SYNTHVID_DIRT messages via drm_framebuffer_funcs.dirty. So dirty tracking from the device would work with modern Linux guests using this driver.
- Cursor neither Linux driver sends SYNTHVID_POINTER_SHAPE. Cursor is software-rendered guest-side in both cases.
So for dirty tracking: Windows guests and Linux guests with hyperv_drm would provide device-level dirty rects. Linux guests with the older hyperv_fb would still need tile diffing as fallback.
I will tackle that as soon that PR got merged in
|
|
||
| if send_cursor { | ||
| send_cursor = false; | ||
| // 18x18 arrow cursor with white fill and 2px black outline. |
There was a problem hiding this comment.
I think this is OK for now to make things more usable. But really we should get the actual cursor from the device.
There was a problem hiding this comment.
Agreed. Getting the real cursor from the device would be the right long-term solution. It would give correct cursor shapes for resize handles, text I-beams, wait spinners, etc. The hardcoded arrow is a usability improvement over no cursor at all, which is what clients got before when the server didn't send cursor pseudo-encoding.
There was a problem hiding this comment.
The synthetic video protocol (vm/devices/uidevices/src/video/protocol.rs) has complete cursor support:
- MESSAGE_POINTER_SHAPE (type 8) , sends cursor image: ARGB 32bpp or monochrome, up to 96x96, with hotspot coordinates, chunked transmission support
- MESSAGE_POINTER_POSITION (type 7) , sends cursor position + visibility
- FeatureChangeMessage.is_pointer_shape_updates_needed , feature negotiation flag
The gap: In vm/devices/uidevices/src/video/mod.rs (lines ~488-491), the messages are received but intentionally ignored:
Request::PointerPosition { is_visible, x, y } => {
let _ = (is_visible, x, y); // Ignored
}
Request::PointerShape => {} // Not parsed
What it would take:
- Parse MESSAGE_POINTER_SHAPE fully in the video device (accumulate chunks)
- Extend FramebufferControl trait (video_core/src/lib.rs) with cursor methods
- Add cursor storage to the framebuffer crate
- Extend vnc::Framebuffer trait to expose cursor data
- VNC worker polls for cursor changes and sends RFB cursor encoding updates
This is straightforward plumbing, the protocol, message structures, and channel infrastructure all exist already.
Linux Guests: No path currently, neither Linux driver sends SYNTHVID_POINTER_SHAPE.
Cursor is software-rendered guest-side in both cases.
UEFI: No cursor data
GOP is output-only, doesn't provide cursor shapes.
I will tackle that as soon that PR got merged in
| let pf = &input.pixel_format; | ||
| match pf.bits_per_pixel { | ||
| 8 | 16 | 32 => {} | ||
| bpp => return Err(Error::UnsupportedPixelFormat(bpp)), |
There was a problem hiding this comment.
Is terminating the connection the right thing to do?
There was a problem hiding this comment.
Yes, disconnecting is the standard approach here. The RFB protocol defines no response message for SetPixelFormat, so there's no way to signal rejection to the client. The only options are: use the invalid format (risking panics or UB in pixel conversion), silently ignore it (protocol desync since the client believes the new format is active), or disconnect.
QEMU, TigerVNC, and libvncserver all disconnect on invalid pixel formats for the same reason.
Add UnsupportedSecurityType and ResizeUnsupported variants to the Error enum instead of constructing std::io::Error::new with string messages.
Whow, we’re moving from “barely works” to “fully blown, with all the whistles and bells.” should I put inline comments for that issues for future reference ? If You prefer that I implement the dirty & cursor parts already now, let me know. I would prefer to merge the current stage and then apply that additional changes on top of that - otherwise the PR will get quite big ... greetings from good old Vienna |
…ariant Add Error::ZlibCompression variant instead of constructing std::io::Error::other with a string message.
The start_connection helper uses PolledSocket<socket2::Socket> in its explicit type signature, which requires socket2 as a direct dependency.
Summary
Upgrade the VNC server to improve compatibility with modern clients and reduce bandwidth usage.
Protocol upgrade
Performance
Cursor
Compatibility
Testing
Tested with TigerVNC, RealVNC and noVNC on Windows 11 guest running under openvmm/KVM: