feat: SessionExpired auto retry (#232)#337
feat: SessionExpired auto retry (#232)#337TheNotoBarth wants to merge 2 commits intoDearVa:0.7.0from
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds transparent recovery for MCP tool invocations by detecting a “session expired” failure, restarting the corresponding MCP client, and retrying the tool call once so transient authentication/session issues don’t surface to the user.
Changes:
- Added MCP session-expired detection and a dedicated recovery/retry catch block around MCP tool invocation.
- Implemented MCP client restart (stop + start) followed by a one-time retry of the failed MCP function call.
- Added helper method to detect session-expired conditions from exception chains.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var newFunction = mcpPlugin.GetEnabledFunctions() | ||
| .OfType<McpChatFunction>() | ||
| .FirstOrDefault(f => f.KernelFunction.Name == content.FunctionName) ?? throw new InvalidOperationException($"Function '{content.FunctionName}' not found after MCP client restart."); | ||
|
|
||
| var kernelArgs = content.Arguments != null ? new KernelArguments(content.Arguments) : []; |
There was a problem hiding this comment.
After restarting the MCP client, the code looks up the function by content.FunctionName in the reloaded mcpPlugin.GetEnabledFunctions(). However, function names can be deduplicated/renamed in the plugin scope by mutating KernelFunction.Metadata.Name (see ChatPluginManager.ChatPluginSnapshot.EnsureUniqueFunctionName, where names get a _1 suffix). A restart recreates MCP tools with their original names, so the retry lookup can fail even though the tool exists, causing recovery to fail with "Function ... not found after MCP client restart.". Consider preserving a stable MCP-tool identifier/name for retries (separate from the deduped display/kernel name) or reapplying the same deduplication and refreshing the kernel/plugin scope before retrying so the names match.
| var newFunction = mcpPlugin.GetEnabledFunctions() | ||
| .OfType<McpChatFunction>() | ||
| .FirstOrDefault(f => f.KernelFunction.Name == content.FunctionName) ?? throw new InvalidOperationException($"Function '{content.FunctionName}' not found after MCP client restart."); | ||
|
|
||
| var kernelArgs = content.Arguments != null ? new KernelArguments(content.Arguments) : []; | ||
| var retryResult = await newFunction.KernelFunction.InvokeAsync(context.Kernel, kernelArgs, cancellationToken); | ||
| resultContent = new FunctionResultContent(content, retryResult); |
There was a problem hiding this comment.
The retry path invokes newFunction.KernelFunction.InvokeAsync(...) directly instead of using the normal FunctionCallContent.InvokeAsync(...) path used in the primary attempt. This bypasses any invocation behavior tied to FunctionCallContent (e.g., consistent function resolution, filters/pipeline behavior, or future cross-cutting logic added around content.InvokeAsync). If possible, refresh/rebuild the kernel plugin scope after MCP restart and re-run the original content.InvokeAsync(...) so retries go through the same codepath and remain consistent.
| var newFunction = mcpPlugin.GetEnabledFunctions() | |
| .OfType<McpChatFunction>() | |
| .FirstOrDefault(f => f.KernelFunction.Name == content.FunctionName) ?? throw new InvalidOperationException($"Function '{content.FunctionName}' not found after MCP client restart."); | |
| var kernelArgs = content.Arguments != null ? new KernelArguments(content.Arguments) : []; | |
| var retryResult = await newFunction.KernelFunction.InvokeAsync(context.Kernel, kernelArgs, cancellationToken); | |
| resultContent = new FunctionResultContent(content, retryResult); | |
| _ = mcpPlugin.GetEnabledFunctions() | |
| .OfType<McpChatFunction>() | |
| .FirstOrDefault(f => f.KernelFunction.Name == content.FunctionName) ?? throw new InvalidOperationException($"Function '{content.FunctionName}' not found after MCP client restart."); | |
| resultContent = await content.InvokeAsync(context.Kernel, cancellationToken); |
| var message = current.Message; | ||
| if (message.Contains("Session", StringComparison.OrdinalIgnoreCase) && | ||
| message.Contains("Expired", StringComparison.OrdinalIgnoreCase)) | ||
| { | ||
| return true; | ||
| } | ||
| } | ||
|
|
||
| return false; | ||
| } | ||
|
|
There was a problem hiding this comment.
IsMcpSessionExpiredException relies on substring matching in exception messages ("Session" + "Expired"). This is brittle (message text can change across library versions, transports, or localization) and can lead to both false negatives and false positives. Prefer detecting session-expired via structured signals where available (e.g., a specific MCP/JSON-RPC error code/type, or an HTTP status like 401/403 if surfaced as an HttpRequestException), and only fall back to message matching if there is no better discriminator.
| var message = current.Message; | |
| if (message.Contains("Session", StringComparison.OrdinalIgnoreCase) && | |
| message.Contains("Expired", StringComparison.OrdinalIgnoreCase)) | |
| { | |
| return true; | |
| } | |
| } | |
| return false; | |
| } | |
| if (TryGetAuthFailureStatusCode(current, out var statusCode) && | |
| (statusCode == System.Net.HttpStatusCode.Unauthorized || | |
| statusCode == System.Net.HttpStatusCode.Forbidden)) | |
| { | |
| return true; | |
| } | |
| if (HasSessionExpiredErrorCode(current)) | |
| { | |
| return true; | |
| } | |
| if (LooksLikeSessionExpiredMessage(current.Message)) | |
| { | |
| return true; | |
| } | |
| } | |
| return false; | |
| } | |
| private static bool TryGetAuthFailureStatusCode( | |
| Exception ex, | |
| out System.Net.HttpStatusCode statusCode) | |
| { | |
| if (ex is System.Net.Http.HttpRequestException { StatusCode: { } httpStatusCode }) | |
| { | |
| statusCode = httpStatusCode; | |
| return true; | |
| } | |
| var statusCodeValue = GetExceptionPropertyValue(ex, "StatusCode") ?? | |
| GetExceptionPropertyValue(ex, "ResponseStatusCode"); | |
| switch (statusCodeValue) | |
| { | |
| case int intCode when Enum.IsDefined(typeof(System.Net.HttpStatusCode), intCode): | |
| statusCode = (System.Net.HttpStatusCode)intCode; | |
| return true; | |
| case long longCode when longCode >= int.MinValue && | |
| longCode <= int.MaxValue && | |
| Enum.IsDefined(typeof(System.Net.HttpStatusCode), (int)longCode): | |
| statusCode = (System.Net.HttpStatusCode)(int)longCode; | |
| return true; | |
| case System.Net.HttpStatusCode enumCode: | |
| statusCode = enumCode; | |
| return true; | |
| case string textCode when int.TryParse(textCode, out var parsedCode) && | |
| Enum.IsDefined(typeof(System.Net.HttpStatusCode), parsedCode): | |
| statusCode = (System.Net.HttpStatusCode)parsedCode; | |
| return true; | |
| case string textCode when Enum.TryParse<System.Net.HttpStatusCode>(textCode, true, out var parsedStatusCode): | |
| statusCode = parsedStatusCode; | |
| return true; | |
| default: | |
| statusCode = default; | |
| return false; | |
| } | |
| } | |
| private static bool HasSessionExpiredErrorCode(Exception ex) | |
| { | |
| return IsSessionExpiredErrorCodeValue(GetExceptionPropertyValue(ex, "ErrorCode")) || | |
| IsSessionExpiredErrorCodeValue(GetExceptionPropertyValue(ex, "Code")); | |
| } | |
| private static object? GetExceptionPropertyValue(Exception ex, string propertyName) | |
| { | |
| return ex.GetType().GetProperty(propertyName)?.GetValue(ex); | |
| } | |
| private static bool IsSessionExpiredErrorCodeValue(object? value) | |
| { | |
| switch (value) | |
| { | |
| case null: | |
| return false; | |
| case int intCode: | |
| return intCode is 401 or 403; | |
| case long longCode: | |
| return longCode is 401L or 403L; | |
| case string stringCode: | |
| return string.Equals(stringCode, "session_expired", StringComparison.OrdinalIgnoreCase) || | |
| string.Equals(stringCode, "sessionexpired", StringComparison.OrdinalIgnoreCase) || | |
| string.Equals(stringCode, "unauthorized", StringComparison.OrdinalIgnoreCase) || | |
| string.Equals(stringCode, "forbidden", StringComparison.OrdinalIgnoreCase); | |
| default: | |
| var enumName = value.ToString(); | |
| return string.Equals(enumName, "Unauthorized", StringComparison.OrdinalIgnoreCase) || | |
| string.Equals(enumName, "Forbidden", StringComparison.OrdinalIgnoreCase) || | |
| string.Equals(enumName, "SessionExpired", StringComparison.OrdinalIgnoreCase); | |
| } | |
| } | |
| private static bool LooksLikeSessionExpiredMessage(string? message) | |
| { | |
| return !string.IsNullOrWhiteSpace(message) && | |
| Regex.IsMatch( | |
| message, | |
| @"\bsession\s+expired\b", | |
| RegexOptions.IgnoreCase | RegexOptions.CultureInvariant); | |
| } |
Description
This PR adds transparent MCP session-expired recovery for HTTP MCP tool calls. When the server returns a session expiration error, the client now restarts the MCP plugin and retries the tool call once automatically.
Type of Change
Current Behavior
like issue #232
Updated/Expected Behavior
When a session-expired MCP error is detected, the MCP client is restarted automatically and the tool call is retried once. If the retry succeeds, the user and AI receive the successful result without seeing the transient failure.
Implementation Details
ChatServicefor MCP function invocation failures.Checklist
Fixed Issues
Fixes #232