Skip to content

refactor(api): 重新引入业务错误重构,并收敛 diff lint 作用域#373

Merged
Cx330-502 merged 4 commits into
raids-lab:mainfrom
Cx330-502:refactor_api
Jun 17, 2026
Merged

refactor(api): 重新引入业务错误重构,并收敛 diff lint 作用域#373
Cx330-502 merged 4 commits into
raids-lab:mainfrom
Cx330-502:refactor_api

Conversation

@Cx330-502

Copy link
Copy Markdown
Member

背景

此前 refactor API error handling and sync frontend compatibility 曾被误推到 main 后又被回滚。
本 PR 以正常评审流程重新引入该重构(revert the revert),并补充 lint 作用域调整,保证迁移方向清晰且可落地。

主要变更

  1. 后端错误处理体系重构
  • 新增 bizerr 分组错误体系(400/401/403/404/405/409/413/429/500/502/503/504)
  • 新增统一错误出口 resputil.HandleError
  • resputil 旧接口保留兼容(deprecated),支持平滑迁移
  • 在 account/statistics/gpu_analysis/system_config 等模块按新错误体系改造
  1. 前端错误处理与兼容
  • 错误码改为“新码 + legacy 码”双轨
  • 请求层改为策略化处理,支持按错误组分类处理(如 409 冲突)
  • 新增 markApiErrorHandled,避免页面已处理后再次全局重复 toast
  • 登录、SSHD、系统配置等场景完成新旧错误码兼容
  1. lint 与开发约束
  • 后端 make lint 保持 lint-full + lint-diff 双阶段
  • lint-diffforbidigo 作用域收敛为仅检查 internal/handlerinternal/service
  • 更新中英文 README,补充错误处理规范与前端错误码使用方式

兼容性与风险

  • 接口响应结构保持不变:code / msg / data
  • legacy 错误码仍可识别,兼容历史前端处理逻辑
  • 主要风险为迁移期存在新旧写法并存;已通过 lint 规则限制新增代码方向

开发注意文档位置

  • 后端错误处理规范(英文):backend/README.mdError Handling Rules
  • 后端错误处理规范(中文):backend/README.zh-CN.md错误处理规则
  • 前端错误码与处理规范(英文):frontend/README.mdError Code Usage
  • 前端错误码与处理规范(中文):frontend/README.zh-CN.md错误码使用
  • 前端错误码生成入口:frontend/Makefilegenerate-error-code

@AkashiSensei

Copy link
Copy Markdown
Contributor

不能先合这个求求了,这个合了我们的都合不进去了

@Cx330-502 Cx330-502 added the pending Pending: wait for prerequisite PRs label Mar 18, 2026
@AkashiSensei AkashiSensei requested a review from Copilot March 18, 2026 08:02

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

该 PR 旨在重新引入后端“业务错误体系(bizerr)+ 统一错误出口(resputil.HandleError)”的重构,并同步完成前端错误码双轨兼容与 lint-diff 约束收敛,确保迁移方向清晰可执行。

Changes:

  • 后端新增 bizerr 分组错误码体系,并引入 resputil.HandleError 作为统一错误出口(旧接口保留但标记 deprecated)。
  • 前端错误码生成与使用方式升级:新增 error group 常量、新旧错误码双轨(legacy)与请求层策略化处理(含 markApiErrorHandled)。
  • 工具链调整:后端 lint 拆分为 lint-full + lint-diff,并在 CI 中准备 upstream base;补充中英文 README 规范说明。

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
frontend/src/utils/toast.ts toast 侧增加“已处理标记”以避免冲突类错误 fallback 重复提示
frontend/src/services/generator.py 错误码生成脚本增强:支持 bizerr 结构体 tag、deprecated-only、prefix、group 常量输出
frontend/src/services/error_code_legacy.ts 新增:生成的 legacy 错误码常量文件(供前端双轨兼容)
frontend/src/services/error_code.ts 更新:新 bizerr 错误码常量与分组常量输出
frontend/src/services/client.ts 请求层错误处理策略化:冲突组 delegate + fallback、双轨错误码识别、markApiErrorHandled
frontend/src/routes/auth/-components/login-form.tsx 登录页对新旧错误码兼容,并在页面自行处理时报错去重
frontend/src/routes/admin/more/index.tsx 管理端系统配置场景适配新错误码并去重 toast
frontend/src/components/job/detail/s-s-h-port-dialog.tsx SSHD not found 场景兼容 legacy 错误码并去重 toast
frontend/README.zh-CN.md 补充前端错误码使用规范与示例(中文)
frontend/README.md 补充前端错误码使用规范与示例(英文)
frontend/Makefile generate-error-code 改为从 bizerr/groups.go 生成新码,并生成 legacy 文件
backend/internal/service/statistics.go service 层将参数/DB 错误改为 bizerr 返回(便于统一 HandleError)
backend/internal/resputil/response.go Response 的 code 类型对齐 bizerr.BizCode;保留旧接口并标记 deprecated
backend/internal/resputil/handle.go 新增统一错误出口 HandleError:从 error 提取 BizError 并映射 HTTP 状态码
backend/internal/resputil/code.go 旧错误码常量迁移为 bizerr.BizCode,并整体标记 deprecated(兼容历史代码)
backend/internal/handler/system_config.go system-config 部分场景切换到 bizerr + HandleError
backend/internal/handler/statistics.go statistics handler 全面切换到 bizerr + HandleError
backend/internal/handler/gpu_analysis.go gpu-analysis 部分场景切换到 bizerr + HandleError
backend/internal/handler/account.go account handler 大量路径切换到 bizerr + HandleError,并新增若干 gorm 错误抬升工具
backend/internal/bizerr/init.go 新增:bizerr 分组初始化与 FromError 提取工具
backend/internal/bizerr/groups.go 新增:bizerr 分组与 code tag 声明(生成 TS 错误码的来源)
backend/internal/bizerr/common/bizerrdb.go 新增:gorm 常见错误抬升为 bizerr 的通用方法
backend/internal/bizerr/bizerr.go 新增:BizCode/BizError 定义与 New/Wrap 能力(基于 cockroachdb/errors)
backend/go.mod 引入 cockroachdb/errors 等依赖
backend/go.sum 依赖校验更新
backend/docs/swagger.yaml swagger 中 Response.code 改为 integer 描述(移除旧 ErrorCode enum 定义)
backend/docs/swagger.json 同步 swagger.json 更新
backend/docs/docs.go 同步 swagger docs.go 更新
backend/README.zh-CN.md 补充后端错误处理规则(中文)
backend/README.md 补充后端错误处理规则(英文)
backend/Makefile lint 拆分 full/diff,并增加缓存目录设置
backend/.golangci-full.yml full lint 增加对 deprecated resputil 使用的静态检查排除规则
backend/.golangci-diff.yml 新增 diff lint 配置:forbidigo 仅约束 handler/service 并禁止 fmt.Errorf/errors.New 等
.github/workflows/backend-pr.yml CI 调整:为 lint-diff 准备 upstream/main,并改为直接运行 make lint
Comments suppressed due to low confidence (2)

backend/internal/handler/system_config.go:136

  • 【核心规范】UpdateLLMConfig 这里对 "validation failed" 分支返回的是 bizerr.Conflict.ResourceStatusError(HTTP 409),但 swagger 注释目前仍写的是 @Failure 400;另外非该分支仍在用旧的 resputil.Error(..., ServiceError)(会统一按旧逻辑返回 500),与“顶层统一 HandleError”迁移方向不一致。建议:1) swagger 补充/修正 409;2) 将非 validation 分支也改为 resputil.HandleError(c, err) 或用 bizerr.Internal.* 进行 Wrap。
	// Service 内部会判断:如果 Key == "********" 则不更新 DB 中的 Key
	err := mgr.service.UpdateLLMConfig(c.Request.Context(), serviceCfg, req.Validate)
	if err != nil {
		if strings.Contains(err.Error(), "validation failed") {
			resputil.HandleError(c, bizerr.Conflict.ResourceStatusError.New("LLM connection check failed. Please verify your settings."))
			return
		}
		resputil.Error(c, err.Error(), resputil.ServiceError)
		return

backend/internal/handler/gpu_analysis.go:97

  • 【核心规范】TriggerPodAnalysis 在功能未开启时改为返回 409(bizerr.Conflict.ResourceStatusError),但 swagger 仍只声明了 400/500;同时后续 err != nil 仍走旧的 resputil.Error(..., ServiceError),会丢失 service 层可能返回的更精确业务码/HTTP 状态。建议补充 @Failure 409,并将错误出口统一切到 resputil.HandleError(c, err)。
// @Success		200		{object}	resputil.Response[model.GpuAnalysis] "分析成功,返回创建的分析记录"
// @Failure		400		{object}	resputil.Response[any] "请求参数错误"
// @Failure		500		{object}	resputil.Response[any] "分析过程中发生错误"
// @Router			/v1/admin/gpu-analysis/trigger [post]
func (mgr *GpuAnalysisMgr) TriggerPodAnalysis(c *gin.Context) {
	if !mgr.configService.IsGpuAnalysisEnabled(c.Request.Context()) {
		resputil.HandleError(c, bizerr.Conflict.ResourceStatusError.New("GPU Analysis feature is currently disabled."))
		return
	}

	var req TriggerAnalysisReq
	if err := c.ShouldBindJSON(&req); err != nil {
		resputil.BadRequestError(c, err.Error())
		return
	}
	analysisResult, err := mgr.service.AnalyzePodByName(c.Request.Context(), req.Namespace, req.PodName)
	if err != nil {
		klog.Errorf("Manual analysis trigger failed for pod %s: %v", req.PodName, err)
		resputil.Error(c, err.Error(), resputil.ServiceError)
		return

You can also share your feedback on Copilot code review. Take the survey.

Comment thread backend/internal/bizerr/bizerr.go Outdated
Comment on lines +58 to +68
func (c BizCode) Wrap(cause error, msg string) error {
if cause == nil {
return nil
}
be := &BizError{Code: c, Message: msg}
// 1. errors.Mark 将 be 标记为 cause 的一部分,使得 errors.Is(err, be) 成立
// 2. errors.WithStack 加上堆栈
// 3. errors.WithMessage 加上新的描述
err := errors.Mark(cause, be)
err = errors.WithMessage(err, be.Error())
return errors.WithStack(err)
Comment on lines +32 to +44
func handleUnexpectedError(c *gin.Context, err error) {
// 如果完全不是业务错误,记录详细堆栈到日志(%+v 是关键)
fmt.Printf("Unexpected Error: %+v\n", err)
respond(c, http.StatusInternalServerError, &bizerr.BizError{
Code: 50000,
Message: "Internal Server Error",
})
}

func logServerError(err error, bErr *bizerr.BizError) {
if bErr.Code.Group() >= firstServerErrorGroup {
fmt.Printf("Business 5xx Error: %+v\n", err)
}
Comment thread backend/internal/resputil/response.go Outdated
// Success 200 OK - 常规查询、修改成功
func Success(c *gin.Context, data any) {
wrapResponse(c, "", data, OK)
emit(c, http.StatusOK, OK, "success", data)
Comment on lines +21 to +35
const markToastHandled = (error: unknown) => {
if (!error || typeof error !== 'object') {
return
}

const handledError = error as {
isHandledByBiz?: boolean
fallbackLogTimer?: ReturnType<typeof setTimeout>
}
handledError.isHandledByBiz = true

if (handledError.fallbackLogTimer) {
clearTimeout(handledError.fallbackLogTimer)
handledError.fallbackLogTimer = undefined
}
Comment thread backend/Makefile
Comment on lines +135 to +141
.PHONY: lint-diff
lint-diff: golangci-lint
@echo "$(BLUE)Linting changed code (ignore deprecated)...$(RESET)"
$(GOLANGCI_LINT) run \
-c .golangci-diff.yml\
--new-from-rev=upstream/main \
--timeout 5m
Comment on lines 1753 to 1757
token := util.GetToken(c)
// Check if current user is in account (does not require admin role)
if err := mgr.checkUserInAccount(c, token.UserID, req.ID); err != nil {
resputil.HTTPError(c, httpStatusForbidden, "Forbidden: User is not in account", resputil.NotSpecified)
resputil.HandleError(c, err)
return
Comment on lines +78 to +86
action: 'toast',
match: (code) => code === LEGACY_ERROR_LEGACY_TOKEN_NOT_SUPPORTED,
message: '不再支持这种登陆方式,直接通过 LDAP 登录即可',
},
{
action: 'toast',
match: (code) => code === ERROR_INVALID_REQUEST || code === ERROR_PARAMETER_ERROR,
message: (errorResponse) => `请求参数有误, ${errorResponse.msg}`,
},
Comment thread backend/README.zh-CN.md Outdated
```

```go
// 新错误码写这里

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

按照会上说的,新增错误码应该谨慎或者不新增,现在覆盖的已经比较全面了

@AkashiSensei AkashiSensei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

神,代码没细看,看看 Copilot 怎么说好了

@AkashiSensei

Copy link
Copy Markdown
Contributor

该合这个了

…bility""

This reverts commit 241a3a7.

Also scope diff forbidigo checks to internal/handler and internal/service only.
Comment thread backend/internal/bizerr/bizerr.go Outdated
// 1. errors.Mark 将 be 标记为 cause 的一部分,使得 errors.Is(err, be) 成立
// 2. errors.WithStack 加上堆栈
// 3. errors.WithMessage 加上新的描述
err := errors.Mark(cause, be)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BizCode.Wrap uses cockroachdb/errors.Mark(cause, be), but resputil.HandleError later extracts the business error with errors.As(err, &bErr). The marker reference is not part of the unwrap/cause chain, so wrapped errors such as bizerr.Internal.DatabaseError.Wrap(...) are not recognized by bizerr.FromError and fall into handleUnexpectedError, returning HTTP 500 with code 50000 and message Internal Server Error instead of the intended biz code/message.

This affects the newly migrated wrapped paths, for example loadResourceMetadata returning DatabaseError.Wrap(...); it should preserve 50002 and the wrapped message, but currently loses both. It will be worse for future BadRequest/NotFound wraps because the HTTP status will also become 500.

Suggested fix: make the wrapped error explicitly carry *BizError in the unwrap chain or implement an As method on the wrapper, instead of relying on Mark for errors.As. Alternatively, change FromError and response extraction to use a single marker-aware mechanism that can recover both code and message. Please add a unit test like BadRequest.ParameterError.Wrap(errors.New("x"), "bad input") -> HandleError returns HTTP 400/code 40002/msg bad input.

},
{
action: 'toast',
match: (code) => code === ERROR_INVALID_REQUEST || code === ERROR_PARAMETER_ERROR,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This global policy treats code 40002 as the new ERROR_PARAMETER_ERROR, but 40002 is also still LEGACY_ERROR_BUSINESS_LOGIC_ERROR. That means legacy business-logic errors from endpoints not migrated in this PR are now shown as 请求参数有误, ... before the page-level handler gets a chance to handle them.

A concrete existing path is job creation billing precheck: backend/internal/handler/vcjob/vcjob.go still returns resputil.BusinessLogicError (40002) with a billing precheck blocked: message, and the frontend still has useJobCreateBillingBlock checking LEGACY_ERROR_BUSINESS_LOGIC_ERROR for that special dialog. With this policy, users will see an incorrect parameter-error toast in addition to the billing block handling.

Suggested fix: do not reuse 40002 for the new ParameterError while legacy BusinessLogicError is still emitted, or add a legacy-aware policy before this branch that preserves the old silent/delegate behavior for legacy business-logic errors. The safer option is to assign ParameterError a non-conflicting new code and keep 40002 reserved for the legacy meaning until all emitters are migrated.

@Jinghao-coding

Copy link
Copy Markdown
Member

@Cx330-502 我在代码附近留了两条需要优先处理的 inline review,都是运行时行为回归风险:

  1. BizCode.Wrap 使用 errors.Mark 后,resputil.HandleError 通过 errors.As 取不到 *BizError,会把 wrapped bizerr 吞成 generic 500/code 50000:
    refactor(api): 重新引入业务错误重构,并收敛 diff lint 作用域 #373 (comment)

  2. 前端新旧错误码 40002 语义冲突:新 ERROR_PARAMETER_ERROR 和 legacy BusinessLogicError 共用 40002,导致未迁移接口的业务错误被全局误报成参数错误:
    refactor(api): 重新引入业务错误重构,并收敛 diff lint 作用域 #373 (comment)

建议这两处修掉后再合。

Comment thread backend/internal/resputil/response.go Outdated
// Success 200 OK - 常规查询、修改成功
func Success(c *gin.Context, data any) {
wrapResponse(c, "", data, OK)
emit(c, http.StatusOK, OK, "success", data)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes every successful response from msg: "" to msg: "success". That looks harmless for generic consumers, but some existing frontend success paths intentionally use response.msg || <localized/default message> so the backend can optionally provide a specific message while the frontend falls back to a localized business message.

For example, frontend/src/components/model/model-download-dialog.tsx uses const message = response.msg || defaultMessage and currently shows 已提交模型下载任务 / 已提交数据集下载任务 unless the backend returns a specific message. After this change, the generic success string always wins, so users will see success instead of the localized/business-specific toast for normal model/dataset download creation. Similar patterns exist in a few other consumers that treat empty msg as "no backend message".

Suggested fix: keep Success emitting an empty message for compatibility, or only set msg: "success" for endpoints that explicitly need it. If the response contract should change globally, update the frontend call sites to ignore generic success and keep their localized defaults.

@Jinghao-coding

Copy link
Copy Markdown
Member

@Cx330-502 我重新拉了最新 commit 0f028c88 检查,之前 3 个阻塞点都已经修掉了,新增的 bizerr / resputil 测试也覆盖到了关键路径。本地也跑过:

  • go test ./internal/bizerr ./internal/resputil
  • pnpm lint

GitHub 上 backend/frontend/storage 的 lint 和 build checks 也都过了。

剩下几个是非阻塞建议,不影响这次合入判断,可以后续 follow-up:

  1. backend/internal/resputil/handle.go 里 5xx / unexpected error 日志目前用 fmt.Printf,建议后续统一到项目日志体系,例如 klog.Errorf / klog.ErrorS,方便生产环境统一采集、分级和检索。

  2. 当前 PR 仍是迁移期,新旧错误出口并存,例如部分 handler 还在使用 deprecated resputil.Error / BadRequestError。这可以接受,但建议后续按模块继续收敛,避免后面新增代码复制旧写法。

  3. 对已经改成 bizerr.Conflict.ResourceStatusError 的接口,swagger 注释里有些仍只写了 400/500。建议后续统一补充 409,避免 API 文档和实际 HTTP status 不一致。

以上不阻塞合入。

@Cx330-502 Cx330-502 merged commit 224d4e5 into raids-lab:main Jun 17, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pending Pending: wait for prerequisite PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants