Refactor expanded OCR match phases by HellbringerOnline · Pull Request #10655 · SubtitleEdit/subtitleedit

HellbringerOnline · 2026-04-19T20:19:43Z

Summary

This PR is the first step in splitting the current nOCR work into reviewable pieces.

The goal of this step is only to make expanded nOCR matching easier to reason about and test. It does not change which expanded template wins.

What changed

extracted the two implicit expanded-match passes into explicit phases inside NOcrDb
- Exact
- Relaxed
introduced an internal expanded-match candidate model so the matching state is explicit instead of being spread across nested loops
moved per-template matching into helper methods for:
- collecting expanded candidates
- checking foreground/background sampled points
- evaluating size/width-percent constraints
kept the current selection behavior intact
- first matching candidate from the Exact phase wins
- if there is no exact candidate, first matching candidate from the Relaxed phase wins

Why this step exists

The current expanded matching logic was previously embedded directly in GetMatchExpanded(...) as two large loops with implicit rules.

That made it hard to:

see which candidates actually passed matching
test exact vs relaxed behavior in isolation
prepare follow-up fixes without changing behavior accidentally

This refactor makes those rules explicit first, so later behavioral changes can be reviewed separately.

What is intentionally not included

This PR does not:

change candidate ranking
select the "best" candidate
change overlap handling
add tail recovery for overlapping expanded templates
include Unicode/spellcheck normalization work

Those belong in later PRs.

Tests

Added NOcrDbExpandedMatchTests regression coverage for the current behavior:

multiple exact candidates still return the first database entry
exact phase still wins over relaxed phase
multiple relaxed candidates still return the first database entry

Validation note

I attempted to run:

dotnet test tests/UI/UITests.csproj --filter "FullyQualifiedName~NOcrDbExpandedMatchTests" -p:BaseOutputPath="D:\Рабочий стол\subtitleedit\artifacts\test-output-stage1\"

At the moment, this stops on an unrelated compile error already present on fresh upstream/main:

src/UI/Features/Video/CutVideo/CutVideoViewModel.cs:202
invalid token / missing ;

So the current validation blocker is outside the OCR changes in this PR.

HellbringerOnline · 2026-04-19T20:28:40Z

AppVeyor failure here appears to come from the current upstream/main, not from this OCR refactor.

I checked the failing PR build (5.0.20917) and it stops on:

src/UI/Features/Video/CutVideo/CutVideoViewModel.cs:202
CS1525: Invalid expression term '}'
CS1002: ; expected

I also checked the previous AppVeyor build for plain upstream/main (5.0.20916, commit 4e945766) and it fails on the exact same file/line/error before this PR.

So the current red CI status is inherited from the base branch. This PR only touches NOcrDb plus OCR regression tests.

Refactor expanded OCR match phases

d8418e0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor expanded OCR match phases#10655

Refactor expanded OCR match phases#10655
HellbringerOnline wants to merge 1 commit intoSubtitleEdit:mainfrom
HellbringerOnline:refactor/nocr-expanded-match-stage1

HellbringerOnline commented Apr 19, 2026 •

edited

Loading

Uh oh!

HellbringerOnline commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

HellbringerOnline commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Why this step exists

What is intentionally not included

Tests

Validation note

Uh oh!

HellbringerOnline commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

HellbringerOnline commented Apr 19, 2026 •

edited

Loading