Skip to content

Refactor expanded OCR match phases#10655

Open
HellbringerOnline wants to merge 1 commit intoSubtitleEdit:mainfrom
HellbringerOnline:refactor/nocr-expanded-match-stage1
Open

Refactor expanded OCR match phases#10655
HellbringerOnline wants to merge 1 commit intoSubtitleEdit:mainfrom
HellbringerOnline:refactor/nocr-expanded-match-stage1

Conversation

@HellbringerOnline
Copy link
Copy Markdown
Contributor

@HellbringerOnline HellbringerOnline commented Apr 19, 2026

Summary

This PR is the first step in splitting the current nOCR work into reviewable pieces.

The goal of this step is only to make expanded nOCR matching easier to reason about and test. It does not change which expanded template wins.

What changed

  • extracted the two implicit expanded-match passes into explicit phases inside NOcrDb
    • Exact
    • Relaxed
  • introduced an internal expanded-match candidate model so the matching state is explicit instead of being spread across nested loops
  • moved per-template matching into helper methods for:
    • collecting expanded candidates
    • checking foreground/background sampled points
    • evaluating size/width-percent constraints
  • kept the current selection behavior intact
    • first matching candidate from the Exact phase wins
    • if there is no exact candidate, first matching candidate from the Relaxed phase wins

Why this step exists

The current expanded matching logic was previously embedded directly in GetMatchExpanded(...) as two large loops with implicit rules.

That made it hard to:

  • see which candidates actually passed matching
  • test exact vs relaxed behavior in isolation
  • prepare follow-up fixes without changing behavior accidentally

This refactor makes those rules explicit first, so later behavioral changes can be reviewed separately.

What is intentionally not included

This PR does not:

  • change candidate ranking
  • select the "best" candidate
  • change overlap handling
  • add tail recovery for overlapping expanded templates
  • include Unicode/spellcheck normalization work

Those belong in later PRs.

Tests

Added NOcrDbExpandedMatchTests regression coverage for the current behavior:

  • multiple exact candidates still return the first database entry
  • exact phase still wins over relaxed phase
  • multiple relaxed candidates still return the first database entry

Validation note

I attempted to run:

dotnet test tests/UI/UITests.csproj --filter "FullyQualifiedName~NOcrDbExpandedMatchTests" -p:BaseOutputPath="D:\Рабочий стол\subtitleedit\artifacts\test-output-stage1\"

At the moment, this stops on an unrelated compile error already present on fresh upstream/main:

  • src/UI/Features/Video/CutVideo/CutVideoViewModel.cs:202
  • invalid token / missing ;

So the current validation blocker is outside the OCR changes in this PR.

@HellbringerOnline
Copy link
Copy Markdown
Contributor Author

AppVeyor failure here appears to come from the current upstream/main, not from this OCR refactor.

I checked the failing PR build (5.0.20917) and it stops on:

  • src/UI/Features/Video/CutVideo/CutVideoViewModel.cs:202
  • CS1525: Invalid expression term '}'
  • CS1002: ; expected

I also checked the previous AppVeyor build for plain upstream/main (5.0.20916, commit 4e945766) and it fails on the exact same file/line/error before this PR.

So the current red CI status is inherited from the base branch. This PR only touches NOcrDb plus OCR regression tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant