
Add wildcard ~ support for Japanese grammar patterns#2363

Open
RadiantSol wants to merge 2 commits into yomidevs:master from RadiantSol:claude/wizardly-meitner

RadiantSol commented Mar 30, 2026

The only drawbacks of this implementation are the potentially large number of additional lookups and the inability to scan for X~Y~Z entries.

AI Summary

  • Added insertWildcard text processor for Japanese that generates wildcard variants of grammar patterns
  • Supports X~Y patterns (e.g., "いくら騒いでも" → "いくら~でも")
  • Generates up to 51 variants per input string to avoid excessive processing
  • Integrated new processor into Japanese language descriptor pipeline
  • Added comprehensive test suite with 91 test cases covering various pattern scenarios
  • Updated TypeScript type definitions
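The variant generation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's insertWildcard code: the function name generateWildcardVariants, the maxVariants parameter, and the enumeration order are assumptions, and it is not known exactly how the PR arrives at its cap of 51. The sketch replaces every non-empty middle span with ~ while keeping a non-empty prefix and suffix, which for an n-character string yields (n-1)(n-2)/2 candidates before the cap is applied.

```typescript
/**
 * Hypothetical sketch (not the PR's code): enumerate X~Y variants of `text`
 * by replacing a non-empty middle span with "~", stopping at `maxVariants`.
 */
function generateWildcardVariants(text: string, maxVariants: number): string[] {
    // Split by code point so characters outside the BMP are never cut in half.
    const chars = Array.from(text);
    const variants: string[] = [];
    // p = prefix length, s = suffix length; the middle (n - p - s) must be at least 1.
    for (let p = 1; p < chars.length - 1; p++) {
        for (let s = 1; p + s < chars.length; s++) {
            if (variants.length >= maxVariants) {
                return variants; // Cap reached: stop generating further lookups.
            }
            const prefix = chars.slice(0, p).join('');
            const suffix = chars.slice(chars.length - s).join('');
            variants.push(`${prefix}~${suffix}`);
        }
    }
    return variants;
}

const variants = generateWildcardVariants('いくら騒いでも', 51);
console.log(variants.length, variants.includes('いくら~でも'));
```

For the 7-character "いくら騒いでも" this produces 15 candidates, one of which is "いくら~でも". Because the count grows quadratically with input length, a cap is needed, and each surviving candidate is an extra dictionary lookup.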

…midevs#2336)

Add a wildcard text preprocessor that generates variants of scanned text
with middle portions replaced by ~, enabling grammar dictionaries to use
entries like "いくら~でも" that match text like "いくら騒いでも".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
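To make the matching semantics concrete, the sketch below checks whether a piece of text matches a wildcard headword by turning the entry into an anchored regular expression in which each ~ stands for one or more characters. This is the reverse direction of the PR, which rewrites the scanned text and then performs ordinary exact lookups. The helper name matchesWildcardEntry is hypothetical, and the "one or more characters" reading of ~ is an assumption consistent with the いくら騒いでも example.

```typescript
/**
 * Hypothetical helper: does `text` match a wildcard headword such as "いくら~でも"?
 * Each "~" is assumed to stand for one or more characters.
 */
function matchesWildcardEntry(entry: string, text: string): boolean {
    // Escape regex syntax characters so the literal pieces match verbatim.
    const escape = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const pattern = entry.split('~').map(escape).join('.+');
    // Anchor both ends; the "u" flag makes "." match whole code points.
    return new RegExp(`^${pattern}$`, 'u').test(text);
}

console.log(matchesWildcardEntry('いくら~でも', 'いくら騒いでも'));
console.log(matchesWildcardEntry('いくら~でも', 'いくらでも'));
```

Unlike the text-side approach, this form handles X~Y~Z entries naturally, but it cannot be served by an exact-match index lookup, since every wildcard entry would have to be tested against the text.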
RadiantSol changed the title from "Add wildcard support for Japanese grammar patterns" to "Add wildcard ~ support for Japanese grammar patterns" Mar 30, 2026
RadiantSol marked this pull request as ready for review March 30, 2026 03:09
RadiantSol requested a review from a team as a code owner March 30, 2026 03:09

codspeed-hq bot commented Mar 30, 2026

Merging this PR will degrade performance by 62.53%

❌ 1 regressed benchmark
✅ 4 untouched benchmarks
⏩ 2 skipped benchmarks (see footnote 1)

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| Translator.prototype.findTerms - (n=47) | 123.5 ms | 329.6 ms | -62.53% |
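The reported figure is consistent with CodSpeed computing efficiency as BASE / HEAD - 1. That definition is an assumption inferred from the numbers, not taken from CodSpeed's documentation. Equivalently, findTerms took about 2.67 times as long on this branch.

```typescript
// Assumed definition: efficiency change = base / head - 1,
// so a slower HEAD yields a negative percentage.
function efficiencyChangePercent(baseMs: number, headMs: number): number {
    return (baseMs / headMs - 1) * 100;
}

// How many times longer the HEAD run took than the BASE run.
function slowdownFactor(baseMs: number, headMs: number): number {
    return headMs / baseMs;
}

const baseMs = 123.5; // Translator.prototype.findTerms on master (530d165)
const headMs = 329.6; // same benchmark on this branch (2c66da9)
console.log(`${efficiencyChangePercent(baseMs, headMs).toFixed(2)}%`); // -62.53%
console.log(`${slowdownFactor(baseMs, headMs).toFixed(2)}x`); // 2.67x
```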

Comparing RadiantSol:claude/wizardly-meitner (2c66da9) with master (530d165)


Footnotes

  1. 2 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived on CodSpeed to remove them from the performance reports.

keanu-thakalath commented

maybe we could avoid the performance hit by doing this in the deinflection layer instead? a new wildcardInflection transform with empty conditionsIn and conditionsOut would only emit candidates for known patterns, instead of multiplying the input into up to 51 variants before transforming.
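The known-patterns idea can be sketched as a function that, for a scanned string, emits a prefix~suffix candidate only when the string starts and ends with a listed pattern's pieces. This does not use yomitan's actual deinflection rule format; WildcardPattern, knownPatternCandidates, and the pattern table are hypothetical, and いくら~でも is the only pattern taken from this thread.

```typescript
// Hypothetical shape of one known X~Y grammar pattern.
interface WildcardPattern {
    prefix: string;
    suffix: string;
}

// Hypothetical: emit wildcard candidates only for patterns that fit `text`.
function knownPatternCandidates(text: string, patterns: readonly WildcardPattern[]): string[] {
    const candidates: string[] = [];
    for (const {prefix, suffix} of patterns) {
        // Require a non-empty middle so "~" always replaces at least one character.
        const longEnough = text.length > prefix.length + suffix.length;
        if (longEnough && text.startsWith(prefix) && text.endsWith(suffix)) {
            candidates.push(`${prefix}~${suffix}`);
        }
    }
    return candidates;
}

const knownPatterns: WildcardPattern[] = [{prefix: 'いくら', suffix: 'でも'}];
console.log(knownPatternCandidates('いくら騒いでも', knownPatterns));
```

Here the number of extra dictionary lookups is bounded by how many listed patterns actually fit the text rather than by the length of the text, which is where the expected saving would come from.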

the only issue is we'd have to start maintaining a hardcoded list of known patterns, or create a new system for dynamically creating rules from imported dictionaries.
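The second option, deriving rules from imported dictionaries, might amount to collecting single-~ headwords at import time. The sketch below is hypothetical (patternsFromHeadwords is not an existing yomitan function). It only handles X~Y entries and skips X~Y~Z ones, mirroring the limitation mentioned in the PR description.

```typescript
// Hypothetical shape of one X~Y pattern derived from a dictionary headword.
interface WildcardPattern {
    prefix: string;
    suffix: string;
}

// Hypothetical: collect unique X~Y patterns from imported headwords.
function patternsFromHeadwords(headwords: Iterable<string>): WildcardPattern[] {
    const seen = new Set<string>();
    const patterns: WildcardPattern[] = [];
    for (const headword of headwords) {
        const first = headword.indexOf('~');
        // Skip entries with no "~", an empty X or Y, or more than one "~" (X~Y~Z).
        if (first <= 0 || first === headword.length - 1 || first !== headword.lastIndexOf('~')) {
            continue;
        }
        if (seen.has(headword)) {
            continue; // Duplicate headword: pattern already recorded.
        }
        seen.add(headword);
        patterns.push({prefix: headword.slice(0, first), suffix: headword.slice(first + 1)});
    }
    return patterns;
}

console.log(patternsFromHeadwords(['いくら~でも', '食べる', 'A~B~C']));
```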

are you using a dictionary that already has these wildcard entries? i don't see any in jitendex
