fix(pdf): support CJK characters in PDF snapshots (#103)#160
Open
m719 wants to merge 1 commit into
Open
Conversation
CJK characters (Chinese, Japanese, Korean) rendered as mojibake in generated PDFs because jsPDF's built-in fonts only cover Latin-1. Bundle Noto Sans CJK (TTF, tracked via Git LFS) and lazy-load it only when CJK characters are detected in the content. Changes: - Add cjk-font.ts: regex-based detection, fetch + base64 conversion with chunk-based processing, promise lock with GC after use - Update pdf-generator.ts: integrate CJK font, add setFont helper that forces 'normal' style for CJK, add timeouts to image loaders, strip <script> tags, clean up dead code - Update extraction.ts: CJK support in fallback generators, restore courier for code blocks when not in CJK mode - Copy fonts to build output via vite config - Add unit tests for needsCJKFont and sanitizeFilename Works on both Chrome and Firefox (MV3).
There was a problem hiding this comment.
Pull request overview
This PR addresses issue #103 by adding on-demand CJK font support to the jsPDF-based PDF snapshot generation so Japanese/Chinese/Korean text renders correctly across browsers (Chrome/Firefox MV3), while avoiding overhead for Latin-only pages.
Changes:
- Add a new CJK font utility (
cjk-font.ts) that detects CJK characters and registers a bundled Noto Sans CJK font with jsPDF only when needed. - Integrate CJK font selection into both the shared PDF generator and content-script fallback generators; export and reuse
sanitizeFilename. - Ensure the font file is included in build output via Vite, and add unit tests for
needsCJKFontandsanitizeFilename.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
vite.config.ts |
Copies bundled fonts into the dist/<browser>/assets/fonts output so they can be fetched at runtime. |
tests/unit/pdf-generator.test.ts |
Adds unit tests covering sanitizeFilename behavior (including CJK titles). |
tests/unit/cjk-font.test.ts |
Adds unit tests validating needsCJKFont across multiple Unicode ranges and non-CJK text. |
src/shared/extraction/pdf-generator.ts |
Loads/registers the CJK font on demand, uses a font helper for style selection, adds timeouts for image helpers, strips dangerous elements, exports sanitizeFilename, and removes unused native-PDF code paths. |
src/shared/extraction/cjk-font.ts |
Implements CJK detection + on-demand font fetch/base64 conversion + jsPDF registration with a promise lock. |
src/content/extraction.ts |
Updates fallback PDF generators to use the same CJK font logic and shared sanitizeFilename. |
src/assets/fonts/NotoSansCJK-Regular.ttf |
Adds the Noto Sans CJK font tracked via Git LFS (pointer file in Git). |
.gitattributes |
Configures *.ttf files to be tracked via Git LFS. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1 @@ | |||
| *.ttf filter=lfs diff=lfs merge=lfs -text | |||
Comment on lines
+9
to
+10
| import { jsPDF } from 'jspdf'; | ||
| import { loggers } from '../utils/logger'; |
Comment on lines
49
to
65
| /** | ||
| * Generate PDF from extracted content | ||
| * Tries native print first, falls back to jsPDF | ||
| * Generate PDF from extracted content using jsPDF | ||
| */ | ||
| export async function generatePDF( | ||
| content: ExtractedContent, | ||
| options: PDFOptions = {} | ||
| ): Promise<PDFGenerationResult | null> { | ||
| const opts = { ...DEFAULT_OPTIONS, ...options }; | ||
|
|
||
| // Try jsPDF generation (most reliable for extensions) | ||
| const jspdfResult = await generateWithJsPDF(content, opts); | ||
| if (jspdfResult) { | ||
| return jspdfResult; | ||
| } | ||
|
|
||
| log.error('[PDFGenerator] All PDF generation methods failed'); | ||
| log.error('[PDFGenerator] PDF generation failed'); | ||
| return null; | ||
| } |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #160 +/- ##
==========================================
- Coverage 33.40% 33.39% -0.02%
==========================================
Files 92 93 +1
Lines 16556 16602 +46
Branches 5238 5440 +202
==========================================
+ Hits 5531 5544 +13
- Misses 11025 11058 +33
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Changes:
Works on both Chrome and Firefox (MV3).
Proposed changes
Related issues
How to test this PR
Test pages:
Checklist
type(scope?): description (#issue)Further comments
The font file is ~19 MB (TTF) tracked via Git LFS. It adds ~5 MB compressed to the extension package. It is only fetched at runtime when CJK characters are detected, so Latin-only PDFs have zero overhead.
Alternatives considered:
Page.printToPDF(which respects system fonts), and fall back to html2canvas on Firefox. This would avoid storing the font entirely, but the Firefox output would be image-based with no selectable text.