fix: select correct unicode subset from font provider CSS#42
Conversation
Multi-subset fonts (e.g., Chakra Petch) return CSS with multiple @font-face blocks ordered by subset (thai, latin, latin-ext, etc). The first match was returned regardless of subset, downloading the wrong font file (e.g., Thai instead of Latin). Add a `subset` parameter (defaulting to "latin") that parses the /* subset-name */ CSS comments and extracts URLs only from the matching block, with fallback to the full CSS when the requested subset is not found.
Store the subset parameter in _attach_font_provider_metadata so that _get_font_variant_files can pass it to _get_fonturl when preloading font variants. Without this, set_default_font() would always load latin variants regardless of the subset originally requested.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an optional Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant Loader as pyfonts.{google,bunny}
participant Utils as pyfonts.utils
participant CSS as ExternalCSSProvider
participant Cache as pyfonts.cache
Caller->>Loader: load_*_font(family, subset="thai", allowed_formats...)
Loader->>Utils: _get_fonturl(family, weight, italic, allowed_formats, subset="thai")
Utils->>Cache: _cache_key(family, weight, italic, allowed_formats, subset="thai")
Cache-->>Utils: cache key
Utils->>CSS: HTTP GET provider CSS
CSS-->>Utils: CSS text (may include /* subset */ blocks)
Utils->>Utils: _parse_css_subsets(css_text)
Utils->>Utils: select subset block (or fallback to full CSS)
Utils-->>Loader: font URL
Loader->>Loader: download/cache font file
Loader->>Loader: _attach_font_provider_metadata(..., subset="thai")
Loader-->>Caller: FontProperties (metadata.subset = "thai")
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/changelog.md (1)
1-4: Consider adding PR/issue reference for traceability.Other changelog entries include links to the related issues and PRs (e.g.,
[issue#23](...), [PR#39](...)). Adding a reference would maintain consistency.📝 Suggested update
## Unreleased -- **NEW** and **FIX**: Add `subset` parameter to `load_google_font()` and `load_bunny_font()` (defaults to `"latin"`). Multi-subset fonts (e.g., Chakra Petch) now return the correct character set instead of whichever subset appears first in the provider CSS. +- **NEW** and **FIX**: Add `subset` parameter to `load_google_font()` and `load_bunny_font()` (defaults to `"latin"`). Multi-subset fonts (e.g., Chakra Petch) now return the correct character set instead of whichever subset appears first in the provider CSS. ([PR `#42`](https://github.com/y-sunflower/pyfonts/pull/42))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/changelog.md` around lines 1 - 4, Update the Unreleased changelog bullet for the load_google_font() and load_bunny_font() change to include the related issue and/or PR references (e.g., “[issue `#X`](link), [PR `#Y`](link)”) for traceability; edit the same line that mentions the new `subset` parameter so it mirrors other entries’ format by appending the appropriate issue/PR links next to the message about multi-subset font fixes.tests/test_bunny.py (1)
59-72: Test coverage looks good, but assertion may be fragile.The assertion
assert subset in urlrelies on the subset name appearing literally in the URL path, which is an implementation detail of the font provider. If Bunny Fonts changes their URL structure, this test could break.Consider also asserting that different subsets produce different URLs, which is the core invariant:
♻️ Optional: Add invariant assertion
`@pytest.mark.parametrize`("subset", ["latin", "thai"]) def test_get_fonturl_subset(subset): url = _get_fonturl( endpoint="https://fonts.bunny.net/css", family="Chakra Petch", weight=400, italic=False, allowed_formats=["woff", "ttf", "otf"], use_cache=False, subset=subset, ) assert isinstance(url, str) assert subset in url + + +def test_get_fonturl_subset_differ(): + """Ensure different subsets produce different URLs.""" + url_latin = _get_fonturl( + endpoint="https://fonts.bunny.net/css", + family="Chakra Petch", + weight=400, + italic=False, + allowed_formats=["woff", "ttf", "otf"], + use_cache=False, + subset="latin", + ) + url_thai = _get_fonturl( + endpoint="https://fonts.bunny.net/css", + family="Chakra Petch", + weight=400, + italic=False, + allowed_formats=["woff", "ttf", "otf"], + use_cache=False, + subset="thai", + ) + assert url_latin != url_thai🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_bunny.py` around lines 59 - 72, The assertion that the literal subset string appears in the URL is fragile; update test_get_fonturl_subset to (a) keep the existing isinstance(url, str) check, and (b) instead of asserting subset in url, call _get_fonturl twice with two different subset values (e.g., "latin" and "thai") and assert that the resulting URLs differ to ensure subset selection affects output; reference the test function test_get_fonturl_subset and the helper _get_fonturl when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyfonts/utils.py`:
- Around line 93-105: The current logic in the block using
_parse_css_subsets/subsets/search_text allows falling back to the full css_text
whenever no font_urls are found, which causes a requested subset to be bypassed
if it exists but has no allowed_formats matches; change the fallback so that you
only search css_text when the requested subset key is not present in subsets
(i.e., if subset not in subsets then set search_text = css_text and run the
regex), otherwise if subset exists but yields no matches leave font_urls empty
(do not re-run the regex against css_text) so the subset constraint is
preserved; adjust the font_urls assignment around search_text and the
conditional that re-runs the regex accordingly.
---
Nitpick comments:
In `@docs/changelog.md`:
- Around line 1-4: Update the Unreleased changelog bullet for the
load_google_font() and load_bunny_font() change to include the related issue
and/or PR references (e.g., “[issue `#X`](link), [PR `#Y`](link)”) for traceability;
edit the same line that mentions the new `subset` parameter so it mirrors other
entries’ format by appending the appropriate issue/PR links next to the message
about multi-subset font fixes.
In `@tests/test_bunny.py`:
- Around line 59-72: The assertion that the literal subset string appears in the
URL is fragile; update test_get_fonturl_subset to (a) keep the existing
isinstance(url, str) check, and (b) instead of asserting subset in url, call
_get_fonturl twice with two different subset values (e.g., "latin" and "thai")
and assert that the resulting URLs differ to ensure subset selection affects
output; reference the test function test_get_fonturl_subset and the helper
_get_fonturl when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 47a22b54-f989-4243-ba7d-7e1bb77b4453
📒 Files selected for processing (9)
docs/changelog.mdpyfonts/bunny.pypyfonts/cache.pypyfonts/google.pypyfonts/main.pypyfonts/utils.pytests/test_bunny.pytests/test_cache.pytests/test_utils.py
|
thanks for the PR! feel free to ping me whenever you need a review |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/test_google.py (1)
71-71: Make monkeypatch resilient to future request kwargsUse a kwargs-tolerant stub to avoid brittle failures if
_get_fonturllater passesparams,headers, ortimeout.Proposed change
- monkeypatch.setattr("pyfonts.utils.requests.get", lambda _: DummyResponse()) + monkeypatch.setattr( + "pyfonts.utils.requests.get", + lambda *args, **kwargs: DummyResponse(), + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_google.py` at line 71, The monkeypatch currently sets pyfonts.utils.requests.get to a lambda that only accepts one positional arg which will break if _get_fonturl later passes extra kwargs; replace the stub with a function or lambda that accepts both *args and **kwargs (e.g., lambda *args, **kwargs: DummyResponse()) so the monkeypatch in tests/test_google.py remains resilient to added params, headers, or timeout when _get_fonturl calls requests.get.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_google.py`:
- Around line 83-85: The test currently only checks that the variable url is a
valid URL using _is_url and _is_valid_raw_url; update the test to assert the
exact expected URL string returned from the stubbed CSS by adding a strict
equality assertion against the known expected URL value for this regression path
(use the existing variable name url in tests/test_google.py and keep the helper
checks if desired). Ensure the expected string matches the URL produced by the
stubbed CSS in your test fixture so the test fails if the wrong subset URL is
selected.
---
Nitpick comments:
In `@tests/test_google.py`:
- Line 71: The monkeypatch currently sets pyfonts.utils.requests.get to a lambda
that only accepts one positional arg which will break if _get_fonturl later
passes extra kwargs; replace the stub with a function or lambda that accepts
both *args and **kwargs (e.g., lambda *args, **kwargs: DummyResponse()) so the
monkeypatch in tests/test_google.py remains resilient to added params, headers,
or timeout when _get_fonturl calls requests.get.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 282f7525-9015-4128-9fcb-0961078a671b
📒 Files selected for processing (3)
pyfonts/utils.pytests/test_google.pytests/test_utils.py
✅ Files skipped from review due to trivial changes (1)
- tests/test_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
- pyfonts/utils.py
Refactor the formatting of function definitions and exception messages for improved readability in cache and font loading functions.
|
It seems the subset had higher impact that just allowing to select "latin" for some fonts. |
|
Thanks for the PR, this fixes an issue I haven't spotted! I've opened an issue since I found some more issues related to Bunny fonts that I think are out of scope this PR: #43 Also, the tests are only failing for Bunny fonts, which I find hard to understand why. Any idea why this would be? This doesn't seem to be a big issue since, as I said, I believe the issues related to Bunny fonts are not related to your changes (already existing in |
|
Another thing is that it's hard to test the output font object since |
It's because setting a subset leads to a different import. In this import the font name include variants (italic/weight). |
This is part of the bunny fonts issue actually. |
Add functionality to improve subset selection in font URL retrieval by allowing multiple blocks for the same subset. Introduce a new method to filter @font-face blocks based on the requested font style (italic or normal).
|
@JosephBARBIERDARNAL I just pushed 3b61db3 which should help with bunny. I'm not 100% I covered all cases, but most common are. |
|
Great, this looks good to me! Thanks again |
|
I'll update my draft blog post on brand+quarto+code and will likely publish it next week. |
Adds a
subsetparameter (defaulting to"latin") toload_google_font()andload_bunny_font()that controls which Unicode subset is downloaded from the provider CSS.Previously,
_get_fonturl()extracted the first format-matching URL from the CSS regardless of which@font-faceblock it belonged to. For multi-subset fonts like Chakra Petch, the Thai subset appeared before Latin in the CSS, so the Thai font file (94 glyphs, no A-Z) was returned instead of the Latin one.The fix parses the
/* subset-name */CSS comments to identify subset blocks and extracts URLs only from the requested block, falling back to the full CSS when the subset is not found. The subset is also stored in provider metadata so thatset_default_font()variant preloading uses the same subset.