fix: select correct unicode subset from font provider CSS by mcanouil · Pull Request #42 · y-sunflower/pyfonts

mcanouil · 2026-03-29T17:19:12Z

Adds a subset parameter (defaulting to "latin") to load_google_font() and load_bunny_font() that controls which Unicode subset is downloaded from the provider CSS.

Previously, _get_fonturl() extracted the first format-matching URL from the CSS regardless of which @font-face block it belonged to. For multi-subset fonts like Chakra Petch, the Thai subset appeared before Latin in the CSS, so the Thai font file (94 glyphs, no A-Z) was returned instead of the Latin one.

The fix parses the /* subset-name */ CSS comments to identify subset blocks and extracts URLs only from the requested block, falling back to the full CSS when the subset is not found. The subset is also stored in provider metadata so that set_default_font() variant preloading uses the same subset.
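As a rough illustration of the approach described above (not the code from this PR), a subset-aware lookup could be sketched as follows. The names `parse_css_subsets`, `first_font_url`, and `SAMPLE_CSS`, and the `example.com` URLs, are hypothetical; the sketch also ignores format filtering, which the real `_get_fonturl()` applies.

```python
import re
from typing import Dict, Optional

# Hypothetical provider CSS; real responses contain more blocks and properties.
SAMPLE_CSS = """
/* thai */
@font-face {
  font-family: 'Chakra Petch';
  font-style: normal;
  font-weight: 400;
  src: url(https://example.com/chakra-petch-thai-400.woff2) format('woff2');
}
/* latin */
@font-face {
  font-family: 'Chakra Petch';
  font-style: normal;
  font-weight: 400;
  src: url(https://example.com/chakra-petch-latin-400.woff2) format('woff2');
}
"""


def parse_css_subsets(css_text: str) -> Dict[str, str]:
    """Map each /* subset-name */ label to the CSS text that follows it."""
    # With one capture group, re.split returns
    # [preamble, label1, body1, label2, body2, ...].
    parts = re.split(r"/\*\s*([\w-]+)\s*\*/", css_text)
    subsets: Dict[str, str] = {}
    for label, body in zip(parts[1::2], parts[2::2]):
        key = label.strip().lower()
        # Concatenate bodies so repeated labels are all kept.
        subsets[key] = subsets.get(key, "") + body
    return subsets


def first_font_url(css_text: str, subset: str = "latin") -> Optional[str]:
    """Return the first url(...) in the requested subset block.

    Falls back to the full CSS when the subset label is absent.
    """
    blocks = parse_css_subsets(css_text)
    search_text = blocks.get(subset.strip().lower(), css_text)
    match = re.search(r"url\(([^)]+)\)", search_text)
    return match.group(1) if match else None
```

With this sample, requesting `"latin"` skips the Thai block even though it comes first, while an unknown subset such as `"cyrillic"` falls back to the first URL in the full CSS.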

Multi-subset fonts (e.g., Chakra Petch) return CSS with multiple @font-face blocks ordered by subset (thai, latin, latin-ext, etc). The first match was returned regardless of subset, downloading the wrong font file (e.g., Thai instead of Latin). Add a `subset` parameter (defaulting to "latin") that parses the /* subset-name */ CSS comments and extracts URLs only from the matching block, with fallback to the full CSS when the requested subset is not found.

Store the subset parameter in _attach_font_provider_metadata so that _get_font_variant_files can pass it to _get_fonturl when preloading font variants. Without this, set_default_font() would always load latin variants regardless of the subset originally requested.

coderabbitai · 2026-03-29T17:19:27Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds an optional subset (default "latin") through font loaders and utilities, parses provider CSS into subset-labeled blocks, makes cache keys subset-aware, and records the requested subset in provider metadata so URL generation, downloads, and variants reflect the requested character subset.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: `docs/changelog.md` | Added "Unreleased" entry documenting the new `subset` parameter and its effect on multi-subset fonts. |
| Public API (loaders): `pyfonts/google.py`, `pyfonts/bunny.py` | Added optional `subset: str = "latin"` to `load_google_font()` and `load_bunny_font()` and threaded it into URL construction and metadata attachment. |
| Cache Layer: `pyfonts/cache.py` | Extended `_cache_key()` signature to accept `subset` and include it in the SHA-256 cache key so cache entries are differentiated by subset. |
| Utilities: `pyfonts/utils.py` | Added `_parse_css_subsets()`; extended `_get_fonturl()` to accept `subset`, prefer subset-specific CSS blocks (falling back to full CSS), and include `subset` in cache key derivation; `_attach_font_provider_metadata()` now records `subset`. |
| Main / Variant discovery: `pyfonts/main.py` | Variant URL generation now passes `provider_metadata["subset"]` into `_get_fonturl()` so all provider variant URLs reflect the requested subset. |
| Tests: `tests/test_bunny.py`, `tests/test_cache.py`, `tests/test_utils.py`, `tests/test_google.py` | Added tests for CSS subset parsing, subset-aware URL extraction and fallback, cache-key differentiation by subset, and case-insensitive/whitespace-tolerant subset lookup. |
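To illustrate why the cache key has to include the subset, here is a minimal sketch of a subset-aware SHA-256 key. The function name `cache_key`, its argument order, and the JSON serialisation are assumptions made for illustration; the actual `_cache_key()` in `pyfonts/cache.py` may be built differently.

```python
import hashlib
import json
from typing import Sequence


def cache_key(
    family: str,
    weight: int,
    italic: bool,
    allowed_formats: Sequence[str],
    subset: str = "latin",
) -> str:
    """Build a deterministic SHA-256 key from all lookup parameters."""
    payload = json.dumps(
        {
            "family": family,
            "weight": weight,
            "italic": italic,
            "allowed_formats": list(allowed_formats),
            # Normalise so " Latin " and "latin" share one cache entry.
            "subset": subset.strip().lower(),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Without the `subset` field, a cached Thai URL could be returned for a later Latin request for the same family, weight, and style.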

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant Loader as pyfonts.{google,bunny}
    participant Utils as pyfonts.utils
    participant CSS as ExternalCSSProvider
    participant Cache as pyfonts.cache

    Caller->>Loader: load_*_font(family, subset="thai", allowed_formats...)
    Loader->>Utils: _get_fonturl(family, weight, italic, allowed_formats, subset="thai")
    Utils->>Cache: _cache_key(family, weight, italic, allowed_formats, subset="thai")
    Cache-->>Utils: cache key
    Utils->>CSS: HTTP GET provider CSS
    CSS-->>Utils: CSS text (may include /* subset */ blocks)
    Utils->>Utils: _parse_css_subsets(css_text)
    Utils->>Utils: select subset block (or fallback to full CSS)
    Utils-->>Loader: font URL
    Loader->>Loader: download/cache font file
    Loader->>Loader: _attach_font_provider_metadata(..., subset="thai")
    Loader-->>Caller: FontProperties (metadata.subset = "thai")

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Get all font variants #39: Modifies the same provider utilities and font-loading pipeline (pyfonts/utils.py, pyfonts/google.py, pyfonts/bunny.py, pyfonts/main.py), indicating overlapping changes to metadata and URL extraction logic.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and accurately describes the main purpose of the PR: adding subset selection functionality to correctly extract the requested Unicode subset from font provider CSS. |
| Description check | ✅ Passed | The description comprehensively explains the problem being solved (incorrect subset selection for multi-subset fonts), the solution (subset parameter with CSS comment parsing), and implementation details (metadata storage for variant preloading). |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

docs/changelog.md (1)

1-4: Consider adding PR/issue reference for traceability.

Other changelog entries include links to the related issues and PRs (e.g., [issue #23](...), [PR #39](...)). Adding a reference would maintain consistency.

📝 Suggested update

 ## Unreleased
 
-- **NEW** and **FIX**: Add `subset` parameter to `load_google_font()` and `load_bunny_font()` (defaults to `"latin"`). Multi-subset fonts (e.g., Chakra Petch) now return the correct character set instead of whichever subset appears first in the provider CSS.
+- **NEW** and **FIX**: Add `subset` parameter to `load_google_font()` and `load_bunny_font()` (defaults to `"latin"`). Multi-subset fonts (e.g., Chakra Petch) now return the correct character set instead of whichever subset appears first in the provider CSS. ([PR #42](https://github.com/y-sunflower/pyfonts/pull/42))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/changelog.md` around lines 1 - 4, Update the Unreleased changelog bullet
for the load_google_font() and load_bunny_font() change to include the related
issue and/or PR references (e.g., “[issue `#X`](link), [PR `#Y`](link)”) for
traceability; edit the same line that mentions the new `subset` parameter so it
mirrors other entries’ format by appending the appropriate issue/PR links next
to the message about multi-subset font fixes.

tests/test_bunny.py (1)

59-72: Test coverage looks good, but assertion may be fragile.

The assertion assert subset in url relies on the subset name appearing literally in the URL path, which is an implementation detail of the font provider. If Bunny Fonts changes their URL structure, this test could break.

Consider also asserting that different subsets produce different URLs, which is the core invariant:

♻️ Optional: Add invariant assertion

 @pytest.mark.parametrize("subset", ["latin", "thai"])
 def test_get_fonturl_subset(subset):
     url = _get_fonturl(
         endpoint="https://fonts.bunny.net/css",
         family="Chakra Petch",
         weight=400,
         italic=False,
         allowed_formats=["woff", "ttf", "otf"],
         use_cache=False,
         subset=subset,
     )
     assert isinstance(url, str)
     assert subset in url
+
+
+def test_get_fonturl_subset_differ():
+    """Ensure different subsets produce different URLs."""
+    url_latin = _get_fonturl(
+        endpoint="https://fonts.bunny.net/css",
+        family="Chakra Petch",
+        weight=400,
+        italic=False,
+        allowed_formats=["woff", "ttf", "otf"],
+        use_cache=False,
+        subset="latin",
+    )
+    url_thai = _get_fonturl(
+        endpoint="https://fonts.bunny.net/css",
+        family="Chakra Petch",
+        weight=400,
+        italic=False,
+        allowed_formats=["woff", "ttf", "otf"],
+        use_cache=False,
+        subset="thai",
+    )
+    assert url_latin != url_thai

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_bunny.py` around lines 59 - 72, The assertion that the literal
subset string appears in the URL is fragile; update test_get_fonturl_subset to
(a) keep the existing isinstance(url, str) check, and (b) instead of asserting
subset in url, call _get_fonturl twice with two different subset values (e.g.,
"latin" and "thai") and assert that the resulting URLs differ to ensure subset
selection affects output; reference the test function test_get_fonturl_subset
and the helper _get_fonturl when making the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyfonts/utils.py`:
- Around line 93-105: The current logic in the block using
_parse_css_subsets/subsets/search_text allows falling back to the full css_text
whenever no font_urls are found, which causes a requested subset to be bypassed
if it exists but has no allowed_formats matches; change the fallback so that you
only search css_text when the requested subset key is not present in subsets
(i.e., if subset not in subsets then set search_text = css_text and run the
regex), otherwise if subset exists but yields no matches leave font_urls empty
(do not re-run the regex against css_text) so the subset constraint is
preserved; adjust the font_urls assignment around search_text and the
conditional that re-runs the regex accordingly.
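The fallback rule requested here can be sketched as follows. This is only an illustration under stated assumptions: `select_font_urls` is a hypothetical helper, `subsets` is assumed to be a dict mapping subset names to CSS text (as `_parse_css_subsets()` is described to produce), and matching formats by file extension is a simplification.

```python
import re
from typing import Dict, List, Sequence


def select_font_urls(
    css_text: str,
    subsets: Dict[str, str],
    subset: str,
    allowed_formats: Sequence[str],
) -> List[str]:
    """Search the requested subset block; use the full CSS only if the
    subset label is missing altogether."""
    extensions = "|".join(re.escape(fmt) for fmt in allowed_formats)
    pattern = re.compile(r"url\(([^)]+\.(?:%s))\)" % extensions)

    if subset in subsets:
        # The subset exists: respect it even when nothing matches, so a
        # different subset is never returned silently.
        return pattern.findall(subsets[subset])

    # The subset label is absent from the CSS: fall back to the full text.
    return pattern.findall(css_text)
```

An empty result for an existing subset then signals "no file in an allowed format" to the caller, instead of quietly yielding a URL from another subset.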

JosephBARBIERDARNAL · 2026-03-29T17:47:10Z

thanks for the PR! feel free to ping me whenever you need a review

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/test_google.py (1)

71-71: Make monkeypatch resilient to future request kwargs

Use a kwargs-tolerant stub to avoid brittle failures if _get_fonturl later passes params, headers, or timeout.

Proposed change

-    monkeypatch.setattr("pyfonts.utils.requests.get", lambda _: DummyResponse())
+    monkeypatch.setattr(
+        "pyfonts.utils.requests.get",
+        lambda *args, **kwargs: DummyResponse(),
+    )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/test_google.py` at line 71, The monkeypatch currently sets
pyfonts.utils.requests.get to a lambda that only accepts one positional arg
which will break if _get_fonturl later passes extra kwargs; replace the stub
with a function or lambda that accepts both *args and **kwargs (e.g., lambda
*args, **kwargs: DummyResponse()) so the monkeypatch in tests/test_google.py
remains resilient to added params, headers, or timeout when _get_fonturl calls
requests.get.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_google.py`:
- Around line 83-85: The test currently only checks that the variable url is a
valid URL using _is_url and _is_valid_raw_url; update the test to assert the
exact expected URL string returned from the stubbed CSS by adding a strict
equality assertion against the known expected URL value for this regression path
(use the existing variable name url in tests/test_google.py and keep the helper
checks if desired). Ensure the expected string matches the URL produced by the
stubbed CSS in your test fixture so the test fails if the wrong subset URL is
selected.

mcanouil · 2026-03-29T17:52:26Z

~~There are several failing tests on main related to bunny fonts download because the family assertion is incorrect.~~

It seems the subset had a higher impact than just allowing "latin" to be selected for some fonts.
I'm looking into it.

JosephBARBIERDARNAL · 2026-04-03T08:32:18Z

Thanks for the PR, this fixes an issue I hadn't spotted!

I've opened an issue, since I found some more issues related to Bunny fonts that I think are out of scope for this PR: #43

Also, the tests are only failing for Bunny fonts, and I find it hard to understand why. Any idea why this would be? This doesn't seem to be a big issue since, as I said, I believe the Bunny fonts issues are unrelated to your changes (they already exist in main).

JosephBARBIERDARNAL · 2026-04-03T08:34:46Z

Another thing is that it's hard to test the output font object, since .get_style(), .get_weight(), etc. don't represent well what will be displayed to users. This is also out of scope for this PR, but if you have thoughts on that, I'm interested.

mcanouil · 2026-04-03T08:35:43Z

> Also, the tests are only failing for Bunny fonts, and I find it hard to understand why. Any idea why this would be? This doesn't seem to be a big issue since, as I said, I believe the Bunny fonts issues are unrelated to your changes (they already exist in main).

It's because setting a subset leads to a different import. In this import, the font name includes the variants (italic/weight).
I'm looking for a way to have full imports but still keep the ability to target one as primary when using Bunny.

mcanouil · 2026-04-03T08:54:50Z

> Another thing is that it's hard to test the output font object, since .get_style(), .get_weight(), etc. don't represent well what will be displayed to users. This is also out of scope for this PR, but if you have thoughts on that, I'm interested.

This is actually part of the Bunny fonts issue.
The order returned by the provider is not deterministic, which can lead to the wrong style for the right weight and vice versa.
From my tests, weight selection seems to be more reliable, so fixing how the style is retrieved from the search text (e.g., https://fonts.bunny.net/css?family=Roboto:ital,wght@1,400, which gets messy) is a step in the right direction, I believe.

mcanouil · 2026-04-03T09:23:07Z

@JosephBARBIERDARNAL I just pushed 3b61db3, which should help with Bunny. I'm not 100% sure I covered all cases, but the most common ones are.
I've tested it for tables/figures in https://m.canouil.dev/quarto-brand-renderings/brands/brand-1/python.html

JosephBARBIERDARNAL · 2026-04-08T13:21:44Z

Great, this looks good to me! Thanks again

mcanouil · 2026-04-08T13:37:48Z

I'll update my draft blog post on brand+quarto+code and will likely publish it next week.
Thanks for the python package ;)

mcanouil added 4 commits March 29, 2026 19:08

docs: add changelog entry for subset parameter

6e4e9c7

docs: mark changelog entry as both new and fix

9592a58

coderabbitai bot reviewed Mar 29, 2026

Comment thread pyfonts/utils.py Outdated

mcanouil added 2 commits March 29, 2026 19:46

fix: improve subset selection in font URL retrieval

4aa4f79

test: add tests for font URL retrieval with subset handling

2a7bf23

coderabbitai bot reviewed Mar 29, 2026

Comment thread tests/test_google.py Outdated

mcanouil added 2 commits March 29, 2026 19:48

core: styling

038d3a2

Refactor the formatting of function definitions and exception messages for improved readability in cache and font loading functions.

test: ensure the assertion checks the expected URL directly

ba48477

chore: use ruff instead of black

ea30e4a

mcanouil marked this pull request as draft April 3, 2026 08:25

feat: enhance font subset selection and filtering by style

3b61db3

Add functionality to improve subset selection in font URL retrieval by allowing multiple blocks for the same subset. Introduce a new method to filter @font-face blocks based on the requested font style (italic or normal).
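A style filter of the kind this commit describes could look roughly like the sketch below. It is not the code from 3b61db3: `filter_blocks_by_style` and both regular expressions are hypothetical, and treating a block without a `font-style` declaration as `normal` is an assumption made for this example.

```python
import re
from typing import List

# A @font-face block has no nested braces, so [^}]* is enough to span it.
FONT_FACE_RE = re.compile(r"@font-face\s*\{[^}]*\}")
FONT_STYLE_RE = re.compile(r"font-style:\s*([\w-]+)")


def filter_blocks_by_style(css_text: str, italic: bool) -> List[str]:
    """Keep only the @font-face blocks whose font-style matches the request."""
    wanted = "italic" if italic else "normal"
    selected = []
    for block in FONT_FACE_RE.findall(css_text):
        style = FONT_STYLE_RE.search(block)
        # Assumption: a block with no font-style is treated as "normal".
        found = style.group(1).lower() if style else "normal"
        if found == wanted:
            selected.append(block)
    return selected
```

Because the filter inspects each block's own `font-style` declaration, the result does not depend on the order in which the provider returns the blocks.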

mcanouil marked this pull request as ready for review April 3, 2026 09:23

JosephBARBIERDARNAL approved these changes Apr 8, 2026

JosephBARBIERDARNAL merged commit a4c2240 into y-sunflower:main Apr 8, 2026
9 checks passed

mcanouil deleted the fix/font-subset-selection branch April 8, 2026 13:37
