
fix: select correct unicode subset from font provider CSS #42

Merged
JosephBARBIERDARNAL merged 10 commits into y-sunflower:main from mcanouil:fix/font-subset-selection
Apr 8, 2026

Conversation

@mcanouil
Contributor

Adds a subset parameter (defaulting to "latin") to load_google_font() and load_bunny_font() that controls which Unicode subset is downloaded from the provider CSS.

Previously, _get_fonturl() extracted the first format-matching URL from the CSS regardless of which @font-face block it belonged to. For multi-subset fonts like Chakra Petch, the Thai subset appeared before Latin in the CSS, so the Thai font file (94 glyphs, no A-Z) was returned instead of the Latin one.

The fix parses the /* subset-name */ CSS comments to identify subset blocks and extracts URLs only from the requested block, falling back to the full CSS when the subset is not found. The subset is also stored in provider metadata so that set_default_font() variant preloading uses the same subset.
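To make the failure mode concrete, here is a minimal sketch of subset-aware extraction. It is not pyfonts' actual `_parse_css_subsets()`/`_get_fonturl()` code: the sample CSS, the URLs, the regexes, and the helper names `parse_css_subsets` and `first_font_url` are hypothetical, and it assumes each `@font-face` block is preceded by a `/* subset-name */` comment as described above.

```python
from __future__ import annotations

import re

# Hypothetical provider CSS: one @font-face block per subset, each preceded
# by a /* subset-name */ comment. URLs are made up for illustration.
SAMPLE_CSS = """
/* thai */
@font-face {
  font-family: 'Chakra Petch';
  src: url(https://example.com/chakra-thai.ttf) format('truetype');
}
/* latin */
@font-face {
  font-family: 'Chakra Petch';
  src: url(https://example.com/chakra-latin.ttf) format('truetype');
}
"""

# "/* name */" followed by its @font-face { ... } block (assumes no nested braces).
BLOCK_RE = re.compile(r"/\*\s*([\w-]+)\s*\*/\s*(@font-face\s*\{[^}]*\})")
# A font URL ending in one of the usual font file extensions.
URL_RE = re.compile(r"url\((https?://[^)]+\.(?:ttf|otf|woff2?))\)")


def parse_css_subsets(css_text: str) -> dict[str, str]:
    """Map each lower-cased subset name to the concatenated text of its blocks."""
    subsets: dict[str, str] = {}
    for name, block in BLOCK_RE.findall(css_text):
        key = name.strip().lower()
        subsets[key] = subsets.get(key, "") + block + "\n"
    return subsets


def first_font_url(css_text: str, subset: str = "latin") -> str | None:
    """Return the first font URL from the requested subset's block(s).

    Falls back to the full CSS only when the subset label is absent.
    """
    subsets = parse_css_subsets(css_text)
    search_text = subsets.get(subset.strip().lower(), css_text)
    match = URL_RE.search(search_text)
    return match.group(1) if match else None
```

A plain `URL_RE.search(SAMPLE_CSS)` returns the Thai URL (the old first-match behaviour), while `first_font_url(SAMPLE_CSS)` returns the Latin one.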

Multi-subset fonts (e.g., Chakra Petch) return CSS with multiple
@font-face blocks ordered by subset (thai, latin, latin-ext, etc.).
The first match was returned regardless of subset, downloading the
wrong font file (e.g., Thai instead of Latin).

Add a `subset` parameter (defaulting to "latin") that parses the
/* subset-name */ CSS comments and extracts URLs only from the
matching block, with fallback to the full CSS when the requested
subset is not found.

Store the subset parameter in _attach_font_provider_metadata so that
_get_font_variant_files can pass it to _get_fonturl when preloading
font variants. Without this, set_default_font() would always load
latin variants regardless of the subset originally requested.
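A minimal sketch of why the subset has to travel with the metadata, assuming the metadata is a plain dict. `attach_provider_metadata` and `variant_lookups` are hypothetical stand-ins for `_attach_font_provider_metadata` and `_get_font_variant_files`, not their real signatures.

```python
from __future__ import annotations


def attach_provider_metadata(endpoint: str, family: str, subset: str = "latin") -> dict:
    # Hypothetical stand-in: record the subset the caller asked for
    # next to the provider details.
    return {"endpoint": endpoint, "family": family, "subset": subset}


def variant_lookups(metadata: dict, weights: list[int]) -> list[dict]:
    """Build one URL-lookup request per (weight, italic) variant.

    Reusing metadata["subset"] keeps preloaded variants on the same subset
    as the original load; omitting it would fall back to "latin".
    """
    subset = metadata.get("subset", "latin")
    return [
        {"family": metadata["family"], "weight": w, "italic": italic, "subset": subset}
        for w in weights
        for italic in (False, True)
    ]
```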
@coderabbitai

coderabbitai bot commented Mar 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds an optional subset (default "latin") through font loaders and utilities, parses provider CSS into subset-labeled blocks, makes cache keys subset-aware, and records the requested subset in provider metadata so URL generation, downloads, and variants reflect the requested character subset.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Documentation | `docs/changelog.md` | Added "Unreleased" entry documenting the new subset parameter and its effect on multi-subset fonts. |
| Public API (loaders) | `pyfonts/google.py`, `pyfonts/bunny.py` | Added optional `subset: str = "latin"` to `load_google_font()` and `load_bunny_font()` and threaded it into URL construction and metadata attachment. |
| Cache Layer | `pyfonts/cache.py` | Extended `_cache_key()` signature to accept `subset` and include it in the SHA-256 cache key so cache entries are differentiated by subset. |
| Utilities | `pyfonts/utils.py` | Added `_parse_css_subsets()`; extended `_get_fonturl()` to accept `subset`, prefer subset-specific CSS blocks (fall back to full CSS), include subset in cache key derivation; `_attach_font_provider_metadata()` now records subset. |
| Main / Variant discovery | `pyfonts/main.py` | Variant URL generation now passes `provider_metadata["subset"]` into `_get_fonturl()` so all provider variant URLs reflect the requested subset. |
| Tests | `tests/test_bunny.py`, `tests/test_cache.py`, `tests/test_utils.py`, `tests/test_google.py` | Added tests for CSS subset parsing, subset-aware URL extraction and fallback, cache-key differentiation by subset, and case-insensitive/whitespace-tolerant subset lookup. |
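A sketch of a subset-aware cache key along the lines the walkthrough describes (a SHA-256 digest over the request parameters). The exact fields, their JSON serialization, and the normalization of the subset name are assumptions, not the real `_cache_key()`.

```python
import hashlib
import json


def cache_key(family: str, weight: int, italic: bool,
              allowed_formats: list, subset: str = "latin") -> str:
    """Hypothetical subset-aware cache key; the field layout is an assumption."""
    payload = json.dumps(
        {
            "family": family,
            "weight": weight,
            "italic": italic,
            "formats": list(allowed_formats),  # order kept: it may encode preference
            # Assumption: normalize so " Latin " and "latin" share one entry.
            "subset": subset.strip().lower(),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Including the subset in the hashed payload is what keeps a cached Latin file from being served for a Thai request with otherwise identical parameters.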

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Caller
    participant Loader as pyfonts.{google,bunny}
    participant Utils as pyfonts.utils
    participant CSS as ExternalCSSProvider
    participant Cache as pyfonts.cache

    Caller->>Loader: load_*_font(family, subset="thai", allowed_formats...)
    Loader->>Utils: _get_fonturl(family, weight, italic, allowed_formats, subset="thai")
    Utils->>Cache: _cache_key(family, weight, italic, allowed_formats, subset="thai")
    Cache-->>Utils: cache key
    Utils->>CSS: HTTP GET provider CSS
    CSS-->>Utils: CSS text (may include /* subset */ blocks)
    Utils->>Utils: _parse_css_subsets(css_text)
    Utils->>Utils: select subset block (or fallback to full CSS)
    Utils-->>Loader: font URL
    Loader->>Loader: download/cache font file
    Loader->>Loader: _attach_font_provider_metadata(..., subset="thai")
    Loader-->>Caller: FontProperties (metadata.subset = "thai")
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Get all font variants #39: Modifies the same provider utilities and font-loading pipeline (pyfonts/utils.py, pyfonts/google.py, pyfonts/bunny.py, pyfonts/main.py), indicating overlapping changes to metadata and URL extraction logic.
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and accurately describes the main purpose of the PR: adding subset selection functionality to correctly extract the requested Unicode subset from font provider CSS. |
| Description check | ✅ Passed | The description comprehensively explains the problem being solved (incorrect subset selection for multi-subset fonts), the solution (subset parameter with CSS comment parsing), and implementation details (metadata storage for variant preloading). |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/changelog.md (1)

1-4: Consider adding PR/issue reference for traceability.

Other changelog entries include links to the related issues and PRs (e.g., [issue #23](...), [PR #39](...)). Adding a reference would maintain consistency.

📝 Suggested update
```diff
 ## Unreleased
 
-- **NEW** and **FIX**: Add `subset` parameter to `load_google_font()` and `load_bunny_font()` (defaults to `"latin"`). Multi-subset fonts (e.g., Chakra Petch) now return the correct character set instead of whichever subset appears first in the provider CSS.
+- **NEW** and **FIX**: Add `subset` parameter to `load_google_font()` and `load_bunny_font()` (defaults to `"latin"`). Multi-subset fonts (e.g., Chakra Petch) now return the correct character set instead of whichever subset appears first in the provider CSS. ([PR #42](https://github.com/y-sunflower/pyfonts/pull/42))
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/changelog.md` around lines 1 - 4, Update the Unreleased changelog bullet
for the load_google_font() and load_bunny_font() change to include the related
issue and/or PR references (e.g., “[issue `#X`](link), [PR `#Y`](link)”) for
traceability; edit the same line that mentions the new `subset` parameter so it
mirrors other entries’ format by appending the appropriate issue/PR links next
to the message about multi-subset font fixes.
tests/test_bunny.py (1)

59-72: Test coverage looks good, but assertion may be fragile.

The assertion `assert subset in url` relies on the subset name appearing literally in the URL path, which is an implementation detail of the font provider. If Bunny Fonts changes its URL structure, this test could break.

Consider also asserting that different subsets produce different URLs, which is the core invariant:

♻️ Optional: Add invariant assertion
```diff
 @pytest.mark.parametrize("subset", ["latin", "thai"])
 def test_get_fonturl_subset(subset):
     url = _get_fonturl(
         endpoint="https://fonts.bunny.net/css",
         family="Chakra Petch",
         weight=400,
         italic=False,
         allowed_formats=["woff", "ttf", "otf"],
         use_cache=False,
         subset=subset,
     )
     assert isinstance(url, str)
     assert subset in url
+
+
+def test_get_fonturl_subset_differ():
+    """Ensure different subsets produce different URLs."""
+    url_latin = _get_fonturl(
+        endpoint="https://fonts.bunny.net/css",
+        family="Chakra Petch",
+        weight=400,
+        italic=False,
+        allowed_formats=["woff", "ttf", "otf"],
+        use_cache=False,
+        subset="latin",
+    )
+    url_thai = _get_fonturl(
+        endpoint="https://fonts.bunny.net/css",
+        family="Chakra Petch",
+        weight=400,
+        italic=False,
+        allowed_formats=["woff", "ttf", "otf"],
+        use_cache=False,
+        subset="thai",
+    )
+    assert url_latin != url_thai
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_bunny.py` around lines 59 - 72, The assertion that the literal
subset string appears in the URL is fragile; update test_get_fonturl_subset to
(a) keep the existing isinstance(url, str) check, and (b) instead of asserting
subset in url, call _get_fonturl twice with two different subset values (e.g.,
"latin" and "thai") and assert that the resulting URLs differ to ensure subset
selection affects output; reference the test function test_get_fonturl_subset
and the helper _get_fonturl when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyfonts/utils.py`:
- Around line 93-105: The current logic in the block using
_parse_css_subsets/subsets/search_text allows falling back to the full css_text
whenever no font_urls are found, which causes a requested subset to be bypassed
if it exists but has no allowed_formats matches; change the fallback so that you
only search css_text when the requested subset key is not present in subsets
(i.e., if subset not in subsets then set search_text = css_text and run the
regex), otherwise if subset exists but yields no matches leave font_urls empty
(do not re-run the regex against css_text) so the subset constraint is
preserved; adjust the font_urls assignment around search_text and the
conditional that re-runs the regex accordingly.
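A sketch of the fallback rule this inline comment asks for, assuming subsets have already been parsed into a `{name: css_text}` dict. `urls_for_subset` is a hypothetical helper, not pyfonts code.

```python
from __future__ import annotations

import re


def urls_for_subset(subsets: dict[str, str], css_text: str, subset: str,
                    allowed_formats: list[str]) -> list[str]:
    """Return font URLs for `subset`, following the suggested fallback rule.

    - Subset label present: search only its block(s); an empty result stays empty.
    - Subset label absent: search the full CSS instead.
    """
    ext = "|".join(re.escape(f) for f in allowed_formats)
    url_re = re.compile(rf"url\((https?://[^)]+\.(?:{ext}))\)")
    key = subset.strip().lower()
    search_text = subsets[key] if key in subsets else css_text
    return url_re.findall(search_text)
```

The trade-off: when the requested subset exists but offers none of the allowed formats, the caller gets no URL (and can raise a clear error) instead of silently receiving a file for a different script.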

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 47a22b54-f989-4243-ba7d-7e1bb77b4453

📥 Commits

Reviewing files that changed from the base of the PR and between 2df071e and 9592a58.

📒 Files selected for processing (9)
  • docs/changelog.md
  • pyfonts/bunny.py
  • pyfonts/cache.py
  • pyfonts/google.py
  • pyfonts/main.py
  • pyfonts/utils.py
  • tests/test_bunny.py
  • tests/test_cache.py
  • tests/test_utils.py

Comment thread pyfonts/utils.py Outdated
@JosephBARBIERDARNAL
Collaborator

JosephBARBIERDARNAL commented Mar 29, 2026

thanks for the PR! feel free to ping me whenever you need a review


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_google.py (1)

71-71: Make monkeypatch resilient to future request kwargs

Use a kwargs-tolerant stub to avoid brittle failures if _get_fonturl later passes params, headers, or timeout.

Proposed change
```diff
-    monkeypatch.setattr("pyfonts.utils.requests.get", lambda _: DummyResponse())
+    monkeypatch.setattr(
+        "pyfonts.utils.requests.get",
+        lambda *args, **kwargs: DummyResponse(),
+    )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_google.py` at line 71, The monkeypatch currently sets
pyfonts.utils.requests.get to a lambda that only accepts one positional arg
which will break if _get_fonturl later passes extra kwargs; replace the stub
with a function or lambda that accepts both *args and **kwargs (e.g., lambda
*args, **kwargs: DummyResponse()) so the monkeypatch in tests/test_google.py
remains resilient to added params, headers, or timeout when _get_fonturl calls
requests.get.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_google.py`:
- Around line 83-85: The test currently only checks that the variable url is a
valid URL using _is_url and _is_valid_raw_url; update the test to assert the
exact expected URL string returned from the stubbed CSS by adding a strict
equality assertion against the known expected URL value for this regression path
(use the existing variable name url in tests/test_google.py and keep the helper
checks if desired). Ensure the expected string matches the URL produced by the
stubbed CSS in your test fixture so the test fails if the wrong subset URL is
selected.
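One way to write the stricter regression test, sketched without pyfonts so it runs on its own: `fetch_font_url` is a hypothetical lookup that receives the HTTP getter as an argument, the stub CSS and URLs are made up, and the stub accepts `*args, **kwargs` as the nitpick suggests. The test asserts the exact Latin URL rather than only checking that the result looks like a URL.

```python
import re

# Hypothetical stubbed provider CSS with the Thai block first.
STUB_CSS = """
/* thai */
@font-face { src: url(https://example.com/thai.ttf) format('truetype'); }
/* latin */
@font-face { src: url(https://example.com/latin.ttf) format('truetype'); }
"""


class DummyResponse:
    """Minimal stand-in for a requests.Response (only .text and .status_code)."""

    status_code = 200
    text = STUB_CSS


def fetch_font_url(get, endpoint: str, family: str, subset: str = "latin") -> str:
    """Hypothetical URL lookup that takes the HTTP getter as a parameter."""
    css = get(endpoint, params={"family": family}, timeout=10).text
    blocks = dict(re.findall(r"/\*\s*([\w-]+)\s*\*/\s*(@font-face\s*\{[^}]*\})", css))
    search_text = blocks.get(subset.strip().lower(), css)
    match = re.search(r"url\(([^)]+)\)", search_text)
    if match is None:
        raise ValueError(f"no font URL found for subset {subset!r}")
    return match.group(1)


def test_latin_subset_exact_url():
    # kwargs-tolerant stub: keeps working if params/headers/timeout are added.
    def stub_get(*args, **kwargs):
        return DummyResponse()

    url = fetch_font_url(stub_get, "https://fonts.example/css", "Chakra Petch")
    # Exact equality catches a wrong-subset regression that "is it a URL?" checks miss.
    assert url == "https://example.com/latin.ttf"
```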


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 282f7525-9015-4128-9fcb-0961078a671b

📥 Commits

Reviewing files that changed from the base of the PR and between 9592a58 and 2a7bf23.

📒 Files selected for processing (3)
  • pyfonts/utils.py
  • tests/test_google.py
  • tests/test_utils.py
✅ Files skipped from review due to trivial changes (1)
  • tests/test_utils.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyfonts/utils.py

Comment thread tests/test_google.py Outdated
Refactor the formatting of function definitions and exception messages for improved readability in cache and font loading functions.
@mcanouil
Contributor Author

mcanouil commented Mar 29, 2026

There are several failing tests on main related to Bunny Fonts downloads because the family assertion is incorrect.

It seems the subset had a bigger impact than just allowing "latin" to be selected for some fonts.
I'm looking into it.

@mcanouil mcanouil marked this pull request as draft April 3, 2026 08:25
@JosephBARBIERDARNAL
Collaborator

Thanks for the PR, this fixes an issue I haven't spotted!

I've opened an issue since I found some more Bunny Fonts issues that I think are out of scope for this PR: #43

Also, the tests are only failing for Bunny Fonts, and I find it hard to understand why. Any idea why this would be? This doesn't seem to be a big issue since, as I said, I believe the Bunny Fonts issues are unrelated to your changes (they already exist in main).

@JosephBARBIERDARNAL
Collaborator

Another thing is that it's hard to test the output font object, since `.get_style()`, `.get_weight()`, etc. don't accurately represent what will be displayed to users. This is also out of scope for this PR, but if you have thoughts on it, I'm interested.

@mcanouil
Contributor Author

mcanouil commented Apr 3, 2026

> Also, the tests are only failing for Bunny Fonts, and I find it hard to understand why. Any idea why this would be? This doesn't seem to be a big issue since, as I said, I believe the Bunny Fonts issues are unrelated to your changes (they already exist in main).

It's because setting a subset leads to a different import. In this import, the font name includes the variants (italic/weight).
I'm looking for a way to keep full imports while still being able to target one variant as primary when using Bunny.

@mcanouil
Contributor Author

mcanouil commented Apr 3, 2026

> Another thing is that it's hard to test the output font object, since `.get_style()`, `.get_weight()`, etc. don't accurately represent what will be displayed to users. This is also out of scope for this PR, but if you have thoughts on it, I'm interested.

This is actually part of the Bunny Fonts issue.
The order returned by the provider is not deterministic, which can lead to the wrong style for the right weight and vice versa.
From my tests, weight selection seems to be more reliable, so fixing how the style is retrieved from the search text (e.g., `https://fonts.bunny.net/css?family=Roboto:ital,wght@1,400`, where this gets messy) is a step in the right direction, I believe.
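A small sketch of parsing the axis-tuple syntax from that URL into `(italic, weight)` pairs, assuming the `Family:axis1,axis2@v1,v2;v1,v2` form shown above. `parse_family_spec` is hypothetical, and other query formats (such as `400italic`) are deliberately not handled.

```python
from __future__ import annotations


def parse_family_spec(spec: str) -> tuple[str, list[tuple[bool, int]]]:
    """Parse 'Family:ital,wght@1,400;0,700' into (family, [(italic, weight), ...])."""
    family, _, axes_part = spec.partition(":")
    if not axes_part:
        return family, [(False, 400)]  # assumption: no axes means regular 400
    axis_names, _, values = axes_part.partition("@")
    names = [n.strip() for n in axis_names.split(",")]
    variants = []
    for tuple_str in values.split(";"):
        # Pair each axis name with its value in this tuple, e.g. {"ital": "1", "wght": "400"}.
        fields = dict(zip(names, (v.strip() for v in tuple_str.split(","))))
        italic = fields.get("ital", "0") == "1"
        weight = int(fields.get("wght", "400"))
        variants.append((italic, weight))
    return family, variants
```

Reading style and weight from the same tuple keeps them paired, rather than inferring the style separately from the weight.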

Add functionality to improve subset selection in font URL retrieval by allowing multiple blocks for the same subset. Introduce a new method to filter @font-face blocks based on the requested font style (italic or normal).
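The commit above describes filtering `@font-face` blocks by the requested style. A minimal sketch of that idea, assuming non-nested braces and treating a missing `font-style` as `normal` (the CSS initial value); `filter_blocks_by_style` is a hypothetical name, not the function added in the commit, and `oblique` is not treated as italic here.

```python
import re

FONT_FACE_RE = re.compile(r"@font-face\s*\{[^}]*\}")
STYLE_RE = re.compile(r"font-style:\s*(\w+)")


def filter_blocks_by_style(css_text: str, italic: bool) -> list:
    """Keep only @font-face blocks whose font-style matches the request."""
    wanted = "italic" if italic else "normal"
    kept = []
    for block in FONT_FACE_RE.findall(css_text):
        match = STYLE_RE.search(block)
        # No font-style declaration means the CSS default, "normal".
        style = match.group(1).lower() if match else "normal"
        if style == wanted:
            kept.append(block)
    return kept
```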
@mcanouil
Contributor Author

mcanouil commented Apr 3, 2026

@JosephBARBIERDARNAL I just pushed 3b61db3, which should help with Bunny. I'm not 100% sure I covered all cases, but the most common ones are covered.
I've tested it for tables/figures in https://m.canouil.dev/quarto-brand-renderings/brands/brand-1/python.html

@mcanouil mcanouil marked this pull request as ready for review April 3, 2026 09:23
@JosephBARBIERDARNAL
Copy link
Copy Markdown
Collaborator

Great, this looks good to me! Thanks again

@JosephBARBIERDARNAL JosephBARBIERDARNAL merged commit a4c2240 into y-sunflower:main Apr 8, 2026
9 checks passed
@mcanouil
Contributor Author

mcanouil commented Apr 8, 2026

I'll update my draft blog post on brand+quarto+code and will likely publish it next week.
Thanks for the python package ;)

@mcanouil mcanouil deleted the fix/font-subset-selection branch April 8, 2026 13:37
2 participants