Fix language-pack generation in lancheck.php (v2.3.5+ regressions)#5703
Fix language-pack generation in lancheck.php (v2.3.5+ regressions)#5703Kanonimpresor wants to merge 5 commits into
Conversation
Admin > Tools > Language Check > Generate Pack could not produce a usable language pack on installs using the new $config = [...] format or modern language files. This addresses the blocking/critical regressions: - Pre-flight path check (e107inc#7): fall back to $e107_paths when the $LANGUAGES/$THEMES/$HELP_DIRECTORY globals are empty (new config format), instead of aborting with the LANG_LAN_26 preference label. - XML manifest temp path (e107inc#9): write to e_SYSTEM."temp/lancheck/" (created on demand) instead of the removed e_FILE."public/" location, so the <lang>/<lang>.xml manifest is actually shipped in the ZIP. - XML escaping (e107inc#5): htmlspecialchars(..., ENT_QUOTES|ENT_XML1) the name/email/url attributes so a quote in USERNAME/USEREMAIL no longer produces invalid XML that the importer rejects. - Duplicate manifest (e107inc#10): exclude the on-disk <lang>.xml from the file list before adding the freshly-generated manifest, avoiding two entries for the same archive path. - Modern syntax parsing (e107inc#6): fill_phrases_array() now also recognises `const KEY = '...';` and `return [ 'KEY' => '...' ];` in addition to define(), so byte-perfect modern packs are no longer reported as "File missing!" / "n missing phrases". Refs e107inc#5652
- fclose() (e107inc#1): close the actual open handle ($fp) instead of the undefined $writeit variable. - Error message (e107inc#2): show $status['message'] rather than the boolean $status['error'] flag, so the user no longer sees a literal "1". - Locale parsing (e107inc#3): include `const` and `=` in findLocale()'s strip list so modern `const CORE_LC = "es";` declarations resolve to the correct locale instead of an empty string. Refs e107inc#5652
) When a pack fails validation the message was just "Errors found: [x] missing", giving no indication of what to fix. Build an explicit breakdown of missing files, missing/invalid phrases, files with BOM/illegal characters and non-UTF8 phrases, plus a hint pointing to the Verify tab and the E107_DEBUG_LEVEL bypass. Refs e107inc#5652
There was a problem hiding this comment.
Olá de novo, Martin, and thanks for the clean follow-through on #5652.
I traced this against the live code and tested the parser branches locally (no live install, so "traced" not "reproduced"). Tidy scope: one file, +83/-11, the commit split matches what we agreed, and php -l is clean on 8.5.
What checks out:
- Parsers (
bug 6): I ran the three branches against real core LAN files. Legacydefine()still parses exactly (forumEnglish_global.php-> 9, siteinfo -> 11), so the back-compat path survives. The new branches pick up the modern files:lan_user.php-> 107 keys (the commented-outdefine()s inside the/* */block are stripped first),admin/lan_admin.php-> 420, plusconst CORE_LC/CORE_LC2. The$retloc[$type] = array()init is what stops a phrase-less modern file reading asFile missing!. bug 3resolvesconst CORE_LC = "es"->es_ES;bug 4reads the real session keys;bug 9writes undere_SYSTEM."temp/lancheck/"with a recursivemkdir;bug 10filters the on-disk<lang>.xmlbefore re-adding the fresh manifest.
I've left one inline note on the bug 7 path resolution. @Deltik has already flagged the regex (tokenizer) and the attribute building (toAttributes()), and I agree on both. Worth knowing: the toAttributes() switch also absorbs the bug 5 escaping, so you won't need the manual htmlspecialchars(..., ENT_XML1) there once it's in.
One minor, non-blocking thing: the error breakdown and the Tip: line are hardcoded English, in a tool that's all about translation. Shipping no new strings is a fair call for a hotfix, but a follow-up to fold them into LANG_LAN_* would let translators have them too.
| $paths = is_array($e107_paths) ? $e107_paths : array(); | ||
| $handlersDir = $HANDLERS_DIRECTORY ?: ($paths['handlers'] ?? ($paths['HANDLERS_DIRECTORY'] ?? 'e107_handlers/')); | ||
| $languagesDir = $LANGUAGES_DIRECTORY ?: ($paths['languages'] ?? ($paths['LANGUAGES_DIRECTORY'] ?? 'e107_languages/')); | ||
| $themesDir = $THEMES_DIRECTORY ?: ($paths['themes'] ?? ($paths['THEMES_DIRECTORY'] ?? 'e107_themes/')); | ||
| $helpDir = $HELP_DIRECTORY ?: ($paths['help'] ?? ($paths['HELP_DIRECTORY'] ?? 'e107_docs/help/')); | ||
| $pluginsDir = $PLUGINS_DIRECTORY ?: ($paths['plugins'] ?? ($paths['PLUGINS_DIRECTORY'] ?? 'e107_plugins/')); | ||
|
|
||
| if(($HANDLERS_DIRECTORY != "e107_handlers/") || ( $LANGUAGES_DIRECTORY != "e107_languages/") || ($THEMES_DIRECTORY != "e107_themes/") || ($HELP_DIRECTORY != "e107_docs/help/") || ($PLUGINS_DIRECTORY != "e107_plugins/")) | ||
| if(($handlersDir != "e107_handlers/") || ($languagesDir != "e107_languages/") || ($themesDir != "e107_themes/") || ($helpDir != "e107_docs/help/") || ($pluginsDir != "e107_plugins/")) |
There was a problem hiding this comment.
This $e107_paths fallback never actually fires. class2.php unsets $e107_paths immediately after initCore() (class2.php#L239), so on any admin page global $e107_paths is null and $paths becomes array(). Every lookup here then collapses to the hardcoded literal.
On a default install that still lands on the right answer, so the reported bug is fixed. But on the new $config format the non-standard-layout guard just below can no longer fire: the themes/languages/help globals are empty and $e107_paths is gone, so all three resolve to the defaults regardless of what the site actually uses.
getFileList() already does this the canonical way a few lines down, via e107::getFolder() (L712-714). That reads e107_dirs, which is populated on both config formats:
$handlersDir = e107::getFolder('handlers');
$languagesDir = e107::getFolder('languages');
$themesDir = e107::getFolder('themes');
$helpDir = e107::getFolder('help');
$pluginsDir = e107::getFolder('plugins');That drops the dead $e107_paths branch and keeps the guard meaningful on both formats.
- Bug 7: resolve language/theme/help paths via e107::getFolder() instead of the dead $e107_paths fallback (class2.php unsets it after initCore()), so the non-standard-layout guard works on both the flat and $config=[...] formats. - Bug 5: build the XML manifest attributes with e_parse::toAttributes(), which handles escaping internally and drops the manual htmlspecialchars(ENT_XML1) concatenation. - Bug 6: parse locale files with the PHP tokenizer (token_get_all) instead of preg_match_all(); recognises define()/const/return[] plus setlocale(), ignores commented-out code natively, and adds decodeStringToken() for quote/escape handling.
Adds cases for const, return-array (long + short syntax with escapes), commented-out statements that must be ignored, and phrase-less files that must still register their type. Validates the new token_get_all()-based parser requested in the e107inc#5703 review.
| $pluginsDir = $PLUGINS_DIRECTORY ?: ($paths['plugins'] ?? ($paths['PLUGINS_DIRECTORY'] ?? 'e107_plugins/')); | ||
| // Patch (lancheck-config-paths): resolve directory names via e107::getFolder(), | ||
| // which reads e107_dirs and is populated on BOTH the legacy flat config and the | ||
| // new $config = [...] format. The previous $e107_paths fallback was dead code: |
There was a problem hiding this comment.
This comment does not make sense in context, since whoever reads this code won't know what "previous $e107_paths" were. Please remove or trim the comments to the bare necessities.
|
Thanks for the thorough review, @Deltik and @e107help! I've pushed two commits
Expanded I validated the parser branches locally against real core LAN files and the new On the non-blocking note about the hardcoded English error breakdown / Disclosure: drafted with AI assistance (Claude Opus 4.8 by VSCode) (per CONTRIBUTING.md). I read, ran, and |
There was a problem hiding this comment.
Ficou bem mais limpo, Martin. I re-traced both new commits (09cae28, bccf93b) and ran the new tokenizer against the real core LAN files and your fixtures. The headline: this isn't just regression-free, it's more correct than the regex it replaces.
Parser parity, with two latent bugs quietly fixed. I diffed the harvested key set (old regex vs new tokenizer) across 14 core LAN files. 13 come out identical. The one that differs is English.php, and it differs in the right direction: the old regex reported 130 keys, the tokenizer 126. The four it stops emitting are CHARSET, TEXTDIRECTION, CORE_LAN2, CORE_LAN3, every one of them commented out with // (English.php#L18-L22). The old /* */-only pre-strip let those phantom phrases leak into the generated pack; token_get_all() drops them for free. Your ignoresComments fixture pins exactly that.
The second fix is in the values: phrases now decode to their real runtime form. CORE_LAN1 used to come through as raw source with doubled backslashes (...missing.\\n\\n...); decodeStringToken() returns the true \n. The pack was subtly wrong there before too.
The define() counts on the legacy files are unchanged (lan_admin 420, lan_signup 109, lan_user 107) and all five fixtures pass when I exercise the function directly, so the back-compat path is solid.
Two inline notes below, neither a merge blocker: an optional $pure polish on the toAttributes() switch, and a test fixture that opens with return array( and closes with ];. That one is a parse error as real PHP and only goes green because token_get_all() is lenient (no TOKEN_PARSE flag), so it's worth tightening to valid syntax.
Minor heads-up while I had the harness out: the setlocale() capture changed shape as a side effect. it stored the raw argument list before, now it keeps the first decoded locale (LC_ALL => 'en_GB.UTF-8'). Harmless for a language pack, just flagging it in case it surprises you.
@Deltik already caught the leftover $e107_paths comment, so I'll leave that lane to him. Bom trabalho on the turnaround.
| $langAttributes = $tp->toAttributes(array( | ||
| 'name' => $language, | ||
| 'compatibility' => $ver, | ||
| 'date' => date("Y-m-d"), | ||
| )); | ||
| $authorAttributes = $tp->toAttributes(array( | ||
| 'name' => USERNAME, | ||
| 'email' => USEREMAIL, | ||
| 'url' => SITEURL, | ||
| )); |
There was a problem hiding this comment.
This is the right call, and it absorbs the bug 5 escaping cleanly: toAttribute() runs htmlspecialchars(..., ENT_QUOTES), whose output (' and friends) is well-formed XML, so dropping the manual ENT_XML1 pass is correct.
One defensive tweak worth considering: pass $pure = true. By default toAttributes() sends each value through replaceConstants() (e_parse_class.php#L1999-L2002), and these are literal manifest values (SITEURL, USEREMAIL, the language name) that have no business being path-substituted. It happens to be a no-op today (with the empty $mode, replaceConstants() only forward-replaces literal {e_XXX} tokens, none of which appear in a URL or an email), so this is about stating intent rather than fixing a live bug.
Same helper, one behavioural nuance to keep in mind: it skips empty-valued attributes (L2031). So if USEREMAIL is ever blank, the old email="" now drops out of the tag entirely instead of emitting empty. The reader tolerates a missing attribute, so this is a heads-up, not a blocker.
| 'return array('."\n". | ||
| "\t'LAN1' => 'Főadminisztrátor',"."\n". | ||
| "\t'LAN2' => 'Hői',"."\n". | ||
| '];'; |
There was a problem hiding this comment.
This fixture is mismatched: it opens with return array( but closes with ];, which is a parse error as real PHP (Unclosed '(' ... does not match ']'). It only goes green because fill_phrases_array() calls token_get_all() without the TOKEN_PARSE flag, so the tokenizer never checks that the brackets balance. it emits the 'KEY' => 'value' tokens regardless of the closing bracket.
Since this is the case meant to exercise the long array() form (the next test covers short []), close it with ) so the fixture is valid PHP:
$strings =
'return array('."\n".
"\t'LAN1' => 'Főadminisztrátor',"."\n".
"\t'LAN2' => 'Hői',"."\n".
');';Worth a quick guard in general: a fixture that the parser would reject is only proving the tokenizer's leniency, not the syntax it claims to cover.


Closes #5652
What this does
Admin → Tools → Language Check → Generate Packcould not produce a usablelanguage pack on installs that use the new
$config = [...]format or modernlanguage files (
const/return [...]). This PR repairs the generator.Per the discussion in #5652, bug 8 (
ver.php/ online packs) is intentionallyout of scope — it was already fixed server-side by @Deltik (
languagepacks.xmlnow matches by
major.minorwith a latest-tag fallback), so the client-sidefallback is no longer needed.
Commits (semantic, in the order agreed on the issue)
Blockers / critical —
Fix(lancheck): repair language-pack generation blockers on v2.3.5+$e107_pathswhen the$LANGUAGES/$THEMES/$HELP_DIRECTORYglobals are empty under the new$config = [...]format (was aborting with theLANG_LAN_26label).e_SYSTEM."temp/lancheck/"(created on demand) instead of the removed
e_FILE."public/", so the<lang>/<lang>.xmlmanifest actually ships in the ZIP.htmlspecialchars(..., ENT_QUOTES|ENT_XML1)on thename/email/urlattributes — a quote inUSERNAME/USEREMAILno longerproduces invalid XML the importer rejects (and no longer poisons the
on-disk
<lang>.xml).<lang>.xmlfrom the filelist before adding the freshly generated manifest.
fill_phrases_array()now also recognisesconst KEY = '...';andreturn [ 'KEY' => '...' ];in addition todefine(), so modern packs are no longer reported asFile missing!/n missing phrases.Minor correctness —
Fix(lancheck): minor correctness fixes in language-pack flowfclose($fp)instead of the undefined$writeit.$status['message']instead of the boolean$status['error'](user was seeing a literal
1).constand=infindLocale()'s strip list soconst CORE_LC = "es";resolves correctly.UX —
Improve(lancheck): actionable error breakdown for translatorsinstead of a bare
Errors found: [x] missing, plus a hint pointing to theVerify tab and the
E107_DEBUG_LEVELbypass.Validation
php -l e107_admin/lancheck.phpclean on every commit.new
$configformat: each ZIP contains a single, well-formed<lang>/<lang>.xml; the importer accepts them.simplexml_load_string/file_put_contentswarnings gone.define()-only packs and fully modernised(
return [...]) packs.Scope
e107_admin/lancheck.php(+83 / −11).upstream/master.