👓 What did you see?
The behat/gherkin test suite has a test that generates a Gherkin file using the keywords of a particular language (with hardcoded step texts, to keep things simple and make debugging easier), parses the generated feature and then compares the AST to the expected one.
When we attempted to migrate to gherkin-php internally, this test broke for the ht language, whose When keywords are prefixes of its Then keywords (gherkin/gherkin-languages.json, lines 1733 to 1742 in 50cac90):
    "then": [
      "* ",
      "Lè sa a ",
      "Le sa a "
    ],
    "when": [
      "* ",
      "Lè ",
      "Le "
    ]
Because the dialect merges the When keywords before the Then keywords when building the list of step keywords, and the matcher tests those tokens in definition order, "Le " is matched instead of "Le sa a ". The "sa a" then becomes part of the step text, which breaks step-definition lookup in a runner.
The ht locale also fails because some of its Given keywords are prefixes of one another, but that could be fixed by reordering them in the gherkin-languages.json file (as was done for fr, for instance). However, ht is the only locale in which keywords of different step types are prefixes of each other, or at least the only one where the current merging order of step keywords exposes the problem.
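The first-match behaviour described above can be reproduced outside any parser. The following is a minimal Python sketch, not gherkin-php's actual code: `match_step_first` is a hypothetical stand-in for the matcher, and the two keyword lists are copied from the ht entry quoted above.

```python
# Keyword lists copied from the ht entry of gherkin-languages.json quoted above.
HT_WHEN = ["* ", "Lè ", "Le "]
HT_THEN = ["* ", "Lè sa a ", "Le sa a "]


def match_step_first(line, keywords):
    """Hypothetical matcher: return (keyword, text) for the FIRST keyword
    in list order that is a prefix of the line, or None if none matches."""
    for keyword in keywords:
        if line.startswith(keyword):
            return keyword, line[len(keyword):]
    return None


# Merge When before Then, as the issue describes the dialect doing.
merged = HT_WHEN + HT_THEN

print(match_step_first("Le sa a there should be agent J", merged))
# ('Le ', 'sa a there should be agent J')
```

The Then step is reported with the When keyword "Le ", and "sa a" leaks into the step text.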
✅ What did you expect to see?
The longest matching keyword should be matched for a step.
When the full list of step keywords is built by merging the lists for each type, the keywords should be sorted by descending length (I am not sure whether this belongs in the dialect or in the matcher).
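As an illustration of the proposed ordering, here is a minimal Python sketch. It assumes the fix is applied when the lists are merged. `merge_longest_first` and `match_step` are hypothetical names, not part of gherkin-php or any other Gherkin implementation.

```python
# Keyword lists copied from the ht entry of gherkin-languages.json.
HT_WHEN = ["* ", "Lè ", "Le "]
HT_THEN = ["* ", "Lè sa a ", "Le sa a "]


def merge_longest_first(*keyword_lists):
    """Merge keyword lists, drop duplicates, and sort by descending length.

    The sort is stable, so keywords of equal length keep their
    original relative order.
    """
    merged = []
    for keywords in keyword_lists:
        for keyword in keywords:
            if keyword not in merged:
                merged.append(keyword)
    return sorted(merged, key=len, reverse=True)


def match_step(line, keywords):
    """Return (keyword, text) for the first keyword that prefixes the line."""
    for keyword in keywords:
        if line.startswith(keyword):
            return keyword, line[len(keyword):]
    return None


step_keywords = merge_longest_first(HT_WHEN, HT_THEN)

print(match_step("Le sa a there should be agent J", step_keywords))
# ('Le sa a ', 'there should be agent J')
print(match_step("Le I erase agent K's memory", step_keywords))
# ('Le ', "I erase agent K's memory")
```

Because a keyword that is a proper prefix of another is always strictly shorter, testing longer keywords first guarantees the longest match wins, whichever step type each keyword came from. The same holds whether length is counted in characters or in bytes.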
📦 Which tool/library version are you using?
gherkin-php (however, my analysis makes me think other languages are also impacted)
🔬 How could we reproduce it?
    # language: ht
    Karakteristik: Internal operations
      In order to stay secret
      As a secret organization
      We need to be able to erase past agents' memory

      Senaryo: Erasing agent memory
        Sipoze ke there is agent J
        Ak there is agent K
        Le I erase agent K's memory
        Le sa a there should be agent J
        Men there should not be agent K

    Feature: Internal operations
      In order to stay secret
      As a secret organization
      We need to be able to erase past agents' memory

      Scenario: Erasing agent memory
        Given there is agent J
        And there is agent K
        When I erase agent K's memory
        Then there should be agent J
        But there should not be agent K
Parsing those 2 files should produce similar ASTs, differing only in the language of the Feature object and the keywords of each step. The text of the steps (and the step types) should be identical.
📚 Any additional context?
No response