Step keywords are not matched properly when some keywords are prefixes of each other #400

@stof

👓 What did you see?

The behat/gherkin test suite has a test that generates a Gherkin file using the keywords of a particular language (with hardcoded step texts, for simplicity and easier debugging), parses that generated feature, and then compares the AST to the expected one.
When we attempted to migrate to gherkin-php internally, this test broke for the ht language. In this language, a When keyword is a prefix of the Then keywords:

gherkin/gherkin-languages.json, lines 1733 to 1742 in 50cac90:

    "then": [
      "* ",
      "Lè sa a ",
      "Le sa a "
    ],
    "when": [
      "* ",
      "Lè ",
      "Le "
    ]

The dialect merges the When keywords before the Then keywords to build the list of step keywords, and the matcher tests those keywords in definition order. As a result, "Le " is matched instead of "Le sa a ", and "sa a " becomes part of the step text, which then breaks the lookup of the step definition in a runner.
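
The failure can be reproduced outside any parser with a minimal Python sketch. The `match_step` helper is hypothetical and is not gherkin-php's API; it only mimics a first-match-wins scan over a keyword list, and the lists are a subset of the keywords quoted above.

```python
# Subset of the ht keywords quoted above ("* " omitted for brevity).
WHEN = ["Le "]
THEN = ["Lè sa a ", "Le sa a "]


def match_step(line, keywords):
    """Return (keyword, text) for the first keyword that prefixes the line.

    Hypothetical helper: mimics a matcher that tests keywords in list order.
    """
    stripped = line.lstrip()
    for keyword in keywords:
        if stripped.startswith(keyword):
            return keyword, stripped[len(keyword):]
    return None


# Merging When before Then means "Le " is tested before "Le sa a ".
merged = WHEN + THEN
result = match_step("    Le sa a there should be agent J", merged)
print(result)
```

With this ordering, the step is reported with the keyword "Le " and the text "sa a there should be agent J", which is the behaviour described above.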

The ht locale also fails because some of its Given keywords are prefixes of each other. That case could be solved by reordering them in the gherkin-languages.json file (as was done for fr, for instance). However, ht is the only locale in which step keywords of different types are prefixes of each other (or at least the only one where the current merge order of step keywords exposes the problem).
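
One way to spot such cases is to check whether an earlier keyword in an ordered list is a strict prefix of a later one. The sketch below is a hypothetical helper, not part of any Gherkin implementation, and the `given` list is an assumption inferred from the reproduction feature rather than copied from gherkin-languages.json.

```python
def shadowed_keywords(keywords):
    """Return (earlier, later) pairs where `earlier` is a strict prefix of `later`.

    With a first-match-wins scan, `later` can never be matched.
    """
    pairs = []
    for i, earlier in enumerate(keywords):
        for later in keywords[i + 1:]:
            if later != earlier and later.startswith(earlier):
                pairs.append((earlier, later))
    return pairs


# Assumed keywords and ordering, for illustration only.
given = ["Sipoze ", "Sipoze ke "]
print(shadowed_keywords(given))
print(shadowed_keywords(list(reversed(given))))
```

Reordering the list so that the longer keyword comes first removes the conflict within a single keyword type, but it cannot help when the conflicting keywords live in two different lists that are merged in a fixed order.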

✅ What did you expect to see?

The longest matching keyword should be matched for a step.

When the full list of step keywords is built by merging the lists of each type, the keywords should be sorted by descending length (I am not sure whether this should be done in the dialect or in the matcher).
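
A minimal Python sketch of that idea, assuming a first-match-wins matcher. The names `sort_longest_first` and `match_step` are hypothetical and do not correspond to functions in gherkin-php or any other implementation.

```python
def sort_longest_first(keywords):
    """Deduplicate keywords and order them longest first (ties keep their order)."""
    unique = list(dict.fromkeys(keywords))
    return sorted(unique, key=len, reverse=True)


def match_step(line, keywords):
    """Return (keyword, text) for the first keyword that prefixes the line."""
    stripped = line.lstrip()
    for keyword in keywords:
        if stripped.startswith(keyword):
            return keyword, stripped[len(keyword):]
    return None


# Subset of the ht keywords quoted in this issue, merged When before Then.
when = ["* ", "Le "]
then = ["* ", "Lè sa a ", "Le sa a "]
step_keywords = sort_longest_first(when + then)

then_match = match_step("    Le sa a there should be agent J", step_keywords)
when_match = match_step("    Le I erase agent K's memory", step_keywords)
print(then_match)
print(when_match)
```

Any consistent length measure (bytes or characters) should work here, because a strict prefix is always shorter than the keyword it prefixes, so the longer keyword is always tested first.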

📦 Which tool/library version are you using?

gherkin-php (however, my analysis makes me think other languages are also impacted)

🔬 How could we reproduce it?

# language: ht
Karakteristik: Internal operations
  In order to stay secret
  As a secret organization
  We need to be able to erase past agents' memory

  Senaryo: Erasing agent memory
    Sipoze ke there is agent J
    Ak there is agent K
    Le I erase agent K's memory
    Le sa a there should be agent J
    Men there should not be agent K

Feature: Internal operations
  In order to stay secret
  As a secret organization
  We need to be able to erase past agents' memory

  Scenario: Erasing agent memory
    Given there is agent J
    And there is agent K
    When I erase agent K's memory
    Then there should be agent J
    But there should not be agent K

Parsing these two files should produce similar ASTs, differing only in the language of the Feature object and the keyword of each step. The text of the steps (and the step types) should be identical.

📚 Any additional context?

No response
