Step keywords are not matched properly when some keywords are prefixes of each other

### 👓 What did you see?

The behat/gherkin test suite has a test that generates a Gherkin file using the keywords of a particular language (with hardcoded step texts, for simplicity and easier debugging), parses that generated feature, and compares the resulting AST to the expected one.
While migrating behat/gherkin to use gherkin-php internally, this test broke for the `ht` language, whose `When` keywords are prefixes of its `Then` keywords: https://github.com/cucumber/gherkin/blob/50cac90a2cfd1bf366c5e7cde0694778b26cd06b/gherkin-languages.json#L1733-L1742

The dialect merges the `When` keywords before the `Then` keywords when building the list of step keywords, and the matcher tries those keywords in definition order. As a result, `Le ` is matched instead of `Le sa a `, and `sa a ` becomes part of the step text. This then prevents a runner from finding the matching step definition.
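A minimal Python sketch of the first-match behavior described above. This is not gherkin-php's actual code. It assumes the merged list is simply the `when` list followed by the `then` list from the linked `gherkin-languages.json` entry, and `first_match` is a hypothetical helper name.

```python
from __future__ import annotations

# `ht` keyword lists, copied from the linked gherkin-languages.json entry.
HT_WHEN = ["* ", "Lè ", "Le "]
HT_THEN = ["* ", "Lè sa a ", "Le sa a "]

# Assumption: the merged list keeps definition order, with `when` before `then`.
# Only these two step types are included in this sketch.
MERGED = HT_WHEN + HT_THEN


def first_match(line: str, keywords: list[str]) -> tuple[str, str] | None:
    """Return (keyword, text) for the FIRST keyword that prefixes the line."""
    for keyword in keywords:
        if line.startswith(keyword):
            return keyword, line[len(keyword):]
    return None


print(first_match("Le sa a there should be agent J", MERGED))
# ('Le ', 'sa a there should be agent J')
```

The shorter `Le ` is tried first and wins, so `sa a ` leaks into the step text.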

The `ht` locale also fails because some of its `given` keywords are prefixes of each other, but that case could be solved by reordering them in the `gherkin-languages.json` file (as was done for `fr`, for instance). However, `ht` is the only locale where keywords of _different_ step types are prefixes of each other (or at least the only one where the current merge order of step keywords exposes the problem).
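The distinction between same-type and cross-type collisions can be checked mechanically. Below is a hypothetical Python helper, `cross_type_prefixes`, that lists every keyword which is a strict prefix of a keyword from a *different* step type. Only the `when`/`then` lists from the linked `ht` entry are included. With the merge order described above, reordering inside a single list cannot fix a collision whose shorter keyword belongs to a type that is merged earlier.

```python
from __future__ import annotations

from itertools import permutations

# Subset of the `ht` step keywords, copied from the linked gherkin-languages.json entry.
HT_STEP_KEYWORDS = {
    "when": ["* ", "Lè ", "Le "],
    "then": ["* ", "Lè sa a ", "Le sa a "],
}


def cross_type_prefixes(
    dialect: dict[str, list[str]],
) -> list[tuple[str, str, str, str]]:
    """List (type_a, shorter, type_b, longer) where `shorter` is a strict prefix
    of `longer` and the two keywords belong to different step types."""
    found = []
    for type_a, type_b in permutations(dialect, 2):
        for shorter in dialect[type_a]:
            for longer in dialect[type_b]:
                # Identical keywords such as "* " are shared, not strict prefixes.
                if shorter != longer and longer.startswith(shorter):
                    found.append((type_a, shorter, type_b, longer))
    return found


for collision in cross_type_prefixes(HT_STEP_KEYWORDS):
    print(collision)
# ('when', 'Lè ', 'then', 'Lè sa a ')
# ('when', 'Le ', 'then', 'Le sa a ')
```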

### ✅ What did you expect to see?

The longest matching keyword should be matched for a step.

When building the full list of step keywords by merging the lists for each step type, the result should be sorted by descending length. (I am not sure whether this belongs in the dialect or in the matcher.)
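A minimal Python sketch of the expected behavior, assuming the fix sorts the merged keywords once by descending length. `merged_keywords` and `longest_match` are hypothetical names, and only the `when`/`then` lists from the linked `ht` entry are included.

```python
from __future__ import annotations

# Subset of the `ht` step keywords, copied from the linked gherkin-languages.json entry.
HT_STEP_KEYWORDS = {
    "when": ["* ", "Lè ", "Le "],
    "then": ["* ", "Lè sa a ", "Le sa a "],
}


def merged_keywords(dialect: dict[str, list[str]]) -> list[str]:
    """Merge all step keyword lists, drop duplicates, longest keyword first."""
    unique = dict.fromkeys(kw for keywords in dialect.values() for kw in keywords)
    # sorted() is stable (also with reverse=True), so equal-length keywords
    # keep their definition order.
    return sorted(unique, key=len, reverse=True)


def longest_match(line: str, keywords: list[str]) -> tuple[str, str] | None:
    """Return (keyword, text) for the longest keyword that prefixes the line.

    Relies on `keywords` already being sorted by descending length."""
    for keyword in keywords:
        if line.startswith(keyword):
            return keyword, line[len(keyword):]
    return None


KEYWORDS = merged_keywords(HT_STEP_KEYWORDS)
print(longest_match("Le sa a there should be agent J", KEYWORDS))
# ('Le sa a ', 'there should be agent J')
print(longest_match("Le I erase agent K's memory", KEYWORDS))
# ('Le ', "I erase agent K's memory")
```

Sorting once when the dialect is loaded keeps the matcher a plain first-match loop. The alternative is to let the matcher collect every matching keyword and keep the longest. Either approach works because a strict prefix is always shorter than the keyword it prefixes, so two equal-length keywords can never both match the same line.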

### 📦 Which tool/library version are you using?

gherkin-php (though my analysis suggests that the implementations in other languages are also affected)

### 🔬 How could we reproduce it?

```gherkin
# language: ht
Karakteristik: Internal operations
  In order to stay secret
  As a secret organization
  We need to be able to erase past agents' memory

  Senaryo: Erasing agent memory
    Sipoze ke there is agent J
    Ak there is agent K
    Le I erase agent K's memory
    Le sa a there should be agent J
    Men there should not be agent K
```

```gherkin
Feature: Internal operations
  In order to stay secret
  As a secret organization
  We need to be able to erase past agents' memory

  Scenario: Erasing agent memory
    Given there is agent J
    And there is agent K
    When I erase agent K's memory
    Then there should be agent J
    But there should not be agent K
```

Parsing these two files should produce equivalent ASTs, differing only in the `language` of the Feature object and the keyword of each step. The text of the steps (and the step types) should be identical.
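The comparison described above can be sketched as a small Python check. It is hypothetical in several ways. The keyword lists are trimmed to what the two scenarios use, with the `ht` `when`/`then` entries taken from the linked `gherkin-languages.json` lines. `* ` is left out because it belongs to every step type. `parse_step` also picks the longest matching keyword among all candidates instead of relying on a pre-sorted list.

```python
from __future__ import annotations

# Hypothetical keyword subsets: only what the two example scenarios need.
EN = {
    "given": ["Given "],
    "when": ["When "],
    "then": ["Then "],
    "and": ["And "],
    "but": ["But "],
}
HT = {
    "given": ["Sipoze ke "],
    "when": ["Lè ", "Le "],
    "then": ["Lè sa a ", "Le sa a "],
    "and": ["Ak "],
    "but": ["Men "],
}

EN_STEPS = [
    "Given there is agent J",
    "And there is agent K",
    "When I erase agent K's memory",
    "Then there should be agent J",
    "But there should not be agent K",
]
HT_STEPS = [
    "Sipoze ke there is agent J",
    "Ak there is agent K",
    "Le I erase agent K's memory",
    "Le sa a there should be agent J",
    "Men there should not be agent K",
]


def parse_step(line: str, dialect: dict[str, list[str]]) -> tuple[str, str]:
    """Return (step type, step text) using the longest matching keyword."""
    candidates = [
        (keyword, step_type)
        for step_type, keywords in dialect.items()
        for keyword in keywords
        if line.startswith(keyword)
    ]
    if not candidates:
        raise ValueError(f"no step keyword matches {line!r}")
    keyword, step_type = max(candidates, key=lambda c: len(c[0]))
    return step_type, line[len(keyword):]


en = [parse_step(line, EN) for line in EN_STEPS]
ht = [parse_step(line, HT) for line in HT_STEPS]
assert en == ht, (en, ht)
print("step types and texts match")
```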

### 📚 Any additional context?

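_No response_

For reference, the `then` and `when` entries of the `ht` dialect in `gherkin-languages.json` (the lines linked above):

```json
"then": [
  "* ",
  "Lè sa a ",
  "Le sa a "
],
"when": [
  "* ",
  "Lè ",
  "Le "
]
```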

Step keywords are not matched properly when some keywords are prefixes of each other #400

👓 What did you see?

✅ What did you expect to see?

📦 Which tool/library version are you using?

🔬 How could we reproduce it?

📚 Any additional context?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Step keywords are not matched properly when some keywords are prefixes of each other #400

Description

👓 What did you see?

✅ What did you expect to see?

📦 Which tool/library version are you using?

🔬 How could we reproduce it?

📚 Any additional context?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions