Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/evented-tokenizer.ts
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,7 @@ export default class EventedTokenizer {
this.transitionTo(TokenizerState.markupDeclarationOpen);
} else if (char === '/') {
this.transitionTo(TokenizerState.endTagOpen);
} else if (char === '@' || char === ':' || isAlpha(char)) {
} else if (!isSpace(char)) {
this.transitionTo(TokenizerState.tagName);
this.tagNameBuffer = '';
this.delegate.beginStartTag();
Expand Down Expand Up @@ -671,7 +671,7 @@ export default class EventedTokenizer {
endTagOpen() {
let char = this.consume();

if (char === '@' || char === ':' || isAlpha(char)) {
if (!isSpace(char)) {
this.transitionTo(TokenizerState.endTagName);
this.tagNameBuffer = '';
this.delegate.beginEndTag();
Expand Down
52 changes: 52 additions & 0 deletions tests/tokenizer-tests.ts
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,32 @@ QUnit.test('A simple tag', function(assert) {
assert.deepEqual(tokens, [startTag('div')]);
});

QUnit.test('A simple tag with leading non alpha chars', function(assert) {
let tokens = tokenize('<_div>');
assert.deepEqual(tokens, [startTag('_div')]);

tokens = tokenize('<$div>');
assert.deepEqual(tokens, [startTag('$div')]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's add emoji case!

<😀> Smile! </😀>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was tempted to add it :)
but
Screenshot 2025-06-03 at 22 06 45

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could still add it:

  1. Assert that it throw error if emoji is used in the beginning of tag name
  2. Assert ok if it's in the middle
image

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this distinction is important for parsing Javascript identifiers, the leading character is more restricted than the rest of them.

The code here should only deal with things that are valid leading characters, because it's controlling when we enter the tag states. We leave those states based only on space and />, so it's necessarily quite tolerant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at that point where I introduced the change. It already is in the state of tagOpen. because it detected the <.
after that it tries to find the beginning of the tag name.
if we just skip characters here, it will just hide the issue.
I think the tokenizer should just allow anything at this stage and let other tools decide if its a valid name.

given that a Javascript variable can not be created with an invalid name, it will be impossible to reference on from a tagname.

so i mean it would be possible to write <😀div> and it would not matter because we cannot write that name in JS.
either way. currently the tokenizer would just skip the 😀 and return div as tag name

Copy link
Copy Markdown
Contributor Author

@patricklx patricklx Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but that verification can be done later? Are there some checks in glimmer-vm for that?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image Seems very strange

Copy link
Copy Markdown
Contributor Author

@patricklx patricklx Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that with the current fix? Before it would just skip any non alpha characters .
In that case the whole tag name and then see the class=... as tag name

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, it's behaviour from master

Copy link
Copy Markdown
Contributor Author

@patricklx patricklx Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we should also make sure that this also works with any char in the middle


tokens = tokenize('<:div>');
assert.deepEqual(tokens, [startTag(':div')]);

tokens = tokenize('<@div>');
assert.deepEqual(tokens, [startTag('@div')]);

tokens = tokenize('<üdiv>');
assert.deepEqual(tokens, [startTag('üdiv')]);

tokens = tokenize('<€div>');
assert.deepEqual(tokens, [startTag('€div')]);

tokens = tokenize('<div😀>');
assert.deepEqual(tokens, [startTag('div😀')]);

tokens = tokenize('<di😀v>');
assert.deepEqual(tokens, [startTag('di😀v')]);
});

QUnit.test('A simple tag with trailing spaces', function(assert) {
let tokens = tokenize('<div \t\n>');
assert.deepEqual(tokens, [startTag('div')]);
Expand All @@ -179,6 +205,32 @@ QUnit.test('A simple closing tag', function(assert) {
assert.deepEqual(tokens, [endTag('div')]);
});

QUnit.test('A simple closing tag with leading non alpha chars', function(assert) {
let tokens = tokenize('</_div>');
assert.deepEqual(tokens, [endTag('_div')]);

tokens = tokenize('</$div>');
assert.deepEqual(tokens, [endTag('$div')]);

tokens = tokenize('</:div>');
assert.deepEqual(tokens, [endTag(':div')]);

tokens = tokenize('</@div>');
assert.deepEqual(tokens, [endTag('@div')]);

tokens = tokenize('</üdiv>');
assert.deepEqual(tokens, [endTag('üdiv')]);

tokens = tokenize('</€div>');
assert.deepEqual(tokens, [endTag('€div')]);

tokens = tokenize('</di😀v>');
assert.deepEqual(tokens, [endTag('di😀v')]);

tokens = tokenize('</div😀>');
assert.deepEqual(tokens, [endTag('div😀')]);
});

QUnit.test('A simple closing tag with trailing spaces', function(assert) {
let tokens = tokenize('</div \t\n>');
assert.deepEqual(tokens, [endTag('div')]);
Expand Down
Loading