Conversation
| <pre class=idl> | ||
| [Exposed=Window] | ||
| interface ProcessingInstruction : CharacterData { | ||
| constructor(DOMString target, optional DOMString data = ""); |
There was a problem hiding this comment.
I initially thought this would be a bit ugly compared to using IDL records or some such so you can set the attributes directly, but I think overall it's probably okay if we have to keep around the dual-data-structure anyway.
dom.bs
Outdated
| </div> | ||
|
|
||
| <div algorithm> | ||
| <p>To <dfn>update attributes from data</dfn> for a {{ProcessingInstruction}} <a for=/>node</a> <var>pi</var>: |
There was a problem hiding this comment.
Are we sure this parser is compatible with how xsl-stylesheet is parsed today?
There was a problem hiding this comment.
Yes, it was based on how this works in Chromium:
In WebKit it is similar:
https://github.com/WebKit/WebKit/blob/9af8ffff919a1a1478b8a43cef821adb49a4e6a1/Source/WebCore/dom/ProcessingInstruction.cpp#L99
https://github.com/WebKit/WebKit/blob/26d7991c939a41bbab5d01263900598b67220208/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp#L1556-L1574
In Gecko it's a little bit different, every attribute lookup uses an attribute parser and I can't tell if it exactly matches XML or not:
https://github.com/mozilla-firefox/firefox/blob/22d04b52b0eb8d9fa11bf8ede5ccc0243a07c5ba/dom/xml/XMLStylesheetProcessingInstruction.cpp#L92
https://github.com/mozilla-firefox/firefox/blob/afe51e48c9deecccbaf64a74d6f207456433df98/dom/xml/ProcessingInstruction.cpp#L77
https://github.com/mozilla-firefox/firefox/blob/22d04b52b0eb8d9fa11bf8ede5ccc0243a07c5ba/dom/base/nsContentUtils.cpp#L1842-L1943
The difference from what's already implemented is that we'd use the HTML parser in HTML documents, instead of unconditionally using the XML parser.
For compat, this difference means that existing content that creates a <?xml-stylesheet href="..."?> PI and moves it to an HTML document would take the HTML parser path instead of XML. For that to make a difference, I think that content would need to be invalid XML attribute syntax (like <?xml-stylesheet href='...'?>) and depends on it being broken (doing nothing) when being moved to HTML.
There was a problem hiding this comment.
The main difference is what happens if pi.data is set after the PI is moved to an HTML document (or constructed with createProcessingInstruction). In the new scenario invalid XML yet valid HTML can be used in that data for the xml-stylesheet attributes.
|
The approach here is to parse attributes from the PI data, and to keep data and attributes in sync whenever one of them changes. Alternatives considered: Attribute mode: Treat Just Attributes for HTML, |
Add setAttribute/getAttribute/hasAttribute/removeAttribute etc. to ProcessingInstruction. They act similarly to attributes on elements, however they don't produce attribute mutation records. See spec PR: whatwg/dom#1454 Bug: 431374376 Change-Id: I659e772f0fb11a6ffe2fe3876cf61ee0f6ad042a
| <li><p>Append U+0022 (") to <var>data</var>. | ||
|
|
||
| <li><p>Append the result of | ||
| <a href="https://html.spec.whatwg.org/#escapingString">escaping a string</a> given |
There was a problem hiding this comment.
This is because https://html.spec.whatwg.org/#escapingString isn't exported and obviously this needs to be fixed. I'm just not sure yet if the escaping should be the same for HTML and XML, or if it should be split like parsing.
Add setAttribute/getAttribute/hasAttribute/removeAttribute etc. to ProcessingInstruction. They act similarly to attributes on elements, however they don't produce attribute mutation records. See spec PR: whatwg/dom#1454 Bug: 431374376 Change-Id: I659e772f0fb11a6ffe2fe3876cf61ee0f6ad042a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7607900 Reviewed-by: Philip Jägenstedt <foolip@chromium.org> Commit-Queue: Noam Rosenthal <nrosenthal@google.com> Cr-Commit-Position: refs/heads/main@{#1594810}
Add setAttribute/getAttribute/hasAttribute/removeAttribute etc. to ProcessingInstruction. They act similarly to attributes on elements, however they don't produce attribute mutation records. See spec PR: whatwg/dom#1454 Bug: 431374376 Change-Id: I659e772f0fb11a6ffe2fe3876cf61ee0f6ad042a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7607900 Reviewed-by: Philip Jägenstedt <foolip@chromium.org> Commit-Queue: Noam Rosenthal <nrosenthal@google.com> Cr-Commit-Position: refs/heads/main@{#1594810}
Add setAttribute/getAttribute/hasAttribute/removeAttribute etc. to ProcessingInstruction. They act similarly to attributes on elements, however they don't produce attribute mutation records. See spec PR: whatwg/dom#1454 Bug: 431374376 Change-Id: I659e772f0fb11a6ffe2fe3876cf61ee0f6ad042a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7607900 Reviewed-by: Philip Jägenstedt <foolip@chromium.org> Commit-Queue: Noam Rosenthal <nrosenthal@google.com> Cr-Commit-Position: refs/heads/main@{#1594810}
dom.bs
Outdated
| "{{InvalidCharacterError!!exception}}" {{DOMException}}. <!-- Gecko does this. --> | ||
|
|
||
| <li>Return a new {{ProcessingInstruction}} | ||
| <li>Let <var>pi</var> be a new {{ProcessingInstruction}} |
There was a problem hiding this comment.
Let's fix this algorithm to use <li><p> as well.
There was a problem hiding this comment.
But even better. We should make this algorithm invoke "initialize a ProcessingInstruction node" that essentially does what this algorithm does except for allocating a new node. Then we can reuse it for the constructor and keep them synchronized indefinitely.
There was a problem hiding this comment.
Done in the style of event's "initialize" which required disambiguation invocations of that.
Not setting target and data on creation does leave the two undefined until "initialize" is called since https://dom.spec.whatwg.org/#concept-pi-target and https://dom.spec.whatwg.org/#concept-cd-data don't have defaults. Saying that they default to the empty string would be weird in the case of target since that cannot be the empty string after creation+initialization.
If you think this is ugly we could instead make it a "validate target and data" algorithm that only has the exception-throwing bits.
dom.bs
Outdated
| "<code>></attrs></code>". | ||
|
|
||
| <li><p>Let <var>fragment</var> be the result of invoking the | ||
| <a>fragment parsing algorithm steps</a> with <var>context</var> and <var>markup</var>. |
There was a problem hiding this comment.
I don't think it's great that the parsing rules depend on the context document. I also don't see how this is handling errors in the XML case.
There was a problem hiding this comment.
XML errors would result in a "SyntaxError" exception per https://html.spec.whatwg.org/#xml-fragment-parsing-algorithm, and thus leave the PI without any attributes.
If you don't want this to depend on the document, we could use the HTML parser unconditionally, but that would mean that <?xml-stylesheet href=style.css?> starts working in XML, and might force non-browsers that process XML documents with such PIs to parse those attribute as HTML.
I think using a different parser depending on the document is the most consistent with existing APIs like innerHTML, and avoids affecting existing content as much as possible.
There was a problem hiding this comment.
But this is not a context where we can rethrow the exception in so we'd have to handle it. But I don't think we need to use the XML parser.
There was a problem hiding this comment.
Oh, I see what you're saying.
If we don't use the XML parser, do you mean always use the HTML parser, or something else?
There was a problem hiding this comment.
I was thinking that, but it always lowercases attributes so I don’t think we can without an additional flag or some such. Not sure what would be best.
There was a problem hiding this comment.
I guess the option that would be most consistent with elements is that we don't have a constructor at all and switch the parser based on the node document.
If we do want a parser (seems nice) then we could also consider an "XML parser-created" flag that we set and use the XML parser only in that case.
There was a problem hiding this comment.
I think it's similar to document.createElement() case-folding in HTML but not in XML. Whether to support a constructor or not impacts DX a bit, but imo doesn't need to affect the logic for HTML vs XML parser.
Using XML parser in XML docs and HTML parser in HTML docs seems reasonable to me, if we want to keep xml-stylesheet behavior as-is. I assume some folks would be upset if browsers start to use an HTML parser to parse the xml-stylesheet pseudo-attributes in XML documents.
There was a problem hiding this comment.
Well we are extending what processing instructions can do so I think we should do that consistently across the syntaxes. Having subtle differences depending on how you got your processing instruction or what document it is currently in is not great, even if there is some precedent with other nodes.
I don't think people will be upset that browsers support some new ways to spell xml-stylesheet pseudo-attributes. XSLT removal is the significant change there.
And document.createElement() is rather terrible for its lowercasing as well by the way. That behavior has costed us all a lot of time trying to get it right and nail it down.
There was a problem hiding this comment.
Hmmmm. Yeah I guess it would work to always use the HTML parser, even for xml-stylesheet in XML documents. There can still be a document conformance rule that xml-stylesheet PIs in XML documents conform to the XML syntax.
Without XSLT support, xml-stylesheet will only be useful for CSS, which is probably less interesting for non-browser XML tools.
There was a problem hiding this comment.
If we only use the HTML parser, we can consider starting the parser in the right state and stopping parsing when it would switch to the data state.
Add setAttribute/getAttribute/hasAttribute/removeAttribute etc. to ProcessingInstruction. They act similarly to attributes on elements, however they don't produce attribute mutation records. See spec PR: whatwg/dom#1454 Bug: 431374376 Change-Id: I659e772f0fb11a6ffe2fe3876cf61ee0f6ad042a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7607900 Reviewed-by: Philip Jägenstedt <foolip@chromium.org> Commit-Queue: Noam Rosenthal <nrosenthal@google.com> Cr-Commit-Position: refs/heads/main@{#1594810}
dom.bs
Outdated
|
|
||
| <li><p><a for=list>For each</a> <var>attribute</var> of <var>element</var>'s | ||
| <a for=Element>attribute list</a>, set <var>pi</var>'s | ||
| <a for=ProcessingInstruction>attribute map</a>[<var>attribute</var>'s <a for=Attr>local name</a>] |
There was a problem hiding this comment.
Using the local name here means ignoring the namespace, but with input like <?target attr="value" xmlns:bla="..." bla:attr="blavalue"?> in XML, the second attribute would clobber the first and the result would be a single attribute with value "blavalue". That seems worse than for example erroring if the namespace is not null.
This seems like something @zcorpan might have opinions on :)
There was a problem hiding this comment.
What happens for these cases today with xml-stylesheet? We also have to think about what happens here when you serialize and parse again. This strikes me as another reason not to rely on the XML parser.
There was a problem hiding this comment.
https://software.hixie.ch/utilities/js/live-dom-viewer/saved/14573
I'd expect the qualified name, so that bla:href is not treated as the href pseudo-attribute.
element.getAttribute() also uses qualified name.
There was a problem hiding this comment.
I don't actually understand how this works with the XML parser. If you have x:href and href as pseudo-attributes in whatever order and no xmlns attribute, x:href should be an irrecoverable error, no?
There was a problem hiding this comment.
Just x:href would be an error, yes, but if xmlns:x="..." is also used, it would parse.
@zcorpan thanks for the test case, that shows that the line I've commented on does need to change. We could use the qualified name, but we could just as well skip any attributes that aren't in the null namespace, possibly also making use of namespaces here non-conforming.
There was a problem hiding this comment.
Huh, https://software.hixie.ch/utilities/js/live-dom-viewer/saved/14573 uses single quotes, which made me wonder. I put <?xml-stylesheet href='style.css'?> into an XHTML document and that already works in Chrome, Firefox and Safari.
Not sure why that works.
There was a problem hiding this comment.
But <?xml-stylesheet x:href="" href="..."?> "works" fine today in browsers.
There was a problem hiding this comment.
Ignore the "huh" about single quotes, that's valid XML syntax and I just never knew...
But
<?xml-stylesheet x:href="" href="..."?>"works" fine today in browsers.
Oh, I didn't realize that worked. It looks like the reason in Chromium is that this code path uses a SAX parser with a "start element" handler that just ignores namespaces. That's not possible to specify using the XML fragment parser like I attempted here.
One option is to measure how common this is and then change implementations to be stricter about this.
But this could all be moot if we don't use the XML parser at all, so let me try to summarize that discussion first.
|
We need to decide which parser is used to parse attributes from
Option 2 seems like the most plausible to me right now, if we decide to make @annevk WDYT? |
|
I'd prefer 4-5 over 1-2 personally. And between 3-5 it's between 3 and 5. It will have to be somewhat custom I think whatever we pick because of casing and I don't think the differences of 1 have served us well for other features. I don't see a real need to carry them over. 2 seems very weird. I don't think we should care about the exact requirements of https://www.w3.org/TR/xml-stylesheet/ but it does expect |
|
Thanks @annevk! Always using the HTML parser is straightforward in spec and implementation, the only thing that's not ideal about is that the more lenient parsing of With a bespoke parser, that's mostly straightforward with "collect a sequence of code points", but is there a way to decode character entities in attribute values that doesn't duplicate all of the tokenizer states character entities? Or could we do something simpler that ends up diverging slightly from how regular attribute values are parsed? |
|
So the concrete worry is that people creating dedicated XML content today would start relying on more leniency in browsers and this would have downstream repercussions? I think that's an acceptable risk. We have evolved XML (with If we can't reuse the HTML parser directly or want to create our own, following what https://w3c.github.io/webvtt/#webvtt-cue-text-tokenizer did seems somewhat reasonable, although it would be better with a proper hook in HTML. |
|
I've changed this to always use the HTML parser now. The constraints of wanting to support On lowercasing, I think this behavior might still be surprising: pi = new ProcessingInstruction('target', 'FOO=v1');
pi.getAttribute('foo') == 'v1'; // normal in HTML, no surprise
pi.setAttribute('FOO', 'v2');
pi.getAttribute('foo') == 'v1' && pi.getAttribute('FOO') == 'v2'; // unlike HTML, but OK
pi.data += ' ';
pi.getAttribute('foo') == 'v1' && !pi.hasAttribute('FOO'); // collapsed after roundtrip, surprise?We could (1) live with it (2) lowercase in all APIs or (3) use a bespoke parser that preserves attribute case. Any of these would be OK for me. If we stick with the HMTL parser, there's also a layering question remaining. Both "Parse HTML from a string" and "escaping a string" would need to be exported in HTML for this to work as-is, but I think it would be cleaner to add and export "parse attributes" and "serialize attributes" to HTML for this purpose. |
Always lowercasing in the PI attribute API makes sense to me. |
|
Either everything needs to canonicalize to lowercase or we pass a flag to the HTML parser that tells it to preserve case (and don't do anything with casing in the API of course). It feels odd for it to lowercase, but I could live with it and it would certainly be easier. Whether it's surprising probably depends on the kind of web developer. |
|
I agree that the sensible options are to either lowercase everywhere or preserve case everywhere. I've gone with the former for now, but could live with the parser change too. |
…tructions, a=testonly Automatic update from web-platform-tests Support attribute API for processing instructions Add setAttribute/getAttribute/hasAttribute/removeAttribute etc. to ProcessingInstruction. They act similarly to attributes on elements, however they don't produce attribute mutation records. See spec PR: whatwg/dom#1454 Bug: 431374376 Change-Id: I659e772f0fb11a6ffe2fe3876cf61ee0f6ad042a Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/7607900 Reviewed-by: Philip Jägenstedt <foolip@chromium.org> Commit-Queue: Noam Rosenthal <nrosenthal@google.com> Cr-Commit-Position: refs/heads/main@{#1594810} -- wpt-commits: 57fc0105def90d26b060cbf58232ddd09fe622c7 wpt-pr: 58294
that's what any other attribute on HTML space does though, right? document.body.setAttribute('WUT', 123);
document.body.getAttribute('wut'); // "123"edit I am not saying for developers it's OK to abuse lower-casing, I am just saying for "some" developer that breaks expectations |
|
I'm OK with using the HTML parser and lowercasing in the API. |
(See WHATWG Working Mode: Changes for more details.)