As part of #1290, @GreyCat wrote the following in the newly added Encoding name (encoding) section of the style guide - ksy_style_guide.adoc:291-292:
UTF-16 and UTF-32 (without an explicit byte order suffix) are deliberately not accepted.
Technically, they are accepted by the compiler - only a warning is issued (which BTW isn't even visible in the Web IDE, because the JS build of the compiler only throws the first error as an exception and ignores warnings: kaitai-io/kaitai_struct_compiler@8913518):
utf16.ksy
meta:
  id: utf16
seq:
  - id: foo
    type: strz
    encoding: UTF-16
$ kaitai-struct-compiler -t python utf16.ksy
utf16.ksy: /seq/0/encoding:
warning: unrecognized encoding name 'UTF-16'
Perhaps UTF-16 and UTF-32 shouldn't be just "unrecognized", but the compiler should know about them and explicitly ban them? This was suggested in the past, see #391 (comment):
And, while we're there, some encoding names should be definitely banned, for example, utf16 and ucs2 (as it lacks information on endianness and relies on current platform's native endianness) and I'm somewhat reluctant about ucs2le and ucs2be (as it's kind of hard to find true UCS2 encoding parser, not UTF16 one).
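To make the endianness problem concrete, here is a small Python sketch (not taken from the compiler or any runtime) that decodes the same two bytes with each explicit byte order. Neither result is "wrong"; without a suffix or a BOM there is simply no way to tell which one was intended:

```python
# The same two bytes decode to different text depending on the byte order.
data = "A".encode("utf-16-le")      # two bytes: 0x41 0x00

as_le = data.decode("utf-16-le")    # reads 0x0041, i.e. "A"
as_be = data.decode("utf-16-be")    # reads 0x4100, an unrelated CJK character

print(repr(as_le), repr(as_be))
```

A bare `UTF-16` forces each target language's library to pick one of these on its own (BOM sniffing, platform-native order, or a fixed default), so the same .ksy could behave differently across targets.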
In #393, they would fall into the "black list" (note: I'm not entirely sure why this box was checked, because we don't have any explicit list of prohibited encodings):
Personally, I wouldn't block all unknown encodings by default (because that could screw someone over if they're using a legitimate encoding that isn't on our list), but if we recognize that it's specifically an unwanted encoding like UTF-16 (or its alias like utf16), then I think it's fine to throw an error. We should probably have an error message specifically for UTF-16 and UTF-32 that includes a clear explanation for the ban, because people who try to use these names most likely have no idea that they are ambiguous.
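A rough sketch of what such a check could look like, written in Python purely for illustration. The names `check_encoding`, `AmbiguousEncodingError` and `AMBIGUOUS` are hypothetical and do not exist in the compiler, and the error message wording is only a suggestion:

```python
import re

# Hypothetical deny list: normalized alias -> canonical name shown to the user.
AMBIGUOUS = {"utf16": "UTF-16", "utf32": "UTF-32"}


class AmbiguousEncodingError(ValueError):
    """Raised for encoding names that do not specify a byte order."""


def normalize(name: str) -> str:
    # Treat "UTF-16", "utf16" and "utf_16" as the same name.
    return re.sub(r"[^a-z0-9]", "", name.lower())


def check_encoding(name: str) -> str:
    canonical = AMBIGUOUS.get(normalize(name))
    if canonical is not None:
        raise AmbiguousEncodingError(
            f"encoding '{name}' is ambiguous: it does not specify a byte order; "
            f"use '{canonical}LE' or '{canonical}BE' instead"
        )
    # Anything else, including unknown names, is passed through unchanged,
    # so legitimate encodings missing from our list keep working.
    return name
```

With this shape, `check_encoding("UTF-16LE")` and `check_encoding("SJIS")` return the name as-is, while `check_encoding("utf16")` raises an error that explains the problem and names both replacements.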
As I mentioned in #1290 (comment), this will affect some users if they want these specs to keep working after we implement this: Code search results: -org:kaitai-io path:*.ksy /encoding: utf-16$/