😎 Nice library. 💡 Supporting bitstring.Array("uintle12") for my data array would be useful. 🙏
Version: bitstring==4.3.0
Urgency: non-blocking. I figured out an inefficient workaround to swap endianness BE<->LE manually, and although it has some limitations and is not fully general, my data happens to fit those constraints.
Problem
I have some 12-bit graphics data stored in little-endian layout (other examples in the wild include FAT12 tables on floppy disk images), and I tried this library on it yesterday. Alas, bitstring.Array appears to support only big-endian layout for 12-bit data 🤔. The raw bytes that bitstring.Array's tobytes produces for "uint12" on my x86 machine are clearly big-endian (like TCP/IP field layout): the first element's 8 MSBs are stored in byte[0] and its 4 LSBs in the high nibble of byte[1], then the second element's 4 MSBs go in the low nibble of byte[1] and its 8 LSBs in byte[2].
[Figure: desired little-endian element layout 🙂]
[Figure: actual element layout of "uint12", which is big-endian 🙃]
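To make the two layouts concrete, here is a minimal pure-Python sketch (no bitstring involved; pack12_be, pack12_le and unpack12_le are hypothetical helper names, and inputs are assumed to be in 0..4095). It packs one pair of 12-bit values into 3 bytes each way. The LE variant is the same arrangement FAT12 uses for adjacent entries.

```python
def pack12_be(a: int, b: int) -> bytes:
    """Pack two 12-bit values MSB-first (the layout "uint12" produces)."""
    return bytes([
        a >> 4,                       # byte[0]: a's 8 MSBs
        ((a & 0xF) << 4) | (b >> 8),  # byte[1]: a's 4 LSBs (high nibble), b's 4 MSBs (low nibble)
        b & 0xFF,                     # byte[2]: b's 8 LSBs
    ])


def pack12_le(a: int, b: int) -> bytes:
    """Pack two 12-bit values LSB-first (FAT12-style little-endian layout)."""
    return bytes([
        a & 0xFF,                     # byte[0]: a's 8 LSBs
        ((b & 0xF) << 4) | (a >> 8),  # byte[1]: a's 4 MSBs (low nibble), b's 4 LSBs (high nibble)
        b >> 4,                       # byte[2]: b's 8 MSBs
    ])


def unpack12_le(data: bytes) -> tuple[int, int]:
    """Inverse of pack12_le for one 3-byte group."""
    a = data[0] | ((data[1] & 0x0F) << 8)
    b = (data[1] >> 4) | (data[2] << 4)
    return a, b


print(pack12_be(0xABC, 0xDEF).hex())  # abcdef
print(pack12_le(0xABC, 0xDEF).hex())  # bcfade
```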
Note
While older EGA/VGA-based images used BE layout (the low-index pixel in the high-order bits and the high-index pixel in the low-order bits), newer formats on video game consoles and machine learning tensors follow the convention that higher-index elements are stored in higher bits.
This applies to any element bit size that isn't a multiple of 8: BE (TCP/IP) fills bits from MSB to LSB, and LE (x86) fills them from LSB to MSB. So even oddities like uintle3 or uintle5 should work consistently, though for my needs 2-bit, 4-bit, and 12-bit are the most important.
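The same rule generalizes to any width if the whole array is treated as one big integer. Below is a minimal stdlib-only sketch (pack and unpack are hypothetical names, not bitstring API). LE places element i at bit offset i*width counting up from the LSB of byte 0. BE fills from the MSB of byte 0 downward. It assumes that any trailing pad bits in the last byte are zero.

```python
def pack(values: list[int], width: int, little: bool) -> bytes:
    """Pack unsigned `width`-bit values into bytes, LSB->MSB (LE) or MSB->LSB (BE)."""
    total = len(values) * width
    nbytes = (total + 7) // 8
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError(f"value {v} does not fit in {width} bits")
        if little:
            acc |= v << (i * width)   # element i occupies bits [i*width, (i+1)*width)
        else:
            acc = (acc << width) | v  # earlier elements end up in higher bits
    if little:
        return acc.to_bytes(nbytes, "little")
    acc <<= nbytes * 8 - total        # move BE data to the front, zero pad bits at the end
    return acc.to_bytes(nbytes, "big")


def unpack(data: bytes, width: int, count: int, little: bool) -> list[int]:
    """Inverse of pack for `count` elements."""
    mask = (1 << width) - 1
    if little:
        acc = int.from_bytes(data, "little")
        return [(acc >> (i * width)) & mask for i in range(count)]
    acc = int.from_bytes(data, "big")
    top = len(data) * 8
    return [(acc >> (top - (i + 1) * width)) & mask for i in range(count)]


print(pack([1, 2, 3, 4, 5], 3, little=True).hex())   # d158 ("uintle3"-style)
print(pack([1, 2, 3, 4, 5], 3, little=False).hex())  # 29ca (MSB-first, like "uint3")
```

The single-integer formulation is just a compact way to express the index mapping. A real implementation would more likely work byte-by-byte.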
Workarounds
Endianness can be thought of more generically as a mapping between logical bit indices and physical bit indices, not simply "how bytes are arranged in a word". Seen that way, it's possible to convert between LE and BE by reversing the order of all the elements and then reversing the order of all the bytes. So currently I call swapBE8toLE8 just before serializing back out to the file. It would be nicer to handle this directly in the array via native "uintle#" support. That would avoid extra memory rewrites/copies. It would also remove the risk of forgetting which layout the array is in as it gets passed around to other parts of the program, and avoid boundary issues such as a total bit count that is not a multiple of 8.
```python
swapBE8toLE8(outputArray)  # just before writing outputArray back to the file
...
```

```python
import bitstring


def swapBE8toLE8(array: bitstring.Array) -> None:
    """Rearrange an Array's big-endian ("uint12"-style) bit layout into little-endian, in place."""
    if array.itemsize % 8 == 0:
        # Faster shortcut for byte-size elements, which work directly
        # (but the "else" branch below would work too).
        array.byteswap()
    else:
        # Slower workaround to swap the endianness layout for non-byte multiples.
        # This still requires the total bit count to be a multiple of 8,
        # because otherwise the byte reversal fails on the fractional trailer,
        # whereas a direct implementation would not have that issue.
        originalDtype = array.dtype
        array.reverse()             # reverse the element order
        array.dtype = "uint8"       # reinterpret the same bits as bytes
        array.reverse()             # reverse the byte order
        array.dtype = originalDtype


def swapLE8toBE8(array: bitstring.Array) -> None:
    """Inverse of swapBE8toLE8: little-endian bit layout back to big-endian, in place."""
    if array.itemsize % 8 == 0:
        # Byte-size elements: a plain byteswap is its own inverse.
        array.byteswap()
    else:
        # Same constraint as above: total bit count must be a multiple of 8.
        originalDtype = array.dtype
        array.dtype = "uint8"       # reverse the byte order first...
        array.reverse()
        array.dtype = originalDtype
        array.reverse()             # ...then the element order
```
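To check the two-step workaround without bitstring, here is a stdlib-only model of it (all function names are hypothetical). Bytes are expanded into an MSB-first bit list, which is the stream order bitstring uses. reverse_chunks then plays the role of Array.reverse() for a given item size and, like it, refuses a length with a fractional trailer. The result should match the FAT12-style LE bytes.

```python
def to_bits(data: bytes) -> list[int]:
    """Bytes -> MSB-first bit list."""
    return [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]


def from_bits(bits: list[int]) -> bytes:
    """MSB-first bit list (length a multiple of 8) -> bytes."""
    return bytes(
        sum(bit << (7 - k) for k, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )


def reverse_chunks(bits: list[int], size: int) -> list[int]:
    """Reverse the order of `size`-bit chunks, keeping the bits inside each chunk."""
    if len(bits) % size:
        raise ValueError("bit length must be a multiple of the chunk size")
    chunks = [bits[i:i + size] for i in range(0, len(bits), size)]
    return [b for chunk in reversed(chunks) for b in chunk]


def be_to_le(data: bytes, width: int) -> bytes:
    """Model of swapBE8toLE8: reverse elements, then reverse bytes."""
    return from_bits(reverse_chunks(reverse_chunks(to_bits(data), width), 8))


def le_to_be(data: bytes, width: int) -> bytes:
    """Model of swapLE8toBE8: reverse bytes, then reverse elements."""
    return from_bits(reverse_chunks(reverse_chunks(to_bits(data), 8), width))


be = bytes.fromhex("abcdef")    # "uint12" layout of [0xABC, 0xDEF]
le = be_to_le(be, 12)
print(le.hex())                 # bcfade
print(le_to_be(le, 12).hex())   # abcdef
```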
Note
These two functions are distinct: you can't just call swapBE8toLE8 a second time on the same data to undo it. Reversing the elements and reversing the bytes don't commute in general, so applying the same permutation again to LE data does not bring it back to BE. Instead it produces yet another arrangement, a bit like BE8 vs. BE16 (so-called "middle endian") orderings. The two orders do coincide when the element bit size is a multiple of the minimal address unit size (8 bits on most architectures; the PDP-11's "NUXI" middle-endian layout is the classic case of 16-bit words acting as the unit instead). Calling swapBE8toLE8 twice on the same array then restores the original data.
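The non-commutation is easy to see on labeled bit positions instead of real data. This stdlib-only sketch uses hypothetical names that mirror the two functions above. It applies the swapBE8toLE8 steps twice and checks whether the original order comes back. For two 12-bit elements it does not, for two 16-bit elements it does, and the swapLE8toBE8 order (bytes first, then elements) is always the true inverse.

```python
def reverse_chunks(seq: list[int], size: int) -> list[int]:
    """Reverse the order of `size`-long chunks, keeping each chunk's contents."""
    chunks = [seq[i:i + size] for i in range(0, len(seq), size)]
    return [x for chunk in reversed(chunks) for x in chunk]


def swap_be8_to_le8(seq: list[int], width: int) -> list[int]:
    return reverse_chunks(reverse_chunks(seq, width), 8)  # elements, then bytes


def swap_le8_to_be8(seq: list[int], width: int) -> list[int]:
    return reverse_chunks(reverse_chunks(seq, 8), width)  # bytes, then elements


def twice_is_identity(width: int, count: int) -> bool:
    bits = list(range(width * count))  # label every bit position
    return swap_be8_to_le8(swap_be8_to_le8(bits, width), width) == bits


print(twice_is_identity(12, 2))  # False
print(twice_is_identity(16, 2))  # True
bits12 = list(range(24))
print(swap_le8_to_be8(swap_be8_to_le8(bits12, 12), 12) == bits12)  # True
```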
Important
I've often seen the belief that endianness is purely an architectural hardware trait describing how bytes are arranged within a word, but that isn't the complete picture. Once you consider units that straddle byte boundaries (and read the architectural diagrams for TCP/IP, or documents like the GenICam Pixel Format Naming Convention), you realize that endianness also implies the direction in which bitfields flow within and across bytes. The reason is efficiency: it makes the most sense for BE architectures to store fields MSB->LSB and for LE architectures to store them LSB->MSB. That way you avoid a lot of bit slicing and masking/OR'ing, especially when reading word units larger than a byte and progressively shifting bits out.
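A minimal sketch of the shift-register readers this argument has in mind (function names are hypothetical, stdlib only). With LSB-first fields, each new byte is simply ORed in above the bits already buffered, which is what loading a wider little-endian word gives you, and fields are peeled off the bottom. With MSB-first fields, new bytes are shifted in below and fields are taken from the top, which pairs the same way with big-endian loads. Neither needs per-field reassembly of split nibbles.

```python
from collections.abc import Iterator


def read_fields_lsb_first(data: bytes, width: int) -> Iterator[int]:
    """LE-style reader: append bytes above the buffer, take fields from the bottom."""
    acc, nbits, mask = 0, 0, (1 << width) - 1
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= width:
            yield acc & mask
            acc >>= width
            nbits -= width


def read_fields_msb_first(data: bytes, width: int) -> Iterator[int]:
    """BE-style reader: append bytes below the buffer, take fields from the top."""
    acc, nbits, mask = 0, 0, (1 << width) - 1
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width:
            nbits -= width
            yield (acc >> nbits) & mask
            acc &= (1 << nbits) - 1  # drop the bits just consumed


print([hex(v) for v in read_fields_lsb_first(bytes.fromhex("bcfade"), 12)])  # ['0xabc', '0xdef']
print([hex(v) for v in read_fields_msb_first(bytes.fromhex("abcdef"), 12)])  # ['0xabc', '0xdef']
```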
Related
Python indexing vs. bit indexing #156 feels distinct, as this one is about adding a dtype to bitstring.Array, and bitstring.options.lsb0 = True doesn't solve the issue anyway.
Tried
bitstring.Array("uintle12") yields ValueError: Inappropriate Dtype for Array: 'uintle12'.
bitstring.Array("<uint12") yields ValueError: Inappropriate Dtype for Array: '<uint12'.
bitstring.options.lsb0 = True just seems to reverse the element direction while still keeping the actual layout across bytes BE.
Feature request
Please add dtypes for uintle12 (plus uintbe12 for symmetry, as an alias of the current uint12) when/if you have time.
Additionally, I have image data in LE 2-bits-per-pixel and 4-bits-per-pixel formats that would be nice to work with, but my 4bpp image array...
[Figure: 4bpp image array as intended]
...instead looks like:
[Figure: 4bpp image array as actually laid out]
Supporting "uintle4" and "uintle2" would remedy that.
Mathematically, for 2bpp:
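A sketch of the 2bpp relationship, with B the byte array and p_i the i-th pixel (names chosen here for illustration). LE takes pixel i from the low end of its byte, BE from the high end:

```latex
p_i^{\mathrm{LE}} = \bigl(B[\lfloor i/4 \rfloor] \gg 2\,(i \bmod 4)\bigr) \mathbin{\&} 3,
\qquad
p_i^{\mathrm{BE}} = \bigl(B[\lfloor i/4 \rfloor] \gg (6 - 2\,(i \bmod 4))\bigr) \mathbin{\&} 3
```

For example, the byte 0xE4 decodes to pixels 0, 1, 2, 3 under LE and 3, 2, 1, 0 under BE. The 4bpp case is the same with \lfloor i/2 \rfloor, shifts of 4(i mod 2) (LE) or 4 - 4(i mod 2) (BE), and a mask of 15.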
A related issue (…) about BitString adding intle is marked completed, but this one is about bitstring.Array.
🫡 Thanks from Redmond, Washington.