bitstring.Array('uintle12')?

## TLDR

😎 Nice library.💡 Supporting `bitstring.Array("uintle12")` for my data array would be useful. 🙏

**Version**: bitstring==4.3.0
**Urgency**: non-blocking, as I figured out an inefficient workaround to swap endianness BE<->LE manually, and although it has some limitations and is not fully general, my data happens to align nicely to those constraints.

## Problem

I have some 12-bit graphics data stored in little-endian layout (other occurrences found in the wild include FAT-12 tables stored on floppy disk images), and I tried this library on it yesterday. Alas, `bitstring.Array` appears to support only *big-endian* layout for 12-bit data 🤔. The raw bytes that `bitstring.Array`'s `tobytes` returns for `"uint12"` on my x86 machine are clearly big endian (like TCP/IP field layout): the first element's 8 MSBs are stored in byte[0], its 4 LSBs in the high nibble of byte[1], then the second element's 4 MSBs in the low nibble of byte[1], and its 8 LSBs in byte[2]:

Desired little-endian element layout 🙂:
```
Absolute bit index:  00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
Dword:               [---------------------------------------------00----------------------------------------------] ...
Byte:                [---------00----------] [---------01----------] [---------02----------] [---------03----------] ...
Bit in byte:         00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 ...

Element index:       [---------------00----------------] [---------------01----------------] [---------------02----- ...
Bit of element:      00 01 02 03 04 05 06 07 08 09 10 11 00 01 02 03 04 05 06 07 08 09 10 11 00 01 02 03 04 05 06 07 ...
```

Actual element layout of "uint12", which is big-endian 🙃:
```
Absolute bit index:  00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
Dword:               [---------------------------------------------00----------------------------------------------] ...
Byte:                [---------00----------] [---------01----------] [---------02----------] [---------03----------] ...
Bit in byte:         00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 ...

Element index:       (---------00----------] (---01----] [---00----) [---------01----------) (---------02----------] ...
Bit of element:      04 05 06 07 08 09 10 11 08 09 10 11 00 01 02 03 00 01 02 03 04 05 06 07 04 05 06 07 08 09 10 11 ...
```
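
To make the two diagrams concrete, here is a minimal pure-Python sketch that packs the same two 12-bit values both ways. It does not use bitstring at all, and the helper names `pack_uint12_be` and `pack_uint12_le` are hypothetical, made up just for this illustration:

```python
def pack_uint12_be(values):
    """Pack 12-bit values MSB-first (the big-endian layout described above)."""
    acc = 0
    for v in values:
        acc = (acc << 12) | (v & 0xFFF)
    nbits = 12 * len(values)
    pad = -nbits % 8  # zero bits appended to reach a whole number of bytes
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def pack_uint12_le(values):
    """Pack 12-bit values LSB-first (the desired little-endian layout)."""
    acc = 0
    for i, v in enumerate(values):
        acc |= (v & 0xFFF) << (12 * i)
    nbits = 12 * len(values)
    return acc.to_bytes((nbits + 7) // 8, "little")


values = [0xABC, 0x123]
print(pack_uint12_be(values).hex())  # abc123
print(pack_uint12_le(values).hex())  # bc3a12
```

In the big-endian result, byte[1] (`c1`) holds the first element's 4 LSBs in its high nibble; in the little-endian result, byte[1] (`3a`) holds the first element's 4 MSBs in its low nibble.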

## Tried

- `bitstring.Array("uintle12")` yields `ValueError: Inappropriate Dtype for Array: 'uintle12'`
- `bitstring.Array("<uint12")` yields `ValueError: Inappropriate Dtype for Array: '<uint12'.`
- `bitstring.options.lsb0 = True` just seems to reverse the *element direction* while still keeping the actual *layout* across bytes in BE.

## Feature request

Please add dtypes for `uintle12` (plus `uintbe12` for symmetry as an alias of the current `uint12`) when/if you have time.

Additionally I have image data in LE 2-bits-per-pixel and 4-bits-per-pixel that would be nice to work with, but my 4bpp image array...

![Image](https://github.com/user-attachments/assets/fc7e9705-dd6a-4453-92da-9ccee01f000b)

...instead looks like:

![Image](https://github.com/user-attachments/assets/cbcdbbee-5214-4d08-acfb-b47cdf33de44)

Supporting `"uintle4"` and `"uintle2"` would remedy that:

![Image](https://github.com/user-attachments/assets/0069e0fe-3ad2-4360-a911-78d780a59c8f) -> ![Image](https://github.com/user-attachments/assets/4db8012e-933a-42f4-bcec-1f9b95368aa3)

Mathematically for 2bpp:
```
pixel[0] = (byte[0] >> 0) & 0x03 // bits 0..2
pixel[1] = (byte[0] >> 2) & 0x03 // bits 2..4
pixel[2] = (byte[0] >> 4) & 0x03 // bits 4..6
pixel[3] = (byte[0] >> 6) & 0x03 // bits 6..8
pixel[4] = (byte[1] >> 0) & 0x03 // bits 8..10
pixel[5] = (byte[1] >> 2) & 0x03 // bits 10..12
pixel[6] = (byte[1] >> 4) & 0x03 // bits 12..14
pixel[7] = (byte[1] >> 6) & 0x03 // bits 14..16
...
pixel[i] = (byte[i >> 2] >> (i * 2 & 0x07)) & 0x03
______________________________________________________________________________________________________________

Absolute bit index:  00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 ...
Dword:               [---------------------------------------------00----------------------------------------------] ...
Byte:                [---------00----------] [---------01----------] [---------02----------] [---------03----------] ...
Bit in byte:         00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 00 01 02 03 04 05 06 07 ...

Element index:       [00 ] [01 ] [02 ] [03 ] [04 ] [05 ] [06 ] [07 ] [08 ] [09 ] [10 ] [11 ] [12 ] [13 ] [14 ] [15 ] 
Bit of element:      00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 00 01 ...
```
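
The general formula above translates directly to Python. This is only a sketch to show the intended pixel order; `unpack_2bpp_le` and `unpack_2bpp_be` are hypothetical helper names, and the big-endian variant is included purely for contrast (it is my reading of the pixel order I currently get, not something taken from the library's source):

```python
def unpack_2bpp_le(data):
    """Unpack 2-bit pixels with the lowest-index pixel in the lowest bits."""
    return [(data[i >> 2] >> ((i * 2) & 0x07)) & 0x03
            for i in range(len(data) * 4)]


def unpack_2bpp_be(data):
    """Unpack 2-bit pixels with the lowest-index pixel in the highest bits."""
    return [(data[i >> 2] >> (6 - ((i * 2) & 0x07))) & 0x03
            for i in range(len(data) * 4)]


row = bytes([0xE4, 0x1B])
print(unpack_2bpp_le(row))  # [0, 1, 2, 3, 3, 2, 1, 0]
print(unpack_2bpp_be(row))  # [3, 2, 1, 0, 0, 1, 2, 3]
```

Each group of four pixels comes out mirrored between the two conventions, which matches the kind of scrambling visible in the images above.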

> [!NOTE]  
> While older EGA/VGA-based images used BE layout (where the *low-index pixel* was in the *high-index bits*, and the *high-index pixel* was in the *low-index bits*), newer formats on video game consoles and machine learning tensors follow the convention that higher-index elements are stored in higher bits.

This actually applies to *any element bit size* that isn't a multiple of 8, where BE (TCP/IP) fills from MSB to LSB, and LE (x86) fills bits from LSB to MSB, meaning that even oddities like `uintle3` or `uintle5` should work consistently too, but for my needs, 2-bit, 4-bit, and 12-bit are most important.
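
As a rough sketch of what "fills from LSB to MSB" versus "fills from MSB to LSB" means for an arbitrary element width, the following pure-Python helpers treat the whole buffer as one large integer. The names `unpack_uintle` and `unpack_uintbe` are hypothetical and only mirror the dtype names requested here; this is not how bitstring works internally:

```python
def unpack_uintle(data, bits, count):
    """Read `count` unsigned `bits`-wide items, filling from LSB to MSB."""
    if bits * count > len(data) * 8:
        raise ValueError("not enough data for the requested item count")
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (bits * i)) & mask for i in range(count)]


def unpack_uintbe(data, bits, count):
    """Read `count` unsigned `bits`-wide items, filling from MSB to LSB."""
    total = len(data) * 8
    if bits * count > total:
        raise ValueError("not enough data for the requested item count")
    acc = int.from_bytes(data, "big")
    mask = (1 << bits) - 1
    return [(acc >> (total - bits * (i + 1))) & mask for i in range(count)]


print(unpack_uintle(bytes.fromhex("bc3a12"), 12, 2))  # [2748, 291]
print(unpack_uintbe(bytes.fromhex("abc123"), 12, 2))  # [2748, 291]
print(unpack_uintle(bytes([0x88, 0x46]), 3, 5))       # [0, 1, 2, 3, 4]
```

The same two functions handle 2-bit, 4-bit, 12-bit, and odd widths like 3 or 5 without any special cases, which is the consistency I'd hope for from a `uintle#` dtype.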

## Workarounds

Since endianness can be thought of more generically as a mapping between logical bit indices and actual bit indices (not simply "how bytes are arranged in a word"), it's possible to transform between endiannesses (LE <-> BE) by reversing the order of all the elements and then reversing the order of all the bytes. So currently I call `swapBE8toLE8` before serializing back out to the file. It would be nicer to handle this directly in place with the array via direct `"uintle#"` support: no extra memory rewrites/copies, no risk of forgetting the swap as the array gets passed around to other parts of the program, and no boundary-condition issues such as a total bit count that is not a multiple of 8.

```python
import bitstring


def swapBE8toLE8(array: bitstring.Array) -> None:
    """Convert the array's data in place from BE to LE element layout."""
    if (array.itemsize % 8) == 0:
        # Faster shortcut for byte-size elements which work directly.
        # (but the "else" branch below would work too).
        array.byteswap()
    else:
        # Slower work-around to swap endianness layout for non-byte multiples.
        # This still has the constraint that the total bit count must be a multiple of 8
        # because otherwise the byte reversal fails because of the fractional trailer,
        # whereas a direct implementation would not have that issue.
        originalDtype = array.dtype
        array.reverse()        # Reverse the order of the elements.
        array.dtype = "uint8"  # Reinterpret the same bits as bytes.
        array.reverse()        # Reverse the order of the bytes.
        array.dtype = originalDtype


def swapLE8toBE8(array: bitstring.Array) -> None:
    """Convert the array's data in place from LE to BE element layout."""
    if (array.itemsize % 8) == 0:
        # Faster shortcut for byte-size elements which work directly.
        # (but the "else" branch below would work too).
        array.byteswap()
    else:
        # Slower work-around to swap endianness layout for non-byte multiples.
        # This still has the constraint that the total bit count must be a multiple of 8
        # because otherwise the byte reversal fails because of the fractional trailer,
        # whereas a direct implementation would not have that issue.
        originalDtype = array.dtype
        array.dtype = "uint8"  # Reinterpret the same bits as bytes.
        array.reverse()        # Reverse the order of the bytes.
        array.dtype = originalDtype
        array.reverse()        # Reverse the order of the elements.


# Usage, just before serializing back out to the file
# (outputArray is whatever bitstring.Array holds the data):
#     swapBE8toLE8(outputArray)
```
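
To sanity-check the idea independently of bitstring, here is a small model of the same two steps on a plain string of bits (written MSB-first per byte). The helper names `reverse_items`, `swap_be_to_le`, and `swap_le_to_be` are hypothetical; they only mimic the order of operations in the workaround functions above:

```python
def reverse_items(bits, size):
    """Reverse the order of fixed-size items without touching bits inside them."""
    if len(bits) % size != 0:
        raise ValueError("bit count is not a multiple of the item size")
    items = [bits[i:i + size] for i in range(0, len(bits), size)]
    return "".join(reversed(items))


def swap_be_to_le(bits, itemsize):
    # Reverse the elements first, then the bytes.
    return reverse_items(reverse_items(bits, itemsize), 8)


def swap_le_to_be(bits, itemsize):
    # Exact inverse: reverse the bytes first, then the elements.
    return reverse_items(reverse_items(bits, 8), itemsize)


be = format(0xABC123, "024b")   # [0xABC, 0x123] packed big-endian
le = swap_be_to_le(be, 12)
print(format(int(le, 2), "06x"))        # bc3a12
print(swap_le_to_be(le, 12) == be)      # True
print(swap_be_to_le(le, 12) == be)      # False

word = format(0x12345678, "032b")
print(swap_be_to_le(swap_be_to_le(word, 16), 16) == word)  # True
```

For 12-bit items the second call to `swap_be_to_le` does not restore the input, while for 16-bit items it does, so the two directions really do need separate functions in the non-byte-multiple case.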

> [!NOTE]  
> These two functions are distinct, and you can't just call `swapBE8toLE8` a second time on the same data to reverse it, because permuting LE to either BE8 or BE16 (so-called "middle endian") isn't always the same as *unpermuting* back to little endianness. They *are* notably symmetric when the element bit size is a multiple of the minimal address unit size (which is 8 bits on most architectures, or 16 bits on a few oddities like the NUXI PDP-11), and so calling `swapBE8toLE8` twice on the same array restores the original data then.

> [!IMPORTANT]  
> I've often seen this belief that endianness is *purely* an architectural hardware trait of how bytes are arranged within a given word unit, but this isn't a complete picture. When you think about units that straddle across bytes (and read architectural diagrams for TCP/IP or documents like the [GenICam Pixel Format Naming Convention](https://www.emva.org/wp-content/uploads/GenICam_PFNC_2_0.pdf)), you realize that endianness indirectly *also* implies the direction bitfields flow within and across each byte, because it makes the most sense (if you want any reasonable efficiency without a bunch of bit slicing and masking/or'ing, especially when reading larger word units than bytes and progressively shifting bits) for BE architectures to store fields MSB->LSB and LE to store LSB->MSB.

## Related

- https://github.com/scott-griffiths/bitstring/issues/156 feels distinct, as this one is about adding a dtype to `bitstring.Array`, and `bitstring.options.lsb0 = True` doesn't solve the issue anyway.
- https://github.com/scott-griffiths/bitstring/issues/41 is about `BitString` adding `intle` and is marked completed, but this is about `bitstring.Array`.
- https://github.com/scott-griffiths/bitstring/issues/210 might be the same issue, but I've supplied much more information, hopefully enough that it's clear.

##

🫡 Thanks from Redmond Washington.
