Skip to content

fix: BitMaskedArray mask inversion and placeholder detection#4092

Open
henryiii wants to merge 4 commits into
mainfrom
henryiii/fix-bitmasked-python
Open

fix: BitMaskedArray mask inversion and placeholder detection#4092
henryiii wants to merge 4 commits into
mainfrom
henryiii/fix-bitmasked-python

Conversation

@henryiii

@henryiii henryiii commented Jun 10, 2026

Copy link
Copy Markdown
Member

🤖 AI text below 🤖

Summary

Three silent bugs fixed across BitMaskedArray, ByteMaskedArray, and UnionArray:

1. BitMaskedArray.to_BitMaskedArray — bitmask inversion (data corruption)
src/awkward/contents/bitmaskedarray.py:443 used nplike.logical_not(self._mask.data) — a boolean byte-level NOT — to flip valid_when when lsb_order is unchanged. This corrupts packed bitmasks: e.g. 0b10011010 becomes [False] rather than the correct bitwise inverse 0b01100101. Changed to ~self._mask.data (bitwise invert), matching the lsb_order-change branch at line 459.

2. BitMaskedArray._unique — spurious out._content unwrap
src/awkward/contents/bitmaskedarray.py:659-668 had a conditional that returned out._content when negaxis is not None, stripping the option-type wrapper from the result. The sibling ByteMaskedArray._unique (bytemaskedarray.py) returns out unchanged. Removed the branch to match ByteMaskedArray semantics.

3. _is_getitem_at_placeholder checks Index wrapper instead of buffer
Three methods checked isinstance(self._mask/tags/index, PlaceholderArray) against the Index wrapper object, which is never a PlaceholderArray. Siblings ListOffsetArray (listoffsetarray.py:314) and IndexedArray (indexedarray.py:294) correctly check .data. Fixed in:

  • src/awkward/contents/bitmaskedarray.py:499-502
  • src/awkward/contents/bytemaskedarray.py:380-383
  • src/awkward/contents/unionarray.py:587-591

Test plan

  • Regression tests in tests/test_4092_bitmasked_fixes.py: one test per bug fixed (Fix 1: inversion, Fix 2: unique consistency)
  • Existing BitMasked tests pass: test_3033_flatten_bitmaskedarray.py, test_0127_tomask_operation.py, test_1260_simplify_masked_option_types.py
  • All changes linted with prek -a --quiet (clean)

AI-assisted contribution: fixes identified and implemented with Claude Code (Anthropic), from an automated multi-agent review.

henryiii added a commit that referenced this pull request Jun 10, 2026
Covers to_BitMaskedArray mask inversion, _unique consistency between
BitMasked and ByteMasked option types, and _is_getitem_at_placeholder
buffer check propagation.

Assisted-by: ClaudeCode:claude-sonnet-4-6
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
6304 1 6303 34
View the top 1 failed test(s) by shortest run time
tests/test_0404_array_validity_check.py::test_BitMaskedArray
Stack Traces | 0.006s run time
def test_BitMaskedArray():
        content = ak.contents.NumpyArray(np.arange(13))
        mask = ak.index.IndexU8(np.array([58, 59], dtype=np.uint8))
        array = ak.contents.BitMaskedArray(
            mask, content, valid_when=False, length=13, lsb_order=False
        )
        assert np.asarray(array.mask_as_bool(valid_when=True)).tolist() == [
            True,
            True,
            False,
            False,
            False,
            True,
            False,
            True,
            True,
            True,
            False,
            False,
            False,
        ]
        assert np.asarray(array.to_ByteMaskedArray().mask).tolist() == [
            0,
            0,
            1,
            1,
            1,
            0,
            1,
            0,
            0,
            0,
            1,
            1,
            1,
        ]
        assert np.asarray(array.to_IndexedOptionArray64().index).tolist() == [
            0,
            1,
            -1,
            -1,
            -1,
            5,
            -1,
            7,
            8,
            9,
            -1,
            -1,
            -1,
        ]
        assert to_list(array) == [
            0,
            1,
            None,
            None,
            None,
            5,
            None,
            7,
            8,
            9,
            None,
            None,
            None,
        ]
        assert ak._do.is_unique(array, axis=None) is True
        assert to_list(ak._do.unique(array, axis=None)) == [0, 1, 5, 7, 8, 9, None]
>       assert to_list(ak._do.unique(array, axis=-1)) == [0, 1, 5, 7, 8, 9, None]
E       assert [[0, 1, 5, 7, 8, 9, None]] == [0, 1, 5, 7, 8, 9, None]
E         
E         At index 0 diff: [0, 1, 5, 7, 8, 9, None] != 0
E         Right contains 6 more items, first extra item: 1
E         
E         Full diff:
E           [
E         +     [
E         +         0,
E         +         1,
E         +         5,
E         +         7,
E         +         8,
E         +         9,
E         +         None,
E         -     0,
E         ?     ^
E         +     ],
E         ?     ^
E         -     1,
E         -     5,
E         -     7,
E         -     8,
E         -     9,
E         -     None,
E           ]

array      = <BitMaskedArray valid_when='false' lsb_order='false' len='13'>
    <mask><Index dtype='uint8' len='2'>
        [58 59]
    </Index></mask>
    <content><NumpyArray dtype='int64' len='13'>
        [ 0  1  2  3  4  5  6  7  8  9 10 11 12]
    </NumpyArray></content>
</BitMaskedArray>
content    = <NumpyArray dtype='int64' len='13'>
    [ 0  1  2  3  4  5  6  7  8  9 10 11 12]
</NumpyArray>
mask       = <Index dtype='uint8' len='2'>
    [58 59]
</Index>

tests/test_0404_array_validity_check.py:81: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

henryiii and others added 4 commits June 11, 2026 15:39
- Use bitwise `~` instead of `logical_not` in `to_BitMaskedArray` when
  inverting `valid_when` with the same `lsb_order`, matching the
  different-lsb_order branch and preventing silent data corruption.
- Remove erroneous `out._content` unwrap in `BitMaskedArray._unique` so
  it matches `ByteMaskedArray._unique` semantics (both now delegate to
  `IndexedOptionArray64._unique` and return the result unchanged).
- Fix `_is_getitem_at_placeholder` in `BitMaskedArray`, `ByteMaskedArray`,
  and `UnionArray` to check `.data` on the Index wrapper, consistent with
  `ListOffsetArray` and `IndexedArray` siblings.

Assisted-by: ClaudeCode:claude-sonnet-4-6
Covers to_BitMaskedArray mask inversion, _unique consistency between
BitMasked and ByteMasked option types, and _is_getitem_at_placeholder
buffer check propagation.

Assisted-by: ClaudeCode:claude-sonnet-4-6
Remove redundant round-trip, big-endian, and ByteMaskedArray tests that
don't cover changed code paths. Remove placeholder tests that pass even
without the fix. Rename file to use underscores per project convention.
Two focused tests remain: one per distinct bug fixed.

Assisted-by: ClaudeCode:claude-sonnet-4-6
@henryiii henryiii force-pushed the henryiii/fix-bitmasked-python branch from a2418ab to 624fdb2 Compare June 11, 2026 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant