GPIO pin state drift on userspace exit — root-cause investigation

## Summary

GPIO pins used from userspace via `libgpiod` do not maintain a consistent state across the bootloader → kernel → userspace chain on our current BSP (L4T r36.4.4, JetPack 6.2.1). After the userspace application exits (cleanly or via crash), affected pins typically float to an unintended level (e.g. 3.3V when the customer expects 0V), which is hazardous for devices like relays controlling lights, servos, and aux power.

This issue documents the root-cause investigation. It is a compounding **three-layer problem**, not a single bug. Two of the three layers are fixable in this BSP; the third is standard upstream Linux behavior that the customer's application must accommodate.

Related: #32 (original customer-facing report — "i2s gpio overlay doesn't set default state at boot"), PR #42 (closed; BCT-only attempt, incomplete).

## Symptom

On an ARK carrier (JAJ / PAB / PAB_V3) running the currently-shipped BSP:

1. Before any userspace app runs, a pin intended as a relay output (e.g. the I2S connector pins `H,7 / I,0 / I,1 / I,2 / AC,6`) is in a floating state, not actively driven — so its measured voltage depends on external pulls and coupling.
2. The customer's app opens `/dev/gpiochipN`, requests the line as output, and drives it. This works.
3. The app exits (normal termination, SIGKILL, or crash). The pin does **not** return to the "safe" 0V state. Observed voltage is often 3.3V.
4. This state persists until reboot.

## Root cause — three-layer analysis

### Layer 1 — MB1 BCT (pre-kernel pad programming)

MB1 runs on BPMP very early in boot and programs the Tegra234 pinmux pad-control registers (base `0x02430000`) and the GPIO controller direction/value registers (main at `0x02200000`, AON at `0x0c2f0000`) from two DTSI files that live in this repo:

- `device_tree/<board>/Linux_for_Tegra/bootloader/generic/BCT/tegra234-mb1-bct-pinmux-p3767-dp-a03.dtsi`
- `device_tree/<board>/Linux_for_Tegra/bootloader/tegra234-mb1-bct-gpio-p3767-dp-a03.dtsi`

These are compiled into the MB1 BCT binary by `prebuilt/Linux_for_Tegra/kernel/pinmux/t19x/pinmux-dts2cfg.py` and flashed to the `A_MB1_BCT` / `B_MB1_BCT` QSPI partitions.

**Our current state on all three carriers:** the I2S connector pins are listed under `gpio-input` with `tristate=ENABLE, enable-input=ENABLE, pull=DOWN/UP`, so the pads come up as high-impedance inputs held only by the weak internal pull. Nothing actively drives them, and they have no defined output state until an app requests them.

**What the `pinmux-dts2cfg.py` encoder actually writes:** PADCTL bits `[1:0]=func`, `[3:2]=pull`, `[4]=tristate`, `[6]=einput`, `[8]=lpdr`, and crucially `[10]=SFIO` — the tool forces bit [10] to 0 for any pin listed in the companion `gpio-*` node of the GPIO DTSI, and to 1 for pins not listed there. So if a pin is in `gpio-output-low` with a matching pinmux entry that has `tristate=DISABLE, function=rsvd2`, the pad is actively driven low from MB1 onward. (Source: `pinmux-dts2cfg.py` lines 246–405.)
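To make the bit layout above easier to check against a raw register dump, here is a minimal decoding sketch. It assumes only the field positions listed in the paragraph above; the `PadCtl` class, the helper names, and the sample values are hypothetical, and the numeric meaning of the `func` and `pull` fields is deliberately left uninterpreted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PadCtl:
    """Decoded PADCTL fields, using the bit positions listed above."""
    func: int        # bits [1:0]
    pull: int        # bits [3:2]
    tristate: bool   # bit [4]
    einput: bool     # bit [6]
    lpdr: bool       # bit [8]
    sfio: bool       # bit [10]


def decode_padctl(value: int) -> PadCtl:
    """Split a raw 32-bit PADCTL value into its fields."""
    return PadCtl(
        func=value & 0x3,
        pull=(value >> 2) & 0x3,
        tristate=bool(value & (1 << 4)),
        einput=bool(value & (1 << 6)),
        lpdr=bool(value & (1 << 8)),
        sfio=bool(value & (1 << 10)),
    )


def is_driven_gpio(pad: PadCtl) -> bool:
    """True when the pad is in GPIO mode (SFIO=0) and not tristated."""
    return not pad.sfio and not pad.tristate


# Hypothetical sample values, not read from real hardware.
for raw in (0x054, 0x002, 0x402):
    pad = decode_padctl(raw)
    print(f"{raw:#05x}: {pad} driven_gpio={is_driven_gpio(pad)}")
```

In this sketch, `0x054` stands for a tristated input with a pull, `0x002` for a pad in GPIO mode that can be driven, and `0x402` for the same pad after bit [10] has been set.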

### Layer 2 — Tegra pinctrl SFIO regression (NVIDIA-acknowledged, fixed only in r36.5+)

This is the single most important finding.

In our kernel tree (L4T r36.4.4), `drivers/pinctrl/tegra/pinctrl-tegra.c` defines two callbacks that run on every userspace GPIO request/release cycle:

- `tegra_pinctrl_gpio_request_enable` (lines 306–330): on request, clears PADCTL bit [10] — pad enters GPIO mode.
- `tegra_pinctrl_gpio_disable_free` (lines 332–353): on release, **unconditionally sets PADCTL bit [10] back to 1** — pad re-enters SFIO mode, handing the pad to the alternate function (e.g. `rsvd2` for pins not routed to a real peripheral).

Both callbacks are gated by `pmx->soc->sfsel_in_mux`. Tegra234 SoC data sets this to `true` at `pinctrl-tegra234.c:1815` and `:1932`.

Consequence: when an app exits, the pad is switched to the alternate function. For pins the customer uses as GPIO, the alternate function is typically reserved/unconnected, and the pad ends up effectively floating — not at the BCT-programmed output-low, and not at the last userspace-written value.
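The request/release sequence described above can be illustrated with a toy model. This is not the kernel driver: the `PadModel` class and the starting values are hypothetical, and the model keeps only the behaviour stated in this section (bit [10] cleared on request, unconditionally set on release, GPIO output value untouched).

```python
SFIO_BIT = 1 << 10


class PadModel:
    """Toy model of one pad on the unpatched r36.4.4 driver."""

    def __init__(self, padctl: int, gpio_output_value: int):
        self.padctl = padctl
        self.gpio_output_value = gpio_output_value

    def gpio_request_enable(self) -> None:
        # On request: clear bit [10], pad enters GPIO mode.
        self.padctl &= ~SFIO_BIT

    def gpio_disable_free(self) -> None:
        # On release: bit [10] is set regardless of its earlier state.
        self.padctl |= SFIO_BIT

    def pad_follows_gpio(self) -> bool:
        # The GPIO block only reaches the pad while SFIO is 0.
        return not (self.padctl & SFIO_BIT)


# Hypothetical start: BCT left the pad in GPIO mode, driven low.
pad = PadModel(padctl=0x002, gpio_output_value=0)
pad.gpio_request_enable()
pad.gpio_output_value = 1        # the app drives the line high
pad.gpio_disable_free()          # the app exits, the fd is closed

print(f"padctl={pad.padctl:#05x} follows_gpio={pad.pad_follows_gpio()}")
```

After the release step the model's GPIO output value is still 1, but the pad no longer follows it, which matches the disconnect described in Layer 3.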

**NVIDIA smoking gun** — forum thread [301171](https://forums.developer.nvidia.com/t/40hdr-spi1-gpio-padctl-register-bit-10-effect-by-gpiod-tools-in-jp6/301171):

> NVIDIA engineer `lhoang`: *"When starting gpioget and gpioset, the SFIO bit is disabled by the pinctrl driver. Once gpioset and gpioget exit, the SFIO is enabled by the pinctrl driver."*
>
> NVIDIA engineer `KevinFFF`: *"Yes, this patch is used to fix the known issue we found to control GPIO in JP6. If bit 10 is set, then it would not work as Output GPIO as expected."*

NVIDIA published an official patch (see thread for the diff). The fix was upstreamed to Linux stable (v6.6.93, v6.12.31, v6.12.32) and **bundled in r36.5 / JetPack 6.2.2 by default**. Community mirror of the patch: <https://github.com/jetsonhacks/jetson-orin-gpio-patch>.

We are on **r36.4.4 (JetPack 6.2.1)**, so we ship the bug.

### Layer 3 — Linux character-device contract (standard upstream behavior)

Upstream Linux documentation, [GPIO v2 ioctl](https://docs.kernel.org/userspace-api/gpio/gpio-v2-get-line-ioctl.html):

> *"The state of a line, including the value of output lines, is guaranteed to remain as requested until the returned file descriptor is closed. Once the file descriptor is closed, the state of the line becomes uncontrolled from the userspace perspective, and may revert to its default state."*

When the customer's app exits (normal, SIGKILL, crash), the kernel's fd teardown runs `linereq_release` → `gpiod_free` → `gpiod_free_commit` (`drivers/gpio/gpiolib.c`). This path does **not** reset the Tegra GPIO block's `OUTPUT_VALUE` register — that register retains the last userspace-written level. But Layer 2 has just handed the pad to SFIO, so the value in the GPIO block is no longer connected to the pad.

NVIDIA staff `DaveYYY` on forum thread [285417](https://forums.developer.nvidia.com/t/gpio-control-on-jp6-orin-nano-and-nx/285417): *"It's the design of libgpiod itself."* Not a Jetson bug, not fixable with pinmux configuration.

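libgpiod itself documents this: <https://libgpiod.readthedocs.io/en/stable/gpioset.html>. The recommended pattern is a long-running process that keeps the line requested: `gpioset --mode=signal` (libgpiod v1.x syntax, optionally with `--background` to daemonize; in v2.x `gpioset` holds the line by default until it is killed), or a systemd service that holds the line for the system's lifetime.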
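A line-holding service could look roughly like the sketch below. It assumes the libgpiod v2 Python bindings (the `gpiod` package providing `request_lines` and `LineSettings`); the chip path, line offset, consumer name, and function names are hypothetical placeholders. The hardware-facing part is isolated in `open_real_request` so the holding logic can be exercised without a board.

```python
import signal
import threading

CHIP_PATH = "/dev/gpiochip0"   # hypothetical; pick the chip that owns the pin
LINE_OFFSET = 0                # hypothetical line offset on that chip


def open_real_request(chip_path: str, offset: int):
    """Request the line as an output driven low (assumes libgpiod v2 bindings)."""
    import gpiod  # imported lazily so this module loads without the bindings
    from gpiod.line import Direction, Value

    settings = gpiod.LineSettings(
        direction=Direction.OUTPUT, output_value=Value.INACTIVE
    )
    return gpiod.request_lines(
        chip_path, consumer="relay-holder", config={offset: settings}
    )


def hold_until_stopped(open_request, stop: threading.Event) -> None:
    """Keep the request (and its file descriptor) open until `stop` is set."""
    request = open_request()
    try:
        stop.wait()
    finally:
        # After release the line state is uncontrolled again (Layer 3).
        request.release()


def main() -> None:
    stop = threading.Event()
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, lambda *_: stop.set())
    hold_until_stopped(lambda: open_real_request(CHIP_PATH, LINE_OFFSET), stop)
```

Calling `main()` from a script started by a systemd unit keeps the line requested for as long as the service runs. This only covers the period while the process is alive, so it complements the Layer 1 and Layer 2 fixes rather than replacing them.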

## Why PR #42 was incomplete

PR #42 moved five I2S pins from `gpio-input` to `gpio-output-low` in the JAJ BCT and changed the matching pinmux entries to driven/no-pull. Closed without merge on 2026-03-18.

It only addresses Layer 1 (boot-time state), and only on JAJ. With Layer 2 still broken, the customer-visible symptom (post-exit drift) remains: as soon as the customer's app opens and releases the line once, the pad is handed to SFIO by `tegra_pinctrl_gpio_disable_free`, regardless of what BCT programmed initially. BCT is pre-kernel and is not re-applied on release.

## Fix plan

Three parts, layered to close Layers 1 and 2 completely and give the customer a documented workaround for Layer 3.

### Part A — Patch the Tegra pinctrl SFIO regression (Layer 2)

Cherry-pick NVIDIA's official patch to `drivers/pinctrl/tegra/pinctrl-tegra.{c,h}` from forum thread 301171. Wire it into `patches/` and apply it from `build_kernel.sh` alongside the existing Jetvariety patch.

This restores the correct behavior: the original PADCTL SFIO bit state is captured on request and restored on release, rather than unconditionally set to 1. Once applied, a pin that was SFIO=0 (GPIO mode) before userspace touched it stays in GPIO mode after release, and BCT-programmed output state remains connected to the pad.
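The save-and-restore behaviour described above can be sketched as a toy model. This is an illustration of the stated behaviour only, not NVIDIA's patch: the `PatchedPad` class and its attribute names are hypothetical.

```python
SFIO_BIT = 1 << 10


class PatchedPad:
    """Toy model: remember bit [10] on request, put it back on release."""

    def __init__(self, padctl: int):
        self.padctl = padctl
        self._saved_sfio = None   # hypothetical per-pin backup slot

    def gpio_request_enable(self) -> None:
        # Capture the original SFIO bit, then switch the pad to GPIO mode.
        self._saved_sfio = self.padctl & SFIO_BIT
        self.padctl &= ~SFIO_BIT

    def gpio_disable_free(self) -> None:
        # Restore whatever was there before the request.
        if self._saved_sfio is None:
            return
        self.padctl = (self.padctl & ~SFIO_BIT) | self._saved_sfio
        self._saved_sfio = None


# Hypothetical starting values: one pad in GPIO mode, one in SFIO mode.
for start in (0x002, 0x402):
    pad = PatchedPad(start)
    pad.gpio_request_enable()
    pad.gpio_disable_free()
    print(f"start={start:#05x} after_release={pad.padctl:#05x}")
```

In this model a pad that BCT left in GPIO mode is still in GPIO mode after a request/release cycle, while a pad that was genuinely in SFIO mode returns to SFIO.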

(Future work: once we rebase onto r36.5 / JetPack 6.2.2, the fix is already in the kernel tree, so this patch becomes redundant and should be dropped from `patches/`.)

### Part B — Complete the BCT configuration for customer-facing pins on all three carriers (Layer 1)

Apply PR #42's pattern (`gpio-output-low` plus `tristate=DISABLE, enable-input=DISABLE, pull=NONE`) to the same set of pins in the JAJ, PAB, and PAB_V3 BCT files. This gives every customer a defined 0V pad state from MB1 through kernel boot, rather than a floating pad.

Scope: the five I2S connector pins (`soc_gpio41_ph7`, `soc_gpio42_pi0`, `soc_gpio43_pi1`, `soc_gpio44_pi2`, `soc_gpio59_pac6`) as the first-pass target. Future passes can extend to other customer-configurable pins on a per-carrier basis.

### Part C — Documentation for Layer 3

Add a doc (`docs/gpio.md` or similar) explaining:
- The three-layer model and what each layer is responsible for.
- Why BCT `gpio-output-low` holds at boot but does not re-assert after userspace release.
- How to configure alternate pins for customer applications (pinmux dtsi + gpio dtsi pattern, partial MB1 BCT flash command `flash.sh -k A_MB1_BCT ...`).
- Recommended userspace patterns: `gpioset --mode=signal`, systemd unit holding the line, or external pull-down resistors for safety-critical outputs.
- When to use `gpio-hog` (pins customers should never touch — it blocks `libgpiod` with `EBUSY`) vs BCT (pins customers will drive from their own code).

## Key references

**NVIDIA docs** (authoritative for MB1 BCT mechanism):
- [Pinmux and GPIO Configuration (r36.4.3)](https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Bootloader/PinmuxGpioConfig.html)
- [T23x BCT Loader Intro](https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/SD/Bootloader/T23xBCTLoaderIntro.html)
- [Jetson Orin Series Boot Flow](https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/AR/BootArchitecture/JetsonOrinSeriesBootFlow.html)
- [Jetson Orin NX / Nano adaptation](https://docs.nvidia.com/jetson/archives/r36.4.3/DeveloperGuide/HR/JetsonModuleAdaptationAndBringUp/JetsonOrinNxNanoSeries.html)

**NVIDIA forum — staff-confirmed issue threads**:
- [301171 — SPI1 GPIO SFIO bit effect on JP6 (the regression + patch)](https://forums.developer.nvidia.com/t/40hdr-spi1-gpio-padctl-register-bit-10-effect-by-gpiod-tools-in-jp6/301171)
- [280082 — Default GPIO level status on AGX Orin (Drive 0/1 pattern)](https://forums.developer.nvidia.com/t/how-do-i-set-the-default-gpio-level-status-for-jetson-agx-orin/280082)
- [337285 — Devkit pin state at boot (pinmux/gpio dtsi loaded in MB1)](https://forums.developer.nvidia.com/t/devkit-pin-state-at-boot/337285)
- [322992 — Change GPIO/Pinmux state without reflashing (split of pinmux.dtsi and gpio.dtsi)](https://forums.developer.nvidia.com/t/change-gpio-pinmux-state-without-reflashing/322992)
- [366377 — Partial MB1 BCT flash recipe](https://forums.developer.nvidia.com/t/how-to-quickly-modify-the-device-tree-in-orin-nano/366377)
- [321145 — GPIO state on crash/hang/reboot](https://forums.developer.nvidia.com/t/default-gpio-state-kernel-crash-or-system-hang/321145)
- [285417 — libgpiod revert-on-exit is by design](https://forums.developer.nvidia.com/t/gpio-control-on-jp6-orin-nano-and-nx/285417)
- [323006 — Only AON GPIO survives SC7 suspend/resume](https://forums.developer.nvidia.com/t/gpio-can-not-keep-high-when-resuming/323006)
- [337391 — MB1 pinmux cannot be overridden by DT overlays](https://forums.developer.nvidia.com/t/gpio-pin-configured-via-device-tree-is-stuck-low/337391)
- [63784 — Canonical gpio-hog example](https://forums.developer.nvidia.com/t/how-to-set-gpio-as-an-output-from-the-device-tree/63784)

**Upstream Linux**:
- [GPIO v2 ioctl ABI — "may revert to its default state"](https://docs.kernel.org/userspace-api/gpio/gpio-v2-get-line-ioctl.html)
- [GPIO character device userspace API](https://docs.kernel.org/userspace-api/gpio/chardev.html)
- [DT bindings — gpio.txt (gpio-hog spec)](https://www.kernel.org/doc/Documentation/devicetree/bindings/gpio/gpio.txt)
- [libgpiod issue #77 — persistent state](https://github.com/brgl/libgpiod/issues/77)
- [libgpiod gpioset docs](https://libgpiod.readthedocs.io/en/stable/gpioset.html)

**Community patch mirror**:
- [jetsonhacks/jetson-orin-gpio-patch](https://github.com/jetsonhacks/jetson-orin-gpio-patch)

