GPIO pin state drift on userspace exit — root-cause investigation #54

@dakejahl

Summary

GPIO pins used from userspace via libgpiod do not maintain a consistent state across the bootloader → kernel → userspace chain on our current BSP (L4T r36.4.4, JetPack 6.2.1). After the userspace application exits (cleanly or via crash), affected pins typically float to an unintended level (e.g. 3.3V when the customer expects 0V), which is hazardous for devices like relays controlling lights, servos, and aux power.

This issue documents the root-cause investigation. It is a compounding three-layer problem, not a single bug. Two of the three layers are fixable in this BSP; the third is standard upstream Linux behavior that the customer's application must accommodate.

Related: #32 (original customer-facing report — "i2s gpio overlay doesn't set default state at boot"), PR #42 (closed; BCT-only attempt, incomplete).

Symptom

On an ARK carrier (JAJ / PAB / PAB_V3) running the currently-shipped BSP:

  1. Before any userspace app runs, a pin intended as a relay output (e.g. the I2S connector pins H,7 / I,0 / I,1 / I,2 / AC,6) is in a floating state, not actively driven — so its measured voltage depends on external pulls and coupling.
  2. The customer's app opens /dev/gpiochipN, requests the line as output, and drives it. This works.
  3. The app exits (normal termination, SIGKILL, or crash). The pin does not return to the "safe" 0V state. Observed voltage is often 3.3V.
  4. This state persists until reboot.

Root cause — three-layer analysis

Layer 1 — MB1 BCT (pre-kernel pad programming)

MB1 runs on BPMP very early in boot and programs the Tegra234 pinmux pad-control registers (base 0x02430000) and the GPIO controller direction/value registers (main at 0x02200000, AON at 0x0c2f0000) from two DTSI files that live in this repo:

  • device_tree/<board>/Linux_for_Tegra/bootloader/generic/BCT/tegra234-mb1-bct-pinmux-p3767-dp-a03.dtsi
  • device_tree/<board>/Linux_for_Tegra/bootloader/tegra234-mb1-bct-gpio-p3767-dp-a03.dtsi

These are compiled into the MB1 BCT binary by prebuilt/Linux_for_Tegra/kernel/pinmux/t19x/pinmux-dts2cfg.py and flashed to the A_MB1_BCT / B_MB1_BCT QSPI partitions.

Our current state on all three carriers: the I2S connector pins are listed under gpio-input with tristate=ENABLE, enable-input=ENABLE, pull=DOWN/UP. The pads therefore come up at boot as high-impedance inputs held only by the weak internal pull, with no actively driven state until an app requests them.

What the pinmux-dts2cfg.py encoder actually writes: PADCTL bits [1:0]=func, [3:2]=pull, [4]=tristate, [6]=einput, [8]=lpdr, and crucially [10]=SFIO — the tool forces bit [10] to 0 for any pin listed in the companion gpio-* node of the GPIO DTSI, and to 1 for pins not listed there. So if a pin is in gpio-output-low with a matching pinmux entry that has tristate=DISABLE, function=rsvd2, the pad is actively driven low from MB1 onward. (Source: pinmux-dts2cfg.py lines 246–405.)
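The bit layout above can be sketched as a small encode/decode helper, which is handy when comparing a raw PADCTL readback against what the BCT was supposed to program. This is a minimal sketch based only on the bit positions listed in this issue; the names FIELDS, decode_padctl and encode_padctl are hypothetical and are not part of pinmux-dts2cfg.py, and the numeric meaning of each func/pull value is deliberately left uninterpreted.

```python
# Bit positions as listed in this issue for a Tegra234 PADCTL word.
# Each entry is (shift, width). Names are illustrative, not NVIDIA's.
FIELDS = {
    "func": (0, 2),       # bits [1:0]
    "pull": (2, 2),       # bits [3:2]
    "tristate": (4, 1),   # bit [4]
    "einput": (6, 1),     # bit [6]
    "lpdr": (8, 1),       # bit [8]
    "sfio": (10, 1),      # bit [10]: 0 = GPIO mode, 1 = SFIO mode
}


def decode_padctl(value):
    """Split a raw PADCTL word into its named fields (raw integers)."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


def encode_padctl(**fields):
    """Build a PADCTL word from named raw field values; unset fields are 0."""
    value = 0
    for name, field_value in fields.items():
        shift, width = FIELDS[name]
        if field_value >> width:
            raise ValueError(f"{name}={field_value} does not fit in {width} bit(s)")
        value |= field_value << shift
    return value
```

For example, decode_padctl(0x450) reports tristate=1, einput=1 and sfio=1, which is the shape of a pad that is an undriven input and owned by the alternate function.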

Layer 2 — Tegra pinctrl SFIO regression (NVIDIA-acknowledged, fixed only in r36.5+)

This is the single most important finding.

In our kernel tree (L4T r36.4.4), drivers/pinctrl/tegra/pinctrl-tegra.c defines two callbacks that run on every userspace GPIO request/release cycle:

  • tegra_pinctrl_gpio_request_enable (lines 306–330): on request, clears PADCTL bit [10] — pad enters GPIO mode.
  • tegra_pinctrl_gpio_disable_free (lines 332–353): on release, unconditionally sets PADCTL bit [10] back to 1 — pad re-enters SFIO mode, handing the pad to the alternate function (e.g. rsvd2 for pins not routed to a real peripheral).

Both callbacks are gated by pmx->soc->sfsel_in_mux. Tegra234 SoC data sets this to true at pinctrl-tegra234.c:1815 and :1932.

Consequence: when an app exits, the pad is switched to the alternate function. For pins the customer uses as GPIO, the alternate function is typically reserved/unconnected, and the pad ends up effectively floating — not at the BCT-programmed output-low, and not at the last userspace-written value.
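The request/release sequence described above can be modeled in a few lines. This is a toy model of the behavior as described in this issue, not the kernel code: PadModel and its methods are hypothetical names, and the only hardware detail it encodes is that PADCTL bit [10] selects between GPIO mode (0) and SFIO mode (1).

```python
SFIO_BIT = 1 << 10  # PADCTL bit [10]: 0 = GPIO mode, 1 = SFIO mode


class PadModel:
    """Toy model of one pad under the r36.4.4 pinctrl callbacks."""

    def __init__(self, padctl, output_value):
        self.padctl = padctl              # raw PADCTL word
        self.output_value = output_value  # GPIO block output level, 0 or 1

    def request(self):
        # Models tegra_pinctrl_gpio_request_enable: clear the SFIO bit.
        self.padctl &= ~SFIO_BIT

    def release(self):
        # Models tegra_pinctrl_gpio_disable_free on r36.4.4:
        # the SFIO bit is set unconditionally, whatever it was before.
        self.padctl |= SFIO_BIT

    def driven_by_gpio(self):
        """True when the GPIO block's output value actually reaches the pad."""
        return not (self.padctl & SFIO_BIT)


# A pad that MB1 left in GPIO mode, driving low.
pad = PadModel(padctl=0, output_value=0)
pad.request()
pad.output_value = 1   # the app drives the line high
pad.release()          # the app exits
```

After the last step, pad.driven_by_gpio() is False even though the pad started in GPIO mode: the GPIO block still holds the last written value, but it is no longer what the pad sees.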

NVIDIA smoking gun — forum thread 301171:

NVIDIA engineer lhoang: "When starting gpioget and gpioset, the SFIO bit is disabled by the pinctrl driver. Once gpioset and gpioget exit, the SFIO is enabled by the pinctrl driver."

NVIDIA engineer KevinFFF: "Yes, this patch is used to fix the known issue we found to control GPIO in JP6. If bit 10 is set, then it would not work as Output GPIO as expected."

NVIDIA published an official patch (see thread for the diff). The fix was upstreamed to Linux stable (v6.6.93, v6.12.31, v6.12.32) and bundled in r36.5 / JetPack 6.2.2 by default. Community mirror of the patch: https://github.com/jetsonhacks/jetson-orin-gpio-patch.

We are on r36.4.4 (JetPack 6.2.1), so we ship the bug.

Layer 3 — Linux character-device contract (standard upstream behavior)

Upstream Linux documentation, GPIO v2 ioctl:

"The state of a line, including the value of output lines, is guaranteed to remain as requested until the returned file descriptor is closed. Once the file descriptor is closed, the state of the line becomes uncontrolled from the userspace perspective, and may revert to its default state."

When the customer's app exits (normal, SIGKILL, crash), the kernel's fd teardown runs linereq_release → gpiod_free → gpiod_free_commit (drivers/gpio/gpiolib-cdev.c into drivers/gpio/gpiolib.c). This path does not reset the Tegra GPIO block's OUTPUT_VALUE register, so that register retains the last userspace-written level. But Layer 2 has just handed the pad to SFIO, so the value in the GPIO block is no longer connected to the pad.

NVIDIA staff DaveYYY on forum thread 285417: "It's the design of libgpiod itself." Not a Jetson bug, not fixable with pinmux configuration.

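libgpiod itself documents this: https://libgpiod.readthedocs.io/en/stable/gpioset.html. The recommended pattern is to keep a process holding the line for as long as the level matters: gpioset --mode=signal (daemonized; this is libgpiod 1.x syntax, while gpioset in 2.x holds the line until it is terminated by default) or a systemd service that holds the line for the system's lifetime.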
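For an application-level equivalent of the "hold the line" pattern, a long-running holder process could look roughly like the sketch below. It assumes the libgpiod 2.x Python bindings (the gpiod package with gpiod.request_lines and gpiod.LineSettings); the 1.x bindings have a different API. The chip path, line offsets, consumer name and all function names here are placeholders, not values taken from this BSP.

```python
import signal
import threading


def hold_until_stopped(open_request, stop):
    """Keep the line request open until `stop` is set, then release it."""
    with open_request():
        stop.wait()


def open_relay_request(chip_path="/dev/gpiochip0", offsets=(0,)):
    """Request `offsets` as outputs driven low.

    Assumes libgpiod 2.x Python bindings; chip_path and offsets are
    placeholders and must be replaced with the real chip and lines.
    """
    import gpiod
    from gpiod.line import Direction, Value

    settings = gpiod.LineSettings(
        direction=Direction.OUTPUT, output_value=Value.INACTIVE
    )
    return gpiod.request_lines(
        chip_path,
        consumer="relay-hold",
        config={offset: settings for offset in offsets},
    )


def main():
    """Hold the lines until SIGINT or SIGTERM arrives."""
    stop = threading.Event()
    for signum in (signal.SIGINT, signal.SIGTERM):
        signal.signal(signum, lambda *_: stop.set())
    hold_until_stopped(open_relay_request, stop)
```

Calling main() from a systemd unit keeps the request alive for the lifetime of the service. SIGKILL cannot be caught, so this does not change what happens on a forced kill or crash: the file descriptor is closed and the line becomes uncontrolled, as the kernel documentation quoted above states.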

Why PR #42 was incomplete

PR #42 moved five I2S pins from gpio-input to gpio-output-low in the JAJ BCT and changed the matching pinmux entries to driven/no-pull. Closed without merge on 2026-03-18.

It only addresses Layer 1 (boot-time state), and only on JAJ. With Layer 2 still broken, the customer-visible symptom (post-exit drift) remains: as soon as the customer's app opens and releases the line once, the pad is handed to SFIO by tegra_pinctrl_gpio_disable_free, regardless of what BCT programmed initially. BCT is pre-kernel and is not re-applied on release.

Fix plan

Three parts, layered to close Layers 1 and 2 completely and give the customer a documented workaround for Layer 3.

Part A — Patch the Tegra pinctrl SFIO regression (Layer 2)

Cherry-pick NVIDIA's official patch to drivers/pinctrl/tegra/pinctrl-tegra.{c,h} from forum thread 301171. Wire it into patches/ and apply it from build_kernel.sh alongside the existing Jetvariety patch.

This restores the correct behavior: the original PADCTL SFIO bit state is captured on request and restored on release, rather than unconditionally set to 1. Once applied, a pin that was SFIO=0 (GPIO mode) before userspace touched it stays in GPIO mode after release, and BCT-programmed output state remains connected to the pad.
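The difference between the two release strategies can be shown with plain integer arithmetic on the PADCTL word. This is an illustrative model of the save-and-restore idea described above, not NVIDIA's diff; the function names are hypothetical.

```python
SFIO_MASK = 1 << 10  # PADCTL bit [10]


def request_line(padctl):
    """Clear the SFIO bit on request; also return the bit's previous state."""
    saved_sfio = padctl & SFIO_MASK
    return padctl & ~SFIO_MASK, saved_sfio


def release_unconditional(padctl):
    """Unpatched r36.4.4 behavior: always set the SFIO bit on release."""
    return padctl | SFIO_MASK


def release_restoring(padctl, saved_sfio):
    """Patched behavior: put the SFIO bit back to what it was at request time."""
    return (padctl & ~SFIO_MASK) | saved_sfio


# A pad that BCT left in GPIO mode (SFIO bit = 0).
bct_padctl = 0
in_use, saved = request_line(bct_padctl)
after_unpatched = release_unconditional(in_use)
after_patched = release_restoring(in_use, saved)
```

With these definitions, after_unpatched has bit [10] set while after_patched equals the original BCT value, which is the property Part B depends on.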

(Future work: once we rebase onto r36.5 / JetPack 6.2.2, this patch becomes a no-op and can be dropped.)

Part B — Complete the BCT configuration for customer-facing pins on all three carriers (Layer 1)

Apply PR #42's pattern — gpio-output-low + tristate=DISABLE, enable-input=DISABLE, pull=NONE — to the same set of pins on JAJ, PAB, and PAB_V3 BCT files. This gives every customer a defined 0V pad state at boot and immediately after kernel boot, rather than floating.

Scope: the five I2S connector pins (soc_gpio41_ph7, soc_gpio42_pi0, soc_gpio43_pi1, soc_gpio44_pi2, soc_gpio59_pac6) as the first-pass target. Future passes can extend to other customer-configurable pins on a per-carrier basis.
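A small consistency check can help keep the three carriers aligned once the pattern is applied. The sketch below assumes the relevant BCT settings have already been extracted into plain Python dictionaries; it does not parse the DTSI files, and the data layout and the name find_violations are hypothetical.

```python
# First-pass target pins, as named in this issue.
I2S_PINS = [
    "soc_gpio41_ph7",
    "soc_gpio42_pi0",
    "soc_gpio43_pi1",
    "soc_gpio44_pi2",
    "soc_gpio59_pac6",
]

# Pinmux settings required alongside gpio-output-low.
SAFE_PATTERN = {"tristate": "DISABLE", "enable-input": "DISABLE", "pull": "NONE"}


def find_violations(pinmux, gpio_output_low):
    """Return human-readable problems for pins that miss the safe pattern.

    pinmux: dict mapping pin name -> dict of settings (hypothetical layout).
    gpio_output_low: collection of pin names listed under gpio-output-low.
    """
    problems = []
    for pin in I2S_PINS:
        if pin not in gpio_output_low:
            problems.append(f"{pin}: not listed under gpio-output-low")
        entry = pinmux.get(pin, {})
        for key, expected in SAFE_PATTERN.items():
            actual = entry.get(key)
            if actual != expected:
                problems.append(f"{pin}: {key}={actual}, expected {expected}")
    return problems
```

Running this once per carrier (JAJ, PAB, PAB_V3) against the extracted settings should return an empty list when Part B is complete for the first-pass pins.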

Part C — Documentation for Layer 3

Add a doc (docs/gpio.md or similar) explaining:

  • The three-layer model and what each layer is responsible for.
  • Why BCT gpio-output-low holds at boot but does not re-assert after userspace release.
  • How to configure alternate pins for customer applications (pinmux dtsi + gpio dtsi pattern, partial MB1 BCT flash command flash.sh -k A_MB1_BCT ...).
  • Recommended userspace patterns: gpioset --mode=signal, systemd unit holding the line, or external pull-down resistors for safety-critical outputs.
  • When to use gpio-hog (pins customers should never touch — it blocks libgpiod with EBUSY) vs BCT (pins customers will drive from their own code).
