Puzzle 9: GPU Debugging Workflow Issue #255

Description

@thomas-gale

uv/pixi CLI Version

0.70.2

Mojo Version

1.0.0b2.dev2026060906 (26da54e2)

GPU Information

NVIDIA GeForce RTX 5070 Ti (12.0)

Operating System

Arch Linux

Related Puzzle

Puzzle 9: GPU Debugging Workflow

What happened?

  1. Run the 'Detective Work Puzzles' (three methods):
    a. pixi run -e nvidia p09 --first-case doesn't crash! It just outputs garbage: HostBuffer([-8.103731e+22, 216419.02, 148412.05, 10.0])
    b. JIT method, fails with a ptxas error (see below): pixi run -e nvidia mojo debug --cuda-gdb --break-on-launch problems/p09/p09.mojo --first-case (started with run/r in cuda-gdb)
    c. Build with debug info, fails with the same ptxas error (see below): pixi run -e nvidia mojo build -O0 -g problems/p09/p09.mojo -o problems/p09/p09_debug

  2. For methods b & c, the error output:

/home/tgale/dev/mojo-gpu-puzzles/problems/p09/p09.mojo:1:1: error: ptxas application ptx input, line 96; error   : Output parameter cannot be an incomplete array.
ptxas application ptx input, line 125; error   : Output parameter cannot be an incomplete array.
ptxas application ptx input, line 600; error   : Call has wrong number of parameters
ptxas application ptx input, line 649; error   : Call has wrong number of parameters
ptxas fatal   : Ptx assembly aborted due to errors

# ===----------------------------------------------------------------------=== #
^
/home/tgale/dev/mojo-gpu-puzzles/.pixi/envs/nvidia/bin/mojo: error: failed to run the pass manager
  3. I would expect the JIT run and the build with debug symbols to compile, so that we can perform the debugging/detective steps outlined in the three tutorial cases.
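
For triage, it can help to group the ptxas messages by kind. The following is a minimal sketch, assuming the log format shown above; the helper name `group_ptxas_errors` and the regular expression are illustrative and not part of any Mojo or CUDA tooling.

```python
import re
from collections import defaultdict

# Matches fragments such as:
#   ptx input, line 96; error   : Output parameter cannot be an incomplete array.
# The pattern is searched (not anchored) because the first log line is
# prefixed with the Mojo source location.
PTXAS_ERROR = re.compile(r"ptx input, line (\d+); error\s*:\s*(.+)")


def group_ptxas_errors(log: str) -> dict[str, list[int]]:
    """Map each distinct ptxas message to the PTX line numbers reporting it."""
    grouped = defaultdict(list)
    for line in log.splitlines():
        match = PTXAS_ERROR.search(line)
        if match:
            grouped[match.group(2).strip()].append(int(match.group(1)))
    return dict(grouped)


# Log text copied from the report above.
LOG = """\
/home/tgale/dev/mojo-gpu-puzzles/problems/p09/p09.mojo:1:1: error: ptxas application ptx input, line 96; error   : Output parameter cannot be an incomplete array.
ptxas application ptx input, line 125; error   : Output parameter cannot be an incomplete array.
ptxas application ptx input, line 600; error   : Call has wrong number of parameters
ptxas application ptx input, line 649; error   : Call has wrong number of parameters
ptxas fatal   : Ptx assembly aborted due to errors
"""

for message, ptx_lines in group_ptxas_errors(LOG).items():
    print(f"{message} (PTX lines {ptx_lines})")
```

Grouped this way, the log reduces to two distinct messages, each reported at two PTX lines.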

Note: I was able to build with default optimization (pixi run -e nvidia mojo build -g problems/p09/p09.mojo -o problems/p09/p09_debug), but that meant debugging with symbols optimized away, which I appreciate was not the intention of the tutorial!

E.g.

  • pixi run -e nvidia mojo build -g problems/p09/p09.mojo -o problems/p09/p09_debug
  • pixi run -e nvidia cuda-gdb problems/p09/p09_debug
  • set cuda break_on_launch all
  • run --first-case

With this build I was able to see the initialized input memory (the intention of the tutorial):

(cuda-gdb)  set cuda break_on_launch all
(cuda-gdb) run --first-case
Starting program: /home/tgale/dev/mojo-gpu-puzzles/problems/p09/p09_debug --first-case
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff31ff6c0 (LWP 25043)]
[New Thread 0x7ffff21fe6c0 (LWP 25044)]
[New Thread 0x7ffff11fd6c0 (LWP 25045)]
[New Thread 0x7ffff01fc6c0 (LWP 25046)]
[New Thread 0x7fffef1fb6c0 (LWP 25047)]
[New Thread 0x7fffee1fa6c0 (LWP 25048)]
[New Thread 0x7fffed1f96c0 (LWP 25049)]
[New Thread 0x7fffec1f86c0 (LWP 25050)]
First Case: Try to identify what's wrong without looking at the code!

[New Thread 0x7fffe47ff6c0 (LWP 25052)]
[New Thread 0x7fffe29ff6c0 (LWP 25053)]
[Detaching after fork from child process 25054]
[New Thread 0x7fffdbfff6c0 (LWP 25065)]
[New Thread 0x7fffdaffe6c0 (LWP 25066)]
[New Thread 0x7ffdbbfff6c0 (LWP 25067)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]

CUDA thread hit application kernel entry function breakpoint, p09_add_10_UnsafePointer_Bo6A6AoA6A6AcB6A6A_0f8da7b69379d471<<<(1,1,1),(4,1,1)>>> (output=0x7ffa84000000, a=0x7fffffff93b0) at /home/tgale/dev/mojo-gpu-puzzles/problems/p09/p09.mojo:31
31          var i = thread_idx.x
(cuda-gdb) print a[0]@4
$1 = {{2.25695233e-39}, {0}, {1.40129846e-45}, {0}}
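
The values printed for a[0]@4 can be inspected as raw bits. The sketch below assumes the elements are 32-bit floats, as the cuda-gdb output suggests; the helper names are illustrative. It reinterprets each printed value as its float32 bit pattern and flags subnormals, which often indicate memory that was not written as floating-point data (an interpretation, not something the debugger output confirms).

```python
import struct


def float32_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def is_subnormal(bits: int) -> bool:
    """True when the exponent field is zero and the mantissa is non-zero."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return exponent == 0 and mantissa != 0


# Values copied from the cuda-gdb output above.
observed = [2.25695233e-39, 0.0, 1.40129846e-45, 0.0]

for value in observed:
    bits = float32_bits(value)
    print(f"{value!r:>16} -> 0x{bits:08x} subnormal={is_subnormal(bits)}")
```

For instance, 1.40129846e-45 is the smallest positive float32 subnormal, with bit pattern 0x00000001.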

Steps to Reproduce

  1. Puzzle 9: GPU Debugging Workflow
  2. Run via JIT: pixi run -e nvidia mojo debug --cuda-gdb --break-on-launch problems/p09/p09.mojo --first-case, or attempt a debug build: pixi run -e nvidia mojo build -O0 -g problems/p09/p09.mojo -o problems/p09/p09_debug
  3. Observe in output:
/home/tgale/dev/mojo-gpu-puzzles/problems/p09/p09.mojo:1:1: error: ptxas application ptx input, line 96; error   : Output parameter cannot be an incomplete array.
ptxas application ptx input, line 125; error   : Output parameter cannot be an incomplete array.
ptxas application ptx input, line 600; error   : Call has wrong number of parameters
ptxas application ptx input, line 649; error   : Call has wrong number of parameters
ptxas fatal   : Ptx assembly aborted due to errors

Relevant Log Output

Additional Context

I assume this issue is caused by modularbot bumping the Mojo version in pixi.lock to the latest nightly, with no regression tests covering Puzzle 9? (https://github.com/modular/mojo-gpu-puzzles/commits?author=modularbot)
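
A regression check for this could be quite small. The following is only a sketch of the idea, not something that exists in the repository: the function name `debug_build_ok` and the injectable `run` parameter are hypothetical, and the command is the -O0 -g build copied from this report.

```python
import subprocess
from typing import Callable

# Debug build command copied from the report (assumed to be run from the
# repository root, as in the report).
DEBUG_BUILD = [
    "pixi", "run", "-e", "nvidia",
    "mojo", "build", "-O0", "-g",
    "problems/p09/p09.mojo", "-o", "problems/p09/p09_debug",
]


def debug_build_ok(
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> tuple[bool, str]:
    """Run the -O0 -g build and return (succeeded, stderr).

    The run parameter is a hypothetical hook so the check can be exercised
    without a GPU toolchain by passing a stand-in for subprocess.run.
    """
    proc = run(DEBUG_BUILD, capture_output=True, text=True)
    stderr = proc.stderr or ""
    succeeded = proc.returncode == 0 and "ptxas fatal" not in stderr
    return succeeded, stderr


# Example use in a CI step (not executed here):
#   ok, stderr = debug_build_ok()
#   if not ok:
#       raise SystemExit(f"Puzzle 9 debug build failed:\n{stderr}")
```

Running such a check whenever pixi.lock is bumped would surface a failing debug build before it reaches users of the tutorial.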

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

bug (Something isn't working)
