Build with CUDA 12.8, hermetic build env using nix flake, ability to use upstream sentencepiece, use upstream NCCL #1041

Open
robinp wants to merge 3 commits into marian-nmt:master from robinp:nix-cuda12.8

robinp commented Mar 3, 2026

Description

This is probably too big a change to merge at once, so I'm opening it mainly to document how these things can be done, and for cherry-picking in case someone is interested.

How to test

Install the Nix package manager, then get a shell using nix develop. This will compile with the CMake settings set in flake.nix. Only a devShell is added, but adding an output that produces the built binaries wouldn't be too hard (though it might need pinning the submodules, etc.).

TESTED=compiled and exercised training and translation on Linux in the hermetic Nix devShell (flake.lock currently pins a nixos-unstable package state in which CUDA 12.8 is the current version).

See the README addition for details on how to build and run with Nix, and on why you need to symlink libcuda.so.1 into a separate directory that is then put on LD_LIBRARY_PATH. The TL;DR is:

nix \
    --extra-experimental-features nix-command \
    --extra-experimental-features flakes \
    develop
# in the shell:
# replace LD_LIBRARY_PATH dir with wherever you have put a symlink for libcuda.so.1 (and maybe libnvidia....so..), see README
LD_LIBRARY_PATH=/run/nvidia-libs marian translate ...
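
The symlink step mentioned above can be sketched roughly as follows. This is only an illustration of the mechanics, not the procedure from the README: the helper name link_libcuda is hypothetical, and the host driver path in the example comment is an assumption that varies between distributions.

```shell
# Hypothetical helper (not part of this PR): expose only libcuda.so.1
# through a dedicated directory that can then go on LD_LIBRARY_PATH.
# Usage: link_libcuda SRC DEST_DIR
link_libcuda() {
    src=$1
    dest_dir=$2
    # Refuse to create a dangling symlink if the driver library is missing.
    [ -e "$src" ] || { echo "driver library not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    # -f replaces an existing link, -n avoids descending into a linked directory.
    ln -sfn "$src" "$dest_dir/libcuda.so.1"
}

# Example call. Both paths are assumptions; adjust them to your system
# (writing under /run usually requires root):
# link_libcuda /usr/lib/x86_64-linux-gnu/libcuda.so.1 /run/nvidia-libs
```

The directory passed as DEST_DIR is the one to use as LD_LIBRARY_PATH in the command above.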

Some notable changes

  • Added a "Building with Nix" section to the README
  • Added a CMake option that controls whether the default lookup path for cublasLt is disabled (this was hardcoded on until now, which prevented cublasLt from being found)
  • Added a CMake option controlling whether the local (somewhat old?) NCCL or the upstream one is used. NOTE: to preserve current behavior this should default to ON (it currently defaults to OFF).
  • Added a CMake option to use the local or the upstream sentencepiece. NOTE: again, I defaulted to not using the local one, but for compatibility with previous behavior this should be ON.

Minor changes

  • CMake emits a few more status messages about the MKL lookup stages, as getting MKL found was quirky
  • Tiny C++ and CU changes for GCC 14 / NVCC 12.8 compatibility

Quirks

  • I added SM_100 arch support for CUDA 12.6 separately, but in retrospect the docs say only 12.8 adds support for it, so the separate 12.6 handling may be moot.
  • It seems the upstream-provided sentencepiece can be built in USE_STATIC_LIBS mode, but I don't understand why. So I forced that (new) CMake codepath to be disabled, and the upstream sentencepiece is now linked dynamically even in static build mode.
  • I did NOT test NCCL, so while it compiles and links, I have no idea whether it works.
  • I did NOT test building with CUDNN. It would probably work, but might need adding pkgs.cudaPackages.cudnn to flake.nix.
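
To illustrate the SM_100 quirk above, here is a minimal sketch of a version gate, assuming (as noted above) that CUDA 12.8 is the first release supporting SM_100. The helper name cuda_supports_sm100 is hypothetical and not part of this PR; the actual check in the build lives in the CMake files.

```shell
# Hypothetical helper: succeeds if the CUDA version in $1 (e.g. "12.8")
# is at least 12.8, the assumed minimum for SM_100 support.
cuda_supports_sm100() {
    case $1 in
        *.*) major=${1%%.*}; rest=${1#*.}; minor=${rest%%.*} ;;
        *)   major=$1; minor=0 ;;   # a bare "13" is treated as 13.0
    esac
    [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 8 ]; }
}

# Usage sketch: only runs when nvcc is on PATH. It reads the "release X.Y"
# field from the nvcc --version output.
if command -v nvcc >/dev/null 2>&1; then
    ver=$(nvcc --version | sed -n 's/.*release \([0-9]*\.[0-9]*\).*/\1/p')
    if cuda_supports_sm100 "$ver"; then
        echo "CUDA $ver: SM_100 can be enabled"
    else
        echo "CUDA $ver: skipping SM_100"
    fi
fi
```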

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md
