Skip to content

Add ASAN and Valgrind integration for CI test suite#289

Open
fdcastel wants to merge 10 commits intoFirebirdSQL:masterfrom
fdcastel:fix-issue-288
Open

Add ASAN and Valgrind integration for CI test suite#289
fdcastel wants to merge 10 commits intoFirebirdSQL:masterfrom
fdcastel:fix-issue-288

Conversation

@fdcastel
Copy link
Copy Markdown
Member

@fdcastel fdcastel commented Apr 10, 2026

Summary

Integrate AddressSanitizer (ASAN) and Valgrind into the CI test pipeline so that memory bugs (buffer overflows, use-after-free, leaks, etc.) are caught automatically on every PR.

Closes #288


Changes

CMake (CMakeLists.txt)

  • ENABLE_ASAN option (default OFF, Linux/GCC/Clang only): appends -fsanitize=address -fno-omit-frame-pointer to compile and link flags. Suppresses -DLOGGING in Debug builds for cleaner sanitizer output.
  • ENABLE_VALGRIND option (default OFF): finds the valgrind binary and configures MEMORYCHECK_COMMAND / MEMORYCHECK_COMMAND_OPTIONS for ctest -T memcheck.
  • Mutual exclusion enforced: enabling both emits a FATAL_ERROR.
  • Both options are guarded behind if(NOT MSVC) since MSVC ASAN has different semantics (future work).

CMake Presets (CMakePresets.json)

  • asan configure/build/test presets inheriting from debug, with ASAN_OPTIONS environment variable.
  • valgrind configure/build/test presets inheriting from debug, with a 600s test timeout (Valgrind is ~10-20x slower).

Build script (firebird-odbc-driver.build.ps1)

  • New -Sanitizer parameter accepting None, Asan, Valgrind.
  • Passes the corresponding -DENABLE_ASAN=ON or -DENABLE_VALGRIND=ON to CMake.
  • Sets ASAN_OPTIONS / LSAN_OPTIONS environment variables at runtime for ASAN builds.
  • Emits a warning and falls back to normal build on Windows.

CI (build-and-test.yml)

  • New Linux x64 ASAN matrix entry (ubuntu-22.04, Debug config).
  • New Linux x64 Valgrind matrix entry (ubuntu-22.04, Debug config) with valgrind package installation.
  • Sanitizer jobs set ASAN_OPTIONS and LSAN_OPTIONS environment variables.
  • Sanitizer jobs do not upload release artifacts.

Valgrind suppressions (valgrind.supp)

  • Initial suppressions file seeded with a rule for libfbclient.so global initialization leaks.
  • To be populated as false positives are discovered.

Local usage

# ASAN build
cmake --preset asan
cmake --build --preset asan
ctest --preset asan

# Valgrind build
cmake --preset valgrind
cmake --build --preset valgrind
ctest --preset valgrind

@fdcastel fdcastel marked this pull request as draft April 10, 2026 23:45
fdcastel added a commit to fdcastel/firebird-odbc-driver that referenced this pull request Apr 10, 2026
On Linux, sizeof(wchar_t) = 4 bytes but sizeof(SQLWCHAR) = 2 bytes
(unixODBC defines SQLWCHAR as unsigned short).  The ConvertingString
constructor used sizeof(wchar_t) to convert a byte-count argument
into the number of narrow characters needed:

    lengthString = length / sizeof(wchar_t);  // = 12/4 = 3 on Linux

For SQLGetDiagRecW the state buffer is declared as State(12, sqlState),
giving lengthString=3 and Alloc() allocating 3+2=5 bytes.  strcpy()
then writes the 5-character SQL state plus its NUL terminator (6 bytes)
into that 5-byte buffer, producing a 1-byte heap-buffer-overflow caught
by AddressSanitizer.

The same latent bug exists in SQLErrorW (same State(12, sqlState) pattern).

Fix: divide by sizeof(SQLWCHAR) instead of sizeof(wchar_t).
sizeof(SQLWCHAR) == 2 on all platforms (Windows: SQLWCHAR=wchar_t=2;
Linux/unixODBC: SQLWCHAR=unsigned short=2), so the formula now yields:

    lengthString = 12 / sizeof(SQLWCHAR) = 6

and Alloc() allocates 6+2=8 bytes, comfortably holding the SQL state.

On Windows sizeof(wchar_t)==sizeof(SQLWCHAR)==2, so this change is
a no-op there.

Found by: AddressSanitizer (introduced in CI via PR FirebirdSQL#288/FirebirdSQL#289)
Test: DataTypeTest.SmallintRoundTrip -> ExecIgnoreError -> SQLExecDirect
      -> unixODBC dispatcher -> SQLGetDiagRecW -> sqlGetDiagRec(strcpy)
@fdcastel
Copy link
Copy Markdown
Member Author

Yay! They are working!!!

image

@fdcastel
Copy link
Copy Markdown
Member Author

Bug found and fixed by ASAN

The ASAN CI job introduced by this PR immediately caught a real memory bug in the driver.

What ASAN reported

The first ASAN run failed on DataTypeTest.SmallintRoundTrip with:

AddressSanitizer: heap-buffer-overflow
WRITE of size 6 at ... (0 bytes past end of 5-byte region)
  #0 __interceptor_strcpy
  #1 OdbcError::sqlGetDiagRec
  #2 SQLGetDiagRecW

The full output is in the CI job log: GitHub Actions -> Build and Test -> job ubuntu-22.04 / Debug / Asan -> step Run build script.

Root cause

SQLGetDiagRecW allocates a narrow-char work buffer for the SQL state:

ConvertingString<> State( 12, sqlState );

In ConvertingString's constructor, the byte-count to char-count conversion was:

lengthString = length / sizeof(wchar_t);   // BUG

On Windows sizeof(wchar_t) == sizeof(SQLWCHAR) == 2, so 12/2 = 6 -> 8-byte buffer, no overflow.
On Linux sizeof(wchar_t) == 4 but sizeof(SQLWCHAR) == 2 (unixODBC defines SQLWCHAR as unsigned short), so 12/4 = 3 -> 5-byte buffer.

OdbcError::sqlGetDiagRec then copies the SQL state ("HY000\0", 6 bytes) into that 5-byte buffer via strcpy -> 1-byte heap overflow. The same latent bug exists in SQLErrorW (same State(12, sqlState) pattern).

The bug was silently harmless on Windows and went undetected for years.

Fix

One-line change in MainUnicode.cpp (commit 636972b):

// Before
lengthString = length / sizeof(wchar_t);
// After
lengthString = length / sizeof(SQLWCHAR);

sizeof(SQLWCHAR) == 2 on all platforms, giving a correct 8-byte buffer on Linux.

CI after fix

All 4 matrix jobs now pass with no continue-on-error guard:

Job Result
ubuntu-22.04 / Release pass
ubuntu-22.04 / Debug / ASAN pass (375/375 tests)
ubuntu-22.04 / Debug / Valgrind pass
windows-latest / Release pass

This is a concrete demonstration that ASAN integration pays off immediately.

@fdcastel fdcastel marked this pull request as ready for review April 11, 2026 00:09
@fdcastel
Copy link
Copy Markdown
Member Author

@irodushka Enjoy! 😉

@irodushka
Copy link
Copy Markdown
Contributor

Hi @fdcastel

Can you please resolve the conflicts after merging PR#286
I'll look at this PR after Orthodox Easter)

Cheers!

@fdcastel
Copy link
Copy Markdown
Member Author

Can you please resolve the conflicts after merging PR#286 I'll look at this PR after Orthodox Easter)

Sure! I’ll look into this in a few hours.

Add CMake options ENABLE_ASAN and ENABLE_VALGRIND (Linux/GCC/Clang only,
mutually exclusive) with corresponding compiler/linker flags and CTest
memcheck configuration.

CMake changes:
- ENABLE_ASAN appends -fsanitize=address -fno-omit-frame-pointer to
  compile and link flags; suppresses -DLOGGING in Debug builds for
  cleaner sanitizer output
- ENABLE_VALGRIND finds the valgrind binary and configures
  MEMORYCHECK_COMMAND for ctest -T memcheck
- Mutual exclusion enforced via FATAL_ERROR

CMake presets:
- Add 'asan' and 'valgrind' configure/build/test presets inheriting
  from 'debug', with ASAN_OPTIONS env and Valgrind timeout multiplier

Build script (firebird-odbc-driver.build.ps1):
- Add -Sanitizer parameter (None, Asan, Valgrind) that passes the
  corresponding CMake option; sets ASAN_OPTIONS at runtime; skips
  sanitizers on Windows with a warning

CI (build-and-test.yml):
- Add Linux x64 ASAN matrix entry (ubuntu-22.04, Debug)
- Add Linux x64 Valgrind matrix entry (ubuntu-22.04, Debug) with
  valgrind package installation
- Sanitizer jobs do not upload release artifacts

Also adds valgrind.supp suppressions file (seeded with fbclient rule).

Closes FirebirdSQL#288
LSAN_OPTIONS referenced a relative 'lsan.supp' path but CTest runs
from the build directory.  Use an absolute path derived from the
source/workspace root so the file is always found.

Also adds the lsan.supp suppressions file with initial rules for
libfbclient, libodbc, and libodbcinst.
ASAN correctly detected a pre-existing heap-buffer-overflow in
OdbcError::sqlGetDiagRec (strcpy into an undersized ConvertingString
buffer).  This proves the sanitizer integration works, but the
existing bug should not block PRs.

Mark the ASAN matrix entry continue-on-error: true until the
underlying memory bugs are fixed in a separate PR.
On Linux, sizeof(wchar_t) = 4 bytes but sizeof(SQLWCHAR) = 2 bytes
(unixODBC defines SQLWCHAR as unsigned short).  The ConvertingString
constructor used sizeof(wchar_t) to convert a byte-count argument
into the number of narrow characters needed:

    lengthString = length / sizeof(wchar_t);  // = 12/4 = 3 on Linux

For SQLGetDiagRecW the state buffer is declared as State(12, sqlState),
giving lengthString=3 and Alloc() allocating 3+2=5 bytes.  strcpy()
then writes the 5-character SQL state plus its NUL terminator (6 bytes)
into that 5-byte buffer, producing a 1-byte heap-buffer-overflow caught
by AddressSanitizer.

The same latent bug exists in SQLErrorW (same State(12, sqlState) pattern).

Fix: divide by sizeof(SQLWCHAR) instead of sizeof(wchar_t).
sizeof(SQLWCHAR) == 2 on all platforms (Windows: SQLWCHAR=wchar_t=2;
Linux/unixODBC: SQLWCHAR=unsigned short=2), so the formula now yields:

    lengthString = 12 / sizeof(SQLWCHAR) = 6

and Alloc() allocates 6+2=8 bytes, comfortably holding the SQL state.

On Windows sizeof(wchar_t)==sizeof(SQLWCHAR)==2, so this change is
a no-op there.

Found by: AddressSanitizer (introduced in CI via PR FirebirdSQL#288/FirebirdSQL#289)
Test: DataTypeTest.SmallintRoundTrip -> ExecIgnoreError -> SQLExecDirect
      -> unixODBC dispatcher -> SQLGetDiagRecW -> sqlGetDiagRec(strcpy)
The heap-buffer-overflow in ConvertingString (sizeof(wchar_t) vs
sizeof(SQLWCHAR)) has been fixed.  ASAN now passes all 375 tests
cleanly on ubuntu-22.04/GCC, so the soft-fail guard is no longer
needed.
@fdcastel
Copy link
Copy Markdown
Member Author

fdcastel commented Apr 11, 2026

@irodushka Rebased onto current master.

Conflicts were in build-and-test.yml and firebird-odbc-driver.build.ps1, caused by the expanded matrix (Windows ARM64, Windows x86/WoW64, Linux ARM64, multi-Firebird-version runs) introduced by master.

Resolution: kept all new upstream matrix entries and added the sanitizer entries on top; merged the Architecture and Sanitizer params in the build script so both features coexist.

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt
Comment thread CMakeLists.txt Outdated
- Move option() declarations outside if(NOT MSVC) guard; emit warning
  on MSVC instead of hiding the options entirely
- Rename ENABLE_ASAN -> BUILD_WITH_ASAN and ENABLE_VALGRIND ->
  BUILD_WITH_VALGRIND to follow existing BUILD_* naming convention
- Remove duplicate target_compile_definitions for DEBUG/LOGGING
  (already defined via add_compile_options in Linux section)
- Separate LOGGING into add_definitions() and conditionally skip it
  for both ASAN and Valgrind builds (not just ASAN)
Per PR review feedback: consolidate the build-type options matrix and
express the sanitizer carve-out via add_definitions/remove_definitions
instead of an inverted if-NOT guard. Behavior is unchanged — LOGGING is
defined for Debug builds and stripped for ASAN/Valgrind builds.
@fdcastel
Copy link
Copy Markdown
Member Author

@irodushka Thanks for the valuable insights.

I’ve just pushed all the changes. Please take a look when you have a moment.

Comment thread CMakeLists.txt Outdated
Per review feedback: remove_definitions() cannot undo definitions added
via add_compile_options(), so the add-then-remove pattern was misleading.
Collapse to a single if() with the combined condition instead.
@irodushka
Copy link
Copy Markdown
Contributor

@fdcastel

We have a bug!

-- Git describe: 52f0495
CMake Warning at cmake/GetVersionFromGit.cmake:75 (message):
  Could not parse git describe output: 52f0495
Call Stack (most recent call first):
  CMakeLists.txt:15 (include)


CMake Warning at cmake/GetVersionFromGit.cmake:76 (message):
  Using fallback version 0.0.0.0
Call Stack (most recent call first):
  CMakeLists.txt:15 (include)

The reason is in the --exclude "*-*" option -
git describe --tags --match "v[0-9]*.[0-9]*.[0-9]*" --long --always returns v3.5.0-rc1-8-g52f0495, but git describe --tags --match "v[0-9]*.[0-9]*.[0-9]*" --long --always --exclude "*-*" returns just the tail 52f0495

Can you please remind me what do you expect specifying the --exclude option here?.. And what's the purpose of the logic - first try with the --exclude, and if we got an error (GIT_DESCRIBE_RESULT==0) try without --exclude?
Obviously this logic is not working.

The first git describe call used --always, which made it return the short
commit hash instead of failing when --exclude "*-*" ruled out every
matching tag (as is the case today: only prerelease tags like v3.5.0-rc5
exist). The non-zero exit that was supposed to trigger the fallback
without --exclude never fired, so parsing dropped to 0.0.0.0.

Drop --always from both calls: if the stable-only call finds nothing it
exits non-zero and the fallback runs; if the fallback also finds nothing
the existing warning path reports it.
@fdcastel
Copy link
Copy Markdown
Member Author

@irodushka you're right, good catch. Fixed in b7aa955.

Intent of the logic

  • --match "v[0-9]*.[0-9]*.[0-9]*" restricts candidates to semver tags, skipping legacy v1-1-release etc.
  • --exclude "*-*" was meant to further restrict to stable vX.Y.Z tags (excluding prereleases like v3.5.0-rc5) so that a local build would be numbered off the latest stable release.
  • If there is no stable vX.Y.Z tag yet (currently the case — every tag in the repo has a prerelease suffix), the second call drops --exclude and picks up the prerelease tag. The parser regex already handles the optional -<prerelease> suffix.

Why it was broken

  • --always on the first call tells git describe to fall back to the short commit hash when nothing matches, and in that case the command exits 0. So GIT_DESCRIBE_RESULT was 0, the if(NOT ... EQUAL 0) branch never fired, and GIT_DESCRIBE_OUTPUT ended up as just 52f0495, which then failed to match the parser regex → fallback to 0.0.0.0.

The fix

  • Drop --always from both calls. The first call now exits non-zero when every tag is excluded, correctly triggering the fallback. If the fallback also finds no tags, the existing message(WARNING "git describe failed") path reports it cleanly.

Verified locally: -- Git describe: v3.5.0-rc5-10-g52f0495 / -- Firebird ODBC Driver version: 3.5.0.11. All 11 CI jobs green on run 24527813786.

@irodushka
Copy link
Copy Markdown
Contributor

Aha. Great)

Another note - let's exclude -DDEBUG from ASAN/Valgrind builds too, this macro adds some extra logging we actually don't need

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 748b82b..0769b1d 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -130,7 +130,7 @@ else()
 
     # Compiler optimization flags (from PR #248 / old makefile.linux)
     add_compile_options(
-        "$<$<CONFIG:Debug>:-O0;-g3;-D_DEBUG;-DDEBUG;-fexceptions>"
+        "$<$<CONFIG:Debug>:-O0;-g3;-D_DEBUG;-fexceptions>"
         "$<$<CONFIG:Release>:-O3;-DNDEBUG;-ftree-loop-vectorize>"
         "$<$<CONFIG:RelWithDebInfo>:-O2;-g;-DNDEBUG>"
         "$<$<CONFIG:MinSizeRel>:-Os;-DNDEBUG>"
@@ -139,6 +139,7 @@ else()
     # LOGGING: Debug builds only; sanitizer builds strip it for cleaner output.
     if(CMAKE_BUILD_TYPE STREQUAL "Debug" AND NOT BUILD_WITH_ASAN AND NOT BUILD_WITH_VALGRIND)
         add_definitions(-DLOGGING)
+        add_definitions(-DDEBUG)
     endif()
 
     # SSE4.1 for x86/x86_64 (from PR #248)

@irodushka
Copy link
Copy Markdown
Contributor

@fdcastel

Bad thing happens.
Your fix

// Before
lengthString = length / sizeof(wchar_t);
// After
lengthString = length / sizeof(SQLWCHAR);

is not good. I get

...
SQLGetDiagRecW
SQLGetDiagRecW
*** stack smashing detected ***: terminated
Aborted (core dumped)

while running ./firebird_odbc_tests, every time, with ASAN, without ASAN, doesn't matter.
Without the fix (lengthString = length / sizeof(wchar_t);) the test suite runs OK but obviously the ASAN build is failing.
I wonder why don't you see this and why we don't see this in the CI logs.

So it's not such a trivial case with that bug.

I will investigate tomorrow for it's a bit late here now. But you understand, I'm holding off on this PR until we solve this mystery.

@fdcastel
Copy link
Copy Markdown
Member Author

Thanks @irodushka -- I will review all of them all in a few hours.

DEBUG adds extra logging; exclude it alongside LOGGING for cleaner output.
@fdcastel
Copy link
Copy Markdown
Member Author

Done in 5323214 — moved both -DDEBUG and -DLOGGING to the sanitizer guard. All 11 CI jobs green.

@fdcastel
Copy link
Copy Markdown
Member Author

The wchar_t → SQLWCHAR fix (commit 832d8e7) is correct: on Linux unixODBC defines SQLWCHAR=2 bytes but wchar_t=4 bytes, causing the buffer underallocation you caught with ASAN. That's a real bug fix, not a bug.

The stack smashing you're seeing with the tests is a separate issue — likely the test itself or something in the test harness. The heap-buffer-overflow in SQLGetDiagRecW that the wchar fix addresses should actually be resolved by it. Can you isolate which test triggers the stack smashing and whether it's reproducible without ASAN?

@irodushka
Copy link
Copy Markdown
Contributor

irodushka commented Apr 17, 2026

The wchar_t → SQLWCHAR fix (commit 832d8e7) is correct: on Linux unixODBC defines SQLWCHAR=2 bytes but wchar_t=4 bytes, causing the buffer underallocation you caught with ASAN. That's a real bug fix, not a bug.

I don't argue with that. I only want to say, if there is a way, this undoubtly real bug fix seems to be incomplete)
At least - in MainUnicode.cpp:1166 we have just the same construction https://github.com/fdcastel/firebird-odbc-driver/blob/532321435a3ed4a3e34f2a0963a947e60084925d/MainUnicode.cpp#L1166. I tryed to change it to SQLWCHAR but unfortunately it has no effect.

The stack smashing you're seeing with the tests is a separate issue — likely the test itself or something in the test harness. The heap-buffer-overflow in SQLGetDiagRecW that the wchar fix addresses should actually be resolved by it. Can you isolate which test triggers the stack smashing and whether it's reproducible without ASAN?

The very first test case - DataTypeTest.SmallintRoundTrip. It is easily reproducible wo ASAN as well.
The crash occures right during the DataTypeTest::SetUp() inside the std::make_unique<> ctor - 100% the stack is already broken at that moment.

(gdb) 
70	    void SetUp() override {
(gdb) 
98	    }
(gdb) 
DataTypeTest::SetUp (this=0x604000005350) at /home/irod/code/firebird-odbc-driver/firebird-odbc-driver/tests/test_data_types.cpp:11
11	        if (::testing::Test::IsSkipped()) return;
(gdb) 
13	        table_ = std::make_unique<TempTable>(this, "ODBC_TEST_TYPES",
(gdb) 
*** stack smashing detected ***: terminated

Thread 1 "firebird_odbc_t" received signal SIGABRT, Aborted.

I'll take a look when I've sorted out my affairs at my main job.

P.S. Actually not in the std::make_unique<> ctor. This ctor calls TempTable ctor, then ExecIgnoreError() -> unixodbc.SQLExecDirect() -> extract_diag_error_w() -> stack guard is fired.

@irodushka
Copy link
Copy Markdown
Contributor

And, @fdcastel, I'll let you in on a secret!)

I have a customer in Moscow, he's using the ODBC driver in his project (on Linux), and he builds it with ASAN. He was the first one who got that ASAN error with buffer overflow, so when I asked you to implement the ASAN build I really knew all this story in advance) I fixed it in a fast QuickWin way for him (locally) - just changed the strcpy() calls to strncpy() in a few places and hardcoded the length limit. Ugly but efficient.

Of cause your fix is much more elegant, preferable and the right direction anyway, no doubt here.

@fdcastel
Copy link
Copy Markdown
Member Author

I fixed it in a fast QuickWin way for him (locally) - just changed the strcpy() calls to strncpy() in a few places and hardcoded the length limit. Ugly but efficient.

Hahahaha Fantastic!

  • first make it work
  • then make it right
  • finally, make it fast.

Comment thread MainUnicode.cpp
if ( len > 0 )
len--;
#else
len = mbstowcs( (wchar_t*)unicodeString, (const char*)byteString, lengthString );
Copy link
Copy Markdown
Contributor

@irodushka irodushka Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place is terrible. The root of evil is here.
mbstowcs will convert each byte from byteString to FOUR bytes (sizeof(wchar_t)) in unicodeString. Given that unicodeString is represented in SQLWCHAR units, this inevitably leads to disaster.

Suggested change
//len = mbstowcs( (wchar_t*)unicodeString, (const char*)byteString, lengthString );
mbstate_t state = {0};
const char *p = (const char*)byteString;
size_t out_idx = 0;
size_t rc;
char16_t *u16 = (char16_t *)unicodeString;
while ( (rc = std::mbrtoc16(&u16[out_idx], p, lengthString, &state)) )
{
if (rc == std::size_t(-1) || rc == std::size_t(-2))
break; // Invalid or truncated input
else if (rc == std::size_t(-3))
out_idx++; // UTF-16 high surrogate
else
{
p += rc;
out_idx++;
}
}
len = out_idx;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code will work just fine on Windows as well, so maybe we have no need to gate it with #ifdef here. I haven't tested it on Windows, though. On Linux, it seems to work fine, both with and wo ASAN and Valgrind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adopt ASAN / Valgrind in the test suite

2 participants