A simple proof-of-concept Windows syscall logger using the NT instrumentation callback mechanism. I wrote this several years ago, but I've decided to post it now as I've been asked about the topic on many occasions.
Windows allows a process to register an instrumentation callback via NtSetInformationProcess(ProcessInstrumentationCallback). Once registered, this callback is invoked each time a syscall returns to user mode, before execution resumes at the original return address. This gives user-mode code the opportunity to inspect every call as it completes.
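A minimal sketch of the registration call follows. It is not taken from the project. The structure layout and the information class value 40 come from community headers (e.g. phnt), because Microsoft does not document them. Setting Version to 0 is an assumption based on public x64 examples. The Windows-specific part compiles only for 64-bit Windows, and the helper name register_instrumentation_callback is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the undocumented PROCESS_INSTRUMENTATION_CALLBACK_INFORMATION. */
typedef struct {
    uint32_t Version;   /* assumption: 0 on x64 */
    uint32_t Reserved;  /* must be 0 */
    void *Callback;     /* address of the assembly stub, or NULL to unregister */
} instrumentation_callback_info;

/* The callback pointer follows the two 32-bit fields on both 32- and 64-bit targets. */
static_assert(offsetof(instrumentation_callback_info, Callback) == 8,
              "Callback must follow Version and Reserved");

#define PROCESS_INSTRUMENTATION_CALLBACK_CLASS 40 /* ProcessInstrumentationCallback */

#if defined(_WIN64)
#include <windows.h>

typedef LONG (NTAPI *NtSetInformationProcess_t)(HANDLE, ULONG, PVOID, ULONG);

/* Hypothetical helper: registers `stub` for the current process.
   Returns the NTSTATUS (0 on success), or -1 if the export cannot be resolved. */
static long register_instrumentation_callback(void *stub) {
    NtSetInformationProcess_t pNtSetInformationProcess =
        (NtSetInformationProcess_t)GetProcAddress(GetModuleHandleA("ntdll.dll"),
                                                  "NtSetInformationProcess");
    if (pNtSetInformationProcess == NULL)
        return -1;
    instrumentation_callback_info info = { 0, 0, stub };
    return pNtSetInformationProcess(GetCurrentProcess(),
                                    PROCESS_INSTRUMENTATION_CALLBACK_CLASS,
                                    &info, (ULONG)sizeof(info));
}
#endif
```

The stub passed here must be executable memory containing the assembly callback. The sketch only shows the registration itself.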
This project:
- Builds a sorted syscall name table at runtime from the export lists of ntdll.dll (NT table) and win32u.dll (Win32k table).
- Allocates and registers a small x64 assembly stub as the instrumentation callback.
- The stub atomically stores the current thread ID into a shared slot to notify the handler, and spin-waits for the handler to finish.
- A dedicated handler thread waits for syscalls on other threads, suspends the target thread, queries ThreadLastSystemCall via NtQueryInformationThread, logs the syscall name, first parameter value, and return value, then clears the shared slot to allow the original thread to continue.
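The README does not say how the name table is built. The sketch below uses one common approach, which is an assumption here: keep the ntdll.dll exports whose names begin with Zw (these are all syscall stubs), sort them by address, and treat each entry's position as its syscall number. For win32u.dll, the same idea is applied to its Nt exports, with Win32k numbers offset by 0x1000. Parsing the PE export directory is omitted, so the name/RVA pairs are assumed to be already collected. The names export_entry, build_syscall_table and syscall_name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name; /* export name, e.g. "ZwClose" */
    uint32_t rva;     /* export RVA within the DLL */
} export_entry;

/* qsort comparator: ascending RVA. */
static int compare_rva(const void *a, const void *b) {
    uint32_t x = ((const export_entry *)a)->rva;
    uint32_t y = ((const export_entry *)b)->rva;
    return (x > y) - (x < y);
}

/* Compacts `entries` in place to the exports whose names start with `prefix`,
   sorts them by RVA and returns how many remain. Under the ordering assumption,
   entry i corresponds to syscall number (table base + i). */
static size_t build_syscall_table(export_entry *entries, size_t count, const char *prefix) {
    assert(entries != NULL || count == 0);
    size_t kept = 0, plen = strlen(prefix);
    for (size_t i = 0; i < count; i++)
        if (strncmp(entries[i].name, prefix, plen) == 0)
            entries[kept++] = entries[i];
    if (kept > 1)
        qsort(entries, kept, sizeof *entries, compare_rva);
    return kept;
}

/* Maps a syscall number back to a name. `base` is 0 for the NT table and
   0x1000 for the Win32k table. Returns NULL for unknown numbers. */
static const char *syscall_name(const export_entry *table, size_t n,
                                uint32_t base, uint32_t number) {
    if (number < base || number - base >= n)
        return NULL;
    return table[number - base].name;
}
```

With ntdll exports {ZwClose @ 0x300, RtlInitUnicodeString @ 0x100, ZwOpenFile @ 0x200}, the table becomes [ZwOpenFile, ZwClose], so syscall 0 is ZwOpenFile and syscall 1 is ZwClose.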
A separate handler thread is required because calling NtQueryInformationThread(ThreadLastSystemCall) on the current thread always returns the query call itself (NtQueryInformationThread), which is useless for obvious reasons. Instead, each thread that triggers the callback stores its ID in the shared slot and spins until the background handler thread picks it up, suspends it, and performs the query on its behalf. This is very inefficient and introduces an obvious bottleneck - only one thread is processed at a time, and every other intercepted syscall blocks until the handler is ready.
Compounding this, the handler thread itself must spin in a busy-wait loop polling the shared slot, rather than waiting on an event. This is because the threads waiting in the callback cannot execute any syscalls while blocked - doing so would overwrite the ThreadLastSystemCall value before the handler reads it. Since any efficient signalling mechanism (such as setting an event) requires a syscall, a conventional queue is not viable here, and a busy-wait spin loop is the only workable option.
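The handshake described above can be modelled portably with C11 atomics and POSIX threads. This is a simulation, not the project's code. Worker threads stand in for threads returning from syscalls, and the handler only counts each request instead of suspending the thread and querying ThreadLastSystemCall. Using a compare-and-swap to claim the slot is an assumption about how the stub "atomically stores" its thread ID. All names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_WORKERS 4
#define CALLS_PER_WORKER 25

/* Shared slot: 0 means empty, otherwise the ID of the thread awaiting service. */
static _Atomic uint32_t g_slot = 0;
static atomic_int g_serviced = 0;
static atomic_bool g_stop = false;

/* Models the callback stub: claim the slot, then spin until the handler clears it.
   Neither loop makes a syscall, mirroring the constraint described above. */
static void callback_stub(uint32_t tid) {
    uint32_t expected = 0;
    while (!atomic_compare_exchange_weak(&g_slot, &expected, tid))
        expected = 0; /* slot busy (or spurious failure): retry */
    while (atomic_load(&g_slot) == tid)
        ; /* wait for the handler to release this thread */
}

/* Models the handler thread: poll the slot instead of blocking on an event. */
static void *handler_thread(void *arg) {
    (void)arg;
    while (!atomic_load(&g_stop)) {
        uint32_t tid = atomic_load(&g_slot);
        if (tid == 0)
            continue; /* nothing pending: keep spinning */
        /* Real tool: suspend thread `tid`, query ThreadLastSystemCall, log it. */
        atomic_fetch_add(&g_serviced, 1);
        atomic_store(&g_slot, 0); /* let the waiting thread continue */
    }
    return NULL;
}

static void *worker_thread(void *arg) {
    uint32_t tid = (uint32_t)(uintptr_t)arg;
    for (int i = 0; i < CALLS_PER_WORKER; i++)
        callback_stub(tid); /* stands in for "a syscall just returned" */
    return NULL;
}

/* Runs the simulation and returns how many intercepted calls were serviced. */
static int run_simulation(void) {
    pthread_t handler, workers[NUM_WORKERS];
    atomic_store(&g_serviced, 0);
    atomic_store(&g_stop, false);
    int rc = pthread_create(&handler, NULL, handler_thread, NULL);
    assert(rc == 0);
    for (uintptr_t i = 0; i < NUM_WORKERS; i++) {
        /* Thread IDs start at 1 because 0 marks an empty slot. */
        rc = pthread_create(&workers[i], NULL, worker_thread, (void *)(i + 1));
        assert(rc == 0);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    atomic_store(&g_stop, true);
    pthread_join(handler, NULL);
    (void)rc;
    return atomic_load(&g_serviced);
}
```

Only the handler ever writes 0 to the slot, and workers only write while it is 0. As a result, a pending thread ID cannot be overwritten before it is serviced. The cost is that every thread spins on the same cache line while one request is handled at a time.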
In practice, this was not a concern. The tool was written to assist with tracing a specific application (a CTF challenge) where performance was not critical.