waqasm86

About Me

I'm Mohammad Waqas, a systems engineer focused on GPU-accelerated LLM inference, observability, and local-first AI infrastructure. I build practical tooling around CUDA, llama.cpp, OpenTelemetry, distributed computing, and low-VRAM deployment.

Building CUDA-first inference and telemetry systems for local LLMs
Engineering distributed runtimes with MPI, TCP, async I/O, and content-addressed storage
Connecting coding agents to local GGUF models through MCP bridges
Operating AI workloads with Docker, Kubernetes, Helm, Prometheus, and Grafana

Featured Work

Project	What it does	Stack
LlamaTelemetry	CUDA-first OpenTelemetry SDK for LLM inference observability and explainability	Python, OpenTelemetry, CUDA
CUDA NVIDIA Systems Engineering	Distributed LLM inference system with TCP networking, MPI scheduling, storage, and latency benchmarks	C++20, CUDA, MPI
LLM Observability Stack	Local GPU AI platform combining k3s, Ollama, Open WebUI, LangChain, and observability tooling	Kubernetes, Helm, NVIDIA
Windsurf llama.cpp MCP Bridge	MCP server routing coding-agent tools to a local llama.cpp server	Python, MCP, GGUF
CUDA MPI Llama Scheduler	Work-stealing inference scheduler with multi-rank load balancing and percentile latency analysis	CUDA, MPI, C++
Ubuntu CUDA llama.cpp Executable	Prebuilt CUDA-enabled llama.cpp distribution for Ubuntu, from low-VRAM GPUs to RTX systems	Python, CUDA, llama.cpp

Tech Stack

Languages

AI, Inference and Observability

Infrastructure and Systems

GitHub Stats

Building efficient AI systems from GPU kernels to production telemetry.
