💬
All In AI
  • Alibaba
  • Shanghai Jiao Tong University

chfeng-cs/README.md

Ethan Feng

Infrastructure engineer focused on LLM inference systems.

  • M.S. Computer Science — Shanghai Jiao Tong University
  • B.S. Computer Science — Harbin Institute of Technology
  • 2 yrs at Alibaba

Focus Areas: LLM Inference / GPU Performance

Open Source

Currently contributing to vllm — KV cache transfer, scheduler optimization, and hybrid KV cache management (HMA).

See my vllm contributions for details.

Contact

📫 ethan.fengch [at] gmail [dot] com

🔗 LinkedIn

Pinned

  1. sglang (Public)

    Forked from sgl-project/sglang

    SGLang is a high-performance serving framework for large language models and multimodal models.

    Python

  2. vllm (Public)

    Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python

  3. flashinfer (Public)

    Forked from flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    Python

  4. vllm-contributions (Public)

    Python

  5. TensorRT-LLM (Public)

    Forked from NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

    Python