Shared Memory GEMM #14
* intra_sm/ipc v1 release
* intra_sm/ipc v2 release
* intra_sm/shared_mem v1 release
* intra_sm/shared_mem v2 release
* intra_sm/ipc v3 release
* intra_sm/tb_scheduler v1 release
* clean up duplicated files
* inter_sm/l2_cache v1
* inter_sm/mem_bw
* delete unused files in inter_sm
* cleanup interference kernels
* improve logs
* verified intra_sm/ipc scripts
* verified inter_sm/shared_mem
* specify GPU type for IPC experiment
* verified intra_sm/tb_scheduler
* update README
* inter_sm/l2_cache v2
* bug fix: use cuda.synchronize in case there is no interference kernel
* README v1
* bug fixes
* Shared Memory GEMM (#14)
* add scripts for gemm shared memory interference
* split up shared_mem into llm and gemm subrepo
* intra_sm/shared_mem/gemm v1
* Merge gcontext_test.py and main.py (#15)
* membw to use main.py instead of gcontext_test.py
* fix bugs related to num_requests and num_threads_per_tb
* inter_sm/mem_bw verified
* remove gcontext_test.py, new universal entrypoint is main.py
* fix inter_sm/l2 to use L2Kernel
* fix, missing set_percentage arg
* fix num_warmup vs num_request
* minor adjustment README ipc
* add requirements.txt file
* remove custom profiling folder
* main README
* vllm/interference
* inter_sm/l2_cache final
* inter_sm/mem_bw final
* intra_sm/ipc final
* intra_sm shared mem gemm
* intra-sm shared mem llm
* intra sm tb scheduler final
Add the shared memory paper experiment for GEMM.