LightLLM v1.0.0: Minimal Inter-Process Communication Overhead, Fastest DeepSeek-R1 Serving Performance on Single H200, and Prototype Support for PD-Disaggregation
By MTC Team · 16 Feb 2025
We are delighted to announce the release of LightLLM v1.0.0.

Reducing Overhead with CUDA Graph
By MTC Team · 22 Jan 2025
LightLLM uses CUDA Graph to reduce overhead.