Make LLM inference faster and cheaper with SGLang

DeepLearning.AI · 2 min read
AI/ML · Technology

AI Summary

DeepLearning.AI launched a new course on making LLM inference faster and cheaper with SGLang. The course teaches how to cut computational cost through caching strategies and covers both text generation and image diffusion models.

Key Facts

The SGLang course teaches how to optimize LLM inference by eliminating repeated computation, caching shared system prompts and context across requests
Students build attention mechanisms and a KV cache from scratch, then implement RadixAttention for cross-user prefix caching (a minimal sketch follows this list)
The course covers text generation optimization and applies the same principles to diffusion models for faster image generation
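To make the caching idea concrete: RadixAttention keeps the KV cache of previously seen token prefixes in a tree, so a request that shares a prefix with an earlier one (such as the same system prompt) skips recomputing it. Below is a minimal Python sketch of token-level prefix caching under that assumption; the names (PrefixCache, fake_kv) are hypothetical, and the real SGLang implementation manages paged GPU KV blocks in a compressed radix tree with eviction, not Python objects.

```python
# Minimal sketch of prefix caching in the spirit of RadixAttention.
# All names here are illustrative, not SGLang's actual API.

class Node:
    def __init__(self):
        self.children = {}  # token -> Node
        self.kv = None      # cached KV entry for the token ending at this node

class PrefixCache:
    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Walk the tree and return (cached KV entries, tokens matched)."""
        node, kvs = self.root, []
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            kvs.append(child.kv)
            node = child
        return kvs, len(kvs)

    def insert(self, tokens, kvs):
        """Store KV entries along the path spelled out by `tokens`."""
        node = self.root
        for tok, kv in zip(tokens, kvs):
            node = node.children.setdefault(tok, Node())
            if node.kv is None:
                node.kv = kv

def fake_kv(tok):
    # Stand-in for the per-token key/value tensors a real model would produce.
    return f"kv({tok})"

cache = PrefixCache()

def generate(tokens):
    cached, n = cache.match_prefix(tokens)
    # Only the uncached suffix needs a forward pass.
    new = [fake_kv(t) for t in tokens[n:]]
    cache.insert(tokens, cached + new)
    print(f"{len(tokens)} tokens, {n} reused from cache")

system = [1, 2, 3, 4]            # shared system prompt
generate(system + [10, 11])      # 6 tokens, 0 reused
generate(system + [20, 21, 22])  # 7 tokens, 4 reused via the shared prefix
```

The second request reuses the four system-prompt tokens computed for the first, which is the cross-user saving the course builds toward.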
