AMD released Lemonade, a 2MB open-source local LLM server optimized for GPU/NPU execution with OpenAI API compatibility out of the box.
AMD launched Lemonade, a lightweight (2MB) open-source local AI server that runs LLMs, image generation, and speech models on consumer hardware using GPU and NPU acceleration. It supports models like gpt-oss-120b and Qwen-Coder-Next, leverages llama.cpp, Ryzen AI SW, and FastFlowLM backends, and is compatible with hundreds of apps via the OpenAI API standard. The server can run multiple models concurrently, includes a GUI for model management, and auto-configures dependencies for AMD hardware. It targets local-first, private AI execution with no cloud dependency.
Lemonade drops an OpenAI API-compatible local server that auto-configures GPU and NPU backends with zero manual driver wrestling. The 2MB footprint and automatic dependency setup via llama.cpp and Ryzen AI SW mean you can run 120B-parameter models locally without touching cloud APIs. Multi-model concurrent execution is now a local feature, not a cloud premium.
Install Lemonade on an AMD GPU machine, point your existing OpenAI API calls at localhost, and benchmark latency on your highest-traffic endpoint; if it comes in under 300 ms, you can cut that API cost line entirely.
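A minimal sketch of that switch, assuming Lemonade exposes its OpenAI-compatible API at localhost:8000 under /api/v1 (the port, path, and model id below are illustrative; check what your install actually reports):

```python
import time
from openai import OpenAI  # pip install openai

# Assumption: Lemonade is listening on localhost:8000 with an
# OpenAI-compatible /api/v1 prefix. Local servers typically
# ignore the API key, but the client requires a non-empty string.
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="unused")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="gpt-oss-120b",  # hypothetical model id; list models to confirm
    messages=[{"role": "user", "content": "Say 'pong'."}],
    max_tokens=16,
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{elapsed_ms:.0f} ms -> {resp.choices[0].message.content!r}")
if elapsed_ms < 300:
    print("Under the 300 ms budget; this call is a candidate to go local.")
```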
Install Lemonade from the AMD GitHub repo and run the installer; it auto-detects your GPU/NPU and sets up the backends.
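To sanity-check the install, you can query the server's model list through the same OpenAI-compatible API. A short sketch, again assuming the localhost:8000 endpoint from above:

```python
from openai import OpenAI

# Assumed endpoint; confirm the port/path the installer reports.
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="unused")

# The standard OpenAI /models endpoint should list whatever model
# backends the installer detected and configured on your hardware.
for model in client.models.list():
    print(model.id)
```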