AMD released Lemonade, a 2MB open-source local LLM server optimized for GPU/NPU execution with OpenAI API compatibility out of the box.
AMD launched Lemonade, a lightweight (2MB) open-source local AI server that runs LLMs, image generation, and speech models on consumer hardware using GPU and NPU acceleration. It supports models like gpt-oss-120b and Qwen-Coder-Next, leverages llama.cpp, Ryzen AI SW, and FastFlowLM backends, and is compatible with hundreds of apps via the OpenAI API standard. The server can run multiple models concurrently, includes a GUI for model management, and auto-configures dependencies for AMD hardware. It targets local-first, private AI execution with no cloud dependency.
Lemonade drops an OpenAI API-compatible local server that auto-configures GPU and NPU backends with zero manual driver wrestling. The 2MB footprint and automatic dependency setup via llama.cpp and Ryzen AI SW mean you can run 120B-parameter models locally without touching cloud APIs. Multi-model concurrent execution is now a local feature, not a cloud premium.
Install Lemonade on an AMD GPU machine, point your existing OpenAI API calls at localhost, and benchmark latency on your highest-traffic endpoint; if it comes in under 300 ms, you can cut that API cost line entirely.
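A minimal sketch of that switch, assuming Lemonade exposes its OpenAI-compatible API at localhost:8000 under /api/v1 (the port, path, and model id below are illustrative; check what your install actually reports):

```python
import time
from openai import OpenAI  # pip install openai

# Assumption: Lemonade is listening on localhost:8000 with an
# OpenAI-compatible /api/v1 prefix. Local servers typically
# ignore the API key, but the client requires a non-empty string.
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="unused")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="gpt-oss-120b",  # hypothetical model id; list models to confirm
    messages=[{"role": "user", "content": "Say 'pong'."}],
    max_tokens=16,
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{elapsed_ms:.0f} ms -> {resp.choices[0].message.content!r}")
if elapsed_ms < 300:
    print("Under the 300 ms budget; this call is a candidate to go local.")
```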
Install Lemonade from the AMD GitHub repo and run the installer; it auto-detects your GPU/NPU and sets up the backends.
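To sanity-check the install, you can query the server's model list through the same OpenAI-compatible API. A short sketch, again assuming the localhost:8000 endpoint from above:

```python
from openai import OpenAI

# Assumed endpoint; confirm the port/path the installer reports.
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="unused")

# The standard OpenAI /models endpoint should list whatever model
# backends the installer detected and configured on your hardware.
for model in client.models.list():
    print(model.id)
```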