Refreshingly fast LLMs on GPUs and NPUs
Install, run LLMs locally, and discover apps in minutes
One-minute install
Simple installer that sets up the stack automatically.
Multi-engine compatibility
Works with llama.cpp, Ryzen AI SW, and FastFlowLM.
Auto-configures for your hardware
Detects your GPU and NPU and sets up the matching dependencies automatically.
Multiple models at once
Run more than one model at the same time.
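A minimal sketch of what side-by-side models could look like from client code, assuming the stack exposes an OpenAI-compatible endpoint; the port, base path, and model names below are placeholders, not confirmed defaults:

```python
# Sketch: query two models through one local server.
# Assumptions (not confirmed): OpenAI-compatible endpoint on
# localhost:8000/api/v1; model ids are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="none")

# Issue requests against two different models through the same server;
# the backend keeps both available at the same time.
for model in ("llama-3.2-3b-instruct", "qwen2.5-0.5b-instruct"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(model, "->", reply.choices[0].message.content)
```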
Cross-platform
A consistent experience across Windows and Linux.
Multi-modal input
Handle text, images, and audio in one SDK.
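A hedged sketch of a multi-modal request, assuming the SDK accepts OpenAI-style content parts; the endpoint, model name, and image path are placeholders:

```python
# Sketch: send text plus an image in one request.
# Assumptions (not confirmed): OpenAI-style chat format, placeholder
# endpoint, model id, and image file.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="none")

# Encode a local image so it can travel inline with the text prompt.
with open("photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

reply = client.chat.completions.create(
    model="llava-1.5-7b",  # placeholder vision-capable model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(reply.choices[0].message.content)
```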
Built-in app to manage models
A GUI that lets you download, try, and switch models quickly.
Native C++ backend
A lightweight server binary that weighs in at just 2 MB.
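Since the backend runs as a local server, any HTTP client can talk to it directly. A minimal sketch, assuming it serves an OpenAI-compatible /chat/completions route; the port and model id are placeholders:

```python
# Sketch: call the local server over plain HTTP.
# Assumptions (not confirmed): OpenAI-compatible route on
# localhost:8000/api/v1; model id is a placeholder.
import requests

resp = requests.post(
    "http://localhost:8000/api/v1/chat/completions",
    json={
        "model": "llama-3.2-3b-instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "What hardware am I on?"}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```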