🧠 Level 6: Model — LLaMa

The local AI model. Runs on [::1]:8765. This is where inference happens.

Checking...
Step 1
Install llama-cpp (or use llama.cpp binary)
pip3 install llama-cpp-python
Step 2
Download your model
wget https://huggingface.co/.../model.gguf
Step 3
Start model server
llama-server -m model.gguf --host ::1 --port 8765 -c 8192 &
Step 4
Verify model responds
curl -s http://[::1]:8765/
💡 Why this matters:
The model is the AI brain. llama-cpp-server (or llama.cpp) loads a GGUF model and serves it over HTTP. Your system talks to it via the OpenClaw gateway. Context size (-c) depends on your RAM — 8192 needs ~8GB, 32768 needs ~32GB.
Terminal — click ▶ Run above or type manually