Running llama.cpp SYCL on Windows with Intel Arc B580

Prerequisites


Step 1 — Download SYCL build

# Find the Windows SYCL x64 asset in the latest llama.cpp release
$release = Invoke-RestMethod "https://api.github.com/repos/ggml-org/llama.cpp/releases/latest"
$asset = $release.assets | Where-Object { $_.name -like "*win-sycl-x64*" } | Select-Object -First 1
# Download it and extract to C:\llama-sycl
Invoke-WebRequest $asset.browser_download_url -OutFile "llama-sycl.zip"
Expand-Archive llama-sycl.zip -DestinationPath C:\llama-sycl -Force

No separate oneAPI install is needed: the zip bundles the required runtime DLLs (sycl8.dll, mkl_*.dll, svml_dispmd.dll).
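To confirm that the extraction actually produced the DLLs listed above, a small check like the following may help. It is a sketch: `Get-MissingSyclDll` is a hypothetical helper, and its file patterns simply mirror the list above, so a release that renames those files would need the patterns updated.

```powershell
# Hypothetical helper: emits each DLL pattern from the list above that matches
# no file in the given directory (no output means everything was found).
function Get-MissingSyclDll {
    param([string]$Path = 'C:\llama-sycl')

    # Patterns mirror the bundled-DLL list in this guide; adjust if a release renames them.
    $patterns = 'sycl8.dll', 'mkl_*.dll', 'svml_dispmd.dll'
    foreach ($pattern in $patterns) {
        $found = Get-ChildItem -Path $Path -Filter $pattern -File -ErrorAction SilentlyContinue
        if (-not $found) { $pattern }   # emit the pattern that matched nothing
    }
}

$missing = @(Get-MissingSyclDll -Path 'C:\llama-sycl')
if ($missing.Count -eq 0) {
    Write-Host 'All expected SYCL runtime DLLs are present.'
} else {
    Write-Warning "Missing from the extracted build: $($missing -join ', ')"
}
```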


Step 2 — Verify GPUs are detected

cd C:\llama-sycl
# Expose both B580s through the Level Zero backend
$env:ONEAPI_DEVICE_SELECTOR = "level_zero:0,1"
.\sycl-ls.exe

Expected output:

| 0| [level_zero:gpu:0]| Intel Arc B580 Graphics| 20.1| 160| ... | 1.15.37669|
| 1| [level_zero:gpu:1]| Intel Arc B580 Graphics| 20.1| 160| ... | 1.15.37669|
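If you script this check, counting the Level Zero GPU entries is enough to catch a card that failed to enumerate. The sketch below assumes the output looks like the sample above (one `[level_zero:gpu:N]` tag per device); `Get-LevelZeroGpuCount` is a hypothetical helper, not part of llama.cpp or oneAPI.

```powershell
# Hypothetical helper: counts lines containing a "[level_zero:gpu:N]" tag,
# assuming output shaped like the sample above.
function Get-LevelZeroGpuCount {
    param([string[]]$Lines = @())
    if (-not $Lines) { return 0 }
    @($Lines | Select-String -Pattern '\[level_zero:gpu:\d+\]').Count
}

# Only runs from the extracted build directory, where sycl-ls.exe lives.
if (Test-Path .\sycl-ls.exe) {
    $env:ONEAPI_DEVICE_SELECTOR = 'level_zero:0,1'
    $count = Get-LevelZeroGpuCount -Lines (& .\sycl-ls.exe)
    if ($count -lt 2) { Write-Warning "Expected 2 Level Zero GPUs, found $count" }
}
```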

Step 3 — Run the server

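$env:ONEAPI_DEVICE_SELECTOR = "level_zero:0,1"   # both B580s via Level Zero
$env:ZES_ENABLE_SYSMAN = "1"                     # lets the runtime report free GPU memory
$env:GGML_SYCL_DISABLE_DNN = "1"                 # skip oneDNN (see Known Issues)

# -ngl 99 offloads all layers; -sm row splits tensors by rows across both GPUs
.\llama-server.exe `
    -m C:\llama.cpp\models\Qwen2.5-7B-Instruct-Q4_K_M.gguf `
    --port 8000 --host 0.0.0.0 `
    -c 32768 -ngl 99 -sm row `
    --flash-attn true -fit off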
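Once the log shows the model has loaded, a quick request from a second PowerShell window confirms that the server answers. This sketch assumes the server is reachable at localhost:8000 as configured above and uses llama-server's `/health` and OpenAI-compatible `/v1/chat/completions` endpoints; `New-ChatRequestBody` is a hypothetical helper and the prompt is arbitrary.

```powershell
# Hypothetical helper: builds a minimal OpenAI-style chat request body as JSON.
function New-ChatRequestBody {
    param([string]$Prompt, [int]$MaxTokens = 64)
    @{
        messages   = @(@{ role = 'user'; content = $Prompt })
        max_tokens = $MaxTokens
    } | ConvertTo-Json -Depth 5
}

$base = 'http://localhost:8000'   # matches --port 8000 above
try {
    # /health returns an error status while the model is still loading
    Invoke-RestMethod "$base/health" | Out-Null
    $resp = Invoke-RestMethod "$base/v1/chat/completions" -Method Post `
        -ContentType 'application/json' -Body (New-ChatRequestBody 'Reply with one short sentence.')
    $resp.choices[0].message.content
} catch {
    Write-Warning "Server not ready at ${base}: $($_.Exception.Message)"
}
```

If `/health` fails right after startup, wait for the model load to finish and retry.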

Known Issues & Workarounds

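| Issue | Workaround |
|---|---|
| Crash during device init | Add -fit off |
| oneDNN crash | Set GGML_SYCL_DISABLE_DNN=1 |
| Multi-GPU crash | Try a single GPU: $env:ONEAPI_DEVICE_SELECTOR = "level_zero:0" |
| ZES_ENABLE_SYSMAN warning | Set it as a PowerShell environment variable ($env:ZES_ENABLE_SYSMAN = "1"); bash-style export does not apply |
| "unknown id" on B580 | Known driver issue; cosmetic only on Windows |
| SYCL slower than Vulkan | Expected for now: driver 1.15 is not yet fully optimized for the B580 |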
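For the "Multi-GPU crash" workaround, it can help to keep the dual-GPU and single-GPU launches in one place. This is a convenience sketch, not an official script: `Get-LlamaServerArgs` is a hypothetical helper, the model path and flags are copied from Step 3, and in single-GPU mode it swaps `-sm row` for `-sm none`, on the assumption that there is nothing to split with only one visible device.

```powershell
# Hypothetical helper: returns the Step 3 llama-server arguments, either for both
# B580s (row split) or for the single-GPU fallback (no split).
function Get-LlamaServerArgs {
    param(
        [string]$Model = 'C:\llama.cpp\models\Qwen2.5-7B-Instruct-Q4_K_M.gguf',
        [switch]$SingleGpu
    )
    $argList = @(
        '-m', $Model,
        '--port', '8000', '--host', '0.0.0.0',
        '-c', '32768', '-ngl', '99',
        '--flash-attn', 'true', '-fit', 'off'
    )
    if ($SingleGpu) { $argList += '-sm', 'none' } else { $argList += '-sm', 'row' }
    $argList
}

# Single-GPU fallback: expose only the first card, then launch.
if (Test-Path .\llama-server.exe) {
    $env:ONEAPI_DEVICE_SELECTOR = 'level_zero:0'
    $env:ZES_ENABLE_SYSMAN = '1'
    $env:GGML_SYCL_DISABLE_DNN = '1'
    $serverArgs = Get-LlamaServerArgs -SingleGpu
    & .\llama-server.exe @serverArgs
}
```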