llama.cpp on Windows with 2x Intel Arc B580 - SYCL

Running llama.cpp SYCL on Windows with Intel Arc B580

Prerequisites

Windows with Intel Arc B580 driver 32.0.101.8724+
Verify: Get-WmiObject Win32_VideoController | Select Name, DriverVersion

Step 1 — Download SYCL build

$release = Invoke-RestMethod "<https://api.github.com/repos/ggml-org/llama.cpp/releases/latest>"
$asset = $release.assets | Where-Object { $_.name -like "*win-sycl-x64*" }
Invoke-WebRequest $asset.browser_download_url -OutFile "llama-sycl.zip"
Expand-Archive llama-sycl.zip -DestinationPath C:\llama-sycl -Force

No oneAPI install needed — the zip bundles all required DLLs (sycl8.dll, mkl_*.dll, svml_dispmd.dll).

Step 2 — Verify GPUs are detected

cd C:\llama-sycl
$env:ONEAPI_DEVICE_SELECTOR = "level_zero:0,1"
.\sycl-ls.exe

Expected output:

| 0| [level_zero:gpu:0]| Intel Arc B580 Graphics| 20.1| 160| ... | 1.15.37669|
| 1| [level_zero:gpu:1]| Intel Arc B580 Graphics| 20.1| 160| ... | 1.15.37669|

Step 3 — Run the server

$env:ONEAPI_DEVICE_SELECTOR = "level_zero:0,1"
$env:ZES_ENABLE_SYSMAN = "1"
$env:GGML_SYCL_DISABLE_DNN = "1"

.\llama-server.exe `
    -m C:\llama.cpp\models\Qwen2.5-7B-Instruct-Q4_K_M.gguf `
    --port 8000 --host 0.0.0.0 `
    -c 32768 -ngl 99 -sm row `
    --flash-attn true -fit off

Known Issues & Workarounds

Issue	Workaround
Crashes during device init	Use `-fit off`
oneDNN crash	Set `GGML_SYCL_DISABLE_DNN=1`
Multi-GPU crash	Try single GPU: `level_zero:0`
`ZES_ENABLE_SYSMAN` warning	Set as env var, not export
`unknown id` on B580	Known driver issue, cosmetic only on Windows
SYCL slower than Vulkan	Expected — driver 1.15 not fully optimized for B580 yet