The Battlemage WSL2 Playbook: Dual B580 Inference via Vulkan

This is a start-to-finish guide to bringing a fresh dual Intel Arc B580 (Battlemage) setup online in WSL2 for LLM inference.

This guide sidesteps the current Intel Level Zero / IPEX bugs in WSL2 by using the Vulkan backend in llama.cpp, which is stable in this configuration.

Architecture Overview

Hardware: 2x Intel Arc B580 (12GB VRAM each, 24GB Total)
Environment: Windows Subsystem for Linux (WSL2) - Ubuntu 24.04
Target Model: Qwen 3.5 9B (Q4_K_M GGUF format)
Compute Backend: Vulkan (used instead of native SYCL/oneAPI, which hits WSL2 driver segmentation faults on Xe2)

Phase 1: Prepare the WSL2 Environment

Intel GPUs in WSL2 are exposed through a paravirtualized DirectX device (/dev/dxg) rather than a native Linux kernel driver. To use them, we rely on a Vulkan implementation that runs on top of that layer.
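Before installing anything, it can help to confirm that the device node is actually present. The following is a minimal sketch, not part of the original procedure; check_gpu_device is a hypothetical helper name, and the only assumption is that GPU passthrough shows up as /dev/dxg as described above.

```shell
# Hypothetical helper: report whether the WSL2 GPU device node exists.
# The path argument is optional and defaults to /dev/dxg.
check_gpu_device() {
    dev="${1:-/dev/dxg}"
    if [ -e "$dev" ]; then
        echo "found: $dev"
        return 0
    fi
    echo "missing: $dev" >&2
    return 1
}

# Run the check; print a hint instead of aborting if the node is absent.
check_gpu_device || echo "GPU passthrough does not appear to be active in this distro."
```

If the node is missing, the later Vulkan steps are unlikely to find any Intel GPU, so it is worth resolving that first (for example by updating WSL and the Windows GPU driver).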

1. Install Vulkan Core Libraries and Compilers

You need the Vulkan headers, the shader compiler (glslc), and the SPIR-V headers. glslc compiles llama.cpp's compute shaders to SPIR-V, the binary format the Vulkan driver loads onto the B580s.

sudo apt update
sudo apt install vulkan-tools libvulkan-dev glslc spirv-headers
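After the install, a quick sanity check is to confirm that the two command-line tools these packages provide (vulkaninfo from vulkan-tools, glslc from glslc) are on your PATH. This is a small sketch; missing_tools is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: print the name of each given command that is
# not found on PATH. Prints nothing when everything is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Report any tool the install step should have provided but did not.
for tool in $(missing_tools vulkaninfo glslc); do
    echo "not on PATH: $tool"
done
```

No output means both tools were found. The header-only packages (libvulkan-dev, spirv-headers) are not covered by this check.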

2. Verify Hardware Visibility

Confirm that Vulkan can enumerate your GPUs without crashing.

vulkaninfo | grep "deviceName"

Expected Output: You should see your two Intel Arc B580 GPUs listed.
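The check above can also be scripted so that it reports a count instead of relying on a visual scan. This is a sketch under assumptions: count_b580 is a hypothetical helper, and it assumes the device name string reported by the driver contains "B580" (the exact wording of the name may differ between driver versions).

```shell
# Hypothetical helper: read vulkaninfo text on stdin and print how many
# deviceName lines mention "B580" (case-insensitive).
count_b580() {
    # "|| true" keeps the exit status at 0 even when the count is 0.
    grep "deviceName" | grep -ci "B580" || true
}

# Only run against real hardware when vulkaninfo is installed.
if command -v vulkaninfo >/dev/null 2>&1; then
    found=$(vulkaninfo 2>/dev/null | count_b580)
    echo "B580 devices reported: $found (this guide expects 2)"
fi
```

A count other than 2 is worth investigating before moving on, although a higher number can simply mean the same GPU is listed by more than one installed Vulkan driver.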

Phase 2: Compile `llama.cpp` for Vulkan