
Last Updated: 3/9/2026


Installation

vLLM supports the following hardware platforms:

GPU

CPU

  • Intel/AMD x86
  • ARM AArch64
  • Apple silicon
  • IBM Z (S390X)

Hardware Plugins

vLLM supports third-party hardware plugins that live outside the main vllm repository. These follow the Hardware-Pluggable RFC.

A list of all supported hardware can be found on the vllm.ai website. If you want to add new hardware, please contact us via Slack or email.

Installation Instructions

NVIDIA CUDA

For NVIDIA GPUs, install vLLM with uv (recommended):

# Using uv (recommended)
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --torch-backend=auto

The --torch-backend=auto flag automatically selects the appropriate PyTorch index based on your CUDA driver version.
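If auto-detection picks an index that does not match your setup, the backend can also be pinned explicitly. The tag below (`cu128`) is illustrative; check which wheel indexes exist for your driver. A small sketch of how to inspect the driver side:

```shell
# Pin a specific CUDA backend instead of auto-detecting (tag is illustrative):
#   uv pip install vllm --torch-backend=cu128
# See the driver version auto-detection keys off (falls back if no GPU tooling):
nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null \
  || echo "nvidia-smi not found"
```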

Alternatively, use conda:

conda create -n myenv python=3.12 -y
conda activate myenv
pip install --upgrade uv
uv pip install vllm --torch-backend=auto

AMD ROCm

For AMD GPUs, install vLLM using uv:

uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/

Requirements:

  • Python 3.12
  • ROCm 7.0
  • glibc >= 2.35

Note: Previously, Docker images were published using AMD’s docker release pipeline at rocm/vllm-dev. This is being deprecated in favor of vLLM’s docker release pipeline.
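A quick preflight for the requirements above (standard Linux commands assumed; `ldd` reports the glibc version on glibc-based systems):

```shell
# Check the ROCm install prerequisites listed above:
python3 --version         # needs Python 3.12
ldd --version | head -n1  # glibc version; needs >= 2.35
```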

Intel XPU

For Intel GPUs, follow the installation instructions in the XPU documentation.

Google TPU

To run vLLM on Google TPUs, install the vllm-tpu package:

uv pip install vllm-tpu

For more detailed instructions, refer to the vLLM on TPU documentation.

CPU Installation

For CPU-only installation:

uv pip install vllm-cpu

Supported CPU architectures:

  • Intel/AMD x86-64
  • ARM AArch64
  • Apple Silicon (M1/M2/M3)
  • IBM Z (S390X)
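To confirm which of these architectures your machine reports (Linux and macOS):

```shell
# Print the CPU architecture; should match one of the supported targets above
# (x86_64, aarch64/arm64, or s390x):
uname -m
```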

Docker Installation

vLLM provides official Docker images:

# NVIDIA GPU
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-v0.1

# AMD GPU
docker run --device=/dev/kfd --device=/dev/dri \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-v0.1
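Once a container is serving on port 8000, its OpenAI-compatible API can be exercised with a request like the following. This is a sketch: the model name must match the `--model` flag used above, and the final `curl` is commented out because it needs a live server.

```shell
# Build a completions request payload for the server started above:
payload='{"model": "mistralai/Mistral-7B-v0.1", "prompt": "Hello", "max_tokens": 16}'
# Validate the JSON locally before sending:
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"
# Send it to a running container:
#   curl http://localhost:8000/v1/completions \
#     -H "Content-Type: application/json" -d "$payload"
```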

Building from Source

To build vLLM from source:

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .

For development, include the optional dev dependencies in the editable install:

pip install -e ".[dev]"

Troubleshooting

If you encounter issues during installation:

  1. Check CUDA/ROCm version compatibility
  2. Verify Python version (3.10-3.13)
  3. Check available disk space (models can be large)
  4. Review error logs for specific dependency issues
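The first three checks can be gathered in one pass (Linux commands assumed; `nvcc` is only present when a full CUDA toolkit is installed, which is not required for wheel installs):

```shell
# Gather the basics from the checklist above:
python3 --version      # should fall in the 3.10-3.13 range
df -h . | tail -n1     # free space where models will be downloaded
if command -v nvcc >/dev/null; then
  nvcc --version | tail -n1
else
  echo "nvcc not found (no full CUDA toolkit)"
fi
```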

For more help, open an issue on the vLLM GitHub repository or ask in the community Slack.

Next Steps

After installation, proceed to the Quickstart Guide to start using vLLM.