
Last Updated: 3/9/2026


Installation

vLLM supports the following hardware platforms:

GPU

CPU

  • Intel/AMD x86
  • ARM AArch64
  • Apple silicon
  • IBM Z (S390X)

Hardware Plugins

vLLM supports third-party hardware plugins that live outside the main vllm repository. These follow the Hardware-Pluggable RFC.

A list of all supported hardware can be found on the vllm.ai website. If you want to add new hardware, please contact us via Slack or email.

Installation Instructions

NVIDIA CUDA

For NVIDIA GPUs, install vLLM with uv (recommended):

# Using uv (recommended)
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --torch-backend=auto

The --torch-backend=auto flag automatically selects the appropriate PyTorch index based on your CUDA driver version.
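If auto-detection picks an index that does not match your setup, the backend can also be pinned explicitly. The tag below (`cu128`) is illustrative; check which wheel indexes exist for your driver. A small sketch of how to inspect the driver side:

```shell
# Pin a specific CUDA backend instead of auto-detecting (tag is illustrative):
#   uv pip install vllm --torch-backend=cu128
# See the driver version auto-detection keys off (falls back if no GPU tooling):
nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null \
  || echo "nvidia-smi not found"
```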

Alternatively, use conda:

conda create -n myenv python=3.12 -y
conda activate myenv
pip install --upgrade uv
uv pip install vllm --torch-backend=auto

AMD ROCm

For AMD GPUs, install vLLM using uv:

uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/

Requirements:

  • Python 3.12
  • ROCm 7.0
  • glibc >= 2.35

Note: Previously, Docker images were published using AMD’s docker release pipeline at rocm/vllm-dev. This is being deprecated in favor of vLLM’s docker release pipeline.
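A quick preflight for the requirements above (standard Linux commands assumed; `ldd` reports the glibc version on glibc-based systems):

```shell
# Check the ROCm install prerequisites listed above:
python3 --version         # needs Python 3.12
ldd --version | head -n1  # glibc version; needs >= 2.35
```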

Intel XPU

For Intel GPUs, follow the installation instructions in the XPU documentation.

Google TPU

To run vLLM on Google TPUs, install the vllm-tpu package:

uv pip install vllm-tpu

For more detailed instructions, refer to the vLLM on TPU documentation.

CPU Installation

For CPU-only installation:

uv pip install vllm-cpu

Supported CPU architectures:

  • Intel/AMD x86-64
  • ARM AArch64
  • Apple Silicon (M1/M2/M3)
  • IBM Z (S390X)
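To confirm which of these architectures your machine reports (Linux and macOS):

```shell
# Print the CPU architecture; should match one of the supported targets above
# (x86_64, aarch64/arm64, or s390x):
uname -m
```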

Docker Installation

vLLM provides official Docker images:

# NVIDIA GPU
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-v0.1

# AMD GPU
docker run --device=/dev/kfd --device=/dev/dri \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-7B-v0.1
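Once a container is serving on port 8000, its OpenAI-compatible API can be exercised with a request like the following. This is a sketch: the model name must match the `--model` flag used above, and the final `curl` is commented out because it needs a live server.

```shell
# Build a completions request payload for the server started above:
payload='{"model": "mistralai/Mistral-7B-v0.1", "prompt": "Hello", "max_tokens": 16}'
# Validate the JSON locally before sending:
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"
# Send it to a running container:
#   curl http://localhost:8000/v1/completions \
#     -H "Content-Type: application/json" -d "$payload"
```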

Building from Source

To build vLLM from source:

git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .

For development, include the optional dev dependencies in the editable install:

pip install -e ".[dev]"

Troubleshooting

If you encounter issues during installation:

  1. Check CUDA/ROCm version compatibility
  2. Verify Python version (3.10-3.13)
  3. Check available disk space (models can be large)
  4. Review error logs for specific dependency issues
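The first three checks can be gathered in one pass (Linux commands assumed; `nvcc` is only present when a full CUDA toolkit is installed, which is not required for wheel installs):

```shell
# Gather the basics from the checklist above:
python3 --version      # should fall in the 3.10-3.13 range
df -h . | tail -n1     # free space where models will be downloaded
if command -v nvcc >/dev/null; then
  nvcc --version | tail -n1
else
  echo "nvcc not found (no full CUDA toolkit)"
fi
```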

For more help, open an issue on the vLLM GitHub repository or ask in the community Slack.

Next Steps

After installation, proceed to the Quickstart Guide to start using vLLM.