GPU Passthrough in Proxmox LXC Containers: Plex and Ollama

GPU passthrough in LXC containers isn’t as straightforward as it sounds. After plenty of trial and error, I’ve got two NVIDIA GPUs running in Proxmox LXC containers — a GTX 1060 6GB for Plex hardware transcoding and a GTX 1050 Ti 4GB for Ollama LLM inference. Here’s everything I learned along the way.

Why LXC Instead of VMs?

Proxmox supports both VMs and LXC containers. For GPU workloads, you might expect VMs with PCIe passthrough to be the obvious choice. But LXC containers offer some compelling advantages:

  • Lower Overhead: No hypervisor layer between the GPU and the application
  • Shared GPU: The host and container can share the same GPU simultaneously
  • Simpler Management: LXC containers are lighter weight and faster to start
  • Better Integration: Direct access to host kernel and drivers

The trade-off? You need privileged containers and careful cgroup configuration.

The Hardware

| Host | GPU | VRAM | Driver | Kernel | Use Case |
|---|---|---|---|---|---|
| pve01 | GTX 1060 6GB | 6 GB | 580.126.09 | 6.14.11-5-pve | Plex transcoding |
| pve02 | GTX 1050 Ti 4GB | 4 GB | 550.163.01 | 6.14.11-5-pve | Ollama inference |

Both GPUs are consumer-grade cards — nothing exotic. The GTX 1060 handles Plex transcoding effortlessly, and the GTX 1050 Ti is enough for smaller LLM models with GPU acceleration.

Step 1: Install NVIDIA Drivers on the Proxmox Host

Before containers can access the GPU, you need working NVIDIA drivers on the Proxmox host itself. I followed the standard approach of installing from NVIDIA’s .run package rather than Debian packages, since Proxmox’s kernel versions don’t always match Debian’s repos.

After installation, verify with nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
|   GPU  Name                  ...                                                        |
|   0    NVIDIA GeForce GTX 1060 6GB                                                      |
+-----------------------------------------------------------------------------------------+
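
For scripted checks, nvidia-smi's query mode is easier to parse than the table above. The following is a minimal sketch; the helper names gpu_query and driver_from_csv are made up for illustration, and it assumes the comma-plus-space separator that nvidia-smi's CSV output normally uses.

```shell
#!/bin/sh
# Print "name, driver_version, memory.total" for each GPU, one per line.
# --query-gpu and --format=csv,noheader are standard nvidia-smi options.
gpu_query() {
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}

# Extract the driver version (second CSV field) from one line of that output.
# Hypothetical helper, named here for illustration only.
driver_from_csv() {
    printf '%s\n' "$1" | awk -F', ' '{ print $2 }'
}

# Example on a host with a working driver:
#   driver_from_csv "$(gpu_query | head -n 1)"
```

The extracted version string is the same one that appears in the version-specific library paths used later in the container config.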

Driver Compatibility Warning

This is the gotcha that cost me hours: on my hosts, NVIDIA driver 550.x works with kernel 6.14.x but not with 6.17.x. When Proxmox updated to kernel 6.17.x, the driver broke completely. I had to pin the kernel at 6.14.11-5-pve until NVIDIA released a compatible driver.

If you see errors about missing kernel modules after a Proxmox update, check your kernel version first. Rolling back to a compatible kernel is usually the fastest fix.
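
One way to pin the kernel is proxmox-boot-tool's kernel pin subcommand. The sketch below pairs it with a small check that the running kernel is the expected one; kernel_is_pinned is a hypothetical helper, and the version is the one from this post.

```shell
#!/bin/sh
# Kernel version known to work with the installed driver (from this post).
PINNED_KERNEL="6.14.11-5-pve"

# Return 0 if the given kernel release matches the pinned one.
kernel_is_pinned() {
    [ "$1" = "$PINNED_KERNEL" ]
}

# Typical use on the host (run as root):
#   proxmox-boot-tool kernel pin "$PINNED_KERNEL"   # keep booting this kernel
#   kernel_is_pinned "$(uname -r)" || echo "WARNING: running $(uname -r)"
```

Running the check after each reboot catches the case where an update changed the boot default.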

Step 2: Configure Privileged LXC Containers

GPU passthrough requires privileged containers. This is non-negotiable — unprivileged containers can’t access GPU device nodes.

If you’re converting an existing unprivileged container:

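# The unprivileged flag is normally fixed at creation time. If pct set
# rejects it, back up the container and restore it with --unprivileged 0.
pct set <ctid> -unprivileged 0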

Container Configuration

The magic happens in /etc/pve/lxc/<ctid>.conf. Here’s what I added for the Plex container (CT 113 on pve01):

# NVIDIA GPU passthrough - cgroup permissions
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 234:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm

# NVIDIA device mounts
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir

# NVIDIA binaries and libraries from host
lxc.mount.entry: /usr/bin/nvidia-smi usr/bin/nvidia-smi none bind,optional,create=file
lxc.mount.entry: /usr/lib/x86_64-linux-gnu/libcuda.so.580.126.09 usr/lib/x86_64-linux-gnu/libcuda.so.1 none bind,optional,create=file
lxc.mount.entry: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.580.126.09 usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 none bind,optional,create=file

Let me break down what each section does:

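cgroup2 device permissions (c 195:* rwm, etc.): These grant the container read/write/mknod access to the GPU's character device nodes. Major number 195 covers the core NVIDIA devices (/dev/nvidia0 and /dev/nvidiactl), 234 is nvidia-uvm (unified virtual memory) on my host, and 226 is the DRI (Direct Rendering Infrastructure) subsystem. The nvidia-uvm major is assigned dynamically and can differ between hosts, so check yours with ls -l /dev/nvidia-uvm before copying these lines.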
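
Device major numbers can differ between hosts, so it is worth confirming them before copying the cgroup rules. This is a small sketch using GNU stat; dev_major is a hypothetical helper name, and /dev/dri/card0 is an assumed device path that may be named differently on your system.

```shell
#!/bin/sh
# Print the decimal major number of a device node.
# stat's %t format gives the major in hexadecimal, so convert it.
dev_major() {
    printf '%d\n' "0x$(stat -c '%t' "$1")"
}

# On the host, compare against the cgroup rules in the container config:
#   for d in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm /dev/dri/card0; do
#       [ -e "$d" ] && echo "$d -> major $(dev_major "$d")"
#   done
```

Each major printed should have a matching lxc.cgroup2.devices.allow line.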

Device mounts: Bind-mount the actual GPU device files from the host into the container. The optional flag prevents the container from failing to start if a device is temporarily unavailable.

Library mounts: Bind-mount the host’s NVIDIA libraries directly into the container. This is critical — the container doesn’t need its own NVIDIA driver installation. It shares the host’s driver binaries and libraries. Note the version-specific paths (580.126.09) — these must match your installed driver version exactly.
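
Because the library paths embed the driver version, generating those lines can be less error-prone than editing them by hand. This is a minimal sketch that reproduces the two library entries shown above for any version string; lib_mounts is a hypothetical helper, not part of Proxmox.

```shell
#!/bin/sh
# Emit the two library bind-mount lines for a given driver version.
# Paths mirror the ones used in the container config in this post.
lib_mounts() {
    ver=$1
    libdir=usr/lib/x86_64-linux-gnu
    for lib in libcuda libnvidia-ml; do
        echo "lxc.mount.entry: /$libdir/$lib.so.$ver $libdir/$lib.so.1 none bind,optional,create=file"
    done
}

# Example: regenerate the lines after a driver update
#   lib_mounts "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```

The output can be pasted into /etc/pve/lxc/<ctid>.conf in place of the old library entries.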

Step 3: Plex GPU Transcoding

With GPU access configured, enabling hardware transcoding in Plex is straightforward:

  1. Open Plex Settings → Transcoder
  2. Enable “Use hardware acceleration when available”
  3. Set Hardware transcoding device to the NVIDIA GPU

Verifying Hardware Transcoding

Start a transcode (play something that requires transcoding) and verify GPU usage:

# Inside the Plex container
nvidia-smi

You should see a Plex process using GPU memory. During a 1080p → 480p transcode, I see about 67 MiB of GPU memory allocated.

For programmatic verification, query the Plex API:

curl -s -H 'X-Plex-Token: YOUR_TOKEN' \
  'http://localhost:32400/transcode/sessions'

Look for these indicators in the response:

  • transcodeHwRequested="1" — Hardware was requested
  • transcodeHwEncodingTitle="Nvidia ()" — NVIDIA encoder active
  • speed > 2x — Hardware acceleration confirmed (software is typically 0.5–1x)
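
The indicators above can be checked from a script. The following is a rough sketch that greps the XML response rather than parsing it properly; the helper names are hypothetical, and it assumes the speed value appears as an XML attribute named speed.

```shell
#!/bin/sh
# Succeed if a /transcode/sessions response (XML on stdin) shows that
# hardware transcoding was requested and an NVIDIA encoder is in use.
is_hw_transcode() {
    xml=$(cat)
    printf '%s' "$xml" | grep -q 'transcodeHwRequested="1"' &&
    printf '%s' "$xml" | grep -q 'transcodeHwEncodingTitle="Nvidia'
}

# Print the value of the first speed attribute found on stdin, if any.
transcode_speed() {
    grep -o ' speed="[0-9.]*"' | head -n 1 | cut -d'"' -f2
}

# Example, while something is transcoding:
#   curl -s -H 'X-Plex-Token: YOUR_TOKEN' \
#     'http://localhost:32400/transcode/sessions' | is_hw_transcode \
#     && echo "hardware transcode active"
```

A text match like this is fine for a quick check; a real XML parser would be more robust if the response format changes.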

My test results: 1080p H.264 → 480p transcode at 2.6x realtime with NVENC encoding confirmed. Without the GPU, that same transcode crawls at 0.8x.

Permission Fix After Privilege Change

If you converted from an unprivileged to privileged container, you’ll likely need to fix file ownership:

pct exec 113 -- chown -R plex:plex /var/lib/plexmediaserver/

Without this, Plex can’t read its own database and configuration files.

Step 4: Ollama GPU Inference

Ollama (CT 114 on pve02) uses the same GPU passthrough approach but with different VRAM considerations. The GTX 1050 Ti has only 4 GB of VRAM, which severely limits model selection.

The VRAM Problem

Not all models fit in 4 GB of VRAM. When a model exceeds available VRAM, Ollama falls back to CPU-only inference. This sounds harmless, but it’s actually dangerous in my setup:

  • pve02 has only 7.7 GB total RAM shared across docker02, Zabbix, Pi-hole, and other services
  • CPU-mode models consume system RAM instead of VRAM
  • Large models + existing workloads = memory exhaustion = system freeze

I learned this the hard way when llama3:8b (4.7 GB on disk, ~5.3 GB in RAM) caused pve02 to freeze solid. No SSH, no console — had to hard-reboot from the Proxmox UI.
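
A simple guard against this kind of freeze is to check available memory before loading a model. This is a minimal sketch that reads MemAvailable from /proc/meminfo; the helper names and the 6000 MiB threshold in the example are assumptions, not values I have tuned.

```shell
#!/bin/sh
# Available system memory in MiB, from the MemAvailable line of /proc/meminfo.
mem_available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# Succeed only if at least $1 MiB is currently available.
have_mib() {
    [ "$(mem_available_mib)" -ge "$1" ]
}

# Example: refuse to load a ~5.3 GB model without headroom
#   have_mib 6000 || { echo "not enough free RAM, skipping" >&2; exit 1; }
```

Note that inside a container /proc/meminfo may reflect the container's memory limit rather than the host's total, so it is safest to run the check where the limit that matters applies.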

Model Selection Guide

| Model | Disk | RAM Loaded | GPU Offload | Safe? |
|---|---|---|---|---|
| mistral:7b-instruct-q4_0 | 4.1 GB | ~5.5 GB | 75% GPU / 25% CPU | Yes (recommended) |
| starcoder2:3b | 1.7 GB | ~2 GB | 100% GPU | Yes |
| llama3:8b | 4.7 GB | ~5.3 GB | 0% GPU (100% CPU) | No (causes freezes) |

Rules of thumb:

  1. Models with disk size ≤ 3.5 GB should fully offload to the 4 GB GPU
  2. mistral:7b-instruct-q4_0 is the sweet spot — mostly GPU-accelerated with manageable CPU spillover
  3. Anything larger than ~4.5 GB on disk will run entirely on CPU and risk memory exhaustion
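
These rules of thumb can be written as a small classifier. This is a sketch only: the thresholds are the ones listed above for a 4 GB card, the middle band (partial offload) is inferred from the Mistral result, and classify_model is a hypothetical helper name.

```shell
#!/bin/sh
# Classify a model by its on-disk size in GB, for a 4 GB GPU:
#   <= 3.5 -> full GPU, <= 4.5 -> partial offload, otherwise CPU only.
classify_model() {
    awk -v gb="$1" 'BEGIN {
        if (gb <= 3.5)      print "full-gpu"
        else if (gb <= 4.5) print "partial-gpu"
        else                print "cpu-only"
    }'
}

# Example: classify_model 4.1
```

Disk size is only a proxy for VRAM use, so treat the result as a first filter and confirm the actual offload once the model is loaded.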

Keep-Alive Configuration

To prevent the GPU from being occupied by idle models, I set OLLAMA_KEEP_ALIVE=5m in the systemd service. Models unload from VRAM after 5 minutes of inactivity, freeing resources for other workloads.
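
One way to set that variable is a systemd drop-in. The sketch below assumes the unit is named ollama.service and uses a conventional override path; keepalive_dropin is a hypothetical helper that only prints the file content.

```shell
#!/bin/sh
# Print a systemd drop-in that sets OLLAMA_KEEP_ALIVE for the service.
keepalive_dropin() {
    printf '[Service]\nEnvironment="OLLAMA_KEEP_ALIVE=%s"\n' "$1"
}

# Inside the Ollama container (as root), assuming the unit is ollama.service:
#   mkdir -p /etc/systemd/system/ollama.service.d
#   keepalive_dropin 5m > /etc/systemd/system/ollama.service.d/override.conf
#   systemctl daemon-reload && systemctl restart ollama
```

Using a drop-in instead of editing the unit file means the setting survives when the unit file is replaced by an upgrade.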

Lessons Learned

1. Version-Lock Your Kernel

GPU drivers are tightly coupled to kernel versions. A routine apt upgrade can break your GPU setup if it pulls a new kernel. Pin your kernel version or test upgrades on a non-GPU node first.

2. Driver Versions Must Match Everywhere

The library paths in the LXC config (libcuda.so.580.126.09) are version-specific. When you update the host driver, you must update the container config to match. If you forget, the bind mounts are marked optional, so the container still starts without complaint, and nvidia-smi inside it then fails with a missing-library or driver/library version mismatch error.
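
A quick way to catch this drift is to compare the version embedded in the container config with the driver the host reports. This is a minimal sketch; conf_driver_version is a hypothetical helper, and the example uses CT 113 from this post.

```shell
#!/bin/sh
# Print the driver version embedded in the first libcuda bind-mount path
# of an LXC config read from stdin.
conf_driver_version() {
    grep -o 'libcuda\.so\.[0-9][0-9.]*' | head -n 1 | sed 's/^libcuda\.so\.//'
}

# On the host, compare it with the installed driver:
#   want=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   have=$(conf_driver_version < /etc/pve/lxc/113.conf)
#   [ "$want" = "$have" ] || echo "config has $have, host driver is $want"
```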

3. VRAM Is Your Hard Limit

With consumer GPUs, VRAM is the constraint that matters most. Plan your workloads around it. My 6 GB card handles Plex transcoding with room to spare, but the 4 GB card requires careful model selection for Ollama.

4. Privileged Containers Have Security Implications

Privileged LXC containers have broader access to the host system. Keep them on trusted networks and limit what runs inside them. Don’t put untrusted workloads in a privileged GPU container.

5. Test After Every Host Update

Proxmox updates, kernel upgrades, and driver updates can all break GPU passthrough. After any system update on a GPU host, verify nvidia-smi works both on the host and inside each container before calling it done.
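
That post-update check is easy to script. The following is a small sketch with a hypothetical check helper that runs any command quietly and prints OK or FAIL; the container ID in the example is the Plex container from this post.

```shell
#!/bin/sh
# Run a command quietly and report OK/FAIL with a label.
# Returns non-zero on failure so callers can chain or count failures.
check() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# On pve01 after an update (CT 113 is the Plex container):
#   check "host nvidia-smi" nvidia-smi
#   check "CT 113 nvidia-smi" pct exec 113 -- nvidia-smi
```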

What’s Next

The current setup handles my needs — Plex transcoding and small LLM inference — but I’m watching the market for affordable GPUs with more VRAM. An 8 GB or 12 GB card on pve02 would open up larger language models without the memory-exhaustion risk.

For now, the GTX 1050 Ti running Mistral 7B with partial GPU offload is a solid workaround. It powers the AI-driven Zabbix alert analysis workflow through n8n, providing meaningful analysis of infrastructure problems without breaking the bank.

For a broader view of how these GPUs fit into my overall infrastructure, check out Building My Homelab.