GPU Passthrough in Proxmox LXC Containers: Plex and Ollama

GPU passthrough in LXC containers isn't as straightforward as it sounds. After plenty of trial and error, I've got two NVIDIA GPUs running in Proxmox LXC containers - a GTX 1060 6GB for Plex hardware transcoding and a GTX 1050 Ti 4GB for Ollama LLM inference. Here's everything I learned along the way.

Why LXC Instead of VMs?

Proxmox supports both VMs and LXC containers. For GPU workloads, you might expect VMs with PCIe passthrough to be the obvious choice. But LXC containers offer some compelling advantages:

Lower Overhead: No hypervisor layer between the GPU and the application
Shared GPU: The host and container can share the same GPU simultaneously
Simpler Management: LXC containers are lighter weight and faster to start
Better Integration: Direct access to host kernel and drivers

The trade-off? I ended up running privileged containers, and the cgroup configuration needs care.

The Hardware

Host	GPU	VRAM	Driver	Kernel	Use Case
pve01	GTX 1060 6GB	6 GB	580.126.09	6.14.11-5-pve	Plex transcoding
pve02	GTX 1050 Ti 4GB	4 GB	550.163.01	6.14.11-5-pve	Ollama inference

Both GPUs are consumer-grade cards - nothing exotic. The GTX 1060 handles Plex transcoding effortlessly, and the GTX 1050 Ti is enough for smaller LLM models with GPU acceleration.

Step 1: Install NVIDIA Drivers on the Proxmox Host

Before containers can access the GPU, you need working NVIDIA drivers on the Proxmox host itself. I followed the standard approach of installing from NVIDIA's .run package rather than Debian packages, since Proxmox's kernel versions don't always match Debian's repos.
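A minimal sketch of that install flow, under stated assumptions: the header package is named proxmox-headers-<kernel> (older Proxmox releases used pve-headers-<kernel>), and the .run installer has already been downloaded into the current directory. The function name install_nvidia_driver is made up for illustration.

```shell
# Sketch: build the NVIDIA kernel module on a Proxmox host from the .run installer.
# Assumptions: header package name proxmox-headers-<kernel>, installer file
# NVIDIA-Linux-x86_64-<version>.run present in the current directory.
install_nvidia_driver() {
    local version="$1"    # e.g. 580.126.09
    # Compiler, DKMS, and headers for the running kernel are needed to build the module.
    apt install -y build-essential dkms "proxmox-headers-$(uname -r)" || return 1
    # --dkms registers the module with DKMS so it can be rebuilt for new kernels.
    sh "./NVIDIA-Linux-x86_64-${version}.run" --dkms
}

# Usage on the host (as root):
#   install_nvidia_driver 580.126.09
```

Defining the steps as a function keeps the snippet harmless to paste; nothing is installed until it is called explicitly.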

After installation, verify with nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
|   GPU  Name                  ...                                                        |
|   0    NVIDIA GeForce GTX 1060 6GB                                                      |
+-----------------------------------------------------------------------------------------+

Driver Compatibility Warning

This is the gotcha that cost me hours: the NVIDIA 550.x driver worked on kernel 6.14.x but not on 6.17.x. When Proxmox updated to kernel 6.17.x, the driver broke completely. I had to pin the kernel at 6.14.11-5-pve until NVIDIA released a compatible driver.

If you see errors about missing kernel modules after a Proxmox update, check your kernel version first. Rolling back to a compatible kernel is usually the fastest fix.
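A small check along these lines can catch a kernel change before it costs hours. This is a sketch: check_kernel and PINNED_KERNEL are illustrative names, and the pin command in the comments assumes a host managed by proxmox-boot-tool.

```shell
# Known-good kernel for the installed NVIDIA driver (value from this post).
PINNED_KERNEL="6.14.11-5-pve"

# check_kernel RUNNING [PINNED]: succeeds only if the running kernel matches the pin.
check_kernel() {
    local running="$1"
    local pinned="${2:-$PINNED_KERNEL}"
    if [ "$running" = "$pinned" ]; then
        echo "OK: running pinned kernel $pinned"
    else
        echo "WARN: running $running, expected $pinned" >&2
        return 1
    fi
}

# Usage on the host:
#   check_kernel "$(uname -r)"
#   proxmox-boot-tool kernel pin "$PINNED_KERNEL"   # keep it as the default boot entry
```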

Step 2: Configure Privileged LXC Containers

I run these GPU containers as privileged. Unprivileged containers can be made to work too, but getting the device-node permissions and ID mapping right takes extra effort, so everything below assumes a privileged container.

If you're converting an existing unprivileged container:

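# Via CLI. Some Proxmox releases treat this option as read-only after creation;
# if the command is rejected, back up the container and restore it as privileged.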
pct set <ctid> -unprivileged 0

Container Configuration

The magic happens in /etc/pve/lxc/<ctid>.conf. Here's what I added for the Plex container (CT 113 on pve01):

# NVIDIA GPU passthrough - cgroup permissions
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 234:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
 
# NVIDIA device mounts
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
 
# NVIDIA binaries and libraries from host
lxc.mount.entry: /usr/bin/nvidia-smi usr/bin/nvidia-smi none bind,optional,create=file
lxc.mount.entry: /usr/lib/x86_64-linux-gnu/libcuda.so.580.126.09 usr/lib/x86_64-linux-gnu/libcuda.so.1 none bind,optional,create=file
lxc.mount.entry: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.580.126.09 usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 none bind,optional,create=file

Let me break down what each section does:

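cgroup2 device permissions (c 195:* rwm, etc.): These grant the container read/write/mknod access to character devices with the given major numbers. Major 195 covers the core NVIDIA devices (/dev/nvidia0, /dev/nvidiactl), 234 is nvidia-uvm (unified virtual memory) on my host, and 226 is the DRI (Direct Rendering Infrastructure) subsystem. The nvidia-uvm major is assigned dynamically and can differ between hosts, so check yours with ls -l /dev/nvidia-uvm before copying this.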
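Since major numbers can differ from host to host, one option is to derive the allow rules from the host's own device listing instead of copying them. A sketch (the function name majors_to_cgroup_rules is made up for illustration):

```shell
# Reads `ls -l` output on stdin and prints one cgroup2 allow rule per
# distinct major number found among the character devices.
majors_to_cgroup_rules() {
    # Character devices start with "c"; field 5 is the major followed by a comma.
    awk '/^c/ { sub(",", "", $5); print $5 }' | sort -nu |
        while read -r major; do
            echo "lxc.cgroup2.devices.allow: c ${major}:* rwm"
        done
}

# Usage on the host:
#   ls -l /dev/nvidia* /dev/dri/* | majors_to_cgroup_rules
```

The output can be compared against the lines in /etc/pve/lxc/<ctid>.conf before starting the container.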

Device mounts: Bind-mount the actual GPU device files from the host into the container. The optional flag prevents the container from failing to start if a device is temporarily unavailable.

Library mounts: Bind-mount the host's NVIDIA libraries directly into the container. This is critical - the container doesn't need its own NVIDIA driver installation. It shares the host's driver binaries and libraries. Note the version-specific paths (580.126.09) - these must match your installed driver version exactly.
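Because those paths embed the driver version, it can help to generate the two library entries from a single version string rather than editing them by hand. A sketch, with lib_mount_entries as an illustrative name and the library directory taken from the config above:

```shell
# lib_mount_entries VERSION [LIBDIR]: prints the libcuda and libnvidia-ml
# bind-mount entries for the given host driver version.
lib_mount_entries() {
    local version="$1"
    local libdir="${2:-/usr/lib/x86_64-linux-gnu}"
    local lib
    for lib in libcuda libnvidia-ml; do
        # Source is the versioned host file; target is the .so.1 name inside the
        # container, written without a leading slash as lxc.mount.entry expects.
        echo "lxc.mount.entry: ${libdir}/${lib}.so.${version} ${libdir#/}/${lib}.so.1 none bind,optional,create=file"
    done
}

# Usage on the host:
#   lib_mount_entries 580.126.09
```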

Step 3: Plex GPU Transcoding

With GPU access configured, enabling hardware transcoding in Plex is straightforward:

Plex Settings → Transcoder
Enable "Use hardware acceleration when available"
Set Hardware transcoding device to the NVIDIA GPU

Verifying Hardware Transcoding

Start a transcode (play something that requires transcoding) and verify GPU usage:

# Inside the Plex container
nvidia-smi

You should see a Plex process using GPU memory. During a 1080p → 480p transcode, I see about 67 MiB of GPU memory allocated.

For programmatic verification, query the Plex API:

curl -s -H 'X-Plex-Token: YOUR_TOKEN' \
  'http://localhost:32400/transcode/sessions'

Look for these indicators in the response:

transcodeHwRequested="1" - Hardware was requested
transcodeHwEncodingTitle="Nvidia ()" - NVIDIA encoder active
speed > 2x - Consistent with hardware acceleration (software transcoding on this host runs at roughly 0.5-1x)

My test results: 1080p H.264 → 480p transcode at 2.6x realtime with NVENC encoding confirmed. Without the GPU, that same transcode crawls at 0.8x.
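The indicators above can be checked from a script by filtering the API response. This is a sketch: the helper names are made up, and it assumes the response is XML with the speed reported as a speed="..." attribute on each session.

```shell
# Succeeds if any session in the XML on stdin requested hardware transcoding.
hw_transcode_active() {
    grep -q 'transcodeHwRequested="1"'
}

# Prints the speed value of each session in the XML on stdin, one per line.
# Assumption: the attribute is literally named speed.
transcode_speeds() {
    grep -oE ' speed="[0-9.]+"' | grep -oE '[0-9.]+'
}

# Usage inside the Plex container:
#   curl -s -H 'X-Plex-Token: YOUR_TOKEN' \
#     'http://localhost:32400/transcode/sessions' | hw_transcode_active \
#     && echo "hardware transcode in progress"
```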

Permission Fix After Privilege Change

If you converted from an unprivileged to privileged container, you'll likely need to fix file ownership:

pct exec 113 -- chown -R plex:plex /var/lib/plexmediaserver/

Without this, Plex can't read its own database and configuration files.
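To confirm the fix took effect, it can be useful to list anything that is still owned by someone else. A sketch, with list_foreign_owned as an illustrative name:

```shell
# list_foreign_owned DIR USER: prints every path under DIR not owned by USER.
# USER may be a name or a numeric UID. Empty output means ownership is consistent.
list_foreign_owned() {
    local dir="$1"
    local user="$2"
    find "$dir" ! -user "$user" -print
}

# Usage inside the Plex container:
#   list_foreign_owned /var/lib/plexmediaserver plex | head
```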

Step 4: Ollama GPU Inference

Ollama (CT 114 on pve02) uses the same GPU passthrough approach but with different VRAM considerations. The GTX 1050 Ti has only 4 GB of VRAM, which severely limits model selection.

The VRAM Problem

Not all models fit in 4 GB of VRAM. When a model exceeds available VRAM, Ollama keeps only part of it on the GPU and runs the rest on the CPU, and in the worst case it falls back to CPU-only inference. This sounds harmless, but it's actually dangerous in my setup:

pve02 has only 7.7 GB total RAM shared across docker02, Zabbix, Pi-hole, and other services
CPU-mode models consume system RAM instead of VRAM
Large models + existing workloads = memory exhaustion = system freeze

I learned this the hard way when llama3:8b (4.7 GB on disk, ~5.3 GB in RAM) caused pve02 to freeze solid. No SSH, no console - had to hard-reboot from the Proxmox UI.
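A guard like the following could have prevented that freeze by refusing to load a model when the host is short on memory. It is a sketch: has_headroom is an illustrative name, and the needed size is whatever the model is expected to occupy in system RAM.

```shell
# has_headroom NEEDED_GIB [MEMINFO_FILE]: succeeds if MemAvailable is at
# least NEEDED_GIB. /proc/meminfo reports kB (KiB), hence the division.
has_headroom() {
    local needed_gib="$1"
    local meminfo="${2:-/proc/meminfo}"
    awk -v need="$needed_gib" '
        /^MemAvailable:/ { avail_gib = $2 / 1048576 }
        END { exit !(avail_gib >= need) }
    ' "$meminfo"
}

# Usage on the host before pulling a CPU-bound model:
#   has_headroom 5.3 || echo "not enough free RAM for llama3:8b" >&2
```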

Model Selection Guide

Model	Disk	RAM Loaded	GPU Offload	Safe?
`mistral:7b-instruct-q4_0`	4.1 GB	~5.5 GB	75% GPU / 25% CPU	Yes (recommended)
`starcoder2:3b`	1.7 GB	~2 GB	100% GPU	Yes
`llama3:8b`	4.7 GB	~5.3 GB	0% GPU (100% CPU)	No - causes freezes

Rules of thumb:

Models with disk size ≤ 3.5 GB should fully offload to the 4 GB GPU
mistral:7b-instruct-q4_0 is the sweet spot - mostly GPU-accelerated with manageable CPU spillover
In my testing, anything larger than ~4.5 GB on disk ran entirely on CPU and risked memory exhaustion
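Those rules of thumb can be written down as a tiny classifier. This is a sketch for the 4 GB card only: the thresholds are the ones listed above, the middle band is an assumption based on the single Mistral data point, and classify_model is an illustrative name.

```shell
# classify_model DISK_GB: prints the expected placement on a 4 GB GPU.
classify_model() {
    awk -v size="$1" 'BEGIN {
        if (size <= 3.5)      print "full-gpu"      # should fit entirely in VRAM
        else if (size <= 4.5) print "partial-gpu"   # assumed: GPU with CPU spillover
        else                  print "cpu-only"      # risks exhausting host RAM
    }'
}

# Usage:
#   classify_model 4.1
```

With the sizes from the table, starcoder2:3b (1.7) lands in full-gpu, mistral:7b-instruct-q4_0 (4.1) in partial-gpu, and llama3:8b (4.7) in cpu-only.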

Keep-Alive Configuration

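To prevent the GPU from being occupied by idle models, I set OLLAMA_KEEP_ALIVE=5m in the systemd service. This matches Ollama's default, but setting it explicitly documents the intent and makes it easy to tune. Models unload from VRAM after 5 minutes of inactivity, freeing resources for other workloads.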
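One way to apply that setting is a systemd drop-in. The sketch below assumes the unit is named ollama.service (as created by Ollama's Linux install script); the function name and the keep-alive.conf file name are made up for illustration.

```shell
# write_keepalive_dropin [DIR] [VALUE]: writes a drop-in that sets
# OLLAMA_KEEP_ALIVE for the service. DIR defaults to the drop-in directory
# of the assumed ollama.service unit.
write_keepalive_dropin() {
    local dir="${1:-/etc/systemd/system/ollama.service.d}"
    local value="${2:-5m}"
    mkdir -p "$dir" || return 1
    printf '[Service]\nEnvironment="OLLAMA_KEEP_ALIVE=%s"\n' "$value" > "$dir/keep-alive.conf"
}

# Usage inside the Ollama container (as root):
#   write_keepalive_dropin && systemctl daemon-reload && systemctl restart ollama
```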

Lessons Learned

1. Version-Lock Your Kernel

GPU drivers are tightly coupled to kernel versions. A routine apt upgrade can break your GPU setup if it pulls a new kernel. Pin your kernel version or test upgrades on a non-GPU node first.

2. Driver Versions Must Match Everywhere

The library paths in the LXC config (libcuda.so.580.126.09) are version-specific. When you update the host driver, you must update the container config to match. Forget this, and the bind mounts point at files that no longer exist; because they are marked optional, the container still starts, and nvidia-smi inside it then fails or reports mismatched versions.
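A quick way to spot stale paths after a driver update is to compare the versions embedded in the container config with the host's driver version. A sketch, with config_matches_driver as an illustrative name:

```shell
# config_matches_driver CONF HOST_VERSION: fails and lists the offending paths
# if CONF references libcuda/libnvidia-ml files for a different driver version.
config_matches_driver() {
    local conf="$1"
    local host_version="$2"
    local stale
    # Versioned library paths (e.g. libcuda.so.580.126.09) that do not carry the
    # host's version; the unversioned .so.1 mount targets are not matched.
    stale=$(grep -oE '/lib(cuda|nvidia-ml)\.so\.[0-9]+\.[0-9]+(\.[0-9]+)?' "$conf" |
            grep -vF ".so.${host_version}" | sort -u)
    if [ -n "$stale" ]; then
        echo "stale library paths in $conf:" >&2
        echo "$stale" >&2
        return 1
    fi
}

# Usage on the host:
#   config_matches_driver /etc/pve/lxc/113.conf \
#     "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```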

3. VRAM Is Your Hard Limit

With consumer GPUs, VRAM is the constraint that matters most. Plan your workloads around it. My 6 GB card handles Plex transcoding with room to spare, but the 4 GB card requires careful model selection for Ollama.

4. Privileged Containers Have Security Implications

Privileged LXC containers have broader access to the host system. Keep them on trusted networks and limit what runs inside them. Don't put untrusted workloads in a privileged GPU container.

5. Test After Every Host Update

Proxmox updates, kernel upgrades, and driver updates can all break GPU passthrough. After any system update on a GPU host, verify nvidia-smi works both on the host and inside each container before calling it done.
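That post-update check is easy to script. In this sketch, verify_gpu is an illustrative helper that runs any command expected to print a driver version; the container ID in the usage lines is the Plex container from this post.

```shell
# verify_gpu LABEL CMD...: runs CMD and succeeds only if it exits cleanly
# and prints something (expected: the driver version).
verify_gpu() {
    local label="$1"
    shift
    local version
    if version=$("$@" 2>/dev/null) && [ -n "$version" ]; then
        echo "$label: driver $version"
    else
        echo "$label: GPU check FAILED" >&2
        return 1
    fi
}

# Usage on pve01 after an update:
#   verify_gpu host  nvidia-smi --query-gpu=driver_version --format=csv,noheader
#   verify_gpu ct113 pct exec 113 -- nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

Both lines should report the same version; a failure or a differing version on the container line points at the bind mounts rather than the host driver.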

What's Next

The current setup handles my needs - Plex transcoding and small LLM inference - but I'm watching the market for affordable GPUs with more VRAM. An 8 GB or 12 GB card on pve02 would open up larger language models without the memory-exhaustion risk.

For now, the GTX 1050 Ti running Mistral 7B with partial GPU offload is a solid workaround. It powers the AI-driven Zabbix alert analysis workflow through n8n, providing meaningful analysis of infrastructure problems without breaking the bank.

For a broader view of how these GPUs fit into my overall infrastructure, check out Building My Homelab.