Monitoring Everything: Zabbix 7 in a Homelab
If a service is important enough to run, it's important enough to monitor. That principle led me to deploy Zabbix 7 as the backbone of my homelab monitoring — tracking 18 hosts across Proxmox hypervisors, Docker hosts, LXC containers, an Unraid NAS, and UniFi network gear. Here's how it all fits together.
Why Zabbix?#
I evaluated several monitoring platforms before settling on Zabbix:
- Prometheus + Grafana: Great for metrics but requires significant configuration for each target. No built-in alerting UI.
- Uptime Kuma: Simple and effective for uptime monitoring, but limited host-level metrics.
- Netdata: Beautiful real-time dashboards but high resource consumption and weak long-term storage.
- Zabbix: Enterprise-grade monitoring with auto-discovery, templates, triggers, and built-in alerting. Steeper learning curve, but far more capable at scale.
For 18 hosts with diverse monitoring needs (Linux metrics, SNMP, HTTP checks, Docker containers, custom APIs), Zabbix's template system and agent-based collection made it the clear winner.
Infrastructure Overview#
Zabbix Server 7.4.6 runs in a dedicated LXC container (CT 115) on pve02, with PostgreSQL as the backend database.
Monitored Hosts#
| Category | Hosts | Monitoring Method |
|---|---|---|
| Proxmox Hypervisors | pve01, pve02, pve03, pbs01 | SNMP |
| Docker Hosts | docker01, docker02, docker03 | Zabbix Agent 2 (PSK) |
| LXC Containers | traefik, pihole, pihole2, homepage, ollama, zabbix | Zabbix Agent 2 (PSK) |
| Network | UCG Ultra | SNMP + HTTP API |
| NAS | Tower (Unraid) | Zabbix Agent 2 (Docker, PSK) |
That's 18 hosts total, each with its own monitoring profile.
PSK-Encrypted Agent Deployment#
Every Zabbix Agent 2 installation uses Pre-Shared Key (PSK) encryption. In a flat homelab network, this prevents monitoring data from being intercepted in plaintext.
Agent Configuration#
Each host's /etc/zabbix/zabbix_agent2.conf includes:
```ini
Server=10.0.1.103
ServerActive=10.0.1.103
Hostname=docker01
TLSConnect=psk
TLSAccept=psk,unencrypted
TLSPSKIdentity=docker01
TLSPSKFile=/etc/zabbix/zabbix_agent2.psk
```

Key details:
- PSK Identity: Uses the hostname for easy identification in Zabbix Server
- Standard PSK: All hosts share a single 128-character PSK, generated with openssl rand -hex 64
- TLSAccept: Accepts both PSK and unencrypted connections; this is a workaround for a Zabbix Server PSK configuration quirk
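Rolling out a new agent follows the same pattern each time. A minimal sketch of the PSK generation step (the install command in the comment assumes root and the zabbix user/group, which are sensible defaults rather than anything stated above):

```shell
# Generate a 128-hex-character PSK (64 random bytes)
psk=$(openssl rand -hex 64)
echo "${#psk} hex characters"   # prints: 128 hex characters

# Then install it on each monitored host (as root), matching TLSPSKFile above:
#   printf '%s\n' "$psk" | install -m 640 -o zabbix -g zabbix /dev/stdin /etc/zabbix/zabbix_agent2.psk
```

Keeping the key readable only by the zabbix user avoids leaking it to other local accounts.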
ARM64 Agent Limitations#
One exception: pihole2 runs on ARM64 hardware (Raspberry Pi). There are no official Zabbix 7.x packages for ARM64, so it runs Agent 6.0.14 from Debian's repository.
The version mismatch causes issues:
- Active checks are incompatible between Agent 6.x and Server 7.x
- Workaround: Use passive checks only, with TLSConnect=unencrypted
It's not ideal, but it works. Pi-hole DNS metrics are still collected reliably.
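For reference, a sketch of what the passive-only configuration on pihole2 might look like (the exact file name depends on whether Debian's classic agent or agent 2 is installed; the key points are dropping ServerActive and forcing unencrypted passive checks):

```ini
# /etc/zabbix/zabbix_agentd.conf on pihole2 (sketch)
Server=10.0.1.103
Hostname=pihole2
TLSConnect=unencrypted
TLSAccept=unencrypted
# No ServerActive line: active checks disabled due to the 6.x agent / 7.x server mismatch
```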
Custom UniFi UCG Ultra Monitoring#
The UniFi Cloud Gateway Ultra (my main router/gateway) doesn't support Zabbix Agent installation. Instead, I monitor it through two channels:
SNMP Monitoring#
Standard SNMP (UDP 161) provides:
- Interface discovery and traffic statistics
- System uptime
- Applied via Zabbix's built-in "Interfaces Simple SNMP" template
HTTP API Monitoring#
For UniFi-specific metrics, I wrote a custom monitoring script that queries the UniFi API:
```bash
#!/bin/bash
# /usr/local/bin/unifi_api_monitor.sh
# Queries UniFi API for health metrics
```

This script runs via UserParameter entries on the Zabbix LXC (not on the gateway, which has no shell access for custom scripts):
```ini
# /etc/zabbix/zabbix_agent2.d/unifi_api.conf
UserParameter=unifi.client.count,/usr/local/bin/unifi_api_monitor.sh clients
UserParameter=unifi.health[*],/usr/local/bin/unifi_api_monitor.sh health $1
UserParameter=unifi.wan[*],/usr/local/bin/unifi_api_monitor.sh wan $1
UserParameter=unifi.firmware.version,/usr/local/bin/unifi_api_monitor.sh firmware
```

This gives me:
- Client count: Number of connected wireless and wired clients
- Subsystem health: WAN, LAN, and WLAN status (ok/unknown/error)
- WAN metrics: Latency (ms) and uptime (seconds)
- Firmware version: Current gateway firmware for update tracking
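The script body itself isn't shown; as a hedged illustration of the kind of parsing such a script might do, here is how jq can pull single values out of a /stat/health-style payload (the JSON shape and field names here are assumptions, not the gateway's actual response):

```shell
# Hypothetical /stat/health-style payload, for illustration only
payload='{"data":[{"subsystem":"wan","status":"ok","latency":12,"uptime":86400},{"subsystem":"lan","status":"ok"},{"subsystem":"wlan","status":"ok"}]}'

# Each UserParameter must print exactly one value, so each query selects one field
echo "$payload" | jq -r '.data[] | select(.subsystem=="wan") | .status'    # prints: ok
echo "$payload" | jq -r '.data[] | select(.subsystem=="wan") | .latency'   # prints: 12
```

One-value-per-invocation output is what lets Zabbix map each UserParameter key directly to an item.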
Authentication uses a bearer token stored in /etc/default/zabbix-agent2, keeping credentials out of the script and Zabbix configuration.
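On Debian-based systems the packaged zabbix-agent2 unit typically sources /etc/default/zabbix-agent2 as an environment file, so the token reaches UserParameter scripts without appearing anywhere under /etc/zabbix. A sketch (the variable name is my assumption):

```ini
# /etc/default/zabbix-agent2 (sketch; readable by root and zabbix only)
UNIFI_API_TOKEN=<bearer token, kept out of version control>
```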
Zabbix Agent on Unraid (Docker Container)#
Unraid doesn't support native package installation like Debian-based systems. Instead, Zabbix Agent 2 runs as a Docker container on the Tower NAS:
```yaml
services:
  zabbix-agent2:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent2
    network_mode: host
    privileged: true
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /dev:/dev:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      ZBX_SERVER_HOST: "10.0.1.103"
      ZBX_HOSTNAME: "Tower"
      ZBX_TLSACCEPT: "psk,unencrypted"
```

The container needs host networking, privileged mode, and read-only access to /proc, /sys, and the Docker socket. This provides full system metrics plus Docker container monitoring on the Unraid box.
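Worth noting: the official zabbix-agent2 image also accepts PSK settings as environment variables, so the compose file can carry the full PSK setup too. A hedged sketch of the extra keys (ZBX_TLSCONNECT, ZBX_TLSPSKIDENTITY, and ZBX_TLSPSKFILE are documented variables of the official image, where the PSK file lives under /var/lib/zabbix/enc; the host-side path is my assumption):

```yaml
    environment:
      ZBX_TLSCONNECT: "psk"
      ZBX_TLSPSKIDENTITY: "Tower"
      ZBX_TLSPSKFILE: "tower.psk"   # filename within /var/lib/zabbix/enc
    volumes:
      - /boot/config/zabbix/tower.psk:/var/lib/zabbix/enc/tower.psk:ro
```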
Proxmox Monitoring via SNMP#
The four Proxmox hosts (pve01, pve02, pve03, pbs01) are monitored via SNMP rather than Zabbix Agent. This keeps the hypervisor layer clean — no extra packages installed on the bare-metal Proxmox hosts.
SNMP configuration uses secure random community strings (not public) and restricts access to the 10.0.1.0/24 subnet.
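On each Proxmox node this boils down to a very small snmpd.conf. A sketch (the community string is a placeholder; rocommunity with a source subnet is standard net-snmp syntax):

```ini
# /etc/snmp/snmpd.conf (sketch)
agentAddress udp:161
rocommunity <random-community-string> 10.0.1.0/24
sysLocation homelab
sysContact  admin@example.com
```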
Grafana Integration#
Zabbix data feeds into Grafana for visualization via the alexanderzobnin-zabbix-app plugin. Grafana runs on docker02 with a direct connection to Zabbix's API.
Dashboards#
Zabbix Native Dashboard — "Homelab Infrastructure Overview" (Dashboard ID: 394):
- Page 1 (Overview): Honeycomb health map showing all hosts, host navigator, active problems panel, and top triggers
- Page 2 (Host Details): Click a host to see linked gauge widgets for CPU, memory, and disk utilization, plus a time-series graph
Grafana Dashboard — Zabbix Full Server Status (imported from Grafana.com ID: 5363):
- Detailed host metrics with historical trends
- Customized panels for homelab-specific metrics
The combination gives me both quick-glance health checks (Zabbix native) and deep-dive historical analysis (Grafana).
Maintenance Windows#
Monitoring is useless if it generates alerts for expected events. I configured Zabbix maintenance windows to suppress alerts during predictable operations:
| Window | Schedule | Duration |
|---|---|---|
| Daily PBS Backup | 21:00 EST daily | 2 hours |
| Weekly Unraid Backup | Sunday 01:00 EST | 4 hours |
These windows cover all host groups (Linux servers, hypervisors, VMs, network devices, Zabbix servers, applications, databases). During maintenance, Zabbix still collects data — it just suppresses trigger actions and notifications.
Without these windows, every backup cycle would generate a storm of disk I/O and CPU alerts across the cluster.
Auto-Resolve Informational Alerts#
Some Zabbix triggers fire for informational events (severity 1) like "system configuration has been changed." These are good to know about but shouldn't linger as active problems indefinitely.
A cron job on the Zabbix LXC auto-resolves severity-1 alerts older than 24 hours:
```
# /etc/cron.d/zabbix-auto-resolve
0 * * * * zabbix /usr/local/bin/auto-resolve-info-alerts.sh
```

This keeps the active problems list focused on genuine issues that need attention, which also improves the quality of the AI-powered alert analysis.
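The script itself isn't shown; presumably it drives the Zabbix JSON-RPC API. As a hedged sketch, these are the two calls such a script might make. The method names and parameters (problem.get with severities and time_till; event.acknowledge with action=1 to close) are real Zabbix 7 API surface, but the payloads here are only printed, not sent, and the event ID is a placeholder:

```shell
# Build (but don't send) the JSON-RPC payloads an auto-resolver might POST
# to api_jsonrpc.php with an API token. Hypothetical sketch, not the author's script.
cutoff=$(( $(date +%s) - 86400 ))   # only touch problems older than 24 hours

# 1) problem.get: open severity-1 (Information) problems created before the cutoff
jq -n --argjson t "$cutoff" \
  '{jsonrpc:"2.0", method:"problem.get", id:1,
    params:{severities:[1], time_till:$t, output:["eventid"]}}'

# 2) event.acknowledge with action=1 closes the events returned by the first call
jq -n '{jsonrpc:"2.0", method:"event.acknowledge", id:2,
        params:{eventids:["10123"], action:1}}'
```

Closing via the API only works for triggers that allow manual close, which informational triggers generally should.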
Telegram Alerting#
Zabbix sends notifications through a Telegram bot for critical alerts. The bot posts to a dedicated homelab alerts channel, providing:
- Problem notifications when triggers fire
- Recovery notifications when problems resolve
- Severity-based formatting (critical alerts are prominent, warnings are subdued)
Combined with the n8n AI analysis workflow, I get two layers of notification:
- Immediate: Zabbix → Telegram (raw alert)
- Analyzed: n8n → Ollama → Telegram (contextual analysis, hourly batch)
Lessons Learned#
1. Templates Are Your Best Friend#
Zabbix's template system lets you define monitoring profiles once and apply them to multiple hosts. I have templates for Docker hosts, LXC containers, Proxmox nodes, and network devices. When I add a new host, configuration takes minutes.
2. Don't Alert on Everything#
Early on, I had triggers for every possible metric with aggressive thresholds. The result? Alert fatigue. I tuned thresholds based on actual baselines and added maintenance windows for known events. Now alerts are meaningful.
3. SNMP Community Strings Matter#
Never use public or private as SNMP community strings, even on a private network. Generate random strings and restrict access to the monitoring subnet.
4. Monitor the Monitor#
Zabbix itself needs monitoring. I track the Zabbix server's own health — database size, queue length, cache hit ratios — to ensure the monitoring system doesn't become a problem itself.
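Zabbix ships internal items for exactly this, and the stock server-health template covers most of them. For a sense of what gets tracked, a few of the item keys involved (these are real Zabbix internal item keys; the comments are my paraphrase):

```ini
zabbix[queue,10m]               ; items overdue by more than 10 minutes
zabbix[wcache,values,all]       ; new values processed per second
zabbix[rcache,buffer,pused]     ; configuration cache utilization, %
zabbix[vcache,cache,hits]       ; value cache hit rate
zabbix[process,poller,avg,busy] ; poller busy time, %
```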
5. Documentation Prevents Amnesia#
Every custom UserParameter, template modification, and PSK deployment is documented. Six months from now, I won't remember why the UCG Ultra monitoring uses bearer tokens instead of session cookies. The documentation will.
What's Next#
- Application-level monitoring: Deeper Zabbix templates for individual Docker services (n8n queue depth, Plex transcode sessions)
- Log monitoring: Integrating Zabbix with log analysis for pattern detection
- Network flow analysis: Using SNMP flow data for bandwidth usage patterns
For the AI layer that enhances these alerts, see AI-Powered Infrastructure Monitoring. For the complete homelab overview, check out Building My Homelab.