Monitoring Everything: Zabbix 7 in a Homelab
If a service is important enough to run, it's important enough to monitor. That principle led me to deploy Zabbix 7 as the backbone of my homelab monitoring — tracking 18 hosts across Proxmox hypervisors, Docker hosts, LXC containers, an Unraid NAS, and UniFi network gear. Here's how it all fits together.
Why Zabbix?#
I evaluated several monitoring platforms before settling on Zabbix:
- Prometheus + Grafana: Great for metrics but requires significant configuration for each target. No built-in alerting UI.
- Uptime Kuma: Simple and effective for uptime monitoring, but limited host-level metrics.
- Netdata: Beautiful real-time dashboards but high resource consumption and weak long-term storage.
- Zabbix: Enterprise-grade monitoring with auto-discovery, templates, triggers, and built-in alerting. Steeper learning curve, but far more capable at scale.
For 18 hosts with diverse monitoring needs (Linux metrics, SNMP, HTTP checks, Docker containers, custom APIs), Zabbix's template system and agent-based collection made it the clear winner.
Infrastructure Overview#
Zabbix Server 7.4.6 runs in a dedicated LXC container (CT 115) on pve02, with PostgreSQL as the backend database.
Monitored Hosts#
| Category | Hosts | Monitoring Method |
|---|---|---|
| Proxmox Hypervisors | pve01, pve02, pve03, pbs01 | SNMP |
| Docker Hosts | docker01, docker02, docker03 | Zabbix Agent 2 (PSK) |
| LXC Containers | traefik, pihole, pihole2, homepage, ollama, zabbix | Zabbix Agent 2 (PSK) |
| Network | UCG Ultra | SNMP + HTTP API |
| NAS | Tower (Unraid) | Zabbix Agent 2 (Docker, PSK) |
That's 18 hosts total, each with its own monitoring profile.
PSK-Encrypted Agent Deployment#
Every Zabbix Agent 2 installation uses Pre-Shared Key (PSK) encryption. In a flat homelab network, this prevents monitoring data from being intercepted in plaintext.
Agent Configuration#
Each host's /etc/zabbix/zabbix_agent2.conf includes:
```ini
Server=10.0.1.103
ServerActive=10.0.1.103
Hostname=docker01
TLSConnect=psk
TLSAccept=psk,unencrypted
TLSPSKIdentity=docker01
TLSPSKFile=/etc/zabbix/zabbix_agent2.psk
```

Key details:
- PSK Identity: Uses the hostname for easy identification in Zabbix Server
- Standard PSK: All hosts share a single 128-character PSK, generated with openssl rand -hex 64
- TLSAccept: Accepts both PSK and unencrypted connections; this is a workaround for a Zabbix Server PSK configuration quirk
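Rolling out a new agent follows the same pattern each time. A minimal sketch of the PSK generation step (the install command in the comment assumes root and the zabbix user/group, which are sensible defaults rather than anything stated above):

```shell
# Generate a 128-hex-character PSK (64 random bytes)
psk=$(openssl rand -hex 64)
echo "${#psk} hex characters"   # prints: 128 hex characters

# Then install it on each monitored host (as root), matching TLSPSKFile above:
#   printf '%s\n' "$psk" | install -m 640 -o zabbix -g zabbix /dev/stdin /etc/zabbix/zabbix_agent2.psk
```

Keeping the key readable only by the zabbix user avoids leaking it to other local accounts.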
ARM64 Agent Limitations#
One exception: pihole2 runs on ARM64 hardware (Raspberry Pi). There are no official Zabbix 7.x packages for ARM64, so it runs Agent 6.0.14 from Debian's repository.
The version mismatch causes issues:
- Active checks are incompatible between Agent 6.x and Server 7.x
- Workaround: Use passive checks only, with TLSConnect=unencrypted
It's not ideal, but it works. Pi-hole DNS metrics are still collected reliably.
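For reference, a sketch of what the passive-only configuration on pihole2 might look like (the exact file name depends on whether Debian's classic agent or agent 2 is installed; the key points are dropping ServerActive and forcing unencrypted passive checks):

```ini
# /etc/zabbix/zabbix_agentd.conf on pihole2 (sketch)
Server=10.0.1.103
Hostname=pihole2
TLSConnect=unencrypted
TLSAccept=unencrypted
# No ServerActive line: active checks disabled due to the 6.x agent / 7.x server mismatch
```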
Custom UniFi UCG Ultra Monitoring#
The UniFi Cloud Gateway Ultra (my main router/gateway) doesn't support Zabbix Agent installation. Instead, I monitor it through two channels:
SNMP Monitoring#
Standard SNMP (UDP 161) provides:
- Interface discovery and traffic statistics
- System uptime
- Applied via Zabbix's built-in "Interfaces Simple SNMP" template
HTTP API Monitoring#
For UniFi-specific metrics, I wrote a custom monitoring script that queries the UniFi API:
```bash
#!/bin/bash
# /usr/local/bin/unifi_api_monitor.sh
# Queries UniFi API for health metrics
```

This script runs via UserParameter entries on the Zabbix LXC (not on the gateway, which has no shell access for custom scripts):
```ini
# /etc/zabbix/zabbix_agent2.d/unifi_api.conf
UserParameter=unifi.client.count,/usr/local/bin/unifi_api_monitor.sh clients
UserParameter=unifi.health[*],/usr/local/bin/unifi_api_monitor.sh health $1
UserParameter=unifi.wan[*],/usr/local/bin/unifi_api_monitor.sh wan $1
UserParameter=unifi.firmware.version,/usr/local/bin/unifi_api_monitor.sh firmware
```

This gives me:
- Client count: Number of connected wireless and wired clients
- Subsystem health: WAN, LAN, and WLAN status (ok/unknown/error)
- WAN metrics: Latency (ms) and uptime (seconds)
- Firmware version: Current gateway firmware for update tracking
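The script body itself isn't shown; as a hedged illustration of the kind of parsing such a script might do, here is how jq can pull single values out of a /stat/health-style payload (the JSON shape and field names here are assumptions, not the gateway's actual response):

```shell
# Hypothetical /stat/health-style payload, for illustration only
payload='{"data":[{"subsystem":"wan","status":"ok","latency":12,"uptime":86400},{"subsystem":"lan","status":"ok"},{"subsystem":"wlan","status":"ok"}]}'

# Each UserParameter must print exactly one value, so each query selects one field
echo "$payload" | jq -r '.data[] | select(.subsystem=="wan") | .status'    # prints: ok
echo "$payload" | jq -r '.data[] | select(.subsystem=="wan") | .latency'   # prints: 12
```

One-value-per-invocation output is what lets Zabbix map each UserParameter key directly to an item.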
Authentication uses a bearer token stored in /etc/default/zabbix-agent2, keeping credentials out of the script and Zabbix configuration.
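On Debian-based systems the packaged zabbix-agent2 unit typically sources /etc/default/zabbix-agent2 as an environment file, so the token reaches UserParameter scripts without appearing anywhere under /etc/zabbix. A sketch (the variable name is my assumption):

```ini
# /etc/default/zabbix-agent2 (sketch; readable by root and zabbix only)
UNIFI_API_TOKEN=<bearer token, kept out of version control>
```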
Zabbix Agent on Unraid (Docker Container)#
Unraid doesn't support native package installation like Debian-based systems. Instead, Zabbix Agent 2 runs as a Docker container on the Tower NAS:
```yaml
services:
  zabbix-agent2:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent2
    network_mode: host
    privileged: true
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /dev:/dev:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      ZBX_SERVER_HOST: "10.0.1.103"
      ZBX_HOSTNAME: "Tower"
      ZBX_TLSACCEPT: "psk,unencrypted"
```

The container needs host networking, privileged mode, and read-only access to /proc, /sys, and the Docker socket. This provides full system metrics plus Docker container monitoring on the Unraid box.
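Worth noting: the official zabbix-agent2 image also accepts PSK settings as environment variables, so the compose file can carry the full PSK setup too. A hedged sketch of the extra keys (ZBX_TLSCONNECT, ZBX_TLSPSKIDENTITY, and ZBX_TLSPSKFILE are documented variables of the official image, where the PSK file lives under /var/lib/zabbix/enc; the host-side path is my assumption):

```yaml
    environment:
      ZBX_TLSCONNECT: "psk"
      ZBX_TLSPSKIDENTITY: "Tower"
      ZBX_TLSPSKFILE: "tower.psk"   # filename within /var/lib/zabbix/enc
    volumes:
      - /boot/config/zabbix/tower.psk:/var/lib/zabbix/enc/tower.psk:ro
```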
Proxmox Monitoring via SNMP#
The four Proxmox hosts (pve01, pve02, pve03, pbs01) are monitored via SNMP rather than Zabbix Agent. This keeps the hypervisor layer clean — no extra packages installed on the bare-metal Proxmox hosts.
SNMP configuration uses secure random community strings (not public) and restricts access to the 10.0.1.0/24 subnet.
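On each Proxmox node this boils down to a very small snmpd.conf. A sketch (the community string is a placeholder; rocommunity with a source subnet is standard net-snmp syntax):

```ini
# /etc/snmp/snmpd.conf (sketch)
agentAddress udp:161
rocommunity <random-community-string> 10.0.1.0/24
sysLocation homelab
sysContact  admin@example.com
```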
Grafana Integration#
Zabbix data feeds into Grafana for visualization via the alexanderzobnin-zabbix-app plugin. Grafana runs on docker02 with a direct connection to Zabbix's API.
Dashboards#
Zabbix Native Dashboard — "Homelab Infrastructure Overview" (Dashboard ID: 394):
- Page 1 (Overview): Honeycomb health map showing all hosts, host navigator, active problems panel, and top triggers
- Page 2 (Host Details): Click a host to see linked gauge widgets for CPU, memory, and disk utilization, plus a time-series graph
Grafana Dashboard — Zabbix Full Server Status (imported from Grafana.com ID: 5363):
- Detailed host metrics with historical trends
- Customized panels for homelab-specific metrics
The combination gives me both quick-glance health checks (Zabbix native) and deep-dive historical analysis (Grafana).
Maintenance Windows#
Monitoring is useless if it generates alerts for expected events. I configured Zabbix maintenance windows to suppress alerts during predictable operations:
| Window | Schedule | Duration |
|---|---|---|
| Daily PBS Backup | 21:00 EST daily | 2 hours |
| Weekly Unraid Backup | Sunday 01:00 EST | 4 hours |
These windows cover all host groups (Linux servers, hypervisors, VMs, network devices, Zabbix servers, applications, databases). During maintenance, Zabbix still collects data — it just suppresses trigger actions and notifications.
Without these windows, every backup cycle would generate a storm of disk I/O and CPU alerts across the cluster.
Auto-Resolve Informational Alerts#
Some Zabbix triggers fire for informational events (severity 1) like "system configuration has been changed." These are good to know about but shouldn't linger as active problems indefinitely.
A cron job on the Zabbix LXC auto-resolves severity-1 alerts older than 24 hours:
```
# /etc/cron.d/zabbix-auto-resolve
0 * * * * zabbix /usr/local/bin/auto-resolve-info-alerts.sh
```

This keeps the active problems list focused on genuine issues that need attention, which also improves the quality of the AI-powered alert analysis.
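The script itself isn't shown; presumably it drives the Zabbix JSON-RPC API. As a hedged sketch, these are the two calls such a script might make. The method names and parameters (problem.get with severities and time_till; event.acknowledge with action=1 to close) are real Zabbix 7 API surface, but the payloads here are only printed, not sent, and the event ID is a placeholder:

```shell
# Build (but don't send) the JSON-RPC payloads an auto-resolver might POST
# to api_jsonrpc.php with an API token. Hypothetical sketch, not the author's script.
cutoff=$(( $(date +%s) - 86400 ))   # only touch problems older than 24 hours

# 1) problem.get: open severity-1 (Information) problems created before the cutoff
jq -n --argjson t "$cutoff" \
  '{jsonrpc:"2.0", method:"problem.get", id:1,
    params:{severities:[1], time_till:$t, output:["eventid"]}}'

# 2) event.acknowledge with action=1 closes the events returned by the first call
jq -n '{jsonrpc:"2.0", method:"event.acknowledge", id:2,
        params:{eventids:["10123"], action:1}}'
```

Closing via the API only works for triggers that allow manual close, which informational triggers generally should.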
Telegram Alerting#
Zabbix sends notifications through a Telegram bot for critical alerts. The bot posts to a dedicated homelab alerts channel, providing:
- Problem notifications when triggers fire
- Recovery notifications when problems resolve
- Severity-based formatting (critical alerts are prominent, warnings are subdued)
Combined with the n8n AI analysis workflow, I get two layers of notification:
- Immediate: Zabbix → Telegram (raw alert)
- Analyzed: n8n → Ollama → Telegram (contextual analysis, hourly batch)
Lessons Learned#
1. Templates Are Your Best Friend#
Zabbix's template system lets you define monitoring profiles once and apply them to multiple hosts. I have templates for Docker hosts, LXC containers, Proxmox nodes, and network devices. When I add a new host, configuration takes minutes.
2. Don't Alert on Everything#
Early on, I had triggers for every possible metric with aggressive thresholds. The result? Alert fatigue. I tuned thresholds based on actual baselines and added maintenance windows for known events. Now alerts are meaningful.
3. SNMP Community Strings Matter#
Never use public or private as SNMP community strings, even on a private network. Generate random strings and restrict access to the monitoring subnet.
4. Monitor the Monitor#
Zabbix itself needs monitoring. I track the Zabbix server's own health — database size, queue length, cache hit ratios — to ensure the monitoring system doesn't become a problem itself.
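Zabbix ships internal items for exactly this, and the stock server-health template covers most of them. For a sense of what gets tracked, a few of the item keys involved (these are real Zabbix internal item keys; the comments are my paraphrase):

```ini
zabbix[queue,10m]               ; items overdue by more than 10 minutes
zabbix[wcache,values,all]       ; new values processed per second
zabbix[rcache,buffer,pused]     ; configuration cache utilization, %
zabbix[vcache,cache,hits]       ; value cache hit rate
zabbix[process,poller,avg,busy] ; poller busy time, %
```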
5. Documentation Prevents Amnesia#
Every custom UserParameter, template modification, and PSK deployment is documented. Six months from now, I won't remember why the UCG Ultra monitoring uses bearer tokens instead of session cookies. The documentation will.
What's Next#
- Application-level monitoring: Deeper Zabbix templates for individual Docker services (n8n queue depth, Plex transcode sessions)
- Log monitoring: Integrating Zabbix with log analysis for pattern detection
- Network flow analysis: Using SNMP flow data for bandwidth usage patterns
For the AI layer that enhances these alerts, see AI-Powered Infrastructure Monitoring. For the complete homelab overview, check out Building My Homelab.