Monitoring Everything: Zabbix 7 in a Homelab
If a service is important enough to run, it’s important enough to monitor. That principle led me to deploy Zabbix 7 as the backbone of my homelab monitoring — tracking 18 hosts across Proxmox hypervisors, Docker hosts, LXC containers, an Unraid NAS, and UniFi network gear. Here’s how it all fits together.
Why Zabbix?
I evaluated several monitoring platforms before settling on Zabbix:
- Prometheus + Grafana: Great for metrics but requires significant configuration for each target. No built-in alerting UI.
- Uptime Kuma: Simple and effective for uptime monitoring, but limited host-level metrics.
- Netdata: Beautiful real-time dashboards but high resource consumption and weak long-term storage.
- Zabbix: Enterprise-grade monitoring with auto-discovery, templates, triggers, and built-in alerting. Steeper learning curve, but far more capable at scale.
For 18 hosts with diverse monitoring needs (Linux metrics, SNMP, HTTP checks, Docker containers, custom APIs), Zabbix’s template system and agent-based collection made it the clear winner.
Infrastructure Overview
Zabbix Server 7.4.6 runs in a dedicated LXC container (CT 115) on pve02, with PostgreSQL as the backend database.
Monitored Hosts
| Category | Hosts | Monitoring Method |
|---|---|---|
| Proxmox Hypervisors | pve01, pve02, pve03, pbs01 | SNMP |
| Docker Hosts | docker01, docker02, docker03 | Zabbix Agent 2 (PSK) |
| LXC Containers | traefik, pihole, pihole2, homepage, ollama, zabbix | Zabbix Agent 2 (PSK) |
| Network | UCG Ultra | SNMP + HTTP API |
| NAS | Tower (Unraid) | Zabbix Agent 2 (Docker, PSK) |
That’s 18 hosts total, each with its own monitoring profile.
PSK-Encrypted Agent Deployment
Every Zabbix Agent 2 installation uses Pre-Shared Key (PSK) encryption. In a flat homelab network, this prevents monitoring data from being intercepted in plaintext.
Agent Configuration
Each host’s /etc/zabbix/zabbix_agent2.conf includes:
Server=10.0.1.103
ServerActive=10.0.1.103
Hostname=docker01
TLSConnect=psk
TLSAccept=psk,unencrypted
TLSPSKIdentity=docker01
TLSPSKFile=/etc/zabbix/zabbix_agent2.psk
Key details:
- PSK Identity: Uses the hostname for easy identification in Zabbix Server
- Standard PSK: All hosts share a 128-character PSK (generated with openssl rand -hex 64)
- TLSAccept: Accepts both PSK and unencrypted connections; this is a workaround for a Zabbix Server PSK configuration quirk
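As a rough sketch of how the key and the per-host settings could be produced in one place, the Python below generates a 128-character hex key (the same length that openssl rand -hex 64 yields) and renders the settings listed above for a given hostname. The helper names generate_psk and render_agent_conf are hypothetical, and templating the file this way is an assumption for illustration, not a description of how these hosts were actually provisioned.

```python
import secrets


def generate_psk(num_bytes: int = 64) -> str:
    """Return a hex-encoded key; 64 random bytes give 128 hex characters."""
    return secrets.token_hex(num_bytes)


def render_agent_conf(hostname: str, server: str = "10.0.1.103") -> str:
    """Render the agent settings shown above for one host (hypothetical helper)."""
    settings = {
        "Server": server,
        "ServerActive": server,
        "Hostname": hostname,
        "TLSConnect": "psk",
        "TLSAccept": "psk,unencrypted",
        "TLSPSKIdentity": hostname,  # identity mirrors the hostname
        "TLSPSKFile": "/etc/zabbix/zabbix_agent2.psk",
    }
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"
```

Calling render_agent_conf("docker01") reproduces the seven lines shown earlier; the key from generate_psk() would be written to the TLSPSKFile path with restrictive permissions.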
ARM64 Agent Limitations
One exception: pihole2 runs on ARM64 hardware (Raspberry Pi). There are no official Zabbix 7.x packages for ARM64, so it runs Agent 6.0.14 from Debian’s repository.
The version mismatch causes issues:
- Active checks are incompatible between Agent 6.x and Server 7.x
- Workaround: Use passive checks only with TLSConnect=unencrypted
It’s not ideal, but it works. Pi-hole DNS metrics are still collected reliably.
Custom UniFi UCG Ultra Monitoring
The UniFi Cloud Gateway Ultra (my main router/gateway) doesn’t support Zabbix Agent installation. Instead, I monitor it through two channels:
SNMP Monitoring
Standard SNMP (UDP 161), applied via Zabbix’s built-in “Interfaces Simple SNMP” template, provides:
- Interface discovery and traffic statistics
- System uptime
HTTP API Monitoring
For UniFi-specific metrics, I wrote a custom monitoring script that queries the UniFi API:
#!/bin/bash
# /usr/local/bin/unifi_api_monitor.sh
# Queries UniFi API for health metrics
The script is exposed to Zabbix through UserParameter entries on the Zabbix LXC rather than on the gateway, which offers no shell access for custom scripts:
# /etc/zabbix/zabbix_agent2.d/unifi_api.conf
UserParameter=unifi.client.count,/usr/local/bin/unifi_api_monitor.sh clients
UserParameter=unifi.health[*],/usr/local/bin/unifi_api_monitor.sh health $1
UserParameter=unifi.wan[*],/usr/local/bin/unifi_api_monitor.sh wan $1
UserParameter=unifi.firmware.version,/usr/local/bin/unifi_api_monitor.sh firmware
This gives me:
- Client count: Number of connected wireless and wired clients
- Subsystem health: WAN, LAN, and WLAN status (ok/unknown/error)
- WAN metrics: Latency (ms) and uptime (seconds)
- Firmware version: Current gateway firmware for update tracking
Authentication uses a bearer token stored in /etc/default/zabbix-agent2, keeping credentials out of the script and Zabbix configuration.
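To make the shape of such a check concrete, here is a minimal Python sketch of the "health" lookup: fetch a JSON document with a bearer token, then pick out one subsystem's status. The real script is a bash script whose contents are not shown here, so everything below is an assumption: the function names, the response layout in SAMPLE, and whichever URL is passed to fetch_json are hypothetical and would need to be checked against the gateway's actual API.

```python
import json
import urllib.request


def fetch_json(url: str, token: str, timeout: float = 5.0) -> dict:
    """GET a JSON document, sending the token as a bearer Authorization header."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def subsystem_status(payload: dict, subsystem: str) -> str:
    """Return the status of one subsystem, or 'unknown' if it is absent."""
    for entry in payload.get("data", []):
        if entry.get("subsystem") == subsystem:
            return str(entry.get("status", "unknown"))
    return "unknown"


# Hypothetical response shape, used here only to exercise the parser.
SAMPLE = {
    "data": [
        {"subsystem": "wan", "status": "ok"},
        {"subsystem": "wlan", "status": "error"},
    ]
}
```

With a parameterized key such as unifi.health[*], the value Zabbix passes as $1 would play the role of the subsystem argument. Returning "unknown" for a missing subsystem keeps the item value within the ok/unknown/error set instead of failing the check.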
Zabbix Agent on Unraid (Docker Container)
Unraid doesn’t support native package installation like Debian-based systems. Instead, Zabbix Agent 2 runs as a Docker container on the Tower NAS:
services:
  zabbix-agent2:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent2
    network_mode: host
    privileged: true
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /dev:/dev:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      ZBX_SERVER_HOST: "10.0.1.103"
      ZBX_HOSTNAME: "Tower"
      ZBX_TLSACCEPT: "psk,unencrypted"
The container needs host networking, privileged mode, and read-only mounts of /proc, /sys, /dev, and the Docker socket. This provides full system metrics plus Docker container monitoring on the Unraid box.
Proxmox Monitoring via SNMP
The four Proxmox hosts (pve01, pve02, pve03, pbs01) are monitored via SNMP rather than Zabbix Agent. This keeps the hypervisor layer clean, with no Zabbix agent packages installed on the bare-metal Proxmox hosts.
SNMP configuration uses secure random community strings (not public) and restricts access to the 10.0.1.0/24 subnet.
Grafana Integration
Zabbix data feeds into Grafana for visualization via the alexanderzobnin-zabbix-app plugin. Grafana runs on docker02 with a direct connection to Zabbix’s API.
Dashboards
Zabbix Native Dashboard — “Homelab Infrastructure Overview” (Dashboard ID: 394):
- Page 1 (Overview): Honeycomb health map showing all hosts, host navigator, active problems panel, and top triggers
- Page 2 (Host Details): Click a host to see linked gauge widgets for CPU, memory, and disk utilization, plus a time-series graph
Grafana Dashboard — Zabbix Full Server Status (imported from Grafana.com ID: 5363):
- Detailed host metrics with historical trends
- Customized panels for homelab-specific metrics
The combination gives me both quick-glance health checks (Zabbix native) and deep-dive historical analysis (Grafana).
Maintenance Windows
Monitoring is useless if it generates alerts for expected events. I configured Zabbix maintenance windows to suppress alerts during predictable operations:
| Window | Schedule | Duration |
|---|---|---|
| Daily PBS Backup | 21:00 EST daily | 2 hours |
| Weekly Unraid Backup | Sunday 01:00 EST | 4 hours |
These windows cover all host groups (Linux servers, hypervisors, VMs, network devices, Zabbix servers, applications, databases). During maintenance, Zabbix still collects data — it just suppresses trigger actions and notifications.
Without these windows, every backup cycle would generate a storm of disk I/O and CPU alerts across the cluster.
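The two windows in the table can also be expressed as Zabbix API payloads, which is handy for keeping them in version control. The sketch below builds maintenance.create requests using the API's recurring time period fields (timeperiod_type, start_time, period, every, dayofweek). The helper names, the group ID "2", and the active_since/active_till values are placeholders, and the windows described here may just as well have been created in the web UI.

```python
DAILY, WEEKLY = 2, 3   # Zabbix timeperiod_type values
SUNDAY = 64            # dayofweek bitmask: Monday=1, Tuesday=2, ... Sunday=64


def time_period(kind, start_hhmm, hours, dayofweek=0):
    """Build one recurring time period from a HH:MM start and a length in hours."""
    hh, mm = (int(part) for part in start_hhmm.split(":"))
    period = {
        "timeperiod_type": kind,
        "start_time": hh * 3600 + mm * 60,  # seconds after midnight
        "period": hours * 3600,             # window length in seconds
        "every": 1,                         # every day or every week
    }
    if kind == WEEKLY:
        period["dayofweek"] = dayofweek
    return period


def maintenance_request(name, group_ids, periods, active_since, active_till, request_id=1):
    """Wrap the periods in a JSON-RPC maintenance.create request body."""
    return {
        "jsonrpc": "2.0",
        "method": "maintenance.create",
        "params": {
            "name": name,
            "active_since": active_since,  # Unix timestamps bounding the schedule
            "active_till": active_till,
            "groups": [{"groupid": gid} for gid in group_ids],
            "timeperiods": periods,
        },
        "id": request_id,
    }


daily_pbs = time_period(DAILY, "21:00", 2)
weekly_unraid = time_period(WEEKLY, "01:00", 4, SUNDAY)
```

The request leaves maintenance_type at its default, which is maintenance with data collection, matching the behavior described above.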
Auto-Resolve Informational Alerts
Some Zabbix triggers fire for informational events (severity 1) like “system configuration has been changed.” These are good to know about but shouldn’t linger as active problems indefinitely.
A cron job on the Zabbix LXC auto-resolves severity-1 alerts older than 24 hours:
# /etc/cron.d/zabbix-auto-resolve
0 * * * * zabbix /usr/local/bin/auto-resolve-info-alerts.sh
This keeps the active problems list focused on genuine issues that need attention, which also improves the quality of the AI-powered alert analysis.
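The contents of auto-resolve-info-alerts.sh are not shown, so the following is only a guess at the two API calls such a job would need, written in Python rather than shell. It builds a problem.get request for Information-severity problems created more than 24 hours ago and an event.acknowledge request that closes them. The function names and the message text are hypothetical.

```python
import time

INFORMATION = 1     # Zabbix severity "Information"
CLOSE_PROBLEM = 1   # event.acknowledge action bit: close problem
ADD_MESSAGE = 4     # event.acknowledge action bit: add message


def stale_info_problems_request(now=None, max_age_hours=24, request_id=1):
    """Request IDs of Information problems created at or before the cutoff."""
    now = int(time.time()) if now is None else int(now)
    return {
        "jsonrpc": "2.0",
        "method": "problem.get",
        "params": {
            "output": ["eventid"],
            "severities": [INFORMATION],
            "time_till": now - max_age_hours * 3600,
        },
        "id": request_id,
    }


def close_request(event_ids, request_id=2):
    """Request that the given problem events be closed with an explanatory note."""
    return {
        "jsonrpc": "2.0",
        "method": "event.acknowledge",
        "params": {
            "eventids": list(event_ids),
            "action": CLOSE_PROBLEM | ADD_MESSAGE,
            "message": "Auto-resolved informational alert",
        },
        "id": request_id,
    }
```

One caveat with this approach: Zabbix only closes a problem through event.acknowledge when the underlying trigger has manual close enabled, so triggers without that option would be skipped or rejected.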
Telegram Alerting
Zabbix sends notifications through a Telegram bot for critical alerts. The bot posts to a dedicated homelab alerts channel, providing:
- Problem notifications when triggers fire
- Recovery notifications when problems resolve
- Severity-based formatting (critical alerts are prominent, warnings are subdued)
Combined with the n8n AI analysis workflow, I get two layers of notification:
- Immediate: Zabbix → Telegram (raw alert)
- Analyzed: n8n → Ollama → Telegram (contextual analysis, hourly batch)
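For the immediate layer, the round trip is small enough to sketch. The Python below formats a problem or recovery line with a severity-dependent prefix and builds a request for the Telegram Bot API's sendMessage method. It is illustrative only: the prefixes, function names, token, and chat ID are made up, and in a real deployment the message format would live in the Zabbix media type rather than in a script like this.

```python
import urllib.parse
import urllib.request

# Hypothetical prefixes: louder markers for more severe problems.
SEVERITY_PREFIX = {"Disaster": "!!!", "High": "!!", "Warning": "!"}


def format_alert(severity, host, problem, recovered=False):
    """Return a one-line message for a problem or its recovery."""
    if recovered:
        return f"OK RESOLVED [{severity}] {host}: {problem}"
    prefix = SEVERITY_PREFIX.get(severity, "i")
    return f"{prefix} PROBLEM [{severity}] {host}: {problem}"


def send_message_request(token, chat_id, text):
    """Build (but do not send) a POST request for the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the returned request to urllib.request.urlopen would deliver the message; keeping construction and sending separate makes the formatting easy to check without network access.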
Lessons Learned
1. Templates Are Your Best Friend
Zabbix’s template system lets you define monitoring profiles once and apply them to multiple hosts. I have templates for Docker hosts, LXC containers, Proxmox nodes, and network devices. When I add a new host, configuration takes minutes.
2. Don’t Alert on Everything
Early on, I had triggers for every possible metric with aggressive thresholds. The result? Alert fatigue. I tuned thresholds based on actual baselines and added maintenance windows for known events. Now alerts are meaningful.
3. SNMP Community Strings Matter
Never use public or private as SNMP community strings, even on a private network. Generate random strings and restrict access to the monitoring subnet.
4. Monitor the Monitor
Zabbix itself needs monitoring. I track the Zabbix server’s own health — database size, queue length, cache hit ratios — to ensure the monitoring system doesn’t become a problem itself.
5. Documentation Prevents Amnesia
Every custom UserParameter, template modification, and PSK deployment is documented. Six months from now, I won’t remember why the UCG Ultra monitoring uses bearer tokens instead of session cookies. The documentation will.
What’s Next
- Application-level monitoring: Deeper Zabbix templates for individual Docker services (n8n queue depth, Plex transcode sessions)
- Log monitoring: Integrating Zabbix with log analysis for pattern detection
- Network flow analysis: Using SNMP flow data for bandwidth usage patterns
For the AI layer that enhances these alerts, see AI-Powered Infrastructure Monitoring. For the complete homelab overview, check out Building My Homelab.