
Monitoring Everything: Zabbix 7 in a Homelab


If a service is important enough to run, it's important enough to monitor. That principle led me to deploy Zabbix 7 as the backbone of my homelab monitoring — tracking 18 hosts across Proxmox hypervisors, Docker hosts, LXC containers, an Unraid NAS, and UniFi network gear. Here's how it all fits together.

Why Zabbix?#

I evaluated several monitoring platforms before settling on Zabbix:

  • Prometheus + Grafana: Great for metrics, but each target needs its own exporter and scrape configuration, and alerting requires a separate Alertmanager deployment.
  • Uptime Kuma: Simple and effective for uptime monitoring, but limited host-level metrics.
  • Netdata: Beautiful real-time dashboards but high resource consumption and weak long-term storage.
  • Zabbix: Enterprise-grade monitoring with auto-discovery, templates, triggers, and built-in alerting. Steeper learning curve, but far more capable at scale.

For 18 hosts with diverse monitoring needs (Linux metrics, SNMP, HTTP checks, Docker containers, custom APIs), Zabbix's template system and agent-based collection made it the clear winner.

Infrastructure Overview#

Zabbix Server 7.4.6 runs in a dedicated LXC container (CT 115) on pve02, with PostgreSQL as the backend database.

Monitored Hosts#

| Category | Hosts | Monitoring Method |
| --- | --- | --- |
| Proxmox Hypervisors | pve01, pve02, pve03, pbs01 | SNMP |
| Docker Hosts | docker01, docker02, docker03 | Zabbix Agent 2 (PSK) |
| LXC Containers | traefik, pihole, pihole2, homepage, ollama, zabbix | Zabbix Agent 2 (PSK) |
| Network | UCG Ultra | SNMP + HTTP API |
| NAS | Tower (Unraid) | Zabbix Agent 2 (Docker, PSK) |

That's 18 hosts total, each with its own monitoring profile.

PSK-Encrypted Agent Deployment#

Every Zabbix Agent 2 installation uses Pre-Shared Key (PSK) encryption. In a flat homelab network, this prevents monitoring data from being intercepted in plaintext.

Agent Configuration#

Each host's /etc/zabbix/zabbix_agent2.conf includes:

Server=10.0.1.103
ServerActive=10.0.1.103
Hostname=docker01
TLSConnect=psk
TLSAccept=psk,unencrypted
TLSPSKIdentity=docker01
TLSPSKFile=/etc/zabbix/zabbix_agent2.psk

Key details:

  • PSK Identity: Uses the hostname for easy identification in Zabbix Server
  • Shared PSK: All hosts share a single 128-character PSK (generated with openssl rand -hex 64)
  • TLSAccept: Accepts both PSK and unencrypted connections — this is a workaround for a Zabbix Server PSK configuration quirk
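Generating and installing the key is a one-time step. A minimal sketch (written against a temp directory so it runs anywhere; on a real host the target is /etc/zabbix/zabbix_agent2.psk, owned by the zabbix user):

```shell
# Generate a 128-hex-character PSK (64 random bytes) and lock down its
# permissions. The temp dir is for illustration only; the real target is
# /etc/zabbix/zabbix_agent2.psk with chown zabbix:zabbix.
PSK_FILE="$(mktemp -d)/zabbix_agent2.psk"
openssl rand -hex 64 > "$PSK_FILE"
chmod 600 "$PSK_FILE"
echo "generated $(tr -d '\n' < "$PSK_FILE" | wc -c)-character PSK"
```

The same file is then copied to each agent, so one key protects the whole fleet; rotating it means redeploying one file everywhere.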

ARM64 Agent Limitations#

One exception: pihole2 runs on ARM64 hardware (Raspberry Pi). There are no official Zabbix 7.x packages for ARM64, so it runs Agent 6.0.14 from Debian's repository.

The version mismatch causes issues:

  • Active checks are incompatible between Agent 6.x and Server 7.x
  • Workaround: Passive checks only, with unencrypted connections (TLSAccept=unencrypted on the agent and "No encryption" on the host in the server UI)

It's not ideal, but it works. Pi-hole DNS metrics are still collected reliably.
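For reference, the passive-only configuration on pihole2 looks roughly like this (the file path assumes Debian's classic zabbix-agent package; adjust if you use Agent 2):

```
# /etc/zabbix/zabbix_agentd.conf on pihole2 (sketch)
Server=10.0.1.103          # who may poll us (passive checks)
# ServerActive omitted -- active checks disabled due to the version mismatch
Hostname=pihole2
TLSConnect=unencrypted
TLSAccept=unencrypted
```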

Custom UniFi UCG Ultra Monitoring#

The UniFi Cloud Gateway Ultra (my main router/gateway) doesn't support Zabbix Agent installation. Instead, I monitor it through two channels:

SNMP Monitoring#

Standard SNMP (UDP 161) provides:

  • Interface discovery and traffic statistics
  • System uptime
  • Applied via Zabbix's built-in "Interfaces Simple SNMP" template

HTTP API Monitoring#

For UniFi-specific metrics, I wrote a custom monitoring script that queries the UniFi API:

#!/bin/bash
# /usr/local/bin/unifi_api_monitor.sh
# Queries UniFi API for health metrics

The script is exposed through UserParameter entries on the Zabbix LXC (not on the gateway, which offers no shell access for custom scripts):

# /etc/zabbix/zabbix_agent2.d/unifi_api.conf
UserParameter=unifi.client.count,/usr/local/bin/unifi_api_monitor.sh clients
UserParameter=unifi.health[*],/usr/local/bin/unifi_api_monitor.sh health $1
UserParameter=unifi.wan[*],/usr/local/bin/unifi_api_monitor.sh wan $1
UserParameter=unifi.firmware.version,/usr/local/bin/unifi_api_monitor.sh firmware

This gives me:

  • Client count: Number of connected wireless and wired clients
  • Subsystem health: WAN, LAN, and WLAN status (ok/unknown/error)
  • WAN metrics: Latency (ms) and uptime (seconds)
  • Firmware version: Current gateway firmware for update tracking

Authentication uses a bearer token stored in /etc/default/zabbix-agent2, keeping credentials out of the script and Zabbix configuration.
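The script itself is essentially a thin curl-plus-jq wrapper. A minimal sketch, with the caveat that the gateway address, the UNIFI_API_TOKEN variable name, and the endpoint paths are illustrative assumptions (UniFi API paths vary by controller version):

```shell
#!/bin/bash
# unifi_api_monitor.sh -- illustrative sketch, not the exact production script
API="https://10.0.1.1/proxy/network/api/s/default"   # assumed gateway address

fetch() {
  # Bearer token is exported via /etc/default/zabbix-agent2 (assumed var name)
  curl -sk -H "Authorization: Bearer $UNIFI_API_TOKEN" "$API/$1"
}

monitor() {
  case "$1" in
    clients)  fetch stat/sta    | jq '.data | length' ;;
    health)   fetch stat/health | jq -r --arg s "$2" \
                '.data[] | select(.subsystem == $s) | .status' ;;
    wan)      fetch stat/health | jq -r --arg k "$2" \
                '.data[] | select(.subsystem == "wan") | .[$k] // empty' ;;
    firmware) fetch stat/device | jq -r '.data[0].version' ;;
  esac
}

monitor "$@"
```

Each branch maps directly onto one of the UserParameter keys above, so Zabbix items stay one-liners while all API logic lives in a single script.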

Zabbix Agent on Unraid (Docker Container)#

Unraid doesn't support native package installation like Debian-based systems. Instead, Zabbix Agent 2 runs as a Docker container on the Tower NAS:

services:
  zabbix-agent2:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent2
    network_mode: host
    privileged: true
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /dev:/dev:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # example host path; the PSK file must be mounted into the container
      - /mnt/user/appdata/zabbix/zabbix_agent2.psk:/var/lib/zabbix/enc/agent2.psk:ro
    environment:
      ZBX_SERVER_HOST: "10.0.1.103"
      ZBX_HOSTNAME: "Tower"
      ZBX_TLSCONNECT: "psk"
      ZBX_TLSACCEPT: "psk,unencrypted"
      ZBX_TLSPSKIDENTITY: "Tower"
      ZBX_TLSPSKFILE: "/var/lib/zabbix/enc/agent2.psk"

The container needs host networking, privileged mode, and read-only access to /proc, /sys, and the Docker socket. This provides full system metrics plus Docker container monitoring on the Unraid box.

Proxmox Monitoring via SNMP#

The four Proxmox hosts (pve01, pve02, pve03, pbs01) are monitored via SNMP rather than Zabbix Agent. This keeps the hypervisor layer clean — no extra packages installed on the bare-metal Proxmox hosts.

SNMP configuration uses secure random community strings (not public) and restricts access to the 10.0.1.0/24 subnet.
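The relevant snmpd.conf lines on each node look roughly like this (the community string below is a placeholder; generate your own, e.g. with openssl rand -hex 16):

```
# /etc/snmp/snmpd.conf (sketch -- community string is a placeholder)
agentAddress udp:161
rocommunity REPLACE_WITH_RANDOM_STRING 10.0.1.0/24
```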

Grafana Integration#

Zabbix data feeds into Grafana for visualization via the alexanderzobnin-zabbix-app plugin. Grafana runs on docker02 with a direct connection to Zabbix's API.

Dashboards#

Zabbix Native Dashboard — "Homelab Infrastructure Overview" (Dashboard ID: 394):

  • Page 1 (Overview): Honeycomb health map showing all hosts, host navigator, active problems panel, and top triggers
  • Page 2 (Host Details): Click a host to see linked gauge widgets for CPU, memory, and disk utilization, plus a time-series graph

Grafana Dashboard — Zabbix Full Server Status (imported from Grafana.com ID: 5363):

  • Detailed host metrics with historical trends
  • Customized panels for homelab-specific metrics

The combination gives me both quick-glance health checks (Zabbix native) and deep-dive historical analysis (Grafana).

Maintenance Windows#

Monitoring is useless if it generates alerts for expected events. I configured Zabbix maintenance windows to suppress alerts during predictable operations:

| Window | Schedule | Duration |
| --- | --- | --- |
| Daily PBS Backup | 21:00 EST daily | 2 hours |
| Weekly Unraid Backup | Sunday 01:00 EST | 4 hours |

These windows cover all host groups (Linux servers, hypervisors, VMs, network devices, Zabbix servers, applications, databases). During maintenance, Zabbix still collects data — it just suppresses trigger actions and notifications.

Without these windows, every backup cycle would generate a storm of disk I/O and CPU alerts across the cluster.

Auto-Resolve Informational Alerts#

Some Zabbix triggers fire for informational events (severity 1) like "system configuration has been changed." These are good to know about but shouldn't linger as active problems indefinitely.

A cron job on the Zabbix LXC auto-resolves severity-1 alerts older than 24 hours:

# /etc/cron.d/zabbix-auto-resolve
0 * * * * zabbix /usr/local/bin/auto-resolve-info-alerts.sh

This keeps the active problems list focused on genuine issues that need attention, which also improves the quality of the AI-powered alert analysis.
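The core of such a script can be sketched against the Zabbix API: problem.get to find Information-severity problems older than the cutoff, then event.acknowledge with the close action. The frontend URL and ZABBIX_API_TOKEN variable are assumptions here, and the real script may differ:

```shell
#!/bin/bash
# auto-resolve-info-alerts.sh -- illustrative sketch of the approach
API="http://localhost/zabbix/api_jsonrpc.php"   # assumed frontend URL
CUTOFF=$(( $(date +%s) - 86400 ))               # 24 hours ago

zbx() {  # POST one JSON-RPC request to the Zabbix API
  curl -s -X POST "$API" -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $ZABBIX_API_TOKEN" -d "$1"
}

# Collect eventids of severity-1 problems older than the cutoff.
stale=$(zbx '{"jsonrpc":"2.0","method":"problem.get",
  "params":{"severities":[1],"output":["eventid","clock"]},"id":1}' |
  jq -c --argjson c "$CUTOFF" \
    '[.result[] | select((.clock|tonumber) < $c) | .eventid]')

# action 1 = close problem (the trigger must allow manual close)
if [ -n "$stale" ] && [ "$stale" != "[]" ]; then
  zbx '{"jsonrpc":"2.0","method":"event.acknowledge",
    "params":{"eventids":'"$stale"',"action":1},"id":2}'
fi
```

One caveat worth knowing: event.acknowledge can only close problems whose triggers have "Allow manual close" enabled, so the informational triggers need that box checked.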

Telegram Alerting#

Zabbix sends notifications through a Telegram bot for critical alerts. The bot posts to a dedicated homelab alerts channel, providing:

  • Problem notifications when triggers fire
  • Recovery notifications when problems resolve
  • Severity-based formatting (critical alerts are prominent, warnings are subdued)

Combined with the n8n AI analysis workflow, I get two layers of notification:

  1. Immediate: Zabbix → Telegram (raw alert)
  2. Analyzed: n8n → Ollama → Telegram (contextual analysis, hourly batch)

Lessons Learned#

1. Templates Are Your Best Friend#

Zabbix's template system lets you define monitoring profiles once and apply them to multiple hosts. I have templates for Docker hosts, LXC containers, Proxmox nodes, and network devices. When I add a new host, configuration takes minutes.

2. Don't Alert on Everything#

Early on, I had triggers for every possible metric with aggressive thresholds. The result? Alert fatigue. I tuned thresholds based on actual baselines and added maintenance windows for known events. Now alerts are meaningful.

3. SNMP Community Strings Matter#

Never use public or private as SNMP community strings, even on a private network. Generate random strings and restrict access to the monitoring subnet.

4. Monitor the Monitor#

Zabbix itself needs monitoring. I track the Zabbix server's own health — database size, queue length, cache hit ratios — to ensure the monitoring system doesn't become a problem itself.

5. Documentation Prevents Amnesia#

Every custom UserParameter, template modification, and PSK deployment is documented. Six months from now, I won't remember why the UCG Ultra monitoring uses bearer tokens instead of session cookies. The documentation will.

What's Next#

  • Application-level monitoring: Deeper Zabbix templates for individual Docker services (n8n queue depth, Plex transcode sessions)
  • Log monitoring: Integrating Zabbix with log analysis for pattern detection
  • Network flow analysis: Using SNMP flow data for bandwidth usage patterns

For the AI layer that enhances these alerts, see AI-Powered Infrastructure Monitoring. For the complete homelab overview, check out Building My Homelab.
