Monitoring Everything: Zabbix 7 in a Homelab

If a service is important enough to run, it’s important enough to monitor. That principle led me to deploy Zabbix 7 as the backbone of my homelab monitoring — tracking 18 hosts across Proxmox hypervisors, Docker hosts, LXC containers, an Unraid NAS, and UniFi network gear. Here’s how it all fits together.

Why Zabbix?

I evaluated several monitoring platforms before settling on Zabbix:

  • Prometheus + Grafana: Great for metrics, but each target needs significant configuration, and alerting has to be set up separately (Alertmanager or Grafana alerting).
  • Uptime Kuma: Simple and effective for uptime monitoring, but limited host-level metrics.
  • Netdata: Beautiful real-time dashboards but high resource consumption and weak long-term storage.
  • Zabbix: Enterprise-grade monitoring with auto-discovery, templates, triggers, and built-in alerting. Steeper learning curve, but far more capable at scale.

For 18 hosts with diverse monitoring needs (Linux metrics, SNMP, HTTP checks, Docker containers, custom APIs), Zabbix’s template system and agent-based collection made it the clear winner.

Infrastructure Overview

Zabbix Server 7.4.6 runs in a dedicated LXC container (CT 115) on pve02, with PostgreSQL as the backend database.

Monitored Hosts

Category | Hosts | Monitoring Method
Proxmox Hypervisors | pve01, pve02, pve03, pbs01 | SNMP
Docker Hosts | docker01, docker02, docker03 | Zabbix Agent 2 (PSK)
LXC Containers | traefik, pihole, pihole2, homepage, ollama, zabbix | Zabbix Agent 2 (PSK)
Network | UCG Ultra | SNMP + HTTP API
NAS | Tower (Unraid) | Zabbix Agent 2 (Docker, PSK)

The table lists 15 of the 18 monitored hosts by category; each host has its own monitoring profile.

PSK-Encrypted Agent Deployment

Every Zabbix Agent 2 installation uses Pre-Shared Key (PSK) encryption. In a flat homelab network, this prevents monitoring data from being intercepted in plaintext.

Agent Configuration

Each host’s /etc/zabbix/zabbix_agent2.conf includes:

Server=10.0.1.103
ServerActive=10.0.1.103
Hostname=docker01
TLSConnect=psk
TLSAccept=psk,unencrypted
TLSPSKIdentity=docker01
TLSPSKFile=/etc/zabbix/zabbix_agent2.psk

Key details:

  • PSK identity: The hostname, so each connection is easy to identify in Zabbix Server
  • Shared PSK: All hosts use the same 128-character PSK (generated with openssl rand -hex 64, i.e. 64 random bytes written as hex)
  • TLSAccept: Accepts both PSK and unencrypted connections. This is a workaround for a Zabbix Server PSK configuration quirk, but it also means the agent still answers unencrypted passive checks.
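
A minimal sketch of creating the shared key file, assuming the agent runs as a zabbix user and reads the path from the config above. Because all hosts share one key, you would run it once and copy the resulting file to every host; the od fallback only exists for machines without openssl.

```shell
#!/usr/bin/env bash
# Sketch: write a 64-byte (128 hex character) PSK file for Zabbix Agent 2.
# Assumption: the agent runs as the "zabbix" user; adjust if your package differs.

write_psk() {
  local psk_file="${1:-/etc/zabbix/zabbix_agent2.psk}"
  local psk

  if command -v openssl >/dev/null 2>&1; then
    psk="$(openssl rand -hex 64)"                       # 64 random bytes -> 128 hex chars
  else
    # Fallback without openssl: same 64 bytes from the kernel RNG, printed as hex
    psk="$(od -An -tx1 -N64 /dev/urandom | tr -d ' \n')"
  fi

  # Create the file with owner-only permissions, then enforce them in case it already existed
  ( umask 077 && printf '%s\n' "$psk" > "$psk_file" )
  chmod 600 "$psk_file"

  # Hand the file to the agent user when running as root and that user exists
  if [[ $EUID -eq 0 ]] && id zabbix >/dev/null 2>&1; then
    chown zabbix:zabbix "$psk_file"
  fi
}
```

Usage: `write_psk /etc/zabbix/zabbix_agent2.psk` on one machine, then distribute that file. Generating a separate key per host would also work, but each host’s PSK in the Zabbix frontend would then have to match its own file.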

ARM64 Agent Limitations

One exception: pihole2 runs on ARM64 hardware (Raspberry Pi). There are no official Zabbix 7.x packages for ARM64, so it runs Agent 6.0.14 from Debian’s repository.

The version mismatch causes issues:

  • Active checks are incompatible between Agent 6.x and Server 7.x
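  • Workaround: Use passive checks only, with encryption off (TLSConnect=unencrypted; since TLSConnect only applies to active checks, the agent’s TLSAccept must also allow unencrypted polls)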

It’s not ideal, but it works. Pi-hole DNS metrics are still collected reliably.
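
For illustration, a sketch of passive-only settings for pihole2, printed by a small shell function so you can review them before appending them to the agent config. It assumes the server address from the config above; the destination file depends on which Debian agent package is installed, so it is left out. Leaving ServerActive unset is what disables active checks, and TLSAccept is the parameter that governs the server’s passive polls.

```shell
#!/usr/bin/env bash
# Sketch: print passive-only settings for an older (6.0) agent talking to a 7.x server.
# Assumption: you pick the destination config file for your agent package yourself.

passive_only_config() {
  local server="$1" hostname="$2"
  cat <<EOF
# Passive checks only: no ServerActive line, so the agent never opens
# active-check connections to the server.
Server=${server}
Hostname=${hostname}
# Accept plain-text polls from the server (TLSConnect only affects active checks)
TLSAccept=unencrypted
TLSConnect=unencrypted
EOF
}
```

Usage: `passive_only_config 10.0.1.103 pihole2`. On the Zabbix side, the host’s items then need to be of type “Zabbix agent” (passive) rather than “Zabbix agent (active)”.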

Custom UniFi UCG Ultra Monitoring

The UniFi Cloud Gateway Ultra (my main router/gateway) doesn’t support Zabbix Agent installation. Instead, I monitor it through two channels:

SNMP Monitoring

Standard SNMP (UDP 161), applied via Zabbix’s built-in “Interfaces Simple SNMP” template, provides:

  • Interface discovery and traffic statistics
  • System uptime

HTTP API Monitoring

For UniFi-specific metrics, I wrote a custom monitoring script that queries the UniFi API:

#!/bin/bash
# /usr/local/bin/unifi_api_monitor.sh
# Queries UniFi API for health metrics

The script is called through UserParameters on the Zabbix LXC’s own agent, not on the gateway, which has no shell access for custom scripts:

# /etc/zabbix/zabbix_agent2.d/unifi_api.conf
UserParameter=unifi.client.count,/usr/local/bin/unifi_api_monitor.sh clients
UserParameter=unifi.health[*],/usr/local/bin/unifi_api_monitor.sh health $1
UserParameter=unifi.wan[*],/usr/local/bin/unifi_api_monitor.sh wan $1
UserParameter=unifi.firmware.version,/usr/local/bin/unifi_api_monitor.sh firmware

This gives me:

  • Client count: Number of connected wireless and wired clients
  • Subsystem health: WAN, LAN, and WLAN status (ok/unknown/error)
  • WAN metrics: Latency (ms) and uptime (seconds)
  • Firmware version: Current gateway firmware for update tracking

Authentication uses a bearer token stored in /etc/default/zabbix-agent2, keeping credentials out of the script and Zabbix configuration.
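
The script body isn’t shown here, so this is a hypothetical sketch of how such a dispatcher could be structured. Everything specific is an assumption: the UNIFI_HOST/UNIFI_TOKEN variable names, the API path (the classic controller health endpoint behind the UniFi OS proxy), and the JSON shape (data[] objects with subsystem and status fields). Only the health key is filled in; clients, wan, and firmware would follow the same pattern against whatever fields your gateway returns. It needs curl and jq.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of /usr/local/bin/unifi_api_monitor.sh (the real body isn't shown).
# Assumed: UNIFI_HOST and UNIFI_TOKEN are exported via /etc/default/zabbix-agent2,
# and the endpoint/JSON fields below match your gateway. Requires curl and jq.

# Assumed endpoint: classic controller health data behind the UniFi OS proxy
API_HEALTH_PATH="${API_HEALTH_PATH:-/proxy/network/api/s/default/stat/health}"

# Fetch the health document, sending the token as a bearer header
fetch_health() {
  curl -fsS --max-time 5 \
    -H "Authorization: Bearer ${UNIFI_TOKEN:?UNIFI_TOKEN is not set}" \
    "https://${UNIFI_HOST:?UNIFI_HOST is not set}${API_HEALTH_PATH}"
}

# Print the status (e.g. "ok") of one subsystem such as wan, lan or wlan
health_status() {
  fetch_health | jq -r --arg s "$1" '.data[] | select(.subsystem == $s) | .status'
}

# Route the first argument (passed by the UserParameter) to the matching query
unifi_dispatch() {
  case "${1:-}" in
    health)
      if [[ -z "${2:-}" ]]; then
        echo "usage: health <subsystem>" >&2
        return 1
      fi
      health_status "$2"
      ;;
    clients|wan|firmware)
      # Same pattern as "health"; the fields depend on your gateway's API
      echo "not implemented in this sketch: $1" >&2
      return 1
      ;;
    *)
      echo "unknown key: ${1:-<none>}" >&2
      return 1
      ;;
  esac
}
```

The installed script would end with `unifi_dispatch "$@"`, so `unifi_api_monitor.sh health wan` prints a single status string for Zabbix to store. If the gateway presents a self-signed certificate, curl also needs `--cacert` pointing at it (or `-k`).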

Zabbix Agent on Unraid (Docker Container)

Unraid doesn’t support native package installation like Debian-based systems. Instead, Zabbix Agent 2 runs as a Docker container on the Tower NAS:

services:
  zabbix-agent2:
    image: zabbix/zabbix-agent2:latest
    container_name: zabbix-agent2
    network_mode: host
    privileged: true
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /dev:/dev:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      ZBX_SERVER_HOST: "10.0.1.103"
      ZBX_HOSTNAME: "Tower"
      ZBX_TLSACCEPT: "psk,unencrypted"

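The container needs host networking, privileged mode, and read-only mounts of /proc, /sys, /dev, and the Docker socket. This provides full system metrics plus Docker container monitoring on the Unraid box. Note that :ro on the socket only protects the socket file itself; anything that can talk to the socket still has full access to the Docker API.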

Proxmox Monitoring via SNMP

The four Proxmox hosts (pve01, pve02, pve03, pbs01) are monitored via SNMP rather than Zabbix Agent. This keeps the hypervisor layer lean: the bare-metal Proxmox hosts only need the standard snmpd daemon, not a Zabbix agent.

SNMP configuration uses secure random community strings (not public) and restricts access to the 10.0.1.0/24 subnet.
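
A small sketch of generating such a string together with the matching net-snmp snmpd.conf line; `rocommunity <community> <source>` is standard net-snmp syntax for a read-only community limited to a source network. The 16-byte length is an arbitrary choice. Keep in mind that SNMPv2c sends the community in clear text, so the subnet restriction is doing real work.

```shell
#!/usr/bin/env bash
# Sketch: generate a random read-only SNMPv2c community and the snmpd.conf line
# that limits it to the monitoring subnet (net-snmp "rocommunity" syntax).

snmp_community_line() {
  local subnet="${1:-10.0.1.0/24}"
  local community
  # 16 random bytes as hex: no characters that snmpd.conf or shells might mangle
  community="$(od -An -tx1 -N16 /dev/urandom | tr -d ' \n')"
  printf 'rocommunity %s %s\n' "$community" "$subnet"
}
```

Append the output to /etc/snmp/snmpd.conf on each node, restart snmpd, and enter the same string as the community on the host’s SNMP interface in Zabbix (typically via the {$SNMP_COMMUNITY} macro). On Debian-based systems the stock snmpd.conf usually listens only on localhost, so its agentaddress line may need changing as well.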

Grafana Integration

Zabbix data feeds into Grafana for visualization via the alexanderzobnin-zabbix-app plugin. Grafana runs on docker02 with a direct connection to Zabbix’s API.

Dashboards

Zabbix Native Dashboard — “Homelab Infrastructure Overview” (Dashboard ID: 394):

  • Page 1 (Overview): Honeycomb health map showing all hosts, host navigator, active problems panel, and top triggers
  • Page 2 (Host Details): Click a host to see linked gauge widgets for CPU, memory, and disk utilization, plus a time-series graph

Grafana Dashboard — Zabbix Full Server Status (imported from Grafana.com ID: 5363):

  • Detailed host metrics with historical trends
  • Customized panels for homelab-specific metrics

The combination gives me both quick-glance health checks (Zabbix native) and deep-dive historical analysis (Grafana).

Maintenance Windows

Monitoring is useless if it generates alerts for expected events. I configured Zabbix maintenance windows to suppress alerts during predictable operations:

Window | Schedule | Duration
Daily PBS Backup | 21:00 EST daily | 2 hours
Weekly Unraid Backup | Sunday 01:00 EST | 4 hours

These windows cover all host groups (Linux servers, hypervisors, VMs, network devices, Zabbix servers, applications, databases). During maintenance, Zabbix still collects data — it just suppresses trigger actions and notifications.

Without these windows, every backup cycle would generate a storm of disk I/O and CPU alerts across the cluster.
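
Windows like these can also be scripted through the Zabbix JSON-RPC API’s maintenance.create method. A sketch, assuming ZBX_URL points at api_jsonrpc.php and ZBX_TOKEN holds an API token; group IDs are placeholders. Time periods are expressed in seconds: start_time counts from midnight and period is the length, so “21:00 for 2 hours” becomes 75600 and 7200. timeperiod_type 2 is daily and 3 is weekly, and in the weekly dayofweek bitmask Monday is 1 and Sunday is 64. maintenance_type 0 keeps data collection running during the window.

```shell
#!/usr/bin/env bash
# Sketch: create a recurring maintenance window via the Zabbix API.
# ASSUMPTIONS: ZBX_URL (.../api_jsonrpc.php) and ZBX_TOKEN are set; group IDs
# are placeholders; names must not contain double quotes. Requires curl.

# "HH:MM" -> seconds since midnight (10# avoids octal parsing of "08", "09")
to_seconds() {
  local hh="${1%%:*}" mm="${1##*:}"
  echo $(( 10#$hh * 3600 + 10#$mm * 60 ))
}

# One timeperiod object: daily without a weekday mask, weekly with one
timeperiod_json() {
  local start="$1" hours="$2" dow_mask="${3:-}"
  local start_s period_s
  start_s="$(to_seconds "$start")"
  period_s=$(( hours * 3600 ))
  if [[ -z "$dow_mask" ]]; then
    printf '{"timeperiod_type":2,"every":1,"start_time":%d,"period":%d}' "$start_s" "$period_s"
  else
    printf '{"timeperiod_type":3,"every":1,"dayofweek":%d,"start_time":%d,"period":%d}' \
      "$dow_mask" "$start_s" "$period_s"
  fi
}

# create_maintenance NAME TIMEPERIOD_JSON GROUPID...
create_maintenance() {
  local name="$1" period="$2"; shift 2
  local groups="" gid now till
  for gid in "$@"; do groups+="${groups:+,}{\"groupid\":\"${gid}\"}"; done
  now="$(date +%s)"
  till=$(( now + 365 * 24 * 3600 ))   # window definition stays active for a year
  curl -fsS "$ZBX_URL" \
    -H 'Content-Type: application/json-rpc' \
    -H "Authorization: Bearer ${ZBX_TOKEN}" \
    --data "{\"jsonrpc\":\"2.0\",\"method\":\"maintenance.create\",\"id\":1,\"params\":{
      \"name\":\"${name}\",\"maintenance_type\":0,
      \"active_since\":${now},\"active_till\":${till},
      \"groups\":[${groups}],\"timeperiods\":[${period}]}}"
}
```

For example, `create_maintenance "Daily PBS Backup" "$(timeperiod_json 21:00 2)" 2 4`, where the group IDs 2 and 4 are hypothetical.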

Auto-Resolve Informational Alerts

Some Zabbix triggers fire for informational events (severity 1) like “system configuration has been changed.” These are good to know about but shouldn’t linger as active problems indefinitely.

A cron job on the Zabbix LXC auto-resolves severity-1 alerts older than 24 hours:

# /etc/cron.d/zabbix-auto-resolve
0 * * * * zabbix /usr/local/bin/auto-resolve-info-alerts.sh

This keeps the active problems list focused on genuine issues that need attention, which also improves the quality of the AI-powered alert analysis.
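
The script itself isn’t shown, so here is a hypothetical sketch of one way to do it through the Zabbix API: problem.get for unresolved problems with severity 1 (Information) that started more than 24 hours ago, then event.acknowledge with action 1 (close problem). Two caveats: closing only works for triggers that have “Allow manual close” enabled, and ZBX_URL/ZBX_TOKEN are assumed environment variables. It needs curl and jq.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL sketch of auto-resolve-info-alerts.sh (the real script isn't shown).
# ASSUMPTIONS: ZBX_URL and ZBX_TOKEN are set; requires curl and jq.
# Closing only works for triggers with "Allow manual close" enabled.

MAX_AGE=$(( 24 * 3600 ))

# JSON-RPC request for Information-severity (1) problems created before now - 24h
problem_query() {
  local now="$1"
  printf '{"jsonrpc":"2.0","method":"problem.get","id":1,"params":{"output":["eventid"],"severities":[1],"time_till":%d}}' \
    $(( now - MAX_AGE ))
}

zbx_call() {
  curl -fsS "$ZBX_URL" \
    -H 'Content-Type: application/json-rpc' \
    -H "Authorization: Bearer ${ZBX_TOKEN}" \
    --data "$1"
}

auto_resolve() {
  local ids
  ids="$(zbx_call "$(problem_query "$(date +%s)")" | jq -c '[.result[].eventid]')"
  # Nothing old enough (or the API call failed): do nothing
  [[ -z "$ids" || "$ids" == "[]" ]] && return 0
  # event.acknowledge action bitmask: 1 = close problem
  zbx_call "{\"jsonrpc\":\"2.0\",\"method\":\"event.acknowledge\",\"id\":2,\"params\":{\"eventids\":${ids},\"action\":1}}"
}
```

The installed script would simply end with a call to `auto_resolve`.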

Telegram Alerting

Zabbix sends notifications through a Telegram bot for critical alerts. The bot posts to a dedicated homelab alerts channel, providing:

  • Problem notifications when triggers fire
  • Recovery notifications when problems resolve
  • Severity-based formatting (critical alerts are prominent, warnings are subdued)

Combined with the n8n AI analysis workflow, I get two layers of notification:

  1. Immediate: Zabbix → Telegram (raw alert)
  2. Analyzed: n8n → Ollama → Telegram (contextual analysis, hourly batch)
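
Zabbix ships a Telegram webhook media type, and the setup above doesn’t say whether the bot is driven by that or by a custom script. As an illustration only, here is a sketch of severity-based formatting plus a direct call to the Telegram Bot API’s sendMessage method, using its disable_notification flag to deliver lower severities silently. TG_BOT_TOKEN and TG_CHAT_ID are hypothetical variable names; the severity names are Zabbix’s standard ones.

```shell
#!/usr/bin/env bash
# Illustrative sketch: severity-based alert text and a Telegram sendMessage call.
# ASSUMPTIONS: TG_BOT_TOKEN and TG_CHAT_ID are hypothetical env vars; requires curl.

# High/Disaster get a loud uppercase tag, lower severities a quiet lowercase one
format_alert() {
  local severity="$1" subject="$2" status="${3:-PROBLEM}"
  if [[ "$status" == "RESOLVED" ]]; then
    printf '[RESOLVED] %s' "$subject"
    return
  fi
  case "$severity" in
    Disaster|High) printf '[%s] %s' "${severity^^}" "$subject" ;;
    *)             printf '(%s) %s' "${severity,,}" "$subject" ;;
  esac
}

# Only the critical levels should ping phones; everything else arrives silently
notification_silent() {
  case "$1" in
    Disaster|High) echo false ;;
    *)             echo true ;;
  esac
}

send_telegram() {
  local text="$1" silent="${2:-false}"
  curl -fsS "https://api.telegram.org/bot${TG_BOT_TOKEN:?TG_BOT_TOKEN is not set}/sendMessage" \
    --data-urlencode "chat_id=${TG_CHAT_ID:?TG_CHAT_ID is not set}" \
    --data-urlencode "text=${text}" \
    --data-urlencode "disable_notification=${silent}" >/dev/null
}
```

A call would look like `send_telegram "$(format_alert High "Disk space low on docker01")" "$(notification_silent High)"`.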

Lessons Learned

1. Templates Are Your Best Friend

Zabbix’s template system lets you define monitoring profiles once and apply them to multiple hosts. I have templates for Docker hosts, LXC containers, Proxmox nodes, and network devices. When I add a new host, configuration takes minutes.

2. Don’t Alert on Everything

Early on, I had triggers for every possible metric with aggressive thresholds. The result? Alert fatigue. I tuned thresholds based on actual baselines and added maintenance windows for known events. Now alerts are meaningful.

3. SNMP Community Strings Matter

Never use public or private as SNMP community strings, even on a private network. Generate random strings and restrict access to the monitoring subnet.

4. Monitor the Monitor

Zabbix itself needs monitoring. I track the Zabbix server’s own health — database size, queue length, cache hit ratios — to ensure the monitoring system doesn’t become a problem itself.

5. Documentation Prevents Amnesia

Every custom UserParameter, template modification, and PSK deployment is documented. Six months from now, I won’t remember why the UCG Ultra monitoring uses bearer tokens instead of session cookies. The documentation will.

What’s Next

  • Application-level monitoring: Deeper Zabbix templates for individual Docker services (n8n queue depth, Plex transcode sessions)
  • Log monitoring: Integrating Zabbix with log analysis for pattern detection
  • Network flow analysis: Using SNMP flow data for bandwidth usage patterns

For the AI layer that enhances these alerts, see AI-Powered Infrastructure Monitoring. For the complete homelab overview, check out Building My Homelab.