# Architecture overview
A small homelab that runs like a real one, with GitOps, observability, and a controlled blast radius.
## Layout
```
Internet
  Cloudflare (DNS, proxy for select names)
        │
        ▼
LAN 192.168.1.0/24
  NPM (TLS termination, reverse proxy)
        │
        ▼
  MetalLB pool 192.168.1.225-240
        │
        ▼
  Kubernetes cluster (kubeadm, Calico, ArgoCD)
    apiserver VIP  192.168.1.20:6443
    k8cluster1     192.168.1.90  ── pve1
    k8cluster2     192.168.1.89  ── pve1
    k8cluster3     192.168.1.91  ── pve2
        │   pod -> external = SNAT to node IP
        ▼
  Postgres 16 VM   192.168.1.123 ── pve2
  Uptime Kuma VM   192.168.1.129 ── pve2  (idle, see Decisions)

  Proxmox pve1     192.168.1.10
  Proxmox pve2     192.168.1.11
```
A higher-fidelity diagram and screenshots live in Visuals.
## Hosts and roles
| Host | IP | Role |
|---|---|---|
| pve1 | 192.168.1.10 | Proxmox node, hosts most VMs |
| pve2 | 192.168.1.11 | Proxmox node, hosts the rest |
| K8s API | 192.168.1.20 | kube-apiserver VIP |
| k8cluster1 | 192.168.1.90 | K8s worker / control plane |
| k8cluster2 | 192.168.1.89 | K8s worker / control plane |
| k8cluster3 | 192.168.1.91 | K8s worker / control plane |
| postgresql | 192.168.1.123 | Postgres 16, shared backend for Grafana, claude-bridge, and other stateful apps |
| uptime-kuma | 192.168.1.129 | Black-box monitoring VM (currently idle, see decisions) |
| NPM | LAN | nginx-proxy-manager, TLS termination and routing for *.herro.me |
## Kubernetes cluster
- 3 nodes, kubeadm-built, single control-plane endpoint at `192.168.1.20:6443`.
- Calico CNI, pod CIDR `10.244.0.0/16`.
- MetalLB in L2 mode announcing from the pool `192.168.1.225-240` (see ADR 0001); a config sketch follows this list.
- ArgoCD reconciles the cluster from Git. App-of-apps pattern, source repo `k8s-argocd`.
- Stakater Reloader watches ConfigMaps and Secrets and rolls Deployments when they change.
- External Secrets Operator pulls credentials from outside the cluster so secrets never live in Git.
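A minimal sketch of the MetalLB side, assuming the stock `metallb.io/v1beta1` CRDs; the resource names are illustrative, not the ones in the repo:

```yaml
# Address pool MetalLB may hand out to LoadBalancer Services.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool              # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.225-192.168.1.240
---
# L2 mode: a speaker answers ARP for each service IP; no BGP peering needed.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan-l2                # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
    - lan-pool
```

Any Service of type LoadBalancer then gets an address from that pool unless it pins one explicitly.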
## What runs where
| Namespace | What | Notes |
|---|---|---|
| `argocd` | ArgoCD HA + Redis | HA mode, redis-ha-haproxy |
| `monitoring` | kube-prometheus-stack, Loki, Vector, Alertmanager, Grafana | Discord notification proxy lives here |
| `metallb-system` | MetalLB controller and speakers | L2Advertisement covers the whole pool |
| `external-secrets` | ESO controllers | Backed by a secret store outside the cluster |
| `reloader` | Stakater Reloader | `reloader.stakater.com/auto: "true"` triggers rollouts |
| `media` | *arr stack, Jellyfin, qBittorrent, Jellyseerr, NZBGet | Shared LB IP 192.168.1.226 (see the sketch below) |
| `automation` | claude-bridge HITL Discord bridge | LB at 192.168.1.235 |
| `trading` | Trading dashboard + API | NQ-bias-engine |
| `science` | Dask cluster, lsdb workloads | LSDB (large scale astronomy databases) |
| `dashboard` | Homepage | Lab landing page |
| `security` | Falco runtime detection | Discord webhook for alerts |
The full layout is in k8s-argocd (or wherever it lives publicly).
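The shared media IP works because MetalLB lets multiple Services claim the same address when they opt in with the same sharing key and their ports do not collide. A hedged sketch, with Service names and ports as assumptions rather than what is actually deployed:

```yaml
# Two hypothetical Services in the media namespace sharing 192.168.1.226.
apiVersion: v1
kind: Service
metadata:
  name: jellyfin              # illustrative name
  namespace: media
  annotations:
    metallb.universe.tf/allow-shared-ip: "media-shared"
    metallb.universe.tf/loadBalancerIPs: "192.168.1.226"
spec:
  type: LoadBalancer
  selector:
    app: jellyfin
  ports:
    - name: http
      port: 8096
      targetPort: 8096
---
apiVersion: v1
kind: Service
metadata:
  name: qbittorrent           # illustrative name
  namespace: media
  annotations:
    metallb.universe.tf/allow-shared-ip: "media-shared"
    metallb.universe.tf/loadBalancerIPs: "192.168.1.226"
spec:
  type: LoadBalancer
  selector:
    app: qbittorrent
  ports:
    - name: webui
      port: 8080
      targetPort: 8080
```

Both resolve to one LAN address, which keeps the media apps behind a single name in NPM.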
## Why this shape
A few decisions worth calling out, with longer rationale in Decisions:
- External Postgres VM instead of in-cluster CloudNativePG. Stateful data is the most expensive thing to lose, and I wanted backups, restore drills, and major-version upgrades to be boring even if K8s broke. (ADR 0002)
- MetalLB L2 instead of BGP, because the home network is a single L2 segment and there is no router I trust to peer with. (ADR 0001)
- ArgoCD instead of Flux, mostly for the UI when explaining the lab to other humans, and the App-of-Apps pattern fits how I think. (ADR 0003)
- Discord as the alert sink because it is where I already live. A small custom proxy translates Alertmanager and Falco webhooks into channel-appropriate messages.
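For that last point, the Alertmanager side is just a webhook receiver pointed at the proxy; the Service name, port, and path below are assumptions, not the real config:

```yaml
# Hypothetical Alertmanager snippet: send everything to the Discord proxy,
# which reshapes the payload into channel-appropriate messages.
route:
  receiver: discord-proxy
  group_by: ["alertname", "namespace"]
  group_wait: 30s
  repeat_interval: 4h

receivers:
  - name: discord-proxy
    webhook_configs:
      - url: http://alert-discord-proxy.monitoring.svc.cluster.local:8080/alertmanager
        send_resolved: true
```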