# Architecture overview
A small homelab that runs like a real one, with GitOps, observability, and a controlled blast radius.
## Layout
```
Internet
  Cloudflare (DNS, proxy for select names)
        │
        ▼
LAN 192.168.1.0/24
  NPM (TLS termination, reverse proxy)
        │
        ▼
  MetalLB pool 192.168.1.225-240
        │
        ▼
  Kubernetes cluster (kubeadm, Calico, ArgoCD)
    apiserver VIP  192.168.1.20:6443
    k8cluster1     192.168.1.90  ── pve1
    k8cluster2     192.168.1.89  ── pve1
    k8cluster3     192.168.1.91  ── pve2
        │   pod -> external = SNAT to node IP
        ▼
  Postgres 16 VM   192.168.1.123 ── pve2
  Uptime Kuma VM   192.168.1.129 ── pve2  (idle, see Decisions)

  Proxmox pve1     192.168.1.10
  Proxmox pve2     192.168.1.11
```
A higher-fidelity diagram and screenshots live in Visuals.
## Hosts and roles
| Host | IP | Role |
|---|---|---|
| pve1 | 192.168.1.10 | Proxmox node, hosts most VMs |
| pve2 | 192.168.1.11 | Proxmox node, hosts the rest |
| K8s API | 192.168.1.20 | kube-apiserver VIP |
| k8cluster1 | 192.168.1.90 | K8s worker / control plane |
| k8cluster2 | 192.168.1.89 | K8s worker / control plane |
| k8cluster3 | 192.168.1.91 | K8s worker / control plane |
| postgresql | 192.168.1.123 | Postgres 16, shared backend for Grafana, claude-bridge, and other stateful apps |
| uptime-kuma | 192.168.1.129 | Black-box monitoring VM (currently idle, see decisions) |
| NPM | LAN | nginx-proxy-manager, TLS termination and routing for *.herro.me |
## Kubernetes cluster
- 3 nodes, kubeadm-built, single control-plane endpoint at `192.168.1.20:6443`.
- Calico CNI, pod CIDR `10.244.0.0/16`.
- MetalLB in L2 mode announcing from the pool `192.168.1.225-240` (see ADR 0001); a config sketch follows this list.
- ArgoCD reconciles the cluster from Git. App-of-apps pattern, source repo `k8s-argocd`.
- Stakater Reloader watches ConfigMaps and Secrets and rolls Deployments when they change.
- External Secrets Operator pulls credentials from outside the cluster so secrets never live in Git.
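A minimal sketch of the MetalLB side, assuming the stock `metallb.io/v1beta1` CRDs; the resource names are illustrative, not the ones in the repo:

```yaml
# Address pool MetalLB may hand out to LoadBalancer Services.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool              # illustrative name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.225-192.168.1.240
---
# L2 mode: a speaker answers ARP for each service IP; no BGP peering needed.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan-l2                # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
    - lan-pool
```

Any Service of type LoadBalancer then gets an address from that pool unless it pins one explicitly.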
## What runs where
| Namespace | What | Notes |
|---|---|---|
| `argocd` | ArgoCD HA + Redis | HA mode, redis-ha-haproxy |
| `monitoring` | kube-prometheus-stack, Loki, Vector, Alertmanager, Grafana | Discord notification proxy lives here |
| `metallb-system` | MetalLB controller and speakers | L2Advertisement covers the whole pool |
| `external-secrets` | ESO controllers | Backed by a secret store outside the cluster |
| `reloader` | Stakater Reloader | `reloader.stakater.com/auto: "true"` triggers rollouts |
| `media` | *arr stack, Jellyfin, qBittorrent, Jellyseerr, NZBGet | Shared LB IP 192.168.1.226 (see the sketch below) |
| `automation` | claude-bridge HITL Discord bridge | LB at 192.168.1.235 |
| `trading` | Trading dashboard + API | NQ-bias-engine |
| `science` | Dask cluster, lsdb workloads | LSDB (large scale astronomy databases) |
| `dashboard` | Homepage | Lab landing page |
| `security` | Falco runtime detection | Discord webhook for alerts |
The full layout is in k8s-argocd (or wherever it lives publicly).
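The shared media IP works because MetalLB lets multiple Services claim the same address when they opt in with the same sharing key and their ports do not collide. A hedged sketch, with Service names and ports as assumptions rather than what is actually deployed:

```yaml
# Two hypothetical Services in the media namespace sharing 192.168.1.226.
apiVersion: v1
kind: Service
metadata:
  name: jellyfin              # illustrative name
  namespace: media
  annotations:
    metallb.universe.tf/allow-shared-ip: "media-shared"
    metallb.universe.tf/loadBalancerIPs: "192.168.1.226"
spec:
  type: LoadBalancer
  selector:
    app: jellyfin
  ports:
    - name: http
      port: 8096
      targetPort: 8096
---
apiVersion: v1
kind: Service
metadata:
  name: qbittorrent           # illustrative name
  namespace: media
  annotations:
    metallb.universe.tf/allow-shared-ip: "media-shared"
    metallb.universe.tf/loadBalancerIPs: "192.168.1.226"
spec:
  type: LoadBalancer
  selector:
    app: qbittorrent
  ports:
    - name: webui
      port: 8080
      targetPort: 8080
```

Both resolve to one LAN address, which keeps the media apps behind a single name in NPM.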
## Why this shape
A few decisions worth calling out, with longer rationale in Decisions:
- External Postgres VM instead of in-cluster CloudNativePG. Stateful data is the most expensive thing to lose, and I wanted backups, restore drills, and major-version upgrades to be boring even if K8s broke. (ADR 0002)
- MetalLB L2 instead of BGP, because the home network is a single L2 segment and there is no router I trust to peer with. (ADR 0001)
- ArgoCD instead of Flux, mostly for the UI when explaining the lab to other humans, and the App-of-Apps pattern fits how I think. (ADR 0003)
- Discord as the alert sink because it is where I already live. A small custom proxy translates Alertmanager and Falco webhooks into channel-appropriate messages.
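For that last point, the Alertmanager side is just a webhook receiver pointed at the proxy; the Service name, port, and path below are assumptions, not the real config:

```yaml
# Hypothetical Alertmanager snippet: send everything to the Discord proxy,
# which reshapes the payload into channel-appropriate messages.
route:
  receiver: discord-proxy
  group_by: ["alertname", "namespace"]
  group_wait: 30s
  repeat_interval: 4h

receivers:
  - name: discord-proxy
    webhook_configs:
      - url: http://alert-discord-proxy.monitoring.svc.cluster.local:8080/alertmanager
        send_resolved: true
```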