# Network

## CIDRs
| Range | What |
|---|---|
| 192.168.1.0/24 | LAN, all physical and virtual hosts |
| 192.168.1.225-240 | MetalLB L2 LoadBalancer pool |
| 10.244.0.0/16 | Kubernetes pod CIDR (Calico) |
| 10.96.0.0/12 | Kubernetes service CIDR |
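A quick way to sanity-check these against the live cluster. The first command assumes a kubeadm-managed cluster; the Calico CRD group in the second differs if Calico was installed via the Tigera operator.

```bash
# Pod and service CIDRs the cluster was actually configured with (assumes kubeadm)
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -E "podSubnet|serviceSubnet"

# Calico's view of the pod IP pool
kubectl get ippools.crd.projectcalico.org -o yaml | grep -m1 "cidr:"
```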
## How a request reaches an app
```mermaid
sequenceDiagram
    participant U as User
    participant CF as Cloudflare
    participant NPM as NPM
    participant LB as MetalLB IP
    participant SP as Speaker on owning node
    participant SVC as ClusterIP Service
    participant POD as Pod
    U->>CF: https://app.herro.me
    CF->>NPM: TLS, host header
    NPM->>LB: HTTP to LB IP
    Note over LB,SP: ARP request resolved by elected speaker
    SP->>SVC: kube-proxy DNAT
    SVC->>POD: traffic to ready endpoint
```
A few things worth noting:

- MetalLB L2 elects one speaker per service IP. That speaker answers ARP for the IP. If the elected speaker has no ready endpoint for the service, MetalLB withdraws the announcement (`reason: notOwner` in the speaker logs). This is a feature, not a bug, and it surfaced in the pg_hba incident. A quick check from the LAN is sketched after this list.
- Pod-to-external SNAT. Calico masquerades pod traffic that egresses the cluster, so external services like the Postgres VM see the node IP as the source, not the pod IP. This is the reason `pg_hba.conf` rules need to allow node IPs, not the pod CIDR alone.
- NPM at the edge. Most public hostnames terminate TLS at NPM and forward to a MetalLB IP behind it. Cloudflare proxies a subset of names for DDoS / privacy.
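A minimal check from another LAN host, using home.herro.me / 192.168.1.228 from the DNS tables below as the example; the interface name and the plain-HTTP assumption may need adjusting:

```bash
# Which node answers ARP for the LoadBalancer IP right now? (iputils arping; replace eth0)
arping -c 3 -I eth0 192.168.1.228

# Hit the service the way NPM does: plain HTTP to the MetalLB IP with the expected Host header
curl -s -o /dev/null -w "%{http_code}\n" -H "Host: home.herro.me" http://192.168.1.228/
```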
## DNS
herro.me is the public zone. Hostnames split into three buckets:
Internet-reachable (Cloudflare, valid Let's Encrypt cert, anyone on the public internet can hit them):
| Hostname | Backed by |
|---|---|
| docs.herro.me | This site, served by GitHub Pages |
Internal-only on the LAN (resolved by the LAN DNS resolver, never published to Cloudflare, only reachable when on the home network or via WireGuard):
| Hostname | Backed by |
|---|---|
| graphy.herro.me | Grafana, MetalLB 192.168.1.229 |
| argocd.herro.me | ArgoCD UI, MetalLB 192.168.1.225 |
| home.herro.me | Homepage dashboard, MetalLB 192.168.1.228 |
| prom.herro.me | Prometheus UI, MetalLB 192.168.1.230 |
| loki.herro.me | Loki, MetalLB 192.168.1.231 |
| alerts.herro.me | Alertmanager, MetalLB 192.168.1.233 |
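To confirm the split, query a public resolver and the LAN resolver side by side (the LAN resolver IP below is a placeholder):

```bash
dig +short docs.herro.me @1.1.1.1        # public name: resolves anywhere
dig +short graphy.herro.me @1.1.1.1      # internal name: should come back empty
dig +short graphy.herro.me @192.168.1.1  # LAN resolver (placeholder IP): should return 192.168.1.229
```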
The decision to keep the operational UIs internal is intentional: every public surface is an attack surface, and these tools are not designed primarily as public-facing apps. Anything that legitimately needs an external view (a status page, a public read-only Grafana dashboard) gets its own carefully scoped exposure rather than punching through the whole stack.
## MetalLB configuration
A single `IPAddressPool` covering 192.168.1.225-240, paired with one `L2Advertisement` that selects the whole pool. There are no node selectors, so all three speakers are eligible to win the election for any IP.
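Roughly what the two objects look like; the resource names are placeholders, the fields follow MetalLB's `metallb.io/v1beta1` CRDs:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool            # placeholder name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.225-192.168.1.240
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan-l2              # placeholder name
  namespace: metallb-system
spec:
  ipAddressPools:
    - lan-pool              # no node selectors: any speaker can win any IP
```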
Quick checks:
```bash
kubectl get ipaddresspool,l2advertisement -n metallb-system
kubectl get svc -A --field-selector spec.type=LoadBalancer -o wide

# What is each speaker actually doing right now?
for p in $(kubectl -n metallb-system get pod -l component=speaker -o name); do
  echo "=== $p ==="
  kubectl -n metallb-system logs --tail=200 "$p" \
    | grep -E "serviceAnnounced|serviceWithdrawn|notOwner"
done
```
## Egress to the Postgres VM
This is the failure mode that bit me on 2026-05-03. The flow:

- Pod on `k8cluster1` (10.244.216.x) opens a TCP connection to `192.168.1.123:5432`.
- Calico's `cali-nat-outgoing` rule SNATs the source to the node IP `192.168.1.90` (see the check after this list).
- Postgres sees the source as `192.168.1.90` and consults `pg_hba.conf`.
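To watch the rewrite happen, on one of the nodes (needs root; `conntrack` may need to be installed):

```bash
# Calico's masquerade rule in the nat table
sudo iptables -t nat -S | grep -i cali-nat-outgoing

# Live connections to Postgres: the SNAT'd entries show the node IP, not the pod IP
sudo conntrack -L 2>/dev/null | grep "dport=5432"
```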
If `pg_hba.conf` only allows the pod CIDR (10.244.0.0/16), traffic from the cluster will be rejected, because by the time it reaches Postgres the source has already been rewritten. The fix is to allow the node IPs (.89, .90, .91) explicitly, or the whole LAN, or to disable masquerading for that destination in Calico (`natOutgoing: false` on a matching `IPPool`, which is heavier).
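A sketch of the explicit node-IP entries in `pg_hba.conf`; the database, user, and auth-method columns are placeholders for whatever the VM actually uses:

```
# Allow the three Kubernetes node IPs explicitly
host    all    all    192.168.1.89/32    scram-sha-256
host    all    all    192.168.1.90/32    scram-sha-256
host    all    all    192.168.1.91/32    scram-sha-256
```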
See the pg_hba runbook.