ADR 0002: Postgres on a dedicated VM, not in-cluster
Status: Accepted
Date: 2025-08
Context
A handful of in-cluster apps need a relational store: Grafana, the claude-bridge HITL service, the trading dashboard, and a few hobby projects. I want to keep stateful data outside the K8s blast radius and make backups, restore drills, and major-version upgrades boring.
Decision
A single dedicated VM running Postgres 16 (Debian 12, PGDG packages, 192.168.1.123). Each app gets its own role, database, and pg_hba.conf rule. Backups via pgBackRest to a TrueNAS dataset, with a periodic restore drill into a throwaway VM.
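The per-app layout above could be rendered along these lines. This is a minimal sketch, not the actual Ansible role: the `AppDb` type, the app name, and the client CIDR are hypothetical, and it assumes the role and database share the app's name and that clients authenticate with `scram-sha-256`.

```python
from dataclasses import dataclass
import ipaddress


@dataclass(frozen=True)
class AppDb:
    """One application's slice of the shared Postgres VM (hypothetical type)."""
    name: str         # assumption: role and database share this name
    client_cidr: str  # assumption: source range the app connects from


def _ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def provisioning_sql(app: AppDb) -> list[str]:
    """SQL statements that create a login role and a database owned by it."""
    ident = _ident(app.name)
    return [
        f"CREATE ROLE {ident} LOGIN;",  # password is set out of band
        f"CREATE DATABASE {ident} OWNER {ident};",
        f"REVOKE ALL ON DATABASE {ident} FROM PUBLIC;",
    ]


def hba_rule(app: AppDb) -> str:
    """pg_hba.conf line allowing only this role, on this database, from this range."""
    net = ipaddress.ip_network(app.client_cidr)  # raises ValueError on a bad CIDR
    return f"host {app.name} {app.name} {net} scram-sha-256"


if __name__ == "__main__":
    app = AppDb(name="grafana", client_cidr="10.42.0.0/16")  # hypothetical values
    print("\n".join(provisioning_sql(app)))
    print(hba_rule(app))
```

Keeping the SQL and the `pg_hba.conf` line derived from one record means an app cannot end up with a database but no matching access rule, or the reverse.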
Considered
- CloudNativePG (CNPG). Strong operator, real HA, in-cluster. Rejected for two reasons: I trust myself more with `pg_dump` and `pgBackRest` against a familiar VM than I trust myself to debug an in-cluster cluster at 2am, and CNPG cannot help if K8s itself is the problem (which it has been).
- Stolon, Zalando-postgres-operator. Same shape as CNPG, less momentum, more legwork for me.
- A managed Postgres (RDS, Supabase). Defeats the purpose of a homelab and adds an external dependency.
Consequences
- The data layer survives a cluster rebuild. This is the main win. I have rebuilt the K8s cluster more than once and no app data was at risk.
- `pg_hba.conf` becomes a load-bearing piece of network policy. When CNI behavior around SNAT changes, this file has to know. The 2026-05-03 incident is the obvious example.
- No automatic HA. A single Postgres VM means an OS-level failure on `192.168.1.123` takes the dependent apps down. Acceptable for a lab. If I ever care, the upgrade path is to add a synchronous replica on a second Proxmox node and front it with Patroni or pgpool. Not now.
- Ansible owns the box. All changes (`pg_hba.conf`, `postgresql.conf`, package versions) belong in `ansible-quasarlab/labctl-runs/postgres`. Manual edits during incidents are allowed but must be back-ported the same day.
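To illustrate why SNAT matters to `pg_hba.conf`: Postgres matches rules against the source address it actually sees, so whether the CNI masquerades pod traffic decides which rule, if any, applies. The sketch below is illustrative only and is not a reconstruction of the incident; the rule name and every address in it are hypothetical.

```python
import ipaddress


def matching_rules(rules: dict[str, str], source_ip: str) -> list[str]:
    """Return the names of the rules whose CIDR contains source_ip."""
    addr = ipaddress.ip_address(source_ip)
    return [
        name
        for name, cidr in rules.items()
        if addr in ipaddress.ip_network(cidr)
    ]


# Hypothetical rule written while pod traffic was masqueraded as the node's IP.
rules = {"grafana-via-node": "192.168.1.0/24"}

# SNAT on: Postgres sees a node address, so the rule matches.
print(matching_rules(rules, "192.168.1.50"))  # ['grafana-via-node']

# SNAT off: Postgres sees a pod address, nothing matches, and the connection is rejected.
print(matching_rules(rules, "10.42.3.17"))    # []
```

A check like this could run against the Ansible-managed rule list whenever the CNI configuration changes, before the apps find out the hard way.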