5 Deployment and Operations
shad0w edited this page 2026-06-24 10:20:47 +00:00

The app is self-hosted on a single Ubuntu server, pms1 — not a managed PaaS. Public traffic reaches it through a Pangolin/Traefik tunnel; the Next.js app, PostgreSQL, the Forgejo instance, and the CI runner all live on the same host.

            Internet (HTTPS, pms.pelagiamarine.com)
                         │
              ┌──────────▼───────────┐
              │  Pangolin / Traefik  │  reverse proxy + tunnel
              └──────────┬───────────┘
                         │
┌──────────────────────────────────────────────────────────┐
│ pms1 (Ubuntu)                                            │
│  ┌───────────────────────┐   ┌────────────────────────┐  │
│  │ Next.js (pm2: ppms)   │──▶│ PostgreSQL 16 (native, │  │
│  │ next start, :3000     │   │ localhost:5432, db     │  │
│  └─────┬─────────────────┘   │ `pelagia`)             │  │
│        │                     └────────────────────────┘  │
│        ├─▶ Cloudflare R2  (documents, prod)              │
│        └─▶ Resend         (email, prod)                  │
│  ┌────────────────────────────────────────────────────┐  │
│  │ Forgejo (Docker) + Actions runner (pm2)            │  │
│  │ issue→fix→PR→tag deploy   (see pipeline page)      │  │
│  │ also: pelagia_test (prod-mirror DB) + staging      │  │
│  └────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────┘
  • App process: pm2 process ppms running next start on port 3000.
  • Database: native PostgreSQL 16, localhost:5432, database pelagia.
  • Repo remotes: pms1git.pelagiamarine.com/shad0w/pelagia-portal.git; vgrgit.tunnel.pelagiamarine.com/... (SSH-tunnel variant).

Production environment

All production env vars must be set (auth, DB, R2, Resend, and optionally Forgejo/GST). Server-side env lives in ~/pms/App/.env. The full list and the dev/prod split are on the Environment Variables page.
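A missing key often surfaces only at runtime, so a quick check of ~/pms/App/.env before a release can save a failed deploy. The following is a minimal sketch, not part of the repo: check_env is a hypothetical helper, and the key names in the usage line are examples taken from this page rather than the complete required list.

```shell
# Hypothetical helper: report every key that is absent or empty in an env file.
# Returns non-zero if any key is missing.
check_env() {
  env_file=$1
  shift
  missing=0
  for key in "$@"; do
    # Requires a line that starts with KEY= followed by at least one character.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (key list is illustrative; see Environment Variables for the real one):
#   check_env ~/pms/App/.env NEXTAUTH_URL PDF_SERVICE_URL PDF_SERVICE_TOKEN
```

It only recognises plain KEY=value lines, so an `export KEY=...` form would be reported as missing.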

Microservices

Three standalone Express + Playwright services run alongside the app on pms1, each on a fixed local port, reached by the app over localhost:

Service      pm2 name      Port  Purpose                                    App env
GstService   gst-service   3003  GSTIN lookup for vendors                   GST_SERVICE_URL
EpfoService  epfo-service  3004  UAN / EPFO member lookup (crewing)         EPFO_SERVICE_URL, EPFO_LIVE
PdfService   pdf-service   3005  Renders a PO to PDF for “Email to vendor”  PDF_SERVICE_URL, PDF_SERVICE_TOKEN, APP_INTERNAL_URL
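After a release, a loop over the ports in the table is a quick way to confirm all three services are listening. This is a sketch under assumptions: probe_services is a hypothetical helper, and probing the root path `/` is a guess because the services' routes are not documented here. For that reason any HTTP status counts as "up", and only curl's 000 (no connection) counts as down.

```shell
# Hypothetical helper: probe each local microservice on its fixed port.
probe_services() {
  status=0
  for entry in gst-service:3003 epfo-service:3004 pdf-service:3005; do
    name=${entry%%:*}
    port=${entry##*:}
    # curl prints 000 as the status code when nothing accepts the connection.
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://127.0.0.1:${port}/")
    if [ "$code" = "000" ]; then
      echo "$name (:$port) DOWN"
      status=1
    else
      echo "$name (:$port) up (HTTP $code)"
    fi
  done
  return "$status"
}
```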

They are deployed automatically on each release by the root ecosystem.config.js together with .forgejo/workflows/deploy.yml. The deploy builds each service present in the checkout, exports the few keys they need from App/.env (e.g. PDF_SERVICE_TOKEN, EPFO_LIVE), then runs pm2 startOrReload ecosystem.config.js --update-env, which creates the processes on the first release and reloads them on later ones. Unset secrets fall back to harmless defaults: GST/EPFO stay stub-capable and PdfService skips token checks.
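The export-then-reload step can be sketched as follows. This is not the contents of deploy.yml: export_env_keys is a hypothetical helper, and it assumes App/.env holds simple single-line KEY=value entries.

```shell
# Hypothetical helper: export selected keys from a .env file into this shell.
export_env_keys() {
  env_file=$1
  shift
  for key in "$@"; do
    # Take the last definition of KEY and strip one layer of surrounding quotes.
    value=$(grep -E "^${key}=" "$env_file" | tail -n 1 | cut -d= -f2- \
      | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'$/\1/")
    # Keys that are unset stay unset, so the services use their defaults.
    [ -n "$value" ] && export "$key=$value"
  done
  return 0
}

# Usage on pms1:
#   cd ~/pms
#   export_env_keys App/.env PDF_SERVICE_TOKEN EPFO_LIVE
#   pm2 startOrReload ecosystem.config.js --update-env
```

Passing `--update-env` is what makes pm2 pick up the freshly exported values on a reload rather than keeping the environment from the first start.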

Email PO to vendor (PdfService). The feature is enabled by setting PDF_SERVICE_URL and PDF_SERVICE_TOKEN in ~/pms/App/.env (gated by isPdfServiceConfigured()). The same token must also reach pdf-service; the deploy exports it. PdfService renders the app's /api/po/<id>/export?…&svc=<token> page (the token lets it fetch the page without a user session), uploads the PDF to R2, and the app returns a mailto: link containing a 7-day presigned link to the PDF. APP_INTERNAL_URL is the URL PdfService uses to reach the app (defaults to NEXTAUTH_URL). The auth middleware lets the svc-token export request through (lib/pdf-export-auth.ts); without that exemption the unauthenticated render would be redirected to /login. The rendered PDF is cached per PO under a deterministic key, so repeat sends reuse the stored copy and only mint a fresh 7-day link; the PDF is re-rendered only when the PO has changed.
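The two configuration rules in that paragraph, the enable gate and the APP_INTERNAL_URL fallback, can be expressed as shell checks that are handy when debugging on pms1. These are illustrative helpers that mirror the described behaviour; they are not the app's isPdfServiceConfigured() implementation.

```shell
# Mirrors the described gate: the feature is on only when both values are non-empty.
pdf_service_configured() {
  [ -n "${PDF_SERVICE_URL:-}" ] && [ -n "${PDF_SERVICE_TOKEN:-}" ]
}

# Base URL PdfService uses to reach the app: APP_INTERNAL_URL, else NEXTAUTH_URL.
pdf_internal_base() {
  printf '%s\n' "${APP_INTERNAL_URL:-${NEXTAUTH_URL:-}}"
}
```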

Release & deploy flow

Deploys are gated on a human merging a PR and pushing a release tag.

git pull
git tag v0.2.0            # semver: patch for fixes, minor for features
git push pms1 master --tags
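Picking the next tag by hand is error-prone, so a small helper can apply the patch/minor convention. bump_semver is a hypothetical sketch that assumes plain vX.Y.Z tags with no pre-release suffix; major bumps are left manual.

```shell
# Hypothetical helper: compute the next release tag from the current one.
bump_semver() {
  current=${1#v}            # drop the leading "v"
  kind=$2
  major=${current%%.*}
  rest=${current#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  case $kind in
    patch) patch=$((patch + 1)) ;;
    minor) minor=$((minor + 1)); patch=0 ;;
    *) echo "usage: bump_semver vX.Y.Z patch|minor" >&2; return 2 ;;
  esac
  printf 'v%s.%s.%s\n' "$major" "$minor" "$patch"
}

# Usage:
#   latest=$(git describe --tags --abbrev=0)
#   git tag "$(bump_semver "$latest" patch)"
```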

Pushing a v* tag triggers .forgejo/workflows/deploy.yml on the host runner (pms1), which:

  1. loads nvm, checks out the tag into ~/pms (git checkout -f refs/tags/$TAG),
  2. cd App && pnpm install --frozen-lockfile,
  3. pnpm build (includes prisma generate),
  4. pnpm db:migrate:deploy (applies migrations),
  5. pm2 restart ppms --update-env,
  6. verifies GET http://127.0.0.1:3000/login returns HTTP 200.
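The six steps map onto a short shell function. This is a sketch, not the real workflow file: deploy_tag is hypothetical, the checkout directory is a parameter (defaulting to ~/pms), and the health-check retry count and delay are assumptions, since Next.js may need a few seconds after pm2 restart before /login answers. Every step stops the run on failure, so a failed migration never leads to a restart onto code that expects the new schema.

```shell
# Hypothetical sketch of the deploy steps for one tag.
deploy_tag() {
  tag=$1
  root=${2:-$HOME/pms}
  # 1. load nvm if present, then check out the tag
  if [ -s "${NVM_DIR:-$HOME/.nvm}/nvm.sh" ]; then
    . "${NVM_DIR:-$HOME/.nvm}/nvm.sh"
  fi
  cd "$root" || return 1
  git checkout -f "refs/tags/$tag" || return 1
  # 2-4. install, build (prisma generate only), then apply migrations
  cd App || return 1
  pnpm install --frozen-lockfile || return 1
  pnpm build || return 1
  pnpm db:migrate:deploy || return 1
  # 5. swap in the new build only after migrations succeeded
  pm2 restart ppms --update-env || return 1
  # 6. wait for /login to return 200 (10 tries, 3 s apart: assumed values)
  i=0
  while [ "$i" -lt 10 ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:3000/login)
    [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep 3
  done
  echo "health check failed: last HTTP $code" >&2
  return 1
}
```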

Watch progress under Actions in Forgejo, or follow the runner with pm2 logs forgejo-runner.

Migrations before traffic. pnpm build only runs prisma generate; it does not apply migrations. Serving new code whose Prisma client expects a not-yet-migrated column yields P2022 … column does not exist at runtime. The deploy workflow runs pnpm db:migrate:deploy for you (step 4); for manual deploys, run it yourself before (or together with) the pm2 restart that swaps in the new build. This was a real production incident; see Changelog.

Staging (smoke test before deploy)

automation/staging-up.sh brings up a staging instance of the latest master so changes can be clicked through before a release tag deploys them.

  • Checkout ~/pelagia-staging; pm2 process ppms-staging on port 3200.
  • Runs against the prod-mirror test DB (pelagia_test) in safe dev mode (console email, local storage, SSO disabled).
  • SSH-tunnel only — binds 127.0.0.1:3200, not publicly reachable: ssh -L 3200:localhost:3200 shad0w@<pms1> then browse http://localhost:3200. On Windows the "Pelagia Staging (tunnel)" desktop shortcut (automation/staging-tunnel.cmd) opens tunnel + browser in one click.
  • Shows the "INTERNAL DEV / STAGING - NOT PRODUCTION" banner via NEXT_PUBLIC_ENV_LABEL (the EnvBanner component renders nothing when unset).
  • Log in with a password user (SSO off), e.g. admin@pelagiamarine.com.
  • Refresh to newer master + restart: re-run the script. Stop: pm2 delete ppms-staging.
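To avoid holding a terminal open for the tunnel, ssh can run it in the background. A minimal sketch: staging_tunnel is a hypothetical helper, and HOST stands in for the <pms1> address, which this page leaves unspecified. `-N` runs no remote command, `-f` backgrounds ssh after login, and `ExitOnForwardFailure=yes` makes it fail instead of backgrounding if port 3200 cannot be forwarded.

```shell
# Hypothetical helper: open the staging tunnel in the background.
staging_tunnel() {
  host=$1
  ssh -f -N -o ExitOnForwardFailure=yes -L 3200:localhost:3200 "shad0w@${host}" || return 1
  echo "staging: http://localhost:3200"
}
```

The background ssh keeps running after the shell exits; stop it with kill on its PID when done.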

Test database (pelagia_test)

A PostgreSQL DB on pms1 that is a daily mirror of production (pelagia), refreshed by automation/refresh-test-db.sh via cron at 03:30 (pg_dump pelagia | psql pelagia_test). Used by staging and by the automated fixer for realistic verification. Because it is refreshed daily, anything written to it is disposable. Never assume an empty DB — it holds prod-like data.
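One way to implement the refresh is sketched below; it is a variant of the documented pipe, not the contents of refresh-test-db.sh. It dumps to a file first so a failed dump never leaves pelagia_test dropped, then recreates the target so the restore starts from an empty database. dropdb --force needs PostgreSQL 13 or later (pms1 runs 16) and disconnects staging's open sessions. The crontab path is an assumption.

```shell
# Hypothetical sketch: rebuild pelagia_test from a fresh dump of pelagia.
refresh_test_db() {
  dump=$(mktemp) || return 1
  if pg_dump --no-owner -f "$dump" pelagia &&
     dropdb --force --if-exists pelagia_test &&
     createdb pelagia_test &&
     psql -q -v ON_ERROR_STOP=1 -f "$dump" pelagia_test; then
    rc=0
  else
    rc=1
  fi
  rm -f "$dump"
  return "$rc"
}

# Example crontab entry for 03:30 daily (script location is an assumption):
# 30 3 * * * $HOME/pms/automation/refresh-test-db.sh >> $HOME/refresh-test-db.log 2>&1
```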

Operational notes

  • The automation fixer and staging run on ports 3100 and 3200 respectively. Never broad-kill (pkill next) on pms1, because production's next-server runs there too; stop a dev server by port instead (fuser -k 3100/tcp).
  • Forgejo tokens: portal-report-issue (write:issue, used by the app) and claude-watcher (write:issue + write:repository, used by the watcher).
  • Known Forgejo 10 bug: clicking Update branch on a PR can show "broken due to missing fork information" even when mergeable: true. Fix: close and reopen the PR (UI or API). Resolves on upgrade past v10.
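Two of these notes translate into small guarded helpers. Both are hypothetical sketches: stop_dev_port refuses production's port 3000 before calling fuser, and reopen_pr performs the close-and-reopen workaround through the Forgejo API's pull-request edit endpoint. FORGEJO_URL and FORGEJO_TOKEN are assumed variable names; a token with the claude-watcher scopes (write:issue + write:repository) should be sufficient.

```shell
# Hypothetical helper: stop a dev/fixer server by port, never production's.
stop_dev_port() {
  port=$1
  case $port in
    3000) echo "refusing: port 3000 is production (ppms)" >&2; return 1 ;;
    ''|*[!0-9]*) echo "usage: stop_dev_port PORT" >&2; return 2 ;;
  esac
  fuser -k "${port}/tcp"
}

# Hypothetical helper: close, then reopen, a PR to clear the stale
# "missing fork information" state.
reopen_pr() {
  owner=$1 repo=$2 index=$3
  api="${FORGEJO_URL:-https://pms1git.pelagiamarine.com}/api/v1/repos/${owner}/${repo}/pulls/${index}"
  for state in closed open; do
    curl -sf -X PATCH "$api" \
      -H "Authorization: token ${FORGEJO_TOKEN}" \
      -H 'Content-Type: application/json' \
      -d "{\"state\":\"${state}\"}" >/dev/null || return 1
  done
}
```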

See Issue-to-Deploy Pipeline for the automation, and automation/README.md for the full runbook.