# Automated issue-to-deploy pipeline

End-to-end flow from a user clicking **Report Issue** in the portal to a fix
running in production:

```
Portal header (bug icon)                          [App/components/layout/report-issue-button.tsx]
        │  server action → Forgejo API
        ▼
Forgejo issue  (label: portal)                    [git.pelagiamarine.com/shad0w/pelagia-portal]
        │  polled every 10 min by cron on pms1 (claude-issue-watcher.sh)
        ▼
TRIAGE  (watcher phase 1)                          [pms1, headless Claude Code, analysis only]
        │  Claude reads the issue + repo, posts a requirements-breakdown comment,
        │  and routes it: adds `claude-queue` (auto-fixable) or `interactive` (human)
        ▼
FIX  (watcher phase 2, only for claude-queue)      [headless Claude Code in ~/pelagia-autofix]
        │  Claude implements + verifies fix; watcher pushes branch claude/issue-N
        │  and opens a PR (label: claude-pr)
        ▼
Human review: merge the PR, then create a release tag vX.Y.Z
        │  tag push triggers .forgejo/workflows/deploy.yml
        ▼
forgejo-runner on pms1 (pm2: forgejo-runner, label "host")
        │  checks out the tag in ~/pms, pnpm install + build + prisma migrate deploy
        ▼
pm2 restart ppms  →  live at pms.pelagiamarine.com
```

Issues routed to `interactive` stop after triage and wait for a human to pick them
up in a steered Claude session. The triage breakdown is posted as a plain comment
(no bot marker), so for `claude-queue` issues the fix stage reads it back as refined
requirements.

## Contribution policy (all changes via PR)

**Every change lands through a pull request — no direct pushes to `master`.** This applies
to humans and to the automated pipeline alike (the watcher already opens PRs).

Each PR must include:

- **Tests** for any code change. Model: the integration test on `claude/issue-12` —
  it targets the prod-mirror test DB, anchors on existing rows, inserts fixtures via
  raw SQL (schema-tolerant), isolates them with a unique prefix, and cleans up in
  `afterEach`. Docs/config/automation-only PRs are exempt.
- **Docs** updates where relevant (`App/README.md`, `App/CLAUDE.md`, `Docs/`,
  this file, `CHANGELOG.md`).

**Enforcement** — [`.forgejo/workflows/pr-checks.yml`](../.forgejo/workflows/pr-checks.yml)
runs on every PR into `master`:

1. **Test-presence gate:** a PR that touches `App/app`, `App/lib`, `App/components`, or
   `App/hooks` without changing any test fails. Justify genuine exceptions in the PR body
   so a reviewer can override.
2. **Type-check:** `pnpm type-check` must be clean across the whole project (tests
   included). The test suite's old type baseline was repaired when this gate landed.
3. **Unit tests:** `pnpm test` must pass.

All three are **hard** gates. `pnpm lint` is intentionally not run: it currently
requires an interactive ESLint migration, which is left as a follow-up. Integration
tests are type-checked here but executed against the `pelagia_test` DB by the autofix
stage or locally, not in this shared CI, to avoid prod-mirror schema drift.
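
The test-presence gate can be pictured as a filter over the list of changed files. The following is a minimal bash sketch, not the contents of `pr-checks.yml`: the function name `test_presence_gate` is hypothetical, and the patterns used to recognise a test file (`.test.`, `.spec.`, `__tests__/`, `tests/`) are assumptions about the repo's conventions.

```shell
# Hypothetical sketch of the test-presence gate; the real rule lives in pr-checks.yml.
# Reads changed file paths (one per line) on stdin.
# Returns 0 when the gate passes, 1 when code changed without any test change.
test_presence_gate() {
  local code_changed=0 test_changed=0 path
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      # Assumed test-file conventions (not confirmed by the source).
      *.test.*|*.spec.*|*/__tests__/*|*/tests/*) test_changed=1 ;;
      # The four gated source directories named in the policy.
      App/app/*|App/lib/*|App/components/*|App/hooks/*) code_changed=1 ;;
    esac
  done
  if [ "$code_changed" -eq 1 ] && [ "$test_changed" -eq 0 ]; then
    echo "test-presence gate: code changed but no test was added or updated" >&2
    return 1
  fi
  return 0
}
```

Under these assumptions it could be fed from `git diff --name-only origin/master...HEAD | test_presence_gate`. Test patterns are matched first so that a test file living under `App/lib/` counts as a test rather than as gated code.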

A [`PULL_REQUEST_TEMPLATE.md`](../.forgejo/PULL_REQUEST_TEMPLATE.md) carries the checklist.

## Components

| Piece | Where | Notes |
|---|---|---|
| Report Issue button | `App/components/layout/report-issue-button.tsx` + `report-issue-actions.ts` | Any signed-in user; files issue with only the `portal` label (triage routes it) |
| Forgejo helper | `App/lib/forgejo.ts` | Needs `FORGEJO_URL`, `FORGEJO_REPO`, `FORGEJO_TOKEN` env (token scope: `write:issue`) |
| Issue watcher (active) | `automation/claude-issue-watcher.sh` on pms1 | Bash port; runs 24/7 via cron. Config + logs under `~/issue-watcher/` |
| Issue watcher (Windows, disabled) | `automation/claude-issue-watcher.ps1` | PowerShell original. `PelagiaClaudeIssueWatcher` task is **disabled** (pms1 is the sole worker; two pollers would race) |
| Deploy workflow | `.forgejo/workflows/deploy.yml` | Triggers on `v*` tags; runs on the `host` runner |
| Runner | pms1 `~/forgejo-runner`, pm2 process `forgejo-runner` | Registered as `pms1-host` with labels `host`, `docker` |

## Where the watcher runs (pms1)

The watcher runs on **pms1** under cron (every 10 min), polling Forgejo over the
local loopback (`http://127.0.0.1:3001`).

- Script: `~/issue-watcher/claude-issue-watcher.sh` (source: `automation/claude-issue-watcher.sh`)
- Config: `~/issue-watcher/watcher.config.json` (gitignored; holds the token + `claudeExe` = the nvm `claude` path)
- Work clone: `~/pelagia-autofix` (separate from the deployed `~/pms`)
- Logs: `~/issue-watcher/logs/` (`watcher-<date>.log`, per-issue `claude-*.log`, `cron.log`)
- Crontab: `*/10 * * * * PATH=<nvm bin>:... ~/issue-watcher/claude-issue-watcher.sh >> ~/issue-watcher/logs/cron.log 2>&1`

**Auth:** Claude Code must be signed in on pms1 (`ssh` in, run `claude`, complete
the login → writes `~/.claude/.credentials.json`). The watcher has a preflight that
no-ops until those credentials exist, so cron can be enabled before sign-in and
activates automatically once signed in. (An `ANTHROPIC_API_KEY` env var also satisfies it.)
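
The preflight described above amounts to a two-way check. This is a minimal bash sketch of that idea, not the code in `claude-issue-watcher.sh`; the function name `claude_auth_ready` is hypothetical, and requiring the credentials file to be non-empty is an added assumption.

```shell
# Hypothetical sketch of the credential preflight; the real check is in the watcher script.
# Returns 0 when Claude Code should be able to authenticate, non-zero otherwise.
claude_auth_ready() {
  local creds="${HOME}/.claude/.credentials.json"
  # An API key in the environment is enough on its own.
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    return 0
  fi
  # Otherwise require the file written by the interactive login.
  # -s (exists and is non-empty) is an assumption; the source only says "exist".
  [ -s "$creds" ]
}
```

A watcher could call it first and exit quietly when it fails, for example `claude_auth_ready || { echo "not signed in, skipping run"; exit 0; }`, which is what lets cron be enabled before anyone has signed in.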

The Windows variant (`.ps1` + `register-watcher-task.ps1`) is the portable fallback;
re-enable its task only if pms1 is unavailable, and disable one before enabling the other.

## Test database (for autofix verification)

So the fix stage can verify against realistic data without touching production:

- **`pelagia_test`** — a PostgreSQL database on pms1, owned by `pelagia_user`, that is
  a **daily mirror of production** (`pelagia`). Created once as superuser; refreshed by
  `automation/refresh-test-db.sh` via cron at **03:30** (`pg_dump pelagia | psql pelagia_test`).
- The autofix clone's `~/pelagia-autofix/App/.env` points `DATABASE_URL` at `pelagia_test`
  and runs in **safe dev mode** — no Resend/SSO secrets, so email is console-logged and
  storage is local. `NEXTAUTH_URL`/`PORT` are set to **3100** (production app is on 3000).
- The fix prompt tells Claude it may run integration tests against this DB
  (`set -a; . ./.env; set +a; pnpm test:integration`) and may start a dev server on
  **port 3100 only**, stopping it by port (`fuser -k 3100/tcp`) — never a broad `pkill next`,
  which would take down production (it also runs a `next-server`).

Because the test DB is refreshed daily, anything the autofix writes to it (test data,
schema experiments) is disposable. Schema-migration issues are routed to `interactive`
by triage, so the unattended fixer should not be altering the schema anyway.
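
The "stop by port, never by name" rule can be made harder to get wrong with a small guard. The sketch below is illustrative and not part of the repo: `stop_dev_server` and `AUTOFIX_PORT` are hypothetical names, and it assumes `fuser` (from psmisc) is installed on pms1, as the fix prompt implies.

```shell
# Hypothetical guard around the "stop by port" rule; not part of the repo.
AUTOFIX_PORT=3100   # the only port the autofix dev server may use

stop_dev_server() {
  local port="${1:-$AUTOFIX_PORT}"
  if [ "$port" != "$AUTOFIX_PORT" ]; then
    echo "refusing to kill port $port: only $AUTOFIX_PORT belongs to the autofix" >&2
    return 2
  fi
  # fuser -k kills only processes bound to this TCP port, so production on 3000
  # and staging on 3200 are left alone. It exits non-zero when nothing is
  # listening, which is not an error here.
  fuser -k "${port}/tcp" >/dev/null 2>&1 || true
}
```

Calling `stop_dev_server` with no argument stops whatever listens on 3100; any other port is rejected before `fuser` runs.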

## Staging (smoke test before deploy)

`automation/staging-up.sh` (deployed to `~/issue-watcher/` on pms1) brings up a
**staging instance of the latest `master`** so changes can be clicked through
before a release tag deploys them to prod.

- Checkout: `~/pelagia-staging` (separate from `~/pms` and `~/pelagia-autofix`)
- Process: pm2 `ppms-staging` on **port 3200**, against the prod-mirror test DB
  (`pelagia_test`), safe dev mode (console email, local storage, SSO disabled).
- Refresh to newer master + restart: re-run `~/issue-watcher/staging-up.sh`.
- Stop: `pm2 delete ppms-staging`.
- **Access is SSH-tunnel only** — the dev server binds to `127.0.0.1:3200`, so it is
  not reachable from the public internet. Open a tunnel and browse `http://localhost:3200`:
  `ssh -L 3200:localhost:3200 shad0w@<pms1>`. On Windows, the desktop shortcut
  **"Pelagia Staging (tunnel)"** (`automation/staging-tunnel.cmd`) opens the tunnel and
  the browser in one click.
- A fixed banner **"INTERNAL DEV / STAGING - NOT PRODUCTION"** is shown (driven by
  `NEXT_PUBLIC_ENV_LABEL` in the staging `.env`; the `EnvBanner` component renders nothing
  when the var is unset, so production is unaffected).
- Log in with a password user (SSO is off here), e.g. `admin@pelagiamarine.com`.

## Issue label lifecycle

```
portal ──(triage)──▶ triaged + claude-queue ─▶ claude-working ─▶ claude-pr | claude-failed
              └─────▶ triaged + interactive   (stops here — handle with Claude interactively)
```

- **Triage owns routing for every `portal` issue.** Each untriaged portal issue is
  triaged once (`maxTriagePerRun` per run); triage adds `triaged`, a routing label
  (`claude-queue` or `interactive`), a type label (`bug` or `feature`), and posts a
  breakdown. Triage skips an issue only once it carries `triaged`, `interactive`,
  `claude-working`, `claude-pr`, or `claude-failed`.
- **`claude-queue` alone does NOT skip triage on a portal issue.** The Report Issue
  button may stamp `claude-queue` at creation; triage still claims the issue and
  decides routing (stripping the stray `claude-queue` if it routes to `interactive`).
  This is why triage works even if an older button build is deployed.
- `claude-queue` → `claude-working` → `claude-pr` (PR opened) or `claude-failed`.
- To retry a failed issue, re-add `claude-queue` (and remove `claude-failed`).
- To queue a **non-portal** issue for Claude (skipping triage), add `claude-queue`
  directly — triage never claims issues without the `portal` label.
- To force a portal issue straight to fix, add `triaged` + `claude-queue` yourself.
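
The claim rule in the bullets above can be summarised as a predicate over an issue's labels. This is a minimal bash sketch of that rule as described here, not the watcher's actual code; `should_triage` is a hypothetical name and the comma-separated label argument is an assumed input format.

```shell
# Hypothetical sketch of the triage claim rule; the real logic is in the watcher script.
# Usage: should_triage "label1,label2,..."   (returns 0 when triage claims the issue)
should_triage() {
  local labels=",$1,"   # wrap in commas so each label can be matched as ,name,
  local done_label
  # Triage never claims issues without the portal label.
  case "$labels" in
    *,portal,*) ;;
    *) return 1 ;;
  esac
  # Any of these labels means the issue is already past triage.
  for done_label in triaged interactive claude-working claude-pr claude-failed; do
    case "$labels" in
      *,"$done_label",*) return 1 ;;
    esac
  done
  # claude-queue is deliberately absent from the list above: on its own it
  # does not stop triage from claiming a portal issue.
  return 0
}
```

For example, `should_triage "portal,claude-queue"` succeeds (triage still claims it), while `should_triage "portal,triaged,claude-queue"` and `should_triage "claude-queue"` both fail, matching the forced-fix and non-portal cases.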

## Releasing

After merging a Claude PR (or any change) on `master`:

```powershell
git pull
git tag v0.2.0          # semver: bump patch for fixes, minor for features
git push pms1 master --tags
```

The runner deploys the tag and restarts the app. Watch progress under
**Actions** on the Forgejo repo, or `pm2 logs forgejo-runner` on pms1.
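
Choosing the next tag follows the semver comment in the snippet above: bump the patch number for fixes, the minor number for features. The helper below is a hypothetical bash sketch (usable in Git Bash or on pms1), not something shipped in the repo; `next_tag` is an illustrative name and it assumes tags always have the plain `vX.Y.Z` shape.

```shell
# Hypothetical helper, not part of the repo: compute the next release tag.
# Usage: next_tag v0.2.0 patch|minor
next_tag() {
  local version="${1#v}" kind="$2"
  local major minor patch rest
  # Split X.Y.Z with parameter expansion (assumes exactly three numeric parts).
  major="${version%%.*}"
  rest="${version#*.}"
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$kind" in
    patch) patch=$((patch + 1)) ;;             # fixes
    minor) minor=$((minor + 1)); patch=0 ;;    # features reset the patch number
    *) echo "usage: next_tag vX.Y.Z patch|minor" >&2; return 2 ;;
  esac
  echo "v${major}.${minor}.${patch}"
}
```

Combined with git it might look like `git tag "$(next_tag "$(git describe --tags --abbrev=0)" patch)"`, which tags the current commit one patch level above the most recent reachable tag.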

## Operational notes

- The watcher runs Claude Code with `--dangerously-skip-permissions` inside the
  dedicated `pelagia-autofix` clone — never point `workDir` at your main checkout.
- Windows fallback only: the scheduled task works issues only while that PC is on;
  queued issues are picked up on the next run after boot (`-StartWhenAvailable`).
  The pms1 cron watcher is not affected, since it runs 24/7.
- Tokens: `portal-report-issue` (write:issue, used by the app) and
  `claude-watcher` (write:issue + write:repository, used by the watcher).
  Both belong to the `shad0w` Forgejo account. Rotate via
  `docker exec -u 1000 forgejo forgejo admin user generate-access-token ...`.
- Server-side env for the button lives in `~/pms/App/.env` on pms1
  (`FORGEJO_URL=http://127.0.0.1:3001` so it does not depend on the tunnel).
- **Known Forgejo 10 bug:** clicking *Update branch* on a PR (or pushing to its
  head branch) can make the page show "This pull request is broken due to
  missing fork information" even though the PR is fine (API still reports
  `mergeable: true`). Fix: close and reopen the PR — via the UI, or:

  ```powershell
  $h = @{ Authorization = "token <claude-watcher token>" }
  Invoke-RestMethod -Method Patch -Headers $h -ContentType application/json `
    -Uri https://git.pelagiamarine.com/api/v1/repos/shad0w/pelagia-portal/pulls/<N> -Body '{"state":"closed"}'
  Invoke-RestMethod -Method Patch -Headers $h -ContentType application/json `
    -Uri https://git.pelagiamarine.com/api/v1/repos/shad0w/pelagia-portal/pulls/<N> -Body '{"state":"open"}'
  ```

  Fixed upstream in newer Gitea/Forgejo — resolves itself if Forgejo is upgraded past v10.
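
For use from pms1 or any other bash shell, the PowerShell close-and-reopen workaround above can be approximated with `curl`. This is a hedged sketch, not a script from the repo: `reopen_pr`, the `WATCHER_TOKEN` environment variable, and the `DRY_RUN` switch are hypothetical names; the endpoint and request bodies are the same ones the PowerShell snippet uses.

```shell
# Hypothetical bash equivalent of the PowerShell workaround; not part of the repo.
# Expects the claude-watcher token in WATCHER_TOKEN (an assumed variable name).
# Set DRY_RUN=1 to print the requests instead of sending them.
reopen_pr() {
  local pr="$1" state
  local url="https://git.pelagiamarine.com/api/v1/repos/shad0w/pelagia-portal/pulls/${pr}"
  if [ "${DRY_RUN:-0}" != "1" ] && [ -z "${WATCHER_TOKEN:-}" ]; then
    echo "reopen_pr: set WATCHER_TOKEN first" >&2
    return 2
  fi
  # Close first, then reopen; stop at the first failed request.
  for state in closed open; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "PATCH ${url} {\"state\":\"${state}\"}"
    else
      curl -fsS -X PATCH \
        -H "Authorization: token ${WATCHER_TOKEN}" \
        -H "Content-Type: application/json" \
        -d "{\"state\":\"${state}\"}" \
        "$url" >/dev/null || return 1
    fi
  done
}
```

Running `DRY_RUN=1 reopen_pr <N>` first shows the two PATCH requests that would be sent, which is a cheap way to confirm the PR number before touching the live repo.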