Commit graph

4 commits

19029a5a77 chore: restructure repo — flatten App/pelagia-portal to App, rename Prototype→Wireframe and Spec→Design
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-18 23:18:58 +05:30
340a3dcce0 feat(gst-service): structured logging, request tracing, and per-session captcha refresh
Logging (GstService):
- JSON-structured log lines: { ts, level, msg, ...ctx }, one object per
  line, machine-parseable by any log aggregator (Datadog, Loki, etc.)
- LOG_LEVEL env var (DEBUG|INFO|WARN|ERROR, default INFO) — set DEBUG to
  see every captcha fetch, raw GST response body, and page console event
- WARN and ERROR lines go to stderr; INFO/DEBUG go to stdout so process
  supervisors can separate them
- Every log line carries relevant context: reqId, sessionId, gstin, ms, etc.
- errCtx() helper extracts errName, errMsg, and the first 6 stack frames
  from any thrown value, replacing bare String(e)
- elapsed() helper records wall-clock ms for every expensive step:
  browser launch, page navigation, captcha fetch, GST API call
- Request/response middleware: every HTTP request logs method, path,
  reqId, status, and duration; status >= 500 logs at ERROR, >= 400 at WARN
- Playwright page listeners: console errors/warnings, pageerror,
  requestfailed, and HTTP 4xx/5xx on GST portal endpoints
- process.on('uncaughtException') and process.on('unhandledRejection') so
  unexpected crashes surface in logs instead of dying silently
- Browser "disconnected" event is logged and _browser is reset, so the next
  request relaunches the browser automatically without a manual restart
- SESSION_TTL_MS configurable via env (default 3 min)
- closeSession() logs the reason (success / errorCode / exception / etc.)
- GET /health now returns browserConnected, per-session captchaCount,
  expiresInMs, and lastUsedMsAgo for operational visibility
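A minimal TypeScript sketch of the logging helpers listed above. The names errCtx, elapsed, and LOG_LEVEL come from the commit message; their exact signatures, plus createLogger, levelForStatus, and the injectable stdout/stderr sinks, are assumptions made so the behaviour can be exercised without a running service.

```typescript
// Hypothetical sketch: only errCtx, elapsed and LOG_LEVEL are named in the commit.
type Level = "DEBUG" | "INFO" | "WARN" | "ERROR";
type Ctx = Record<string, unknown>;
type Sink = (line: string) => void;

const ORDER: Record<Level, number> = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

// Wall-clock milliseconds since a Date.now() timestamp.
const elapsed = (startMs: number): number => Date.now() - startMs;

// Normalise any thrown value; keep at most the first 6 stack frames.
function errCtx(e: unknown): Ctx {
  if (e instanceof Error) {
    const frames = (e.stack ?? "").split("\n").slice(1, 7).map((f) => f.trim());
    return { errName: e.name, errMsg: e.message, stack: frames };
  }
  return { errName: typeof e, errMsg: String(e) };
}

// HTTP status -> level used by the request/response middleware.
function levelForStatus(status: number): Level {
  if (status >= 500) return "ERROR";
  if (status >= 400) return "WARN";
  return "INFO";
}

// env is injected (pass process.env in the service) so LOG_LEVEL is testable.
// By default WARN/ERROR go to stderr (console.error), DEBUG/INFO to stdout.
function createLogger(
  env: Record<string, string | undefined>,
  out: Sink = (line) => console.log(line),
  err: Sink = (line) => console.error(line),
) {
  const raw = (env.LOG_LEVEL ?? "INFO").toUpperCase();
  const min: Level = Object.prototype.hasOwnProperty.call(ORDER, raw) ? (raw as Level) : "INFO";

  const emit = (level: Level, msg: string, ctx: Ctx = {}): void => {
    if (ORDER[level] < ORDER[min]) return; // below LOG_LEVEL: drop the line
    const line = JSON.stringify({ ts: new Date().toISOString(), level, msg, ...ctx });
    (ORDER[level] >= ORDER.WARN ? err : out)(line);
  };

  return {
    debug: (msg: string, ctx?: Ctx) => emit("DEBUG", msg, ctx),
    info: (msg: string, ctx?: Ctx) => emit("INFO", msg, ctx),
    warn: (msg: string, ctx?: Ctx) => emit("WARN", msg, ctx),
    error: (msg: string, ctx?: Ctx) => emit("ERROR", msg, ctx),
  };
}
```

In the service this would be used as, for example, `const t0 = Date.now()` before a captcha fetch and `log.info("captcha fetched", { sessionId, ms: elapsed(t0) })` after it, with `log.error("search failed", { reqId, ...errCtx(e) })` in catch blocks.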

Multiple captchas per session:
- Session now holds captchas: CaptchaEntry[] (ordered oldest→newest)
  so every image fetched in a session is kept for traceability
- GET /captcha/:sessionId — new endpoint that calls /services/captcha
  again within the SAME browser context (no page reload, ~200ms vs ~5s)
  and appends a new CaptchaEntry; resets TTL; returns totalCaptchas
- POST /search no longer closes the session on SWEB_9034 (wrong captcha);
  it returns { canRefresh: true, sessionId } so the caller can hit
  GET /captcha/:sessionId for a fresh image and retry immediately
- All other error paths (SWEB_9000, network error, no data) still close
  the session as before
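The session rules above can be sketched as a small in-memory store. This is an illustration under assumptions: Session and CaptchaEntry follow the commit's names, but their fields, SessionStore, SearchOutcome, and keepSessionAfter are hypothetical, and the Playwright BrowserContext the real session holds is left out so the sketch runs standalone.

```typescript
// Hypothetical shapes; the real Session also owns a Playwright context/page.
interface CaptchaEntry { fetchedAt: number; captchaBase64: string; }
interface Session { id: string; captchas: CaptchaEntry[]; lastUsed: number; expiresAt: number; }
interface HealthRow { sessionId: string; captchaCount: number; expiresInMs: number; lastUsedMsAgo: number; }

class SessionStore {
  private sessions = new Map<string, Session>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs; // e.g. SESSION_TTL_MS, 3 min by default
    this.now = now;     // injectable clock for testing
  }

  // GET /captcha: a new session starts with its first captcha.
  create(id: string, captchaBase64: string): Session {
    const t = this.now();
    const s: Session = { id, captchas: [{ fetchedAt: t, captchaBase64 }], lastUsed: t, expiresAt: t + this.ttlMs };
    this.sessions.set(id, s);
    return s;
  }

  // Unknown or expired sessions return undefined; expired ones are dropped.
  get(id: string): Session | undefined {
    const s = this.sessions.get(id);
    if (s && this.now() >= s.expiresAt) {
      this.sessions.delete(id);
      return undefined;
    }
    return s;
  }

  // GET /captcha/:sessionId: append the newest image (oldest->newest order),
  // reset the TTL, and return totalCaptchas.
  addCaptcha(id: string, captchaBase64: string): number | undefined {
    const s = this.get(id);
    if (!s) return undefined;
    const t = this.now();
    s.captchas.push({ fetchedAt: t, captchaBase64 });
    s.lastUsed = t;
    s.expiresAt = t + this.ttlMs;
    return s.captchas.length;
  }

  close(id: string): void {
    this.sessions.delete(id);
  }

  // Per-session fields of the kind reported by GET /health.
  health(): HealthRow[] {
    const t = this.now();
    const rows: HealthRow[] = [];
    this.sessions.forEach((s) => {
      if (t < s.expiresAt) {
        rows.push({ sessionId: s.id, captchaCount: s.captchas.length, expiresInMs: s.expiresAt - t, lastUsedMsAgo: t - s.lastUsed });
      }
    });
    return rows;
  }
}

type SearchOutcome =
  | { kind: "success" }
  | { kind: "errorCode"; code: string } // e.g. "SWEB_9034", "SWEB_9000"
  | { kind: "exception" };

// Only a wrong captcha keeps the session open so the caller can refresh it.
function keepSessionAfter(o: SearchOutcome): boolean {
  return o.kind === "errorCode" && o.code === "SWEB_9034";
}
```

Injecting the clock keeps TTL behaviour deterministic in tests; expired sessions are pruned lazily on access rather than by a timer, which is one choice among several.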

Next.js proxy (app/api/gst/captcha/route.ts):
- GET /api/gst/captcha?refresh=<sessionId> proxies to the new
  GET /captcha/:sessionId endpoint on GstService
- Plain GET /api/gst/captcha still creates a new session as before
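A sketch of how the proxy might map its query string onto the two GstService endpoints. The paths and the refresh parameter are from the commit; upstreamCaptchaUrl and makeGet are hypothetical helpers, and the handler only uses the standard Request/Response/fetch APIs rather than anything Next.js-specific.

```typescript
// Hypothetical helper: maps the incoming proxy URL to the GstService URL.
function upstreamCaptchaUrl(requestUrl: string, serviceUrl: string): string {
  const refresh = new URL(requestUrl).searchParams.get("refresh");
  const base = serviceUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return refresh
    ? `${base}/captcha/${encodeURIComponent(refresh)}` // refresh an existing session
    : `${base}/captcha`; // no refresh param: create a new session
}

// Builds a route GET handler that forwards upstream status and body unchanged.
function makeGet(serviceUrl: string) {
  return async function GET(req: Request): Promise<Response> {
    // no-store: every call must reach GstService (no cached captcha/session).
    const upstream = await fetch(upstreamCaptchaUrl(req.url, serviceUrl), { cache: "no-store" });
    return new Response(await upstream.text(), {
      status: upstream.status,
      headers: { "content-type": upstream.headers.get("content-type") ?? "application/json" },
    });
  };
}
```

In app/api/gst/captcha/route.ts this could be wired up as `export const GET = makeGet(process.env.GST_SERVICE_URL ?? "http://localhost:3003")`, with the fallback matching the port fixed in the earlier commit.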

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-16 16:44:22 +05:30
2e6678f829 fix(gst): correct microservice default port and captcha field name
Default port changed 3002 → 3003 in the GstService and both proxy
routes.  The vendor-form was reading `captchaB64` from the API
response but the GstService returns `captchaBase64`, so the CAPTCHA
image was never displayed.
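One way to keep the form and the service from drifting apart again is a shared response type plus a runtime guard. This is a hypothetical sketch: CaptchaResponse, isCaptchaResponse, and captchaImageSrc are illustrative names, and it assumes captchaBase64 is bare base64 PNG data without a data: prefix.

```typescript
// Hypothetical shared type for the GstService GET /captcha response.
interface CaptchaResponse {
  sessionId: string;
  captchaBase64: string; // the field the vendor form must read (not captchaB64)
}

// Runtime check so a renamed or missing field fails loudly instead of
// silently rendering no image.
function isCaptchaResponse(x: unknown): x is CaptchaResponse {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return typeof o.sessionId === "string" && typeof o.captchaBase64 === "string";
}

// Assumes bare base64 PNG data; builds an <img src> value.
function captchaImageSrc(res: CaptchaResponse): string {
  return `data:image/png;base64,${res.captchaBase64}`;
}
```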

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 17:07:31 +05:30
f372fae953 feat(gst): replace API-key lookup with Playwright microservice
Problem: the GST portal's public taxpayer search
(services.gst.gov.in/searchtp) now requires human CAPTCHA verification,
though no login.
The BIG-IP WAF blocks direct Node.js HTTP clients via TLS
fingerprinting; Playwright (real Chromium) bypasses it successfully.
Confirmed working: GSTIN 27AAHCP5787B1Z6 → full PELAGIA MARINE
SERVICES data including address, jurisdiction, filing status.

GstService/ (new standalone microservice):
- src/index.ts: Express + Playwright singleton browser
  GET  /health  → { ok: true }
  GET  /captcha → launches browser, loads GST portal, fetches
                  CAPTCHA image from same origin (sets CaptchaCookie),
                  stores BrowserContext in session map (3 min TTL)
                  → { sessionId, captchaBase64 }
  POST /search  → { sessionId, gstin, captcha } → submits form
                  via page.evaluate fetch() using live browser session,
                  closes context, returns parsed taxpayer data
- package.json and tsconfig.json (dependencies installed with npm install)
- src/test-lookup.ts: interactive CLI test (prompts the user for the captcha)
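POST /search takes { sessionId, gstin, captcha }. A cheap pre-check before spending a browser round-trip might look like the sketch below. The commit does not say GstService validates input, so parseSearchBody and both regexes are assumptions: the GSTIN pattern checks only the 15-character layout (2-digit state code, 10-character PAN, entity code, literal Z, check character) and does not verify the checksum, and the 6-digit captcha rule is taken from the vendor-form description.

```typescript
interface SearchRequest { sessionId: string; gstin: string; captcha: string; }

// Hypothetical format checks; the GSTIN checksum itself is not verified.
const GSTIN_RE = /^[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]$/;
const CAPTCHA_RE = /^[0-9]{6}$/;

type ParseResult = { ok: true; value: SearchRequest } | { ok: false; error: string };

function parseSearchBody(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) return { ok: false, error: "body must be a JSON object" };
  const b = body as Record<string, unknown>;
  const sessionId = typeof b.sessionId === "string" ? b.sessionId : "";
  const gstin = typeof b.gstin === "string" ? b.gstin.trim().toUpperCase() : "";
  const captcha = typeof b.captcha === "string" ? b.captcha.trim() : "";
  if (!sessionId) return { ok: false, error: "sessionId is required" };
  if (!GSTIN_RE.test(gstin)) return { ok: false, error: "gstin is not a 15-character GSTIN" };
  if (!CAPTCHA_RE.test(captcha)) return { ok: false, error: "captcha must be 6 digits" };
  return { ok: true, value: { sessionId, gstin, captcha } };
}
```

Rejecting malformed input here returns a 400 in milliseconds instead of consuming the session's live browser context on a request the portal would refuse anyway.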

App changes:
- Remove playwright dep from Next.js app (was incorrectly added)
- Remove lib/gst-lookup.ts (sandbox.co.in placeholder — unused)
- Remove lib/gst-browser.ts (Playwright singleton — moved to service)
- app/api/gst/captcha/route.ts: thin proxy → GST_SERVICE_URL/captcha
- app/api/gst/route.ts: thin proxy POST → GST_SERVICE_URL/search
- vendor-form.tsx: two-step captcha UI
    Step 1: "Look up" → calls /api/gst/captcha → shows PNG inline
    Step 2: user types 6 digits → "Verify" → calls /api/gst → fills
            form (name, address, lat/lng from Nominatim geocoding)
    Wrong captcha → SWEB_9034 error with retry option
- .env.example: GST_SERVICE_URL=http://localhost:3002
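The two-step UI flow can be sketched independently of React. The endpoints and SWEB_9034 come from the commit; lookUp, verify, FetchLike, and the errorCode field in the /api/gst response are assumptions. fetch is injected so the flow can be tested with a stub, and the browser's global fetch satisfies the FetchLike type.

```typescript
// Minimal fetch shape; the global fetch is assignable to it.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface CaptchaStep { sessionId: string; imageSrc: string; }

type VerifyResult =
  | { status: "ok"; data: unknown }
  | { status: "retry" } // wrong captcha: offer the user another attempt
  | { status: "failed"; code?: string };

// Step 1, "Look up": start a session and get the CAPTCHA PNG to show inline.
async function lookUp(fetchFn: FetchLike): Promise<CaptchaStep> {
  const res = await fetchFn("/api/gst/captcha");
  const data = (await res.json()) as { sessionId: string; captchaBase64: string };
  return { sessionId: data.sessionId, imageSrc: `data:image/png;base64,${data.captchaBase64}` };
}

// Step 2, "Verify": submit the typed digits; errorCode is an assumed field name.
async function verify(fetchFn: FetchLike, sessionId: string, gstin: string, captcha: string): Promise<VerifyResult> {
  const res = await fetchFn("/api/gst", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sessionId, gstin, captcha }),
  });
  const data = (await res.json()) as { errorCode?: string };
  if (data.errorCode === "SWEB_9034") return { status: "retry" };
  if (!res.ok || data.errorCode) return { status: "failed", code: data.errorCode };
  return { status: "ok", data };
}
```

On "retry" the form would call lookUp again for a fresh image; on "ok" it fills name and address from the returned data before geocoding.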

Start the microservice: cd GstService && npm run dev

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 13:27:15 +05:30