
Admin UI Phase 1 — Design

Status: design / spec
Date: 2026-04-27
Author: Jim Holmes (with Claude)

1. Purpose and scope

The platform currently has no UI surface. Operations, support, and product queries against the system are answered by direct database queries or by reading service logs. This spec describes the first UI surface — an internal admin console — split into three sequential phases:

  • Phase 1 (this spec): scaffold + human auth + read-only status dashboard.
  • Phase 2 (separate spec, future): organisation / user CRUD.
  • Phase 3 (separate spec, future): workflow definition editor for the orchestrator service.

Phase 1 ships an internal-staff-only tool. Auth and tenancy primitives are designed so customer-organisation admins can be added later (Phase 4) without rebuilding the auth model.

The Phase 1 status dashboard covers:

  • Operational health — per-service up/down, BullMQ queue depths, Redis Stream lag, recent error counts.
  • Volume metrics — cases created today / week / month, broken down per organisation and per product.
  • AI review throughput — inferences run, success rate, average latency, recent failures with reasons.
  • Human review queue — open reviews, claimed but not submitted, average time-to-decision, decline counts.
  • Per-organisation drill-down — pick an org, see all of the above scoped to that org.

Explicitly out of Phase 1: time-series charts, recent-activity event stream, workflow instance metrics, write paths, MFA enforcement at the application level (delegated to the IDP).

2. Architecture overview

[Browser]
   │  cookie session
   ▼
[apps/admin-ui]            Vite + React + Mantine SPA
   │  (deployed as static files via S3 + CloudFront,
   │   same origin as admin-api)
   │  /api/* (same-origin)
   ▼
[services/admin-api]       NestJS BFF
   │  ├─ holds the platform JWT (server-side only)
   │  ├─ enforces admin scopes
   │  ├─ aggregates dashboard data fan-out
   │  ├─ emits its own audit log per admin action
   │  └─ Redis-backed session store
   │
   ├──→ services/auth                (Google OIDC callback, JWT issuance)
   ├──→ services/clinical-api        /v1/admin/stats   (NEW)
   ├──→ services/ai-review           /v1/admin/stats   (NEW)
   └──→ services/human-review        /v1/admin/stats   (NEW)

New surfaces

Component                    Purpose
apps/admin-ui/               First frontend in the repo (Vite + React + Mantine)
services/admin-api/          NestJS BFF; holds JWTs server-side, aggregates data
services/auth/ (extension)   One new endpoint: Google OIDC callback
/v1/admin/stats × 3          New endpoint on clinical-api, ai-review, human-review

Auth scopes (new)

Scope                 Purpose
admin:read            Dashboard read, status queries
admin:write           Reserved for Phase 2 (CRUD on orgs / users)
admin:cross-tenant    Already exists; admin-api requests it for SA staff sessions

Existing patterns reused

  • Service-to-service auth via SERVICE_AUTH_TOKEN (admin-api → backing services).
  • X-Actor-Context header carries the human user identity from the BFF down through the call chain.
  • @sa-platform/auth-client for JWT verification + scope guards.
  • AppConfigService env-validation pattern with production guard against dev default secrets.

3. Components in detail

3.1 apps/admin-ui/ — Vite + React + Mantine SPA

  • Build: Vite, TypeScript, Mantine UI primitives, Mantine Charts (recharts wrapper) for inline graphs.
  • Routing: React Router (/login, /, /orgs/:orgId for drill-down).
  • Data fetching: TanStack Query (React Query) for caching + revalidation, pointed only at /api/* (same origin).
  • State: TanStack Query is the only state surface; no Redux / Zustand for Phase 1.
  • Auth boundary: A <RequireSession> wrapper redirects to /login if /api/me returns 401.
  • Layout shell: Mantine AppShell with persistent left navigation (Dashboard | Organisations¹ | Users¹ | Workflows¹ | Settings¹). Items marked ¹ are stubs for Phase 2/3 and render "Coming soon" placeholders.
  • Tests: Vitest + React Testing Library for components; one Playwright smoke test that mocks /api/* and walks login → dashboard → drill-down → logout.

3.2 services/admin-api/ — NestJS BFF

Modules:

  • AuthModule: Google OIDC redirect / callback, session create / destroy, /api/me.
  • SessionModule: SessionStore interface with RedisSessionStore implementation; ioredis client per existing service pattern.
  • DashboardModule: /api/dashboard/health, /api/dashboard/volume, /api/dashboard/ai-review, /api/dashboard/human-review, /api/dashboard/orgs/:orgId.
  • ClientsModule: typed HTTP clients for clinical-api, ai-review, human-review (mirrors human-review's existing clients/ pattern).
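
To make the SessionModule contract concrete, here is a minimal sketch of what a SessionStore interface could look like, with an in-memory implementation of the kind a unit test might use in place of RedisSessionStore. The session fields follow §4.1 step 4d; the method names (create, get, destroy) and the InMemorySessionStore class are assumptions for illustration, not part of the spec.

```typescript
// Session shape follows the fields listed in the login flow (§4.1 step 4d).
interface AdminSession {
  sessionId: string;
  userId: string;
  jwt: string; // platform JWT, held server-side only
  createdAt: number; // epoch milliseconds
  expiresAt: number; // epoch milliseconds
}

// Hypothetical method names; the spec only names the interface itself.
interface SessionStore {
  create(session: AdminSession): Promise<void>;
  get(sessionId: string, now?: number): Promise<AdminSession | null>;
  destroy(sessionId: string): Promise<void>;
}

// In-memory stand-in for RedisSessionStore (assumption, for tests only).
class InMemorySessionStore implements SessionStore {
  private readonly sessions = new Map<string, AdminSession>();

  async create(session: AdminSession): Promise<void> {
    this.sessions.set(session.sessionId, session);
  }

  async get(sessionId: string, now: number = Date.now()): Promise<AdminSession | null> {
    const session = this.sessions.get(sessionId);
    if (!session) return null;
    if (session.expiresAt <= now) {
      // Expired sessions are dropped lazily here; Redis would use a key TTL.
      this.sessions.delete(sessionId);
      return null;
    }
    return session;
  }

  async destroy(sessionId: string): Promise<void> {
    this.sessions.delete(sessionId);
  }
}
```

Keeping the interface promise-based means the same call sites would work against a Redis-backed implementation without changes.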

Aggregation:

  • Each dashboard endpoint runs a parallel fan-out (Promise.all) across the relevant backing services, merges results, returns a flat shape the SPA consumes directly.

Audit:

  • New AdminAuditLog Prisma model — admin-api owns its own database. One row per admin action (actorId, action, target, at, metadata).

Allowlist (two-layer):

  1. Email domain check on OIDC callback (@skinanalytics.co.uk by default, env-configured via ADMIN_DOMAIN_ALLOWLIST).
  2. AdminUser Prisma model — a row must exist with role IN ('admin', 'support') for login to succeed. First-time domain-matched logins create a row in pending status (visible in Phase 2 for an existing admin to approve).
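
The two layers can be read as a single decision function. The sketch below is one possible shape, assuming the allowlist is a comma-separated list of domains and that AdminUser carries a status of active or pending (as described in §3.3). The names parseDomainAllowlist, decideAdminLogin and LoginDecision are hypothetical.

```typescript
type AdminRole = "admin" | "support";
type AdminStatus = "active" | "pending";

interface AdminUser {
  id: string;
  email: string;
  role: AdminRole;
  status: AdminStatus;
}

type LoginDecision =
  | { allowed: true; user: AdminUser }
  | { allowed: false; reason: "domain_not_allowed" | "pending_approval"; createPending: boolean };

// Parses ADMIN_DOMAIN_ALLOWLIST; tolerates spaces and a leading "@".
function parseDomainAllowlist(raw: string): string[] {
  return raw
    .split(",")
    .map((d) => d.trim().toLowerCase().replace(/^@/, ""))
    .filter((d) => d.length > 0);
}

function decideAdminLogin(
  email: string,
  allowlist: string[],
  existing: AdminUser | null,
): LoginDecision {
  // Layer 1: email domain check.
  const at = email.lastIndexOf("@");
  const domain = at >= 0 ? email.slice(at + 1).toLowerCase() : "";
  if (allowlist.indexOf(domain) === -1) {
    return { allowed: false, reason: "domain_not_allowed", createPending: false };
  }
  // Layer 2: an AdminUser row must exist and be active.
  if (existing === null) {
    // First-time domain-matched login: caller creates a pending row.
    return { allowed: false, reason: "pending_approval", createPending: true };
  }
  if (existing.status !== "active") {
    return { allowed: false, reason: "pending_approval", createPending: false };
  }
  return { allowed: true, user: existing };
}
```

Returning a decision value rather than throwing keeps the "create a pending row" side effect in the caller, which makes the rule easy to unit test.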

Tests: Jest unit + integration suite hitting real Redis and a mocked auth / clinical-api / ai-review / human-review (via nock).

3.3 services/auth/ extension

One new endpoint: POST /v1/oauth/google/callback that:

  • Verifies the Google id_token against Google's JWKS.
  • Checks email_verified === true and that the email domain matches ADMIN_DOMAIN_ALLOWLIST.
  • Calls admin-api over the internal service-token network to resolve the AdminUser row (POST /internal/admin-users/resolve { email, name }). admin-api returns the row's id and active/pending status; the auth service rejects pending rows during Phase 1 (no self-service onboarding yet — admins add each other manually in Phase 2).
  • Issues a platform JWT with scope: admin:read admin:cross-tenant, aud: admin-api, sub: ${user.id}, plus actor_context claims (email, name, role).
  • Returns the JWT to admin-api in the response to that same server-to-server call (§4.1 step 4b is a back-channel call from admin-api to auth, not a browser redirect). The browser never sees the JWT.
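
As an illustration of the claims listed above, the following sketch builds the JWT payload before signing. The 8-hour lifetime comes from §4.1 step 4c. The iat claim, the nesting of actor_context as an object, and the function name buildAdminJwtClaims are assumptions; the actual signing step is out of scope here.

```typescript
interface ActorContext {
  email: string;
  name: string;
  role: string;
}

interface AdminJwtClaims {
  sub: string;
  aud: string;
  scope: string; // space-delimited scope list
  iat: number; // seconds since epoch (assumed; not named in the spec)
  exp: number; // seconds since epoch
  actor_context: ActorContext;
}

// 8 hours, per the login flow (exp=8h).
const ADMIN_JWT_TTL_SECONDS = 8 * 60 * 60;

function buildAdminJwtClaims(
  user: { id: string; email: string; name: string; role: string },
  nowSeconds: number,
): AdminJwtClaims {
  return {
    sub: user.id,
    aud: "admin-api",
    scope: "admin:read admin:cross-tenant",
    iat: nowSeconds,
    exp: nowSeconds + ADMIN_JWT_TTL_SECONDS,
    actor_context: { email: user.email, name: user.name, role: user.role },
  };
}
```

Passing the clock in as nowSeconds keeps the function deterministic, so tests can assert the exact exp value.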

Note on dependency direction: admin-api owns AdminUser (it is the source of truth for who is allowed in), so the auth service depends on admin-api during login and must be deployed with ADMIN_API_BASE_URL configured. The reverse-direction call (admin-api → auth, OIDC token exchange) is the existing pattern used everywhere else in the platform; this design adds only the new auth → admin-api direction.

3.4 New /v1/admin/stats endpoints

Each returns a small JSON shape:

  • clinical-api: { casesToday, casesThisWeek, casesThisMonth, perOrg: [{orgId, name, count}], perProduct: [{productCode, count}] }
  • ai-review: { inferencesToday, successRate24h, avgLatencyMs24h, queueDepth, recentFailures: [{at, reason}] }
  • human-review: { openCount, claimedCount, avgTimeToDecisionMs, declineCount24h }

All require admin:read scope. Without admin:cross-tenant, the endpoints return only data for the JWT's org_id claim (matching the existing tenancy interceptor behaviour). With admin:cross-tenant (the default for SA-staff sessions), the endpoints honour an optional ?org=<orgId> query parameter; if omitted, they return platform-wide data.
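
The scope and tenancy rule above can be sketched as a small pure function that each /v1/admin/stats handler could call before querying. This is an illustration only: the names StatsCaller, OrgFilter, parseScopes and resolveOrgFilter are hypothetical, and the sketch assumes a caller without admin:cross-tenant is silently pinned to its own org rather than rejected when it passes a different ?org value.

```typescript
interface StatsCaller {
  scopes: string[];
  orgId: string; // the JWT's org_id claim
}

type OrgFilter = { kind: "platform" } | { kind: "org"; orgId: string };

// Splits a space-delimited scope claim into individual scopes.
function parseScopes(scopeClaim: string): string[] {
  return scopeClaim.split(/\s+/).filter((s) => s.length > 0);
}

function resolveOrgFilter(caller: StatsCaller, queryOrg?: string): OrgFilter {
  if (caller.scopes.indexOf("admin:read") === -1) {
    throw new Error("forbidden: admin:read scope required");
  }
  if (caller.scopes.indexOf("admin:cross-tenant") === -1) {
    // Tenant-bound caller: always scoped to the JWT's own org.
    return { kind: "org", orgId: caller.orgId };
  }
  // Cross-tenant caller: honour ?org if present, else platform-wide.
  return queryOrg ? { kind: "org", orgId: queryOrg } : { kind: "platform" };
}
```

For example, an SA-staff session calling with no ?org parameter resolves to the platform-wide filter, while the same call with ?org=org-7 resolves to that single organisation.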

4. Data flows

4.1 Login flow (Google OIDC)

1. User → admin-ui /login → "Sign in with Google" button
2. SPA → admin-api  GET /api/auth/google/start
   admin-api ↳ generates state + nonce, stores in Redis (5 min TTL),
              redirects browser to https://accounts.google.com/o/oauth2/v2/auth?...
3. Google → user picks account → 302 to admin-api /api/auth/google/callback?code=...&state=...
4. admin-api:
   a. Validates state, exchanges `code` for `id_token` at Google's token endpoint
      (server-to-server call to https://oauth2.googleapis.com/token using the
      Google client_secret — never exposed to the browser)
   b. Server-to-server (NOT a redirect) call to auth service:
      POST /v1/oauth/google/callback { id_token }
      using the SERVICE_AUTH_TOKEN bearer for internal authentication
   c. auth service verifies id_token signature (Google JWKS), checks
      email_verified, checks email domain allowlist, ensures AdminUser row
      exists in active state, issues platform JWT
      (scope=admin:read admin:cross-tenant, exp=8h)
   d. admin-api creates session row in Redis: { sessionId, userId, jwt,
      createdAt, expiresAt }
   e. Sets httpOnly + secure + SameSite=Lax cookie on the SPA's domain
   f. 302 to admin-ui home
5. SPA → /api/me → admin-api reads cookie → looks up session → returns user identity
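
Steps 2 and 4a depend on the state value being single-use and short-lived. The sketch below shows that behaviour with an in-memory map standing in for Redis; the class name OidcStateStore and its methods are hypothetical. In a real implementation the state and nonce would be generated with a cryptographically secure random source, which is left to the caller here.

```typescript
// 5 minute TTL, matching step 2 of the login flow.
const STATE_TTL_MS = 5 * 60 * 1000;

interface PendingLogin {
  nonce: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory stand-in for the Redis keys that hold state + nonce (assumption).
class OidcStateStore {
  private readonly pending = new Map<string, PendingLogin>();

  // Called from /api/auth/google/start before redirecting to Google.
  issue(state: string, nonce: string, now: number): void {
    this.pending.set(state, { nonce, expiresAt: now + STATE_TTL_MS });
  }

  // Called from the callback. Returns the nonce on success, or null when
  // the state is unknown, expired, or has already been used (replay).
  consume(state: string, now: number): string | null {
    const entry = this.pending.get(state);
    this.pending.delete(state); // single use, even when expired
    if (!entry || entry.expiresAt <= now) return null;
    return entry.nonce;
  }
}
```

The returned nonce can then be compared with the nonce claim inside the id_token, so that a token minted for a different login attempt is rejected.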

4.2 Dashboard fetch flow (e.g., volume widget)

1. SPA mounts <VolumeCard/>
   → TanStack Query: GET /api/dashboard/volume?org=ALL&range=7d
2. admin-api:
   a. Reads session cookie → looks up session → grabs JWT
   b. Authorizes: scope must include admin:read
   c. Fan-out (Promise.all):
      - clinical-api  GET /v1/admin/stats?range=7d   (Authorization: Bearer <jwt>)
      - ai-review     GET /v1/admin/stats?range=7d
      - human-review  GET /v1/admin/stats?range=7d
   d. Merges into shape: { volume: {...}, ai: {...}, hr: {...}, generatedAt }
   e. Caches result in Redis for 30s (configurable per-endpoint TTL)
   f. Logs an AdminAuditLog row: { actorId, action='dashboard.volume.read',
      target='ALL', at }
3. SPA renders the cards
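
Step 2e can be illustrated with a small time-bounded cache. The sketch uses an in-memory map where the design uses Redis, and the names TtlCache and volumeCacheKey (and the key format) are hypothetical. The point it shows is that the cache key should include both the org and the range, so a per-org drill-down never serves the platform-wide result.

```typescript
class TtlCache<T> {
  private readonly entries = new Map<string, { value: T; expiresAt: number }>();

  // Default TTL of 30 s, per the flow above.
  constructor(private readonly ttlMs: number = 30000) {}

  get(key: string, now: number): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T, now: number): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Hypothetical key format: one entry per (org, range) combination.
function volumeCacheKey(org: string, range: string): string {
  return `dashboard:volume:${org}:${range}`;
}
```

A handler would check get() first, and only run the fan-out and call set() on a miss.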

4.3 Per-org drill-down

Same as 4.2 but with ?org=<orgId> propagated to each backing service. The platform JWT carries admin:cross-tenant so the org filter is honoured rather than rejected.

4.4 Logout

1. SPA → /api/auth/logout
2. admin-api: deletes Redis session row, expires cookie, returns 204
3. SPA: clears React Query cache, redirects to /login

4.5 Errors and degraded-mode display

  • A backing service returning 5xx → admin-api returns the partial shape with a partial: true flag and a degradedFor: ['ai-review'] array; SPA renders affected cards with a "Stats unavailable" overlay rather than failing the whole page.
  • 401 from any layer → SPA forces a re-login (session likely expired).
  • 403 from a backing service → surface as a banner in the SPA (likely a configuration bug, not user error).
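
One way to get the partial-shape behaviour is to convert each backing-service call into a result that never rejects before handing the calls to Promise.all, since a bare Promise.all rejects as soon as any one call fails. The sketch below assumes that approach. The names aggregateStats and Aggregated and the data field are hypothetical, and for brevity it treats every rejection as "degraded"; a fuller version would inspect the status code to separate 5xx from 401 and 403.

```typescript
type ServiceName = "clinical-api" | "ai-review" | "human-review";

interface Aggregated {
  data: Partial<Record<ServiceName, unknown>>;
  partial: boolean;
  degradedFor: ServiceName[];
  generatedAt: string;
}

async function aggregateStats(
  fetchers: Record<ServiceName, () => Promise<unknown>>,
): Promise<Aggregated> {
  const names = Object.keys(fetchers) as ServiceName[];

  // Each call resolves to a tagged result, so Promise.all never rejects.
  const settled = await Promise.all(
    names.map((name) =>
      fetchers[name]().then(
        (value) => ({ name, ok: true as const, value }),
        () => ({ name, ok: false as const, value: undefined }),
      ),
    ),
  );

  const data: Partial<Record<ServiceName, unknown>> = {};
  const degradedFor: ServiceName[] = [];
  for (const result of settled) {
    if (result.ok) {
      data[result.name] = result.value;
    } else {
      degradedFor.push(result.name);
    }
  }

  return {
    data,
    partial: degradedFor.length > 0,
    degradedFor,
    generatedAt: new Date().toISOString(),
  };
}
```

The SPA can then render a "Stats unavailable" overlay for each entry in degradedFor while showing the remaining cards normally.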

5. Testing strategy

  • apps/admin-ui: Vitest + React Testing Library for components; one Playwright e2e smoke that mocks /api/* and walks login → dashboard → drill-down → logout. No component snapshots — they rot.
  • services/admin-api: Jest unit tests + integration suite mirroring the human-review pattern (real Redis container, mocked downstream services via nock). Covers OIDC callback success, OIDC callback rejected (bad domain, unknown user, expired state, replay), session lifecycle, partial-failure aggregation, audit-log emission per request.
  • services/auth extension: Jest unit tests for the OIDC callback endpoint — id_token verification, domain check, JWT issuance with correct claims and scopes.
  • Per-service /v1/admin/stats: Jest unit tests for the new endpoint per service (3 services). Cover scope enforcement, org filtering with and without admin:cross-tenant, time-range parsing.

Pre-PR gates (matches existing repo): pnpm turbo run typecheck && pnpm turbo run test && pnpm format:check && pnpm turbo run lint.

6. Deployment topology

  • apps/admin-ui: built as static files (pnpm --filter admin-ui build, output in dist/), uploaded to an S3 bucket, fronted by CloudFront with the same custom domain as the admin-api (e.g. admin.<env>.cdm.skinanalytics.co.uk). CloudFront routes /api/* to admin-api ALB, everything else to S3.
  • services/admin-api: deploys like every other NestJS service (ECS task in the private subnet, ALB target group). New env vars: GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GOOGLE_REDIRECT_URI, ADMIN_DOMAIN_ALLOWLIST (comma-separated).
  • Secrets managed via the existing AppConfigService pattern (production guard rejects dev defaults).
  • Redis URL points at the existing platform Redis (separate logical key prefix admin-session:).
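
For illustration, the new env vars and the production guard could be validated along these lines. This is a sketch only and does not reproduce the existing AppConfigService: the function name loadAdminApiConfig, the AdminApiConfig shape, the DEV_DEFAULT_SECRET value and the use of NODE_ENV to detect production are all assumptions. The fallback allowlist domain is the default named in §3.2.

```typescript
interface AdminApiConfig {
  googleClientId: string;
  googleClientSecret: string;
  googleRedirectUri: string;
  adminDomainAllowlist: string[];
}

// Hypothetical dev default; the real value lives in the service's config.
const DEV_DEFAULT_SECRET = "dev-google-client-secret";

function loadAdminApiConfig(env: Record<string, string | undefined>): AdminApiConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing required env var ${key}`);
    return value;
  };

  const googleClientSecret = required("GOOGLE_CLIENT_SECRET");
  // Production guard: refuse to start with a dev default secret.
  if (env["NODE_ENV"] === "production" && googleClientSecret === DEV_DEFAULT_SECRET) {
    throw new Error("GOOGLE_CLIENT_SECRET must not use the dev default in production");
  }

  // Comma-separated list; falls back to the default domain from §3.2.
  const adminDomainAllowlist = (env["ADMIN_DOMAIN_ALLOWLIST"] || "skinanalytics.co.uk")
    .split(",")
    .map((d) => d.trim().toLowerCase().replace(/^@/, ""))
    .filter((d) => d.length > 0);

  return {
    googleClientId: required("GOOGLE_CLIENT_ID"),
    googleClientSecret,
    googleRedirectUri: required("GOOGLE_REDIRECT_URI"),
    adminDomainAllowlist,
  };
}
```

Taking env as a parameter, rather than reading the process environment directly, keeps the validation testable without mutating global state.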

7. Documentation update

The Phase 1 PR must update the documentation site to reflect the new components:

  • New docs/audiences/tech/services/admin-api.md — service summary using the existing template.
  • New docs/audiences/tech/services/admin-ui.md — service summary, plus a few lines about the build and where to find Mantine docs.
  • Update docs/audiences/smt/capability-map.md — add admin-ui and admin-api rows.
  • Update docs/audiences/smt/architecture-glance.md (and docs/diagrams/architecture-glance.mmd) — add admin-ui as a client and admin-api as a new service, with dotted lines to the backing services.
  • Update docs/audiences/tech/README.md — service catalogue table.
  • Update docs/audiences/smt/roadmap.md — admin UI Phase 1 in "Shipped", Phases 2 / 3 in "Deferred".
  • Update mkdocs.yml nav — new pages.
  • Update docs/audiences/compliance/security-model.md — extend §1 (Authentication) with the Google OIDC SSO flow.
  • Update docs/audiences/compliance/audit-trail.md — describe the new AdminAuditLog table.
  • Update scripts/docs/soup-classification.yaml — add classifications for new packages (Mantine, react-router, tanstack-query).

8. Out of scope for Phase 1

Explicitly deferred — to be picked up in their own specs:

  • Phase 2: Organisation / user CRUD (admin-api gains POST / PATCH / DELETE endpoints; SPA gains routes; reuses the BFF auth and audit primitives).
  • Phase 3: Workflow definition editor (orchestrator's JSON DSL — non-trivial UX, JSON-schema validation, version history).
  • Phase 1.5 candidates (small follow-ups, not blockers):
      • Recent activity stream (event-bus subscription).
      • Workflow instance metrics.
      • Time-series charts.
  • Customer-organisation admin login (different IDP per org).
  • Reviewer UI — already a separate deferred item on the roadmap; not part of admin.
  • MFA enforcement — handled by Google Workspace at the IDP level; we don't reimplement.

9. Open questions to resolve at plan time

These do not block the design but are best decided when writing the implementation plan:

  • Exact S3 + CloudFront vs. ECS-served-static decision — pick during plan.
  • Mantine theme tokens (colours, fonts) — pick once, set in plan.
  • Cache TTLs per dashboard endpoint — sensible default of 30 s, tune in plan.