Human-Review Service — Design Spec

Date: 2026-04-27
Status: Draft for review
Author: Jim Holmes (with brainstorming session)
Tracking issue / PR: TBD


1. Service overview & boundaries

1.1 Purpose

Owns the human-review queue and decision capture for clinical cases. API-only — exposes REST for reviewer clients (or downstream UI consumers) to list the queue, claim cases, submit decisions, and decline. Receives human_review.requested events from the orchestrator (one per tier per case) and emits human_review.completed / human_review.failed back. Per-tenant clinician pools and an SA-owned panel both run through the same service; tenancy is enforced via org_id scoping.

The motivation: ai-review and orchestrator gave the platform an automated workflow brain, but real clinical decisions still require licensed clinicians. Human-review is the queue and decision-capture surface for those clinicians, integrated cleanly with the orchestrator's workflow steps so a workflow definition can request human review at any point and get a structured decision back.

1.2 What the service owns

  • Reviewer records — clinical credentialing metadata (specialty, license number, license jurisdiction, credentialing expiry, eligible tiers); FK to user-management for identity
  • Review records — one per human_review.requested event; per case per tier
  • Queue state and claim lifecycle (queued → claimed → submitted | declined → re-queued)
  • Decision payload capture (confirm with snapshotted AI diagnoses, override with reviewer-asserted SNOMED codes, decline with ReasonCode from the orchestrator's registry)
  • Decline tracking + cap-based escalation (3 declines → declined_exhausted → human_review.failed)
  • Suggested-queue ranking (specialty match, jurisdiction match, claim load, age — deterministic heuristic)
  • Per-state-transition audit log (ReviewAuditLog)
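
The claim lifecycle listed above can be read as a small state machine. The following is a minimal sketch, assuming the status values of the Review table in section 2.2. The names ALLOWED_TRANSITIONS and canTransition are hypothetical, and allowing cancellation from both queued and claimed is an assumption the spec does not state.

```typescript
// Status values mirror Review.status (section 2.2).
type ReviewStatus =
  | 'queued'
  | 'claimed'
  | 'submitted'
  | 'declined_exhausted'
  | 'cancelled';

// Hypothetical transition table. An unclaim, or a decline below the cap,
// returns the review to 'queued'; the decline that reaches the cap moves it
// to 'declined_exhausted'.
// ASSUMPTION: admin cancel is allowed from 'queued' and 'claimed'.
const ALLOWED_TRANSITIONS: Record<ReviewStatus, ReviewStatus[]> = {
  queued: ['claimed', 'cancelled'],
  claimed: ['queued', 'submitted', 'declined_exhausted', 'cancelled'],
  submitted: [],
  declined_exhausted: [],
  cancelled: [],
};

// Returns true when moving from `from` to `to` is listed in the table.
function canTransition(from: ReviewStatus, to: ReviewStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}
```

Keeping the table in one place would let each write endpoint reject an illegal transition before touching the database.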

1.3 What it does not own

  • Reviewer authentication or basic profile — user-management
  • Decline reason vocabulary — orchestrator's ReasonCode registry, scope=human_decline
  • Workflow orchestration / case lifecycle — orchestrator
  • Image storage and presigned URLs — clinical-api (proxied on read)
  • Diagnosis projection / supersession — clinical-api projects via case.workflow.completed from orchestrator's emit_final step, using existing supersession logic

1.4 Modes

Only "active" — there is no client-driven equivalent for human review. The orchestrator dispatches human_review.requested; the reviewer client consumes the queue API. (Client-driven mode would mean clients writing reviews directly, bypassing the queue — not a v1 use case.)

1.5 Tenancy

Org-scoped, same patterns as every other service. Customer reviewers see only cases belonging to their org. SA panel reviewers (with the human-review:read-cross-tenant scope) can read and act across tenants for QA.

1.6 Deployment

  • Service path: services/human-review/
  • Port: 3007 (next free)
  • NestJS, TypeScript, Prisma 7 (driver adapter), MySQL, Redis Streams — same stack as orchestrator
  • Standalone Prisma database (human_review)

1.7 v1 explicit non-goals

  • Reviewer-facing UI (separate frontend track; service is API-only)
  • Per-finding decision granularity (case-level only in v1)
  • Draft / partial-review state (claim is a commitment to submit/decline; orchestrator timeout is the backstop)
  • Push-assigned queue (pull-only with suggested ranking; push is a v2 layer)
  • Amendment of submitted decisions (immutable; correction via orchestrator's supersede rerun)
  • Automation-bias mitigations (blind review, double-blinded adjudication) — research-mode features for later
  • Adjudication when AI and customer-clinician disagree (clinical-api's most-recent-wins is the rule)
  • Reviewer presence/availability tracking (ReviewerSession cut from v1)
  • Per-product or per-org "show AI / hide AI" toggle (v1 always shows AI context)
  • Subspecialty matching (single specialty string field)

2. Data model & DB schema

Own MySQL database (human_review), Prisma-managed. Four tables.

2.1 Reviewer

Clinical credentialing metadata; one row per user-management user who can review.

id                       UUID PK
user_id                  UUID UNIQUE     -- FK to user-management user
specialty                VARCHAR         -- e.g. 'dermatology'
license_number           VARCHAR         -- e.g. GMC number, NPI
license_jurisdiction     VARCHAR         -- e.g. 'UK', 'US-CA'
credentialing_expiry     DATETIME NULL
eligible_tiers           JSON            -- ['customer_clinician'] | ['sa_qa'] | both
active                   BOOLEAN
created_at, updated_at
INDEX(active, specialty, license_jurisdiction)

2.2 Review

One row per human_review.requested event.

id                       UUID PK
case_id                  UUID
org_id                   UUID
product_id               UUID
tier                     ENUM(customer_clinician, sa_qa)
correlation_id           VARCHAR UNIQUE  -- ties back to orchestrator's request
status                   ENUM(queued, claimed, submitted, declined_exhausted, cancelled)
context_snapshot         JSON            -- frozen orchestrator context at request time
claimed_by_reviewer_id   UUID NULL FK
claimed_at               DATETIME NULL
submitted_by_reviewer_id UUID NULL FK
submitted_at             DATETIME NULL
decision                 ENUM(confirm, override) NULL
decision_payload         JSON NULL
notes                    TEXT NULL
decline_count            INT DEFAULT 0
created_at, updated_at
INDEX(org_id, status, tier, created_at)
INDEX(claimed_by_reviewer_id, status)
INDEX(case_id, tier)
INDEX(correlation_id)

decision_payload shapes:

  • decision='confirm': { confirmed_ai_diagnoses: [{ snomed_code, label, ai_diagnosis_id }] } — snapshot of the AI diagnoses the reviewer confirmed (referenced by ai_diagnosis_id, with codes/labels copied for audit immutability)
  • decision='override': { diagnoses: [{ snomed_code, label, confidence?, notes? }] } — reviewer-asserted SNOMED list

2.3 ReviewClaim

Append-only claim history; supports audit and decline-cap tracking.

id                       UUID PK
review_id                UUID FK
reviewer_id              UUID FK
claimed_at               DATETIME
released_at              DATETIME NULL
release_reason           ENUM(submitted, declined, unclaimed, timed_out) NULL
decline_reason_code      VARCHAR NULL    -- references orchestrator's ReasonCode (scope=human_decline)
decline_note             TEXT NULL
INDEX(review_id, claimed_at)
INDEX(reviewer_id, claimed_at)

2.4 ReviewAuditLog

Per-state-transition log.

id                       UUID PK
review_id                UUID FK
reviewer_id              UUID NULL FK    -- null if system-initiated
action                   ENUM(created, claimed, unclaimed, submitted, declined, decline_exhausted, cancelled)
metadata                 JSON
created_at
INDEX(review_id, created_at)

2.5 Key design choices

  1. Review.context_snapshot is frozen at event arrival. Same pattern as orchestrator's WorkflowInstance.definitionSnapshot. The reviewer's view of the case is what was true when the orchestrator emitted the request. Image presigned URLs are NOT stored here (they expire) — fetched fresh from clinical-api on read.

  2. correlation_id is unique. Duplicate human_review.requested events (consumer redelivery) are ignored at insert time. Matches the orchestrator's idempotency pattern.

  3. ReviewClaim is append-only. Every claim/release writes a new row. Lets us audit the decline cap (count rows where decline_reason_code IS NOT NULL for a review) and prevent same-reviewer re-claim (filter by reviewer_id).

  4. Review.decline_count is denormalized for fast queue filtering; bumped on each decline; capped at 3 → status flips to declined_exhausted.

  5. No PHI in human-review tables. context_snapshot may contain patient_id (UUID), case_id, image_ids — all opaque references. Actual patient demographics, image bytes, and diagnosis text live in clinical-api. Encryption-at-rest is therefore not needed in human-review's DB; tenancy isolation is enough. If context_snapshot is ever found to leak PHI in practice, add CryptoService later.
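
Design choice 3 describes two queries over the append-only ReviewClaim history. A rough sketch of both follows, written as pure functions over in-memory rows rather than as database queries. The function names are hypothetical, and treating any earlier claim row by the same reviewer as blocking a re-claim is an assumption; the spec only says to filter by reviewer_id.

```typescript
// Subset of the ReviewClaim columns from section 2.3.
interface ReviewClaimRow {
  review_id: string;
  reviewer_id: string;
  decline_reason_code: string | null;
}

// Audit view of the decline cap: count claim rows that carry a decline reason.
function countDeclines(claims: ReviewClaimRow[], reviewId: string): number {
  return claims.filter(
    (c) => c.review_id === reviewId && c.decline_reason_code !== null,
  ).length;
}

// ASSUMPTION: any earlier claim row by this reviewer blocks a re-claim,
// whether it ended in a decline, an unclaim, or a timeout.
function hasPriorClaim(
  claims: ReviewClaimRow[],
  reviewId: string,
  reviewerId: string,
): boolean {
  return claims.some(
    (c) => c.review_id === reviewId && c.reviewer_id === reviewerId,
  );
}
```

The count derived this way should always agree with the denormalized Review.decline_count, which makes it usable as a consistency check.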


3. REST API surface

Standard NestJS controllers, scope-guarded via @sa-platform/auth-client. Multi-tenant via existing org-context interceptor.

3.1 Auth scopes

Registered in auth service's scope registry:

human-review:read-queue        → list queue entries the actor can see (org-scoped by default)
human-review:claim             → claim a queued review
human-review:submit            → submit a decision on a claimed review
human-review:decline           → decline a claimed review with a reason code
human-review:read-cross-tenant → SA panel + ops, see across orgs
human-review:admin             → admin endpoints (force-cancel, audit query, reviewer CRUD)

3.2 Reviewer endpoints (clinician/SA panel actor)

GET    /reviews/queue?tier=&status=
GET    /reviews/queue/suggested?tier=&limit=
GET    /reviews/my-claims
GET    /reviews/:id
POST   /reviews/:id/claim
POST   /reviews/:id/unclaim
POST   /reviews/:id/submit
POST   /reviews/:id/decline

3.3 Admin endpoints

GET    /admin/reviewers?org_id=
POST   /admin/reviewers
PATCH  /admin/reviewers/:id
DELETE /admin/reviewers/:id

GET    /admin/reviews?org_id=&status=&tier=
GET    /admin/reviews/:id/audit
POST   /admin/reviews/:id/cancel

3.4 Health

GET    /health
GET    /health/ready

3.5 Endpoint behaviors

Atomic claim (POST /reviews/:id/claim): transactional check-and-update — sets claimed_by_reviewer_id + claimed_at only if status='queued' and claimed_by_reviewer_id IS NULL. Race-safe: simultaneous claims from two reviewers result in exactly one success and one 409 Conflict. Writes a ReviewClaim row + ReviewAuditLog entry. Reviewer must be eligible (right tier, right org or cross-tenant).
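
The check-and-update can be sketched as a single conditional write whose affected-row count decides the outcome. The example below is illustrative only: InMemoryReviews is a hypothetical stand-in for the Review table, claimReview is a hypothetical service function, and mapping an unknown review ID to 404 is an assumption.

```typescript
type ReviewStatus =
  | 'queued' | 'claimed' | 'submitted' | 'declined_exhausted' | 'cancelled';

interface ReviewRow {
  id: string;
  status: ReviewStatus;
  claimed_by_reviewer_id: string | null;
  claimed_at: Date | null;
}

type ClaimResult =
  | { ok: true; review: ReviewRow }
  | { ok: false; httpStatus: 404 | 409 };

// Hypothetical in-memory stand-in for the Review table.
class InMemoryReviews {
  private rows = new Map<string, ReviewRow>();

  insert(row: ReviewRow): void {
    this.rows.set(row.id, { ...row });
  }

  find(id: string): ReviewRow | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Mimics one conditional statement:
  //   UPDATE review SET status='claimed', claimed_by_reviewer_id=?, claimed_at=?
  //   WHERE id=? AND status='queued' AND claimed_by_reviewer_id IS NULL
  // and returns the number of rows it changed (0 or 1).
  claimIfQueued(id: string, reviewerId: string, now: Date): number {
    const row = this.rows.get(id);
    if (!row || row.status !== 'queued' || row.claimed_by_reviewer_id !== null) {
      return 0;
    }
    row.status = 'claimed';
    row.claimed_by_reviewer_id = reviewerId;
    row.claimed_at = now;
    return 1;
  }
}

function claimReview(
  store: InMemoryReviews,
  reviewId: string,
  reviewerId: string,
  now: Date = new Date(),
): ClaimResult {
  // ASSUMPTION: an unknown review ID is reported as 404.
  if (!store.find(reviewId)) return { ok: false, httpStatus: 404 };
  // Zero affected rows means another reviewer won the race: 409 Conflict.
  if (store.claimIfQueued(reviewId, reviewerId, now) === 0) {
    return { ok: false, httpStatus: 409 };
  }
  return { ok: true, review: store.find(reviewId)! };
}
```

With Prisma, one way to obtain the affected-row count is updateMany, which returns a count; the ReviewClaim and ReviewAuditLog writes would then go in the same transaction as the conditional update.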

Submit (POST /reviews/:id/submit): requires the calling reviewer to be the current claimant. Validates the decision payload (DTO + per-decision rules: confirm must reference existing AI diagnosis IDs from the context snapshot; override must have ≥1 SNOMED code). On success: writes decision to Review, transitions to submitted, releases claim with release_reason=submitted, writes audit log, emits human_review.completed to orchestrator with the same correlation_id.
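
The per-decision rules can be sketched as a pure validator over the decision_payload shapes from section 2.2. This is a sketch under assumptions: the Decision wrapper type and validateDecision are hypothetical names, and the set of known AI diagnosis IDs is passed in because the spec does not say where they sit inside context_snapshot.

```typescript
interface ConfirmedAiDiagnosis {
  snomed_code: string;
  label: string;
  ai_diagnosis_id: string;
}

interface OverrideDiagnosis {
  snomed_code: string;
  label: string;
  confidence?: number;
  notes?: string;
}

// Hypothetical wrapper pairing Review.decision with Review.decision_payload.
type Decision =
  | { decision: 'confirm'; payload: { confirmed_ai_diagnoses: ConfirmedAiDiagnosis[] } }
  | { decision: 'override'; payload: { diagnoses: OverrideDiagnosis[] } };

// Returns a list of human-readable errors; an empty list means the decision is valid.
function validateDecision(
  d: Decision,
  knownAiDiagnosisIds: ReadonlySet<string>,
): string[] {
  const errors: string[] = [];
  if (d.decision === 'confirm') {
    // Confirm must reference AI diagnosis IDs present in the context snapshot.
    // The spec does not say whether an empty confirm list is valid; this sketch accepts it.
    for (const dx of d.payload.confirmed_ai_diagnoses) {
      if (!knownAiDiagnosisIds.has(dx.ai_diagnosis_id)) {
        errors.push(`unknown ai_diagnosis_id: ${dx.ai_diagnosis_id}`);
      }
    }
  } else {
    // Override must carry at least one SNOMED code.
    if (d.payload.diagnoses.length === 0) {
      errors.push('override requires at least one SNOMED code');
    }
    for (const dx of d.payload.diagnoses) {
      if (dx.snomed_code.trim() === '') {
        errors.push('snomed_code must not be empty');
      }
    }
  }
  return errors;
}
```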

Decline (POST /reviews/:id/decline): requires current claimant. Validates decline_reason_code exists in orchestrator's ReasonCode registry (scope=human_decline, scoped to org or system). Writes ReviewClaim row with the reason, increments Review.decline_count. If decline_count < 3: returns review to queued. If decline_count >= 3: transitions to declined_exhausted, emits human_review.failed with reason_code=no_reviewer_accepted, retryable=false.
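
The cap logic reduces to a small pure function. A minimal sketch, assuming the caller has already validated the reason code and written the ReviewClaim row; applyDecline and DeclineOutcome are hypothetical names.

```typescript
const DECLINE_CAP = 3;

// Shape of the human_review.failed payload (section 4.3).
interface FailedEventPayload {
  reason_code: string;
  retryable: boolean;
}

interface DeclineOutcome {
  declineCount: number;
  status: 'queued' | 'declined_exhausted';
  // Non-null only when the cap is reached and human_review.failed must be emitted.
  failedEvent: FailedEventPayload | null;
}

// Takes the decline_count stored before this decline and returns the new state.
function applyDecline(currentDeclineCount: number): DeclineOutcome {
  const declineCount = currentDeclineCount + 1;
  if (declineCount < DECLINE_CAP) {
    return { declineCount, status: 'queued', failedEvent: null };
  }
  return {
    declineCount,
    status: 'declined_exhausted',
    failedEvent: { reason_code: 'no_reviewer_accepted', retryable: false },
  };
}
```

For example, applyDecline(0) re-queues the review with a count of 1, while applyDecline(2) exhausts it and yields the failure payload.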

Unclaim (POST /reviews/:id/unclaim): releases without a decline reason — reviewer just decided not to do this one. Writes ReviewClaim row with release_reason=unclaimed (does NOT count toward decline cap). Returns to queued. Audit-tracked. Allowed once per claim — repeated thrashing patterns are caught at audit-review time, not enforced in v1.

Suggested ranking (GET /reviews/queue/suggested):

  1. Filter to reviews matching the actor's reviewer record's eligible_tiers and org (or all orgs if cross-tenant)
  2. Score each: specialty_match (50) + jurisdiction_match (30) + age_minutes (1 per minute, capped at 60) - active_claim_load (10 per claim)
  3. Sort by score descending; return top limit (default 25)
  4. No ML, no learned weights — pure deterministic heuristic
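
Steps 2 and 3 can be sketched as follows, using the weights listed above. Several details are assumptions: the spec does not say where a review's specialty and jurisdiction come from (presumably context_snapshot), nor exactly how active_claim_load is measured, so both are passed in as plain fields. The step 1 filter on tier and org is taken as already applied.

```typescript
// ASSUMPTION: hypothetical input shapes; the field sources are not specified in the spec.
interface ReviewerProfile {
  specialty: string;
  licenseJurisdiction: string;
  activeClaimLoad: number; // number of reviews currently claimed
}

interface QueuedReview {
  id: string;
  specialty: string;
  jurisdiction: string;
  createdAt: Date;
}

// specialty_match (50) + jurisdiction_match (30)
//   + age_minutes (1 per minute, capped at 60) - active_claim_load (10 per claim)
function scoreReview(review: QueuedReview, reviewer: ReviewerProfile, now: Date): number {
  const ageMs = now.getTime() - review.createdAt.getTime();
  const ageMinutes = Math.max(0, Math.floor(ageMs / 60000));
  const specialty = review.specialty === reviewer.specialty ? 50 : 0;
  const jurisdiction = review.jurisdiction === reviewer.licenseJurisdiction ? 30 : 0;
  return specialty + jurisdiction + Math.min(ageMinutes, 60) - 10 * reviewer.activeClaimLoad;
}

// Sort by score descending and return the top `limit` (default 25).
function suggestQueue(
  reviews: QueuedReview[],
  reviewer: ReviewerProfile,
  now: Date,
  limit: number = 25,
): QueuedReview[] {
  return reviews
    .map((review) => ({ review, score: scoreReview(review, reviewer, now) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.review);
}
```

As a worked example, a review with a matching specialty, a non-matching jurisdiction, and an age of 90 minutes, scored for a reviewer holding one active claim, comes to 50 + 0 + 60 - 10 = 100.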

Image URL proxy (GET /reviews/:id): human-review calls clinical-api (via service-to-service auth) to mint fresh presigned URLs for each image referenced in context_snapshot. URLs valid for 15 min. Reviewer's client fetches images directly from S3/Minio.

Tenancy enforcement: every read endpoint filters by the actor's org_id unless the actor has human-review:read-cross-tenant. Every write endpoint asserts the actor's org_id matches the review's org_id unless cross-tenant.
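
The read and write rules can be expressed as two small helpers. This is a sketch only: the Actor shape and the helper names are hypothetical, and in the real service this logic would presumably live in the existing org-context interceptor.

```typescript
const CROSS_TENANT_SCOPE = 'human-review:read-cross-tenant';

// Hypothetical view of the authenticated caller.
interface Actor {
  orgId: string;
  scopes: string[];
}

function isCrossTenant(actor: Actor): boolean {
  return actor.scopes.indexOf(CROSS_TENANT_SCOPE) !== -1;
}

// Read path: a filter to merge into every queue or detail query.
// Cross-tenant actors get no org restriction.
function readFilter(actor: Actor): { org_id?: string } {
  return isCrossTenant(actor) ? {} : { org_id: actor.orgId };
}

// Write path: the actor's org must match the review's org unless cross-tenant.
function canWrite(actor: Actor, reviewOrgId: string): boolean {
  return isCrossTenant(actor) || actor.orgId === reviewOrgId;
}
```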


4. Event contracts

4.1 Inbound: human_review.requested

Already defined in @sa-platform/events:

import { z } from 'zod';

export const humanReviewRequestedPayload = z.object({
  tier: z.enum(['customer_clinician', 'sa_qa']),
  context_snapshot: z.record(z.unknown()),
});

HumanReviewRequestedConsumer:

  1. Idempotency check via correlation_id UNIQUE on Review
  2. If new: create Review with status='queued', freeze context_snapshot from envelope payload
  3. Write ReviewAuditLog action='created'
  4. No outbound event yet — the case sits in queue waiting for a claim
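
The consumer steps above can be sketched with an in-memory map standing in for the UNIQUE constraint on correlation_id. The envelope field names and the RequestedHandler class are hypothetical. A real implementation would rely on the database constraint itself, not on a check before the insert, to stay safe under concurrent delivery.

```typescript
type Tier = 'customer_clinician' | 'sa_qa';

// ASSUMPTION: envelope field names are illustrative, not the @sa-platform/events shape.
interface RequestedEnvelope {
  correlation_id: string;
  case_id: string;
  org_id: string;
  product_id: string;
  payload: { tier: Tier; context_snapshot: Record<string, unknown> };
}

interface ReviewRecord {
  correlation_id: string;
  case_id: string;
  org_id: string;
  product_id: string;
  tier: Tier;
  status: 'queued';
  context_snapshot: Record<string, unknown>;
}

interface AuditEntry {
  correlation_id: string;
  action: 'created';
}

class RequestedHandler {
  // Keyed by correlation_id, standing in for the UNIQUE column.
  readonly reviews = new Map<string, ReviewRecord>();
  readonly auditLog: AuditEntry[] = [];

  handle(event: RequestedEnvelope): 'created' | 'duplicate' {
    // Step 1: redelivered events are acknowledged without side effects.
    if (this.reviews.has(event.correlation_id)) return 'duplicate';

    // Step 2: deep-copy so later changes to the event cannot alter the frozen snapshot.
    const snapshot = JSON.parse(
      JSON.stringify(event.payload.context_snapshot),
    ) as Record<string, unknown>;
    this.reviews.set(event.correlation_id, {
      correlation_id: event.correlation_id,
      case_id: event.case_id,
      org_id: event.org_id,
      product_id: event.product_id,
      tier: event.payload.tier,
      status: 'queued',
      context_snapshot: snapshot,
    });

    // Step 3: audit entry. Step 4: nothing is emitted at this point.
    this.auditLog.push({ correlation_id: event.correlation_id, action: 'created' });
    return 'created';
  }
}
```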

4.2 Outbound: human_review.completed

Already defined:

export const humanReviewCompletedPayload = z.object({
  decision: z.enum(['confirm', 'override']),
  diagnoses: z.array(z.object({ snomed_code: z.string(), label: z.string() })).optional(),
  reviewer_id: z.string(),
});

Emitted on successful submission. correlation_id echoes the original human_review.requested correlation id.

  • decision='confirm': diagnoses is the snapshotted AI diagnoses (so the orchestrator's context — and clinical-api's downstream projection — receives the actual codes).
  • decision='override': diagnoses is the reviewer's submitted SNOMED list.
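
Mapping a submitted Review onto this payload could look roughly like the sketch below. The SubmittedReview type and toCompletedPayload are hypothetical names. Using Review.submitted_by_reviewer_id as reviewer_id is an assumption, as is dropping the fields the event schema does not declare (ai_diagnosis_id, confidence, notes).

```typescript
interface CodedDiagnosis {
  snomed_code: string;
  label: string;
}

// Hypothetical view of a submitted Review row (section 2.2 columns).
type SubmittedReview =
  | {
      decision: 'confirm';
      submitted_by_reviewer_id: string;
      decision_payload: {
        confirmed_ai_diagnoses: Array<CodedDiagnosis & { ai_diagnosis_id: string }>;
      };
    }
  | {
      decision: 'override';
      submitted_by_reviewer_id: string;
      decision_payload: {
        diagnoses: Array<CodedDiagnosis & { confidence?: number; notes?: string }>;
      };
    };

interface CompletedPayload {
  decision: 'confirm' | 'override';
  diagnoses?: CodedDiagnosis[];
  reviewer_id: string;
}

function toCompletedPayload(review: SubmittedReview): CompletedPayload {
  // confirm: snapshotted AI diagnoses; override: the reviewer's SNOMED list.
  const source: CodedDiagnosis[] =
    review.decision === 'confirm'
      ? review.decision_payload.confirmed_ai_diagnoses
      : review.decision_payload.diagnoses;
  return {
    decision: review.decision,
    // Keep only the two fields the event schema declares.
    diagnoses: source.map(({ snomed_code, label }) => ({ snomed_code, label })),
    reviewer_id: review.submitted_by_reviewer_id,
  };
}
```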

4.3 Outbound: human_review.failed

Already defined:

export const humanReviewFailedPayload = z.object({
  reason_code: z.string(),
  retryable: z.boolean(),
});

Emitted in two cases:

  1. Decline cap exhausted: reason_code='no_reviewer_accepted', retryable=false. After 3 declines, the review transitions to declined_exhausted and the orchestrator halts the workflow.
  2. Admin force-cancel: reason_code='cancelled_by_admin', retryable=false.

The reason_code references the ReasonCode registry (scope=step_failure for these system-emitted codes; the per-reviewer decline reasons themselves are scope=human_decline and live on the ReviewClaim row, not the failure event).

4.4 Transport

Redis Streams. Consumer group: human-review. Stream retention: 7 days (@sa-platform/events default).

4.5 Consumer/publisher matrix

Service        Consumes                                      Publishes
human-review   human_review.requested                        human_review.completed, human_review.failed
orchestrator   human_review.completed, human_review.failed   human_review.requested

4.6 Idempotency

  • Inbound: correlation_id is the dedup key. Review.correlation_id is UNIQUE. Duplicate consumer redeliveries fail at insert time and are silently acked. Same pattern as orchestrator's WorkflowEvent dedup.
  • Outbound: event_id (from makeEnvelope) is a fresh UUID per emission; orchestrator's WorkflowEvent PK enforces dedup on its side.

4.7 Service-to-service synchronous calls

Two synchronous calls human-review makes:

  1. clinical-api.getImagePresignedUrls(caseId) — when a reviewer GETs a review detail. Auth via service JWT with the clinical-api:read-images scope. (This work adds the scope if it is not yet present.)
  2. user-management.getUser(userId) — when listing reviewers (to display name/email). Auth via service JWT with user-management:read scope. Lightweight; cached in-memory for ~5 min per user.

Both calls follow the consent-client pattern from ai-review (Task 13 of the orchestrator plan).
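
The in-memory cache for user lookups (roughly 5 minutes per user) could be as small as the sketch below. TtlCache and UserSummary are hypothetical names. The clock is injectable so that expiry can be tested without waiting, and the cache is per-process, so each service instance would hold its own copy.

```typescript
// Minimal time-to-live cache keyed by string.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined when missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// ASSUMPTION: the fields the reviewer list needs (see open question 2).
interface UserSummary {
  id: string;
  email: string;
  name: string;
}

const FIVE_MINUTES_MS = 5 * 60 * 1000;
```

A client wrapper would call get(userId) first and fall back to user-management.getUser(userId) on a miss, storing the result with set.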

4.8 Migration considerations

  • No existing reviews to backfill — greenfield service
  • Orchestrator already defines and emits human_review.requested (seeded into Redis Streams by orchestrator PR #29). Until human-review is deployed, those events sit unconsumed — orchestrator's per-step timeout halts the workflow with workflow_timeout after 24h. Acceptable transitional behavior.
  • After deployment: human-review's consumer starts processing the backlog of unconsumed human_review.requested events. May need ops attention if a large backlog accumulated.

5. Scheduling, retries, observability

5.1 Scheduling

No internal scheduling needed in v1. The only "timer" concern is the per-step timeout, which the orchestrator owns (24h for customer_clinician, 7d for sa_qa per the workflow definition). When a review sits unclaimed past the orchestrator's deadline, the orchestrator halts its own workflow with workflow_timeout — human-review just sees its Review row stay in queued state forever. (Eventually cleaned up by the retention policy; separate concern.)

No BullMQ. No background workers in v1. The poll-loop consumer reads from Redis Streams (same pattern as orchestrator).

5.2 Observability

Structured logs via NestJS Logger; correlation_id in every log line.

Metrics (Prometheus-shaped counters / histograms / gauges):

  • human_review_created_total{tier,org}
  • human_review_claimed_total{tier}
  • human_review_submitted_total{tier,decision}
  • human_review_declined_total{tier,reason_code}
  • human_review_decline_exhausted_total{tier}
  • human_review_queue_depth{tier,org} gauge
  • human_review_time_to_claim_seconds{tier} histogram
  • human_review_time_to_submit_seconds{tier} histogram

Health endpoints: /health (liveness), /health/ready (incl. Redis Streams + DB reachability).

Audit trail: ReviewAuditLog table holds per-state-transition entries. No separate audit-service hookup needed for v1.


6. Testing strategy

6.1 Unit tests (~40)

  • DTO validators (decision payloads, decline reason validation)
  • Atomic-claim transaction (race condition test using two concurrent claims against an in-memory mock)
  • Decline cap logic (count, exhaust, emit)
  • Suggested-ranking score calculation
  • Service-to-service client wrappers (clinical-api image URL fetch, user-management user fetch — both mocked)
  • ReviewAuditLog write paths
  • Tenancy enforcement (org-scoped vs cross-tenant)
  • Reviewer admin CRUD

6.2 Integration tests (~7 specs)

Real MySQL + Redis via testcontainers. msw-style mocks for clinical-api / user-management.

  1. Happy path — human_review.requested → claim → submit (override) → human_review.completed emitted with right correlation_id and reviewer's SNOMED list
  2. Confirm-AI — claim → submit (confirm) → emitted event has snapshotted AI diagnoses
  3. Decline + re-queue — claim → decline → review back to queued, decline_count=1
  4. Decline cap exhausted — 3 declines → declined_exhausted + human_review.failed emitted
  5. Atomic-claim race — two simultaneous claims → exactly one wins, other gets 409
  6. Tenancy isolation — reviewer in org X cannot see/claim reviews from org Y; SA cross-tenant reviewer can
  7. Idempotency — duplicate human_review.requested (same correlation_id) doesn't create duplicate Review

6.3 No live external services in CI

clinical-api and user-management calls mocked via msw. Real MySQL + Redis via testcontainers.


7. Out-of-scope / deferred

  • Reviewer-facing UI (separate frontend track)
  • Per-finding decision granularity
  • Draft state / partial-review save
  • Push-assigned queue (auto-assignment based on availability)
  • Amendment of submitted decisions
  • Adjudication when AI/clinician disagree
  • Reviewer presence/availability tracking (ReviewerSession cut from v1)
  • Per-org "show AI / hide AI" toggle (always shown in v1)
  • Subspecialty matching (single specialty string field)
  • Dedicated audit service hookup (existing in-DB ReviewAuditLog is the audit primitive)
  • Routing rules beyond the suggested-ranking heuristic (no ML, no learned weights)
  • Reviewer panels with shift schedules, handoffs, breaks
  • Per-product clinical decision support (CDS) hooks

8. Open questions for implementation plan

  1. Confirm clinical-api:read-images scope exists or add it as part of this PR's cross-service step. Verify against current packages/auth-client/src/auth.types.ts SCOPES.
  2. user-management's getUser(userId) REST shape — confirm it exists and matches what we need (id, email, name). If not present, light extension required.
  3. Service-to-service auth pattern — orchestrator's services use a shared service JWT minted with specific scopes. Confirm the issuance flow for human-review's outbound calls.
  4. ReasonCode lookup at decline time — human-review calls orchestrator's GET /reason-codes?scope=human_decline&org_id=... to validate the code, OR caches the registry locally with TTL. Design choice for the implementation plan.