simulator

Workspace-internal load and seed simulator for the clinical-data-model platform. Lives at apps/simulator/ (package: @sa-platform/simulator).

The simulator drives the canonical case lifecycle exactly as a real product would — token from auth, patient + case in clinical-api, image upload, submit, poll for result, auto-completion of human-review tasks via admin-api — at a configurable rate, against a synthetic organisation in an explicitly allow-listed target environment. It records seedable runs, asserts thresholds (error rate, p95 latency), and refuses to point at production by hard-coded design.

Prerequisites

Phase 1's upstream platform deliverables (Prereqs A–D) all landed in May 2026:

✅ Organisation.synthetic flag (clinical-api), immutable once set, gated on a mark-synthetic admin endpoint.
✅ admin-api: POST /api/admin/orgs/{orgId}/purge-data — refuses non-synthetic orgs.
✅ admin-api: POST /api/admin/reviews/{reviewId}/complete-as-system — refuses non-synthetic-org reviews.
✅ scripts/dev/provision-sim-org.ts — bootstraps a synthetic org + OAuth2 service account.

The integration tests under apps/simulator/test/integration/ are gated by RUN_SIMULATOR_INTEGRATION=1; without that env var they're skipped (so CI without a platform stack stays green).

You'll also need:

Docker Desktop / Colima running (for scripts/dev/start-all.sh).
Image fixtures dropped into apps/simulator/fixtures/valid/ (and optionally fixtures/invalid/). See apps/simulator/fixtures/PROVENANCE.md.
A populated apps/simulator/.env (copy from .env.example, then run pnpm sim:provision >> apps/simulator/.env).

Quickstart

Working directory: pnpm sim is a thin wrapper around pnpm -C apps/simulator sim, so it cds into apps/simulator/ before invoking the CLI. Profile paths are resolved against apps/simulator/, not the repo root. Pass profiles/local-trickle.yaml (or an absolute path), not apps/simulator/profiles/local-trickle.yaml. The CLI prints a hint if you trip on this.

# From repo root, with the docker stack running:
pnpm install
cp apps/simulator/.env.example apps/simulator/.env
# populate the SIM_* values from the provisioning script's output

# Smoke test — runs a single scenario end-to-end:
pnpm sim one-shot profiles/local-one-shot.yaml

# Continuous trickle — until you Ctrl-C or touch the kill-switch file:
pnpm sim trickle profiles/local-trickle.yaml

Subcommands

The CLI is exposed as pnpm sim <subcommand> (which runs tsx src/cli.ts under the hood, see apps/simulator/package.json). All subcommands return one of the documented exit codes; nothing in the runner ever throws past the boundary.

| Command | Purpose |
| --- | --- |
| `sim trickle <profile>` | Run a trickle profile until SIGTERM/SIGINT or until the kill-switch file is touched. |
| `sim one-shot <profile>` | Single scenario invocation (smoke test). |
| `sim teardown <runId>` | Print teardown instructions (in Phase 1, operators run the purge manually). |
| `sim list-runs` | List local run summaries under `./runs/`. |

`sim trickle <profile>`

Runs the profile's scenario continuously at the configured arrival rate until either SIGTERM/SIGINT or the kill-switch file is touched. Drains in-flight tasks for up to 30 s on shutdown, writes the run summary atomically, and exits with the assertion-based exit code.

pnpm sim trickle profiles/local-trickle.yaml
# stop with Ctrl-C, or:
touch /tmp/sim-stop-<runId>   # path is logged at startup

`sim one-shot <profile>`

Same as trickle, but the run loop stops after the first scenario invocation. Used as a smoke test from CI and when iterating on new scenarios — produces the same summary.json + cases.ndjson outputs.

pnpm sim one-shot profiles/local-one-shot.yaml

`sim teardown <runId>`

Reads runs/<runId>/summary.json, extracts the synthetic org id, and prints instructions for purging that org's data via admin-api. Phase 1 doesn't invoke the purge directly — operators run it manually so the destructive call has explicit human approval. Pair with --output-dir if your runs are stored elsewhere.

pnpm sim teardown 01HZX...   # the runId logged at the end of a run

`sim list-runs`

Lists local run summaries — one line per run with status, start time, and case count. Reads from ./runs/ by default (override with --output-dir).

pnpm sim list-runs

Global options (`trickle` and `one-shot`)

--seed <n> — override profile seed; recorded in summary for reproducibility.
--kill-switch-path <path> — override default /tmp/sim-stop-<runId>.
--output-dir <path> — override default ./runs/.
--fixtures-dir <path> — override default ./fixtures/.
--dry-run — validate config + safety pre-flight, do not run.
--name <orgname> — resolve an org by display name via admin-api and override the profile's target.syntheticOrgId. Also accepts the bareword form name=<orgname> so that pnpm sim trickle <profile> name=<orgname> works as written. The org must exist (404 ⇒ exit 1) and pass the synthetic-org pre-flight (non-synthetic ⇒ exit 3); the resolved id is recorded in the run summary.

Quote multi-word names so the shell doesn't split them into separate argv tokens:

pnpm sim trickle profiles/local-trickle.yaml name="Sim Org"
# or, equivalently:
pnpm sim trickle profiles/local-trickle.yaml --name "Sim Org"

Without quotes, name=Sim Org reaches the CLI as two tokens (name=Sim and Org); the CLI rejects the stray positional with [sim] too many arguments for 'trickle'… plus a quoting hint.

--client-id <id> — OAuth client id provisioned via the admin UI's Provision API client modal (on the integration page). When set, the simulator rotates that client's secret on startup using its admin secret and uses the fresh plaintext in memory only — operators don't have to copy-paste one-shot secrets into apps/simulator/.env. Also accepts the bareword form client_id=<id>. The previous secret is revoked immediately (overlap=0); audit log records the rotation as api_client.secret_rotated.service. When unset, the simulator reads SIM_SERVICE_ACCOUNT_CLIENT_ID + SIM_SERVICE_ACCOUNT_CLIENT_SECRET from env.

# Switch orgs without touching .env:
pnpm sim trickle profiles/local-trickle.yaml name=UHB client_id=<uhb-client-id>

This is the recommended flow for any environment where you provision OAuth clients via the admin UI — the one-shot secret reveal modal stays one-shot, but the simulator can still authenticate.

--rate <spec> — override the trickle profile's arrivals.intervalMs for this run. Lets a single profile cover background trickle, normal load, and burst by passing different flags. Also accepts the bareword form rate=<spec>. Only valid when arrivals.kind: trickle; other arrival shapes return a config error.

Accepts either a preset or an explicit rate:

| Form | Examples | Effective rate |
| --- | --- | --- |
| Preset | `low` | 6 cases/hour (one every 10 min) |
| Preset | `medium` | 60 cases/hour (one per minute) |
| Preset | `high` | 600 cases/hour (one every 6 s) |
| Explicit | `100/hour`, `100/h` | per-hour rate |
| Explicit | `5/min`, `5/m` | per-minute rate |
| Explicit | `2/sec`, `2/s` | per-second rate |

pnpm sim trickle profiles/local-trickle.yaml rate=high      # burst (~6s)
pnpm sim trickle profiles/local-trickle.yaml rate=medium    # ~60s
pnpm sim trickle profiles/local-trickle.yaml rate=low       # ~10min
pnpm sim trickle profiles/local-trickle.yaml rate=100/hour  # explicit
pnpm sim trickle profiles/local-trickle.yaml --rate 5/min   # --flag form also works

The resolved intervalMs is logged at startup ([sim] --rate high → intervalMs=6000) and recorded in runs/<runId>/summary.json so the rate used is recoverable after the fact.
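The mapping from a rate spec to an interval can be sketched as below. This is an illustration only, not the CLI's actual parser: the function name `rateToIntervalMs` is hypothetical, and rejecting fractional or zero counts is an assumption the text above does not confirm. The preset values and unit suffixes are taken from the table.

```typescript
// Presets from the table above, in cases per hour.
const PRESETS_PER_HOUR = new Map<string, number>([
  ["low", 6],
  ["medium", 60],
  ["high", 600],
]);

// Milliseconds per accepted unit suffix.
const UNIT_MS = new Map<string, number>([
  ["hour", 3600000],
  ["h", 3600000],
  ["min", 60000],
  ["m", 60000],
  ["sec", 1000],
  ["s", 1000],
]);

// Hypothetical helper: converts a rate spec into an arrival interval in ms.
function rateToIntervalMs(spec: string): number {
  const preset = PRESETS_PER_HOUR.get(spec);
  if (preset !== undefined) return Math.round(3600000 / preset);

  // Explicit form: <positive integer>/<unit>. Integer-only is an assumption.
  const match = /^(\d+)\/(hour|h|min|m|sec|s)$/.exec(spec);
  const unitMs = match ? UNIT_MS.get(match[2] ?? "") : undefined;
  const count = match ? Number(match[1]) : 0;
  if (unitMs === undefined || count <= 0) {
    throw new Error(`invalid rate spec: ${spec}`);
  }
  return Math.round(unitMs / count);
}
```

Under these assumptions `rateToIntervalMs("high")` yields 6000, which agrees with the startup log line quoted above, and `rateToIntervalMs("5/min")` yields 12000.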

--force — bypass the synthetic-org pre-flight check. Also accepts the bareword form force=true (only the literal force=true opts in — force=false or anything else leaves the check in place). When set, the runner still calls getOrg to capture hasDermConfig, but skips the assertion and emits a loud safety: synthetic-org check BYPASSED via --force WARN line so the override is unmissable in the run audit trail. Intended for operators who knowingly want to seed synthetic data into a non-synthetic staging tenant — do not use this against production-shaped data.
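One way to picture the bareword forms is as a normalisation pass that rewrites `key=value` tokens into their flag equivalents before ordinary flag parsing. The sketch below is an assumption about the approach, not the CLI's real code: `normaliseBarewords` is a hypothetical name, and silently dropping `force=false` (rather than reporting it) is a guess consistent with "anything else leaves the check in place".

```typescript
// Bareword keys documented above, mapped to their flag form.
const BAREWORD_FLAGS = new Map<string, string>([
  ["name", "--name"],
  ["client_id", "--client-id"],
  ["rate", "--rate"],
]);

// Hypothetical helper: rewrites bareword tokens into flag form and leaves
// every other token (profile paths, real flags) untouched.
function normaliseBarewords(argv: readonly string[]): string[] {
  const out: string[] = [];
  for (const token of argv) {
    const eq = token.indexOf("=");
    const key = eq > 0 ? token.slice(0, eq) : "";
    const value = eq > 0 ? token.slice(eq + 1) : "";
    const flag = BAREWORD_FLAGS.get(key);
    if (flag !== undefined) {
      out.push(flag, value);
    } else if (key === "force") {
      // Only the literal force=true opts in; any other value is ignored here,
      // which leaves the synthetic-org check in place.
      if (value === "true") out.push("--force");
    } else {
      out.push(token);
    }
  }
  return out;
}
```

With a correctly quoted name, the shell delivers `name=Sim Org` as a single token, so the helper emits `--name` followed by `Sim Org`. An unquoted name arrives as two tokens, and the stray `Org` passes through as an extra positional, which is the case the CLI rejects.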

Example — point the simulator at a different synthetic org without editing YAML:

pnpm sim trickle profiles/local-trickle.yaml name=sim-staging
# or, equivalently:
pnpm sim trickle profiles/local-trickle.yaml --name sim-staging

Profile YAML

Profiles live in apps/simulator/profiles/. Schema enforced by Zod at load time. See profiles/README.md for conventions; profiles/local-trickle.yaml is the canonical reference.

Non-secret config goes in YAML, secrets in env. Reference env vars with ${VAR_NAME} — interpolation happens before validation.
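A minimal sketch of interpolate-then-validate, assuming a plain text substitution over the raw YAML before it is parsed. The helper name `interpolateEnv` and the variable `SIM_ORG_ID` are hypothetical; the error wording mirrors the "Required env var … is unset" message listed under Troubleshooting, and treating an empty string as unset is an assumption.

```typescript
// Hypothetical helper: substitutes ${VAR_NAME} references in raw profile text.
// Runs before schema validation, so the validator only ever sees resolved values.
function interpolateEnv(
  text: string,
  env: Record<string, string | undefined>,
): string {
  return text.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_whole, name: string) => {
    const value = env[name];
    if (value === undefined || value === "") {
      throw new Error(`Required env var ${name} is unset.`);
    }
    return value;
  });
}

// Example: SIM_ORG_ID is an illustrative variable name, not one the simulator defines.
const rawProfile = "target:\n  syntheticOrgId: ${SIM_ORG_ID}\n";
const resolvedProfile = interpolateEnv(rawProfile, { SIM_ORG_ID: "org_123" });
```

Failing fast on an unset variable keeps a half-resolved profile from ever reaching the schema check.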

DERM integration

By default the simulator auto-detects whether to trigger a DERM (third-party AI medical device) review per case: during pre-flight it checks the resolved org's hasDermConfig flag and enables the DERM path if and only if the org has DERM credentials provisioned. Provisioning DERM credentials is a strong signal that you want runs to exercise them, so no extra configuration is needed. Profile flags can pin the behaviour when you need repeatable runs:

dermReview:
  enabled: true # force-on (even without creds; platform will mark `no_config`)
  # enabled: false   # force-off (don't trigger even with creds)
  # omit `enabled`   # auto-detect from org's DERM config presence (recommended)
  pollIntervalMs: 5000 # default: 5s
  pollTimeoutMs: 300000 # default: 5 min

Prerequisites for the DERM path to actually succeed:

The synthetic org must have DERM credentials configured via the admin UI (PATCH /api/orgs/:id/derm-config, which the admin UI relays to clinical-api at PATCH /v1/clinical-api/admin/organisations/:id/derm-config).
The API client provisioned by pnpm sim:provision must carry the derm_review:read + derm_review:write scopes (granted by default).

If enabled: true is set explicitly but the org has no DERM config, the platform-side worker will mark the review failed with error_code: no_config — the simulator records it as a failed DERM outcome but the scenario itself still succeeds (case ingestion is fine, the DERM trigger is independent). Outcome metric: sim.derm_review.count{outcome=completed|failed|timeout}.

DERM is dermoscopic-only, so cases the image picker selected as macro_only (no dermoscopic image) skip the DERM trigger entirely — the simulator detects the contradiction before calling and bumps sim.derm_review.skipped{reason=macro_only_case}. Without this guard, every macro_only case with dermReview.enabled would land as failed at triggerDermReview → 409 no_dermoscopic_image.
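The decision rules in this section can be condensed into one function. This is a sketch under assumptions: the names `decideDermReview` and `DermDecision` are hypothetical, and the ordering (resolve the enabled flag first, then apply the macro_only guard) is inferred from the description above rather than taken from the runner's source.

```typescript
type CaseConfiguration = "derm_only" | "derm_plus_macro" | "macro_only";

type DermDecision =
  | { action: "trigger" }
  | { action: "skip"; reason: "macro_only_case" }
  | { action: "off" };

// Hypothetical helper combining the profile flag, the org's DERM config, and
// the case configuration chosen by the image picker.
function decideDermReview(
  enabled: boolean | undefined, // dermReview.enabled; undefined means auto-detect
  hasDermConfig: boolean, // captured from the org during pre-flight
  configuration: CaseConfiguration,
): DermDecision {
  // An explicit profile value wins; otherwise follow the org's DERM config.
  const dermOn = enabled ?? hasDermConfig;
  if (!dermOn) return { action: "off" };

  // DERM needs a dermoscopic image, so macro_only cases are skipped up front
  // instead of failing at the trigger call.
  if (configuration === "macro_only") {
    return { action: "skip", reason: "macro_only_case" };
  }
  return { action: "trigger" };
}
```

For instance, `decideDermReview(undefined, true, "macro_only")` produces a skip with reason `macro_only_case`, while `decideDermReview(true, false, "derm_only")` still triggers and leaves the platform to report `no_config`.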

Case failures in the terminal

For each failed scenario the runner logs an error line to stderr (in addition to recording the failure in summary.json). The line carries the step, status, correlationId, and the reason from the response body, taken from the body's detail, message, or title field (detail and title are the RFC 7807 problem-details members). 503 responses from clinical-api include the upstream cause (e.g. workflow_check_unavailable: orchestrator at http://… unreachable (ECONNREFUSED)), so operators can diagnose without tailing the service logs.
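A sketch of how a reason might be pulled out of an error body. The helper name `extractReason` is hypothetical, and the precedence shown (detail, then message, then title) is an assumption based on the order the fields are listed in above.

```typescript
// Hypothetical helper: returns the first non-empty string among the
// detail, message, and title fields of a parsed response body.
function extractReason(body: unknown): string | undefined {
  if (typeof body !== "object" || body === null) return undefined;
  const record = body as Record<string, unknown>;
  for (const key of ["detail", "message", "title"]) {
    const value = record[key];
    if (typeof value === "string" && value.trim() !== "") return value;
  }
  return undefined;
}
```

Preferring `detail` over `title` favours the most specific text a problem-details body offers, which is usually what an operator wants in a one-line log entry.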

Run outputs

Every run produces:

runs/<runId>/summary.json — run metadata, config, scenario counts, full metrics, assertion results, errors.
runs/<runId>/cases.ndjson — append-only ledger, one line per scenario invocation, with per-step timings.

The summary file is first written with status: 'running' so an active run is discoverable mid-flight, then atomically replaced (temp file + rename) on exit.
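The temp-plus-rename pattern can be sketched with Node's synchronous fs calls. This is an illustration, not the runner's code: `writeSummaryAtomic`, the temp-file naming, and the final status value `completed` are all assumptions.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: writes summary.json via a sibling temp file so readers
// never observe a partially written summary.
function writeSummaryAtomic(runDir: string, summary: object): void {
  const finalPath = join(runDir, "summary.json");
  const tempPath = `${finalPath}.tmp-${process.pid}`;
  writeFileSync(tempPath, JSON.stringify(summary, null, 2));
  // rename replaces the target in one step when both paths are on the same filesystem.
  renameSync(tempPath, finalPath);
}

// Usage: an initial "running" summary, later replaced by the final one.
const runDir = mkdtempSync(join(tmpdir(), "sim-run-"));
writeSummaryAtomic(runDir, { status: "running" });
writeSummaryAtomic(runDir, { status: "completed" });
const finalSummary = JSON.parse(readFileSync(join(runDir, "summary.json"), "utf8"));
```

Keeping the temp file in the same directory as the target matters, since a rename across filesystems is not atomic.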

Adding a new scenario

Create src/scenarios/<your-scenario>.ts exporting a Scenario (see src/scenarios/scenario.ts for the type).
Add the name to ScenarioNameSchema in src/config/profile-schema.ts.
Wire it into the runner's scenario lookup (currently a single import; will become a registry as the set grows).
Add a <your-scenario>.spec.ts exercising the happy path and at least one failure path with a stub PlatformClient.

Adding image fixtures

Layout (capture-type-tagged):

apps/simulator/fixtures/
├── valid/
│   ├── dermoscopic/   # passes validation, capture_type=dermoscopic
│   └── macroscopic/   # passes validation, capture_type=macroscopic
└── invalid/           # rejected by clinical-api's image validation

Drop files into the appropriate subdir, then append a row to fixtures/PROVENANCE.md. Files larger than ~10MB or with provenance constraints should go to fixtures/local-only/ (gitignored) — record the absence in PROVENANCE.md.

Loose files in valid/ (no capture-type subdir) are still picked up and treated as dermoscopic for backwards compatibility. Prefer the subdirs for new fixtures.

The simulator submits one of three valid case configurations per the operator spec:

| Configuration | Shape |
| --- | --- |
| `derm_only` | exactly one dermoscopic image |
| `derm_plus_macro` | one dermoscopic + (≥2) macroscopic images |
| `macro_only` | no dermoscopic + (≥2) macroscopic images |

Configurations are chosen by relative weight. The macroscopic count for derm_plus_macro / macro_only is drawn uniformly from macroscopicCount.{min,max} (both bounds must be ≥ 2). Defaults: 100% derm_only, all valid — matches the platform's existing happy-path expectation so a profile that ships only dermoscopic fixtures keeps working unchanged.

imageMix:
  configurations:
    derm_only: 1 # weights — don't have to sum to 1
    derm_plus_macro: 1
    macro_only: 1
  macroscopicCount:
    min: 2
    max: 4
  invalidProbability: 0.05 # per-slot probability the bytes come from invalid/

invalidProbability substitutes the slot's bytes from the invalid pool while keeping the slot's wire-format capture_type the same — so the platform sees the configuration's intended request shape but the bytes are wrong.
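The weighted choice and the uniform macroscopic count can be sketched with a seedable generator, so the same seed reproduces the same plan. Everything here is illustrative: the function names are hypothetical, mulberry32 is a well-known small PRNG swapped in for whatever generator the simulator really uses, and `invalidProbability` is left out for brevity.

```typescript
type Configuration = "derm_only" | "derm_plus_macro" | "macro_only";

interface ImageMix {
  configurations: Record<Configuration, number>; // relative weights
  macroscopicCount: { min: number; max: number };
}

interface CasePlan {
  configuration: Configuration;
  dermoscopic: number;
  macroscopic: number;
}

// mulberry32: a small seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Picks a configuration with probability proportional to its weight.
function pickConfiguration(mix: ImageMix, rand: () => number): Configuration {
  const entries = Object.entries(mix.configurations) as [Configuration, number][];
  const total = entries.reduce((sum, [, weight]) => sum + Math.max(weight, 0), 0);
  if (total <= 0) throw new Error("at least one configuration weight must be positive");
  let remaining = rand() * total;
  let last: Configuration | undefined;
  for (const [name, weight] of entries) {
    if (weight <= 0) continue;
    last = name;
    remaining -= weight;
    if (remaining < 0) return name;
  }
  return last as Configuration; // only reachable through floating-point rounding
}

// Draws the macroscopic image count uniformly from [min, max], inclusive.
function pickMacroscopicCount(mix: ImageMix, rand: () => number): number {
  const { min, max } = mix.macroscopicCount;
  if (min < 2 || max < min) throw new Error("bounds must satisfy 2 <= min <= max");
  return min + Math.floor(rand() * (max - min + 1));
}

function planCase(mix: ImageMix, rand: () => number): CasePlan {
  const configuration = pickConfiguration(mix, rand);
  const macroscopic = configuration === "derm_only" ? 0 : pickMacroscopicCount(mix, rand);
  const dermoscopic = configuration === "macro_only" ? 0 : 1;
  return { configuration, dermoscopic, macroscopic };
}

// Mirrors the YAML example above: equal weights, 2 to 4 macroscopic images.
const exampleMix: ImageMix = {
  configurations: { derm_only: 1, derm_plus_macro: 1, macro_only: 1 },
  macroscopicCount: { min: 2, max: 4 },
};
```

Because the weights are relative, `{ derm_only: 1, derm_plus_macro: 1, macro_only: 1 }` and `{ derm_only: 5, derm_plus_macro: 5, macro_only: 5 }` describe the same distribution.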

Metrics:

sim.case.configuration{configuration} — which configuration each case used.
sim.image.picked{capture_type, validity} — per-slot rollup of the planned cases.

Safety guardrails: what they refuse and why

| Guardrail | Refuses | Why |
| --- | --- | --- |
| Host allow-list | Any URL not matching the env's regex set | No `prod` key exists in the allow-list. Production targeting is hard-refused; see `src/safety/allow-list.ts`. |
| Synthetic-org pre-flight | Any org where admin-api reports `synthetic: false` (overridable via `--force` / `force=true`, which emits a loud WARN audit line) | Belt-and-braces: tenant isolation + admin-api server-side guard + this client-side check. The override exists for staging tenants that were never explicitly marked synthetic; it does not unlock prod (the host allow-list still refuses). |
| Concurrency cap (schema) | `concurrency > 20` | v1 ceiling per spec §1.8; phase 2/3 may relax. |
| Kill switch | n/a (orderly drain) | SIGTERM/SIGINT and a poison-pill file both stop the run gracefully (drains in-flight up to 30s). |

There is no --force-prod flag. There will not be one. If a future legitimate need arises, it requires a code change reviewed under the same SDLC controls as any other safety-critical change. The --force flag described above only relaxes the synthetic-org assertion — the host allow-list (which is what refuses prod) is not bypassable.
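The "no prod key" idea can be shown in a few lines. The patterns and the function name `assertHostAllowed` below are hypothetical and exist only for illustration; the real set lives in src/safety/allow-list.ts. The error wording follows the allow-list message listed under Troubleshooting.

```typescript
// Hypothetical patterns for illustration only.
// There is deliberately no "prod" entry, so no URL can ever match for prod.
const ALLOW_LIST = new Map<string, RegExp[]>([
  ["local", [/^http:\/\/(localhost|127\.0\.0\.1)(:\d+)?$/]],
  ["staging", [/^https:\/\/[a-z0-9-]+\.staging\.example\.test$/]],
]);

// Throws unless the URL's origin matches a pattern registered for the env.
function assertHostAllowed(env: string, baseUrl: string): void {
  const patterns = ALLOW_LIST.get(env) ?? [];
  const origin = new URL(baseUrl).origin; // scheme + host + port, no path
  if (!patterns.some((pattern) => pattern.test(origin))) {
    throw new Error(`baseUrl not in allow-list for env ${env}.`);
  }
}
```

An unknown env resolves to an empty pattern list, so refusal is the default and allowing a new environment requires a code change.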

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Run completed; assertions passed (or none defined). |
| 1 | Runtime error (config loading failed, unexpected exception). |
| 2 | Threshold assertion breached. |
| 3 | Safety violation: host not allow-listed, or org not synthetic. |
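For wrappers that need to reason about these codes, the table can be expressed as a small mapping. The `RunOutcome` type and `exitCodeFor` function are hypothetical names for illustration; only the numeric codes come from the table.

```typescript
// Hypothetical outcome type; the runner's real types are not shown in this document.
type RunOutcome =
  | { kind: "completed"; assertionsBreached: number }
  | { kind: "runtime_error" }
  | { kind: "safety_violation" };

// Maps an outcome to the documented process exit code.
function exitCodeFor(outcome: RunOutcome): 0 | 1 | 2 | 3 {
  switch (outcome.kind) {
    case "safety_violation":
      return 3;
    case "runtime_error":
      return 1;
    case "completed":
      return outcome.assertionsBreached > 0 ? 2 : 0;
  }
}
```

A completed run with no assertions defined has zero breaches and therefore maps to 0, in line with the first row of the table.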

Troubleshooting

"Org X is not marked synthetic. Refusing to run." The Organisation.synthetic flag is missing or false. Run the provisioning script to create a properly marked org, or have an operator with platform-admin scope mark the existing org synthetic via admin-api.

"baseUrl not in allow-list for env Y." Either the env is wrong (e.g. you said local but pointed at staging) or the URL doesn't match the allow-list pattern. See src/safety/allow-list.ts.

"No valid image fixtures found under .../fixtures/valid/." Drop at least one image into fixtures/valid/. The simulator never generates images.

"Token request failed: 4xx." Check SIM_SERVICE_ACCOUNT_CLIENT_ID and SIM_SERVICE_ACCOUNT_CLIENT_SECRET. The provisioning script's output is the source of truth for these values.

"Required env var SIM_… is unset." Copy .env.example to .env and populate. For staging, secrets come from AWS Secrets Manager — don't put them in .env.

"Request failed: POST /v1/clinical-api/cases/…/derm-reviews → 403" (with a scope-fix hint appended) The simulator's service account is missing the derm_review:write scope. Service accounts provisioned before 2026-05-05 don't have it (the scope was added to scripts/dev/provision-sim-org.ts in commit a77d987). Two fixes:

Patch the existing client in place (preserves your org and data). The endpoint lives on auth-service (port 3001), which authenticates via Authorization: Bearer <ADMIN_API_SECRET> (note: this differs from admin-api, which uses x-admin-api-secret; see services/auth/src/admin/admin-auth.guard.ts). The command reads both the client id and the admin secret from local .env files, so run it from the repo root; no shell exports are needed:

# -f2- keeps any '=' characters that appear inside the value
CLIENT_ID=$(grep '^SIM_SERVICE_ACCOUNT_CLIENT_ID=' apps/simulator/.env | cut -d= -f2-) && \
SECRET=$(grep '^ADMIN_API_SECRET=' services/auth/.env | cut -d= -f2-) && \
curl -X PATCH "http://localhost:3001/v1/auth/admin/api-clients/$CLIENT_ID" \
  -H "Authorization: Bearer $SECRET" \
  -H "content-type: application/json" \
  -d '{"scopes":["patients:read","patients:write","cases:read","cases:write","images:read","images:write","derm_review:read","derm_review:write"]}'

Or re-provision a fresh org via pnpm sim:provision and update apps/simulator/.env with the new credentials. This gives you cleaner state, but it means a new org_id, and your existing test data is orphaned.

Phase status

Phase 1 ships: scaffold, submitCase scenario, trickle arrivals, console + JSON exporters, safety, seedable runs, threshold assertions.

Phase 2 (planned): Poisson and ramp-spike arrivals, JSON metrics export polish, run-summary CLI, per-runId teardown.

Phase 3 (planned): OTLP metrics exporter, CI integration (per-PR ephemeral env smoke), staging profile + Secrets Manager loader, audit shipping.

Phase 4 (planned): SDK adapter once @sa-platform/clinical-client and friends are published.