Encryption at Rest + Histology Ingestion — Design Spec

Overview

Two related changes delivered as sequential plans:

  • Plan 12a adds application-level envelope encryption for all PHI fields, using per-patient data encryption keys (DEKs) and AES-256-GCM. Transparent to service code via Prisma middleware.
  • Plan 12b adds histology report ingestion following the same two-phase upload pattern as images, with encrypted fields from day one.

Plan 12a: Encryption at Rest

Architecture

Three components:

  1. KeyProvider: interface for wrapping/unwrapping DEKs. Two implementations:
       • LocalKeyProvider (dev/test): uses a static 256-bit master key from the ENCRYPTION_MASTER_KEY env var. Wrap = AES-256-GCM encrypt the DEK with the master key. Unwrap = decrypt.
       • KmsKeyProvider (production): uses the AWS KMS Encrypt/Decrypt API with a dedicated CMK ARN from the KMS_CMK_ARN env var.
  2. CryptoService: generates random 256-bit DEKs, encrypts/decrypts individual field values using AES-256-GCM with random per-value IVs. Caches unwrapped DEKs in RequestContext for the request lifetime.
  3. Encryption Prisma Extension: a $extends layer (similar to the existing tenancy extension) that intercepts create/update to encrypt PHI fields before write, and intercepts read results to decrypt after. Configured with a declarative map of model → field list.
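
As a rough illustration of the KeyProvider component, the sketch below shows one way the local implementation could wrap and unwrap a DEK with Node's built-in crypto module. The method signatures, the async return types, and the byte layout of the wrapped key (iv, then auth tag, then ciphertext) are assumptions; the spec only names wrapKey() and unwrapKey().

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical interface shape: the spec names wrapKey/unwrapKey but not their signatures.
interface KeyProvider {
  wrapKey(dek: Buffer): Promise<string>; // returns the wrapped DEK as base64
  unwrapKey(wrapped: string): Promise<Buffer>;
}

class LocalKeyProvider implements KeyProvider {
  private readonly masterKey: Buffer;

  constructor(masterKeyHex: string) {
    // A 256-bit key is 32 bytes, i.e. 64 hex characters.
    if (!/^[0-9a-fA-F]{64}$/.test(masterKeyHex)) {
      throw new Error("ENCRYPTION_MASTER_KEY must be 64 hex characters");
    }
    this.masterKey = Buffer.from(masterKeyHex, "hex");
  }

  async wrapKey(dek: Buffer): Promise<string> {
    const iv = randomBytes(12);
    const cipher = createCipheriv("aes-256-gcm", this.masterKey, iv);
    const ciphertext = Buffer.concat([cipher.update(dek), cipher.final()]);
    // Assumed layout: iv (12 bytes) || authTag (16 bytes) || ciphertext.
    return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
  }

  async unwrapKey(wrapped: string): Promise<Buffer> {
    const raw = Buffer.from(wrapped, "base64");
    const iv = raw.subarray(0, 12);
    const authTag = raw.subarray(12, 28);
    const ciphertext = raw.subarray(28);
    const decipher = createDecipheriv("aes-256-gcm", this.masterKey, iv);
    decipher.setAuthTag(authTag);
    // final() throws if the tag does not verify (wrong master key or tampering).
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  }
}
```

With this layout a wrapped 256-bit DEK is 60 bytes, or 80 base64 characters. A KMS-backed provider could implement the same interface, which is why the methods are async here.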

Encrypted value format

Each encrypted field is stored as a single string: base64(iv):base64(ciphertext):base64(authTag).

  • IV: 12 bytes (random per value)
  • Auth tag: 16 bytes
  • Ciphertext: variable length

This fits in a MySQL TEXT column.
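
The format above can be sketched with Node's crypto module as follows. The function names encryptField and decryptField are illustrative, not part of the spec; the IV size, tag size, and field order follow the format described above.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts one field value and returns base64(iv):base64(ciphertext):base64(authTag).
function encryptField(plaintext: string, dek: Buffer): string {
  const iv = randomBytes(12); // fresh random IV for every value
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const authTag = cipher.getAuthTag(); // 16 bytes by default
  return [iv, ciphertext, authTag].map((part) => part.toString("base64")).join(":");
}

// Parses the stored string and decrypts it; throws if the auth tag does not verify.
function decryptField(stored: string, dek: Buffer): string {
  const parts = stored.split(":");
  if (parts.length !== 3) {
    throw new Error("malformed encrypted value");
  }
  const [ivB64, ciphertextB64, authTagB64] = parts as [string, string, string];
  const decipher = createDecipheriv("aes-256-gcm", dek, Buffer.from(ivB64, "base64"));
  decipher.setAuthTag(Buffer.from(authTagB64, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(ciphertextB64, "base64")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}
```

The ":" separator is safe because the standard base64 alphabet does not contain a colon.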

Schema changes

Patient model:

  • Add encryptedDek field (@db.VarChar(512), stores the wrapped DEK as base64)

Widen PHI columns to @db.Text (ciphertext is longer than plaintext):

  • Patient: givenName, familyName, email, phone, postalCode, sexAtBirth, genderIdentity
  • Patient: dob, stored as an encrypted ISO-8601 string; the Prisma type changes to String (from DateTime) since the DB column holds ciphertext. Queries on DOB use dobHash.

Other models — change PHI columns to @db.Text:

  • SkinFinding: clinicalNotes (already Text, no change)
  • Diagnosis: freeText, notes (already Text, no change)
  • MedicationStatement: medicationFreeText, reasonFreeText, notes (some are VarChar, widen)
  • Case: clinicalContextJson (already Text, no change)
  • PatientMergeHistory: absorbedSnapshotJson (already Text, no change)
  • PatientMergeCandidate: matchFeaturesJson, candidateRankedJson (already Text, no change)

Hash columns — no change

The existing SHA-256 hash columns (emailHash, dobHash, postalCodeHash, valueHash) remain as-is. They serve equality lookup. On GDPR erasure, destroying the DEK makes the encrypted plaintext unreadable; the irreversible hashes cannot identify anyone alone.
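
For illustration, an equality-lookup hash of this kind might be computed as below. The function name and the normalisation step (trim and lower-case) are assumptions; the spec does not say how values are normalised before hashing, and the existing columns may use a different rule.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: SHA-256 of a normalised value, hex-encoded (64 characters).
// Normalisation (trim + lower-case) is an assumption, not taken from the spec.
function lookupHash(value: string): string {
  const normalised = value.trim().toLowerCase();
  return createHash("sha256").update(normalised, "utf8").digest("hex");
}
```

Because the hash is deterministic, a query such as "find the patient with this email" can compare emailHash values without ever decrypting the email column.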

DEK lifecycle

  • Create: On patient.create, generate a random 256-bit DEK, wrap it via KeyProvider.wrapKey(), store the wrapped DEK in patient.encryptedDek.
  • Read: On any query that returns PHI, retrieve patient.encryptedDek, unwrap via KeyProvider.unwrapKey(), decrypt fields. Cache the unwrapped DEK in RequestContext keyed by patient ID for the request lifetime.
  • Crypto-shred: On GDPR erasure, set patient.encryptedDek = null. All ciphertext in the DB (including backups) becomes permanently unreadable.
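
The read and crypto-shred behaviour can be sketched as a small per-request cache. The class name, the method name, and the synchronous unwrap callback are simplifications for illustration (a KMS-backed unwrap would be asynchronous); returning null for a destroyed DEK is one possible way to signal that PHI is no longer readable.

```typescript
// Hypothetical per-request cache of unwrapped DEKs, keyed by patient ID.
class RequestDekCache {
  private readonly deks = new Map<string, Buffer>();
  private readonly unwrap: (wrappedDek: string) => Buffer;

  constructor(unwrap: (wrappedDek: string) => Buffer) {
    this.unwrap = unwrap;
  }

  // Returns the unwrapped DEK, or null if the patient has been crypto-shredded.
  getDek(patientId: string, encryptedDek: string | null): Buffer | null {
    if (encryptedDek === null) {
      this.deks.delete(patientId); // never serve a stale key after erasure
      return null;
    }
    const cached = this.deks.get(patientId);
    if (cached !== undefined) {
      return cached;
    }
    const dek = this.unwrap(encryptedDek); // at most one unwrap per patient per request
    this.deks.set(patientId, dek);
    return dek;
  }
}
```

One instance would live in RequestContext and be discarded at the end of the request, so unwrapped keys never outlive it.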

DEK resolution for non-Patient models

Models containing encrypted PHI need to resolve to a patient's DEK:

  • Case: case.patientId → Patient
  • SkinFinding: finding.caseId → Case → Patient
  • Diagnosis: diagnosis.findingId → SkinFinding → Case → Patient
  • MedicationStatement: medication.caseId → Case → Patient (Case has patientId directly)
  • PatientMergeHistory: merge.absorbedPatientId → Patient
  • PatientMergeCandidate: candidate.provisionalPatientId → Patient

The encryption middleware resolves the patient ID by following these paths. For efficiency, the middleware reads the patient's wrapped DEK in a single extra query when decrypting non-Patient model results. The unwrapped DEK is cached per-request so subsequent reads for the same patient don't re-unwrap.

Encryption field map (declarative config)

const ENCRYPTED_FIELDS: Record<string, { fields: string[]; patientIdPath: string }> = {
  Patient: {
    fields: [
      'givenName',
      'familyName',
      'dob',
      'email',
      'phone',
      'postalCode',
      'sexAtBirth',
      'genderIdentity',
    ],
    patientIdPath: 'id', // Patient IS the patient
  },
  Case: {
    fields: ['clinicalContextJson'],
    patientIdPath: 'patientId',
  },
  SkinFinding: {
    fields: ['clinicalNotes'],
    patientIdPath: '$case.patientId', // requires join
  },
  Diagnosis: {
    fields: ['freeText', 'notes'],
    patientIdPath: '$finding.$case.patientId', // requires nested join
  },
  MedicationStatement: {
    fields: ['medicationFreeText', 'reasonFreeText', 'notes'],
    patientIdPath: '$case.patientId', // requires join; Case has patientId directly
  },
  PatientMergeHistory: {
    fields: ['absorbedSnapshotJson'],
    patientIdPath: 'absorbedPatientId',
  },
  PatientMergeCandidate: {
    fields: ['matchFeaturesJson', 'candidateRankedJson'],
    patientIdPath: 'provisionalPatientId',
  },
};
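
A minimal sketch of how a patientIdPath could be resolved against a record that was loaded with its relations included. It assumes paths are dot-separated and that a "$" prefix marks a relation segment; the function name and the Row type are hypothetical, and the real extension may resolve paths with extra queries instead.

```typescript
type Row = { [key: string]: unknown };

// Walks a dot-separated path such as '$finding.$case.patientId' through a loaded record.
// Assumption: a leading '$' marks a relation name and is stripped before lookup.
function resolvePatientId(record: Row, path: string): string {
  let current: unknown = record;
  for (const segment of path.split(".")) {
    const key = segment.startsWith("$") ? segment.slice(1) : segment;
    if (typeof current !== "object" || current === null || !(key in current)) {
      throw new Error(`cannot resolve "${segment}" in path "${path}"`);
    }
    current = (current as Row)[key];
  }
  if (typeof current !== "string") {
    throw new Error(`path "${path}" did not resolve to a patient ID`);
  }
  return current;
}
```

For a Diagnosis loaded with its finding and case, resolvePatientId(diagnosis, '$finding.$case.patientId') walks diagnosis.finding.case.patientId.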

Handling DOB encryption

dob is currently a DateTime field in Prisma. After encryption it becomes a String column containing ciphertext. The schema change:

  • Column: dob keeps its name, but its type changes from DateTime @db.Date to String @db.Text
  • The service layer encrypts dob.toISOString() and decrypts back to a Date object
  • All DOB queries use dobHash for equality matching (already the case)
  • The migration must convert existing Date values to ISO strings, then encrypt them
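
The conversion on either side of encryption could look like the sketch below. The helper names are hypothetical; the only behaviour taken from the spec is that the plaintext is dob.toISOString() and that reads yield a Date again.

```typescript
// Serialises a DOB to the plaintext that gets encrypted.
function dobToPlaintext(dob: Date): string {
  return dob.toISOString();
}

// Parses decrypted plaintext back into a Date, rejecting anything unparseable.
function plaintextToDob(plaintext: string): Date {
  const dob = new Date(plaintext);
  if (Number.isNaN(dob.getTime())) {
    throw new Error("decrypted dob is not a valid ISO-8601 string");
  }
  return dob;
}
```

The migration would need to produce exactly the same string form as new writes, so that rows migrated from the old Date column and rows created afterwards decrypt to the same representation.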

Config additions

ENCRYPTION_MASTER_KEY=<64-char hex string for local dev>
ENCRYPTION_PROVIDER=local  # or 'kms' for production
KMS_CMK_ARN=               # required when ENCRYPTION_PROVIDER=kms
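
A sketch of how these variables might be validated at startup. The EncryptionConfig type and loadEncryptionConfig function are hypothetical; the variable names and the "required when ENCRYPTION_PROVIDER=kms" rule come from the config above. Failing when ENCRYPTION_PROVIDER is unset, rather than defaulting, is an assumption.

```typescript
type EncryptionConfig =
  | { provider: "local"; masterKey: Buffer }
  | { provider: "kms"; cmkArn: string };

// Validates the encryption-related environment variables and fails fast on bad input.
function loadEncryptionConfig(env: Record<string, string | undefined>): EncryptionConfig {
  const provider = env["ENCRYPTION_PROVIDER"];
  if (provider === "local") {
    const hex = env["ENCRYPTION_MASTER_KEY"] ?? "";
    // 64 hex characters decode to the 32-byte (256-bit) master key.
    if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
      throw new Error("ENCRYPTION_MASTER_KEY must be a 64-char hex string");
    }
    return { provider: "local", masterKey: Buffer.from(hex, "hex") };
  }
  if (provider === "kms") {
    const cmkArn = env["KMS_CMK_ARN"] ?? "";
    if (cmkArn === "") {
      throw new Error("KMS_CMK_ARN is required when ENCRYPTION_PROVIDER=kms");
    }
    return { provider: "kms", cmkArn };
  }
  throw new Error("ENCRYPTION_PROVIDER must be 'local' or 'kms'");
}
```

In an application this would typically be called once at boot as loadEncryptionConfig(process.env).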

What is NOT encrypted

  • Hash columns (emailHash, dobHash, postalCodeHash, valueHash) — irreversible lookups
  • Actor snapshots (createdByActor, uploadedByActor, etc.) — deferred to admin module
  • Structural fields (IDs, statuses, timestamps, codes, S3 keys) — not PHI
  • Diagnosis code fields (codeSystem, codeValue, codeDisplay) — standardized codes, not PHI

Existing data migration

A one-time migration script that:

  1. Iterates all patients
  2. Generates a DEK for each, wraps and stores it
  3. Encrypts all plaintext PHI fields in the patient row
  4. For each patient, encrypts PHI in related models (cases, findings, diagnoses, medications, merge history, merge candidates)
  5. Is idempotent: skips already-encrypted rows by checking whether encryptedDek is already set
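
The skip-if-already-encrypted logic can be sketched over in-memory rows as below. The PatientRow and MigrationDeps types are hypothetical stand-ins for the real Prisma models and CryptoService, and related models are omitted for brevity.

```typescript
interface PatientRow {
  id: string;
  encryptedDek: string | null;
  phi: Record<string, string>; // plaintext before migration, ciphertext after
}

// Hypothetical dependencies standing in for CryptoService and KeyProvider.
interface MigrationDeps {
  generateWrappedDek(): { dek: Buffer; wrapped: string };
  encrypt(plaintext: string, dek: Buffer): string;
}

// Encrypts every not-yet-migrated patient and returns how many rows were changed.
function migratePatients(rows: PatientRow[], deps: MigrationDeps): number {
  let migrated = 0;
  for (const row of rows) {
    if (row.encryptedDek !== null) {
      continue; // already migrated: skipping makes a re-run a no-op
    }
    const { dek, wrapped } = deps.generateWrappedDek();
    const encrypted: Record<string, string> = {};
    for (const [field, value] of Object.entries(row.phi)) {
      encrypted[field] = deps.encrypt(value, dek);
    }
    // Set the ciphertext and the wrapped DEK together, so a row never has one without the other.
    row.phi = encrypted;
    row.encryptedDek = wrapped;
    migrated += 1;
  }
  return migrated;
}
```

Because encryptedDek doubles as the "already migrated" marker, a real script would probably want to write the wrapped DEK and the encrypted fields for a patient in a single transaction; otherwise a crash between the two writes could leave plaintext rows that a re-run skips.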

Plan 12b: Histology Ingestion

Data model

HistologyReport:

  • id (UUID v7)
  • organisationId (tenant scope)
  • caseId (required — belongs to a case)
  • findingId (optional — late-binding to a finding)
  • externalReference (optional — opaque reference from lab)
  • reportType (varchar — e.g., 'biopsy', 'excision', 'shave')
  • resultSummaryCiphertext (text — encrypted via patient's DEK)
  • receivedAt (datetime)
  • reportingLabSnapshotJson (text — lab identity snapshot)
  • ingestionStatus (varchar — pending/processing/processed/failed)
  • ingestionErrorCode (varchar, nullable)
  • correlationId (varchar, nullable)
  • createdAt, updatedAt

ReportFile:

  • id (UUID v7)
  • histologyReportId (FK)
  • s3Bucket, s3Key (varchar — file location)
  • mimeType (varchar)
  • contentHashSha256 (char(64), nullable — set after upload verified)
  • sizeBytes (int, nullable)
  • uploadedByActor (text — actor snapshot)
  • virusScanStatus (varchar — pending/clean/infected/skipped, default 'skipped' for v1)
  • createdAt, updatedAt

API endpoints

  • POST /v1/histology:initiate (histology:write scope). Accepts: case_id or case_external_reference, optional finding_id, report_type, result_summary (plaintext, encrypted by service), received_at, file manifest [{mime_type, expected_content_hash?}]. Returns: histology_report_id, file_uploads: [{file_id, upload_url}], status_url.
  • GET /v1/histology/{id} (histology:read scope). Returns report metadata with encrypted fields decrypted.
  • PATCH /v1/histology/{id} (histology:write scope). Reassigns finding_id (late-binding).
  • POST /v1/histology/{id}/process (histology:write scope). Local dev trigger. Validates that all files are uploaded, then marks the report processed.
  • GET /v1/status/histology/{id} — via async status adapter (Plan 10).

Lab case resolution

Labs supply case_external_reference instead of case_id. The service resolves it:

const caseRecord = await prisma.case.findFirst({ where: { organisationId, externalReference } })

Labs cannot enumerate cases — they must know the external reference.

Late-binding to findings

  • If finding_id is supplied at initiate: validate the finding belongs to the same case, attach.
  • If omitted: report belongs to the case only. PATCH /v1/histology/{id} with finding_id attaches later.
  • Case status never blocks histology attachment.
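
The attachment rule above could be expressed as a small validation helper. The function name, the Finding shape, and the lookup callback are hypothetical; the same check would apply both at initiate time and on the later PATCH.

```typescript
interface Finding {
  id: string;
  caseId: string;
}

// Returns the finding ID to attach, or null when the report stays bound to the case only.
function resolveFindingId(
  reportCaseId: string,
  findingId: string | undefined,
  lookupFinding: (id: string) => Finding | undefined,
): string | null {
  if (findingId === undefined) {
    return null; // omitted at initiate: late-binding can happen via PATCH
  }
  const finding = lookupFinding(findingId);
  if (finding === undefined) {
    throw new Error("finding not found");
  }
  if (finding.caseId !== reportCaseId) {
    throw new Error("finding belongs to a different case");
  }
  return finding.id;
}
```

Note that the helper never looks at the case status, in line with the rule that case status does not block attachment.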

Scope boundaries

  • ✅ HistologyReport + ReportFile models, initiate with multi-file presigned URLs, process, get, reassign finding, status adapter, integration tests.
  • ❌ No actual virus scanning (virusScanStatus defaults to 'skipped').
  • ❌ No content hash verification (expected_content_hash accepted but not enforced in v1).
  • ❌ No Lambda/SQS pipeline.

Testing strategy

Plan 12a tests

  • Unit: CryptoService (encrypt/decrypt round-trip, DEK generation, wrap/unwrap via LocalKeyProvider)
  • Unit: Encryption middleware (mocked Prisma, verifies fields are encrypted on write and decrypted on read)
  • Integration: Full patient CRUD round-trip with encryption (create patient → read back decrypted → verify DB contains ciphertext)
  • Integration: Cross-model encryption (create patient + case + finding + diagnosis → verify all PHI encrypted in DB, all decrypted on read)
  • Integration: Crypto-shred (null out DEK → verify reads return null/empty for PHI fields)

Plan 12b tests

  • Unit: HistologyService (initiate, process, findById, reassign)
  • Unit: HistologyStatusAdapter (maps ingestion status to async status shape)
  • Integration: Full histology upload lifecycle (initiate → upload files → process → get → reassign finding)
  • Integration: Lab case resolution via external_reference
  • Integration: Cross-tenant isolation

Open decisions

None — all key decisions resolved during brainstorming:

  • Local key provider for dev, KMS for production
  • SHA-256 hashes retained for lookup, GCM encryption for plaintext fields
  • Actor snapshots deferred to admin module
  • Histology follows image ingestion pattern with multi-file support