Encryption at Rest + Histology Ingestion — Design Spec
Overview
Two related changes delivered as sequential plans:
- Plan 12a adds application-level envelope encryption for all PHI fields, using per-patient data encryption keys (DEKs) and AES-256-GCM. It is transparent to service code via a Prisma Client extension ($extends).
- Plan 12b adds histology report ingestion following the same two-phase upload pattern as images, with encrypted fields from day one.
Plan 12a: Encryption at Rest
Architecture
Three components:
- KeyProvider: interface for wrapping/unwrapping DEKs. Two implementations:
  - LocalKeyProvider (dev/test): uses a static 256-bit master key from the ENCRYPTION_MASTER_KEY env var. Wrap = AES-256-GCM encrypt the DEK with the master key. Unwrap = decrypt.
  - KmsKeyProvider (production): uses the AWS KMS Encrypt/Decrypt API with a dedicated CMK ARN from the KMS_CMK_ARN env var.
- CryptoService: generates random 256-bit DEKs and encrypts/decrypts individual field values using AES-256-GCM with a random IV per value. Caches unwrapped DEKs in RequestContext for the request lifetime.
- Encryption Prisma Extension: a $extends layer (similar to the existing tenancy extension) that intercepts create/update to encrypt PHI fields before write, and intercepts read results to decrypt them after. Configured with a declarative map of model → field list.
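A minimal sketch of the KeyProvider contract and LocalKeyProvider, assuming a Node.js/TypeScript service and the built-in node:crypto module. The wrapKey/unwrapKey names come from this spec. Their signatures (raw DEK Buffer in, base64 string out) and the iv, tag, ciphertext byte layout of the wrapped DEK are assumptions. generateDek is a hypothetical stand-in for CryptoService's DEK generation. A KmsKeyProvider would implement the same interface by calling KMS Encrypt/Decrypt instead (not shown).

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumed contract: raw DEK bytes in, base64 wrapped key out (and back).
export interface KeyProvider {
  wrapKey(dek: Buffer): Promise<string>;
  unwrapKey(wrapped: string): Promise<Buffer>;
}

const IV_BYTES = 12;
const TAG_BYTES = 16;

// Dev/test provider: wraps DEKs with AES-256-GCM under a static master key.
export class LocalKeyProvider implements KeyProvider {
  private readonly masterKey: Buffer;

  constructor(masterKeyHex: string) {
    if (!/^[0-9a-fA-F]{64}$/.test(masterKeyHex)) {
      throw new Error("ENCRYPTION_MASTER_KEY must be 64 hex characters (256 bits)");
    }
    this.masterKey = Buffer.from(masterKeyHex, "hex");
  }

  async wrapKey(dek: Buffer): Promise<string> {
    const iv = randomBytes(IV_BYTES); // fresh IV for every wrap
    const cipher = createCipheriv("aes-256-gcm", this.masterKey, iv);
    const ciphertext = Buffer.concat([cipher.update(dek), cipher.final()]);
    // Assumed layout: iv, then auth tag, then ciphertext, base64-encoded as one string.
    return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
  }

  async unwrapKey(wrapped: string): Promise<Buffer> {
    const raw = Buffer.from(wrapped, "base64");
    if (raw.length <= IV_BYTES + TAG_BYTES) throw new Error("wrapped DEK is too short");
    const iv = raw.subarray(0, IV_BYTES);
    const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
    const decipher = createDecipheriv("aes-256-gcm", this.masterKey, iv);
    decipher.setAuthTag(tag);
    // final() throws if the tag does not verify (wrong master key or tampered data).
    return Buffer.concat([decipher.update(raw.subarray(IV_BYTES + TAG_BYTES)), decipher.final()]);
  }
}

// Stand-in for CryptoService's DEK generation: 256 random bits.
export function generateDek(): Buffer {
  return randomBytes(32);
}
```

With a 32-byte DEK, the locally wrapped value is 60 bytes (80 base64 characters), well inside the VarChar(512) column.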
Encrypted value format
Each encrypted field is stored as a single string: base64(iv):base64(ciphertext):base64(authTag).
- IV: 12 bytes (random per value)
- Auth tag: 16 bytes
- Ciphertext: variable length
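A MySQL TEXT column holds up to 65,535 bytes. The encoded form adds 42 characters (16 for the IV, 24 for the tag, 2 separators), and base64 turns every 3 plaintext bytes into 4 characters, so plaintext up to roughly 49,000 bytes still fits. Larger values (for example, large JSON blobs) would need @db.MediumText.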
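This sketch produces and parses the value format with node:crypto. encryptField and decryptField are hypothetical names, because the spec does not name CryptoService's methods. The code assumes UTF-8 string plaintext and a 32-byte DEK. Base64 never emits ':', so splitting on the separator is unambiguous. GCM's final() step rejects any value whose tag does not verify.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12;
const TAG_BYTES = 16;

// Encrypts one field value as base64(iv):base64(ciphertext):base64(authTag).
export function encryptField(plaintext: string, dek: Buffer): string {
  const iv = randomBytes(IV_BYTES); // random per value, never reused with the same DEK
  const cipher = createCipheriv("aes-256-gcm", dek, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // 16 bytes by default
  return [iv, ciphertext, tag].map((b) => b.toString("base64")).join(":");
}

// Parses and decrypts a stored value; throws on malformed input or a failed tag check.
export function decryptField(stored: string, dek: Buffer): string {
  const parts = stored.split(":");
  if (parts.length !== 3) throw new Error("malformed encrypted value");
  const [iv, ciphertext, tag] = parts.map((p) => Buffer.from(p, "base64"));
  if (iv.length !== IV_BYTES || tag.length !== TAG_BYTES) throw new Error("malformed encrypted value");
  const decipher = createDecipheriv("aes-256-gcm", dek, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Because the IV is random, encrypting the same plaintext twice yields different stored strings.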
Schema changes
Patient model:
- Add an encryptedDek field (nullable, @db.VarChar(512)) that stores the wrapped DEK as base64. It has to be nullable because existing rows have no DEK until the migration runs and because crypto-shredding sets it back to null.
Widen PHI columns to @db.Text (ciphertext is longer than plaintext):
- Patient: givenName, familyName, email, phone, postalCode, sexAtBirth, genderIdentity
- Patient: dob, stored as an encrypted ISO-8601 string. The Prisma type changes from DateTime to String because the DB column holds ciphertext. Queries on DOB use dobHash.
Other models (PHI columns must be @db.Text):
- SkinFinding: clinicalNotes (already Text, no change)
- Diagnosis: freeText, notes (already Text, no change)
- MedicationStatement: medicationFreeText, reasonFreeText, notes (some are VarChar; widen)
- Case: clinicalContextJson (already Text, no change)
- PatientMergeHistory: absorbedSnapshotJson (already Text, no change)
- PatientMergeCandidate: matchFeaturesJson, candidateRankedJson (already Text, no change)
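Hash columns: no change
The existing SHA-256 hash columns (emailHash, dobHash, postalCodeHash, valueHash) remain as-is. They serve equality lookups, which the ciphertext cannot support because each value is encrypted under a random IV. On GDPR erasure, destroying the DEK makes the ciphertext unreadable. The hashes are one-way and do not identify anyone on their own, although hashes of low-entropy values such as DOB or postal code can be recovered by enumerating candidate values if they are unsalted.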
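The lookup side of the hash columns is sketched below. Two encryptions of the same email differ, so an equality query hashes the search value and compares it against the stored hash. lookupHash is a hypothetical helper. The trim-and-lowercase normalisation and the hex encoding are assumptions, since the spec does not say how the existing hashes are computed.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: normalise, then SHA-256 to a 64-character hex digest.
// The normalisation rule and hex encoding are assumptions, not the spec's documented behaviour.
export function lookupHash(value: string): string {
  const normalised = value.trim().toLowerCase();
  return createHash("sha256").update(normalised, "utf8").digest("hex");
}
```

A query would then filter on, for example, emailHash: lookupHash(input) rather than on the encrypted email column.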
DEK lifecycle
- Create: on patient.create, generate a random 256-bit DEK, wrap it via KeyProvider.wrapKey(), and store the wrapped DEK in patient.encryptedDek.
- Read: on any query that returns PHI, retrieve patient.encryptedDek, unwrap it via KeyProvider.unwrapKey(), and decrypt the fields. Cache the unwrapped DEK in RequestContext, keyed by patient ID, for the request lifetime.
- Crypto-shred: on GDPR erasure, set patient.encryptedDek = null. All of that patient's ciphertext in the live DB becomes permanently unreadable. Backups taken before the erasure still hold the wrapped DEK alongside the ciphertext and remain decryptable while the master key or CMK exists, so they become unreadable only once they expire.
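A request-scoped DEK cache could look like the following sketch. RequestDekCache is a hypothetical stand-in for the RequestContext entry described above. It caches the promise rather than the resolved key, so concurrent reads for the same patient trigger a single unwrap. A null wrapped DEK (a crypto-shredded patient) resolves to null. evict lets an erasure path drop a cached key within the same request.

```typescript
type Unwrap = (wrapped: string) => Promise<Buffer>;

// Hypothetical per-request cache: create one instance per request and discard it afterwards.
export class RequestDekCache {
  private readonly deks = new Map<string, Promise<Buffer | null>>();
  private readonly unwrap: Unwrap;

  constructor(unwrap: Unwrap) {
    this.unwrap = unwrap; // e.g. KeyProvider.unwrapKey bound to the provider
  }

  // Returns the patient's DEK, or null if the patient has been crypto-shredded.
  getDek(patientId: string, encryptedDek: string | null): Promise<Buffer | null> {
    let cached = this.deks.get(patientId);
    if (cached === undefined) {
      cached = encryptedDek === null ? Promise.resolve(null) : this.unwrap(encryptedDek);
      this.deks.set(patientId, cached);
    }
    return cached;
  }

  // Drops a cached key, e.g. right after the patient's DEK has been nulled.
  evict(patientId: string): void {
    this.deks.delete(patientId);
  }
}
```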
DEK resolution for non-Patient models
Models containing encrypted PHI need to resolve to a patient's DEK:
- Case → case.patientId → Patient
- SkinFinding → finding.caseId → Case → Patient
- Diagnosis → diagnosis.findingId → SkinFinding → Case → Patient
- MedicationStatement → medication.caseId → Case → Patient (the case holds patientId directly)
- PatientMergeHistory → merge.absorbedPatientId → Patient
- PatientMergeCandidate → candidate.provisionalPatientId → Patient
The encryption middleware resolves the patient ID by following these paths. For efficiency, the middleware reads the patient's wrapped DEK in a single extra query when decrypting non-Patient model results. The unwrapped DEK is cached per-request so subsequent reads for the same patient don't re-unwrap.
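One way to follow these paths is to load each row with the relations its path needs and then walk the path. The sketch assumes paths are written as dot-separated relation names ending in a scalar column (for example, finding.case.patientId). includeForPath builds a Prisma-style nested include object. Both helper names are hypothetical.

```typescript
// Assumes paths are dot-separated relation names ending in a scalar column,
// e.g. "finding.case.patientId" for Diagnosis.

// Builds a nested Prisma-style `include` for the relations on the path (undefined if none).
export function includeForPath(path: string): Record<string, unknown> | undefined {
  const relations = path.split(".").slice(0, -1);
  let include: Record<string, unknown> | undefined;
  for (const relation of relations.reverse()) {
    include = { [relation]: include === undefined ? true : { include } };
  }
  return include;
}

// Walks the path on a row loaded with that include; returns null if any hop is missing.
export function resolvePatientId(row: unknown, path: string): string | null {
  let current: unknown = row;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object") return null;
    current = (current as Record<string, unknown>)[segment];
  }
  return typeof current === "string" ? current : null;
}
```

For Diagnosis, includeForPath("finding.case.patientId") yields { finding: { include: { case: true } } }.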
Encryption field map (declarative config)
```typescript
// Model → PHI fields to encrypt, plus the path from a row of that model to its patient ID.
// patientIdPath is a dot-separated chain of relation names ending in a scalar column.
const ENCRYPTED_FIELDS: Record<string, { fields: string[]; patientIdPath: string }> = {
  Patient: {
    fields: [
      'givenName',
      'familyName',
      'dob',
      'email',
      'phone',
      'postalCode',
      'sexAtBirth',
      'genderIdentity',
    ],
    patientIdPath: 'id', // Patient IS the patient
  },
  Case: {
    fields: ['clinicalContextJson'],
    patientIdPath: 'patientId',
  },
  SkinFinding: {
    fields: ['clinicalNotes'],
    patientIdPath: 'case.patientId', // requires a join to Case
  },
  Diagnosis: {
    fields: ['freeText', 'notes'],
    patientIdPath: 'finding.case.patientId', // requires a nested join: SkinFinding → Case
  },
  MedicationStatement: {
    fields: ['medicationFreeText', 'reasonFreeText', 'notes'],
    patientIdPath: 'case.patientId', // one join: Case holds patientId directly
  },
  PatientMergeHistory: {
    fields: ['absorbedSnapshotJson'],
    patientIdPath: 'absorbedPatientId',
  },
  PatientMergeCandidate: {
    fields: ['matchFeaturesJson', 'candidateRankedJson'],
    patientIdPath: 'provisionalPatientId',
  },
};
```
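A single helper can drive both directions of the extension from this map. The sketch below assumes field values arrive as strings, with the service layer having already turned dob into an ISO string. transformPhiFields is a hypothetical name. The write path would pass an encrypt function bound to the patient's DEK, and the read path a decrypt function. It handles plain values only. Prisma's update operator form (field: { set: value }) would need separate handling.

```typescript
type FieldSpec = { fields: string[]; patientIdPath: string };

// Returns a copy of `row` with each configured PHI field passed through `transform`
// (encrypt on write, decrypt on read). Absent, null, and non-string values are left as-is.
export function transformPhiFields<T extends Record<string, unknown>>(
  config: Record<string, FieldSpec>,
  model: string,
  row: T,
  transform: (value: string) => string,
): T {
  const spec = config[model];
  if (spec === undefined) return row; // model holds no encrypted PHI
  const out: Record<string, unknown> = { ...row };
  for (const field of spec.fields) {
    const value = out[field];
    if (typeof value === "string") out[field] = transform(value);
  }
  return out as T;
}
```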
Handling DOB encryption
dob is currently a DateTime field in Prisma. After encryption it becomes a String column containing ciphertext. The schema change:
- Column: dob keeps its name, but its type changes from DateTime @db.Date to String @db.Text
- The service layer passes dob.toISOString() to the encryption layer on write and converts the decrypted string back to a Date object on read
- All DOB queries use dobHash for equality matching (already the case)
- The migration must convert existing Date values to ISO strings, then encrypt them
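The service-layer conversion could be as small as the following pair of hypothetical helpers. They assume Prisma hands back existing @db.Date values as Date objects at UTC midnight, so toISOString() yields one stable string per calendar date.

```typescript
// Hypothetical service-layer helpers around the encrypted dob field.
export function dobToPlaintext(dob: Date): string {
  if (Number.isNaN(dob.getTime())) throw new Error("invalid date of birth");
  return dob.toISOString(); // e.g. "1980-05-17T00:00:00.000Z" for a UTC-midnight date
}

export function dobFromPlaintext(iso: string): Date {
  const dob = new Date(iso);
  if (Number.isNaN(dob.getTime())) throw new Error("decrypted dob is not a valid ISO-8601 string");
  return dob;
}
```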
Config additions
```
ENCRYPTION_MASTER_KEY=<64-char hex string for local dev>
ENCRYPTION_PROVIDER=local   # or 'kms' for production
KMS_CMK_ARN=                # required when ENCRYPTION_PROVIDER=kms
```
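Startup validation of these variables might look like the sketch below. loadEncryptionConfig and the EncryptionConfig type are hypothetical. The sketch requires ENCRYPTION_PROVIDER to be set explicitly rather than defaulting to local. That is a design assumption, meant to stop a production deployment from silently falling back to the static dev key.

```typescript
export type EncryptionConfig =
  | { provider: "local"; masterKey: Buffer }
  | { provider: "kms"; cmkArn: string };

// Hypothetical startup check for the variables above; fails fast on any misconfiguration.
export function loadEncryptionConfig(env: Record<string, string | undefined>): EncryptionConfig {
  const provider = env.ENCRYPTION_PROVIDER;
  if (provider === "local") {
    const hex = env.ENCRYPTION_MASTER_KEY ?? "";
    if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
      throw new Error("ENCRYPTION_MASTER_KEY must be a 64-character hex string");
    }
    return { provider: "local", masterKey: Buffer.from(hex, "hex") };
  }
  if (provider === "kms") {
    const cmkArn = env.KMS_CMK_ARN;
    if (!cmkArn) throw new Error("KMS_CMK_ARN is required when ENCRYPTION_PROVIDER=kms");
    return { provider: "kms", cmkArn };
  }
  // Deliberately no default: an unset provider must not fall back to the dev key.
  throw new Error("ENCRYPTION_PROVIDER must be 'local' or 'kms'");
}
```

In a Node service this would be called once at startup as loadEncryptionConfig(process.env).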
What is NOT encrypted
- Hash columns (emailHash, dobHash, postalCodeHash, valueHash) — irreversible lookups
- Actor snapshots (createdByActor, uploadedByActor, etc.) — deferred to admin module
- Structural fields (IDs, statuses, timestamps, codes, S3 keys) — not PHI
- Diagnosis code fields (codeSystem, codeValue, codeDisplay) — standardized codes, not PHI
Existing data migration
A one-time migration script that:
- Iterates all patients
- Generates a DEK for each, wraps and stores it
- Encrypts all plaintext PHI fields in the patient row
- For each patient, encrypts PHI in related models (cases, findings, diagnoses, medications, merge history, merge candidates)
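- Runs as an idempotent script, skipping patients whose encryptedDek is already set. For that check to be safe, each patient's DEK write and the encryption of all of that patient's PHI must commit in one transaction. Otherwise a crash midway leaves a patient with a stored DEK but plaintext fields, and a rerun would skip that patient.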
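The per-patient loop, with the skip check and a single atomic commit per patient, might look like this sketch. PatientRow, MigrationDeps and migratePatients are hypothetical. commit stands in for a Prisma transaction. Related models (cases, findings, diagnoses, and so on) are omitted for brevity, but they would be encrypted with the same DEK inside the same commit.

```typescript
// Hypothetical in-memory shape; the real script would read rows via Prisma.
interface PatientRow {
  id: string;
  encryptedDek: string | null;
  phi: Record<string, string | null>; // plaintext PHI columns of the Patient row
}

interface MigrationDeps {
  generateDek(): Buffer;
  wrapKey(dek: Buffer): Promise<string>;
  encryptField(plaintext: string, dek: Buffer): string;
  // Must persist the wrapped DEK and the encrypted fields atomically (one DB transaction).
  commit(patientId: string, encryptedDek: string, phi: Record<string, string | null>): Promise<void>;
}

// Returns how many patients were migrated in this run.
export async function migratePatients(patients: PatientRow[], deps: MigrationDeps): Promise<number> {
  let migrated = 0;
  for (const patient of patients) {
    if (patient.encryptedDek !== null) continue; // already migrated: skip
    const dek = deps.generateDek();
    const wrapped = await deps.wrapKey(dek);
    const encrypted: Record<string, string | null> = {};
    for (const [field, value] of Object.entries(patient.phi)) {
      encrypted[field] = value === null ? null : deps.encryptField(value, dek);
    }
    // Related models would be encrypted with the same DEK and included in this commit.
    await deps.commit(patient.id, wrapped, encrypted);
    migrated++;
  }
  return migrated;
}
```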
Plan 12b: Histology Ingestion
Data model
HistologyReport:
- id (UUID v7)
- organisationId (tenant scope)
- caseId (required; belongs to a case)
- findingId (optional; late-binding to a finding)
- externalReference (optional; opaque reference from the lab)
- reportType (varchar, e.g. 'biopsy', 'excision', 'shave')
- resultSummaryCiphertext (text; encrypted with the patient's DEK)
- receivedAt (datetime)
- reportingLabSnapshotJson (text; lab identity snapshot)
- ingestionStatus (varchar: pending/processing/processed/failed)
- ingestionErrorCode (varchar, nullable)
- correlationId (varchar, nullable)
- createdAt, updatedAt
ReportFile:
- id (UUID v7)
- histologyReportId (FK)
- s3Bucket, s3Key (varchar; file location)
- mimeType (varchar)
- contentHashSha256 (char(64), nullable; set after the upload is verified)
- sizeBytes (int, nullable)
- uploadedByActor (text; actor snapshot)
- virusScanStatus (varchar: pending/clean/infected/skipped; default 'skipped' for v1)
- createdAt, updatedAt
API endpoints
- POST /v1/histology:initiate (histology:write scope). Accepts: case_id or case_external_reference, optional finding_id, report_type, result_summary (plaintext, encrypted by the service), received_at, and a file manifest [{mime_type, expected_content_hash?}]. Returns: histology_report_id, file_uploads: [{file_id, upload_url}], status_url.
- GET /v1/histology/{id} (histology:read scope). Returns report metadata with encrypted fields decrypted.
- PATCH /v1/histology/{id} (histology:write scope). Reassigns finding_id (late-binding).
- POST /v1/histology/{id}/process (histology:write scope). Local dev trigger: validates that all files are uploaded, then marks the report processed.
- GET /v1/status/histology/{id}: served via the async status adapter (Plan 10).
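A request validator for the initiate endpoint might start like this sketch. The request shape mirrors the fields listed above, but the files property name for the manifest is hypothetical. Two rules are assumptions: exactly one of case_id or case_external_reference must be present, and the manifest must be non-empty. expected_content_hash is deliberately not checked, matching the v1 scope.

```typescript
interface FileManifestEntry {
  mime_type: string;
  expected_content_hash?: string;
}

// Hypothetical request shape for POST /v1/histology:initiate.
interface InitiateHistologyRequest {
  case_id?: string;
  case_external_reference?: string;
  finding_id?: string;
  report_type: string;
  result_summary?: string;
  received_at: string;
  files: FileManifestEntry[]; // hypothetical name for the file manifest
}

// Returns validation errors; an empty array means the request is acceptable.
export function validateInitiate(req: InitiateHistologyRequest): string[] {
  const errors: string[] = [];
  // Assumption: the case must be identified exactly one way.
  if (Boolean(req.case_id) === Boolean(req.case_external_reference)) {
    errors.push("exactly one of case_id or case_external_reference is required");
  }
  if (Number.isNaN(Date.parse(req.received_at))) errors.push("received_at must be a parseable timestamp");
  if (req.files.length === 0) errors.push("the file manifest must list at least one file");
  req.files.forEach((file, i) => {
    if (!file.mime_type) errors.push(`files[${i}].mime_type is required`);
    // expected_content_hash is accepted but not enforced in v1, so it is not checked here.
  });
  return errors;
}
```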
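Lab case resolution
Labs supply case_external_reference instead of case_id. The service resolves it within the caller's organisation:
```typescript
// `case` is a reserved word in TypeScript, so the result needs another name.
const caseRecord = await prisma.case.findFirst({ where: { organisationId, externalReference } });
```
Labs cannot enumerate cases: they must know the external reference.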
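The tenant scoping that makes this safe is sketched below over an in-memory list. resolveCase and CaseRow are hypothetical; the service would issue the equivalent Prisma query. A case in another organisation resolves to null exactly as a nonexistent one does, so a response built on this result does not reveal whether the reference exists elsewhere.

```typescript
interface CaseRow {
  id: string;
  organisationId: string;
  externalReference: string | null;
}

// Hypothetical in-memory version of the lookup. Every match is scoped to the caller's
// organisation, so another tenant's case looks the same as a case that does not exist.
export function resolveCase(
  cases: CaseRow[],
  organisationId: string,
  ref: { caseId?: string; caseExternalReference?: string },
): CaseRow | null {
  const match = cases.find(
    (c) =>
      c.organisationId === organisationId &&
      (ref.caseId !== undefined ? c.id === ref.caseId : c.externalReference === ref.caseExternalReference),
  );
  return match ?? null;
}
```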
Late-binding to findings
- If finding_id is supplied at initiate: validate that the finding belongs to the same case, then attach it.
- If omitted: the report belongs to the case only. PATCH /v1/histology/{id} with finding_id attaches it later.
- Case status never blocks histology attachment.
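The late-binding rule reduces to a same-case check, sketched below with hypothetical row types. attachFinding never looks at case status, matching the rule that status does not block attachment.

```typescript
interface FindingRow {
  id: string;
  caseId: string;
}

interface ReportRow {
  id: string;
  caseId: string;
  findingId: string | null;
}

// Hypothetical late-binding check: attach only a finding from the report's own case.
// Case status is deliberately not consulted.
export function attachFinding(report: ReportRow, finding: FindingRow | null): ReportRow {
  if (finding === null) throw new Error("finding not found");
  if (finding.caseId !== report.caseId) throw new Error("finding belongs to a different case");
  return { ...report, findingId: finding.id };
}
```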
Scope boundaries
- ✅ HistologyReport + ReportFile models, initiate with multi-file presigned URLs, process, get, reassign finding, status adapter, integration tests.
- ❌ No actual virus scanning (virusScanStatus defaults to 'skipped').
- ❌ No content hash verification (expected_content_hash accepted but not enforced in v1).
- ❌ No Lambda/SQS pipeline.
Testing strategy
Plan 12a tests
- Unit: CryptoService (encrypt/decrypt round-trip, DEK generation, wrap/unwrap via LocalKeyProvider)
- Unit: Encryption middleware (mocked Prisma, verifies fields are encrypted on write and decrypted on read)
- Integration: Full patient CRUD round-trip with encryption (create patient → read back decrypted → verify DB contains ciphertext)
- Integration: Cross-model encryption (create patient + case + finding + diagnosis → verify all PHI encrypted in DB, all decrypted on read)
- Integration: Crypto-shred (null out DEK → verify reads return null/empty for PHI fields)
Plan 12b tests
- Unit: HistologyService (initiate, process, findById, reassign)
- Unit: HistologyStatusAdapter (maps ingestion status to async status shape)
- Integration: Full histology upload lifecycle (initiate → upload files → process → get → reassign finding)
- Integration: Lab case resolution via external_reference
- Integration: Cross-tenant isolation
Open decisions
None — all key decisions resolved during brainstorming:
- Local key provider for dev, KMS for production
- SHA-256 hashes retained for lookup, GCM encryption for plaintext fields
- Actor snapshots deferred to admin module
- Histology follows image ingestion pattern with multi-file support