Retention + GDPR Erasure

This document describes the data retention policy mechanism, the crypto-shredding erasure approach, the GDPR Article 17 right-to-erasure flow, legal holds, and data subject access requests (DSARs). All code citations are verified against the source.


1. Retention Policy Mechanism

Configuration model

Retention is governed by RetentionPolicy rows in the clinical-api database (services/clinical-api/prisma/schema.prisma):

RetentionPolicy {
  organisationId  // tenant
  entityType      // e.g. "Patient", "Case"
  retentionDays   // how long to keep after the relevant date
  action          // default: "soft_delete"
  active
}

Each Organisation can have one RetentionPolicy per entity type (unique constraint on [organisationId, entityType]).

Enforcement

The retention module is located at services/clinical-api/src/retention/retention.service.ts and is exposed via the RetentionController (services/clinical-api/src/retention/retention.controller.ts).

Planned: automated nightly cron execution of retention policies is not shipped in v1. The data model, API endpoints, and service logic exist, but no scheduler or cron job invokes the retention scan automatically. Enforcement must currently be triggered out-of-band (e.g. via direct API call or an external scheduler).
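
Because nothing invokes the scan automatically, it may help to see what an externally triggered pass would have to compute. The following is a minimal sketch, not the logic in retention.service.ts: the RetainedRecord shape, the referenceDate field (standing in for "the relevant date"), and the findExpired helper are all hypothetical.

```typescript
// Only the RetentionPolicy fields come from the schema summary above.
interface RetentionPolicy {
  organisationId: string;
  entityType: string;
  retentionDays: number;
  action: string; // "soft_delete" by default
  active: boolean;
}

// Hypothetical record shape used only for this illustration.
interface RetainedRecord {
  id: string;
  organisationId: string;
  entityType: string;
  referenceDate: Date; // assumption: the date the retention window counts from
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the ids of records whose retention window has elapsed
// under an active policy for the same organisation and entity type.
function findExpired(
  policies: RetentionPolicy[],
  records: RetainedRecord[],
  now: Date,
): string[] {
  return records
    .filter((record) => {
      const policy = policies.find(
        (p) =>
          p.active &&
          p.organisationId === record.organisationId &&
          p.entityType === record.entityType,
      );
      if (!policy) return false; // no active policy: the record is kept
      const expiresAt =
        record.referenceDate.getTime() + policy.retentionDays * MS_PER_DAY;
      return expiresAt <= now.getTime();
    })
    .map((record) => record.id);
}
```

An external scheduler could run a computation like this per organisation and then apply the policy's action (soft_delete by default) to the returned ids.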

Retention API endpoints (all require the events:read scope in v1; this scope assignment may be revisited in a future plan):

GET  /v1/clinical-api/retention/policies
POST /v1/clinical-api/retention/policies
POST /v1/clinical-api/retention/legal-holds
DELETE /v1/clinical-api/retention/legal-holds/:id
POST /v1/clinical-api/retention/erase

2. Crypto-Shredding

Crypto-shredding is the mechanism used to make encrypted patient data unreadable without physically deleting every encrypted column. The process is:

  1. The patient's encryptedDek column in the Patient row is set to null.
  2. Without the DEK, every AES-256-GCM ciphertext encrypted under that key becomes computationally unrecoverable.
  3. The patient row itself is soft-deleted (deletedAt = now()).

Implementation in services/clinical-api/src/retention/erasure.service.ts:

// 1. Crypto-shred: null out the DEK
await this.prisma.patient.update({
  where: { id: patientId },
  data: { encryptedDek: null, deletedAt: new Date() },
});

The DekResolver (packages/common/src/crypto/dek-resolver.ts) reads encryptedDek on every request; once it is null, all decrypt calls for that patient will throw "No encryption key found for patient: <id>", making the data inaccessible at the application layer.

The underlying key wrapping is provided by LocalKeyProvider (dev) or KmsKeyProvider (prod) — see Security Model §3.
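
The effect of nulling the DEK can be illustrated with an in-memory stand-in. This is a sketch under stated assumptions, not the DekResolver source: the Map replaces the Patient table, resolveEncryptedDek and cryptoShred are hypothetical names, and treating a missing row the same as a null key is an assumption. Only the error message and the two updated columns come from the description above.

```typescript
// Hypothetical in-memory stand-in for the Patient table.
interface PatientRow {
  id: string;
  encryptedDek: string | null; // wrapped DEK; null once shredded
  deletedAt: Date | null;
}

const patients = new Map<string, PatientRow>();

// Looks up the wrapped DEK on every call, as the resolver is described to do.
function resolveEncryptedDek(patientId: string): string {
  const row = patients.get(patientId);
  if (!row || row.encryptedDek === null) {
    throw new Error(`No encryption key found for patient: ${patientId}`);
  }
  return row.encryptedDek;
}

// Crypto-shred: drop the wrapped key and soft-delete the row together.
function cryptoShred(patientId: string, now: Date): void {
  const row = patients.get(patientId);
  if (!row) return;
  row.encryptedDek = null;
  row.deletedAt = now;
}
```

After cryptoShred runs, every later call to resolveEncryptedDek for that patient throws, so no decrypt path can obtain a key even though the ciphertext columns are untouched.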


3. GDPR Article 17 Erasure (Right to Erasure)

The erasure flow is implemented in services/clinical-api/src/retention/erasure.service.ts, in ErasureService.erasePatient(patientId).

The flow proceeds as follows:

  1. Tenant verification: The requesting client's organisationId (from the OAuth token) must match the patient's organisationId. Cross-tenant erasure is rejected.

  2. Legal hold check: If any LegalHold row exists for the patient with releasedAt = null, erasure is blocked and the reason is returned:

{ erased: false, reason: "Patient is under legal hold: <reason>" }

  3. Crypto-shred: encryptedDek is nulled and deletedAt is set (see §2 above).

  4. S3 object deletion: The service enumerates all cases for the patient, then all findings per case, then all images per finding, and deletes each S3 object (StorageService.deleteObject(bucket, key)). S3 deletion errors are swallowed so that a partial storage failure cannot block the DEK nulling.

  5. Histology report file deletion: S3 objects for all ReportFile records attached to the patient's cases are also deleted.
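
The ordering of the steps above can be sketched as a single synchronous function over in-memory data. This is an illustration only, not the ErasureService code: the real service is asynchronous and goes through Prisma and StorageService, and every type and parameter here is a hypothetical stand-in. The cross-tenant rejection wording and the { erased: true } success shape are assumptions; only the legal-hold response shape is taken from the flow above.

```typescript
interface Patient {
  id: string;
  organisationId: string;
  encryptedDek: string | null;
  deletedAt: Date | null;
}
interface LegalHold {
  patientId: string;
  reason: string;
  releasedAt: Date | null; // null = active hold
}
interface StoredObject {
  bucket: string;
  key: string;
}
type EraseResult = { erased: true } | { erased: false; reason: string };

function erasePatient(
  callerOrganisationId: string,
  patient: Patient,
  holds: LegalHold[],
  objects: StoredObject[], // images and report files already enumerated per case
  deleteObject: (bucket: string, key: string) => void,
): EraseResult {
  // Tenant verification: reject cross-tenant requests (reason text is assumed).
  if (patient.organisationId !== callerOrganisationId) {
    return { erased: false, reason: "Patient belongs to a different organisation" };
  }
  // Legal hold check: any unreleased hold blocks erasure.
  const activeHold = holds.find(
    (h) => h.patientId === patient.id && h.releasedAt === null,
  );
  if (activeHold) {
    return { erased: false, reason: `Patient is under legal hold: ${activeHold.reason}` };
  }
  // Crypto-shred before touching storage, so the key is gone even if deletes fail.
  patient.encryptedDek = null;
  patient.deletedAt = new Date();
  // Object deletion: errors are swallowed and never undo the shred.
  for (const { bucket, key } of objects) {
    try {
      deleteObject(bucket, key);
    } catch {
      // intentionally ignored in this sketch
    }
  }
  return { erased: true };
}
```

Placing the shred ahead of the storage loop is what lets a storage outage degrade to leftover unreadable objects rather than a failed erasure.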

The erasure endpoint:

POST /v1/clinical-api/retention/erase
Body: { "patient_id": "<uuid>" }
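
For callers, a request to this endpoint might be assembled as follows. This is a hedged sketch: buildEraseRequest and the base URL are hypothetical, and sending the OAuth token as a Bearer Authorization header is an assumption, since the source only states that the organisation is taken from the OAuth token.

```typescript
interface EraseRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Builds the URL and options for an erase call without sending it.
function buildEraseRequest(
  baseUrl: string, // hypothetical, e.g. "https://api.example.test"
  accessToken: string,
  patientId: string,
): EraseRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/clinical-api/retention/erase`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`, // assumed auth scheme
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ patient_id: patientId }),
    },
  };
}
```

The returned url and init can be passed directly to fetch(url, init). A caller should inspect the erased flag in the response, because a legal hold produces a normal response body with erased: false rather than a successful erasure.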

Synthetic-org bulk purge

A separate endpoint, POST /v1/clinical-api/admin/organisations/:id/purge-data (admin-secret bearer), clears every clinical record and the org's S3 image, image-derivative, and report-file objects in one call. It refuses non-synthetic orgs with 403. It returns purgedCounts, including s3Objects (deleted) and s3ObjectsFailed (per-key errors that were logged but did not fail the transaction). The simulator workflow uses it to reset the synthetic tenant between load runs without leaking 35 GiB of orphan MinIO objects.
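
The two S3 counters can be understood as a tally over per-key delete attempts. The sketch below assumes that reading; deleteAllObjects, deleteOne, and logError are hypothetical names, and only the s3Objects and s3ObjectsFailed field names come from the description above.

```typescript
interface PurgeObjectCounts {
  s3Objects: number; // keys deleted successfully
  s3ObjectsFailed: number; // keys whose delete threw and was logged
}

// Attempts every key; a failure is logged and counted, never rethrown.
function deleteAllObjects(
  keys: string[],
  deleteOne: (key: string) => void,
  logError: (key: string, error: unknown) => void = () => {},
): PurgeObjectCounts {
  const counts: PurgeObjectCounts = { s3Objects: 0, s3ObjectsFailed: 0 };
  for (const key of keys) {
    try {
      deleteOne(key);
      counts.s3Objects += 1;
    } catch (error) {
      logError(key, error);
      counts.s3ObjectsFailed += 1;
    }
  }
  return counts;
}
```

A non-zero s3ObjectsFailed would then signal orphaned objects that need a follow-up cleanup, even though the purge call itself succeeded.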

What is retained after erasure:

  • Patient row (soft-deleted, encryptedDek = null) — required for audit linkage.
  • AuditLog rows — audit records are not deleted to preserve the audit trail.
  • LegalHold rows — remain for compliance history.
  • PatientMergeHistory — merge history is retained for regulatory purposes.
  • Database rows for Case, SkinFinding, Diagnosis, etc. — row shells remain but their encrypted content fields are rendered unreadable.

Limitation in v1: Encrypted content in ai-review (AiReviewResult.rawCiphertext) is not deleted by the erasure service. The raw ciphertext uses the patient's DEK indirectly (it is encrypted with a DEK fetched for the case), but the ErasureService does not explicitly null or delete ai_review_result rows. The DEK nulling makes the ciphertext unreadable, but the ciphertext bytes remain on disk. This should be addressed in a future plan.


4. Legal Holds

A LegalHold record prevents erasure for patients subject to litigation, regulatory investigation, or other legal requirements. The model is defined in services/clinical-api/prisma/schema.prisma:

LegalHold {
  id             PK
  organisationId FK → Organisation
  patientId      FK → Patient
  reason         VARCHAR(255)
  holdSince      default now()
  releasedAt     DateTime?   // null = active hold
}

A patient may have multiple holds. An erasure request will be rejected if any hold has releasedAt = null.

Holds are managed via:

POST   /v1/clinical-api/retention/legal-holds        { patient_id, reason }
DELETE /v1/clinical-api/retention/legal-holds/:id    (sets releasedAt = now())
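
Note that releasing a hold updates the row rather than removing it, and that one remaining active hold is enough to block erasure. A minimal in-memory sketch of those two rules follows; the field names mirror the model above, while releaseHold and isErasureBlocked are hypothetical helpers, not functions from the service.

```typescript
interface LegalHold {
  id: string;
  organisationId: string;
  patientId: string;
  reason: string;
  holdSince: Date;
  releasedAt: Date | null; // null = active hold
}

// Release marks the hold as ended; the row stays for compliance history.
function releaseHold(holds: LegalHold[], holdId: string, now: Date): boolean {
  const hold = holds.find((h) => h.id === holdId);
  if (!hold || hold.releasedAt !== null) return false; // unknown or already released
  hold.releasedAt = now;
  return true;
}

// Erasure is blocked while any hold for the patient is still active.
function isErasureBlocked(holds: LegalHold[], patientId: string): boolean {
  return holds.some((h) => h.patientId === patientId && h.releasedAt === null);
}
```

With two holds on the same patient, releasing only one leaves isErasureBlocked returning true; both must be released before an erase request can proceed.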

5. Data Subject Access Requests (DSAR)

Planned: there is no dedicated DSAR export endpoint in v1.

A data subject's data can be retrieved by combining the existing read endpoints:

  • GET /v1/clinical-api/patients/:id — patient record
  • GET /v1/clinical-api/patients/:id/cases — cases
  • GET /v1/clinical-api/cases/:id/findings — skin findings and diagnoses
  • GET /v1/clinical-api/patients/:id/consents (via consent service)

This process is currently manual. An automated DSAR export endpoint (collating all data for a patient into a structured package) is flagged as future work.
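
Until such an endpoint exists, the manual process amounts to walking a fixed set of read paths. The helper below is a hypothetical sketch that only lists those paths for one patient; it assumes the caller has already obtained the case ids from the cases endpoint, and it performs no requests itself.

```typescript
// Lists the read endpoints to call when assembling a DSAR by hand.
function dsarRequestPaths(patientId: string, caseIds: string[]): string[] {
  const base = "/v1/clinical-api";
  return [
    `${base}/patients/${patientId}`, // patient record
    `${base}/patients/${patientId}/cases`, // cases
    ...caseIds.map((caseId) => `${base}/cases/${caseId}/findings`), // findings and diagnoses
    `${base}/patients/${patientId}/consents`, // consents
  ];
}
```

Since the findings endpoint is per case, the number of requests grows with the number of cases, which is one reason a collating export endpoint is listed as future work.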