Agent-First Design

How we build radiology tools so AI agents can work with hospital systems without ever seeing patient data.

The Problem

Cloud-based AI agents send their entire context to external servers. If an agent interacts with a hospital PACS or HL7 system through conventional tools, patient names, IDs, and birth dates flow into the agent's context and out of the hospital network. That conflicts with GDPR and hospital data policies, and it defies common sense.

Traditional DICOM and HL7 tools were built for humans. They output unstructured text full of patient identifiers. An agent parsing that output will inevitably ingest protected health information (PHI).

The Pattern

Every tool we build follows the same principle: the tool is a privacy gateway. It sits on the hospital network, handles PHI internally, and exposes only de-identified, structured data to the agent.

Cloud (Agent)                    Hospital Network
+-------------------+            +----------------------------+
| AI Agent          |            | Privacy Gateway            |
|                   | accession  |                            |
| Never sees:       +----------->+-> PACS / RIS / HL7         |
| - patient names   | case_id    |                            |
| - patient IDs     <------------+  Handles internally:       |
| - birth dates     |            |  - patient names           |
| - any PHI         |            |  - patient IDs             |
+-------------------+            |  - all identifiers         |
                                 +----------------------------+

Six Design Principles

1. No PHI in agent context

The tool handles patient identifiers internally and returns only de-identified or coded data. The agent never needs to see a patient name. Inputs are accession numbers; outputs are pseudonymized case IDs with non-identifying metadata.
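As a rough illustration of this principle, the sketch below maps accession numbers to case IDs on the gateway side and builds the agent-facing record from an explicit set of non-PHI fields. The names `CaseRegistry` and `to_agent_view`, the field names, and the patient values are assumptions made for the example; the `caseNNNN` format follows the workflow example in this document. It is not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CaseRegistry:
    """Hypothetical gateway-side registry; it lives on the hospital network."""
    _by_accession: dict = field(default_factory=dict)

    def case_id_for(self, accession: str) -> str:
        # Reuse the existing case ID so the same study always maps to the same case.
        if accession not in self._by_accession:
            self._by_accession[accession] = f"case{len(self._by_accession) + 1:04d}"
        return self._by_accession[accession]


def to_agent_view(registry: CaseRegistry, study: dict) -> dict:
    """Build the agent-facing record from an explicit set of non-PHI fields."""
    return {
        "case_id": registry.case_id_for(study["accession"]),
        "accession": study["accession"],
        "modality": study["modality"],
        "series_count": study["series_count"],
        "image_count": study["image_count"],
    }


registry = CaseRegistry()
raw = {
    "accession": "VAR9946804",
    "patient_name": "DOE^JANE",  # made-up value; never leaves the gateway
    "patient_id": "12345",       # made-up value; never leaves the gateway
    "modality": "MR",
    "series_count": 5,
    "image_count": 450,
}
print(to_agent_view(registry, raw))
```

Because the agent view is built field by field rather than by deleting keys from the raw record, a new identifying field added upstream stays inside the gateway unless someone deliberately exposes it.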

2. Structured, machine-readable output

JSON by default, not human-oriented text that agents have to parse with regex. This inverts the usual default, where tools print text for humans and offer machine-readable output as an option. Here, human-readable output is an optional flag for debugging.
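A minimal sketch of that default, assuming a hypothetical `render` helper with a `human` flag (neither is taken from the actual tools). The sample values come from the workflow example in this document.

```python
import json


def render(result: dict, human: bool = False) -> str:
    """Return JSON by default; plain text only when explicitly requested."""
    if human:
        return (
            "{case_id} ({accession}): {modality}, "
            "{series_count} series, {image_count} images"
        ).format(**result)
    return json.dumps(result, sort_keys=True)


result = {
    "case_id": "case0001",
    "accession": "VAR9946804",
    "modality": "MR",
    "series_count": 5,
    "image_count": 450,
}
print(render(result))              # machine-readable, the default
print(render(result, human=True))  # debugging aid only
```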

3. Idempotent operations

Agents retry on failure, so every tool handles duplicate requests gracefully. A study that is already loaded is reported with status "skipped"; it is neither re-downloaded nor treated as an error.
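The retry behavior can be sketched as follows. The function `load_study`, the in-memory `store`, and the "loaded" status are assumptions for illustration; only the "skipped" status comes from the description above.

```python
def load_study(store: dict, accession: str, fetch) -> dict:
    """Load a study once; repeated calls report "skipped" instead of failing."""
    if accession in store:
        return {"accession": accession, "status": "skipped"}
    store[accession] = fetch(accession)
    return {"accession": accession, "status": "loaded"}


calls = []


def fake_fetch(accession: str) -> dict:
    # Stand-in for the real PACS retrieval; records how often it is invoked.
    calls.append(accession)
    return {"images": 450}


store = {}
first = load_study(store, "VAR9946804", fake_fetch)
second = load_study(store, "VAR9946804", fake_fetch)  # the agent retries
print(first["status"], second["status"], len(calls))
```

The second call returns a normal result rather than an exception, so an agent that retries blindly ends up in the same state as one that called once.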

4. Built-in verification

Agents cannot visually inspect results. Tools provide programmatic quality checks: image counts, series counts, and outlier detection that compares cases within a project. The agent assesses data quality through structured fields, not by opening files.
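One way such an outlier check could look is sketched below. The rule (flag a case whose series count is below half the project median) and the name `series_outliers` are assumptions; the text does not specify the actual detection rule.

```python
from statistics import median


def series_outliers(cases: list[dict], min_ratio: float = 0.5) -> list[dict]:
    """Flag cases whose series count falls below min_ratio times the project median."""
    if not cases:
        return []
    mid = median(c["series_count"] for c in cases)
    return [
        {"case_id": c["case_id"], "series_count": c["series_count"], "median": mid}
        for c in cases
        if c["series_count"] < min_ratio * mid
    ]


project = [
    {"case_id": "case0001", "series_count": 5},
    {"case_id": "case0002", "series_count": 5},
    {"case_id": "case0003", "series_count": 1},
]
print(series_outliers(project))
```

The result is itself structured data, so the agent can decide what to do with a flagged case without parsing a warning string.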

5. Audit trail

Every action is logged: who, what, when, and outcome. The audit log itself contains no PHI, only accession numbers and case IDs. It supports compliance reporting and forensic investigation.
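A PHI-free audit record might be enforced structurally, as in this sketch. The field set, the `audit_entry` helper, and the JSON-lines style are assumptions, not the format the tools actually write.

```python
import json
from datetime import datetime, timezone

# Hypothetical set of permitted fields; anything else is rejected outright.
ALLOWED_AUDIT_FIELDS = {"actor", "action", "accession", "case_id", "outcome"}


def audit_entry(**fields) -> str:
    """Serialize one audit record, rejecting any field outside the allowed set."""
    unexpected = set(fields) - ALLOWED_AUDIT_FIELDS
    if unexpected:
        raise ValueError(f"fields not permitted in audit log: {sorted(unexpected)}")
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **fields}
    return json.dumps(record, sort_keys=True)


print(audit_entry(actor="agent-01", action="load", accession="VAR9946804",
                  case_id="case0001", outcome="loaded"))
```

Rejecting unknown fields, rather than silently dropping them, makes an accidental attempt to log a patient name fail loudly during development.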

6. Accession-based addressing

Accession numbers identify studies without directly revealing patient identity. They are the universal unit of work. No tool accepts patient names or patient IDs as input.

How It Works: DICOM

Our DICOM loader (agent-rad-tools) implements mandatory, bypass-proof anonymization:

Allowlist, not blocklist. Only explicitly allowlisted DICOM tags survive; everything else is deleted. This is safer than trying to enumerate every possible PHI tag, because unknown or vendor-specific tags are removed by default.

What is preserved: accession number, study/series UIDs, modality, study date, acquisition parameters (TR, TE, slice thickness, kVp, mAs), pixel data, spatial geometry. Everything needed for research.

What is removed: the original patient name, patient ID, birth date, referring physician, performing physician, institution name, all private tags, and all unknown tags. The patient name and patient ID fields are not left empty: they are overwritten with the pseudonymized case ID.
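The allowlist approach described above can be sketched with a plain dictionary standing in for a DICOM header. The keys are standard DICOM attribute keywords, but this particular allowlist, the `anonymize` function, and the sample values are assumptions; a real loader would operate on parsed DICOM datasets and its list may differ.

```python
# Hypothetical allowlist of standard DICOM attribute keywords.
ALLOWED_KEYWORDS = {
    "AccessionNumber", "StudyInstanceUID", "SeriesInstanceUID", "Modality",
    "StudyDate", "RepetitionTime", "EchoTime", "SliceThickness", "KVP",
    "PixelSpacing", "ImagePositionPatient", "ImageOrientationPatient", "PixelData",
}


def anonymize(header: dict, case_id: str) -> dict:
    """Keep only allowlisted attributes, then set the patient fields to the case ID."""
    clean = {k: v for k, v in header.items() if k in ALLOWED_KEYWORDS}
    clean["PatientName"] = case_id
    clean["PatientID"] = case_id
    return clean


incoming = {
    "AccessionNumber": "VAR9946804",
    "Modality": "MR",
    "PatientName": "DOE^JANE",         # made-up value
    "PatientBirthDate": "19700101",    # made-up value
    "VendorPrivateNote": "free text",  # unknown attribute: dropped by default
}
print(anonymize(incoming, "case0001"))
```

Note that `VendorPrivateNote` is removed without ever being named in the code. That is the practical difference from a blocklist, which would have let it through.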

Anonymization happens immediately when images arrive via C-STORE, before they touch disk. There is no code path that exposes raw PHI to the agent.

How It Works: HL7

Our HL7 engine (Arina) applies the same pattern at the database layer. Agent-facing query tools connect as a read-only database user that can only access PID-safe views. The agent can query message flow rates, study timelines, error rates, and modality distributions, but cannot see raw HL7 content, patient identifiers, or clinical text.
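To make the view-based restriction concrete, here is a small sketch using SQLite as a stand-in database. The table name, view name, columns, and sample rows are all hypothetical. SQLite has no user accounts, so the read-only database user is not modeled here; in a server database that part would be handled by granting the agent's role SELECT on the view only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hl7_messages (
        id INTEGER PRIMARY KEY,
        received_at TEXT,
        message_type TEXT,
        modality TEXT,
        status TEXT,
        raw_message TEXT      -- contains PHI; never exposed to the agent
    );
    -- Agent-facing view: only non-identifying columns are selected.
    CREATE VIEW agent_message_stats AS
        SELECT received_at, message_type, modality, status FROM hl7_messages;
""")
conn.executemany(
    "INSERT INTO hl7_messages (received_at, message_type, modality, status, raw_message) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-01T08:00:00", "ORM", "MR", "ok", "<raw HL7 text>"),
        ("2024-01-01T08:05:00", "ORU", "CT", "error", "<raw HL7 text>"),
        ("2024-01-01T08:09:00", "ORM", "MR", "ok", "<raw HL7 text>"),
    ],
)

# The kind of aggregate query an agent-facing tool might run against the view.
rows = conn.execute(
    "SELECT modality, COUNT(*) FROM agent_message_stats "
    "GROUP BY modality ORDER BY modality"
).fetchall()
print(rows)
```

The view answers questions about modality distribution and error rates, while the `raw_message` column is simply not part of what the agent-facing query can reach.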

Agent Workflow Example

# Researcher provides accession numbers (non-PHI)
$ rad-loader echo                              # test connection
$ rad-loader query VAR9946804                  # check study exists
$ rad-loader load brain-study --file list.txt  # download + anonymize
$ rad-loader status brain-study                # outlier detection

# Agent reports back with case IDs, never patient names:
#   case0001 (VAR9946804): MR, 5 series, 450 images
#   case0002 (VAR9946805): MR, 5 series, 380 images
#   Warning: case0003 has 1 series (median is 5)

The agent orchestrates the entire research data loading workflow. It verifies completeness, flags outliers, and reports to the researcher. At no point does it see who the patients are.

Beyond PACS

The same pattern applies to every tool in our stack:

rad.pacs - PACS administration queries return aggregate statistics, not patient-level data
rad.dose - Dose tracking aggregates exposure data without patient identifiers
rad.qa-viewer - Uses agent-rad-tools as its PACS gateway for loading comparison images
rad.hr / rad.siso - Scheduling and HR have no patient data by design, only staff data

The principle is always the same: local privacy gateway, mandatory anonymization, structured output, audit trail. This is how AI agents can be useful in a hospital without becoming a liability.