Guides

Data flow auditing with Trace

Per-entity audit reports that follow one order, product, or invoice across every topology that ever touched it, with a real delivery status at each boundary.


Operational dashboards answer "is the platform healthy?". Trace answers a different question: "what happened to this order, end to end, from the moment it entered the integration estate until it landed (or failed to land) on the other side?". The view is per-entity, not per-process: one timeline per order, per SKU, per invoice, per email, no matter how many topologies and external systems it touched along the way.

Trace is a paid auditing capability built on top of the records the platform already produces. It does not run a parallel logging system; it adds two opt-in mechanisms to the things you build (entity tagging and audit checkpoints), and turns the result into a per-entity report you can read from the Admin UI or query through an API.

Where Trace is available. Trace is included out of the box in Enterprise Edition (dedicated cloud or self-hosted) and in Pro Level 3. Customers on Pro Level 1 and Level 2 can enable Trace as a paid add-on. The Starter plan and Community Edition still record every process, but the per-entity report and the boundary-aware delivery status are not part of those tiers. The setup walkthrough lives in Operations: Trace auditing.

The two-layer model #

Trace is built from two independent layers. Each one solves a different problem; you can adopt them separately.

Layer 1: entity tagging #

Every message that flows through a topology can carry a small label that says "this message represents the order with id ord-017 and tracking number TRK-100017". The platform indexes those labels so that, given any entity identifier you defined, it can list every process and every node that ever saw a message carrying it.

You declare in the Admin UI which entities exist (order, product, invoice, …) and which fields are searchable for each one (id, SKU, EAN, email, …). Then a connector that pulls or produces an order stamps the corresponding label onto the message. From that moment on, the entity is searchable in Trace.

Without this layer, Trace cannot map a value back to a real-world record. Operationally that means: stamp the label as soon as you have the identity, typically right inside the input connector that loaded the data from an external system.
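To make that concrete, here is a minimal sketch of stamping an entity label inside an input connector. The platform's actual SDK is not shown in this guide, so the message shape, the header name, and the field names are assumptions; only the idea — attach the entity type and its searchable identifiers to the message as soon as they are known — comes from the description above.

```python
# Hypothetical sketch: stamping an entity label inside an input connector.
# The message/label structure is an assumption; the platform's real SDK
# may use different names.

def process(message: dict) -> dict:
    order = message["body"]  # e.g. the order just pulled from the external system

    # Attach the entity label as soon as the identity is known, so that
    # Trace can index this message under the "order" entity.
    message.setdefault("headers", {})["trace-entity"] = {
        "type": "order",                                # entity declared in the Admin UI
        "fields": {                                     # searchable fields for this entity
            "id": order["id"],                          # e.g. "ord-017"
            "trackingNumber": order["trackingNumber"],  # e.g. "TRK-100017"
        },
    }
    return message
```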

Layer 2: audit checkpoints #

The label tells Trace that an entity passed through a process. Audit checkpoints capture what state it was in at significant business steps, and whether the operation actually delivered.

A checkpoint is declared on a node in the topology with three things:

  • a role (process_entry, process_step, process_exit),
  • an allowlist of fields to capture from the message body,
  • optionally, a path into the body if the entity is wrapped (for example order when the body is { order: {...} }).

When the message passes that node, the platform writes a single structured log line containing the snapshot, the role, and a delivery status (Delivered, Failed, Repeating, Limit, Trashed, Unknown). The delivery status is the part that makes Trace different from "I logged it before the call": it reflects what the external system actually did, because the log line is emitted after the boundary call returns.
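To make the three parts of a declaration concrete, here is one possible shape of an exit checkpoint, written as a plain Python dict. The key names (`role`, `allowlist`, `path`) mirror the list above, but the exact syntax the platform accepts is not specified in this guide, so treat this as an illustrative sketch rather than the real configuration format.

```python
# Illustrative sketch of a checkpoint declaration; the key names follow the
# list above, the concrete format is an assumption.
erp_exit_checkpoint = {
    "role": "process_exit",          # process_entry / process_step / process_exit
    "allowlist": [                   # only these fields are captured
        "id",
        "trackingNumber",
        "status",
    ],
    "path": "order",                 # unwrap when the body is { "order": {...} }
}

# When a message passes the node, the platform writes one structured log line,
# roughly of this shape (field names assumed):
# {"checkpoint": "process_exit", "entity": "order:ord-017",
#  "snapshot": {"id": "ord-017", "trackingNumber": "TRK-100017", "status": "shipped"},
#  "delivery": "Delivered"}
```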

What you actually get #

  • One timeline per entity. "Show me everything that happened to order ord-017" opens a card with entry, optional steps, and exit for every run, in time order, across every topology that touched it.
  • Real delivery status at each boundary. A green Delivered badge means the external system returned success. A red Failed badge means the call ran and was rejected (with the rejection message visible inline). Repeating and Limit cover backoff and rate-limit cases that would otherwise look identical to "still running".
  • GDPR / PCI marker mode. For sensitive entities you can declare a checkpoint that records that the entity passed the boundary without capturing any field values. Operators see the timeline and the delivery status; the body never reaches the log store.
  • Failed-call evidence inline. When a downstream system rejects a call, the rejection message (truncated, sanitized) shows directly on the timeline card. Operators triage and dispute without opening the underlying log store.
  • AI-ready audit data. The same per-entity report is exposed through an API so internal AI assistants can answer "what happened to invoice 8841 last quarter?" in plain language, scoped to the asking user's permissions. The conversational interface itself is still being finished; the audit data is already there (see the query sketch after this list).
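Fetching a per-entity report over that API could look roughly like the sketch below. The endpoint path, query parameters, and response fields are assumptions — the guide only states that the report is exposed through an API; the point is that one call returns the full timeline for one entity.

```python
# Hypothetical sketch of querying the per-entity report; the endpoint and
# response shape are assumptions, not the documented API.
import requests

resp = requests.get(
    "https://platform.example.com/api/trace/report",
    params={"entity": "order", "id": "ord-017"},
    headers={"Authorization": "Bearer <api-token>"},
    timeout=30,
)
resp.raise_for_status()

for event in resp.json()["timeline"]:
    # e.g. "2024-05-02T10:15:03Z  erp-sync/push-order  process_exit  Delivered"
    print(event["time"], event["topology"], event["role"], event["delivery"])
```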

The boundary connector pattern #

The single most common modeling mistake with checkpoints is also the easiest to avoid: declare entry/exit checkpoints on the boundary connector itself, not on a passthrough node placed in front of it.

The reason is delivery status. The platform emits the audit log line after the boundary call returns. When the checkpoint sits on the connector that calls the external system, the status reflects the actual outcome of that call. When the checkpoint sits on a passthrough node placed in front of the connector, there is nothing for the passthrough to fail at, so the status is always Delivered, even when the downstream call is rejected. Such a checkpoint is misleading by construction.

Use a passthrough audit node only for process_step markers in the middle of a chain ("order validated", "inventory reserved"), or for boundary nodes that are not connectors (a custom webhook receiver, for example).
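In topology terms the difference is only where the checkpoint sits. A minimal sketch of the two placements, with node and attribute names assumed:

```python
# Sketch of the two placements; node types and attribute names are assumptions.

# Correct: the exit checkpoint sits on the connector that calls the ERP,
# so its delivery status reflects the ERP's actual response.
push_to_erp = {
    "type": "connector",
    "name": "push-order-to-erp",
    "checkpoint": {"role": "process_exit", "allowlist": ["id", "status"]},
}

# Misleading: a passthrough placed in front of the connector has nothing to
# fail at, so its status is always Delivered even when the ERP rejects the call.
audit_before_erp = {
    "type": "passthrough",
    "name": "audit-before-erp",
    "checkpoint": {"role": "process_exit", "allowlist": ["id", "status"]},
}
```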

Granularity and retention #

Trace adds one log line per checkpoint per message. That cost matters at high volume.

A */15 * * * * topology with a 1M-item batch and a process_step checkpoint placed after the split would produce 1M log lines every quarter hour. Three rules of thumb keep this under control:

  • Checkpoint on the batch node (before the split). One log line per batch, with a narrow allowlist that fits in the per-line size cap.
  • Marker mode (allowlist intentionally empty) for hot points where you only need the signal "passed this step", not the body.
  • No checkpoint at all behind a high-volume fan-out. The entity tagging from Layer 1 still lets Trace assemble the per-entity timeline; it just won't have a boundary snapshot at every step.
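The arithmetic behind the first rule of thumb, as a quick back-of-the-envelope check on the example above:

```python
# Back-of-the-envelope volume check for the */15 * * * * example above.
runs_per_day = 24 * 60 // 15                      # 96 runs per day

# process_step checkpoint placed AFTER the split: one line per item
lines_after_split = runs_per_day * 1_000_000      # 96,000,000 lines per day

# checkpoint on the batch node (BEFORE the split): one line per batch
lines_on_batch_node = runs_per_day * 1            # 96 lines per day

print(lines_after_split, lines_on_batch_node)
```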

The history horizon is the same as for any other operational record on the platform: it is bounded by the log retention configured for your instance. Holding an unbounded multi-year audit trail at the per-message level is not currently sustainable; reports cover the configured retention window (typically days to weeks depending on tier and instance). Plan retention to match the audit horizon you actually need.

Security by construction #

Trace is opinionated about what cannot leak into the audit log:

  • The allowlist is mandatory. There is no "log everything" fallback. A checkpoint with no allowlist is rejected outright. This is a forcing function for privacy review: every captured field is a deliberate choice.
  • Wildcard fields are not supported. Same reason: every field has to pass through a developer's eyes.
  • Last-resort regex masking. Anything resembling a secret (password, token, api_key, auth, secret) is masked even if it slips into an allowlist by accident.
  • A hard size cap per log line prevents body bloat and partial-payload leakage. Oversized snapshots are replaced with a marker that records the original size.
  • Sanitized request headers. Authorization, cookies, and similar are stripped from the log line that the audit pipeline consumes; they never make it into the audit record.

For genuinely sensitive entities (PCI scope, GDPR special categories), the recommended pattern is marker mode plus a derived hash (for example a SHA-256 of the email) prepared inside the connector. The audit shows that the entity passed; the raw value is never logged.
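A minimal sketch of the derived-hash half of that pattern, assuming the hash is computed inside the connector before the label is stamped (the function name and its use are illustrative, not part of the platform):

```python
# Sketch: derive a stable, non-reversible identifier for a sensitive entity
# inside the connector, so the audit never sees the raw value.
import hashlib

def email_fingerprint(email: str) -> str:
    # Normalise first so "Jane@Example.com" and "jane@example.com" match.
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

# The fingerprint goes into the entity label / marker-mode checkpoint;
# email_fingerprint("jane@example.com") returns a 64-character hex digest
# that is searchable in Trace without exposing the address itself.
```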

Where next #