Trace auditing
Where Trace is available. Trace is included out of the box in Enterprise Edition (dedicated cloud or self-hosted) and in Pro Level 3. Customers on Pro Level 1 and Level 2 can enable Trace as a paid add-on. The audit pipeline (entity index in Mongo, audit log lines in Loki, and the audit MCP) ships with these tiers — the Starter plan and Community Edition do not include it.
Trace turns the records the platform already produces into a per-entity timeline: "show me everything that happened to order ord-017" opens one card per run, with a process_entry, optional process_steps and a process_exit, each carrying a snapshot of the entity at that boundary and a real delivery status. The conceptual overview lives in Learn: Data flow auditing with Trace; this page is the developer reference for wiring it up.
The two layers #
| Layer | Question it answers | Where it is declared | Where it is stored |
|---|---|---|---|
| AuditEntity + `audit-entity` header (per-message) | "Which entities (id, SKU, EAN, …) appeared in which correlationId?" | Connector calls `dto.addAuditHeader(...)` or `dto.addItemWithAudit(...)` | Mongo collection `audit_data` (the Trace MCP indexes it) |
| `getAuditCheckpoint()` on a node (per-business-step) | "What did the entity look like at entry / step / exit, and did the call actually deliver?" | Override on a boundary `AConnector` (preferred for entry / exit) or on an `AuditCheckpointNode` passthrough (for `process_step`) | Loki — structured INFO log lines `auditCheckpoint.{role, payload, resultCode, resultStatus, resultMessage, httpStatus}` |
The two layers are independent. The header alone gives you "the entity passed". The checkpoint alone gives you boundary snapshots without per-entity grouping. Used together you get the full Trace report.
Node.js SDK only (today). The addAuditHeader, addItemWithAudit, getAuditCheckpoint and AuditCheckpointNode helpers currently exist only in @orchesty/nodejs-sdk. PHP nodes can still receive Trace coverage if a Node.js boundary node sets the audit-entity and audit-checkpoint headers on their behalf, but the helper API is Node.js-only at this time.
Step 1: define the AuditEntity in the Admin UI #
AuditEntity is the platform-wide definition of one business entity. Without it Trace has no way to map the values from the audit-entity header to a real-world record. Create one entity per business domain, typically once during rollout.
- Open Trace → Audit entities → New entity.
- Fill in:
  - `key` — stable identifier the connector uses as the first argument of `dto.addAuditHeader('order', ...)`. Lower-case, snake/kebab, no spaces. Must be unique across the installation. Example: `order`, `product`, `invoice`.
  - `name` — human-readable label shown in the UI. Example: Order, Product, Invoice.
  - `fields` — searchable fields, each with its own `key` and `name`. The Trace UI exposes them as filters. Example for an order: `id`, `trackingId`. Example for a product: `id`, `externalId`, `SKU`, `EAN`.
- Save. The `audit_entity` collection is updated.
The UI's fields are search facets (which columns the user can filter Trace results by). The SDK's IAuditCheckpoint.fields is the log allowlist (which columns end up inside auditCheckpoint.payload in Loki). They often overlap but are not the same list.
Recommendations:
- Start minimal. A couple of fields you actually search on. Add more later.
- `id` is almost always the right primary key — it anchors per-entity history queries from any audit consumer.
- Never rename `key` once entries exist in `audit_data`; existing records reference it directly.
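Written out as a constant, the "order" entity from the steps above could look like this. This is only a sketch: the property names mirror the Admin UI form fields, not an exported SDK type.

```typescript
// Hypothetical shape of the "order" AuditEntity from Step 1 (illustration only).
const orderAuditEntity = {
    key: 'order',   // must match the first argument of dto.addAuditHeader('order', ...)
    name: 'Order',  // label shown in the Trace UI
    fields: [       // search facets operators can filter by
        { key: 'id', name: 'ID' },
        { key: 'trackingId', name: 'Tracking ID' },
    ],
};
```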
Step 2: tag messages with the audit-entity header #
The audit-entity header is the contract between the connector and the Bridge. The connector sets it via dto.addAuditHeader(...); the Bridge writes a per-message record into audit_data linking correlationId ↔ entity values.
Single-message connector #
```typescript
import AConnector from '@orchesty/nodejs-sdk/dist/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/ProcessDto';

export default class FetchOrderConnector extends AConnector {

    public getName(): string {
        return 'fetch-order-connector';
    }

    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        const order = await fetchOrderFromCrm(dto);
        dto.setJsonData(order);
        dto.addAuditHeader('order', 'id', [{
            id: order.id,
            trackingId: order.trackingId,
        }]);

        return dto;
    }

}
```
Arguments:
| Argument | Meaning |
|---|---|
| `entity` (`'order'`) | Must match a `key` from AuditEntity in the Admin UI. |
| `key` (`'id'`) | Which field is the primary identifier (Trace uses it as the anchor for per-entity queries). |
| `fields` | Concrete values that are present in this message. Not a template, not derived. |
Batch connector — use addItemWithAudit, not addAuditHeader #
In a batch, never combine dto.addItem(o) with dto.addAuditHeader(...) on the batch DTO. The Bridge copies the parent's headers onto every child message via CopyBatchItem; if the parent's audit-entity carries all N entities, every child message will reference all N, and per-entity Trace breaks. Use the per-item helper instead, which scopes each audit-entity to the single item it represents:
```typescript
import ABatchNode from '@orchesty/nodejs-sdk/dist/lib/Batch/ABatchNode';
import BatchProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/BatchProcessDto';

export default class AllOrdersBatch extends ABatchNode {

    public getName(): string {
        return 'all-orders-batch';
    }

    public async processAction(dto: BatchProcessDto): Promise<BatchProcessDto> {
        ORDERS.forEach((order) => {
            dto.addItemWithAudit(
                order,
                'order',
                'id',
                [{ id: order.id, trackingId: order.trackingId }],
            );
        });

        return dto;
    }

}
```
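The header-copying behaviour described above can be modelled in a few lines. This is a toy model — the real Bridge logic lives in `CopyBatchItem`, and the header values below are illustrative strings, not the real wire format:

```typescript
// Toy model: the Bridge copies parent headers onto every child message.
type Headers = Record<string, string>;

function copyBatchItem(parentHeaders: Headers, items: { body: string; headers?: Headers }[]): { body: string; headers: Headers }[] {
    return items.map((item) => ({ body: item.body, headers: { ...parentHeaders, ...(item.headers ?? {}) } }));
}

// Anti-pattern: addAuditHeader on the batch DTO → every child references all ids.
const bad = copyBatchItem(
    { 'audit-entity': 'order:id=[o-1,o-2]' },
    [{ body: 'o-1' }, { body: 'o-2' }],
);

// Pattern: addItemWithAudit → each child carries only its own audit-entity.
const good = copyBatchItem(
    {},
    [
        { body: 'o-1', headers: { 'audit-entity': 'order:id=o-1' } },
        { body: 'o-2', headers: { 'audit-entity': 'order:id=o-2' } },
    ],
);
```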
Where not to add the header #
- On nodes that introduce no new identity (a pure passthrough transformer).
- Before the entity exists (a cron-trigger or random generator that has nothing to identify yet). Add the header on the first node that produces it.
Step 3: declare audit checkpoints #
A checkpoint is an explicit point in the topology where the Bridge writes a structured log line to Loki containing a snapshot of allowlisted fields plus the delivery status. Any node declares one by overriding getAuditCheckpoint(): IAuditCheckpoint | null (the default returns null, meaning "neutral node, no audit").
The IAuditCheckpoint spec #
```typescript
export interface IAuditCheckpoint {
    role: 'process_entry' | 'process_step' | 'process_exit';

    /** Dot-path to the entity inside the request body. Defaults to `$` (root). */
    entityPath?: string;

    /** REQUIRED allowlist of fields to extract from the entity. `[]` = marker only. */
    fields: string[];
}
```
3a) Override on the boundary connector — preferred for entry / exit #
The Bridge emits the audit log line after processAction returns. The resultStatus therefore reflects whether the boundary call actually succeeded (success / failed / repeat / trashed / limit). That is exactly what an entry / exit audit needs to record.
```typescript
import { IAuditCheckpoint } from '@orchesty/nodejs-sdk/dist/lib/Commons/IAuditCheckpoint';
import AConnector from '@orchesty/nodejs-sdk/dist/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/ProcessDto';

export default class MockErpOutputConnector extends AConnector {

    public getName(): string {
        return 'mock-erp-output-connector';
    }

    public getAuditCheckpoint(): IAuditCheckpoint {
        return {
            role: 'process_exit',
            fields: ['id', 'orderNumber', 'trackingId', 'erpReferenceNumber', 'status'],
        };
    }

    public async processAction(dto: ProcessDto<IOrder>): Promise<ProcessDto<IOrder>> {
        await this.callErp(dto.getJsonData());

        return dto;
    }

}
```
A passthrough placed in front of this connector cannot fail (it has no external call), so its audit log would always say success even when the downstream ERP rejected the order. The override on the connector itself avoids that pitfall.
3b) Override on AuditCheckpointNode — for process_step and non-connector boundaries #
Use the dedicated passthrough node when the audit point is in the middle of a chain, or when the boundary is not an AConnector (a custom webhook receiver, for example).
```typescript
import AuditCheckpointNode from '@orchesty/nodejs-sdk/dist/lib/Commons/AuditCheckpointNode';
import { IAuditCheckpoint } from '@orchesty/nodejs-sdk/dist/lib/Commons/IAuditCheckpoint';

export default class OrderValidatedAudit extends AuditCheckpointNode {

    public getName(): string {
        return 'order-validated-audit';
    }

    public getAuditCheckpoint(): IAuditCheckpoint {
        return {
            role: 'process_step',
            fields: ['id', 'status', 'validationErrors'],
        };
    }

}
```
For entry / exit through a passthrough, resultStatus is always success (a passthrough has nothing to fail on). Real delivery status on entry / exit only comes from declaring the audit on the boundary connector itself.
The audit log line #
The Bridge writes the following INFO log to Loki for every audit checkpoint:
```json
{
  "level": "info",
  "auditCheckpoint": {
    "role": "process_exit",
    "payload": { "id": "ord-017", "trackingId": "TRK-100017", "erpReferenceNumber": "ERP-9921" },
    "resultCode": 0,
    "resultStatus": "success",
    "resultMessage": "",
    "httpStatus": 200
  },
  "correlationId": "...",
  "topologyName": "create-order",
  "nodeName": "mock-erp-output-connector"
}
```
resultStatus is computed by audit.ClassifyStatus(resultCode, httpStatus) on the Bridge:
| SDK ResultCode | HTTP status | resultStatus |
|---|---|---|
| 0 (SUCCESS) | 2xx | success |
| 1001 (DO_NOT_CONTINUE) | 2xx | success (terminal but OK) |
| 1002 (REPEAT) / 1004 (FORWARD_TO_REPEATER) / 1010 (LIMIT_EXCEEDED) | any | repeat |
| 1003 (STOP_AND_FAILED) / 1006 (SPLITTER_BATCH_END_WITH_ERROR) | any | failed |
| 1005 (SPLITTER_BATCH_END) | 2xx | success |
| 1009 (MESSAGE_WILL_BE_TRASHED) | any | trashed |
| 1011 (MESSAGE_LIMIT) | any | limit |
| other / unset | 5xx | failed |
| other / unset | 4xx | unknown |
Each resultStatus value corresponds to a delivery status — Delivered, Failed, Repeating, Limit, Trashed, or Unknown — that audit consumers (the audit MCP and AI assistants on top of it, custom dashboards, downstream ETL) surface to operators.
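The mapping in the table can be sketched as a plain function. This is an illustrative reimplementation of what `audit.ClassifyStatus` does, not the Bridge source; the behaviour for an unknown code with a 2xx status is not specified in the table, so the sketch returns `unknown` there.

```typescript
type ResultStatus = 'success' | 'repeat' | 'failed' | 'trashed' | 'limit' | 'unknown';

// Sketch of the Bridge's classification table (illustration only).
function classifyStatus(resultCode: number | undefined, httpStatus: number): ResultStatus {
    switch (resultCode) {
        case 0:    // SUCCESS
        case 1001: // DO_NOT_CONTINUE (terminal but OK)
        case 1005: // SPLITTER_BATCH_END
            return 'success';
        case 1002: // REPEAT
        case 1004: // FORWARD_TO_REPEATER
        case 1010: // LIMIT_EXCEEDED
            return 'repeat';
        case 1003: // STOP_AND_FAILED
        case 1006: // SPLITTER_BATCH_END_WITH_ERROR
            return 'failed';
        case 1009: // MESSAGE_WILL_BE_TRASHED
            return 'trashed';
        case 1011: // MESSAGE_LIMIT
            return 'limit';
        default:   // other / unset: fall back to the HTTP status
            if (httpStatus >= 500) return 'failed';
            return 'unknown'; // 4xx per the table; 2xx with an unknown code is unspecified
    }
}
```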
fields semantics #
| Form | Meaning |
|---|---|
| `fields: ['id', 'totalAmount']` | Allowlist. The Bridge picks only the listed fields from the request body and writes them into `auditCheckpoint.payload`. |
| `fields: []` | Marker mode. A log line is emitted but the `payload` key is omitted. Use for highly sensitive entities (PCI / PII) where the audit signal is "passed this point". |
| `fields: ['customer.email']` | Dot-paths are supported. Extracts a nested field. |
| `entityPath: 'order'` | The Bridge first descends into `body.order`, then applies `fields`. Default is `$` (root). |
Wildcards (fields: ['*']) are not supported by design. Every captured field has to pass through a developer's eyes (forcing function for privacy review).
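The extraction semantics from the table can be sketched as follows. This is our own minimal model, not the Bridge implementation; in particular, the payload is keyed by the dot-path here, and whether the Bridge flattens or nests such keys is an implementation detail.

```typescript
// Resolve a dot-path ('customer.email') inside a plain object.
function getPath(obj: unknown, path: string): unknown {
    return path.split('.').reduce<unknown>(
        (acc, key) => (acc && typeof acc === 'object' ? (acc as Record<string, unknown>)[key] : undefined),
        obj,
    );
}

// Sketch: descend into entityPath (if any), then pick only allowlisted fields.
function extractPayload(body: Record<string, unknown>, fields: string[], entityPath?: string): Record<string, unknown> {
    const entity = entityPath ? getPath(body, entityPath) ?? {} : body;
    const payload: Record<string, unknown> = {};
    for (const f of fields) {
        const value = getPath(entity, f);
        if (value !== undefined) payload[f] = value;
    }
    return payload;
}
```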
Limits and fallbacks #
- 64 KB hard limit on the marshalled payload. Oversized snapshots are replaced with `{ "_truncated": true, "_originalSizeBytes": N }` and a WARN line is written to the Bridge's stdout so the operator can narrow the allowlist.
- `resultMessage` is truncated to 512 runes (UTF-8 safe). Longer messages get a trailing `…`.
- Last-resort regex masking. Anything matching `(?i)(password|passwd|secret|token|api[-_]?key|auth)` is replaced with `<redacted>`, even if it slipped into the allowlist by accident.
- Invalid JSON in the body falls back to `{ "_invalidJson": true, "_base64": "..." }` (subject to the size cap).
- Unresolved `entityPath` results in `payload: {}` (the signal is preserved).
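The masking pass can be sketched in a few lines, assuming the documented regex is applied to field names (the function name and the name-vs-value detail are our assumptions, not the Bridge source):

```typescript
// Same pattern as documented for the last-resort masking pass.
const SECRET_KEY_RE = /(password|passwd|secret|token|api[-_]?key|auth)/i;

// Sketch: replace the value of any field whose name looks like a secret.
function maskSecrets(payload: Record<string, unknown>): Record<string, unknown> {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(payload)) {
        out[key] = SECRET_KEY_RE.test(key) ? '<redacted>' : value;
    }
    return out;
}
```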
Batch granularity #
If a batch connector (ABatchNode) overrides getAuditCheckpoint(), the Bridge emits one log line for the whole batch. The payload is therefore an array narrowed by the allowlist. Individual child messages (after split) come from addItemWithAudit(...), each carries its own audit-entity header, and per-entity correlation joins back through Mongo (audit_data), not through Loki.
```typescript
public getAuditCheckpoint(): IAuditCheckpoint {
    return {
        role: 'process_entry',
        fields: ['id', 'orderNumber', 'trackingId', 'status'],
    };
}
```
Keep the batch allowlist narrow. The payload is an array and easily exceeds the 64 KB cap. If you need one log line per child, place an AuditCheckpointNode after the split (and read the section on granularity below before doing so at high volume).
Granularity: when not to add a checkpoint #
The audit log is emitted per message. A */15 * * * * topology with a 1 M-item batch and a process_step checkpoint after the split produces 1 M log lines every quarter hour. Three strategies keep this under control:
- Checkpoint on the batch node (before the split). One log line per batch, allowlist narrowed. Works as long as N × item size ≲ 64 KB.
- Marker mode (`fields: []`) after the split. A log line without `payload` is much smaller and still records "passed point X".
- No checkpoint after the split. For very high-volume syncs the `audit-entity` header from `addItemWithAudit` is enough; Trace assembles the per-entity timeline from `audit_data` even without per-step Loki snapshots.
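For strategy 1 it helps to estimate up front whether the allowlist-narrowed batch array fits under the cap. The helper below is our own back-of-the-envelope sketch (the Bridge does its own marshalling, so treat the result as an estimate):

```typescript
const CAP_BYTES = 64 * 1024; // the documented hard limit

// Estimate the marshalled size of a batch checkpoint payload: narrow each item
// to the allowlist, serialize the array, count UTF-8 bytes.
function estimateBatchPayloadBytes(items: Record<string, unknown>[], fields: string[]): number {
    const narrowed = items.map((item) => Object.fromEntries(fields.map((f) => [f, item[f]])));
    return new TextEncoder().encode(JSON.stringify(narrowed)).length;
}
```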
Patterns and anti-patterns #
Patterns #
- Boundary connector pattern. Declare `process_entry` on the input connector and `process_exit` on the output connector via overridden `getAuditCheckpoint()`. The topology shape stays purely business: start → input (process_entry) → business-nodes → output (process_exit) → end. The Bridge logs the delivery status after `processAction` returns.
- `AuditCheckpointNode` for `process_step` markers. Use a passthrough only between business steps. The status badge will read `success` (a passthrough has nothing to fail on), but the snapshot is captured.
- Same allowlist for entry and exit when the entity has the same shape on both sides. Share a constant so they stay in sync.
- Tag with `audit-entity` as soon as the id is known — typically the first node that loads the entity from an external system, never the output node.
- `addItemWithAudit` in every batch. Never `addItem` + `addAuditHeader` on the batch DTO.
- Narrow allowlist on batch connectors. The payload is an array; stay well under 64 KB.
- Republish + recreate the Bridge container after changing the topology graph. The Bridge holds a graph cache and otherwise ignores the new spec.
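The shared-allowlist pattern can be as simple as one exported constant. The interface is redeclared inline here so the sketch is self-contained; in a real worker you would import `IAuditCheckpoint` from the SDK, and the field names are hypothetical:

```typescript
// Inline copy of the SDK interface, for a self-contained sketch.
interface IAuditCheckpoint {
    role: 'process_entry' | 'process_step' | 'process_exit';
    entityPath?: string;
    fields: string[];
}

// One constant, referenced by both boundary connectors, keeps entry and exit
// snapshots comparable field-for-field.
const ORDER_AUDIT_FIELDS = ['id', 'orderNumber', 'trackingId', 'status'];

const ORDER_ENTRY: IAuditCheckpoint = { role: 'process_entry', fields: ORDER_AUDIT_FIELDS };
const ORDER_EXIT: IAuditCheckpoint = { role: 'process_exit', fields: ORDER_AUDIT_FIELDS };
```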
Anti-patterns #
- Passthrough `AuditCheckpointNode` placed before an output connector. The log line will always be `success` even when the downstream call fails. Move the override onto the output connector itself.
- Checkpoint after a fan-out of 1 M messages. See the granularity section above.
- `fields: ['*']` / wildcard. Not supported; every captured field has to be intentional.
- PII / PCI in the allowlist (even with intent). Use `fields: []` (marker), or a derived hash / masked field prepared inside the connector.
- Shared connector for several entities with different shapes. If one connector can produce both Orders and Products, the allowlists collide. Split into two connectors.
- Header without an existing AuditEntity in the UI. The Bridge will store the record but Trace cannot resolve it back to the entity (the mapping is missing).
- Checkpoint on a cron-trigger node. The request body is typically `{}`; the log line carries no useful information.
Security and compliance #
The audit pipeline implements defense in depth:
| Layer | What it catches |
|---|---|
| 1. Required `fields` allowlist | Forcing function. No "log everything" fallback. A spec without an allowlist is rejected. |
| 2. Last-resort regex `(?i)(password\|passwd\|secret\|token\|api[-_]?key\|auth)` | Masks anything resembling a secret, even if it leaks into the allowlist. |
| 3. 64 KB hard limit | Prevents Loki bloat and protects against split UTF-8 / partial payload leaks. |
| 4. JSON validation | Invalid input → base64 + flag, rather than raw bytes in the log. |
| 5. Role whitelist | A spec with an unknown `role` is rejected. |
| 6. Header denylist (`SanitizeHeaders`) | `Authorization`, `Cookie`, `X-API-Key`, … never make it into the INFO log. |
| 7. `auditEntityIds` are never written to the audit log line | Cross-attribute lookup happens in Mongo, not in Loki. |
For sensitive domains:
- Use marker mode (`fields: []`) for entities in GDPR / PCI scope.
- If you need an identifier, prepare a derived hash (for example a SHA-256 of the email) inside the connector, and audit the hash.
- For borderline fields, comment `getAuditCheckpoint()` with the reason and route the change through a privacy review.
Operational recipes #
"I cannot find an entity in Trace, even though it definitely passed" #
- Mongo: `db.audit_data.find({ "data.id": "ord-017" })`. If empty, the connector did not send the `audit-entity` header. Check that `dto.addAuditHeader(...)` is being called and that the `entity` matches `AuditEntity.key`.
- Verify there is an AuditEntity with that `key` in the Admin UI. Without it, the MCP cannot resolve the entity.
"I see the entity but entry / exit is null" #
- Loki query: `{topologyName="create-order", correlationId="..."}`. Is there any `auditCheckpoint` log line at all?
- Bridge stdout: `ERROR audit checkpoint: ...` indicates an invalid spec.
- Confirm the boundary connector overrides `getAuditCheckpoint()`. Having `addAuditHeader` alone is not enough.
- Republish the topology after any graph change (`POST /topologies/{id}/publish`) and recreate the Bridge container.
"The exit audit shows success, even though the external system failed" #
- The audit is declared on a passthrough `AuditCheckpointNode` placed before the output connector. Move the `getAuditCheckpoint()` override onto the output `AConnector` itself.
- The output connector has no `getAuditCheckpoint()` override. Add one.
- The connector swallows the exception from the external call and never propagates the failure to the SDK. Either rethrow it or call `dto.setStopProcess(ResultCode.STOP_AND_FAILED, ...)`.
"Loki is being flooded with audit lines" #
- Some checkpoint sits after a batch split. Move it before the split, or switch it to marker mode.
- The topology has too many `process_step` checkpoints. Aim for 0–2 per business process.
"The payload in Loki says _truncated: true" #
- The allowlist is too wide, or the entity is genuinely large.
- Narrow `fields`, or split the checkpoint into several smaller ones with `process_step` role at different points.
Onboarding checklists #
Onboarding a new entity (one-time) #
- AuditEntity created in the Admin UI with a unique `key`.
- `fields` in the UI cover every searchable column you expect operators to filter on.
- The `key` is fixed in a developer-side constant (for example `export const AUDIT_ENTITY_ORDER = 'order'`).
Onboarding a new topology #
- The input connector overrides `getAuditCheckpoint()` with `role: 'process_entry'`.
- The output connector overrides `getAuditCheckpoint()` with `role: 'process_exit'`.
- The connector that produces the entity calls `dto.addAuditHeader(...)` (or `dto.addItemWithAudit(...)` in a batch).
- Any `process_step` markers are declared via `AuditCheckpointNode` passthroughs.
- Every `getAuditCheckpoint()` has an explicit `fields` allowlist (no PII / PCI).
- The topology is published (`POST /topologies/{id}/publish`).
- The Bridge container has been recreated after the graph change.
- Smoke test: run the topology and confirm in Loki that there is a log line with `auditCheckpoint.role = "process_entry"` and another with `"process_exit"`, both with `resultStatus = "success"`. Force a 5xx from the output system and confirm the exit audit reports `resultStatus = "failed"` with a non-empty `resultMessage`.
Code review for a new getAuditCheckpoint() #
- Override is on the boundary `AConnector` (entry / exit), not on a passthrough placed in front of it.
- For `AuditCheckpointNode` passthroughs the role is `process_step` (or, exceptionally, entry / exit on a non-connector boundary).
- `getAuditCheckpoint()` returns a constant spec; no dynamic branching off the dto (that belongs in the business code).
- `fields` does not contain `password` / `secret` / `token` / `api_key` / `auth`.
- On batch connectors the allowlist is narrow enough to fit under 64 KB.
- Sensitive entities use `fields: []` (marker mode) with a comment explaining why.
- The node is registered in the worker's `src/index.ts`.
- The topology is republished after the change.
See also #
- Learn: Data flow auditing with Trace — concept overview and the rationale behind the boundary-connector pattern.
- Operations: Logging — the SDK logger that automatically attaches correlation context.
- Operations: Integration monitoring — dashboards and process detail.
- Concepts: Processes and Messages — the underlying record Trace builds on.