Trace auditing

Where Trace is available. Trace is included out of the box in Enterprise Edition (dedicated cloud or self-hosted) and in Pro Level 3. Customers on Pro Level 1 and Level 2 can enable Trace as a paid add-on. The audit pipeline (entity index in Mongo, audit log lines in Loki, and the audit MCP) ships with these tiers — the Starter plan and Community Edition do not include it.

Trace turns the records the platform already produces into a per-entity timeline: "show me everything that happened to order ord-017" opens one card per run, with a process_entry, optional process_steps and a process_exit, each carrying a snapshot of the entity at that boundary and a real delivery status. The conceptual overview lives in Learn: Data flow auditing with Trace; this page is the developer reference for wiring it up.

The two layers #

| Layer | Question it answers | Where it is declared | Where it is stored |
| --- | --- | --- | --- |
| AuditEntity + audit-entity header (per-message) | "Which entities (id, SKU, EAN, …) appeared in which correlationId?" | Connector calls dto.addAuditHeader(...) or dto.addItemWithAudit(...) | Mongo collection audit_data (Trace MCP indexes it) |
| getAuditCheckpoint() on a node (per-business-step) | "What did the entity look like at entry / step / exit, and did the call actually deliver?" | Override on a boundary AConnector (preferred for entry / exit) or on an AuditCheckpointNode passthrough (for process_step) | Loki: structured INFO log lines auditCheckpoint.{role, payload, resultCode, resultStatus, resultMessage, httpStatus} |

The two layers are independent. The header alone gives you "the entity passed". The checkpoint alone gives you boundary snapshots without per-entity grouping. Used together you get the full Trace report.

Node.js SDK only (today). The addAuditHeader, addItemWithAudit, getAuditCheckpoint and AuditCheckpointNode helpers currently exist only in @orchesty/nodejs-sdk. PHP nodes can still receive Trace coverage if a Node.js boundary node sets the audit-entity and audit-checkpoint headers on their behalf, but the helper API is Node.js-only at this time.

Step 1: define the AuditEntity in the Admin UI #

AuditEntity is the platform-wide definition of one business entity. Without it Trace has no way to map the values from the audit-entity header to a real-world record. Create one entity per business domain, typically once during rollout.

  1. Open Trace → Audit entities → New entity.
  2. Fill in:
    • key — stable identifier the connector uses as the first argument of dto.addAuditHeader('order', ...). Lower-case, snake/kebab, no spaces. Must be unique across the installation. Example: order, product, invoice.
    • name — human-readable label shown in the UI. Example: Order, Product, Invoice.
    • fields — searchable fields, each with its own key and name. The Trace UI exposes them as filters. Example for an order: id, trackingId. Example for a product: id, externalId, SKU, EAN.
  3. Save. The audit_entity collection is updated.

The UI's fields are search facets (which columns the user can filter Trace results by). The SDK's IAuditCheckpoint.fields is the log allowlist (which columns end up inside auditCheckpoint.payload in Loki). They often overlap but are not the same list.

Recommendations:

  • Start minimal. A couple of fields you actually search on. Add more later.
  • id is almost always the right primary key — it anchors per-entity history queries from any audit consumer.
  • Never rename key once entries exist in audit_data; existing records reference it directly.
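
The key rules above can be checked on the developer side before a key is ever typed into the Admin UI. The following is a minimal sketch: isValidEntityKey and the regex are hypothetical (they encode one reading of "lower-case, snake/kebab, no spaces") and are not part of the SDK or the platform.

```typescript
// Developer-side constants for the keys defined in the Admin UI.
const AUDIT_ENTITY_ORDER = 'order';
const AUDIT_ENTITY_PRODUCT = 'product';

// Assumed reading of the rule: lower-case alphanumeric segments
// joined by a single '_' or '-', with no spaces anywhere.
const ENTITY_KEY_PATTERN = /^[a-z][a-z0-9]*(?:[_-][a-z0-9]+)*$/;

function isValidEntityKey(key: string): boolean {
    return ENTITY_KEY_PATTERN.test(key);
}

// Collects every constant that would not pass the rule.
const invalidEntityKeys = [AUDIT_ENTITY_ORDER, AUDIT_ENTITY_PRODUCT]
    .filter((key) => !isValidEntityKey(key));
```

Keeping the keys in constants like these also makes an accidental rename visible in code review, which matters because existing audit_data records reference the key directly.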

Step 2: tag messages with the audit-entity header #

The audit-entity header is the contract between the connector and the Bridge. The connector sets it via dto.addAuditHeader(...); the Bridge writes a per-message record into audit_data linking correlationId ↔ entity values.

Single-message connector #

import AConnector from '@orchesty/nodejs-sdk/dist/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/ProcessDto';

export default class FetchOrderConnector extends AConnector {

    public getName(): string {
        return 'fetch-order-connector';
    }

    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        // fetchOrderFromCrm stands for an application-specific helper; it is not part of the SDK.
        const order = await fetchOrderFromCrm(dto);
        dto.setJsonData(order);

        dto.addAuditHeader('order', 'id', [{
            id: order.id,
            trackingId: order.trackingId,
        }]);

        return dto;
    }

}

Arguments:

| Argument | Meaning |
| --- | --- |
| entity ('order') | Must match a key from AuditEntity in the Admin UI. |
| key ('id') | Which field is the primary identifier (Trace uses it as the anchor for per-entity queries). |
| fields | Concrete values that are present in this message. Not a template, not derived. |
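
Because fields must carry only values that are really present in the message, it can help to filter the record before handing it to dto.addAuditHeader(...). A minimal sketch, assuming a hypothetical compactAuditFields helper that is not part of the SDK:

```typescript
type AuditFieldValues = { [field: string]: string | number | undefined | null };
type ConcreteAuditFields = { [field: string]: string | number };

// Keeps only fields that carry a concrete value and insists that the
// primary identifier is among them. Hypothetical helper, not an SDK API.
function compactAuditFields(primaryKey: string, values: AuditFieldValues): ConcreteAuditFields {
    const concrete: ConcreteAuditFields = {};
    for (const field of Object.keys(values)) {
        const value = values[field];
        if (value !== undefined && value !== null && value !== '') {
            concrete[field] = value;
        }
    }
    if (!(primaryKey in concrete)) {
        throw new Error('audit fields are missing the primary key "' + primaryKey + '"');
    }
    return concrete;
}

// trackingId is not known yet, so it is left out instead of being sent empty.
const auditFields = compactAuditFields('id', { id: 'ord-017', trackingId: undefined });
```

In a connector the result would then be passed along as dto.addAuditHeader('order', 'id', [auditFields]).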

Batch connector — use addItemWithAudit, not addAuditHeader #

In a batch, never combine dto.addItem(o) with dto.addAuditHeader(...) on the batch DTO. The Bridge copies the parent's headers onto every child message via CopyBatchItem; if the parent's audit-entity carries all N entities, every child message will reference all N, and per-entity Trace breaks. Use the per-item helper instead, which scopes each audit-entity to the single item it represents:

import ABatchNode from '@orchesty/nodejs-sdk/dist/lib/Batch/ABatchNode';
import BatchProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/BatchProcessDto';

export default class AllOrdersBatch extends ABatchNode {

    public getName(): string {
        return 'all-orders-batch';
    }

    public async processAction(dto: BatchProcessDto): Promise<BatchProcessDto> {
        // ORDERS stands for the records loaded from the source system (application-specific).
        ORDERS.forEach((order) => {
            dto.addItemWithAudit(
                order,
                'order',
                'id',
                [{ id: order.id, trackingId: order.trackingId }],
            );
        });

        return dto;
    }

}
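
Why the parent header breaks per-entity Trace is easiest to see in a toy model. The sketch below does not use the SDK or reproduce the Bridge's CopyBatchItem; ToyMessage and both split functions are hypothetical stand-ins that only imitate "parent headers are copied onto every child".

```typescript
interface AuditEntityRecord {
    id: string;
}

interface ToyMessage {
    body: AuditEntityRecord;
    auditEntities: AuditEntityRecord[]; // stand-in for the audit-entity header
}

const toyOrders: AuditEntityRecord[] = [{ id: 'ord-001' }, { id: 'ord-002' }, { id: 'ord-003' }];

// Anti-pattern: one header on the batch DTO, copied onto every child message.
function splitWithParentHeader(items: AuditEntityRecord[]): ToyMessage[] {
    const parentHeader = items.map((item) => ({ id: item.id }));
    return items.map((item) => ({ body: item, auditEntities: parentHeader }));
}

// Per-item scoping: each child references only the entity it carries.
function splitWithPerItemHeader(items: AuditEntityRecord[]): ToyMessage[] {
    return items.map((item) => ({ body: item, auditEntities: [{ id: item.id }] }));
}

const childrenWithParentHeader = splitWithParentHeader(toyOrders);
const childrenWithOwnHeader = splitWithPerItemHeader(toyOrders);
```

In the first result every child claims all three orders, so a lookup for ord-001 would return three runs; in the second each child claims exactly one.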

Where not to add the header #

  • On nodes that introduce no new identity (a pure passthrough transformer).
  • Before the entity exists (a cron-trigger or random generator that has nothing to identify yet). Add the header on the first node that produces it.

Step 3: declare audit checkpoints #

A checkpoint is an explicit point in the topology where the Bridge writes a structured log line to Loki containing a snapshot of allowlisted fields plus the delivery status. Any node declares one by overriding getAuditCheckpoint(): IAuditCheckpoint | null (the default returns null, meaning "neutral node, no audit").

The IAuditCheckpoint spec #

export interface IAuditCheckpoint {
    role: 'process_entry' | 'process_step' | 'process_exit';
    /** Dot-path to the entity inside the request body. Defaults to `$` (root). */
    entityPath?: string;
    /** REQUIRED allowlist of fields to extract from the entity. `[]` = marker only. */
    fields: string[];
}
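
The constraints this page places on a spec (a known role, a required fields array, no wildcards) can be checked in a unit test before deployment. The sketch below repeats the interface so it stands alone; validateCheckpointSpec is a hypothetical helper and does not reproduce the Bridge's own validation.

```typescript
interface IAuditCheckpoint {
    role: 'process_entry' | 'process_step' | 'process_exit';
    entityPath?: string;
    fields: string[];
}

const KNOWN_ROLES = ['process_entry', 'process_step', 'process_exit'];

// Returns a list of problems; an empty list means the spec looks acceptable.
function validateCheckpointSpec(spec: unknown): string[] {
    if (typeof spec !== 'object' || spec === null) {
        return ['spec must be an object'];
    }
    const problems: string[] = [];
    const candidate = spec as { role?: unknown; entityPath?: unknown; fields?: unknown };
    if (typeof candidate.role !== 'string' || KNOWN_ROLES.indexOf(candidate.role) === -1) {
        problems.push('role must be one of ' + KNOWN_ROLES.join(', '));
    }
    if (!Array.isArray(candidate.fields)) {
        problems.push('fields is required (use [] for marker mode)');
    } else if (candidate.fields.some((f) => typeof f !== 'string' || f.indexOf('*') !== -1)) {
        problems.push('fields must be plain names or dot-paths, without wildcards');
    }
    if (candidate.entityPath !== undefined && typeof candidate.entityPath !== 'string') {
        problems.push('entityPath must be a string when present');
    }
    return problems;
}

const sampleExitSpec: IAuditCheckpoint = { role: 'process_exit', fields: ['id', 'status'] };
const sampleProblems = validateCheckpointSpec(sampleExitSpec); // no problems expected
```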

3a) Override on the boundary connector — preferred for entry / exit #

The Bridge emits the audit log line after processAction returns. The resultStatus therefore reflects whether the boundary call actually succeeded (success / failed / repeat / trashed / limit / unknown), which is exactly what an entry / exit audit needs to record.

import { IAuditCheckpoint } from '@orchesty/nodejs-sdk/dist/lib/Commons/IAuditCheckpoint';
import AConnector from '@orchesty/nodejs-sdk/dist/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/ProcessDto';

export default class MockErpOutputConnector extends AConnector {

    public getName(): string {
        return 'mock-erp-output-connector';
    }

    public getAuditCheckpoint(): IAuditCheckpoint {
        return {
            role: 'process_exit',
            fields: ['id', 'orderNumber', 'trackingId', 'erpReferenceNumber', 'status'],
        };
    }

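    // IOrder (the order shape) and callErp (the ERP client call) are application-specific and not shown here.
    public async processAction(dto: ProcessDto<IOrder>): Promise<ProcessDto<IOrder>> {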
        await this.callErp(dto.getJsonData());
        return dto;
    }

}

A passthrough placed in front of this connector cannot fail (it has no external call), so its audit log would always say success even when the downstream ERP rejected the order. The override on the connector itself avoids that pitfall.

3b) Override on AuditCheckpointNode — for process_step and non-connector boundaries #

Use the dedicated passthrough node when the audit point is in the middle of a chain, or when the boundary is not an AConnector (a custom webhook receiver, for example).

import AuditCheckpointNode from '@orchesty/nodejs-sdk/dist/lib/Commons/AuditCheckpointNode';
import { IAuditCheckpoint } from '@orchesty/nodejs-sdk/dist/lib/Commons/IAuditCheckpoint';

export default class OrderValidatedAudit extends AuditCheckpointNode {

    public getName(): string {
        return 'order-validated-audit';
    }

    public getAuditCheckpoint(): IAuditCheckpoint {
        return {
            role: 'process_step',
            fields: ['id', 'status', 'validationErrors'],
        };
    }

}

For entry / exit through a passthrough, resultStatus is always success (a passthrough has nothing to fail on). Real delivery status on entry / exit only comes from declaring the audit on the boundary connector itself.

The audit log line #

The Bridge writes the following INFO log to Loki for every audit checkpoint:

{
    "level": "info",
    "auditCheckpoint": {
        "role": "process_exit",
        "payload": { "id": "ord-017", "trackingId": "TRK-100017", "erpReferenceNumber": "ERP-9921" },
        "resultCode": 0,
        "resultStatus": "success",
        "resultMessage": "",
        "httpStatus": 200
    },
    "correlationId": "...",
    "topologyName": "create-order",
    "nodeName": "mock-erp-output-connector"
}

resultStatus is computed by audit.ClassifyStatus(resultCode, httpStatus) on the Bridge:

| SDK ResultCode | HTTP status | resultStatus |
| --- | --- | --- |
| 0 (SUCCESS) | 2xx | success |
| 1001 (DO_NOT_CONTINUE) | 2xx | success (terminal but OK) |
| 1002 (REPEAT) / 1004 (FORWARD_TO_REPEATER) / 1010 (LIMIT_EXCEEDED) | any | repeat |
| 1003 (STOP_AND_FAILED) / 1006 (SPLITTER_BATCH_END_WITH_ERROR) | any | failed |
| 1005 (SPLITTER_BATCH_END) | 2xx | success |
| 1009 (MESSAGE_WILL_BE_TRASHED) | any | trashed |
| 1011 (MESSAGE_LIMIT) | any | limit |
| other / unset | 5xx | failed |
| other / unset | 4xx | unknown |

Each resultStatus value corresponds to a delivery status — Delivered, Failed, Repeating, Limit, Trashed, or Unknown — that audit consumers (the audit MCP and AI assistants on top of it, custom dashboards, downstream ETL) surface to operators.
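
For audit consumers that need the same classification on their side (a custom dashboard, for example), the table can be mirrored in TypeScript. This is a sketch of the table only, not the Bridge's audit.ClassifyStatus. Combinations the table leaves open, such as result code 0 with a non-2xx response or an unset code with 2xx, fall back to the HTTP status here, which is an assumption.

```typescript
type ResultStatus = 'success' | 'failed' | 'repeat' | 'trashed' | 'limit' | 'unknown';

function classifyStatus(resultCode: number | undefined, httpStatus: number): ResultStatus {
    const is2xx = httpStatus >= 200 && httpStatus < 300;
    switch (resultCode) {
        case 1002: // REPEAT
        case 1004: // FORWARD_TO_REPEATER
        case 1010: // LIMIT_EXCEEDED
            return 'repeat';
        case 1003: // STOP_AND_FAILED
        case 1006: // SPLITTER_BATCH_END_WITH_ERROR
            return 'failed';
        case 1009: // MESSAGE_WILL_BE_TRASHED
            return 'trashed';
        case 1011: // MESSAGE_LIMIT
            return 'limit';
        case 0:    // SUCCESS
        case 1001: // DO_NOT_CONTINUE
        case 1005: // SPLITTER_BATCH_END
            if (is2xx) {
                return 'success';
            }
            break;
        default:
            break;
    }
    // Assumption: anything the table does not list is decided by the HTTP status.
    if (httpStatus >= 500 && httpStatus < 600) {
        return 'failed';
    }
    return 'unknown';
}
```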

fields semantics #

| Form | Meaning |
| --- | --- |
| fields: ['id', 'totalAmount'] | Allowlist. The Bridge picks only the listed fields from the request body and writes them into auditCheckpoint.payload. |
| fields: [] | Marker mode. A log line is emitted but the payload key is omitted. Use for highly sensitive entities (PCI / PII) where the audit signal is "passed this point". |
| fields: ['customer.email'] | Dot-paths are supported. Extracts a nested field. |
| entityPath: 'order' | The Bridge first descends into body.order, then applies fields. Default is $ (root). |

Wildcards (fields: ['*']) are not supported by design. Every captured field has to pass through a developer's eyes (forcing function for privacy review).
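
The semantics above can be previewed locally, for instance in a unit test that shows a reviewer what a given allowlist would capture. The sketch below is an approximation, not the Bridge's extraction code: extractPayload and resolvePath are hypothetical names, and keying a nested value by its dot-path in the result is an assumption.

```typescript
interface CheckpointSpecSketch {
    entityPath?: string;
    fields: string[];
}

type JsonObject = { [key: string]: unknown };

// Walks a dot-path such as "customer.email"; undefined when a segment is missing.
function resolvePath(root: unknown, path: string): unknown {
    let current: unknown = root;
    for (const segment of path.split('.')) {
        if (typeof current !== 'object' || current === null) {
            return undefined;
        }
        current = (current as JsonObject)[segment];
    }
    return current;
}

// undefined = marker mode (no payload key); {} = entityPath did not resolve.
function extractPayload(body: unknown, spec: CheckpointSpecSketch): JsonObject | undefined {
    if (spec.fields.length === 0) {
        return undefined;
    }
    const hasPath = spec.entityPath !== undefined && spec.entityPath !== '$';
    const entity = hasPath ? resolvePath(body, spec.entityPath as string) : body;
    const payload: JsonObject = {};
    if (typeof entity !== 'object' || entity === null) {
        return payload;
    }
    for (const field of spec.fields) {
        const value = resolvePath(entity, field);
        if (value !== undefined) {
            payload[field] = value; // assumption: nested values keep their dot-path as key
        }
    }
    return payload;
}

const previewBody = { order: { id: 'ord-017', customer: { email: 'a@b.cz' }, internalNote: 'x' } };
const previewPayload = extractPayload(previewBody, { entityPath: 'order', fields: ['id', 'customer.email'] });
```

Fields that are not named (internalNote here) never reach the preview, which is the point of the allowlist.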

Limits and fallbacks #

  • 64 KB hard limit on the marshalled payload. Oversized snapshots are replaced with { "_truncated": true, "_originalSizeBytes": N } and a WARN line is written to the Bridge's stdout so the operator can narrow the allowlist.
  • resultMessage is truncated to 512 runes (UTF-8 safe). Longer messages are cut at that length and end with a truncation marker.
  • Last-resort regex masking. Anything matching (?i)(password|passwd|secret|token|api[-_]?key|auth) is replaced with <redacted>, even if it slipped into the allowlist by accident.
  • Invalid JSON in the body falls back to { "_invalidJson": true, "_base64": "..." } (subject to the size cap).
  • Unresolved entityPath results in payload: {} (the signal is preserved).
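
The first three limits can be approximated in a few functions, which is handy for estimating in a test whether a snapshot would survive intact. This is a sketch under assumptions, not the Bridge's code: the function names are hypothetical, the regex is assumed to be matched against field names, and the '…' marker appended to long messages is an assumed character.

```typescript
const MAX_PAYLOAD_BYTES = 64 * 1024;
const MAX_MESSAGE_RUNES = 512;
const SECRET_PATTERN = /(password|passwd|secret|token|api[-_]?key|auth)/i;

// Replaces an oversized snapshot with the truncation marker object.
function capPayload(payload: unknown): unknown {
    const sizeBytes = new TextEncoder().encode(JSON.stringify(payload)).length;
    if (sizeBytes <= MAX_PAYLOAD_BYTES) {
        return payload;
    }
    return { _truncated: true, _originalSizeBytes: sizeBytes };
}

// Cuts on code points, so a multi-byte character is never split in half.
function truncateMessage(message: string, marker: string = '…'): string {
    const runes = Array.from(message);
    if (runes.length <= MAX_MESSAGE_RUNES) {
        return message;
    }
    return runes.slice(0, MAX_MESSAGE_RUNES).join('') + marker;
}

// Assumption: a field whose name matches the pattern gets its value replaced.
function maskSecrets(payload: { [field: string]: unknown }): { [field: string]: unknown } {
    const masked: { [field: string]: unknown } = {};
    for (const field of Object.keys(payload)) {
        masked[field] = SECRET_PATTERN.test(field) ? '<redacted>' : payload[field];
    }
    return masked;
}
```

Under this reading the pattern is deliberately broad: a harmless field named author would also be redacted because it contains "auth".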

Batch granularity #

If a batch connector (ABatchNode) overrides getAuditCheckpoint(), the Bridge emits one log line for the whole batch, so the payload is an array narrowed by the allowlist. The individual child messages (after the split) come from addItemWithAudit(...) and each carries its own audit-entity header; per-entity correlation joins back through Mongo (audit_data), not through Loki.

public getAuditCheckpoint(): IAuditCheckpoint {
    return {
        role: 'process_entry',
        fields: ['id', 'orderNumber', 'trackingId', 'status'],
    };
}

Keep the batch allowlist narrow. The payload is an array and easily exceeds the 64 KB cap. If you need one log line per child, place an AuditCheckpointNode after the split (and read the section on granularity below before doing so at high volume).

Granularity: when not to add a checkpoint #

The audit log is emitted per message. A */15 * * * * topology with a 1 M-item batch and a process_step checkpoint after the split produces 1 M log lines every quarter hour. Three strategies keep this under control:

  1. Checkpoint on the batch node (before the split). One log line per batch, allowlist narrowed. Works as long as N × item size ≲ 64 KB.
  2. Marker mode (fields: []) after the split. A log line without payload is much smaller and still records "passed point X".
  3. No checkpoint after the split. For very high-volume syncs the audit-entity header from addItemWithAudit is enough; Trace assembles the per-entity timeline from audit_data even without per-step Loki snapshots.
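
The numbers behind these strategies are simple to work out. The sketch below uses hypothetical helper names and ignores the few bytes of JSON array overhead, so the batch figure is a rough upper bound.

```typescript
const MAX_BATCH_PAYLOAD_BYTES = 64 * 1024;

// Log lines per day produced by checkpoints that sit after the split.
function dailyAuditLines(itemsPerRun: number, runsPerDay: number, checkpointsAfterSplit: number): number {
    return itemsPerRun * runsPerDay * checkpointsAfterSplit;
}

// Largest batch whose array payload still fits, given the average
// size in bytes of one item after the allowlist has narrowed it.
function maxItemsUnderCap(avgItemBytes: number): number {
    return Math.floor(MAX_BATCH_PAYLOAD_BYTES / avgItemBytes);
}

// A */15 * * * * schedule fires 4 times an hour, 96 times a day.
const runsPerDay = (60 / 15) * 24;

// One process_step checkpoint after a 1 M-item split: 96,000,000 lines a day.
const linesPerDay = dailyAuditLines(1000000, runsPerDay, 1);

// At roughly 128 bytes per narrowed item, a batch checkpoint holds about 512 items.
const itemsThatFit = maxItemsUnderCap(128);
```

A batch-level checkpoint therefore only works for batches of a few hundred items; beyond that, marker mode or no checkpoint after the split are the realistic options.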

Patterns and anti-patterns #

Patterns #

  • Boundary connector pattern. Declare process_entry on the input connector and process_exit on the output connector via overridden getAuditCheckpoint(). The topology shape stays purely business: start → input (process_entry) → business-nodes → output (process_exit) → end. The Bridge logs the delivery status after processAction returns.
  • AuditCheckpointNode for process_step markers. Use a passthrough only between business steps. The status badge will read success (a passthrough has nothing to fail on), but the snapshot is captured.
  • Same allowlist for entry and exit when the entity has the same shape on both sides. Share a constant so they stay in sync.
  • Tag with audit-entity as soon as the id is known — typically the first node that loads the entity from an external system, never the output node.
  • addItemWithAudit in every batch. Never addItem + addAuditHeader on the batch DTO.
  • Narrow allowlist on batch connectors. The payload is an array; stay well under 64 KB.
  • Republish the topology and recreate the Bridge container after changing the topology graph. The Bridge holds a graph cache and otherwise ignores the new spec.
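
The "same allowlist for entry and exit" pattern can be as small as one shared constant. A minimal sketch: the interface is repeated so the snippet stands alone, and ORDER_AUDIT_FIELDS and orderCheckpoint are hypothetical names.

```typescript
interface IAuditCheckpoint {
    role: 'process_entry' | 'process_step' | 'process_exit';
    entityPath?: string;
    fields: string[];
}

// Single source of truth for what an order snapshot may contain.
const ORDER_AUDIT_FIELDS: string[] = ['id', 'orderNumber', 'trackingId', 'status'];

// Builds a spec for the given role; the copy keeps callers from mutating the constant.
function orderCheckpoint(role: IAuditCheckpoint['role']): IAuditCheckpoint {
    return { role, fields: ORDER_AUDIT_FIELDS.slice() };
}

const orderEntrySpec = orderCheckpoint('process_entry');
const orderExitSpec = orderCheckpoint('process_exit');
```

The input and output connectors would each return orderCheckpoint(...) from getAuditCheckpoint(), so a field added to the constant shows up on both boundaries at once.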

Anti-patterns #

  • Passthrough AuditCheckpointNode placed before an output connector. The log line will always be success even when the downstream call fails. Move the override onto the output connector itself.
  • Checkpoint after a fan-out of 1 M messages. See the granularity section above.
  • fields: ['*'] / wildcard. Not supported; every captured field has to be intentional.
  • PII / PCI in the allowlist (even with intent). Use fields: [] (marker), or a derived hash / masked field prepared inside the connector.
  • Shared connector for several entities with different shapes. If one connector can produce both Orders and Products, the allowlists collide. Split into two connectors.
  • Header without an existing AuditEntity in the UI. The Bridge will store the record but Trace cannot resolve it back to the entity (the mapping is missing).
  • Checkpoint on a cron-trigger node. The request body is typically {}; the log line carries no useful information.

Security and compliance #

The audit pipeline implements defense in depth:

Each layer and what it catches:

  1. Required fields allowlist: forcing function. No "log everything" fallback. A spec without an allowlist is rejected.
  2. Last-resort regex (?i)(password|secret|token|api[-_]?key|auth): masks anything resembling a secret, even if it leaks into the allowlist.
  3. 64 KB hard limit: prevents Loki bloat and protects against split UTF-8 / partial payload leaks.
  4. JSON validation: invalid input → base64 + flag, rather than raw bytes in the log.
  5. Role whitelist: a spec with an unknown role is rejected.
  6. Header denylist (SanitizeHeaders): Authorization, Cookie, X-API-Key, … never make it into the INFO log.
  7. auditEntityIds are never written to the audit log line: cross-attribute lookup happens in Mongo, not in Loki.
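
The header denylist is the layer that is easiest to picture in code. The sketch below is not the Bridge's SanitizeHeaders: the function name is hypothetical, the list holds only the three header names mentioned above (the real list is longer), and the case-insensitive comparison is an assumption based on HTTP header names being case-insensitive.

```typescript
// Only the names this page mentions; the actual denylist contains more.
const DENIED_HEADER_NAMES = ['authorization', 'cookie', 'x-api-key'];

type HeaderMap = { [name: string]: string };

// Returns a copy of the headers without any denied name.
function sanitizeHeadersSketch(headers: HeaderMap): HeaderMap {
    const safe: HeaderMap = {};
    for (const name of Object.keys(headers)) {
        const value = headers[name];
        if (value !== undefined && DENIED_HEADER_NAMES.indexOf(name.toLowerCase()) === -1) {
            safe[name] = value;
        }
    }
    return safe;
}

const loggableHeaders = sanitizeHeadersSketch({
    'Authorization': 'Bearer abc',
    'Content-Type': 'application/json',
    'cookie': 'session=1',
    'X-API-Key': 'k',
});
```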

For sensitive domains:

  • Use marker mode (fields: []) for entities in GDPR / PCI scope.
  • If you need an identifier, prepare a derived hash (for example a SHA-256 of the email) inside the connector, and audit the hash.
  • For borderline fields, comment getAuditCheckpoint() with the reason and route the change through a privacy review.
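
Deriving the hash inside the connector could look like the sketch below. It relies on Node's built-in crypto module; hashEmailForAudit, withAuditHash and the emailHash field name are hypothetical, and normalising the address before hashing is a design choice rather than a platform requirement.

```typescript
import { createHash } from 'crypto';

// Normalises first, so "A@b.cz" and " a@b.cz " audit to the same value.
function hashEmailForAudit(email: string): string {
    const normalised = email.trim().toLowerCase();
    return createHash('sha256').update(normalised, 'utf8').digest('hex');
}

interface CustomerSketch {
    id: string;
    email: string;
}

// Adds the derived field that the allowlist can name instead of the raw email.
function withAuditHash(customer: CustomerSketch): CustomerSketch & { emailHash: string } {
    return { ...customer, emailHash: hashEmailForAudit(customer.email) };
}

const auditableCustomer = withAuditHash({ id: 'cus-001', email: 'Jane@Example.com' });
```

The checkpoint would then declare fields: ['id', 'emailHash'] and leave email out of the allowlist. An unsalted hash stays guessable for low-entropy inputs, so a keyed hash (HMAC) is a common alternative when that is a concern.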

Operational recipes #

"I cannot find an entity in Trace, even though it definitely passed" #

  1. Mongo: db.audit_data.find({ "data.id": "ord-017" }). If empty, the connector did not send the audit-entity header. Check that dto.addAuditHeader(...) is being called and that the entity matches AuditEntity.key.
  2. Verify there is an AuditEntity with that key in the Admin UI. Without it, MCP cannot resolve the entity.

"I see the entity but entry / exit is null" #

  1. Loki query: {topologyName="create-order", correlationId="..."}. Is there any auditCheckpoint log line at all?
  2. Bridge stdout: ERROR audit checkpoint: ... indicates an invalid spec.
  3. Confirm the boundary connector overrides getAuditCheckpoint(). Having addAuditHeader alone is not enough.
  4. Republish the topology after any graph change (POST /topologies/{id}/publish) and recreate the Bridge container.

"The exit audit shows success, even though the external system failed" #

  • The audit is declared on a passthrough AuditCheckpointNode placed before the output connector. Move the getAuditCheckpoint() override onto the output AConnector itself.
  • The output connector has no getAuditCheckpoint() override. Add one.
  • The connector swallows the exception from the external call and never propagates the failure to the SDK. Either rethrow it or call dto.setStopProcess(ResultCode.STOP_AND_FAILED, ...).
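
The third cause is the hardest to spot in review, so here is a toy model of it. Nothing below uses the SDK: ToyDto is a hypothetical stand-in for the DTO, its two-argument setStopProcess is an assumed shape, and the numeric code 1003 is taken from the resultStatus table.

```typescript
const STOP_AND_FAILED = 1003;

// Minimal stand-in for the DTO; it only records what the node reported.
class ToyDto {
    public resultCode = 0;
    public resultMessage = '';

    public setStopProcess(code: number, message: string): void {
        this.resultCode = code;
        this.resultMessage = message;
    }
}

// Stand-in for an external call that always fails.
async function callErpSketch(): Promise<void> {
    throw new Error('ERP rejected the order');
}

// Anti-pattern: the failure is caught and lost, the DTO still reports code 0.
async function processSwallowing(dto: ToyDto): Promise<ToyDto> {
    try {
        await callErpSketch();
    } catch {
        // swallowed on purpose to show the problem
    }
    return dto;
}

// The failure is recorded on the DTO, so the exit audit can report it.
async function processReporting(dto: ToyDto): Promise<ToyDto> {
    try {
        await callErpSketch();
    } catch (e) {
        const message = e instanceof Error ? e.message : String(e);
        dto.setStopProcess(STOP_AND_FAILED, message);
    }
    return dto;
}
```

With the first variant an exit audit has nothing but result code 0 to go on; with the second it can classify the run as failed and carry the message.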

"Loki is being flooded with audit lines" #

  1. Some checkpoint sits after a batch split. Move it before the split, or switch it to marker mode.
  2. The topology has too many process_step checkpoints. Aim for 0–2 per business process.

"The payload in Loki says _truncated: true" #

  • The allowlist is too wide, or the entity is genuinely large.
  • Narrow fields, or split the checkpoint into several smaller ones with process_step role at different points.

Onboarding checklists #

Onboarding a new entity (one-time) #

  • AuditEntity created in the Admin UI with a unique key.
  • fields in the UI cover every searchable column you expect operators to filter on.
  • The key is fixed in a developer-side constant (for example export const AUDIT_ENTITY_ORDER = 'order').

Onboarding a new topology #

  • The input connector overrides getAuditCheckpoint() with role: 'process_entry'.
  • The output connector overrides getAuditCheckpoint() with role: 'process_exit'.
  • The connector that produces the entity calls dto.addAuditHeader(...) (or dto.addItemWithAudit(...) in a batch).
  • Any process_step markers are declared via AuditCheckpointNode passthroughs.
  • Every getAuditCheckpoint() has an explicit fields allowlist (no PII / PCI).
  • The topology is published (POST /topologies/{id}/publish).
  • The Bridge container has been recreated after the graph change.
  • Smoke test: run the topology and confirm in Loki that there is a log line with auditCheckpoint.role = "process_entry" and another with "process_exit", both with resultStatus = "success". Force a 5xx from the output system and confirm the exit audit reports resultStatus = "failed" with a non-empty resultMessage.
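
The smoke test can be scripted once the Loki lines for a run have been exported. The sketch below assumes each exported line is one JSON document in the shape shown under "The audit log line"; findCheckpoint is a hypothetical helper, and fetching the lines from Loki is left out.

```typescript
interface CheckpointSummary {
    resultStatus: string;
    resultMessage: string;
}

// Returns the first checkpoint of the given role for one correlationId, or null.
function findCheckpoint(lines: string[], correlationId: string, role: string): CheckpointSummary | null {
    for (const raw of lines) {
        let parsed: any;
        try {
            parsed = JSON.parse(raw);
        } catch {
            continue; // not JSON, so not an audit line
        }
        const checkpoint = parsed !== null && typeof parsed === 'object' ? parsed.auditCheckpoint : undefined;
        if (checkpoint && parsed.correlationId === correlationId && checkpoint.role === role) {
            return {
                resultStatus: String(checkpoint.resultStatus),
                resultMessage: String(checkpoint.resultMessage || ''),
            };
        }
    }
    return null;
}

// Two lines from a forced-failure run (values are illustrative).
const exportedLines = [
    '{"level":"info","auditCheckpoint":{"role":"process_entry","resultCode":0,"resultStatus":"success","resultMessage":"","httpStatus":200},"correlationId":"c-1"}',
    '{"level":"info","auditCheckpoint":{"role":"process_exit","resultCode":1003,"resultStatus":"failed","resultMessage":"ERP returned 503","httpStatus":503},"correlationId":"c-1"}',
];

const entryCheck = findCheckpoint(exportedLines, 'c-1', 'process_entry');
const exitCheck = findCheckpoint(exportedLines, 'c-1', 'process_exit');
```

For the happy-path run both results should report success; for the forced 5xx run the exit result should report failed with a non-empty resultMessage, as in the sample lines.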

Code review for a new getAuditCheckpoint() #

  • Override is on the boundary AConnector (entry / exit), not on a passthrough placed in front of it.
  • For AuditCheckpointNode passthroughs the role is process_step (or, exceptionally, entry / exit on a non-connector boundary).
  • getAuditCheckpoint() returns a constant spec; no dynamic branching off the dto (that belongs in the business code).
  • fields does not contain password / secret / token / api_key / auth.
  • On batch connectors the allowlist is narrow enough to fit under 64 KB.
  • Sensitive entities use fields: [] (marker mode) with a comment explaining why.
  • The node is registered in the worker's src/index.ts.
  • The topology is republished after the change.

© 2025 Orchesty Solutions. All rights reserved.