Trace auditing
Where Trace is available. Trace is included out of the box in Enterprise Edition (dedicated cloud or self-hosted) and in Pro Level 3. Customers on Pro Level 1 and Level 2 can enable Trace as a paid add-on. The audit pipeline (entity index in Mongo, audit log lines in Loki, and the audit MCP) ships with these tiers — the Starter plan and Community Edition do not include it.
Trace turns the records the platform already produces into a per-entity timeline: "show me everything that happened to order ord-017" opens one card per run, with a process_entry, optional process_steps and a process_exit, each carrying a snapshot of the entity at that boundary and a real delivery status. The conceptual overview lives in Learn: Data flow auditing with Trace; this page is the developer reference for wiring it up.
The two layers #
| Layer | Question it answers | Where it is declared | Where it is stored |
|---|---|---|---|
| AuditEntity + `audit-entity` header (per-message) | "Which entities (id, SKU, EAN, …) appeared in which correlationId?" | Connector calls `dto.addAuditHeader(...)` or `dto.addItemWithAudit(...)` | Mongo collection `audit_data` (the Trace MCP indexes it) |
| `getAuditCheckpoint()` on a node (per-business-step) | "What did the entity look like at entry / step / exit, and did the call actually deliver?" | Override on a boundary `AConnector` (preferred for entry / exit) or on an `AuditCheckpointNode` passthrough (for `process_step`) | Loki — structured INFO log lines `auditCheckpoint.{role, payload, resultCode, resultStatus, resultMessage, httpStatus}` |
The two layers are independent. The header alone gives you "the entity passed". The checkpoint alone gives you boundary snapshots without per-entity grouping. Used together you get the full Trace report.
Node.js SDK only (today). The addAuditHeader, addItemWithAudit, getAuditCheckpoint and AuditCheckpointNode helpers currently exist only in @orchesty/nodejs-sdk. PHP nodes can still receive Trace coverage if a Node.js boundary node sets the audit-entity and audit-checkpoint headers on their behalf, but the helper API is Node.js-only at this time.
Step 1: define the AuditEntity in the Admin UI #
AuditEntity is the platform-wide definition of one business entity. Without it Trace has no way to map the values from the audit-entity header to a real-world record. Create one entity per business domain, typically once during rollout.
- Open Trace → Audit entities → New entity.
- Fill in:
  - `key` — stable identifier the connector uses as the first argument of `dto.addAuditHeader('order', ...)`. Lower-case, snake/kebab, no spaces. Must be unique across the installation. Example: `order`, `product`, `invoice`.
  - `name` — human-readable label shown in the UI. Example: Order, Product, Invoice.
  - `fields` — searchable fields, each with its own `key` and `name`. The Trace UI exposes them as filters. Example for an order: `id`, `trackingId`. Example for a product: `id`, `externalId`, `SKU`, `EAN`.
- Save. The `audit_entity` collection is updated.
The UI's fields are search facets (which columns the user can filter Trace results by). The SDK's IAuditCheckpoint.fields is the log allowlist (which columns end up inside auditCheckpoint.payload in Loki). They often overlap but are not the same list.
Recommendations:
- Start minimal. A couple of fields you actually search on. Add more later.
- `id` is almost always the right primary key — it anchors per-entity history queries from any audit consumer.
- Never rename `key` once entries exist in `audit_data`; existing records reference it directly.
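Written out as a constant, the "order" entity from the steps above could look like this. This is only a sketch: the property names mirror the Admin UI form fields, not an exported SDK type.

```typescript
// Hypothetical shape of the "order" AuditEntity from Step 1 (illustration only).
const orderAuditEntity = {
    key: 'order',   // must match the first argument of dto.addAuditHeader('order', ...)
    name: 'Order',  // label shown in the Trace UI
    fields: [       // search facets operators can filter by
        { key: 'id', name: 'ID' },
        { key: 'trackingId', name: 'Tracking ID' },
    ],
};
```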
Step 2: tag messages with the audit-entity header #
The audit-entity header is the contract between the connector and the Bridge. The connector sets it via dto.addAuditHeader(...); the Bridge writes a per-message record into audit_data linking correlationId ↔ entity values.
Single-message connector #
```typescript
import AConnector from '@orchesty/nodejs-sdk/dist/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/ProcessDto';

export default class FetchOrderConnector extends AConnector {

    public getName(): string {
        return 'fetch-order-connector';
    }

    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        const order = await fetchOrderFromCrm(dto);
        dto.setJsonData(order);
        dto.addAuditHeader('order', 'id', [{
            id: order.id,
            trackingId: order.trackingId,
        }]);

        return dto;
    }

}
```
Arguments:
| Argument | Meaning |
|---|---|
| `entity` (`'order'`) | Must match a `key` from AuditEntity in the Admin UI. |
| `key` (`'id'`) | Which field is the primary identifier (Trace uses it as the anchor for per-entity queries). |
| `fields` | Concrete values that are present in this message. Not a template, not derived. |
Batch connector — use addItemWithAudit, not addAuditHeader #
In a batch, never combine dto.addItem(o) with dto.addAuditHeader(...) on the batch DTO. The Bridge copies the parent's headers onto every child message via CopyBatchItem; if the parent's audit-entity carries all N entities, every child message will reference all N, and per-entity Trace breaks. Use the per-item helper instead, which scopes each audit-entity to the single item it represents:
```typescript
import ABatchNode from '@orchesty/nodejs-sdk/dist/lib/Batch/ABatchNode';
import BatchProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/BatchProcessDto';

export default class AllOrdersBatch extends ABatchNode {

    public getName(): string {
        return 'all-orders-batch';
    }

    public async processAction(dto: BatchProcessDto): Promise<BatchProcessDto> {
        ORDERS.forEach((order) => {
            dto.addItemWithAudit(
                order,
                'order',
                'id',
                [{ id: order.id, trackingId: order.trackingId }],
            );
        });

        return dto;
    }

}
```
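The header-copying behaviour described above can be modelled in a few lines. This is a toy model — the real Bridge logic lives in `CopyBatchItem`, and the header values below are illustrative strings, not the real wire format:

```typescript
// Toy model: the Bridge copies parent headers onto every child message.
type Headers = Record<string, string>;

function copyBatchItem(parentHeaders: Headers, items: { body: string; headers?: Headers }[]): { body: string; headers: Headers }[] {
    return items.map((item) => ({ body: item.body, headers: { ...parentHeaders, ...(item.headers ?? {}) } }));
}

// Anti-pattern: addAuditHeader on the batch DTO → every child references all ids.
const bad = copyBatchItem(
    { 'audit-entity': 'order:id=[o-1,o-2]' },
    [{ body: 'o-1' }, { body: 'o-2' }],
);

// Pattern: addItemWithAudit → each child carries only its own audit-entity.
const good = copyBatchItem(
    {},
    [
        { body: 'o-1', headers: { 'audit-entity': 'order:id=o-1' } },
        { body: 'o-2', headers: { 'audit-entity': 'order:id=o-2' } },
    ],
);
```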
Where not to add the header #
- On nodes that introduce no new identity (a pure passthrough transformer).
- Before the entity exists (a cron-trigger or random generator that has nothing to identify yet). Add the header on the first node that produces it.
Step 3: declare audit checkpoints #
A checkpoint is an explicit point in the topology where the Bridge writes a structured log line to Loki containing a snapshot of allowlisted fields plus the delivery status. Any node declares one by overriding getAuditCheckpoint(): IAuditCheckpoint | null (the default returns null, meaning "neutral node, no audit").
The IAuditCheckpoint spec #
```typescript
export interface IAuditCheckpoint {
    role: 'process_entry' | 'process_step' | 'process_exit';

    /** Dot-path to the entity inside the request body. Defaults to `$` (root). */
    entityPath?: string;

    /** REQUIRED allowlist of fields to extract from the entity. `[]` = marker only. */
    fields: string[];
}
```
3a) Override on the boundary connector — preferred for entry / exit #
The Bridge emits the audit log line after processAction returns. The resultStatus therefore reflects whether the boundary call actually succeeded (success / failed / repeat / trashed / limit). That is exactly what an entry / exit audit needs to record.
```typescript
import { IAuditCheckpoint } from '@orchesty/nodejs-sdk/dist/lib/Commons/IAuditCheckpoint';
import AConnector from '@orchesty/nodejs-sdk/dist/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/dist/lib/Utils/ProcessDto';

export default class MockErpOutputConnector extends AConnector {

    public getName(): string {
        return 'mock-erp-output-connector';
    }

    public getAuditCheckpoint(): IAuditCheckpoint {
        return {
            role: 'process_exit',
            fields: ['id', 'orderNumber', 'trackingId', 'erpReferenceNumber', 'status'],
        };
    }

    public async processAction(dto: ProcessDto<IOrder>): Promise<ProcessDto<IOrder>> {
        await this.callErp(dto.getJsonData());

        return dto;
    }

}
```
A passthrough placed in front of this connector cannot fail (it has no external call), so its audit log would always say success even when the downstream ERP rejected the order. The override on the connector itself avoids that pitfall.
3b) Override on AuditCheckpointNode — for process_step and non-connector boundaries #
Use the dedicated passthrough node when the audit point is in the middle of a chain, or when the boundary is not an AConnector (a custom webhook receiver, for example).
```typescript
import AuditCheckpointNode from '@orchesty/nodejs-sdk/dist/lib/Commons/AuditCheckpointNode';
import { IAuditCheckpoint } from '@orchesty/nodejs-sdk/dist/lib/Commons/IAuditCheckpoint';

export default class OrderValidatedAudit extends AuditCheckpointNode {

    public getName(): string {
        return 'order-validated-audit';
    }

    public getAuditCheckpoint(): IAuditCheckpoint {
        return {
            role: 'process_step',
            fields: ['id', 'status', 'validationErrors'],
        };
    }

}
```
For entry / exit through a passthrough, resultStatus is always success (a passthrough has nothing to fail on). Real delivery status on entry / exit only comes from declaring the audit on the boundary connector itself.
The audit log line #
The Bridge writes the following INFO log to Loki for every audit checkpoint:
```json
{
  "level": "info",
  "auditCheckpoint": {
    "role": "process_exit",
    "payload": { "id": "ord-017", "trackingId": "TRK-100017", "erpReferenceNumber": "ERP-9921" },
    "resultCode": 0,
    "resultStatus": "success",
    "resultMessage": "",
    "httpStatus": 200
  },
  "correlationId": "...",
  "topologyName": "create-order",
  "nodeName": "mock-erp-output-connector"
}
```
resultStatus is computed by audit.ClassifyStatus(resultCode, httpStatus) on the Bridge:
| SDK ResultCode | HTTP status | resultStatus |
|---|---|---|
| 0 (SUCCESS) | 2xx | success |
| 1001 (DO_NOT_CONTINUE) | 2xx | success (terminal but OK) |
| 1002 (REPEAT) / 1004 (FORWARD_TO_REPEATER) / 1010 (LIMIT_EXCEEDED) | any | repeat |
| 1003 (STOP_AND_FAILED) / 1006 (SPLITTER_BATCH_END_WITH_ERROR) | any | failed |
| 1005 (SPLITTER_BATCH_END) | 2xx | success |
| 1009 (MESSAGE_WILL_BE_TRASHED) | any | trashed |
| 1011 (MESSAGE_LIMIT) | any | limit |
| other / unset | 5xx | failed |
| other / unset | 4xx | unknown |
Each resultStatus value corresponds to a delivery status — Delivered, Failed, Repeating, Limit, Trashed, or Unknown — that audit consumers (the audit MCP and AI assistants on top of it, custom dashboards, downstream ETL) surface to operators.
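The mapping in the table can be sketched as a plain function. This is an illustrative reimplementation of what `audit.ClassifyStatus` does, not the Bridge source; the behaviour for an unknown code with a 2xx status is not specified in the table, so the sketch returns `unknown` there.

```typescript
type ResultStatus = 'success' | 'repeat' | 'failed' | 'trashed' | 'limit' | 'unknown';

// Sketch of the Bridge's classification table (illustration only).
function classifyStatus(resultCode: number | undefined, httpStatus: number): ResultStatus {
    switch (resultCode) {
        case 0:    // SUCCESS
        case 1001: // DO_NOT_CONTINUE (terminal but OK)
        case 1005: // SPLITTER_BATCH_END
            return 'success';
        case 1002: // REPEAT
        case 1004: // FORWARD_TO_REPEATER
        case 1010: // LIMIT_EXCEEDED
            return 'repeat';
        case 1003: // STOP_AND_FAILED
        case 1006: // SPLITTER_BATCH_END_WITH_ERROR
            return 'failed';
        case 1009: // MESSAGE_WILL_BE_TRASHED
            return 'trashed';
        case 1011: // MESSAGE_LIMIT
            return 'limit';
        default:   // other / unset: fall back to the HTTP status
            if (httpStatus >= 500) return 'failed';
            return 'unknown'; // 4xx per the table; 2xx with an unknown code is unspecified
    }
}
```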
fields semantics #
| Form | Meaning |
|---|---|
| `fields: ['id', 'totalAmount']` | Allowlist. The Bridge picks only the listed fields from the request body and writes them into `auditCheckpoint.payload`. |
| `fields: []` | Marker mode. A log line is emitted but the `payload` key is omitted. Use for highly sensitive entities (PCI / PII) where the audit signal is "passed this point". |
| `fields: ['customer.email']` | Dot-paths are supported. Extracts a nested field. |
| `entityPath: 'order'` | The Bridge first descends into `body.order`, then applies `fields`. Default is `$` (root). |
Wildcards (fields: ['*']) are not supported by design. Every captured field has to pass through a developer's eyes (forcing function for privacy review).
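The extraction semantics from the table can be sketched as follows. This is our own minimal model, not the Bridge implementation; in particular, the payload is keyed by the dot-path here, and whether the Bridge flattens or nests such keys is an implementation detail.

```typescript
// Resolve a dot-path ('customer.email') inside a plain object.
function getPath(obj: unknown, path: string): unknown {
    return path.split('.').reduce<unknown>(
        (acc, key) => (acc && typeof acc === 'object' ? (acc as Record<string, unknown>)[key] : undefined),
        obj,
    );
}

// Sketch: descend into entityPath (if any), then pick only allowlisted fields.
function extractPayload(body: Record<string, unknown>, fields: string[], entityPath?: string): Record<string, unknown> {
    const entity = entityPath ? getPath(body, entityPath) ?? {} : body;
    const payload: Record<string, unknown> = {};
    for (const f of fields) {
        const value = getPath(entity, f);
        if (value !== undefined) payload[f] = value;
    }
    return payload;
}
```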
Limits and fallbacks #
- 64 KB hard limit on the marshalled payload. Oversized snapshots are replaced with `{ "_truncated": true, "_originalSizeBytes": N }` and a WARN line is written to the Bridge's stdout so the operator can narrow the allowlist.
- `resultMessage` is truncated to 512 runes (UTF-8 safe). Longer messages get a trailing `…`.
- Last-resort regex masking. Anything matching `(?i)(password|passwd|secret|token|api[-_]?key|auth)` is replaced with `<redacted>`, even if it slipped into the allowlist by accident.
- Invalid JSON in the body falls back to `{ "_invalidJson": true, "_base64": "..." }` (subject to the size cap).
- Unresolved `entityPath` results in `payload: {}` (the signal is preserved).
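The masking pass can be sketched in a few lines, assuming the documented regex is applied to field names (the function name and the name-vs-value detail are our assumptions, not the Bridge source):

```typescript
// Same pattern as documented for the last-resort masking pass.
const SECRET_KEY_RE = /(password|passwd|secret|token|api[-_]?key|auth)/i;

// Sketch: replace the value of any field whose name looks like a secret.
function maskSecrets(payload: Record<string, unknown>): Record<string, unknown> {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(payload)) {
        out[key] = SECRET_KEY_RE.test(key) ? '<redacted>' : value;
    }
    return out;
}
```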
Batch granularity #
If a batch connector (ABatchNode) overrides getAuditCheckpoint(), the Bridge emits one log line for the whole batch. The payload is therefore an array narrowed by the allowlist. Individual child messages (after split) come from addItemWithAudit(...), each carries its own audit-entity header, and per-entity correlation joins back through Mongo (audit_data), not through Loki.
```typescript
public getAuditCheckpoint(): IAuditCheckpoint {
    return {
        role: 'process_entry',
        fields: ['id', 'orderNumber', 'trackingId', 'status'],
    };
}
```
Keep the batch allowlist narrow. The payload is an array and easily exceeds the 64 KB cap. If you need one log line per child, place an AuditCheckpointNode after the split (and read the section on granularity below before doing so at high volume).
Granularity: when not to add a checkpoint #
The audit log is emitted per message. A */15 * * * * topology with a 1 M-item batch and a process_step checkpoint after the split produces 1 M log lines every quarter hour. Three strategies keep this under control:
- Checkpoint on the batch node (before the split). One log line per batch, allowlist narrowed. Works as long as N × item size ≲ 64 KB.
- Marker mode (`fields: []`) after the split. A log line without `payload` is much smaller and still records "passed point X".
- No checkpoint after the split. For very high-volume syncs the `audit-entity` header from `addItemWithAudit` is enough; Trace assembles the per-entity timeline from `audit_data` even without per-step Loki snapshots.
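For strategy 1 it helps to estimate up front whether the allowlist-narrowed batch array fits under the cap. The helper below is our own back-of-the-envelope sketch (the Bridge does its own marshalling, so treat the result as an estimate):

```typescript
const CAP_BYTES = 64 * 1024; // the documented hard limit

// Estimate the marshalled size of a batch checkpoint payload: narrow each item
// to the allowlist, serialize the array, count UTF-8 bytes.
function estimateBatchPayloadBytes(items: Record<string, unknown>[], fields: string[]): number {
    const narrowed = items.map((item) => Object.fromEntries(fields.map((f) => [f, item[f]])));
    return new TextEncoder().encode(JSON.stringify(narrowed)).length;
}
```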
Patterns and anti-patterns #
Patterns #
- Boundary connector pattern. Declare `process_entry` on the input connector and `process_exit` on the output connector via overridden `getAuditCheckpoint()`. The topology shape stays purely business: start → input (process_entry) → business-nodes → output (process_exit) → end. The Bridge logs the delivery status after `processAction` returns.
- `AuditCheckpointNode` for `process_step` markers. Use a passthrough only between business steps. The status badge will read `success` (a passthrough has nothing to fail on), but the snapshot is captured.
- Same allowlist for entry and exit when the entity has the same shape on both sides. Share a constant so they stay in sync.
- Tag with `audit-entity` as soon as the id is known — typically the first node that loads the entity from an external system, never the output node.
- `addItemWithAudit` in every batch. Never `addItem` + `addAuditHeader` on the batch DTO.
- Narrow allowlist on batch connectors. The payload is an array; stay well under 64 KB.
- Republish + recreate the Bridge container after changing the topology graph. The Bridge holds a graph cache and otherwise ignores the new spec.
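The shared-allowlist pattern can be as simple as one exported constant. The interface is redeclared inline here so the sketch is self-contained; in a real worker you would import `IAuditCheckpoint` from the SDK, and the field names are hypothetical:

```typescript
// Inline copy of the SDK interface, for a self-contained sketch.
interface IAuditCheckpoint {
    role: 'process_entry' | 'process_step' | 'process_exit';
    entityPath?: string;
    fields: string[];
}

// One constant, referenced by both boundary connectors, keeps entry and exit
// snapshots comparable field-for-field.
const ORDER_AUDIT_FIELDS = ['id', 'orderNumber', 'trackingId', 'status'];

const ORDER_ENTRY: IAuditCheckpoint = { role: 'process_entry', fields: ORDER_AUDIT_FIELDS };
const ORDER_EXIT: IAuditCheckpoint = { role: 'process_exit', fields: ORDER_AUDIT_FIELDS };
```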
Anti-patterns #
- Passthrough `AuditCheckpointNode` placed before an output connector. The log line will always be `success` even when the downstream call fails. Move the override onto the output connector itself.
- Checkpoint after a fan-out of 1 M messages. See the granularity section above.
- `fields: ['*']` / wildcard. Not supported; every captured field has to be intentional.
- PII / PCI in the allowlist (even with intent). Use `fields: []` (marker), or a derived hash / masked field prepared inside the connector.
- Shared connector for several entities with different shapes. If one connector can produce both Orders and Products, the allowlists collide. Split into two connectors.
- Header without an existing AuditEntity in the UI. The Bridge will store the record but Trace cannot resolve it back to the entity (the mapping is missing).
- Checkpoint on a cron-trigger node. The request body is typically `{}`; the log line carries no useful information.
Security and compliance #
The audit pipeline implements defense in depth:
| Layer | What it catches |
|---|---|
| 1. Required `fields` allowlist | Forcing function. No "log everything" fallback. A spec without an allowlist is rejected. |
| 2. Last-resort regex `(?i)(password\|passwd\|secret\|token\|api[-_]?key\|auth)` | Masks anything resembling a secret, even if it leaks into the allowlist. |
| 3. 64 KB hard limit | Prevents Loki bloat and protects against split UTF-8 / partial payload leaks. |
| 4. JSON validation | Invalid input → base64 + flag, rather than raw bytes in the log. |
| 5. Role whitelist | A spec with an unknown `role` is rejected. |
| 6. Header denylist (`SanitizeHeaders`) | `Authorization`, `Cookie`, `X-API-Key`, … never make it into the INFO log. |
| 7. `auditEntityIds` are never written to the audit log line | Cross-attribute lookup happens in Mongo, not in Loki. |
For sensitive domains:
- Use marker mode (`fields: []`) for entities in GDPR / PCI scope.
- If you need an identifier, prepare a derived hash (for example a SHA-256 of the email) inside the connector, and audit the hash.
- For borderline fields, comment `getAuditCheckpoint()` with the reason and route the change through a privacy review.
Operational recipes #
"I cannot find an entity in Trace, even though it definitely passed" #
- Mongo: `db.audit_data.find({ "data.id": "ord-017" })`. If empty, the connector did not send the `audit-entity` header. Check that `dto.addAuditHeader(...)` is being called and that the `entity` matches `AuditEntity.key`.
- Verify there is an AuditEntity with that `key` in the Admin UI. Without it, the MCP cannot resolve the entity.
"I see the entity but entry / exit is null" #
- Loki query: `{topologyName="create-order", correlationId="..."}`. Is there any `auditCheckpoint` log line at all?
- Bridge stdout: `ERROR audit checkpoint: ...` indicates an invalid spec.
- Confirm the boundary connector overrides `getAuditCheckpoint()`. Having `addAuditHeader` alone is not enough.
- Republish the topology after any graph change (`POST /topologies/{id}/publish`) and recreate the Bridge container.
"The exit audit shows success, even though the external system failed" #
- The audit is declared on a passthrough `AuditCheckpointNode` placed before the output connector. Move the `getAuditCheckpoint()` override onto the output `AConnector` itself.
- The output connector has no `getAuditCheckpoint()` override. Add one.
- The connector swallows the exception from the external call and never propagates the failure to the SDK. Either rethrow it or call `dto.setStopProcess(ResultCode.STOP_AND_FAILED, ...)`.
"Loki is being flooded with audit lines" #
- Some checkpoint sits after a batch split. Move it before the split, or switch it to marker mode.
- The topology has too many `process_step` checkpoints. Aim for 0–2 per business process.
"The payload in Loki says _truncated: true" #
- The allowlist is too wide, or the entity is genuinely large.
- Narrow `fields`, or split the checkpoint into several smaller ones with `process_step` role at different points.
Onboarding checklists #
Onboarding a new entity (one-time) #
- AuditEntity created in the Admin UI with a unique `key`.
- `fields` in the UI cover every searchable column you expect operators to filter on.
- The `key` is fixed in a developer-side constant (for example `export const AUDIT_ENTITY_ORDER = 'order'`).
Onboarding a new topology #
- The input connector overrides `getAuditCheckpoint()` with `role: 'process_entry'`.
- The output connector overrides `getAuditCheckpoint()` with `role: 'process_exit'`.
- The connector that produces the entity calls `dto.addAuditHeader(...)` (or `dto.addItemWithAudit(...)` in a batch).
- Any `process_step` markers are declared via `AuditCheckpointNode` passthroughs.
- Every `getAuditCheckpoint()` has an explicit `fields` allowlist (no PII / PCI).
- The topology is published (`POST /topologies/{id}/publish`).
- The Bridge container has been recreated after the graph change.
- Smoke test: run the topology and confirm in Loki that there is a log line with `auditCheckpoint.role = "process_entry"` and another with `"process_exit"`, both with `resultStatus = "success"`. Force a 5xx from the output system and confirm the exit audit reports `resultStatus = "failed"` with a non-empty `resultMessage`.
Code review for a new getAuditCheckpoint() #
- Override is on the boundary `AConnector` (entry / exit), not on a passthrough placed in front of it.
- For `AuditCheckpointNode` passthroughs the role is `process_step` (or, exceptionally, entry / exit on a non-connector boundary).
- `getAuditCheckpoint()` returns a constant spec; no dynamic branching off the dto (that belongs in the business code).
- `fields` does not contain `password` / `secret` / `token` / `api_key` / `auth`.
- On batch connectors the allowlist is narrow enough to fit under 64 KB.
- Sensitive entities use `fields: []` (marker mode) with a comment explaining why.
- The node is registered in the worker's `src/index.ts`.
- The topology is republished after the change.
See also #
- Learn: Data flow auditing with Trace — concept overview and the rationale behind the boundary-connector pattern.
- Operations: Logging — the SDK logger that automatically attaches correlation context.
- Operations: Integration monitoring — dashboards and process detail.
- Concepts: Processes and Messages — the underlying record Trace builds on.