Error handling and retries
Orchesty's promise is that a failure in one step never silently swallows a process. To honour that promise you need three things in your worker code: a way to mark a step as failed, sensible retry decisions on each connector, and the assurance that anything still failing will land in the Trash inbox where someone can deal with it.
For background see Concepts: Processes and Messages.
How to fail a step #
A node fails either by throwing or by calling setStopProcess(STOP_AND_FAILED, '...'). Both end up persisted the same way; the choice between them is one of intent:
- Use setStopProcess(STOP_AND_FAILED, ...) for "expected" failures: the upstream returned a 4xx with a known meaning, validation rejected the input, the record is in a state we don't know how to handle.
- Use throw for "unexpected" failures: the SDK call blew up, the JSON didn't parse, the database was unreachable.
import { ResultCode } from '@orchesty/nodejs-sdk/dist/lib/Utils/ResultCode';

if (response.statusCode === 404) {
    dto.setStopProcess(ResultCode.STOP_AND_FAILED, 'Customer not found upstream');
    return dto;
}
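For the throw branch, any uncaught error does the job. A minimal sketch, assuming the response object above carries a raw string body:
// Unexpected failure: throw and let it propagate; the platform
// persists the error and marks the step as failed.
let payload: unknown;
try {
    payload = JSON.parse(response.body);
} catch (e) {
    throw new Error(`Upstream returned unparseable JSON: ${(e as Error).message}`);
}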
The setStopProcess(DO_NOT_CONTINUE, ...) shortcut covers a third case: not a failure, just "this message is fine but should not flow further". Use it for filters and dedup checks.
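A hedged sketch of a dedup check, where alreadyProcessed() and messageId stand in for whatever lookup your worker actually has:
import { ResultCode } from '@orchesty/nodejs-sdk/dist/lib/Utils/ResultCode';

// alreadyProcessed() and messageId are hypothetical stand-ins.
if (await alreadyProcessed(messageId)) {
    // Not a failure: the message is valid, it just must not continue downstream.
    dto.setStopProcess(ResultCode.DO_NOT_CONTINUE, 'Duplicate message, skipping');
    return dto;
}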
Retries #
A retry policy is set in code, not in the Admin UI. The platform doesn't store a per-node retry config that an operator edits; what it does store is the result of one execution — result-code plus repeat-interval and repeat-max-hops headers — and acts on it. Your job is to throw the right thing.
Throwing a retry from a node #
Both SDKs ship an OnRepeatException that carries the interval (in seconds) and the maximum number of attempts. Throw it for transient failures the platform should re-deliver after a pause; let STOP_AND_FAILED (or any other unhandled exception) handle deterministic failures.
import OnRepeatException from '@orchesty/nodejs-sdk/dist/lib/Exception/OnRepeatException';

if (sqlError === 'TOO_MANY_CONNECTIONS') {
    throw new OnRepeatException(60, 10, sqlError);
}
OnRepeatException defaults to 60 s between attempts and 10 attempts before the message is sent to Trash. Pick numbers that match what the upstream system expects — shorter for soft API rate limits, longer for ETL recovery windows.
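The same exception with different numbers covers a soft rate limit; the values here are illustrative, not SDK defaults:
// 429 is transient by contract: back off briefly, allow more attempts.
if (response.statusCode === 429) {
    throw new OnRepeatException(15, 20, 'Rate limited by upstream');
}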
What the SDK does for HTTP connectors automatically #
When a connector calls getSender().send(dto) without explicit ranges, the SDK applies a default classification of the HTTP status code:
| Range | Outcome |
|---|---|
| < 300 | success — pass response back |
| 300–407 | OnStopAndFailException — straight to Trash, no retry |
| 408 | OnRepeatException(60, 10) — retry (request timeout) |
| 409–499 | OnStopAndFailException — straight to Trash, no retry |
| >= 500 | OnRepeatException(60, 10) — retry |
| Network timeout | OnRepeatException(60, 10) — retry |
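In practice a connector that is happy with these defaults needs no retry wiring at all; a minimal sketch, assuming a requestDto built earlier in the action:
// No explicit ranges: the default classification above applies.
// 408, network timeouts and >= 500 repeat with (60, 10); most other
// non-2xx statuses stop-and-fail straight to Trash.
const response = await this.getSender().send(requestDto);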
Override per call when the API has its own conventions:
await this.getSender().send(
    requestDto,
    { success: [200, 201], repeat: [429, '>=500'], stopAndFail: ['400-428'] },
    30, // seconds between attempts
    5,  // max attempts
);
Two rules of thumb (see the sketch after this list):
- Retry transient failures — network blips, 502 / 503, occasional 429.
- Don't retry deterministic failures — 400, 422, "bad request", "validation failed". Same input will fail again; send to Trash.
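Put together in a custom node, with isTransient standing in for whatever transient-error check fits your system (a hypothetical helper, not an SDK API):
// Transient: ask the platform to re-deliver after a pause.
if (isTransient(err)) {
    throw new OnRepeatException(60, 10, err.message);
}

// Deterministic: the same input will fail again, so stop and let
// the Trash inbox take it.
dto.setStopProcess(ResultCode.STOP_AND_FAILED, err.message);
return dto;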
Per-node override #
The platform allows operators to override the in-code defaults per node via the node's systemConfigs.repeater (enabled, interval, hops). When enabled is true, the platform replaces whatever values your OnRepeatException carried with the configured ones. This is not exposed in the Admin UI today — treat it as a back-office escape hatch and make your in-code defaults the source of truth.
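The exact storage format isn't documented here; as a hypothetical sketch of the fields the prose above names:
// Hypothetical shape of a node's systemConfigs.repeater override.
// When enabled is true, these values replace whatever the thrown
// OnRepeatException carried.
const systemConfigs = {
    repeater: {
        enabled: true,
        interval: 30, // seconds between attempts
        hops: 5,      // max attempts before Trash
    },
};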
When retries are exhausted #
If repeat-max-hops is reached and the node still fails, the message lands in the Trash inbox for the topology, with the original payload, the failure detail, and the node where it gave up. From there an operator can inspect, edit, replay, or discard it.
The Trash workflow is operational, not part of your worker code — see Operations: Trash inbox.