Retry Policy
What is Retry Policy? #
A retry policy in Orchesty defines how the system handles failures and temporary issues during data processing. Instead of failing immediately when something goes wrong, Orchesty can automatically retry the operation after a specified interval, giving transient issues time to resolve.
Key concepts:
- Repeater: Automatic retry mechanism with configurable intervals and limits
- OnRepeatException: Exception type that triggers retries
- Result code ranges: Define which HTTP status codes trigger retries
- Hops: Number of retry attempts remaining
- Interval: Time between retry attempts
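The interval and hops appear in both of the retry entry points used throughout this article. A minimal preview, using only the calls shown in the sections below:
import ProcessDto from '@orchesty/nodejs-sdk/lib/Utils/ProcessDto';
import OnRepeatException from '@orchesty/nodejs-sdk/lib/Exception/OnRepeatException';

// Preview of the two retry triggers described below.
// Both take the same parameters: interval in seconds, maximum hops, and a reason.
function requestRetry(dto: ProcessDto, asException: boolean): ProcessDto {
    if (asException) {
        // Raised as an exception; the middleware schedules the retry
        throw new OnRepeatException(30, 10, 'Transient failure');
    }
    // Set directly on the DTO before returning it
    dto.setRepeater(30, 10, 'Waiting for async job');
    return dto;
}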
Why Retry Policy Matters #
Modern integrations deal with many temporary failures:
- Network timeouts: Brief connectivity issues
- Rate limiting: API throttling (try again later)
- Async operations: Waiting for jobs to complete
- Service unavailability: Temporary outages
- Resource locks: Database/file locks that clear quickly
Without retries, these temporary issues would cause permanent failures. With a proper retry policy, most transient issues resolve automatically.
How Retries Work #
The Retry Flow #
sequenceDiagram
participant N as Node
participant M as Middleware
participant Q as Queue
participant R as RabbitMQ
N->>N: Process message
N->>N: Error occurs
N->>M: Throw OnRepeatException
M->>M: Check retry count
alt Has retries remaining
M->>Q: Add to retry queue
Note over Q: Wait interval seconds
Q->>N: Retry processing
else Max retries exceeded
M->>R: Send to trash
Note over R: Manual recovery needed
end
Retry States #
stateDiagram-v2
[*] --> Processing
Processing --> Success: Operation succeeds
Processing --> TransientError: Temporary failure
TransientError --> Waiting: Set repeater
Waiting --> Processing: After interval
Processing --> PermanentError: Validation/business error
Processing --> MaxRetries: Retries exhausted
Success --> [*]
PermanentError --> [*]
MaxRetries --> Trash
Trash --> [*]
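The middleware behavior in the diagrams can be summarized in a few lines. This is a simplified sketch for intuition only, not the SDK's actual implementation; every name in it is hypothetical:
// Illustrative sketch of the flow in the diagrams above - not the SDK's actual middleware.
// All types and helpers here are hypothetical placeholders.
interface RetryableError { interval: number; maxHops: number; reason: string }
interface QueuedMessage { currentHop: number; body: string }

declare function scheduleRetry(message: QueuedMessage, intervalSeconds: number): Promise<void>;
declare function sendToTrash(message: QueuedMessage, error: unknown): Promise<void>;

async function handleFailure(message: QueuedMessage, error: RetryableError | Error): Promise<void> {
    if ('maxHops' in error && message.currentHop < error.maxHops) {
        // Retries remain: requeue the message and wait `interval` seconds before reprocessing
        await scheduleRetry(message, error.interval);
    } else {
        // Permanent error or retries exhausted: move to the trash queue for manual recovery
        await sendToTrash(message, error);
    }
}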
Implementing Retries #
Method 1: Using Repeater (Explicit Control) #
The most common approach is to set the retry parameters explicitly:
import AConnector from '@orchesty/nodejs-sdk/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/lib/Utils/ProcessDto';
import ResultCode from '@orchesty/nodejs-sdk/lib/Utils/ResultCode';
export default class PollingConnector extends AConnector {
public getName(): string {
return 'poll-job-status';
}
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
const { jobId } = dto.getJsonData();
// Check job status
const status = await this.checkJobStatus(jobId);
if (status === 'pending' || status === 'processing') {
// Job not ready yet - retry every 30 seconds, up to 10 times
dto.setRepeater(
30, // interval in seconds
10, // maximum retry attempts
`Job ${jobId} still ${status}`
);
dto.setJsonData({ status, jobId });
return dto;
}
if (status === 'completed') {
// Success! Remove repeater and continue
dto.removeRepeater();
const result = await this.fetchJobResult(jobId);
dto.setJsonData(result);
dto.setSuccessProcess('Job completed successfully');
return dto;
}
if (status === 'failed') {
// Permanent failure - don't retry
dto.setStopProcess(
ResultCode.STOP_AND_FAILED,
`Job ${jobId} failed permanently`
);
return dto;
}
return dto;
}
private async checkJobStatus(jobId: string): Promise<string> {
// Implementation
return 'pending';
}
private async fetchJobResult(jobId: string): Promise<any> {
// Implementation
return {};
}
}
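With the parameters used above (interval 30 seconds, 10 hops), the job is polled roughly every 30 seconds for up to 10 attempts, i.e. about 5 minutes in total, before the message is routed to the trash queue for manual recovery.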
Method 2: Automatic HTTP Retries #
The SDK automatically retries certain HTTP errors:
import AConnector from '@orchesty/nodejs-sdk/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/lib/Utils/ProcessDto';
import RequestDto from '@orchesty/nodejs-sdk/lib/Transport/Curl/RequestDto';
import { HttpMethods } from '@orchesty/nodejs-sdk/lib/Transport/HttpMethods';
export default class ApiCallConnector extends AConnector {
public getName(): string {
return 'api-call';
}
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
const requestDto = new RequestDto(
'https://api.example.com/data',
HttpMethods.GET,
dto
);
// The SDK automatically retries:
// - HTTP 408 (Request Timeout)
// - HTTP 500+ (Server Errors)
// - Network timeouts
//
// Default: 60 second interval, 10 retries
const response = await this.getSender().send(requestDto);
dto.setData(response.getBody());
return dto;
}
}
Method 3: Custom Result Code Ranges #
Control which HTTP status codes trigger retries:
import { IResultRanges } from '@orchesty/nodejs-sdk/lib/Transport/Curl/ResultCodeRange';
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
const requestDto = new RequestDto(url, HttpMethods.GET, dto);
// Define custom retry behavior
const codeRanges: IResultRanges = {
success: '<300', // 2xx = success
stopAndFail: ['300-408', '409-429', '430-499'], // 3xx/4xx = fail (except 408 and 429)
repeat: [408, 429, '>=500'] // Retry on timeout, rate limit, and 5xx
};
// Pass custom ranges to sender
const response = await this.getSender().send(
requestDto,
codeRanges,
45, // retry interval (seconds)
5 // max hops
);
dto.setData(response.getBody());
return dto;
}
Method 4: Throwing OnRepeatException #
Programmatically trigger retries:
import OnRepeatException from '@orchesty/nodejs-sdk/lib/Exception/OnRepeatException';
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
try {
const result = await this.performOperation();
dto.setJsonData(result);
return dto;
} catch (error) {
if (this.isTransientError(error)) {
// Retry every 20 seconds, up to 5 times
throw new OnRepeatException(
20, // interval
5, // maxHops
`Transient error: ${error.message}`
);
}
// Not transient - throw normally
throw error;
}
}
private isTransientError(error: any): boolean {
return error.code === 'ECONNRESET' ||
error.code === 'ETIMEDOUT' ||
error.message.includes('temporarily unavailable');
}
Retry Configuration #
Setting Repeater Parameters #
dto.setRepeater(
interval, // Seconds between retries (minimum: 1)
maxHops, // Maximum retry attempts (minimum: 1)
reason // Reason for retry (for logging)
);
Examples:
// Quick retries for fast-resolving issues
dto.setRepeater(5, 3, 'Database lock');
// Standard retries
dto.setRepeater(30, 10, 'Waiting for async job');
// Slow retries for rate limiting
dto.setRepeater(300, 4, 'API rate limit exceeded');
// Patient retries for long-running operations
dto.setRepeater(60, 60, 'Waiting for file processing (up to 1 hour)');
Removing Repeater #
// Once operation succeeds, remove repeater
if (jobCompleted) {
dto.removeRepeater();
dto.setJsonData(result);
}
Node-Level Repeater Configuration #
Configure default repeaters in Orchesty Admin at the node level:
{
"repeater": {
"enabled": true,
"interval": 60,
"hops": 10
}
}
When a node-level configuration exists, it overrides the repeater set in code.
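For example, if the node is configured in Orchesty Admin with interval 60 and hops 10, a dto.setRepeater(30, 5, ...) call in the connector code still results in retries every 60 seconds with up to 10 attempts.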
Common Retry Patterns #
Pattern 1: Polling Async Operations #
export default class AsyncJobPoller extends AConnector {
public getName(): string {
return 'async-job-poller';
}
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
const input = dto.getJsonData();
const jobId = input.jobId || input.id;
// Check job status
const job = await this.checkStatus(jobId);
switch (job.status) {
case 'queued':
case 'processing':
// Still working - retry every 15 seconds
dto.setRepeater(15, 40, `Job ${job.status}`);
dto.setJsonData({
status: job.status,
progress: job.progress,
jobId: jobId
});
break;
case 'completed': {
// Done! Get results and continue
dto.removeRepeater();
const result = await this.getResults(jobId);
dto.setJsonData(result);
dto.setSuccessProcess(`Job ${jobId} completed`);
break;
}
case 'failed':
// Permanent failure
dto.setStopProcess(
ResultCode.STOP_AND_FAILED,
`Job ${jobId} failed: ${job.error}`
);
break;
default:
// Unknown status - stop
dto.setStopProcess(
ResultCode.STOP_AND_FAILED,
`Unknown job status: ${job.status}`
);
}
return dto;
}
}
Pattern 2: Rate Limit Handling #
export default class RateLimitedConnector extends AConnector {
public getName(): string {
return 'rate-limited-api';
}
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
try {
const requestDto = new RequestDto(url, HttpMethods.GET, dto);
const response = await this.getSender().send(requestDto);
dto.setData(response.getBody());
return dto;
} catch (error) {
if (error.response?.status === 429) {
// Rate limited - check Retry-After header
const retryAfter = parseInt(
error.response.headers['retry-after'] || '60'
);
throw new OnRepeatException(
retryAfter,
5,
`Rate limited. Retry after ${retryAfter} seconds`
);
}
throw error;
}
}
}
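Per the HTTP specification, Retry-After may also contain an HTTP-date instead of a number of seconds; the sketch above assumes the numeric form and falls back to 60 seconds only when the header is missing.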
Pattern 3: Exponential Backoff #
export default class ExponentialBackoffConnector extends AConnector {
private readonly BASE_INTERVAL = 10;
private readonly MAX_INTERVAL = 300;
public getName(): string {
return 'exponential-backoff';
}
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
try {
const result = await this.attemptOperation(dto);
dto.setJsonData(result);
return dto;
} catch (error) {
// Calculate retry interval based on attempt number
const attempt = this.getAttemptNumber(dto);
const interval = Math.min(
this.BASE_INTERVAL * Math.pow(2, attempt),
this.MAX_INTERVAL
);
dto.setRepeater(
interval,
10,
`Attempt ${attempt + 1}: ${error.message}`
);
return dto;
}
}
private getAttemptNumber(dto: ProcessDto): number {
const repeatHops = dto.getHeader('repeat-hops') || '0';
return parseInt(repeatHops, 10);
}
}
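With BASE_INTERVAL = 10 and MAX_INTERVAL = 300, successive attempts wait roughly 10, 20, 40, 80, 160, then 300 seconds (the computed interval is capped at MAX_INTERVAL), up to the 10-hop limit.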
Pattern 4: Resource Availability Check #
export default class ResourceWaiter extends AConnector {
public getName(): string {
return 'wait-for-resource';
}
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
const { resourceId } = dto.getJsonData();
// Check if resource is available
const available = await this.checkResourceAvailability(resourceId);
if (!available) {
// Not ready - retry every 10 seconds, up to 30 times (5 minutes)
dto.setRepeater(
10,
30,
`Waiting for resource ${resourceId} to become available`
);
return dto;
}
// Resource available - proceed
dto.removeRepeater();
const resource = await this.getResource(resourceId);
dto.setJsonData(resource);
return dto;
}
}
Pattern 5: Conditional Retry #
export default class ConditionalRetryConnector extends AConnector {
public getName(): string {
return 'conditional-retry';
}
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
try {
const result = await this.performOperation();
dto.setJsonData(result);
return dto;
} catch (error) {
// Only retry specific error types
if (this.shouldRetry(error)) {
dto.setRepeater(
30,
5,
`Retrying due to: ${error.message}`
);
return dto;
}
// Don't retry - fail immediately
dto.setStopProcess(
ResultCode.STOP_AND_FAILED,
`Operation failed: ${error.message}`
);
return dto;
}
}
private shouldRetry(error: any): boolean {
const retryableCodes = ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND'];
return retryableCodes.includes(error.code) ||
error.message.includes('temporary') ||
error.status >= 500;
}
}
Result Code Ranges #
Default Ranges #
{
success: '<300', // 200-299
stopAndFail: ['300-408', '409-500'], // 300-407, 409-499
repeat: [408, '>=500'] // 408, 500+
}
Custom Range Examples #
// Treat 404 as success (resource doesn't exist is OK)
{
success: ['<300', 404],
stopAndFail: ['300-404', '405-500'],
repeat: [408, '>=500']
}
// Retry on 429 (rate limit)
{
success: '<300',
stopAndFail: ['300-408', '409-429', '430-500'],
repeat: [408, 429, '>=500']
}
// Never retry (fail immediately)
{
success: '<300',
stopAndFail: '>=300',
repeat: []
}
// Retry everything (except 2xx)
{
success: '<300',
stopAndFail: [],
repeat: '>=300'
}
Range Syntax #
// Single number
200
// Array of numbers
[200, 201, 204]
// Range string
'200-299' // 200 to 299 (inclusive)
// Comparison operators
'<300' // Less than 300
'<=299' // Less than or equal to 299
'>499' // Greater than 499
'>=500' // Greater than or equal to 500
// Mixed array
[200, '201-205', '>=300']
Monitoring and Debugging #
Checking Retry Status #
import { getRepeatHops, getRepeatInterval } from '@orchesty/nodejs-sdk/lib/Utils/Headers';
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
const headers = dto.getHeaders();
const currentHop = getRepeatHops(headers) || 0;
const interval = getRepeatInterval(headers);
console.log(`Retry attempt ${currentHop}, interval: ${interval}s`);
// Process...
return dto;
}
Logging Retries #
import logger from '@orchesty/nodejs-sdk/lib/Logger/Logger';
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
const { jobId } = dto.getJsonData();
const status = await this.checkStatus(jobId);
if (status !== 'completed') {
logger.info(
`Job ${jobId} not ready (${status}). Setting repeater.`,
dto
);
dto.setRepeater(30, 10, `Job ${status}`);
}
return dto;
}
Viewing Retries in Orchesty Admin #
In Orchesty Admin, you can see:
- Number of retry attempts
- Retry interval
- Reason for retry
- Time until next retry
- Retry history
Best Practices #
1. Choose Appropriate Intervals #
// Fast operations (locks, quick checks)
dto.setRepeater(5, 12, 'Waiting for lock'); // 1 minute total
// Standard operations (API calls, async jobs)
dto.setRepeater(30, 10, 'Processing'); // 5 minutes total
// Slow operations (file processing, reports)
dto.setRepeater(60, 30, 'Generating report'); // 30 minutes total
// Rate limits
dto.setRepeater(300, 4, 'Rate limited'); // 20 minutes total
2. Limit Maximum Retries #
// Don't retry forever
dto.setRepeater(30, 10, 'Waiting...'); // 5 minutes max
// Not recommended
dto.setRepeater(60, 1000, 'Waiting...'); // 16+ hours!
3. Provide Clear Retry Reasons #
// Good
dto.setRepeater(30, 10, `Job ${jobId} is ${status}. Progress: ${progress}%`);
// Bad
dto.setRepeater(30, 10, 'Not ready');
4. Remove Repeater on Success #
if (operationSucceeded) {
dto.removeRepeater(); // Important!
dto.setJsonData(result);
}
5. Don't Retry Validation Errors #
// Validation error - don't retry
if (!data.email) {
dto.setStopProcess(
ResultCode.STOP_AND_FAILED,
'Email is required'
);
return dto;
}
// Network error - retry
try {
await this.callApi();
} catch (error) {
if (error.code === 'ETIMEDOUT') {
throw new OnRepeatException(30, 5, 'Network timeout');
}
throw error;
}
6. Use Different Strategies for Different Errors #
try {
return await this.callApi();
} catch (error) {
if (error.status === 429) {
// Rate limit - long interval, few retries
throw new OnRepeatException(300, 3, 'Rate limited');
}
if (error.status >= 500) {
// Server error - standard retry
throw new OnRepeatException(30, 10, 'Server error');
}
if (error.code === 'ETIMEDOUT') {
// Timeout - quick retry
throw new OnRepeatException(10, 5, 'Timeout');
}
// Other errors - don't retry
throw error;
}
Retry vs Stop #
When to Retry #
- Network timeouts
- HTTP 5xx server errors
- HTTP 429 rate limiting
- HTTP 408 request timeout
- Temporary resource unavailability
- "Try again later" responses
- Async operation not complete
When to Stop (Don't Retry) #
- HTTP 4xx client errors (except 408, 429)
- Validation failures
- Authentication errors
- Resource not found (404)
- Permission denied (403)
- Invalid input data
- Business logic violations
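The two lists above can be folded into a single decision helper. A minimal sketch, assuming an error object that exposes an optional HTTP status and a Node.js error code (the helper is illustrative, not part of the SDK):
// Illustrative mapping of the retry/stop rules above onto one decision function.
interface ClassifiedError {
    status?: number; // HTTP status code, if the error came from an HTTP response
    code?: string;   // Node.js error code, e.g. 'ETIMEDOUT'
}

function isRetryable(error: ClassifiedError): boolean {
    // Retry: request timeout, rate limiting, server errors, transient network failures
    if (error.status === 408 || error.status === 429) return true;
    if (error.status !== undefined && error.status >= 500) return true;
    if (error.code === 'ECONNRESET' || error.code === 'ETIMEDOUT') return true;
    // Stop: remaining 4xx client errors, validation and business failures
    return false;
}
A connector can then call dto.setRepeater() or throw OnRepeatException when isRetryable() returns true, and dto.setStopProcess() otherwise.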
Troubleshooting #
Retries Not Working #
Check:
- Is the repeater set correctly? (dto.setRepeater(interval, hops, reason))
- Is the node configuration overriding the code? Check the node settings in Orchesty Admin.
- Are retries exhausted? Check the maximum hops limit.
Infinite Retries #
// Don't do this - the repeater is set unconditionally,
// so the message is rescheduled on every pass until the hop limit is reached
dto.setRepeater(30, 10, 'Retrying');
return dto;
// Instead, check the completion condition first
if (!isComplete) {
dto.setRepeater(30, 10, 'Not complete yet');
} else {
dto.removeRepeater();
}
Retry Interval Not Respected #
- Minimum interval is 1 second
- RabbitMQ scheduling has ~1 second granularity
- Node configuration may override code settings
Related Concepts #
- Error Handling - Different error types and when to retry
- Data Flow - Understanding message lifecycle
- Logging - Monitoring retry attempts
- Connector - Implementing retries in connectors
- Rate Limiting - Preventing rate limit errors
API References #
- ProcessDto - setRepeater() and removeRepeater() methods
- OnRepeatException - Exception class
- ErrorHandler - Retry middleware
- CurlSender - HTTP retry logic
- ResultCode - Result codes
Next Steps #
- Learn about Error Handling for managing different failure types
- Understand Data Flow to see where retries fit in the message lifecycle
- Explore Rate Limiting to prevent errors that require retries
- Read ProcessDto reference for complete retry method documentation