Retry Policy

What is Retry Policy? #

Retry policy in Orchesty defines how the system handles failures and temporary issues during data processing. Instead of immediately failing when something goes wrong, Orchesty can automatically retry operations after a specified interval, giving transient issues time to resolve.

Key concepts:

  • Repeater: Automatic retry mechanism with configurable intervals and limits
  • OnRepeatException: Exception type that triggers retries
  • Result code ranges: Define which HTTP status codes trigger retries
  • Hops: Counter of retry attempts, bounded by a configured maximum (maxHops)
  • Interval: Time between retry attempts
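
A minimal sketch of how these fit together (the values here are illustrative):

// Ask Orchesty to re-deliver this message every 30 seconds,
// at most 10 times; the reason string shows up in logs
dto.setRepeater(30, 10, 'Upstream service not ready yet');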

Why Retry Policy Matters #

Modern integrations deal with many temporary failures:

  • Network timeouts: Brief connectivity issues
  • Rate limiting: API throttling (try again later)
  • Async operations: Waiting for jobs to complete
  • Service unavailability: Temporary outages
  • Resource locks: Database/file locks that clear quickly

Without retries, these temporary issues would cause permanent failures. With a proper retry policy, most transient issues resolve automatically.

How Retries Work #

The Retry Flow #

sequenceDiagram
    participant N as Node
    participant M as Middleware
    participant Q as Queue
    participant R as RabbitMQ
    
    N->>N: Process message
    N->>N: Error occurs
    N->>M: Throw OnRepeatException
    M->>M: Check retry count
    alt Has retries remaining
        M->>Q: Add to retry queue
        Note over Q: Wait interval seconds
        Q->>N: Retry processing
    else Max retries exceeded
        M->>R: Send to trash
        Note over R: Manual recovery needed
    end

Retry States #

stateDiagram-v2
    [*] --> Processing
    Processing --> Success: Operation succeeds
    Processing --> TransientError: Temporary failure
    TransientError --> Waiting: Set repeater
    Waiting --> Processing: After interval
    Processing --> PermanentError: Validation/business error
    Processing --> MaxRetries: Retries exhausted
    Success --> [*]
    PermanentError --> [*]
    MaxRetries --> Trash
    Trash --> [*]

Implementing Retries #

Method 1: Using Repeater (Explicit Control) #

The most common approach is to set the retry parameters explicitly:

import AConnector from '@orchesty/nodejs-sdk/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/lib/Utils/ProcessDto';
import ResultCode from '@orchesty/nodejs-sdk/lib/Utils/ResultCode';

export default class PollingConnector extends AConnector {
    
    public getName(): string {
        return 'poll-job-status';
    }
    
    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        const { jobId } = dto.getJsonData();
        
        // Check job status
        const status = await this.checkJobStatus(jobId);
        
        if (status === 'pending' || status === 'processing') {
            // Job not ready yet - retry every 30 seconds, up to 10 times
            dto.setRepeater(
                30,    // interval in seconds
                10,    // maximum retry attempts
                `Job ${jobId} still ${status}`
            );
            
            dto.setJsonData({ status, jobId });
            return dto;
        }
        
        if (status === 'completed') {
            // Success! Remove repeater and continue
            dto.removeRepeater();
            
            const result = await this.fetchJobResult(jobId);
            dto.setJsonData(result);
            dto.setSuccessProcess('Job completed successfully');
            return dto;
        }
        
        if (status === 'failed') {
            // Permanent failure - don't retry
            dto.setStopProcess(
                ResultCode.STOP_AND_FAILED,
                `Job ${jobId} failed permanently`
            );
            return dto;
        }
        
        return dto;
    }
    
    private async checkJobStatus(jobId: string): Promise<string> {
        // Implementation
        return 'pending';
    }
    
    private async fetchJobResult(jobId: string): Promise<any> {
        // Implementation
        return {};
    }
}

Method 2: Automatic HTTP Retries #

The SDK automatically retries certain HTTP errors:

import AConnector from '@orchesty/nodejs-sdk/lib/Connector/AConnector';
import ProcessDto from '@orchesty/nodejs-sdk/lib/Utils/ProcessDto';
import RequestDto from '@orchesty/nodejs-sdk/lib/Transport/Curl/RequestDto';
import { HttpMethods } from '@orchesty/nodejs-sdk/lib/Transport/HttpMethods';

export default class ApiCallConnector extends AConnector {
    
    public getName(): string {
        return 'api-call';
    }
    
    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        const requestDto = new RequestDto(
            'https://api.example.com/data',
            HttpMethods.GET,
            dto
        );
        
        // The SDK automatically retries:
        // - HTTP 408 (Request Timeout)
        // - HTTP 500+ (Server Errors)
        // - Network timeouts
        //
        // Default: 60 second interval, 10 retries
        const response = await this.getSender().send(requestDto);
        
        dto.setData(response.getBody());
        return dto;
    }
}

Method 3: Custom Result Code Ranges #

Control which HTTP status codes trigger retries:

import { IResultRanges } from '@orchesty/nodejs-sdk/lib/Transport/Curl/ResultCodeRange';

public async processAction(dto: ProcessDto): Promise<ProcessDto> {
    const requestDto = new RequestDto(url, HttpMethods.GET, dto);
    
    // Define custom retry behavior
    const codeRanges: IResultRanges = {
        success: '<300',                                 // 2xx = success
        stopAndFail: ['300-407', '409-428', '430-499'],  // 3xx/4xx = fail (except 408, 429)
        repeat: [408, 429, '>=500']                      // Retry on timeout, rate limit, and 5xx
    };
    
    // Pass custom ranges to sender
    const response = await this.getSender().send(
        requestDto,
        codeRanges,
        45,   // retry interval (seconds)
        5     // max hops
    );
    
    dto.setData(response.getBody());
    return dto;
}

Method 4: Throwing OnRepeatException #

Programmatically trigger retries:

import OnRepeatException from '@orchesty/nodejs-sdk/lib/Exception/OnRepeatException';

public async processAction(dto: ProcessDto): Promise<ProcessDto> {
    try {
        const result = await this.performOperation();
        dto.setJsonData(result);
        return dto;
    } catch (error) {
        if (this.isTransientError(error)) {
            // Retry every 20 seconds, up to 5 times
            throw new OnRepeatException(
                20,    // interval
                5,     // maxHops
                `Transient error: ${error.message}`
            );
        }
        
        // Not transient - throw normally
        throw error;
    }
}

private isTransientError(error: any): boolean {
    return error.code === 'ECONNRESET' ||
           error.code === 'ETIMEDOUT' ||
           error.message.includes('temporarily unavailable');
}

Retry Configuration #

Setting Repeater Parameters #

dto.setRepeater(
    interval,   // Seconds between retries (minimum: 1)
    maxHops,    // Maximum retry attempts (minimum: 1)
    reason      // Reason for retry (for logging)
);

Examples:

// Quick retries for fast-resolving issues
dto.setRepeater(5, 3, 'Database lock');

// Standard retries
dto.setRepeater(30, 10, 'Waiting for async job');

// Slow retries for rate limiting
dto.setRepeater(300, 4, 'API rate limit exceeded');

// Patient retries for long-running operations
dto.setRepeater(60, 60, 'Waiting for file processing (up to 1 hour)');

Removing Repeater #

// Once operation succeeds, remove repeater
if (jobCompleted) {
    dto.removeRepeater();
    dto.setJsonData(result);
}

Node-Level Repeater Configuration #

Configure default repeaters in Orchesty Admin at the node level:

{
  "repeater": {
    "enabled": true,
    "interval": 60,
    "hops": 10
  }
}

When a node-level configuration exists, it overrides the repeater set in code.

Common Retry Patterns #

Pattern 1: Polling Async Operations #

export default class AsyncJobPoller extends AConnector {
    
    public getName(): string {
        return 'async-job-poller';
    }
    
    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        const input = dto.getJsonData();
        const jobId = input.jobId || input.id;
        
        // Check job status
        const job = await this.checkStatus(jobId);
        
        switch (job.status) {
            case 'queued':
            case 'processing':
                // Still working - retry every 15 seconds
                dto.setRepeater(15, 40, `Job ${job.status}`);
                dto.setJsonData({ 
                    status: job.status,
                    progress: job.progress,
                    jobId: jobId
                });
                break;
                
            case 'completed': {
                // Done! Get results and continue
                dto.removeRepeater();
                const result = await this.getResults(jobId);
                dto.setJsonData(result);
                dto.setSuccessProcess(`Job ${jobId} completed`);
                break;
            }
                
            case 'failed':
                // Permanent failure
                dto.setStopProcess(
                    ResultCode.STOP_AND_FAILED,
                    `Job ${jobId} failed: ${job.error}`
                );
                break;
                
            default:
                // Unknown status - stop
                dto.setStopProcess(
                    ResultCode.STOP_AND_FAILED,
                    `Unknown job status: ${job.status}`
                );
        }
        
        return dto;
    }
}

Pattern 2: Rate Limit Handling #

export default class RateLimitedConnector extends AConnector {
    
    public getName(): string {
        return 'rate-limited-api';
    }
    
    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        try {
            const requestDto = new RequestDto(url, HttpMethods.GET, dto);
            const response = await this.getSender().send(requestDto);
            
            dto.setData(response.getBody());
            return dto;
            
        } catch (error) {
            if (error.response?.status === 429) {
                // Rate limited - honor the Retry-After header.
                // Retry-After may also be an HTTP date, so fall back
                // to 60 seconds if it does not parse as a number.
                const parsed = Number.parseInt(
                    error.response.headers['retry-after'] || '60',
                    10
                );
                const retryAfter = Number.isNaN(parsed) ? 60 : parsed;
                
                throw new OnRepeatException(
                    retryAfter,
                    5,
                    `Rate limited. Retry after ${retryAfter} seconds`
                );
            }
            
            throw error;
        }
    }
}

Pattern 3: Exponential Backoff #

export default class ExponentialBackoffConnector extends AConnector {
    
    private readonly BASE_INTERVAL = 10;
    private readonly MAX_INTERVAL = 300;
    
    public getName(): string {
        return 'exponential-backoff';
    }
    
    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        try {
            const result = await this.attemptOperation(dto);
            dto.setJsonData(result);
            return dto;
            
        } catch (error) {
            // Calculate retry interval based on attempt number
            const attempt = this.getAttemptNumber(dto);
            const interval = Math.min(
                this.BASE_INTERVAL * Math.pow(2, attempt),
                this.MAX_INTERVAL
            );
            
            dto.setRepeater(
                interval,
                10,
                `Attempt ${attempt + 1}: ${error.message}`
            );
            
            return dto;
        }
    }
    
    private getAttemptNumber(dto: ProcessDto): number {
        const repeatHops = dto.getHeader('repeat-hops') || '0';
        return parseInt(repeatHops, 10);
    }
}
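
With BASE_INTERVAL = 10 and MAX_INTERVAL = 300, the successive waits are 10, 20, 40, 80, and 160 seconds, after which every further retry is capped at 300 seconds.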

Pattern 4: Resource Availability Check #

export default class ResourceWaiter extends AConnector {
    
    public getName(): string {
        return 'wait-for-resource';
    }
    
    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        const { resourceId } = dto.getJsonData();
        
        // Check if resource is available
        const available = await this.checkResourceAvailability(resourceId);
        
        if (!available) {
            // Not ready - retry every 10 seconds, up to 30 times (5 minutes)
            dto.setRepeater(
                10,
                30,
                `Waiting for resource ${resourceId} to become available`
            );
            return dto;
        }
        
        // Resource available - proceed
        dto.removeRepeater();
        const resource = await this.getResource(resourceId);
        dto.setJsonData(resource);
        
        return dto;
    }
}

Pattern 5: Conditional Retry #

export default class ConditionalRetryConnector extends AConnector {
    
    public getName(): string {
        return 'conditional-retry';
    }
    
    public async processAction(dto: ProcessDto): Promise<ProcessDto> {
        try {
            const result = await this.performOperation();
            dto.setJsonData(result);
            return dto;
            
        } catch (error) {
            // Only retry specific error types
            if (this.shouldRetry(error)) {
                dto.setRepeater(
                    30,
                    5,
                    `Retrying due to: ${error.message}`
                );
                return dto;
            }
            
            // Don't retry - fail immediately
            dto.setStopProcess(
                ResultCode.STOP_AND_FAILED,
                `Operation failed: ${error.message}`
            );
            return dto;
        }
    }
    
    private shouldRetry(error: any): boolean {
        const retryableCodes = ['ECONNRESET', 'ETIMEDOUT', 'ENOTFOUND'];
        return retryableCodes.includes(error.code) ||
               error.message.includes('temporary') ||
               error.status >= 500;
    }
}

Result Code Ranges #

Default Ranges #

{
    success: '<300',                     // 200-299
    stopAndFail: ['300-407', '409-499'], // 3xx and 4xx, except 408
    repeat: [408, '>=500']               // 408, 500+
}

Custom Range Examples #

// Treat 404 as success (resource doesn't exist is OK)
{
    success: ['<300', 404],
    stopAndFail: ['300-403', '405-407', '409-499'],
    repeat: [408, '>=500']
}

// Retry on 429 (rate limit)
{
    success: '<300',
    stopAndFail: ['300-407', '409-428', '430-499'],
    repeat: [408, 429, '>=500']
}

// Never retry (fail immediately)
{
    success: '<300',
    stopAndFail: '>=300',
    repeat: []
}

// Retry everything (except 2xx)
{
    success: '<300',
    stopAndFail: [],
    repeat: '>=300'
}

Range Syntax #

// Single number
200

// Array of numbers
[200, 201, 204]

// Range string
'200-299'     // 200 to 299 (inclusive)

// Comparison operators
'<300'        // Less than 300
'<=299'       // Less than or equal to 299
'>499'        // Greater than 499
'>=500'       // Greater than or equal to 500

// Mixed array
[200, '201-205', '>=300']
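
To make these rules concrete, here is a hypothetical helper (not the SDK's implementation) that applies the syntax above to a status code:

type RangeEntry = number | string;

// Evaluate one range entry against a status code, per the rules above
function matchesEntry(code: number, entry: RangeEntry): boolean {
    if (typeof entry === 'number') {
        return code === entry;                          // single number
    }
    if (entry.startsWith('<=')) return code <= Number(entry.slice(2));
    if (entry.startsWith('>=')) return code >= Number(entry.slice(2));
    if (entry.startsWith('<')) return code < Number(entry.slice(1));
    if (entry.startsWith('>')) return code > Number(entry.slice(1));
    const [min, max] = entry.split('-').map(Number);    // '200-299', inclusive
    return code >= min && code <= max;
}

// A mixed array matches if any of its entries matches
function matchesRange(code: number, range: RangeEntry | RangeEntry[]): boolean {
    return Array.isArray(range)
        ? range.some((entry) => matchesEntry(code, entry))
        : matchesEntry(code, range);
}

// matchesRange(204, [200, '201-205', '>=300']) === true
// matchesRange(299, [200, '201-205', '>=300']) === false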

Monitoring and Debugging #

Checking Retry Status #

import { getRepeatHops, getRepeatInterval } from '@orchesty/nodejs-sdk/lib/Utils/Headers';

public async processAction(dto: ProcessDto): Promise<ProcessDto> {
    const headers = dto.getHeaders();
    const currentHop = getRepeatHops(headers) || 0;
    const interval = getRepeatInterval(headers);
    
    console.log(`Retry attempt ${currentHop}, interval: ${interval}s`);
    
    // Process...
    return dto;
}

Logging Retries #

import logger from '@orchesty/nodejs-sdk/lib/Logger/Logger';

public async processAction(dto: ProcessDto): Promise<ProcessDto> {
    const { jobId } = dto.getJsonData();
    const status = await this.checkStatus(jobId);
    
    if (status !== 'completed') {
        logger.info(
            `Job ${jobId} not ready (${status}). Setting repeater.`,
            dto
        );
        
        dto.setRepeater(30, 10, `Job ${status}`);
    }
    
    return dto;
}

Viewing Retries in Orchesty Admin #

In Orchesty Admin, you can see:

  • Number of retry attempts
  • Retry interval
  • Reason for retry
  • Time until next retry
  • Retry history

Best Practices #

1. Choose Appropriate Intervals #

// Fast operations (locks, quick checks)
dto.setRepeater(5, 12, 'Waiting for lock');  // 1 minute total

// Standard operations (API calls, async jobs)
dto.setRepeater(30, 10, 'Processing');  // 5 minutes total

// Slow operations (file processing, reports)
dto.setRepeater(60, 30, 'Generating report');  // 30 minutes total

// Rate limits
dto.setRepeater(300, 4, 'Rate limited');  // 20 minutes total

2. Limit Maximum Retries #

// Don't retry forever
dto.setRepeater(30, 10, 'Waiting...');  // 5 minutes max

// Not recommended
dto.setRepeater(60, 1000, 'Waiting...');  // 16+ hours!

3. Provide Clear Retry Reasons #

// Good
dto.setRepeater(30, 10, `Job ${jobId} is ${status}. Progress: ${progress}%`);

// Bad
dto.setRepeater(30, 10, 'Not ready');

4. Remove Repeater on Success #

if (operationSucceeded) {
    dto.removeRepeater();  // Important!
    dto.setJsonData(result);
}

5. Don't Retry Validation Errors #

// Validation error - don't retry
if (!data.email) {
    dto.setStopProcess(
        ResultCode.STOP_AND_FAILED,
        'Email is required'
    );
    return dto;
}

// Network error - retry
try {
    await this.callApi();
} catch (error) {
    if (error.code === 'ETIMEDOUT') {
        throw new OnRepeatException(30, 5, 'Network timeout');
    }
    throw error;
}

6. Use Different Strategies for Different Errors #

try {
    return await this.callApi();
} catch (error) {
    if (error.status === 429) {
        // Rate limit - long interval, few retries
        throw new OnRepeatException(300, 3, 'Rate limited');
    }
    
    if (error.status >= 500) {
        // Server error - standard retry
        throw new OnRepeatException(30, 10, 'Server error');
    }
    
    if (error.code === 'ETIMEDOUT') {
        // Timeout - quick retry
        throw new OnRepeatException(10, 5, 'Timeout');
    }
    
    // Other errors - don't retry
    throw error;
}

Retry vs Stop #

When to Retry #

  • Network timeouts
  • HTTP 5xx server errors
  • HTTP 429 rate limiting
  • HTTP 408 request timeout
  • Temporary resource unavailability
  • "Try again later" responses
  • Async operation not complete

When to Stop (Don't Retry) #

  • HTTP 4xx client errors (except 408, 429)
  • Validation failures
  • Authentication errors
  • Resource not found (404)
  • Permission denied (403)
  • Invalid input data
  • Business logic violations
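
As a sketch, these two lists can be collapsed into one decision helper (illustrative only; adapt the status handling to your API):

// Maps the lists above onto HTTP status codes
function isRetryableStatus(status: number): boolean {
    if (status === 408 || status === 429) return true; // timeout, rate limit
    if (status >= 500) return true;                    // server errors
    return false;                                      // other 4xx, etc.
}

// Usage inside a connector's catch block:
// const status = error.response?.status ?? 0;
// if (isRetryableStatus(status)) {
//     throw new OnRepeatException(30, 10, `HTTP ${status}`);
// }
// dto.setStopProcess(ResultCode.STOP_AND_FAILED, `HTTP ${status}`);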

Troubleshooting #

Retries Not Working #

Check:

  1. Is the repeater set correctly?
     dto.setRepeater(interval, hops, reason);
  2. Is a node-level configuration overriding the code?
     Check the node settings in Orchesty Admin.
  3. Are retries exhausted?
     Check the maximum hops limit.

Infinite Retries #

// Don't do this - the repeater is set unconditionally,
// so the process never reports completion
dto.setRepeater(30, 10, 'Retrying');
return dto;

// Instead, check condition
if (!isComplete) {
    dto.setRepeater(30, 10, 'Not complete yet');
} else {
    dto.removeRepeater();
}

Retry Interval Not Respected #

  • Minimum interval is 1 second
  • RabbitMQ scheduling has ~1 second granularity
  • Node configuration may override code settings

Next Steps #

  1. Learn about Error Handling for managing different failure types
  2. Understand Data Flow to see where retries fit in the message lifecycle
  3. Explore Rate Limiting to prevent errors that require retries
  4. Read ProcessDto reference for complete retry method documentation