Prefetch (per-node parallelism)
Prefetch is the number of messages the platform's bridge pulls from a node's input queue at once and hands to your worker concurrently. It is set per node, on the consumer side of the queue, and it is one of the highest-leverage tuning knobs in Orchesty: a single change can move a topology from one message at a time to ten in flight, with a corresponding jump in throughput.
Why prefetch matters #
- Throughput. Raising prefetch on a node that is currently the bottleneck typically yields throughput gains in the same multiple — a node at `prefetch=1` saturating a single worker can reach roughly N× by moving to `prefetch=N`, provided the worker has the CPU and the upstream/downstream don't cap it first.
- Resource cost. Higher prefetch means more in-flight messages held by the bridge, more concurrent goroutines, more open connections, and proportionally more memory and CPU on both the bridge and the worker that consumes the queue. Treat prefetch as a load multiplier on the whole node.
- Ordering. With `prefetch > 1` the platform processes messages on that node concurrently and acknowledges them out of order. The topology no longer guarantees per-node message ordering once you move above `1`. If a downstream step depends on input order, keep the prefetch at `1`.
- Bounded. The accepted range is `1` (default) to `20`. Both the UI modal and the API enforce this bound; values outside the range are rejected or clamped.
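The `1..20` bound can be enforced either by rejection or by clamping, as noted above. A minimal sketch of both styles — the function names and the choice of which surface rejects versus clamps are illustrative, not Orchesty's actual implementation:

```python
# Sketch of the 1..20 prefetch bound. Names are illustrative;
# the real validation lives in the Orchesty UI and API.
PREFETCH_MIN, PREFETCH_MAX = 1, 20

def validate_prefetch(value: int) -> int:
    """Reject out-of-range values outright (API-style behaviour)."""
    if not PREFETCH_MIN <= value <= PREFETCH_MAX:
        raise ValueError(
            f"prefetch must be in {PREFETCH_MIN}..{PREFETCH_MAX}, got {value}"
        )
    return value

def clamp_prefetch(value: int) -> int:
    """Silently clamp into range (UI-style behaviour)."""
    return max(PREFETCH_MIN, min(PREFETCH_MAX, value))
```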
Where prefetch applies #
Prefetch lives on worker-consuming node types only:
- Connector nodes
- Batch nodes
- Custom Action nodes
Trigger nodes (Start, Cron, Webhook) are not worker consumers in this sense and have no prefetch setting — see Building nodes: Event nodes for what configures their throughput instead.
How to set it #
Open the topology in the read-only editor and select the node. The prefetch value is shown inline in the node's selected-label header next to the Connector / Worker line:
Connector: hubspot.upsert-contact | Prefetch: 1 Settings
Click Settings on that line, or right-click the node and pick Prefetch settings, to open the modal:
Screenshot pending
Prefetch settings on a Connector node
Selected Connector node showing the inline label with `| Prefetch: N Settings`. Cursor hovering Settings.
target 1100 x 320
Screenshot pending
Prefetch settings modal
Numeric field 1-20, three buttons: Save & Republish, Save and continue editing, Cancel.
target 600 x 420
You have three actions:
- Save & Republish. Persists the new value and immediately republishes the topology. The bridge restarts with the new prefetch and the Bridge is out of sync banner clears. This works whether the topology is currently enabled or stopped — a disabled topology is rebuilt and (re)started with the new value.
- Save and continue editing. Persists the new value but skips republish. The database holds the new prefetch, but the running bridge keeps the old value until you republish. A yellow Bridge is out of sync banner appears on the topology page until you do.
- Cancel. Discards changes.
Screenshot pending
Bridge is out of sync banner
Topology page showing the yellow banner with a Republish now action.
target 1200 x 320
Why republish is required #
The bridge reads node configuration only on startup. Changing prefetch in the database while the bridge is running has no effect on the in-flight consumer; the bridge keeps using the value it loaded when it last started. Republishing tears down the bridge, regenerates its config, and starts it again with the new prefetch in place. The bridgeOutOfSync flag on the topology document tracks this divergence so the UI can warn you, and the Republish now action clears it.
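The divergence between the stored value and the running bridge can be modelled as two copies of the setting plus a dirty flag. A toy Python sketch of that lifecycle — class and field names are illustrative, not Orchesty's data model:

```python
from dataclasses import dataclass

# Toy model of the divergence described above: the bridge reads config
# only at startup, so a database write leaves bridgeOutOfSync set
# until a republish restarts the bridge with the new value.
@dataclass
class Topology:
    db_prefetch: int = 1          # value persisted in the database
    bridge_prefetch: int = 1      # value the running bridge loaded at startup
    bridge_out_of_sync: bool = False

    def save_prefetch(self, value: int) -> None:
        """'Save and continue editing': persist only, no restart."""
        self.db_prefetch = value
        self.bridge_out_of_sync = self.db_prefetch != self.bridge_prefetch

    def republish(self) -> None:
        """Tear down the bridge and restart it from regenerated config."""
        self.bridge_prefetch = self.db_prefetch
        self.bridge_out_of_sync = False
```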
Sizing guidance #
There is no single right value, but most cases fall into four buckets:
| Workload shape | Suggested prefetch | Rationale |
|---|---|---|
| Short, stateless steps — JSON shaping, light HTTP calls, ID lookups | 5–10 | Cheap to parallelise, gains scale almost linearly with prefetch. |
| I/O-heavy or memory-heavy — large payloads, multi-MB documents, heavy CPU work | 1–3 | Each in-flight message holds memory; concurrency helps less and costs more. |
| Order-sensitive consumers — ID mapping, sequential state machines, "events for the same customer" | 1 | Ordering must be preserved; throughput is bought elsewhere (e.g. by partitioning upstream). |
| Behind a Limiter | As high as fits the worker | The Limiter paces actual outbound calls, so prefetch can be aggressive without hammering the upstream — see below. |
Two practical signals from the dashboards:
- The node's queue is consistently long and the worker CPU is not pegged → raise prefetch.
- The worker's memory climbs with prefetch and you start seeing OOMs or slow GC → lower it.
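The two dashboard signals above can be turned into a simple tuning loop. A sketch of that heuristic — the thresholds are assumptions for illustration, not Orchesty defaults:

```python
def suggest_prefetch(queue_depth: int, cpu_util: float,
                     mem_pressure: bool, current: int) -> int:
    """Illustrative tuning heuristic from the two dashboard signals.
    Thresholds (100 messages, 70% CPU) are assumptions, not defaults.
    Returns a new prefetch kept inside the 1..20 bound."""
    if mem_pressure:
        # OOMs or slow GC: each in-flight message holds memory, back off.
        return max(1, current - 1)
    if queue_depth > 100 and cpu_util < 0.7:
        # Long queue but idle CPU: the node can absorb more parallelism.
        return min(20, current + 1)
    return current
```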
Combining with the Limiter #
Prefetch and rate limiting are orthogonal. Prefetch controls how many messages a node accepts in parallel; the Limiter controls the rate at which calls leave the node toward an external API. Putting a Limiter behind a connector means you can keep its prefetch aggressive — even on a 100k-row import — and the Limiter will still pace outbound traffic to the quota. Without the Limiter, raising prefetch is also raising the rate at which you hit the upstream, which is usually a 429 waiting to happen.
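The orthogonality is easiest to see on a simulated clock: however many messages prefetch holds in flight at once, a rate limiter spaces the outbound calls evenly. A sketch of the pacing idea, not the Limiter's actual algorithm:

```python
def outbound_times(n_ready: int, rate_per_sec: float) -> list[float]:
    """Departure times for n_ready concurrently held messages (one
    prefetch window's worth) passing through a limiter at the given
    rate. Simulated clock; illustrates pacing only."""
    interval = 1.0 / rate_per_sec
    return [i * interval for i in range(n_ready)]

# With prefetch=10 and a 2 req/s limiter, all ten messages are in
# flight at t=0, but the last outbound call leaves at t=4.5s —
# the quota is respected no matter how aggressive prefetch is.
```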
Ordering trade-off #
prefetch > 1 lets the bridge dispatch the next message before the previous one is acknowledged, so processing — and acknowledgement — can finish out of order. Most integrations don't care: each record is independent and the destination accepts them in any order. A few cases do care:
- ID mapping and idempotency keyed by entity — see Patterns: ID mapping.
- Sequential state machines — order in which you apply state transitions for the same entity matters (orders, inventory, financial postings).
- Source-of-truth event streams — events for the same customer/order must be applied in the order they happened.
The mechanical fix is prefetch=1 on the affected node. If you need throughput too, partition upstream so that order matters only inside each partition (one queue per customer / per shard) and keep prefetch low only on the order-sensitive consumer.
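Partitioning upstream means routing every message for the same entity to the same queue, so ordering only has to hold inside a partition. A minimal sketch of a stable key-to-partition mapping — the hash choice is an assumption for illustration:

```python
import hashlib

def partition_for(key: str, n_partitions: int) -> int:
    """Stable key -> partition mapping: all events for the same
    customer/order land in one partition, so only that consumer needs
    prefetch=1 while the rest run with higher prefetch. The SHA-256
    hash here is an illustrative choice, not Orchesty's."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_partitions
```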
API surface (community) #
For scripted changes, the editor calls these endpoints. Hand-crafted automation (CI, infra-as-code) can use them too:
| Method | Route | Notes |
|---|---|---|
| PATCH | `/api/nodes/{id}` | Body `{ "prefetch": <int 1..20> }`. Allowed node types: `connector`, `batch`, `batch_connector`, `custom`. Sets the parent topology's `bridgeOutOfSync = true`. |
| POST | `/api/topologies/{id}/republish` | Stops the bridge, regenerates from the topology generator, starts it again, clears `bridgeOutOfSync`. Returns 409 if the topology has never been published (still a draft). |
| POST | `/api/topologies/{id}/unpublish` | Stops the bridge and flips visibility back to draft. Idempotent. |
Prefetch is part of the standard node payload (Node::toArray() → prefetch: int), so any tool that already calls GET /api/topologies/{id}/nodes already sees it.
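For scripted use, the two-step change is a PATCH followed by a republish. A sketch that assembles the requests from the table above without sending them — authentication and the HTTP client itself are left out and are your responsibility:

```python
import json

def build_prefetch_patch(node_id: str, prefetch: int) -> tuple[str, str, str]:
    """Assemble the PATCH from the table above: (method, route, body).
    Sending it (and auth headers) is left to your HTTP client."""
    if not 1 <= prefetch <= 20:
        raise ValueError("prefetch must be 1..20")
    return "PATCH", f"/api/nodes/{node_id}", json.dumps({"prefetch": prefetch})

def build_republish(topology_id: str) -> tuple[str, str]:
    """Assemble the follow-up republish call that clears bridgeOutOfSync."""
    return "POST", f"/api/topologies/{topology_id}/republish"
```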
Legacy BPMN editor #
The older BPMN designer in app-ui/ writes prefetch into the schema as @pipes:rabbitPrefetch. Schema saves continue to work and update the database value as before.
The Rete editor in app-ui-new defaults the JSON node payload to prefetch=1. To prevent it from clobbering an explicitly-set value when somebody saves the schema, the backend's setNodeAttributes keeps the existing database value whenever the incoming DTO carries the default (prefetch <= 1) and the database already has a non-default value (> 1). Explicit values from the BPMN designer (or the PATCH /api/nodes/{id} API) still win.
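The guard described above reduces to one comparison. A Python sketch of the backend's keep-existing-value rule — the real logic lives in PHP's `setNodeAttributes`, so treat this as a restatement, not the implementation:

```python
def merged_prefetch(incoming: int, existing: int) -> int:
    """Restatement of the guard described above: a schema save carrying
    the Rete default (<= 1) must not clobber an explicitly set database
    value (> 1); any explicit incoming value (> 1) still wins."""
    if incoming <= 1 and existing > 1:
        return existing
    return incoming
```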
If you operate both editors in parallel, the canonical path forward is to set prefetch through the read-only editor's modal or the API — they always force a republish acknowledgement and won't be overridden by a subsequent schema save.