Batch

Most data sources don't hand you the whole collection at once. They paginate: give me page 1, give me page 2 with this cursor, repeat until empty. Modelling that loop in a workflow tool usually means a small topology of its own — a "fetch page" node, a state node to remember the cursor, a conditional that loops back, a guard against duplicates if the run dies mid-iteration.

Orchesty collapses all of that into a single node type: the Batch.

What a Batch is #

A Batch is a connector that the platform runs in a loop on your behalf. It calls the source, hands back the items it read, optionally returns a cursor for the next page, and the platform re-invokes it until the cursor is no longer set.
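That contract can be sketched as a small handler. The names here (`BatchResult`, `fetchUsersPage`) are illustrative, not the Orchesty SDK's actual API; the simulated source stands in for a real paginated endpoint.

```typescript
// Hypothetical batch-handler contract: one invocation reads one page and
// reports whether there is another. Names are illustrative, not SDK API.
interface BatchResult<T> {
  items: T[];      // records read from this page
  cursor?: string; // set => the platform will invoke the handler again with it
}

// Simulated paginated source: three pages, keyed by cursor.
const source: Record<string, { users: string[]; next?: string }> = {
  "":   { users: ["ada", "bob"], next: "c2" },
  "c2": { users: ["cat", "dan"], next: "c3" },
  "c3": { users: ["eve"] },
};

// One invocation = one page. The platform loops; the handler does not.
function fetchUsersPage(cursor = ""): BatchResult<string> {
  const page = source[cursor];
  return { items: page.users, cursor: page.next };
}
```

Note the handler holds no loop state of its own: everything it needs to resume arrives in the cursor argument.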

From the topology's point of view, a Batch is one box on the canvas. Behind that box, the platform takes care of:

  • Calling the node again for every page until iteration ends.
  • Persisting the cursor between calls, so a crash mid-iteration resumes from the right page instead of starting over or duplicating items.
  • Emitting items into the next queue as soon as a page is read, so downstream nodes start processing while pagination is still in progress.
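The loop behind the box looks roughly like this. This is a sketch of the idea, not platform code; the `Map` stands in for durable cursor storage, and all names are assumptions.

```typescript
// Illustrative sketch of the platform-side loop: re-invoke the handler with
// the persisted cursor until it stops returning one.
type BatchResult<T> = { items: T[]; cursor?: string };

const cursorStore = new Map<string, string>(); // stands in for durable storage

function runBatch<T>(
  nodeId: string,
  handler: (cursor?: string) => BatchResult<T>,
  emit: (item: T) => void,
): void {
  let cursor = cursorStore.get(nodeId); // resume from the persisted page
  do {
    const page = handler(cursor);
    page.items.forEach(emit);           // downstream starts per page, not per run
    cursor = page.cursor;
    if (cursor !== undefined) {
      cursorStore.set(nodeId, cursor);  // persist before the next call
    } else {
      cursorStore.delete(nodeId);       // iteration finished cleanly
    }
  } while (cursor !== undefined);
}
```

Because the cursor is persisted before each re-invocation, a crash at any point restarts from the last stored page rather than from page one.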

Everything downstream of a Batch is plain stream processing again — see Topologies.

How items leave a Batch #

A Batch lets you decide how records are atomized when they leave the node. The conceptual choices are:

  • One message per record — maximum parallelism downstream; each item travels through the stream independently.
  • Chunks of N — useful when the destination has a bulk endpoint and you want fewer, larger messages.
  • One message per page — when downstream needs to see the page as a whole.
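The three choices above amount to reshaping a page of records into a list of outgoing messages. A minimal sketch, with illustrative function names rather than SDK options:

```typescript
// Three ways to atomize a page into messages (each inner array = one message).
function perRecord<T>(page: T[]): T[][] {
  return page.map((item) => [item]); // one message per record
}

function chunksOfN<T>(page: T[], n: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < page.length; i += n) {
    out.push(page.slice(i, i + n));  // fewer, larger messages for bulk endpoints
  }
  return out;
}

function perPage<T>(page: T[]): T[][] {
  return [page];                     // downstream sees the page as a whole
}
```

A page of five records thus becomes five messages, three messages (for chunks of two), or one message respectively.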

There is also a mode that holds items until the whole iteration finishes, for cases where downstream needs to see all pages at once (aggregation, sorting, totals). For SDK signatures and the full set of options see Patterns: Pagination and batch.

© 2025 Orchesty Solutions. All rights reserved.