Integration monitoring

The process record tells you what happened to one run. Integration monitoring tells you what is happening across all your processes right now: how many are running, how many are failing, how fast each topology is moving, where the slow nodes are.

For background, see Concepts: Processes and Messages.

What's in the dashboards #

The Admin UI ships with built-in dashboards organized in three layers:

Instance overview #

The "is everything fine?" view:

  • Processes started in the last hour / day / 7 days.
  • Processes currently in flight.
  • Failure rate, broken down by topology.
  • Trash inbox size, by topology.
  • Notifications fired in the last day.

If something looks abnormal, this is where you spot it first.
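The failure-rate widget can be reproduced from raw process records. A minimal sketch, assuming each record reduces to a (topology, succeeded) pair; the field names and shape are illustrative, not the platform's actual schema:

```python
from collections import Counter

def failure_rate_by_topology(runs):
    """Compute failure rate per topology.

    runs: iterable of (topology, succeeded: bool) pairs -- a stand-in
    for process records; this is not the platform's record format.
    """
    total, failed = Counter(), Counter()
    for topology, ok in runs:
        total[topology] += 1
        if not ok:
            failed[topology] += 1
    # Rate in [0, 1] per topology; topologies with no runs are absent.
    return {t: failed[t] / total[t] for t in total}

rates = failure_rate_by_topology([
    ("orders", True), ("orders", False),
    ("billing", True), ("billing", True),
])
```

Sorting the result by rate descending answers "which topology is responsible for most of the failures?" without opening each process record.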

Screenshot pending: Instance overview dashboard (top of page; process counters, failure-rate sparkline, Trash widget).

Per-topology dashboard #

The "where is this topology spending time?" view. For the selected topology:

  • Per-node throughput (messages / minute).
  • Per-node latency (p50 / p95 / p99).
  • Per-node error rate.
  • Queue depth between nodes (where backpressure is building up).
  • Last 24h and last 7 days side-by-side.

This is the dashboard you open when a topology feels slow or when a node is suspect.
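The p50/p95/p99 figures above are plain percentiles over a window of latency samples. A minimal sketch using linear interpolation between sorted values (the dashboards' exact interpolation method may differ); the sample latencies are made up:

```python
import math

def percentile(samples, p):
    """Return the p-th percentile (0-100) of latency samples,
    linearly interpolating between the two nearest sorted values."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Fractional rank into the sorted list.
    k = (len(ordered) - 1) * (p / 100)
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return ordered[lo]
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)

# Hypothetical node latencies (ms) over the last window.
latencies_ms = [12, 15, 14, 200, 13, 16, 15, 14, 950, 13]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Note how two slow outliers barely move the p50 but dominate the p95/p99: that gap is exactly what makes the tail percentiles the right place to look when a topology "feels slow" despite a healthy median.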

Per-process detail #

The process detail page in the Admin UI, drillable from the per-topology dashboard by clicking any failed or slow process. It shows which nodes were visited, which succeeded, which failed, and basic per-node metrics. For richer per-step payload audit see Trace auditing (Pro & Enterprise).

What you don't have to instrument #

You do not need to add custom metrics to a worker for any of the above. The platform measures node-level latency, throughput, and result codes from the moment a node starts processing a message; the dashboards just visualize what is already collected.

You do still want logs for "what happened inside one call". See Logging.

Health vs alerting #

Dashboards are for looking. Notifications are for being told. The two should not duplicate each other:

  • Use Notifications for things that need a human to act now: failed messages, instance limits.
  • Use dashboards for trend questions: "are we slower this week?", "which topology is responsible for most of the failures?", "is the Trash inbox growing or draining?".
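The "growing or draining?" question is a trend over queue-depth samples, which you can also answer mechanically. A hypothetical helper (not part of the platform) that classifies a series by the sign of a least-squares slope; the thresholds are arbitrary and per-deployment:

```python
def trend(depths):
    """Classify queue-depth samples as 'growing', 'draining', or 'flat'
    from the sign of a least-squares slope. Illustrative only."""
    n = len(depths)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(depths) / n
    # Least-squares slope of depth over sample index.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, depths))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    if slope > 0.5:       # thresholds are arbitrary; tune per deployment
        return "growing"
    if slope < -0.5:
        return "draining"
    return "flat"
```

A steadily growing Trash inbox is a trend question (dashboard territory); the individual failed messages that land in it are what Notifications are for.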

Exporting metrics #

If you already run Grafana, Datadog, or a similar stack, the platform exposes:

  • A Prometheus-compatible metrics endpoint for instance-level metrics (queue depths, broker health, worker process counts).
  • A per-topology metrics export you can scrape on a schedule.

Use these to bring Orchesty health into the same dashboards you already watch for the rest of your infrastructure.
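If you scrape the Prometheus-compatible endpoint yourself, the payload is the standard Prometheus text exposition format. A minimal parser for a simple subset of that format (no escaped quotes or commas inside label values); the metric names in the sample are made up for illustration, not the platform's actual names:

```python
def parse_prometheus_text(text):
    """Parse a subset of the Prometheus text exposition format into
    {(metric_name, sorted_label_pairs): value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name_part, _, value = line.rpartition(" ")
        if "{" in name_part:
            name, labels_raw = name_part.split("{", 1)
            pairs = [kv.split("=", 1) for kv in labels_raw.rstrip("}").split(",")]
            labels = tuple((k, v.strip('"')) for k, v in sorted(pairs))
        else:
            name, labels = name_part, ()
        metrics[(name, labels)] = float(value)
    return metrics

# Hypothetical scrape output -- metric names are illustrative.
sample = """\
# HELP queue_depth Messages waiting between nodes.
# TYPE queue_depth gauge
queue_depth{topology="orders",node="enrich"} 42
worker_processes 8
"""
parsed = parse_prometheus_text(sample)
```

In practice you would point your existing Prometheus scraper (or Datadog's OpenMetrics check) at the endpoint instead of parsing by hand; the sketch just shows what the wire format contains.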

See also #

© 2025 Orchesty Solutions. All rights reserved.