Integration monitoring
The process record tells you what happened to one run. Integration monitoring tells you what is happening across all your processes right now: how many are running, how many are failing, how fast each topology is moving, where the slow nodes are.
For background, see Concepts: Processes and Messages.
What's in the dashboards #
The Admin UI ships with built-in dashboards organized in three layers:
Instance overview #
The "is everything fine?" view:
- Processes started in the last hour / day / 7 days.
- Processes currently in flight.
- Failure rate, broken down by topology.
- Trash inbox size, by topology.
- Notifications fired in the last day.
If something looks abnormal, this is where you spot it first.
Screenshot pending: Instance overview dashboard (top of page; process counters, failure-rate sparkline, Trash inbox widget; target 1280 × 620).
Per-topology dashboard #
The "where is this topology spending time?" view. For the selected topology:
- Per-node throughput (messages / minute).
- Per-node latency (p50 / p95 / p99).
- Per-node error rate.
- Queue depth between nodes (where backpressure is building up).
- Last 24h and last 7 days side-by-side.
This is the dashboard you open when a topology feels slow or when a node is suspect.
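As a reminder of what the latency columns mean: p50/p95/p99 are percentiles of the per-node latency distribution, so p95 = 240 ms reads as "95% of calls finished in 240 ms or less". The platform computes these for you; the sketch below is illustrative only, using the common nearest-rank definition on a made-up sample:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value at 1-based rank ceil(pct/100 * n)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical last 10 call latencies (ms) for one node.
latencies_ms = [12, 14, 15, 15, 16, 18, 21, 35, 90, 240]

p50 = percentile(latencies_ms, 50)  # 16  -> the "typical" call
p95 = percentile(latencies_ms, 95)  # 240 -> the slow tail
p99 = percentile(latencies_ms, 99)  # 240
```

Note how a single 240 ms outlier leaves p50 untouched but dominates p95/p99; that is why a rising p95 with a flat p50 usually points at a tail problem (retries, one slow downstream), not a uniformly slow node.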
Per-process detail #
The "what happened in this run?" view. Open the process detail page in the Admin UI from the per-topology dashboard by clicking any failed or slow process. It shows which nodes were visited, which succeeded, which failed, and basic per-node metrics. For a richer per-step payload audit, see Trace auditing (Pro & Enterprise).
What you don't have to instrument #
You do not need to add custom metrics to a worker for any of the above. The platform measures node-level latency, throughput, and result codes from the moment a node starts processing a message; the dashboards just visualize what is already collected.
You still want logs for the "what happened inside one call?" question. See Logging.
Health vs alerting #
Dashboards are for looking. Notifications are for being told. The two should not duplicate each other:
- Use Notifications for things that need a human to act now: failed messages, instance limits being reached.
- Use dashboards for trend questions: "are we slower this week?", "which topology is responsible for most of the failures?", "is the Trash inbox growing or draining?".
Exporting metrics #
If you already run Grafana, Datadog, or a similar stack, the platform exposes:
- A Prometheus-compatible metrics endpoint for instance-level metrics (queue depths, broker health, worker process counts).
- A per-topology metrics export you can scrape on a schedule.
Use these to bring Orchesty health into the same dashboards you already watch for the rest of your infrastructure.
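A Prometheus-compatible endpoint serves plain text in the standard exposition format, so any scraper can consume it. As a minimal sketch of what that looks like, here is a parser for the simple `name{labels} value` lines (the endpoint URL and metric names below are hypothetical, not the platform's actual ones; a real setup would just point Prometheus or Datadog at the endpoint instead):

```python
import urllib.request

METRICS_URL = "http://localhost:8080/metrics"  # hypothetical; use your instance's endpoint

def parse_prometheus_text(body: str) -> dict:
    """Parse Prometheus text exposition lines into {metric_with_labels: float}.

    Simplified: ignores HELP/TYPE comments and optional timestamps.
    """
    metrics = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")  # split on the last space
        try:
            metrics[name] = float(value)
        except ValueError:
            pass  # skip lines that are not "name value"
    return metrics

def fetch_metrics(url: str = METRICS_URL) -> dict:
    """Scrape the endpoint and return parsed metrics."""
    with urllib.request.urlopen(url) as resp:
        return parse_prometheus_text(resp.read().decode())

# Example exposition body with made-up metric names:
sample = """\
# HELP queue_depth Messages waiting between nodes
# TYPE queue_depth gauge
queue_depth{topology="billing",node="mapper"} 42
worker_processes 8
"""
parsed = parse_prometheus_text(sample)
```

In practice you would not hand-roll this; a `scrape_configs` entry in Prometheus (or the Datadog OpenMetrics check) pointed at the endpoint does the same job continuously.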