ID mapping
Two systems rarely share the same identifiers for the same entity. Hubspot says contact 123, your warehouse says customer WHS-9981, the e-shop says user 42. The moment you sync between them you need a way to remember "Hubspot's 123 is the same person as warehouse's WHS-9981". That mapping is what this pattern is about.
When you need it #
You need an ID mapping table whenever:
- You sync the same entity in two directions and need to avoid creating duplicates on the round trip.
- The remote system assigns its own id on creation, and you need to remember it for future updates.
- You sync between more than two systems and need a single canonical id that all sides agree on.
You don't need it for one-way fire-and-forget feeds where you never read back the receiving side.
Two shapes #
Pairwise mapping #
The simplest shape. One row per (source system, source id, target system, target id). Easy to query in either direction, easy to grow to a third system later, easy to inspect in a database.
| source_system | source_id | target_system | target_id |
|---|---|---|---|
| hubspot | 123 | warehouse | WHS-9981 |
| eshop | 42 | warehouse | WHS-9981 |
To answer "what is the warehouse id for hubspot 123?" you read the row directly; the reverse direction ("what is the hubspot id for warehouse WHS-9981?") reads the same row from the target side. Only cross-system questions like "what is the eshop id for hubspot 123?" require joining two rows through the shared warehouse id.
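The lookups above can be sketched in a few lines. This is an in-memory illustration, not a real database layer; the row shape and function names are assumptions made for the example:

```typescript
interface MappingRow {
  sourceSystem: string;
  sourceId: string;
  targetSystem: string;
  targetId: string;
}

// The pairwise table from above, as plain data.
const rows: MappingRow[] = [
  { sourceSystem: 'hubspot', sourceId: '123', targetSystem: 'warehouse', targetId: 'WHS-9981' },
  { sourceSystem: 'eshop', sourceId: '42', targetSystem: 'warehouse', targetId: 'WHS-9981' },
];

// Direct lookup: one row, read forward.
function lookup(system: string, id: string, target: string): string | undefined {
  return rows.find(
    (r) => r.sourceSystem === system && r.sourceId === id && r.targetSystem === target,
  )?.targetId;
}

// Cross-system lookup: join two rows through the shared warehouse id.
function lookupViaWarehouse(system: string, id: string, target: string): string | undefined {
  const warehouseId = lookup(system, id, 'warehouse');
  return rows.find(
    (r) => r.targetSystem === 'warehouse' && r.targetId === warehouseId && r.sourceSystem === target,
  )?.sourceId;
}
```

In a real store both lookups become indexed queries; the point is that the pairwise shape keeps each of them to one or two row reads.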
Canonical id #
A second shape adds a stable internal id and maps every external system to it.
| canonical_id | system | external_id |
|---|---|---|
| C-000123 | hubspot | 123 |
| C-000123 | warehouse | WHS-9981 |
| C-000123 | eshop | 42 |
Useful when you sync many systems and want a stable identity that does not depend on any one of them. More work to maintain.
Where to store it #
Store the table where it survives a worker restart, not in worker memory. Use your own database — a small id_mapping table (or collection) read and written from a custom node. Any engine you already operate (PostgreSQL, MySQL, MongoDB) is fine; the access pattern is a primary-key lookup and an upsert.
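The access pattern is small enough to capture in a repository interface. Here it is backed by an in-memory Map purely for illustration; in production the Map is replaced by your database table, with the composite key as the primary key:

```typescript
// Sketch of the mapping store's access pattern: primary-key lookup and upsert.
// The class name and key scheme are assumptions for this example.
class IdMappingRepo {
  // key: source_system|source_id|target_system — in a real table, the primary key.
  private rows = new Map<string, string>();

  private key(sourceSystem: string, sourceId: string, targetSystem: string): string {
    return `${sourceSystem}|${sourceId}|${targetSystem}`;
  }

  public find(sourceSystem: string, sourceId: string, targetSystem: string): string | undefined {
    return this.rows.get(this.key(sourceSystem, sourceId, targetSystem));
  }

  public upsert(sourceSystem: string, sourceId: string, targetSystem: string, targetId: string): void {
    this.rows.set(this.key(sourceSystem, sourceId, targetSystem), targetId);
  }
}
```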
Application install settings is not a mapping store. It is for configuration (endpoints, credentials, feature flags, fixed reference values) and putting synced runtime data there couples it to the install lifecycle and turns reads into expensive settings fetches. Treat it as an anti-pattern even for small mappings.
Defence in depth: write the inverse id back into the systems #
Whenever an integrated system offers a custom field, an external reference attribute, or any free-form metadata slot on the entity, also write the foreign id there — Hubspot contacts get a warehouse_id custom property, warehouse records get a hubspot_id field, e-shop customers get both. The mapping table stays the source of truth; these inline ids are a backup that lets you reconstruct the mapping by walking the entities on both sides if the table is ever lost (corrupted backup, accidental truncate, tenant migration gone wrong). Without this, recovery falls back to fuzzy matching on business keys, which is slow and produces duplicates. Do it where the systems allow it.
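Recovery then reduces to walking one side's entities and reading the inline ids back out. A sketch, assuming Hubspot contacts carry the warehouse_id custom property described above (the entity shape is an assumption of this example):

```typescript
// Rebuild (hubspot_id, warehouse_id) pairs from the inline backup fields.
interface HubspotContact {
  id: string;
  properties: { warehouse_id?: string };
}

function reconstructMapping(contacts: HubspotContact[]): Array<[string, string]> {
  return contacts
    .filter((c) => c.properties.warehouse_id !== undefined)
    .map((c) => [c.id, c.properties.warehouse_id as string]);
}
```

Entities without the inline id (created before the field existed, or in systems that offer no such slot) still need fuzzy matching; the backup shrinks that set rather than eliminating it.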
Where in the topology to use it #
Two nodes per direction:
- Resolve. A custom node early in the topology takes the source id and looks up the target id. If found, the message proceeds. If not found, the message branches to the "create" path.
- Persist. When you create a record on the other side, a custom node writes the new pair into the mapping table before downstream consumers read it.
```typescript
// resolve node
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
    const data = dto.getJsonData() as { hubspotId: string };
    const warehouseId = await this.idMappingRepo.find('hubspot', data.hubspotId, 'warehouse');
    if (warehouseId) {
        // Mapping exists: enrich the message and continue downstream.
        dto.setJsonData({ ...data, warehouseId });
        return dto;
    }
    // No mapping yet: stop this branch; the "create" path handles the miss.
    dto.setStopProcess(ResultCode.DO_NOT_CONTINUE, 'no-mapping');
    return dto;
}
```
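The persist side is a single write issued as soon as the remote system returns its new id. A framework-agnostic sketch (the `MappingStore` interface and function name are illustrative stand-ins for the repository used by the node):

```typescript
// The only operation the persist node needs from the mapping table.
interface MappingStore {
  upsert(sourceSystem: string, sourceId: string, targetSystem: string, targetId: string): void;
}

// Called right after the warehouse accepts the create and returns its own id,
// and before any downstream consumer can ask for the mapping.
function persistNewPair(store: MappingStore, hubspotId: string, warehouseId: string): void {
  store.upsert('hubspot', hubspotId, 'warehouse', warehouseId);
}
```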
Operational notes #
- Conflicts. Two parallel processes can race to create the same mapping. Use a unique constraint on (source_system, source_id, target_system) and treat duplicate-key errors as "already mapped, fine".
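The "duplicate key means already mapped, fine" convention looks like this in code. `DuplicateKeyError` stands in for whatever your driver throws on a violated unique constraint, and the Set stands in for the constrained table:

```typescript
// Illustrative stand-in for the driver's duplicate-key error.
class DuplicateKeyError extends Error {}

// Simulates an INSERT against a table with a unique constraint on
// (source_system, source_id, target_system).
function insertMapping(existing: Set<string>, s: string, id: string, t: string): void {
  const key = `${s}|${id}|${t}`;
  if (existing.has(key)) {
    throw new DuplicateKeyError(key);
  }
  existing.add(key);
}

// Safe to call from parallel processes: the loser of the race is a no-op.
function insertMappingIdempotent(existing: Set<string>, s: string, id: string, t: string): void {
  try {
    insertMapping(existing, s, id, t);
  } catch (e) {
    if (!(e instanceof DuplicateKeyError)) {
      throw e; // anything else is a real failure
    }
    // already mapped by a parallel process — fine
  }
}
```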
- Backfilling. When you add a new system to an existing landscape, run a one-off topology that reads existing entities from both sides and writes the initial mapping rows.
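The matching step of such a backfill typically pairs entities on a business key. A sketch, assuming email is a usable key (that choice, and the entity shape, are assumptions of this example — real backfills need whatever key your domain actually guarantees unique):

```typescript
interface Entity {
  id: string;
  email: string;
}

// Pair existing entities from both sides by email and emit the initial
// [hubspot_id, warehouse_id] mapping rows. Unmatched entities are skipped
// and left for manual review.
function backfill(hubspot: Entity[], warehouse: Entity[]): Array<[string, string]> {
  const byEmail = new Map(warehouse.map((w) => [w.email, w.id]));
  const pairs: Array<[string, string]> = [];
  for (const h of hubspot) {
    const warehouseId = byEmail.get(h.email);
    if (warehouseId !== undefined) {
      pairs.push([h.id, warehouseId]);
    }
  }
  return pairs;
}
```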
- Auditability. Mapping rows are forensic gold when something looks wrong in a synced record. Keep created_at and created_by_topology columns.