ID mapping between systems
Keep stable references between systems that don't share identifiers. The pattern that makes two-way sync possible without creating duplicates.
Two systems rarely agree on identifiers for the same thing. HubSpot calls a contact 123, your warehouse calls the same person WHS-9981, your e-shop calls them user 42. The first time you sync between them everything looks fine. The second time, without an explicit mapping, you create duplicates: a brand-new contact in HubSpot for the customer who already exists, a second warehouse record with a slightly different name, and a support ticket the next morning.
ID mapping is the small, boring table that prevents all of that. It is also the pattern that quietly enables every two-way and multi-system synchronization on the platform.
When you need it #
You need an ID mapping table whenever:
- You sync the same entity in two directions and need to avoid creating duplicates on the round trip.
- The remote system assigns its own id on creation, and you need to remember that id for future updates.
- You sync between more than two systems and want a single canonical identifier that all sides agree on.
You don't need it for one-way fire-and-forget feeds where the receiving side is a write-only sink (a data lake, an analytics event stream) and you never read it back.
Two shapes, two trade-offs #
Pairwise mapping #
The simplest shape is one row per (source system, source id, target system, target id). Easy to query in either direction, easy to extend with a third system later, easy to inspect by hand in a database.
| source_system | source_id | target_system | target_id |
|---|---|---|---|
| hubspot | 123 | warehouse | WHS-9981 |
| eshop | 42 | warehouse | WHS-9981 |
Answering "what is the warehouse id for HubSpot 123?" is a single row read. Answering "what is the e-shop id for warehouse WHS-9981?" is a join across two rows. For two or three systems this is almost always the right shape.
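The two read directions can be sketched in a few lines. This is a minimal in-memory illustration, not the platform API — the `PairwiseMapping` class and method names are assumptions:

```typescript
interface MappingRow {
    sourceSystem: string;
    sourceId: string;
    targetSystem: string;
    targetId: string;
}

class PairwiseMapping {
    public constructor(private rows: MappingRow[]) {}

    // "what is the warehouse id for HubSpot 123?" — a single row read
    public find(sourceSystem: string, sourceId: string, targetSystem: string): string | undefined {
        return this.rows.find(
            (r) => r.sourceSystem === sourceSystem && r.sourceId === sourceId && r.targetSystem === targetSystem,
        )?.targetId;
    }

    // "what is the e-shop id for warehouse WHS-9981?" — the reverse direction,
    // read through the target columns
    public findViaTarget(targetSystem: string, targetId: string, sourceSystem: string): string | undefined {
        return this.rows.find(
            (r) => r.sourceSystem === sourceSystem && r.targetSystem === targetSystem && r.targetId === targetId,
        )?.sourceId;
    }
}

const mapping = new PairwiseMapping([
    { sourceSystem: 'hubspot', sourceId: '123', targetSystem: 'warehouse', targetId: 'WHS-9981' },
    { sourceSystem: 'eshop', sourceId: '42', targetSystem: 'warehouse', targetId: 'WHS-9981' },
]);

mapping.find('hubspot', '123', 'warehouse'); // 'WHS-9981'
mapping.findViaTarget('warehouse', 'WHS-9981', 'eshop'); // '42'
```

In a real table both directions are index-backed lookups; the point is that the query shape differs depending on which side of the pair you start from.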
Canonical id #
A second shape introduces a stable internal id and maps every external system to it.
| canonical_id | system | external_id |
|---|---|---|
| C-000123 | hubspot | 123 |
| C-000123 | warehouse | WHS-9981 |
| C-000123 | eshop | 42 |
This becomes attractive once you sync four or more systems and want an identity that does not depend on any one of them. It is more work to maintain (you have to mint and hand out the canonical id) and the read pattern is always two-step, but it scales cleanly.
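The always-two-step read looks like this. A hedged in-memory sketch — the row shape and `resolve` helper are illustrative, not a platform function:

```typescript
interface CanonicalRow {
    canonicalId: string;
    system: string;
    externalId: string;
}

const rows: CanonicalRow[] = [
    { canonicalId: 'C-000123', system: 'hubspot', externalId: '123' },
    { canonicalId: 'C-000123', system: 'warehouse', externalId: 'WHS-9981' },
    { canonicalId: 'C-000123', system: 'eshop', externalId: '42' },
];

function resolve(fromSystem: string, externalId: string, toSystem: string): string | undefined {
    // step 1: external id -> canonical id
    const canonical = rows.find((r) => r.system === fromSystem && r.externalId === externalId)?.canonicalId;
    if (!canonical) {
        return undefined;
    }

    // step 2: canonical id -> the target system's external id
    return rows.find((r) => r.canonicalId === canonical && r.system === toSystem)?.externalId;
}

resolve('hubspot', '123', 'eshop'); // '42'
```

Note that no system-to-system pair is special: retiring the warehouse means deleting its rows, and every other mapping survives untouched.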
**Pick the shape based on lifecycle, not size.** Pairwise mapping handles millions of rows fine. The reason to graduate to canonical ids is not row count but the lifecycle of the systems: when an external system is replaced, retired, or split in two, a canonical id lets you re-point one column without rewriting every other system's references.
Where to store it #
Store the table where it survives a worker restart. Worker memory is not durable, and an in-memory mapping disappears the moment a pod is rescheduled. The right place is your own database — a small id_mapping table (or collection) that a custom node reads from and writes to. Pick the engine you already operate (PostgreSQL, MySQL, MongoDB); the access pattern is a primary-key lookup and an upsert, so any of them is fine.
Defence in depth: store the inverse id in the systems too #
Whenever the integrated systems offer a custom field, an external reference attribute, or any free-form metadata slot on the entity, also write the foreign id there. HubSpot contacts get a warehouse_id custom property, warehouse records get a hubspot_id field, e-shop customers get both. The mapping table remains the primary source of truth — these inline ids are a backup.
Why bother:
- Disaster recovery. If you lose the mapping table (corrupted backup, accidental truncate, ransomware on the integration database), you can rebuild it by walking the entities on either side and reading the inverse id from each record. Without that, the only path back is fuzzy matching on business keys, which is slow, error-prone, and produces duplicates.
- Operator self-service. A support engineer looking at a HubSpot contact can immediately see the matching warehouse id without opening the integration database.
- Cross-team debugging. Teams that don't have access to the integration DB can still trace a problem across systems on their own.
This is not always possible — some systems have no extension fields, some are write-restricted, and some entities (line items, address sub-records) are too small to carry metadata. Do it where you can; the systems that allow it are the ones where outages tend to be most painful.
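The disaster-recovery path can be sketched as a rebuild function that walks one side's entities and reads the inverse id off each record. The entity shape and the `warehouse_id` property name are assumptions for illustration:

```typescript
interface HubspotContact {
    id: string;
    properties: { warehouse_id?: string };
}

interface MappingRow {
    sourceSystem: string;
    sourceId: string;
    targetSystem: string;
    targetId: string;
}

function rebuildFromInverseIds(contacts: HubspotContact[]): MappingRow[] {
    const rows: MappingRow[] = [];
    for (const c of contacts) {
        const warehouseId = c.properties.warehouse_id;
        if (warehouseId) {
            // the inline inverse id lets us reconstruct the pair directly
            rows.push({ sourceSystem: 'hubspot', sourceId: c.id, targetSystem: 'warehouse', targetId: warehouseId });
        }
        // contacts without the inverse id fall back to fuzzy business-key
        // matching and manual review — they are not guessed here
    }

    return rows;
}
```

Run this against a full export of each side and you have the mapping table back without a single fuzzy match for any record that carried its inverse id.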
Where in the topology to use it #
Each direction of sync needs two specific nodes:
- Resolve. Early in the topology, a custom node takes the source id and looks up the target id. If found, the message proceeds enriched with the target id. If not found, the message branches into the "create" path.
- Persist. When the create path successfully provisions a new record on the other side, a custom node writes the new pair into the mapping table before any downstream consumer reads it.
The order matters: persist before any downstream consumer runs. Otherwise the next message that arrives for the same entity will not see the mapping yet and will trigger a second create.
```typescript
// resolve node
public async processAction(dto: ProcessDto): Promise<ProcessDto> {
    const data = dto.getJsonData() as { hubspotId: string };

    // look up the warehouse id previously stored for this HubSpot contact
    const warehouseId = await this.idMappingRepo.find(
        'hubspot',
        data.hubspotId,
        'warehouse',
    );

    if (warehouseId) {
        // mapping found: enrich the message and continue down the update path
        dto.setJsonData({ ...data, warehouseId });

        return dto;
    }

    // no mapping yet: stop this branch; the "create" path provisions the record
    dto.setStopProcess(ResultCode.DO_NOT_CONTINUE, 'no-mapping');

    return dto;
}
```
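The persist step is a sketch of the same idea from the other end: write the new pair, and treat a duplicate-key error as "someone else already mapped it". The `IdMappingRepo` interface and `DuplicateKeyError` class are assumptions standing in for your database client, not the platform SDK:

```typescript
interface IdMappingRepo {
    insert(sourceSystem: string, sourceId: string, targetSystem: string, targetId: string): Promise<void>;
}

class DuplicateKeyError extends Error {}

async function persistMapping(
    repo: IdMappingRepo,
    sourceSystem: string,
    sourceId: string,
    targetSystem: string,
    targetId: string,
): Promise<void> {
    try {
        await repo.insert(sourceSystem, sourceId, targetSystem, targetId);
    } catch (e) {
        // two parallel creates can race; the unique constraint on
        // (source_system, source_id, target_system) turns the loser into a no-op
        if (e instanceof DuplicateKeyError) {
            return;
        }
        throw e;
    }
}

// in-memory stand-in for a table with a unique constraint, for illustration
class InMemoryRepo implements IdMappingRepo {
    public rows = new Set<string>();

    public async insert(ss: string, si: string, ts: string, ti: string): Promise<void> {
        const key = `${ss}|${si}|${ts}`; // the unique-constraint columns
        if (this.rows.has(key)) {
            throw new DuplicateKeyError(`duplicate key: ${key}, target ${ti}`);
        }
        this.rows.add(key);
    }
}
```

With a real database, the same effect usually comes from an insert that ignores conflicts on the unique key, so even two workers persisting the same pair simultaneously leave exactly one row behind.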
Operational notes #
- Conflicts. Two parallel processes can race to create the same mapping. Put a unique constraint on (source_system, source_id, target_system) and treat duplicate-key errors as "already mapped, fine, move on". That one constraint plus a few lines of defensive code prevents an entire class of duplicate-record incidents.
- Backfilling. When you add a new system to an existing landscape, run a one-off topology that reads existing entities from both sides, matches them on a business key (email, SKU, tax id), and writes the initial mapping rows. Plan this run before you flip the regular sync on.
- Auditability. Mapping rows are forensic gold when something looks wrong in a synced record. Keep `created_at` and `created_by_topology` columns from day one. They cost nothing to add and save hours during incident triage.
- Soft deletes. Don't delete mapping rows when the source record is removed. Mark them inactive instead. A mapping that disappears makes "why did the same customer get created twice last week" impossible to answer.
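The backfill step can be sketched as a pure matching function: read existing entities from both sides, join on a business key, and emit the initial mapping rows. Entity shapes and the email key are illustrative assumptions:

```typescript
interface HubspotContact {
    id: string;
    email: string;
}

interface WarehouseRecord {
    id: string;
    email: string;
}

interface MappingRow {
    sourceSystem: string;
    sourceId: string;
    targetSystem: string;
    targetId: string;
}

function backfill(contacts: HubspotContact[], records: WarehouseRecord[]): MappingRow[] {
    // index one side by the normalized business key
    const byEmail = new Map(records.map((r) => [r.email.toLowerCase(), r]));

    const rows: MappingRow[] = [];
    for (const c of contacts) {
        const match = byEmail.get(c.email.toLowerCase());
        if (match) {
            rows.push({ sourceSystem: 'hubspot', sourceId: c.id, targetSystem: 'warehouse', targetId: match.id });
        }
        // unmatched contacts are left for manual review rather than guessed
    }

    return rows;
}
```

Normalizing the key (lowercasing, trimming) matters more than it looks: most backfill "misses" turn out to be case or whitespace differences, not genuinely distinct records.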
Related #
- Patterns: ID mapping (docs) — the reference page that backs this guide.
- CRM ↔ ERP sync use case — round-trip mapping between a CRM and an ERP including the inverse-id defence-in-depth pattern.
- Eshop synchronization use case — ID mapping in a real multi-system scenario.
- Data comparator guide — the natural companion: resolve once, then skip if unchanged.