Entity Resolution
Entity resolution is the process of determining when different records refer to the same real-world thing, and merging them accordingly.
What is entity resolution?
Entity resolution is the process of working out when two or more records — often drawn from different sources — actually refer to the same real-world entity, and then reconciling them into a single, coherent representation. It is also known as record linkage, deduplication, or identity resolution.
The problem arises almost anywhere data is combined. The same customer might appear as “Jane A. Smith” in a sales system, “J. Smith” in a support tool, and “Jane Smith” in a billing database, with no shared identifier between them. Until those records are recognized as describing one person, counts, totals, and other insights derived from the combined data are liable to be wrong: here, a single customer would be counted three times.
How entity resolution works
A typical entity resolution pipeline proceeds in stages:
→ Blocking: the data is partitioned into candidate groups so that only plausibly matching records are compared, avoiding the cost of comparing every record against every other.
→ Matching (or scoring): candidate pairs are compared across their properties (names, addresses, dates, identifiers) and assigned a similarity score using rules, statistical models, or machine learning.
→ Clustering: records whose scores indicate a match are grouped together into clusters that each represent a single entity.
→ Merging: the records in a cluster are combined into one canonical entity, with conflicting values resolved according to data-quality and provenance rules.
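The four stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product implements resolution: the sample records, the surname-based blocking key, the 0.8 threshold, and the "longest name wins" merge rule are all hypothetical choices made for the example, and the scoring uses only the standard library's `difflib.SequenceMatcher`.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; fields and values are invented for illustration.
RECORDS = [
    {"id": "sales-1", "name": "Jane A. Smith", "email": "jane.smith@example.com"},
    {"id": "support-7", "name": "J. Smith", "email": "jane.smith@example.com"},
    {"id": "billing-3", "name": "Jane Smith", "email": None},
    {"id": "sales-2", "name": "Robert Jones", "email": "rob@example.com"},
]


def block_key(record):
    """Blocking: use the lowercased last token of the name (assumed surname)."""
    return record["name"].lower().split()[-1]


def score(a, b):
    """Matching: identical non-empty emails score 1.0, else name similarity in [0, 1]."""
    if a["email"] and a["email"] == b["email"]:
        return 1.0
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def resolve(records, threshold=0.8):
    # Blocking: only records sharing a block key are ever compared.
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)

    # Clustering: union-find links matched pairs transitively.
    parent = {record["id"]: record["id"] for record in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for block in blocks.values():
        for a, b in combinations(block, 2):
            if score(a, b) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(list)
    for record in records:
        clusters[find(record["id"])].append(record)

    # Merging: a deliberately naive rule (longest name, first non-empty email).
    return [
        {
            "ids": sorted(m["id"] for m in members),
            "name": max((m["name"] for m in members), key=len),
            "email": next((m["email"] for m in members if m["email"]), None),
        }
        for members in clusters.values()
    ]


for entity in resolve(RECORDS):
    print(entity)
```

With these sample records, the three Smith records end up in one cluster even though “J. Smith” and “Jane Smith” fall below the threshold when compared directly: both match “Jane A. Smith”, and the clustering step links them transitively. That transitivity is a design choice; it recovers matches that pairwise scoring misses, but it can also chain unrelated records together, so real pipelines typically apply stricter clustering rules.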
Why is entity resolution difficult?
→ Noisy and incomplete data: typos, abbreviations, missing fields, and inconsistent formatting obscure genuine matches.
→ No shared keys: sources rarely agree on a common identifier, so matches must be inferred from the data itself.
→ Scale: the number of possible record pairs grows quadratically, which is why blocking and efficient indexing matter.
→ Ambiguity: distinct entities can look almost identical (two different people with the same name), while a single entity can look like several (a company recorded under multiple legal names).
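To make the scale point concrete: n records yield n(n − 1)/2 unordered pairs, whereas blocking only compares pairs within each block. The short Python sketch below counts both. The function names and the evenly sized blocks are assumptions made for illustration; real block sizes are usually skewed, which reduces the saving.

```python
from collections import Counter
from math import comb


def all_pairs(n):
    """Number of unordered pairs among n records: n * (n - 1) / 2."""
    return comb(n, 2)


def blocked_pairs(block_keys):
    """Number of pairs compared when only records sharing a block key are paired."""
    return sum(comb(size, 2) for size in Counter(block_keys).values())


# Hypothetical setup: 10,000 records spread evenly across 100 blocks.
keys = [i % 100 for i in range(10_000)]
print(all_pairs(len(keys)))   # 49995000
print(blocked_pairs(keys))    # 495000
```

In this idealized case blocking cuts the number of comparisons by roughly a factor of 100, at the risk of missing true matches whose records land in different blocks.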
Getting entity resolution right is essential to building a clean knowledge graph; without it, the same thing ends up represented many times over, and relationships between entities are fragmented.
Entity resolution in HASH
Because HASH builds a connected, typed graph of entities, recognizing when incoming data describes something HASH already knows about is a core concern. HASH uses AI to help propose matches as data is ingested, and records the diff and merge history so that resolution decisions remain transparent and reversible. Combined with HASH’s bitemporal versioning, this means you can always see how, when, and why two records came to be treated as one.