Answered

Deterministic or probabilistic matching

Does xDM support deterministic or probabilistic matching?


Best answer

Semarchy xDM leverages both deterministic and probabilistic matching. 

Deterministic matching means "matching using rules," which can be confusing because all xDM matching is based on rules. However, xDM rules can contain probabilistic algorithms for fuzzy-matched entities. A rule may include exact matching (name1 = name2), fuzzy matching (phonetic(name1) = phonetic(name2)), or both.
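To illustrate a single rule that mixes both kinds of condition, here is a minimal Python sketch. It uses American Soundex as the "phonetic" function; `soundex` and `names_match` are hypothetical helpers written for this example, not xDM's rule syntax or built-in functions (xDM rules are configured in the product, not written in Python).

```python
# Hypothetical helpers: a Python illustration, not xDM rule syntax.

# Standard American Soundex digit groups
SOUNDEX_CODES = {ch: digit
                 for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                      ("L", "4"), ("MN", "5"), ("R", "6"))
                 for ch in group}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        # Skip consonants that repeat the previous code
        if digit and digit != prev:
            result += digit
        # Vowels reset the previous code; H and W do not (standard Soundex rule)
        if ch not in "HW":
            prev = digit
    return (result + "000")[:4]


def names_match(name1: str, name2: str) -> bool:
    """One rule mixing an exact condition with a fuzzy (phonetic) condition."""
    exact = name1 == name2                      # name1 = name2
    fuzzy = soundex(name1) == soundex(name2)    # phonetic(name1) = phonetic(name2)
    return exact or fuzzy
```

For example, "Robert" and "Rupert" both encode to "R163", so `names_match("Robert", "Rupert")` is true even though the exact condition fails.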

Fuzzy Matched Entities use a Matcher to automatically detect duplicates using fuzzy matching algorithms such as:

• Metaphone & Double Metaphone
• Soundex
• Edit Distance & Edit Distance Similarity
• Jaro-Winkler & Jaro-Winkler Similarity
• NGRAMs
• Levenshtein
• etc.
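To make a few of these measures concrete, here is a minimal Python sketch of Levenshtein edit distance, a normalized edit-distance similarity, and Jaro-Winkler similarity. These are textbook formulations, not xDM's own implementations; the 0 to 100 normalization of the edit-distance similarity and the Winkler prefix weight of 0.1 are assumptions, and all function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def edit_distance_similarity(a: str, b: str) -> float:
    """ASSUMPTION: distance normalized by the longer string, scaled to 0..100."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in 0..1, based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unmatched equal character within the match window
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved
    seq1 = [s1[i] for i in range(len1) if matched1[i]]
    seq2 = [s2[j] for j in range(len2) if matched2[j]]
    transpositions = sum(x != y for x, y in zip(seq1, seq2)) / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Boost the Jaro score for a common prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

On the classic pair MARTHA / MARHTA, `jaro` gives 17/18 (about 0.944) and `jaro_winkler` about 0.961, since the shared "MAR" prefix raises the score.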

A matcher can contain multiple Match Rules, so you can define any number of conditions under which two records are considered a match. Each condition has its own Matching Score, which represents the percentage of confidence you place in a match produced by that rule. Records are considered matched when the aggregate score of the conditions exceeds the thresholds you define.
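The idea of scored rules plus a threshold can be sketched in Python as follows. Everything here is hypothetical: the record fields, the three rules, their scores, the threshold of 80, and the use of `difflib.SequenceMatcher` as a stand-in fuzzy measure. The sketch also assumes that a pair's score is the highest score among the rules it satisfies; the exact aggregation xDM applies may differ.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class MatchRule:
    name: str
    score: int  # confidence (0-100) attached to a match produced by this rule
    condition: Callable[[Record, Record], bool]


def similar(a: str, b: str) -> float:
    # Stand-in fuzzy measure from the standard library (0.0 to 1.0)
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical rules: from strict (exact) to loose (fuzzy)
RULES: List[MatchRule] = [
    MatchRule("exact name and email", 100,
              lambda r1, r2: r1["name"] == r2["name"] and r1["email"] == r2["email"]),
    MatchRule("same email", 90,
              lambda r1, r2: r1["email"] == r2["email"]),
    MatchRule("similar name, same city", 70,
              lambda r1, r2: similar(r1["name"], r2["name"]) >= 0.8
              and r1["city"] == r2["city"]),
]


def pair_score(r1: Record, r2: Record, rules: List[MatchRule] = RULES) -> int:
    # ASSUMPTION: aggregate = highest score among the rules the pair satisfies
    return max((rule.score for rule in rules if rule.condition(r1, r2)), default=0)


def is_match(r1: Record, r2: Record, threshold: int = 80,
             rules: List[MatchRule] = RULES) -> bool:
    # The pair counts as a match when its score reaches the threshold
    return pair_score(r1, r2, rules) >= threshold
```

With `a = {"name": "Jon Smith", "email": "jon@example.com", "city": "Lyon"}` and `b = {"name": "John Smith", "email": "jsmith@example.com", "city": "Lyon"}`, only the fuzzy rule fires, so the pair scores 70 and stays below the threshold of 80; lowering the threshold to 60 would make it a match.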

Matching Groups are created using matching transitivity. Matching Transitivity means:

If A matches B and B matches C, then A, B, and C are in the same matching group. Each matching group has a Confidence Score expressing the level of confidence across the group of matching records. This score is the average of the individual match scores in the group.
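Transitive grouping is a connected-components problem, so a union-find structure is a natural way to sketch it. The Python below is a hypothetical illustration, not xDM's implementation: it takes already-matched pairs with their match scores, merges them into groups, and computes each group's confidence score as the average of the pair scores inside it, as described above.

```python
from collections import defaultdict
from statistics import mean


def match_groups(pairs):
    """pairs: iterable of (record_a, record_b, match_score) for pairs that matched.

    Returns a list of (set_of_records, confidence_score) tuples.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    pairs = list(pairs)
    # Transitivity: A-B and B-C end up with the same root
    for a, b, _ in pairs:
        union(a, b)

    members = defaultdict(set)
    scores = defaultdict(list)
    for a, b, score in pairs:
        root = find(a)
        members[root].update((a, b))
        scores[root].append(score)

    # Confidence score = average of the individual match scores in the group
    return [(members[root], mean(scores[root])) for root in members]
```

For the pairs `("A", "B", 90)`, `("B", "C", 70)` and `("D", "E", 100)`, this yields the group {A, B, C} with a confidence score of 80 and the group {D, E} with 100, even though A and C never matched each other directly.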

