How to manage duplicates using SQL queries : xDM

This article describes the use cases and queries to use in order to manage duplicates programmatically.

General Process

To manage duplicates programmatically, you must use a load (as if you were loading data or doing a workflow), insert data into the UM table, and then run a job to process the records modified by the matching.
This is typically what happens when you run a duplicate management workflow. Using SQL queries, you can do the same behind the scene.

The general process should be:

Initialize a Load using the INTEGRATION_LOAD.GET_NEW_LOADID PL/SQL function.
Load the UM table with the generated LoadID and certain values in this table's columns.
The columns to load are listed in the rest of the document.
Submit the Load using the INTEGRATION_LOAD.SUBMIT_LOAD PL/SQL function.
Make sure to run a job that processes the entities for which the match and merge will run, as well as the related (child) entities.

Example: General Process Wrapper

DECLARE
 vLoad_id NUMBER;
 vBatch_id NUMBER;

BEGIN
 vLoad_id:=SEMARCHY_DEMO_REPOSITORY.INTEGRATION_LOAD.GET_NEW_LOADID(
 'CustomerAndFinancialMDM',
 0,
 'Matching Script',
 'Matching Script launched to fix issue XYZ',
 'John Doe'
 );

 dbms_output.Put_line('Load #' || vLoad_id || ' started.');

 -- Your SQL Script goes there
 
 vBatch_id:=SEMARCHY_DEMO_REPOSITORY.INTEGRATION_LOAD.SUBMIT_LOAD(
 vLoad_id,
 'INTEGRATE_DATA',
 'John Doe'
 );

 dbms_output.Put_line('Load #' || vLoad_id || ' completed.');
 dbms_output.Put_line('Batch #' || vBatch_id || ' started.');
END;

Use cases

The following use cases (with SQL code) are detailed in Semarchy documentation:

How to manage duplicates using SQL queries Print

General Process

Use cases

Related Articles