By default, when you create and configure Duplicate Managers, a predefined Semarchy job whose name starts with the prefix DUJ is triggered. This job contains tasks that a Duplicate Manager does not need in order to run, and they can sometimes slow down your batch job. If you experience slowness when Confirming Matches, Merging Suggestions, or running other Duplicate Manager tasks, creating a custom job can help improve batch performance. In this guide, we remove the Delete tasks from the job that Duplicate Managers trigger, because those tasks add unnecessary time to your batch process.
Create a Custom Job
- Navigate to the Model Design Tab.
- Under Jobs, create a new job.
- Select Next.
- Add the entities that this job should process. Keep in mind that if your Duplicate Manager also handles child entities, those child entities must be added to the job as well.
- Under the Job Parameters section, set PARAM_ENABLE_DELETE_PHASE to 0. This removes all of the Delete tasks from the integration batch.
- You should also set PARAM_ANALYZE_STATS to 0. This disables statistics collection, which is not needed on smaller data sets and can slow performance.
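As a quick checklist, the two parameter values from the steps above can be written down and compared against what you see in the job editor. The following is only a minimal sketch: the dictionary layout and the find_mismatches helper are hypothetical and not part of Semarchy. The parameters themselves are set in the Job Parameters section, not in code.

```python
# Recommended job parameter values from the steps above.
# The dict layout is illustrative only; in Semarchy these values are set in
# the Job Parameters section of the job editor, not in code.
RECOMMENDED_PARAMS = {
    "PARAM_ENABLE_DELETE_PHASE": "0",  # remove Delete tasks from the integration batch
    "PARAM_ANALYZE_STATS": "0",        # disable statistics collection
}


def find_mismatches(job_params):
    """Return {name: (expected, actual)} for parameters that differ.

    job_params is a hypothetical mapping of parameter name to value,
    for example copied by hand from the job editor. A parameter that is
    missing from job_params is reported with an actual value of None.
    """
    mismatches = {}
    for name, expected in RECOMMENDED_PARAMS.items():
        actual = job_params.get(name)
        if actual is None or str(actual) != expected:
            mismatches[name] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    # Example values transcribed from a job where stats collection is still on.
    current = {"PARAM_ENABLE_DELETE_PHASE": 0, "PARAM_ANALYZE_STATS": 1}
    for name, (expected, actual) in find_mismatches(current).items():
        print(f"{name}: expected {expected}, found {actual}")
```

An empty result from find_mismatches means both parameters match the values recommended in this guide.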
More details on custom integration jobs can be found in the Developer’s Guide.
Enable the Custom Job in Duplicate Managers
Now that you have created a custom job, you can use it in Duplicate Managers by selecting it in the On Finish Job property.
Once you have created the new job and configured Duplicate Managers to run it, the final step is to deploy your model to apply your changes.