How to perform a full-set data load in Continuous mode
Kousik Das
started a topic
2 months ago
Hi,
I have the following requirement:
1. Every time, the full set of data is to be loaded through a continuous load (via an INTEGRATE_ALL integration job).
2. Unlike the tutorial example below
(xdm-tutorials/integration/3-load-data-via-sql/sqlserver/7-update-1-insert-sd-person.sql at master · semarchy/xdm-tutorials · GitHub),
we are not planning to do any duplicate detection during the load, i.e. the full set of data will be pushed to the SD table every time.
Technically it should be feasible, but I have a few doubts:
1. Do we need to delete/truncate the SD table before every submission, or keep the data as is in the SD tables?
As per my understanding, if I use a continuous load, the B_LOADID will be reserved and reused for that particular continuous load name. So after the first
submission and execution through this continuous load, if I then try to simply insert the full set of data again with the same B_LOADID, B_SOURCEID,
B_PUBID combination into the SD table of a fuzzy entity, it will not be allowed, because these three columns form a system-defined constraint on the SD table.
So, if any other column has changed in the second set of data, that update will not be taken into account.
2. With reference to the above scenario:
a) How (strategically) can the full set of data be loaded every time through the continuous load facility?
b) Is delta detection mandatory before inserting into the Semarchy SD table in order to configure and use a continuous load?
Please help me with your valuable suggestions, and let me know if I have missed anything already available in Semarchy.
Regards,
Kousik
Best Answer
Alexia SIMOND
said
about 2 months ago
Hello Kousik,
Handling full data load via continuous load is possible and won't violate the SD primary key.
A founding notion is that calling the get_continuous_loadid function returns the continuous load number, which is different from the loadid that will be generated by this continuous load.
I will explain these working principles below with an example.
If I create a continuous load TestCL whose number is 10, I will be able to see it in my repository table MTA_INTEG_LOAD as such:
However, all batches submitted via this continuous load have their own loadid (column LOADID below) while still being linked to the continuous load (via the column R_CONTINUOUS_LOAD):
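To illustrate, the actual loads spawned by a continuous load can be listed from the repository with a query such as the hypothetical sketch below (the repository schema prefix is omitted and may need to be added in your environment):

```sql
-- Hypothetical sketch: list the actual loads generated by the
-- continuous load whose number is 10. MTA_INTEG_LOAD lives in the
-- repository schema; prefix the table name with it if needed.
SELECT LOADID, R_CONTINUOUS_LOAD
FROM MTA_INTEG_LOAD
WHERE R_CONTINUOUS_LOAD = 10;
```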
So in the end, in the DL_BATCH table in the data location, even though you don't see a trace of continuous load 10 itself, you will still be able to find all the resulting loads with their own loadids.
For example, in my case, we can find loads 11 and 12, which were submitted via my continuous load:
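To make this concrete, submitting a full data set through the continuous load could look like the sketch below. Everything model-specific here is an assumption: the Person entity, the CRM publisher code, the STAGING_PERSON source table, and the SEMARCHY_REPOSITORY schema prefix for the get_continuous_loadid function; adapt them to your own model.

```sql
-- Hypothetical sketch: push the full data set into the SD table of a
-- fuzzy entity "Person" through the continuous load named 'TestCL'.
-- Entity, publisher code, staging table and schema prefix are all
-- assumptions; adjust them to your environment.
INSERT INTO SD_PERSON (
    B_LOADID,      -- load identifier obtained for the continuous load
    B_CLASSNAME,   -- entity name
    B_PUBID,       -- publisher code
    B_SOURCEID,    -- source record identifier
    FIRST_NAME,
    LAST_NAME
)
SELECT
    SEMARCHY_REPOSITORY.GET_CONTINUOUS_LOADID('TestCL'),
    'Person',
    'CRM',
    SRC.SOURCE_ID,
    SRC.FIRST_NAME,
    SRC.LAST_NAME
FROM STAGING_PERSON SRC;
```

Each such submission is then processed as its own load (11, 12, ... in the example above), which is why repeated full-set pushes do not collide on the (B_LOADID, B_PUBID, B_SOURCEID) primary key of the SD table.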
That being said, the fact that a continuous load can handle a full data load doesn't mean it is the most suitable implementation; it all depends on your context.
As you may know, a full data load will result in higher volumes in the source tables, as well as in the history tables if you have activated them (you might then need to keep purging in mind).
So depending on your context, delta detection might make more sense; I will leave it to you to analyze this on your end.
I hope this clarifies how continuous loads work.
Wishing you a good day
Best regards,
Alexia
Kousik Das
said
about 2 months ago
Thank you very much for this informative and highly helpful answer.