May 19, 2017 SAP BW On HANA Training Videos 1 - SAP BW On HANA Tutorial for beginners ( +977 ) - Duration: 2:32:59. SAP Tutorial for beginners 18,585 views.
Hello Experts,
I was trying to understand Table Comparison’s Input Contain Duplicate Keys option and after my experiments I found below results.
When records coming from source has multiple similar values for primary key then to process them/handle them we use Input Contain Duplicate Keys option of Table Comparison.
If column(s) specified in Input Contain Primary Key do not have unique key for every incoming row then we go for this option to handle duplicate keys.
Initially, before job execution, both before image and after image are empty. When job is executed then data from comparison table is loaded in the before image of Table Comparison and from before image the correct record(s), as per generated OPCODES, Insert, Update or Delete, are sent to after image which is nothing but our target table. When job is executed this transform fires select statement based on columns present in Input Primary Key columns and all the records are brought to before image buffer.
select col1, col2, . from target_table where input primary key columns in (xxxx,xxxx,. ) -> If there are more than one columns specified in Input Primary Key columns section. But if there is only one column present in Input Primary Key columns then resulted sql is:
select col1, col2, . from target_table where input primary key columns = ID;
Now, before Image contains the initial target data before job execution. When job is executed then every incoming source row is compared with the records present in before image of comparison table and according to the opcodes generated (I, U or D) the result is sent to after image i.e., to final target.
When Input Contain Duplicate Keys option is checked then it means transform knows that there is duplicate data coming from source. So of all the records present in after image if any column values, based on columns present in Input Primary Key columns, changes then Table comparison will generate ‘U’ opcode for all incoming rows having same ID ( ID means column present in Input Primary Key columns).
Lil bit confused right? Let’s dig deeper with an example.
Initial target: – Data present in target before job gets executed.
Now I’ve inserted two new records with EMP_ID = 1010, 1011 as highlighted.
Updated Source: –
Input Contain Duplicate Keys option is checked in Table Comparison: –
After job execution:-
Record with EMP_ID 1010 (EMPNAME RAJ) and 1011 (EMPNAME RAJ EY) is present in target before job is executed. Supposed two new records with same EMP_ID (1010, 1011) arrives from source. Also we’ve 1010 and 1011 already present in source too.
Existing records: 1010 RAJ…, 1011 RAJ EY….
New Records: 1010 RAJ WRITER,… 1011 RAJ MCKINSEY….
First record with EMP_ID 1010 (1010, RAJ WRITER,.) is compared with existing EMP_ID, 1010, in comparison table’s before image buffer (1010, RAJ,.). Table comparison finds that EMPNAME was initially RAJ and new incoming name for same column is RAJ WRITER so it sends an update ‘U’ opcode.
Next record with EMP_ID 1011 (1010, RAJ MCKINSEY,.) is compared with existing EMP_ID, 1011, in comparison table’s Before Image buffer (1011, RAJ EY,.). Table comparison finds that EMPNAME was initially RAJ EY and new incoming name for same column is RAJ MCKINSEY so it sends an update ‘U’ opcode.
Then comes 3rd row of source which has EMP_ID 1010 (1010, RAJ,.) and this row is compared with the existing record in target but it finds that no column value is changed. So ideally it should not send an Update but since Input Contain duplicate keys is checked so due to this the existing id will also be sent as ‘U’ opcode because for this EMP_ID earlier there was an update.
Similar is the case with 4th row which has EMP_ID 1010 (1010, RAJ EY,.).
Now you might be thinking why for other records it has not sent an update?
It will not sent an update as no column values for EMP_ID 1012, 1013, 1014… and so on have changed. Hence it no update for them.
Now comes the question out of all the updates which will be reflected in target?
Always the last record or the latest record will be reflected in target of all the Updates.
So Final Target: –
Now let’s suppose that I insert a new record with EMP_ID 1020: –
Target before job execution: – Playready test server generate key.
So now TC will see that of all the id’s EMP_ID 1020 is not present in target. So it’ll fire an insert statement and will insert the records.
Hence after execution result is: –
Final Target: –
To understand how does records get inserted and how their processing happen, visit my blog
https://blogs.sap.com/2017/06/01/table-comparison-row-by-row-select-processing-of-records/
This option slower down the performance as additional memory is required to keep a track of duplicate records.
Hope it help!
Please let me know if I have missed anything or anything needs to added/delete.
Thanks:)
Hello Experts,
I was trying to understand the Generated Key Column option present in table comparison and after experimenting I found the below observation.
When your target column has duplicate records present in it then this option is used to detect any update or insert for the incoming records. By duplicate record I mean duplicate values for column(s) defined in Input Primary Key columns section at the target side in table comparison transform.
The value selected in this column should identify every record present in target side uniquely. The transform selects the maximum value of generated key column, for column(s) defined in Input Primary key columns, and compares it with the incoming records.
Let’s see this with the following example.
Consider the below source and target when job was executed initially.
Source: –
Target: –
Now you might be thinking how S_NO has the records. This is done by Key generation transform. In a gist, this Key Generation transform looks for the maximum value of generated key column and starts increasing by the number you specify.
I’ll post about the functioning of Key Generation in my coming blog posts.
Now I add two new records for EMP_ID 1010 and 1011 as shown,
New Records: –
1011 RAJ EY 976000 654 02-FEB-18
1010 ADVISORY 1256400 256 27-SEP-18
New source before execution: –
After execution: –
Records present in final target: –
Now how it processed the records and of all the updates which updates got reflected in target?
When job is executed Table Comparison transform brings all the records to its before image and then it start comparing with the incoming source records. Now since you’ve selected the Generated Key column so the maximum value, for the duplicate records, of Generated Key column is picked and then its compared with incoming records.
For eg,
For EMP_ID 1010 EMPNAME EY has the maximum value of generated key column which is 2 (as S_NO is 2). Similarly for EMP_ID 1010 EMPNAME RAJ KPMG has the generated key column value which is 4. (Because both EMP_ID 1010 and 1011 has two records and the records which have maximum generated key value is picked)
Based on S_NO, key which uniquely identified every row present, incoming source records are compared. And of all the updates generated it’s always the first update which is reflected in target.
Let me clarify by adding more records.
I truncate the existing target table and insert complete new records into it.
So before execution,
New Source: –
New Target: –
Following records are added: –
1012 RAJ WIN 18526 654 27-AUG-18
1012 RAJAN EY 65000 652 03-MAR-18
New source: –
Target before execution: –
After execution: –
Now see incoming record with EMP_ID 1011 is compared with EMPNAME RAJ KPMG having S_NO 7, which is maximum for EMP_ID 1011 and an update is send by Table comparison.
Next incoming record with EMP_ID 1012 is compared with EMPNAME RAJ PWC having S_NO 8, which is maximum for EMP_ID 1012.
Again next incoming record with EMP_ID 1010 is compared with EMPNAME having EY having S_NO 5, which is maximum for EMP_ID 1010.
In this way all the incoming records are compared with their maximum generated keys values and accordingly update is sent to the target.
So my final Target table looks like: –
The other duplicate records are not at all affected except the records with maximum generated key value (here it’s S_NO).
So always remember, of all the updates generated for a particular primary key, column(s) defined in Input Primary Keys, it’s always the first update which is sent to the target for that particular primary key.
Hope it clarifies!
Kindly let me know if I’ve missed anything or something needs to be added or modified.
Thanks 🙂