Thursday, June 5, 2008

DataStage .. Sparse VS Normal Lookup

A Normal look up tries to fit your reference data into memory,and then does an in-memory look up. And so if your reference data is huge, this can cause performane issues in your job.

Based on your reference data volume - if it is very high - you can either opt for a Merge or a Join stage.

A sparse look up is used when your input or main stream data is very less compared to your reference data. In this case, DataStage executes an SQL query on the reference DB for each incoming record. So,if the input is very high compared to the reference - this type of look up will be a huge hit to performance. A sparse look up is only supported in the Oracle Enterprise, DB2 Enterprise and the ODBC Enterprise stages - and that when directly linked to a Lookup Stage.

Data read by any database stage can serve as the reference input to a Lookup stage. By default, this reference data is loaded into memory like any other reference link.When directly connected as the reference link to a Lookup stage, bothDB2/UDB Enterprise and Oracle Enterprise stages allow the lookup type to be changed to Sparse and send individual SQL statements to the reference database for each incoming Lookup row. Sparse Lookup is only available when the database stage is directly connected to the reference link, with no intermediate stages.It is important to note that the individual SQL statements required by a Sparse Lookup are an expensive operation from a performance perspective. In most cases, it is faster to use a DataStage Join stage between the input and DB2 reference data than it is to perform a Sparse Lookup.
For scenarios where the number of input rows is significantly smaller (1:100 or more) than the number of reference rows in a DB2 or Oracle table, a Sparse Lookup may be appropriate

Tuesday, June 3, 2008

Hi Guys
Please check out he comments made Aneesh ... use ful for more understanding ...
Thnx Anees

First would like to congratulate you on your blog! Its a good effort and am sure well appreciated.

Regarding the Sort Stage, you have said - "The Sort Stage offers a variety of options of retaining first or last records when removing duplicate records...."

Just wanted to point out that the option of retaining the first or last while de-duplication is only available in the Remove Duplicate Stage.

Probably another point that i would like to call out is regarding the usage of the Transformer stage. This stage works faster than some of the others like like the Filter stage. The "evilness" of the Transformer I believe was broken open to the public during the IBM IOD Conference,2007. Vincent Mcburney had called this out in his blogs a couple of times.