Thursday, June 5, 2008

DataStage .. Sparse VS Normal Lookup

A Normal look up tries to fit your reference data into memory,and then does an in-memory look up. And so if your reference data is huge, this can cause performane issues in your job.

Based on your reference data volume - if it is very high - you can either opt for a Merge or a Join stage.

A sparse look up is used when your input or main stream data is very less compared to your reference data. In this case, DataStage executes an SQL query on the reference DB for each incoming record. So,if the input is very high compared to the reference - this type of look up will be a huge hit to performance. A sparse look up is only supported in the Oracle Enterprise, DB2 Enterprise and the ODBC Enterprise stages - and that when directly linked to a Lookup Stage.

Data read by any database stage can serve as the reference input to a Lookup stage. By default, this reference data is loaded into memory like any other reference link.When directly connected as the reference link to a Lookup stage, bothDB2/UDB Enterprise and Oracle Enterprise stages allow the lookup type to be changed to Sparse and send individual SQL statements to the reference database for each incoming Lookup row. Sparse Lookup is only available when the database stage is directly connected to the reference link, with no intermediate stages.It is important to note that the individual SQL statements required by a Sparse Lookup are an expensive operation from a performance perspective. In most cases, it is faster to use a DataStage Join stage between the input and DB2 reference data than it is to perform a Sparse Lookup.
For scenarios where the number of input rows is significantly smaller (1:100 or more) than the number of reference rows in a DB2 or Oracle table, a Sparse Lookup may be appropriate

2 comments:

The Bird Sayz said...

Hi Ketan,
It might also be noted that while using a Sparse Lookup, the Lookup stage can only support a single reference link.

Aneesh

python training said...

Thank you for sharing This knowledge.Excellently written article, if only all bloggers offered the same level of content as you, the internet would be a much better place. Datastage online training