Home » Open Source Rocks » Pentaho Data Integration (a.k.a. Kettle) and Database Sharding
             

Pentaho Data Integration (a.k.a. Kettle) and Database Sharding

So I recently had a situation where I needed to run the same transformation over multiple database shards, which turns out to be relatively easy to do with the right mix of transformations and jobs. This post will walk through the steps needed so you can do the same thing.

Go ahead and download the sample files: shard_example – it includes a text file with a sample set of parameters as well as the transformations/jobs that use them.

Thanks to PDI’s database connections being able to take parameters, all that is needed is the connection information for each of the shards. This can come from many sources (a table input step, text file, etc.). For this post’s example, we’ll use a text file with the database and the server names. The same concept used for those two parameters will also work for the database account’s username and password.

The Parent Job

Step 1 – Retrieve the Parameters

For this example, a text input step is used to pull the database and server names into the result set. The data is then passed back into the result of the parent job.

Step 1 - Retrieve the Parameters

Step 2 – Using the Parameters in a Sub-Job

The parent job uses the parameters from the first transformation and passes them to the second step, in this case a sub-job.

Step 2 - Parameters in Sub Job

It’s important to note that for each set of parameters to be used in the sub-job, the checkbox for running the step with each row in the result set must be checked. If this is not checked, then all the input will be passed into the sub-job, which will set the parameters as whatever the last input row happens to be.

Execute For Every Input Row

There are two steps in the sub-job – one to set the parameters based off the input row given, and then a step that will create a text file based off those parameters.

Shard Function Job

From here, it should be relatively easy to build transformations that can loop through all of the shards – go ahead and give it a try!

Copyright © 2011 - All Rights Reserved

Leave a Reply

Your email address will not be published. Required fields are marked *