Monday, February 13, 2012

Availability during ETL processing

Hi,

I am in the midst of designing a new Data Warehouse system. As we get further into the design of the system, the more we are realising how complex our ETL is going to be and that the amount of time it will take to run could be significant i.e. a few days! My question is obviously I don't want to have a down time in my relational system for this long and prevent my users from accessing the data for days at a time each month. So what functionality should I be looking at to allow me to maintain a working copy of the data that users can query whilst I perform database updates and then perform a quick promotion of the updated data to users for querying?

If you can point me in the direction of the right functionality in SQL Server 2005 and possible some relevant white papers that cover this sort of scenario I would be grateful.

regards

Colin

In your ETL process, are you building a completely new DW database, or are you updating the existing one?

This seems like a very good application for some form of snapshots.

If you're updating an existing database, then database snapshots should fit the bill. They will present a static view of the data as it existed at the point in time when the snapshot was created. This allows the source database to be updated without disturbing the users.

If you create and populate a new database in each ETL cycle, then you may want to look into something like volume snapshots using the VSS framework. In this case, you are making a snapshot of the entire volume(s) on which the database resides. This can then be mounted as a read-only database for your users to query. The advantage in this scenario is that it doesn't need to be tied to the previous database.

No comments:

Post a Comment