May Dynamic Sampling be an Answer for OLAP Systems’ Statistics Needs?

It is very important to determine when and how often to gather new statistics for Cost Based Optimizer. Especially after 10g since Rule Based Optimizer is now obsolete as a result the Automatics Statistics Gathering mechanism keeps all the statistics current by default after a fresh installation with a scheduled job.

-- Verifying Automatic Statistics Gathering Job
SELECT owner, job_name, enabled 

The default gathering interval is nightly but you can customize this interval for your needs. The frequency of collection intervals should balance the task of providing accurate statistics and the processing overhead incurred by the statistics collection process. If you want you can disable this job and have your own strategy of course.

-- Disabling Automatic Statistics Gathering Job

Or you can manually gather and lock statistics at the table or schema level(LOCK_SCHEMA_STATS) as another option.

-- Manual statistics strategy example

You can query the STATTYPE_LOCKED column of DBA_TAB_STATISTICS view to determine if the statistics on the table are locked or not and you may use the UNLOCK_TABLE/SCHEMA_STATS procedures to unlock the statistics on a specified table. An important note here is when you lock the statistics on a table all of the dependent statistics are considered locked like table statistics, column statistics, histograms and dependent index statistics.

In an OLAP environment for tables which are being substantially modified in batch operations such as bulk loads usually DBMS_STATS procedure is called as soon as the load operation completes as a part of that ETL step. Also statistics are needed to be gathered manually after new object creation until the configured custom or Oracle’s Automatic Statistics Gathering job process takes action. The statistics in certain cases on these highly volatile tables can be set to NULL and when Oracle encounters a table with no statistics optimizer dynamically gathers the necessary statistics as part of the query optimization. This 10g new feature is called dynamic sampling and is controlled by the OPTIMIZER_DYNAMIC_SAMPLING parameter which should be set to a value of 2 or higher(the default value is 2).

-- Setting statistics to NULL by deleting and then locking the statistics for Dynamic Sampling

Dynamic sampling offers the opportunity to collect statistics for segments with either missing statistics or stale statistics. Where as this approach involves an overhead at query optimization time and this action is repeated for the same objects unless statistics are gathered. This usually is not acceptable for OLTP applications but when it comes to OLAP usually parsing time can be discarded compared to the execution time and also avoiding statistics collection overhead on huge tables may be a high motivation for most of the time.

A 100% dynamic sampling-based OLAP system topic was raised and discussed before by David Aldridge here, so any further comments are welcomed.


Leave a Comment

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s