Oracle Best Practices Part 5

Part 1 best practices –
1- Put your Database into ARCHIVELOG Mode
2- Comment on Tables and Columns as you do in your PL/SQL and SQL Source Code
3- Use DBMS_STATS for statistics collection
4- Use Recovery Manager (RMAN) for your Backup and Recovery needs

Part 2 best practices –
5- Resumable Statements and Space Allocation
6- Maintenance of Global Partitioned Indexes
7- Default Tablespaces
8- PL/SQL Bulk Operations
9- Locking Issues
10- Oracle System Event Triggers
11- Autonomous Transactions

Part 3 best practice –
12- Learn what is already provided with the PL/SQL language

Part 4 best practice –
13- Defining Application Services for Oracle Database 10g

14- Cost Based Optimizer Best Practices and Advised Strategy

A. When and how to gather statistics for the cost based optimizer?

1. When to gather statistics; some candidate situations are as follows:

– After new database creations,
– After newly created or altered segments (tables, indexes, partitions, clusters, etc.),
– After hardware upgrades like CPUs or the I/O subsystem (gather system statistics),
– After migration from the Rule Based Optimizer (RBO) to the Cost Based Optimizer (CBO),
– After large amounts of data change (bulk operations, loads, purges, etc.),
– After new high/low values are generated for keys,
– Note that as of 10g optimizer statistics are gathered automatically with the GATHER_STATS_JOB job, which gathers statistics on all objects in the database that have missing or stale statistics,
– Also, the data dictionary and fixed tables need statistics as of 10g; a minimal sketch follows this list.
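
10g introduced dedicated procedures for the last item; a minimal sketch, to be run with suitable privileges (for example as SYS or with the ANALYZE ANY DICTIONARY privilege):

BEGIN
   -- statistics on the dictionary schemas (SYS, SYSTEM, component schemas)
   dbms_stats.gather_dictionary_stats;
   -- statistics on the dynamic performance (X$) fixed tables
   dbms_stats.gather_fixed_objects_stats;
END;
/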

2. How much to gather; using sample sizes is recommended:

– estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE is optimal for uniform sizes and distributions of data, but problems can occur with skewed data or if there are lots of nulls in the column,
– Smaller sample sizes can be preferred for large objects to save time and resource consumption,
– Row sampling reads the full data set while sampling the rows, whereas block sampling reads only the sampled blocks and so reduces I/O; a sketch of both options follows this list.
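
As an illustration, a fixed sample size combined with block sampling might look like the sketch below (EMPLOYEES is a placeholder table name):

BEGIN
   -- a fixed 5% sample; block_sample => TRUE samples at the block level
   -- to cut I/O, at the risk of bias if rows are clustered within blocks
   dbms_stats.gather_table_stats(ownname          => USER,
                                 tabname          => 'EMPLOYEES',
                                 estimate_percent => 5,
                                 block_sample     => TRUE);
END;
/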

3. How to keep the statistics up to date; the methodology proposed in “A Practical Approach to Optimizer Statistics in 10g” by Andrew Holdsworth, Director of Real World Performance, can be summarized as follows:

– The best practices mentioned in A.1 and A.2 will work well for 90% of your SQLs. In the initial post of this series, under item 3, “Use DBMS_STATS for statistics collection”, I discussed the advantages of DBMS_STATS over ANALYZE for statistics collection. So prefer the dbms_stats package over the analyze command (we still need analyze for backward compatibility, for validate structure (which also populates index_stats) and for the list chained rows option); here is an example of the advised general stats gathering method;


-- GATHER AUTO lets Oracle decide which objects in the schema need new
-- statistics and how to gather them
BEGIN
   dbms_stats.gather_schema_stats(ownname          => USER,
                                  options          => 'GATHER AUTO',
                                  degree           => 4,
                                  estimate_percent => dbms_stats.auto_sample_size,
                                  cascade          => TRUE,
                                  method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
                                  granularity      => 'AUTO');
END;
/

-- the same defaults applied to a single table; cascade => TRUE also
-- gathers statistics on the table's indexes
BEGIN
   dbms_stats.gather_table_stats(ownname          => USER,
                                 tabname          => 'EMPLOYEES',
                                 degree           => 4,
                                 estimate_percent => dbms_stats.auto_sample_size,
                                 cascade          => TRUE,
                                 method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
                                 granularity      => 'AUTO');
END;
/

– But for the next 9% of your SQLs you may need to refine the stats collection method based on the data distribution: manual histogram management, setting statistics, etc. (see the sketch after this list),
– And for the remaining 1% of your SQLs the sample size is irrelevant; you have to get the assistance of options like hints, outlines and SQL profiles on 10g. These options must be preferred only as a last resort.
– Do not fall into the parameter change trap;
step 1) Bad plan discovered on Query A,
step 2) Optimizer parameter changed to fix Query A,
step 3) This change causes a bad plan to show up on Query B,
step 4) then another optimizer parameter change to fix Query B brings you back to step 1 :)
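
For the 9% case, manual histogram management could look like this minimal sketch (a skewed STATUS column on EMPLOYEES is a hypothetical example):

BEGIN
   -- build a histogram with up to 254 buckets on a known-skewed column
   dbms_stats.gather_table_stats(ownname    => USER,
                                 tabname    => 'EMPLOYEES',
                                 method_opt => 'FOR COLUMNS STATUS SIZE 254');
END;
/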

Use the 10046 and 10053 events for detailed analysis of CBO decisions instead; changing the value of an optimizer parameter for a single query will only cause new problems.
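
For reference, both events can be set and cleared at session level as shown below; the resulting trace files are written to user_dump_dest:

-- 10046 traces execution with bind and wait information,
-- 10053 dumps the CBO decision process for hard-parsed statements
alter session set events '10046 trace name context forever, level 12';
alter session set events '10053 trace name context forever, level 1';

-- run the problem query here, then turn the events off
alter session set events '10046 trace name context off';
alter session set events '10053 trace name context off';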

B. Ensuring representative stats to guarantee good performance

The basic analysis for determining non-representative stats is as follows:
Step 1) Check if the row estimates from the explain plan and v$sql_plan_statistics_all are correct,
a. V$SQL_PLAN_STATISTICS_ALL columns to compare (a sample query follows the dbms_xplan example below);
CARDINALITY = optimizer row estimate
LAST_OUTPUT_ROWS = actual row count
b. E-Rows and A-Rows columns in the execution plan, taken with the steps below;

-- Calling dbms_xplan.display_cursor needs access to several of the dynamic
-- performance views: v$session, v$sql, v$sql_plan and v$sql_plan_statistics_all
-- serveroutput must be off, otherwise the last cursor executed belongs to
-- dbms_output and the wrong plan is displayed
-- The hint /*+ gather_plan_statistics */ enables collection of rowsource
-- execution statistics for the statement
-- Reference : http://jonathanlewis.wordpress.com/2006/11/09/dbms_xplan-in-10g/
--
set serveroutput off
select /*+ gather_plan_statistics */ ...
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));
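
For option a, a comparison query against V$SQL_PLAN_STATISTICS_ALL might look like the sketch below; the bind variables are placeholders for the cursor being analyzed (find its sql_id and child_number in v$sql), and the LAST_* columns are only populated when the gather_plan_statistics hint or statistics_level = ALL is in effect:

select id,
       operation,
       cardinality      est_rows,  -- optimizer row estimate
       last_output_rows act_rows   -- actual row count of the last execution
  from v$sql_plan_statistics_all
 where sql_id = :sql_id
   and child_number = :child_no
 order by id;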

Step 2) Check if the row estimates are correct in all_[tab|part]_col_statistics (a sample query follows this list),
Step 3) Check if the sample size is large enough to give accurate row estimates,
Step 4) Gather more accurate stats, or use dbms_stats.set_…._stats to set the calculated accurate stats, and start again from step 1.
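
For step 2, the stored column-level statistics and the sample size they are based on can be checked like this (EMPLOYEES is a placeholder table name):

select column_name,
       num_distinct,
       num_nulls,
       density,
       sample_size,
       histogram
  from all_tab_col_statistics
 where table_name = 'EMPLOYEES';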

C. How to capture the SQL workload against a schema

1. DML monitoring information is collected and can be monitored at [DBA|ALL|USER]_TAB_MODIFICATIONS;

– This information is needed by dbms_stats to identify the objects with “stale” statistics,
– This option is enabled by default as of 10g, but in 9i it must be set manually

alter table employees monitoring;  -- EMPLOYEES as a placeholder table name

– Tracked information can be queried from [DBA|ALL|USER]_TAB_MODIFICATIONS;

select table_name, partition_name, inserts, updates, deletes
  from all_tab_modifications;

TABLE_NAME   PARTITION_NAME   INSERTS   UPDATES   DELETES
----------   --------------   -------   -------   -------
USERS        USERS14                0      3328         0
ACCOUNT      ACCOUNT23          23450       738       230
EMP          EMP19               2409       390         0
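
Note that the modification counters are buffered in the SGA and flushed to the dictionary periodically; to see current numbers you can force a flush on demand:

exec dbms_stats.flush_database_monitoring_info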

– 9i and 10g use a 10% change as the threshold to gather stale stats; this value is no longer hard-coded after 11g and can be set with dbms_stats.set_table_prefs, as sketched below.
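
For example, on 11g the staleness threshold of a single table could be lowered to 5% like this (EMPLOYEES is a placeholder table name):

BEGIN
   dbms_stats.set_table_prefs(ownname => USER,
                              tabname => 'EMPLOYEES',
                              pname   => 'STALE_PERCENT',
                              pvalue  => '5');
END;
/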

2. Column usage history (predicate information) is also collected and can be monitored at sys.col_usage$;

– This information is again needed by dbms_stats to identify candidate columns on which to build histograms when method_opt => 'FOR ... SIZE AUTO',
– Tracked information can be queried from sys.col_usage$ (a sample query follows), and the information is persistent across shutdowns.
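
A minimal sketch for decoding sys.col_usage$, assuming SELECT access on the SYS-owned dictionary tables (EMPLOYEES is a placeholder table name):

select o.name  table_name,
       c.name  column_name,
       u.equality_preds,
       u.equijoin_preds,
       u.range_preds,
       u.like_preds,
       u.timestamp
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c
 where u.obj#    = o.obj#
   and u.obj#    = c.obj#
   and u.intcol# = c.intcol#
   and o.name    = 'EMPLOYEES';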

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used :
Oracle® Database Performance Tuning Guide 10g Release 2 (10.2)
Part IV Optimizing SQL Statements
http://structureddata.org/presentations/