Oracle High Availability Best Practices Summary

I have written about Oracle Maximum Availability Architecture (MAA) and 11g High Availability New Features before. Below is a quick reference prepared for 10g Release 2 single instance, RAC and standby instances:

Multiplex redo log files,
Use resumable space allocation,
Create at least two control files and multiplex them,
Enable Flashback Database,
Use Spfile,
Use Automatic Undo Management,
Use Automatic Segment Space Management,
Use Locally Managed Tablespace Management,
Use Locally Managed Temporary Tablespace Management,
Enable Archivelog Mode,
Use Flash Recovery Area,
Resynchronize the recovery catalog and set the CONTROL_FILE_RECORD_KEEP_TIME parameter to a long enough value (the default is 7 days),
Specify a database default permanent and temporary tablespace other than SYSTEM and SYSAUX,
Enable Block Checking,
Use Auto-Tuned Checkpointing,
Use Database Resource Manager,
Log checkpoints to the alert.log,

..

All of the above practices, and many more you may add in the comments, affect the performance, availability and mean time to recover (MTTR) of your database. For more information please read the Oracle® Database High Availability Best Practices 10g Release 2 (10.2) guide and the Oracle® Database Backup and Recovery Advanced User’s Guide 10g Release 2 (10.2).
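A minimal sketch, not from the original post, of how a few of the listed practices can be enabled on a 10gR2 instance; the recovery area size and path below are assumptions and should be adapted to your environment;

-- enable block checking and auto-tuned checkpointing
alter system set db_block_checking = FULL scope=both;
alter system set fast_start_mttr_target = 300 scope=both;

-- configure a flash recovery area (size and path are placeholders)
alter system set db_recovery_file_dest_size = 10G scope=both;
alter system set db_recovery_file_dest = '/u01/app/oracle/flash_recovery_area' scope=both;

-- put the database into archivelog mode and enable Flashback Database
shutdown immediate
startup mount
alter database archivelog;
alter database flashback on;
alter database open;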

Oracle Best Practices Part 5

Part 1 best practices –
1- Put your Database into ARCHIVELOG Mode
2- Comment on Tables and Columns similar to your PL/SQL or SQL Source Codes
3- Use DBMS_STATS for statistic collection
4- Using Recovery Manager(RMAN) for Backup and Recovery needs

Part 2 best practices –
5- Resumable Statements and Space Allocation
6- Maintenance of Global Partitioned Indexes
7- Default Tablespaces
8- PL/SQL Bulk Operations
9- Locking Issues
10- Oracle System Event Triggers
11- Autonomous Transactions

Part 3 best practice –
12- Learn what is already provided with PL/SQL language

Part 4 best practice –
13- Defining Application Services for Oracle Database 10g

14- Cost Based Optimizer Best Practices and Advised Strategy

A. When and how to gather statistics for the cost based optimizer?

1. When to gather statistics; some candidate situations are as follows:

– After new database creations,
– After newly created and altered segments (tables, indexes, partitions, clusters, etc.),
– After hardware upgrades like CPUs or the I/O subsystem (gather system statistics),
– After migration from the Rule Based Optimizer (RBO) to the Cost Based Optimizer (CBO),
– After large amounts of data change (bulk operations, loads, purges, etc.),
– After new high/low values for keys are generated,
– Note that from 10g on, optimizer statistics are automatically gathered by the GATHER_STATS_JOB job, which gathers statistics on all objects in the database that have missing or stale statistics,
– Also, from 10g on, the data dictionary and fixed tables need statistics.

2. How much to gather; the recommended approach is to use sample sizes:

– estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE is optimal for uniform sizes and distributions of data, but problems can occur with skewed data or if there are a lot of NULLs in the column,
– Smaller sample sizes can be preferred for large objects to save time and resource consumption,
– Row sampling means a full scan of the data while sampling rows, whereas block sampling reduces I/O. An illustration of the last two points is sketched below.
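A sketch, assuming a hypothetical large table named BIG_TABLE in the current schema, of gathering with a fixed smaller sample size and block sampling to reduce I/O;

BEGIN
   -- BIG_TABLE is a placeholder; a 5% block sample trades some accuracy for speed
   dbms_stats.gather_table_stats(ownname          => USER,
                                 tabname          => 'BIG_TABLE',
                                 estimate_percent => 5,
                                 block_sample     => TRUE,
                                 cascade          => TRUE,
                                 method_opt       => 'FOR ALL COLUMNS SIZE AUTO');
END;
/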

3. How to keep the statistics up to date; a summary of the methodology proposed in “A Practical Approach to Optimizer Statistics in 10g” by Andrew Holdsworth, Director of Real World Performance, is as follows:

– The best practices mentioned in A.1 and A.2 will work well for 90% of your SQLs. In the initial post of this series, under item 3 “Use DBMS_STATS for statistic collection”, I discussed the advantages of DBMS_STATS over ANALYZE for statistics collection. So prefer the dbms_stats package over the analyze command (we still need analyze for backward compatibility, for validate structure, which also populates index_stats, and for the list chained rows option); here is an example of the advised general stats gathering method;


BEGIN
   dbms_stats.gather_schema_stats(ownname          => USER,
                                  options          => 'GATHER AUTO',
                                  degree           => 4,
                                  estimate_percent => dbms_stats.auto_sample_size,
                                  cascade          => TRUE,
                                  method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
                                  granularity      => 'AUTO');
   COMMIT;
END;
/

BEGIN
   dbms_stats.gather_table_stats(ownname          => USER,
                                 tabname          => 'EMPLOYEES',
                                 degree           => 4,
                                 estimate_percent => dbms_stats.auto_sample_size,
                                 cascade          => TRUE,
                                 method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
                                 granularity      => 'AUTO');
   COMMIT;
END;
/

– But for the next 9% of your SQLs you may need to refine the stats collection method based on the data distribution: manual histogram management, setting statistics manually, etc.
– And for the remaining 1% of your SQLs the sample size is irrelevant; you have to get the assistance of options like hints, stored outlines and SQL profiles on 10g. These options should be preferred only as a last resort.
– Do not fall into the parameter change trap;
step 1) Bad plan discovered on Query A,
step 2) Optimizer parameter changed to fix Query A,
step 3) This change causes bad plan discovery on Query B,
step 4) then another optimizer parameter change to fix Query B brings you back to step 1 :)

Use the 10046 and 10053 events for detailed analysis of CBO decisions; changing the value of an optimizer parameter for a single query will cause new problems elsewhere.
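A sketch of enabling these events for the current session only; the trace files are written to user_dump_dest;

-- CBO trace for the next parse of the problem query
alter session set events '10053 trace name context forever, level 1';
-- run the problem query here, then turn the event off
alter session set events '10053 trace name context off';

-- extended SQL trace with binds and waits
alter session set events '10046 trace name context forever, level 12';
-- run the problem query here, then turn the event off
alter session set events '10046 trace name context off';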

B. Ensuring representative stats to guarantee good performance

To determine non-representative stats, the basic analysis is as follows;
Step 1) Check if the row estimates from the explain plan and v$sql_plan_statistics_all are correct,
a. V$SQL_PLAN_STATISTICS_ALL columns to compare;
CARDINALITY = optimizer row estimate
LAST_OUTPUT_ROWS = actual row count
b. E-Rows and A-Rows columns in the plan taken with the steps below;

-- If you want to call this function, you need access to several of the dynamic performance views: v$session, v$sql, v$sql_plan and v$sql_plan_statistics_all
-- set serveroutput off if it is enabled
-- The hint /*+ gather_plan_statistics */ enables collection of rowsource execution statistics for the statement
-- Reference : http://jonathanlewis.wordpress.com/2006/11/09/dbms_xplan-in-10g/
--
set serveroutput off 
select /*+ gather_plan_statistics */ ...
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

Step 2) Check if the row estimates are correct in all_[tab|part]_col_statistics,
Step 3) Check if the sample size is large enough to give accurate row estimates,
Step 4) Gather more accurate stats or use dbms_stats.set_…._stats to set the calculated accurate stats, and start again from step 1.
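A sketch of the comparison in step 1.a for a given cursor; the &sql_id substitution variable is a placeholder, and the LAST_OUTPUT_ROWS values are only populated when the statement was run with statistics_level=ALL or the gather_plan_statistics hint;

select id, operation, options, object_name,
       cardinality      as estimated_rows,
       last_output_rows as actual_rows
  from v$sql_plan_statistics_all
 where sql_id = '&sql_id'
 order by child_number, id;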

C. How to capture the SQL workload against a schema

1. DML monitoring information is collected and can be monitored at [DBA|ALL|USER]_TAB_MODIFICATIONS;

– This information is needed by dbms_stats to identify the objects with “stale” statistics,
– This option is enabled by default from 10g on, but for 9i it must be set manually

alter table table-name monitoring;

– Tracked information can be queried from [DBA|ALL|USER]_TAB_MODIFICATIONS;

select table_name, partition_name, inserts, updates, deletes from all_tab_modifications;
TABLE_NAME  PARTITION_NAME     INSERTS    UPDATES    DELETES
----------  --------------  ----------  ---------  ---------
USERS       USERS14                  0       3328          0
ACCOUNT     ACCOUNT23            23450        738        230
EMP         EMP19                 2409        390          0

– 9i and 10g use a 10% change as the threshold to gather stale stats; this value is no longer hard-coded after 11g and can be set with dbms_stats.set_table_prefs, as sketched below
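For example, on 11g (not on the 10gR2 release this reference targets) the per-table staleness threshold could be lowered like this; a sketch, assuming an EMPLOYEES table in the current schema;

BEGIN
   -- 11g and later only: lower the staleness threshold from the default 10% to 5%
   dbms_stats.set_table_prefs(ownname => USER,
                              tabname => 'EMPLOYEES',
                              pname   => 'STALE_PERCENT',
                              pvalue  => '5');
END;
/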

2. Column usage history (predicate information) is also collected and can be monitored at sys.col_usage$;

– This information is again needed by dbms_stats to identify candidate columns on which to build histograms when method_opt => ’for … size auto’,
– Tracked information can be queried from sys.col_usage$, and the information is persistent across shutdowns
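A sketch of such a query; it joins to the SYS-owned dictionary tables obj$ and col$, so it needs the appropriate privileges, and the EMPLOYEES filter is just an assumption;

select o.name as table_name,
       c.name as column_name,
       u.equality_preds, u.equijoin_preds, u.range_preds, u.like_preds
  from sys.col_usage$ u, sys.obj$ o, sys.col$ c
 where u.obj#    = o.obj#
   and u.obj#    = c.obj#
   and u.intcol# = c.intcol#
   and o.name    = 'EMPLOYEES';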

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used:
Oracle® Database Performance Tuning Guide 10g Release 2 (10.2)
Part IV Optimizing SQL Statements
http://structureddata.org/presentations/

Oracle Concepts and Architecture – Part 5

Part 1 subjects –

1- Flashback Versions Query, Drop Table and Recyclebin Management
2- How shared pool works in Oracle and binding example
3- The Storage Hierarchy Summary in an Oracle Database
4- Concurrency and Isolation Levels Concepts
5- Automated Checkpoint Tuning and Mean time to recover(MTTR) Concepts
6- Concurrency and Consistency Concepts
7- Latch and Lock Concepts

Part 2 subject –

8- Oracle Database Architecture: Overview

Part 3 subjects –

9- How Atomicity is implemented on Oracle
10- NULL values and Oracle

Part 4 subject –

11- Overview of Transaction Management Internals

12- Concurrency versus Locking Concepts, Understanding Lock Contention with Examples

“Performance is interested with avoiding unnecessary work on limited system resources, where as Scalability is interested with avoiding blocking others doing same operations(serialization). These need different attentions, especially second is very important for heavy loaded OLTP systems. Sometimes you may sacrifice in one to gain in other.” Memory Management and Latching Improvements in Oracle9i and 10g Presentation by Tanel Põder

Concurrency and Locking are like Fenerbahce (standing in for concurrency here, of course :) and Galatasaray (standing in for locking :); they have an eternal rivalry. It is very critical to understand Oracle’s answers here when building highly concurrent OLTP applications; you paid a lot for it, right :) I have mentioned the importance of testing for concurrency before; with this post I want to focus on some conditions which reduce concurrency.

In Oracle you can develop highly concurrent database applications because;
1- Readers do not wait for readers,
2- Writers do not wait for readers,
3- Writers only wait for writers if they attempt to change the same rows at the same time,
4- Oracle automatically locks data at the lowest level of restrictiveness.

The example below demonstrates how updates and inserts on the same row create blocking locks – Code Listing 147a – Locks on same row update and insert example
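The original Code Listing 147a is not reproduced here; below is a minimal two-session sketch of the same behavior, where the hr.employees values used are assumptions;

-- Session 1: update a row and leave the transaction open
update hr.employees set salary = salary + 100 where employee_id = 100;

-- Session 2: blocks on a TX enqueue until session 1 commits or rolls back
update hr.employees set salary = salary + 200 where employee_id = 100;

-- Similarly for inserts: a second session inserting the same primary key value
-- waits on the first session's uncommitted insert
-- Session 1:
insert into hr.employees (employee_id, last_name, email, hire_date, job_id)
values (999, 'SMITH', 'SMITH999', sysdate, 'IT_PROG');
-- Session 2 (blocks until session 1 commits or rolls back):
insert into hr.employees (employee_id, last_name, email, hire_date, job_id)
values (999, 'JONES', 'JONES999', sysdate, 'IT_PROG');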

There are some situations to be aware of –

A. The example below demonstrates that an update on row 101 waits for an update on row 100 of the hr.employees table – Code Listing 147b – Lock acquisition and blocking lock example

Because lock acquisition works under a FIFO policy, the SX lock in the example cannot be granted until the S lock request ahead of it in the queue has been granted and released.
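The original Code Listing 147b is not reproduced here; a plausible three-session sketch of the scenario described above is;

-- Session 1: row-level update on row 100; holds the row lock plus an SX
-- (row exclusive) table lock on hr.employees
update hr.employees set salary = salary where employee_id = 100;

-- Session 2: requests an S (share) table lock, which is incompatible with
-- session 1's SX lock, so the request queues and waits
lock table hr.employees in share mode;

-- Session 3: updates a different row; its SX table lock request is compatible
-- with session 1's SX lock, but it queues behind session 2's pending S request
-- because lock acquisition is FIFO
update hr.employees set salary = salary where employee_id = 101;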

B. The example below demonstrates that locks are converted under certain circumstances – Code Listing 147c – Lock conversion and blocking lock example

The first session in the example has an SX lock on both the parent and the child table, but this time the third session has an SS lock on the parent table. After the update statement in the first session, the SX lock on the child table is not allowed to be converted, so the statement hangs. The third session does not release its lock in order to get an SSX one; a conversion takes place instead.

C. The example below demonstrates how a bitmap index causes blocking locks on different rows – Code Listing 147d – Bitmap index blocking locks on different rows example

Bitmap indexes are best suited for OLAP. For OLTP, because of this locking behavior, this option is an important subject for concurrency testing. Also check the related reference Bitmap Indexes by Jonathan Lewis.
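The original Code Listing 147d is not reproduced here; a minimal sketch of the behavior, using a hypothetical orders_demo table;

-- setup: a small table with a bitmap index on a low cardinality column
create table orders_demo (order_id number primary key, status varchar2(10));
insert into orders_demo values (1, 'NEW');
insert into orders_demo values (2, 'NEW');
commit;
create bitmap index orders_demo_status_bix on orders_demo (status);

-- Session 1: updating the indexed column of row 1 locks a bitmap index entry
-- that covers a range of rowids rather than a single row
update orders_demo set status = 'PAID' where order_id = 1;

-- Session 2: updating a different row blocks, because both rows are covered
-- by the same bitmap index entry locked by session 1
update orders_demo set status = 'PAID' where order_id = 2;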

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used:
Oracle® Database Concepts 10g Release 2 (10.2)
Chapter 4 Transaction Management
Chapter 13 Data Concurrency and Consistency

Oracle Concepts and Architecture – Part 4

Part 1 subjects –

1- Flashback Versions Query, Drop Table and Recyclebin Management
2- How shared pool works in Oracle and binding example
3- The Storage Hierarchy Summary in an Oracle Database
4- Concurrency and Isolation Levels Concepts
5- Automated Checkpoint Tuning and Mean time to recover(MTTR) Concepts
6- Concurrency and Consistency Concepts
7- Latch and Lock Concepts

Part 2 subject –

8- Oracle Database Architecture: Overview

Part 3 subjects –

9- How Atomicity is implemented on Oracle
10- NULL values and Oracle

11- Overview of Transaction Management Internals

Transactions are one of the major features that set a database apart from a file system. Their main purpose is to take the database from one consistent state to the next, guaranteeing that when you commit your work either all of your changes, including the various rules and checks that implement data integrity, are saved, or none of them are. All statements are atomic.

I advise you to also go through Julian Dyke’s related great presentation in slide show mode while reading the simplified descriptions below, for much deeper information on Transaction Internals with Oracle.

Overview of Transaction Management Internals

Undo segments are central to the database; they are used both for transaction rollback and recovery and for building consistent read copies of data blocks.

a. A transaction begins when a slot is allocated in the transaction table at the head of an undo segment. The transaction identifier (txid = undo_segment_number.transaction_slot_id.SCN_wrap) is a pointer to this location.
b. Before a transaction inserts, updates or deletes a table row, an Interested Transaction List (ITL) entry is allocated in the block containing the row. The ITL is used to mark the row as locked until the transaction is either committed or rolled back, and the ITL contains the transaction identifier.
c. When the change is applied to the block, undo information is also generated and stored in the undo segment. The transaction table slot contains a pointer to the undo information.
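A small sketch of observing these identifiers from a session with an open transaction; the hr.employees update is just a sample change to start a transaction;

-- start a transaction with any change
update hr.employees set salary = salary where employee_id = 100;

-- transaction identifier in undo_segment_number.slot.sequence format
select dbms_transaction.local_transaction_id from dual;

-- the same components plus undo usage, from v$transaction
select xidusn, xidslot, xidsqn, used_ublk, used_urec from v$transaction;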

Read consistency

When a different session wants to read the same block and the first transaction has not yet ended, the session reading the block (most probably still in the cache) first finds the open ITL. It checks the transaction status by reading the undo segment header and finds that there is still an active transaction. This means the session must create a snapshot of the block and roll back all the changes made by the active transaction. This task is called making a consistent read (CR) copy of the block and is achieved by first cloning the block and then rolling back the latest changes by applying undo records from the undo segment. Even in this very simplified explanation, CR requires I/O to the data block, the undo segment header and the undo records.
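The consistent read work done by a session can be observed with session statistics such as the ones below; a simple sketch;

select n.name, s.value
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name in ('consistent gets',
                  'consistent changes',
                  'CR blocks created',
                  'data blocks consistent reads - undo records applied');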

Locking

This time let’s suppose another session wants to change (update or delete) the row that was changed by the first transaction, and the first transaction has not yet ended. Again, the session reading the block (most probably still in the cache) first finds the open ITL. It checks the transaction status by reading the undo segment header and finds that there is still an active transaction, so it starts to wait on the transaction table slot for that transaction to complete (commit or rollback).
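While the second session waits, the blocking relationship shows up as a TX enqueue wait and can be seen in v$session; a sketch;

select sid, event, blocking_session, seconds_in_wait
  from v$session
 where event like 'enq: TX%';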

Commit and Delayed Block Cleanout (DLBC)

Now let’s suppose the first active transaction is committed. This is recorded immediately by marking the transaction table slot as inactive; however, the data block itself may not be updated until later, meaning the ITL in the block may remain open for some time after the commit. When Oracle reads this block for some reason, the transaction table is checked as well, the commit is confirmed and this time the ITL is also closed. This behavior is known as delayed block cleanout. If the undo segment has since been dropped, the SCN recorded in undo$ is used to confirm the commit.
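Cleanout activity can also be observed through session statistics; a sketch;

select n.name, s.value
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name in ('commit cleanouts',
                  'commit cleanouts successfully completed',
                  'deferred (CURRENT) block cleanout applications',
                  'cleanouts only - consistent read gets');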

Transaction Recovery: Rollback – Process Crash (PMON) – Instance Crash (SMON)

When a rollback statement is issued, the undo records for the current transaction are scanned in reverse order (latest first) and applied back. When the rollback statement returns successfully, all the block changes have been undone and the ITL has been cleared; there is no delay for rollback. This process also generates new redo, which is written to the redo logs.

If the server (shadow) process somehow crashes with an active transaction, PMON detects this and rolls back the transaction. Events 10012 and 10246 can be set and used to monitor this.

If the instance crashes there is no chance to roll back the active transactions. The next time the database is opened, the SMON background process performs crash recovery and returns the blocks to their pre-crash consistent state. Active transactions in the SYSTEM undo segment are immediately rolled back, whereas active transactions in other undo segments are marked as “dead” and SMON scans these segments at a later time to perform the rollback, since the database needs to start up as fast as possible. To list the dead transactions –
select * from x$ktuxe where ktuxecfl = 'DEAD';
-- or
alter system dump undo header '<undo segment name>'; -- here dead transactions are identified by having cflg = '0x10'

Problematic cases

The operations above need access to the undo segment header, the undo records and the related data blocks for ITL information. If any of these are corrupted or lost, the operations will be affected. Event 10013 monitors transaction recovery during startup, and Event 10015 dumps the undo segment headers listed in undo$ before and after transaction recovery.

Also, only as a last resort and with the assistance of Oracle Support, the “_offline_rollback_segments” and “_corrupted_rollback_segments” hidden parameters can be used. When the database opens, a list of offline and corrupted undo segment numbers (USNs) is constructed. If you use these two hidden parameters you change the default behavior mentioned above: the transaction table is not read during startup, so the transactions are not marked as dead or rolled back. The segments listed in the parameters appear as OFFLINE in dba_rollback_segs (undo$) and they cannot be acquired for new transactions. This is an unsupported state, so media recovery is preferred instead.

Continue reading with part 5

References Used:
Oracle® Database Concepts 10g Release 2 (10.2)
Chapter 4 Transaction Management
Chapter 13 Data Concurrency and Consistency