Oracle Business Intelligence and Extraction, Transformation, Loading(ETL) Part 1

1- Introduction to Parallel Execution

Oracle 7.1 introduced parallel query option in 1994. Usually a full scan is a good candidate for parallel query if you are not on I/O bounded system.

Also as percieved you do NOT need more than one CPU in order to use this option, in other words a degree of parallelism greater then the number of cpus on the system.

Consider an index build you read, sort and write. When you are reading, you are not using the CPU so if there were another process sorting that would be great. Likewise when you are sorting, you are not using the disk if someone else could be reading or writing, that would be great.

Parallel execution dramatically reduces response time for data-intensive operations on large databases that are typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Parallel execution improves processing for:
• Queries requiring large table scans, joins, or partitioned index scans
• Creation of large indexes
• Creation of large tables (including materialized views)
• Bulk inserts, updates, merges, and deletes
• Sorting large volumes of data
You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access large objects(LOBs).

You can use parallel execution for any of the following:
• Access methods: Some examples are table scans, index full scans, and partitioned index range scans.
• Join methods: Some examples are nested loop, sort merge, hash, and star transformation.
• DDL statements: Some examples are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, REBUILD INDEX PARTITION, and MOVE/SPLIT/COALESCE PARTITION. You can normally use parallel data definition language (DDL) where you use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns. All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution. The CREATE TABLE statement for an index-organized table can be parallelized either with or without an AS SELECT clause. Different parallelism is used for different operations. The Parallel CREATE (PARTITIONED) TABLE… AS SELECT and parallel CREATE (PARTITIONED) INDEX statements run with a degree of parallelism equal to the number of partitions. Parallel operations require accurate statistics to perform optimally.
• DML statements: Some examples are INSERT AS SELECT, updates, deletes, and MERGE operations. Parallel DML (parallel insert, update, merge, and delete) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT … SELECT statements to insert rows into multiple tables as part of a single DML statement. You can normally use parallel DML where you use regular DML. Although DML normally includes queries, the term parallel DML refers only to inserts, updates, merges, and deletes done in parallel.
• Miscellaneous SQL operations: Some examples are GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as well as aggregate and table functions.
• Parallel query: You can parallelize queries and subqueries in SELECT statements, as well as the query portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).
• SQL*Loader: You can parallelize the use of SQL*Loader where large amounts of data are routinely encountered.

When to Implement and When NOT to Implement Parallel Execution

The benefits of parallel execution can be seen in DSS and data warehousing environments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as creation of indexes. The average simple data manipulation language (DML) or SELECT statements that characterize OLTP applications would not see any benefit from being executed in parallel.

Parallel execution is NOT normally useful for:
• Environments in which the typical query or transaction is very short (a few seconds or less). This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may outweigh the benefits of parallelism.
• Environments in which the CPU, memory, or I/O resources are already heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution does not yield any benefits and indeed may be detrimental to performance.

How Parallel Execution Works

Parallel execution divides the task of executing a SQL statement into multiple small units, each of which is executed by a separate process. The incoming data can be divided into parts (called granules). The user shadow process that is going to execute a query in parallel takes on the role as parallel execution coordinator or query coordinator. The query coordinator does the following:
• Parses the query and determines the degree of parallelism
• Allocates one or two sets of slaves (threads or processes)
• Controls the query and sends instructions to the PQ slaves
• Determines which tables or indexes need to be scanned by the PQ slaves
• Produces the final output to the user
At execution time, the coordinator also performs the parts of the plan that execute serially (such as accessing tables in serial if they are small or have no hint or degree of parallelism set). Ranging is also done serially to determine the ranges of keys to be distributed from producer slaves to consumer slaves who are sorting or otherwise must consume specific ranges of rows.

When using EXPLAIN PLAN with parallel queries, one parallel plan is compiled and executed. This plan is derived from the serial plan by allocating row sources specific to the parallel support in the Query Coordinator (QC) plan. The table queue row sources (PX Send and PX Receive), the granule iterator, and buffer sorts, required by the two slave set PQ model, are directly inserted into the parallel plan. This plan is the exact same plan for all the slaves if executed in parallel or for the QC if executed in serial.


SELECT SUM(salary) FROM emp2 GROUP BY department_id;

| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU) |    TQ  |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT         |          |   107 |  2782 |     3 (34)  |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |             |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001 |   107 |  2782 |     3 (34)  |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY         |          |   107 |  2782 |     3 (34)  |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |          |   107 |  2782 |     3 (34)  |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH        | :TQ10000 |   107 |  2782 |     3 (34)  |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY      |          |   107 |  2782 |     3 (34)  |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |          |   107 |  2782 |     2 (0)   |  Q1,00 | PCWP |            |
|   8 |         TABLE ACCESS FULL| EMP2     |   107 |  2782 |     2 (0)   |  Q1,00 | PCWP |            |

The table EMP2 is scanned in parallel by one set of slaves while the aggregation for the GROUP BY is done by the second set. The PX BLOCK ITERATOR row source represents the splitting up of the table EMP2 into pieces so as to divide the scan workload between the parallel scan slaves. The PX SEND and PX RECEIVE row sources represent the pipe that connects the two slave sets as rows flow up from the parallel scan, get repartitioned through the HASH table queue, and then read by and aggregated on the top slave set. The PX SEND QC row source represents the aggregated values being sent to the QC (Query Coordinator) in random (RAND) order. The PX COORDINATOR row source represents the QC or Query Coordinator which controls and schedules the parallel plan appearing below it in the plan tree.

Degree of Parallelism (DOP)

The parallel execution coordinator may enlist two or more of the instance’s parallel execution servers to process a SQL statement. The number of parallel execution servers associated with a single operation is known as the degree of parallelism. A single operation is a part of a SQL statement such as an order by or full table scan to perform a join on a non-indexed column table. The degree of parallelism applies directly only to intra-operation parallelism. If inter-operation parallelism is possible, the total number of parallel execution servers for a statement can be twice the specified degree of parallelism. No more than two sets of parallel execution servers can run simultaneously. Each set of parallel execution servers may process multiple operations. Only two sets of parallel execution servers need to be active to guarantee optimal inter-operation parallelism.

Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quickly. When multiple users use parallel execution at the same time, it is easy to quickly exhaust available CPU, memory, and disk resources. The default DOP is used when you ask to parallelize an operation but you do not specify a DOP in a hint or in the definition of a table or index. The default DOP is appropriate for most applications.

If no parallel hints are used and there is no default degree of parallelism for the table in the dictionary:
• Execution for that table is serial
• When parallelism is enforced with the ALTER SESSION FORCE PARALLEL … command, the DOP for a SQL statement is determined by the value of the parameter CPU_COUNT. The value of CPU_COUNT is, by default, the number of CPUs on the system and the value of the PARALLEL_THREADS_PER_CPU parameter.
However, the actual number of processes that are used is limited by their availability on the requested instances during run time. The PARALLEL_MAX_SERVERS initialization parameter sets an upper limit on the total number of parallel execution servers that an instance can have.
If a minimum fraction of the desired parallel execution servers is not available (specified by the PARALLEL_MIN_PERCENT initialization parameter), a user error is produced. You can retry the query when the system is less busy.

Parallelization Rules for SQL Statements

A SQL statement can be parallelized if it includes a parallel hint or if the table or index being operated on has been declared PARALLEL with a CREATE or ALTER statement. In addition, a DDL statement can be parallelized with the PARALLEL clause. However, not all of these methods apply to all types of SQL statements. Parallelization has two components: the decision to parallelize and the DOP. These components are determined differently for queries, DDL operations, and DML operations.

To determine the DOP, the Oracle Server looks at the reference objects:
• Parallel query looks at each table and index in the portion of the query that is being parallelized to determine which is the reference table. The basic rule is to pick the table or index with the largest DOP.
• For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference object that determines the DOP is the table being modified by an insert, update, or delete operation. Parallel DML also adds some limits to the DOP to prevent deadlock. If the parallel DML statement includes a subquery, the subquery’s DOP is the same as the DML operation.
• For parallel DDL, the reference object that determines the DOP is the table, index, or partition being created, rebuilt, split, or moved. If the parallel DDL statement includes a subquery, the subquery’s DOP is the same as the DDL operation.

Disabling Parallel Execution

You disable parallel SQL execution with an ALTER SESSION DISABLE PARALLEL DML|DDL|QUERY statement. All subsequent DML (INSERT, UPDATE, DELETE), DDL (CREATE, ALTER), or query (SELECT) operations are executed serially after such a statement is issued. They are executed serially regardless of any PARALLEL clause that is associated with the statement or any parallel attribute that is associated with the table or indexes involved.

The following statement disables parallel DDL operations:

You can also use the ALTER statement to change the PARALLEL state of tables and indexes to NOPARALLEL.

Hints for Parallel Execution

PARALLEL: The PARALLEL hint lets you specify the desired number of concurrent servers that can be used for a parallel operation. The hint applies to the SELECT, INSERT, UPDATE, and DELETE portions of a statement as well as to the table scan portion.
NO_PARALLEL: The NO_PARALLEL hint overrides a PARALLEL specification in the table clause.
PQ_DISTRIBUTE: The PQ_DISTRIBUTE hint improves the performance of parallel join operations. The optimizer ignores the distribution hint if both tables are serial.
PARALLEL_INDEX: The PARALLEL_INDEX hint specifies the desired number of concurrent servers that can be used to parallelize index range scans for partitioned indexes.
NO_PARALLEL_INDEX: The NO_PARALLEL_INDEX hint overrides a PARALLEL attribute setting on an index to avoid a parallel index scan operation.

So what is the best technique to determine the degree of parallelism(DOP) to use?

Mr.Kyte‘s answer to this question is; “don’t, let the database do it. automatic tuning (in 9i, the default in 10g) and adaptive multiuser. Just make things “parallel” or use the “parallel hint” without a degree and let the database do it. If you have the resources, it’ll use them. If you have too many concurrent users to do a parallel 8 query, it’ll downgrade it to parallel 4 or maybe just serial.”

Another bad habit is, with hints you may use lots of parallel servers but this will not reduce responce time, usually syncronization betwwen the threads will result more waits and responce time to grow. Before starting a parallel query in the database you can check the available parallel servers by this queries;

-- how many parallel servers can be opened
FROM v$parameter
WHERE NAME LIKE ‘%paral%max%’;

-- how many parallel servers are being used by whom
SELECT a.qcsid, a.qcserial#, y.osuser, COUNT(*)
FROM v$px_session a, v$session y
WHERE y.sid = a.qcsid
AND y.serial# = a.qcserial#
GROUP BY a.qcsid, a.qcserial#, y.osuser ;

Oracle Database versions 8.1.5 and later we can also parallelize pl/sql functions. The PARALLEL_ENABLE keyword is the preferred way to mark your code as safe for parallel execution. This keyword is syntactically similar to DETERMINISTIC and it is placed after the return value type in a declaration of the function:

RETURN P1 * 2;

As a conclusion, to benefit when using parallelism option there are some important considerations, “Parallel <> fast=true, never has been, never will.” Thomas Kyte

Continue reading with Part 2

References Used :
Oracle® Database Application Developer’s Guide – Fundamentals 10g Release 2 (10.2) Chapter 7 Coding PL/SQL Procedures and Packages Parallel Query and Parallel DML
Oracle® Database Concepts 10g Release 2 (10.2) Chapter 16 Business Intelligence Overview of Parallel Execution
Oracle® Database Performance Tuning Guide 10g Release 2 (10.2) Chapter 19 Using EXPLAIN PLAN Viewing Parallel Queries with EXPLAIN PLAN
“Performance in Parallel query” thread on Asktom
“Parallel Query concerns” thread on Asktom


  1. Tuning Parallel Execution by Doug Burns

    Click to access px.pdf

  2. from asktom –

    “Here is a short quote from my book Effective Oracle by Design on this topic:

    Parallel Query

    Parallel query is suitable for a certain class of large problems: very large problems
    that have no other solution. Parallel query is my last path of action for solving a
    performance problem; it’s never my first course of action.

    Parallel Query Settings

    I will not discuss the physical setup of parallel query operations. That topic is well
    covered in both the Oracle Concepts Guide and Data Warehousing Guide. As I mentioned
    earlier, my current favorite way to set up parallelism in Oracle is using the automatic
    tuning option first introduced in Oracle8i Release 2 (version 8.1.6):
    PARALLEL_AUTOMATIC_TUNING = TRUE. With this setting, the other parallel settings are
    automatically set. Now, all I need to do is set the PARALLEL option on the table (not
    PARALLEL , just PARALLEL) and Oracle will, when appropriate, parallelize certain
    operations on that table for me. The degree of parallelism (how many processes/threads
    will be thrown at a problem) will be decided for me and vary over time as the load on the
    system varies. I have found that, for most cases, this achieves my desired goal, which is
    usually to get the best performance, with the least amount of work, in a manner that is
    most manageable. Setting a single parameter is a great way to get there.

    For the novice user wanting to play with parallel query for the first time, parallel
    automatic tuning is a good way to get started. As you develop an understanding of what
    parallel query does and how it does it, try tweaking some of the other parallel settings:

    PARALLEL_ADAPTIVE_MULTI_USER Controls whether the degree of parallelism should vary
    over time as the load on the system does; should the algorithm for assigning resources
    “adapt” to the increase in load.
    PARALLEL_EXECUTION_MESSAGE_SIZE Sets the size of the message buffers used to pass
    information back and forth between the processes executing the parallel query.
    PARALLEL_INSTANCE_GROUP Applies only to Oracle RAC configurations (Oracle Parallel
    Server, OPS, in Oracle8i and earlier). Allows you to restrict the number of instances
    that will be used to perform a parallel operation (as opposed to the number of processes
    an instance will use).
    PARALLEL_MAX_SERVERS Sets the maximum number of parallel query slaves (like dedicated
    servers but for parallel operations) your instance will ever have.
    PARALLEL_MIN_PERCENT Useful if you would like to receive an error message when you
    request a specific degree of parallelism but insufficient resources exist to satisfy that
    request. You might use this to ensure that a process that takes days unless it gets what
    it wants doesn’t run unless it gets what it wants.
    PARALLEL_MIN_SERVERS Sets the number of servers to start when the instance starts and
    to keep started permanently. Otherwise, you may need to start the parallel processes in
    response to a query.
    PARALLEL_THREADS_PER_CPU Determines the default degree of parallelism and contributes
    to the adaptive parallelism algorithms, to determine when to back off on the amount of
    parallel resources.
    RECOVERY_PARALLELISM For crash recovery, sets how many parallel threads should be
    used. This setting can be used to speed up the recovery from an instance crash.
    FAST_START_PARALLEL_ROLLBACK Sets how many processes would be available to perform a
    parallel rollback after the recovery takes place. This would be useful on systems where
    many long-running transactions are constantly processing (which would need
    correspondingly long rollback times in the event of an instance crash).”

Leave a Comment

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s