Analyzing and interpreting what causes sessions to wait is an important method to determine where time is spent. In Oracle RAC, the wait time is attributed to an event which reflects the exact outcome of a request. For example, when a session on an instance is looking for a block in the global cache, it does not know whether it will receive the data cached by another instance or whether it will receive a message to read from disk.
The wait events for the global cache convey precise information and waiting for global cache blocks or messages is:
■ Summarized in a broader category called Cluster Wait Class
■ Temporarily represented by a placeholder event which is active while waiting for a block,
for example:
■ gc current block request
■ gc cr block request
■ Attributed to precise events when the outcome of the request is known, for example:
■ gc current block 3-way
■ gc current block busy
■ gc cr block grant 2-way
In summary, the wait events for Oracle RAC convey information valuable for performance analysis. They are used in Automatic Database Diagnostic Monitor (ADDM) to enable precise diagnostics of the effect of cache fusion.
Monitoring Performance by Analyzing GCS and GES Statistics
In order to determine the amount of work and cost related to inter-instance messaging and contention, examine block transfer rates, remote requests made by each transaction, the number and time waited for global cache events as described under the following headings:
■ Analyzing the Effect of Cache Fusion in Oracle RAC
■ Analyzing Performance Using GCS and GES Statistics
Analyzing the Effect of Cache Fusion in Oracle RAC
The effect of accessing blocks in the global cache and maintaining coherency is represented by:
■ The Global Cache Service (GCS) statistics for current and cr blocks, for example,
gc current blocks received,
gc cr blocks received, and so on
■ The GCS wait events, for gc current block 3-way, gc cr grant 2-way, and so on.
The response time for cache fusion transfers is determined by the messaging and processing times imposed by the physical interconnect components, the IPC protocol and the GCS protocol. It is not affected by disk I/O factors other than occasional log writes. The cache fusion protocol does not require I/O to data files in order to guarantee cache coherency and Oracle RAC inherently does not cause any more I/O to disk than a nonclustered instance.
Analyzing Performance Using GCS and GES Statistics
This section describes how to monitor GCS performance by identifying data blocks and objects which are frequently used (hot) by all instances. High concurrency on certain blocks may be identified by GCS wait events and times.
The gc current block busy wait event indicates that the access to cached data blocks was delayed because they were busy either in the remote or the local cache. This could be caused by any of the following:
■ The blocks were pinned
■ The blocks were held up by sessions
■ The blocks were delayed by a log write on a remote instance
■ A session on the same instance was already accessing a block which was in transition between instances and the current session needed to wait behind it (for example, gc current block busy) Use the V$SESSION_WAIT view to identify objects and data blocks with contention. The GCS wait events contain the file and block number for a block request in p1 and p2, respectively. An additional segment statistic, gc buffer busy, has been added to quickly determine the busy objects without having to query the V$SESSION_WAIT view mentioned earlier. The AWR infrastructure provides a view of active session history which can also be used to trace recent wait events and their arguments. It is therefore useful for hot block analysis. Most of the reporting facilities used by AWR and Statspack contain the object statistics and cluster wait class category, so that sampling of the views mentioned earlier is largely unnecessary. It is advisable to run ADDM on the snapshot data collected by the AWR infrastructure to obtain an overall evaluation of the impact of the global cache. The advisory will also identify the busy objects and SQL highest cluster wait time.
Analyzing Cache Fusion Transfer Impact Using GCS Statistics
This section describes how to monitor GCS performance by identifying objects read and modified frequently and the service times imposed by the remote access. Waiting for blocks to arrive may constitute a significant portion of the response time, in the same way that reading from disk could increase the block access delays, only that cache fusion transfers in most cases are faster than disk access latencies. The following wait events indicate that the remotely cached blocks were shipped to the local instance without having been busy, pinned or requiring a log flush:
■ gc current block 2-way
■ gc current block 3-way
■ gc cr block 2-way
■ gc cr block 3-way
Note: Oracle recommends using ADDM and AWR. However, Statspack is available for backward compatibility. Statspack provides reporting only. You must run Statspack at level 7 to collect statistics related to block contention and segment block waits.
The object statistics for gc current blocks received and gc cr blocks received enable quick identification of the indexes and tables which are shared by the active instances. As mentioned earlier, creating an ADDM analysis will, in most cases, point you to the SQL statements and database objects that could be impacted by inter-instance contention. Any increases in the average wait times for the events mentioned in the preceding list could be caused by the following occurrences:
■ High load: CPU shortages, long run queues, scheduling delays
■ Misconfiguration: using public instead of private interconnect for message and block traffic If the average wait times are acceptable and no interconnect or load issues can be diagnosed, then the accumulated time waited can usually be attributed to a few SQL statements which need to be tuned to minimize the number of blocks accessed.
The column CLUSTER_WAIT_TIME in V$SQLAREA represents the wait time incurred by individual SQL statements for global cache events and will identify the SQL which may need to be tuned.
Analyzing Response Times Based on Wait Events
Most global cache wait events that show a high total time as reported in the AWR and Statspack reports or in the dynamic performance views are normal and may present themselves as the top database time consumers without actually indicating a problem. This section describes frequent wait events that you should be aware of when interpreting performance data. If user response times increase and a high proportion of time waited is for global cache, then you should determine the cause. Most reports include a breakdown of events sorted by percentage of the total time. It is useful to start with an ADDM report, which analyzes the routinely collected performance statistics with respect to their impact, and points to the objects and SQL contributing most to the time waited, and then moves on to the more detailed reports produced by AWR and Statspack. Wait events for Oracle RAC include the following categories:
■ Block-Related Wait Events
■ Message-Related Wait Events
■ Contention-Related Wait Events
■ Load-Related Wait Events
Block-Related Wait Events The main wait events for block-related waits are:
■ gc current block 2-way
■ gc current block 3-way
■ gc cr block 2-way
■ gc cr block 3-way
The block-related wait event statistics indicate that a block was received as either the result of a 2-way or a 3-way message, that is, the block was sent from either the resource master requiring 1 message and 1 transfer, or was forwarded to a third node from which it was sent, requiring 2 messages and 1 block transfer.
Message-Related Wait Events
The main wait events for message-related waits are:
■ gc current grant 2-way
■ gc cr grant 2-way
The message-related wait event statistics indicate that no block was received because it was not cached in any instance. Instead a global grant was given, enabling the requesting instance to read the block from disk or modify it. If the time consumed by these events is high, then it may be assumed that the frequently used SQL causes a lot of disk I/O (in the event of the cr grant) or that the workload inserts a lot of data and needs to find and format new blocks frequently (in the event of the current grant). Contention-Related Wait Events
The main wait events for contention-related waits are:
■ gc current block busy
■ gc cr block busy
■ gc buffer busy acquire/release
The contention-related wait event statistics indicate that a block was received which was pinned by a session on another node, was deferred because a change had not yet been flushed to disk or because of high concurrency, and therefore could not be shipped immediately. A buffer may also be busy locally when a session has already initiated a cache fusion operation and is waiting for its completion when another session on the same node is trying to read or modify the same data. High service times for blocks exchanged in the global cache may exacerbate the contention, which can be caused by frequent concurrent read and write accesses to the same data. The gc current block busy and gc cr block busy wait events indicate that the local instance that is making the request did not immediately receive a current or consistent read block. The term busy in these events’ names indicates that the sending of the block was delayed on a remote instance. For example, a block cannot be shipped immediately if Oracle Database has not yet written the redo for the block’s changes to a log file. In comparison to block busy wait events, a gc buffer busy event indicates that Oracle Database cannot immediately grant access to data that is stored in the local buffer cache. This is because a global operation on the buffer is pending and the operation has not yet completed. In other words, the buffer is busy and all other processes that are attempting to access the local buffer must wait to complete. The existence of gc buffer busy events also means that there is block contention that is resulting in multiple requests for access to the local block. Oracle Database must queue these requests. The length of time that Oracle Database needs to process the queue depends on the remaining service time for the block. The service time is affected by the processing time that any network latency adds, the processing time on the remote and local instances, and the length of the wait queue. The average wait time and the total wait time should be considered when being alerted to performance issues where these particular waits have a high impact.
Usually, either interconnect or load issues or SQL execution against a large shared working set can be found to be the root cause.
Load-Related Wait Events The main wait events for load-related waits are:
■ gc current block congested
■ gc cr block congested
The load-related wait events indicate that a delay in processing has occurred in the GCS, which is usually caused by high load, CPU saturation and would have to be solved by additional CPUs, load-balancing, off loading processing to different times or a new cluster node.For the events mentioned, the wait time encompasses the entire round trip from the time a session starts to wait after initiating a block request until the block arrives.
Thank you for giving your valuable time to read the above information.
If you want to be updated with all our articles send us the Invitation or Follow us:
Skant Gupta’s LinkedIn: www.linkedin.com/in/skantali/
Joel Perez’s LinkedIn: Joel Perez’s Profile
Anuradha’s LinkedIn: Anuradha’s Profile
LinkedIn Group: Oracle Cloud DBAAS
Facebook Page: OracleHelp