Thursday, June 6, 2013

Beginning Performance Tuning

Beginning Performance Tuning

By Arup Nanda Oracle ACE Director

http://www.oracle.com/technetwork/issue-archive/2012/12-jul/o42dba-1566567.html

http://docs.oracle.com/cd/E11882_01/server.112/e16638/toc.htm

Resolve session performance issues in Oracle Database.
It’s the middle of the night, and you get a frantic call from someone complaining that the database is slow. The caller demands to know why—and what you’re going to do about it. Sound familiar? If it does, you are not alone. High performance is a common expectation of database system users: they get very unhappy when they don’t get it, and they are usually not shy about letting you know. What should you do next? In this article, you will learn some techniques for troubleshooting Oracle Database performance issues.
To use the scripts in this article, you need to create some tables in a test schema and access some dynamic performance views. The database user SYS has all privileges to access the views, so you need the password for the SYS user. The script for setting up the example tables is available in thesidebar.

Session State

Setup


To set up the test cases for this article, execute the SQL in this “Setup” section. The SQL assumes that you have access to the SYS user, you can create a user called ARUP (which means you don’t have a user with the same name), and you have a tablespace called USERS with at least 64 KB of free space.

Connect as SYS, and execute the following:

connect sys/<password> as sysdba

create user arup identified by arup
default tablespace users
/

alter user arup quota unlimited on users
/ 

-- now connect as arup
connect arup/arup

create table t1
(
 col1 number,
 col2 varchar2(1)
)
/
insert into t1 values
(1,’z’) 

/
commit
/
Before you start troubleshooting why a database is slow, you have to first understand that the database itself is never slow or fast—it has a constant speed. The sessions connected to the database, however, slow down when they hit a bump in the road. To resolve a session performance issue, you need to identify the bump and remove it. Fortunately, it’s very easy to do this in most cases. 
The first step to resolving a session performance issue is to ascertain what that database session is doing now. An Oracle Database session is always in one of three states:
  1. Idle. Not doing anything—just waiting to be given some work.
  2. Processing. Doing something useful—running on the CPU.
  3. Waiting. Waiting for something, such as a block to come from disk or a lock to be released.
If a session is waiting to be given work (idle), it’s really not slow at all—it just has nothing to do. If a session is waiting for some resource, such as a block or a lock, it has stopped processing. Until it gets that resource, the session will continue to wait. When it gets that resource, it does some processing and then moves on to the next resource it needs, waits for that to be available, and starts processing . . . and the cycle goes on until the session has nothing else to do. If it waits for resources often, the session will appear slow. But it’s not really slow—it’s just following a pattern of go, stop, go again, stop again, and so on. Your mission is to find and eliminate the “stop” issues in the session.
How difficult is it to get information about what’s causing the session to stop? It’s actually veryeasy: Oracle Database is instrumented to talk about what the database sessions are doing. All you need to do is to listen attentively or, more precisely, look for that information in the right place, and that place is a view called V$SESSION. Everything you need for your analysis is in this view.
To explain how to use the V$SESSION view, I will use a very common scenario—row locking—as an example. To follow along, first set up the previously mentioned tables as described in the online version of this article. Then connect as user ARUP from two different sessions. From the first session, issue the following SQL statement:

update t1
set col2 = 'x' where col1 = 1;

The output will show “1 row updated,” indicating that the row was updated. Do not issue a COMMIT after the statement. By not committing, you will force the session to get and hold a lock on the first row of the T1 table. Now go to the second session and issue the following SQL statement:

update t1
set col2 = 'y' 
where col1 = 1;

This statement will hang. Why? The answer is simple: the first session holds a lock on the row, which causes the second session to hang and the user to complain that the session is slow. To know what the second session is doing, the first thing you need to check is the STATE column in V$SESSION:

select sid, state
from v$session
where username = 'ARUP'; 

SID   STATE
————  ——————————————————
3346  WAITING
2832  WAITED KNOWN TIME

Study the output carefully. Session 3346 (in the SID column) indicates that it is waiting for something—and therefore not working. That should be your first clue that the session is experiencing one of those performance bumps in the road. But before you can determine what the session is waiting for, let’s study the state of session 2832 in the output, which shows that it waited for some known amount of time earlier. The important point is that session 2832 is not waiting for anything right now, meaning that it’s working productively.
Next, let’s see what the second session (3346) is waiting for. That information is readily available in the EVENT column in the same V$SESSION view. The EVENT column not only shows an event a session is waiting for currently, but also shows an event a session has waited for earlier. The query against V$SESSION in Listing 1 displays information from the EVENT column for both sessions.
Code Listing 1: Query for displaying sessions, session state, and events

select sid, state, event
from v$session
where username = 'ARUP'; 

SID   STATE              EVENT
————— —————————————————  ————————————————————————————
2832  WAITED KNOWN TIME  SQL*Net message from client
3346  WAITING            enq: TX - row lock contention

The output in Listing 1 shows that session 3346 is waiting right now for an event: “enq: TX – row lock contention”—short for “enqueue for transaction-level lock on row” or, in plain English, a row-level lock. The session is waiting because it wants to lock one or more rows, but another session has already placed locks on the row or rows. Unless that other session commits or rolls back its transaction, session 3346 will not get the lock it needs and will have no choice but to wait. On the other hand, the state of session 2832, “WAITED KNOWN TIME,” means that it is working—not waiting—right now. It was, however, waiting earlier for an event called “SQL*Net message from client” (I will discuss this specific event later). There is one very important lesson in these results: you cannot look at the EVENT column alone to find out what the session is waiting for. You must look at the STATE column first to determine whether the session is waiting or working and then inspect the EVENT column.
After you determine that a session is waiting for something, the next thing you need to find out is how long the session has been waiting. A very long wait usually indicates some sort of bottleneck. Where can you get information on the length of the waiting period? The answer is right there in the V$SESSION view, in the SECONDS_IN_WAIT column.
Getting the amount of time a session has been waiting makes sense for sessions that are waiting right now, but what about the sessions that are working now? Recall that the EVENT column shows not only the event a session is experiencing now but also the last wait event the session has experienced. Another column—WAIT_TIME—in the same V$SESSION view shows how long that wait lasted. (Note that WAIT_TIME is shown in centiseconds [hundredths of a second].)
Now that you know how to get information on the sessions waiting and working, let’s put all the information together in a single query, shown in Listing 2. It clearly shows the state of the sessions: whether they are working or waiting; if they are working, what they were waiting for earlier and for how long; and if they are waiting, what for and for how long.
Code Listing 2: Query for displaying sessions, session state, and wait details

col "Description" format a50
select sid,
        decode(state, 'WAITING','Waiting',
                'Working') state,
        decode(state,
                'WAITING',
                'So far '||seconds_in_wait,
                'Last waited '||
                wait_time/100)||
        ' secs for '||event
        "Description"
from v$session
where username = 'ARUP';

Output:

SID   STATE       Description
————— ——————————  ———————————————————————————————————————————————————————
2832  Working     Last waited 2029 secs for SQL*Net message from client
3346  Waiting     So far 743 secs for enq: TX - row lock contention
4208  Waiting     So far 5498 secs for SQL*Net message from client

Idle Event

Note the details of session 4208 in Listing 2; it’s currently waiting 5,498 seconds for a “SQL*Net message from client” event. Recall from the previous section that an Oracle Database session can be in one of the three states: working, waiting for a resource, or waiting for work. But how can a session determine whether it is idle? It will expect to be given work by clients via SQL*Net, but there is no way for it to know in advance if any work is coming from the clients. All it can do is wait for some instruction coming through SQL*Net. Until then, it will have nothing else to do but eagerly stare at the SQL*Net interface, and this condition is reported as “SQL*Net message from client” in the V$SESSION view’s EVENT column, which is practically the same thing as just being idle.
You can disregard another EVENT column value, “rdbms ipc message,” because it is an event status for sessions that are idle. Note that an idle session does not show IDLE as the STATE column value; it still shows “Waiting.” You have to check the EVENT column to determine whether the session is truly idle.
You may be tempted to modify the query in Listing 2 to filter sessions that include the “SQL*Net message from client” and “rdbms ipc message” idle events. Although you can do that, I highly discourage doing that, for multiple reasons. First, not all instances of the “SQL*Net message from client” event indicate that the session is idle. Consider the possibility that the network might be truly slow, in which case the session will also wait for these events. Remember, the session doesn’t have the ability to determine whether the client is truly idle or is sending instructions that are slow or stuck in the network. All it can do is wait, and it will wait with the “SQL*Net message from client” event. Second, idle events may provide some clues to Oracle Support about what else is going on inside a session. So I recommend displaying these “idle” EVENT values.

Diagnosis of Locking

The output of Listing 2 provides enough information to enable you to make a diagnosis about the performance of these three sessions. Session 4208 is idle, so any complaints that session 4208 is slow just aren’t related to the database. Any performance issues related to this session could be related to a bug in the code that’s going through an infinite loop or high CPU consumption on the application server. You can redirect the performance troubleshooting focus toward the application client.
The story of session 3346 is different. This session is truly a bottleneck to the application. Now that you know why this session appears slow—it is waiting for a row lock—the next logical question is which session holds that lock. The answer is also found in—I hope you guessed it—the V$SESSION view, in, more specifically, the BLOCKING_SESSION column. (Note that in an Oracle Real Application Clusters [Oracle RAC] environment, the blocking session may exist in a different instance. In such a case, the blocking instance is displayed in the V$SESSION view’s BLOCKING_INSTANCE column.)
You can find out the blocking session and instance by issuing the following SQL statement:

select 
  blocking_session B_SID,
  blocking_instance B_Inst
from v$session
where sid = 3346;

B_SID   B_INST
——————  ———————
 2832      1

The output shows clearly that SID 2832 is holding the lock that SID 3346 is waiting for. Now you can follow a cause/effect relationship between the session in which an update to a row is being blocked and the session that holds the lock on that row.
You can find the specific row that is locked by first finding the table containing that row. To find that table, use the same V$SESSION view; in this case, the information is in the ROW_WAIT_OBJ# column, which shows the object number of the table whose row is being locked. You can then get the name of the table from the DBA_OBJECTS view, using this object number, as shown in Listing 3.
Code Listing 3: Getting row lock information

select row_wait_obj#,
       row_wait_file#,
       row_wait_block#,
       row_wait_row#
from v$session 
where sid = 3346;

ROW_WAIT_OBJ#  ROW_WAIT_FILE#  ROW_WAIT_BLOCK#  ROW_WAIT_ROW#
—————————————  ——————————————  ———————————————— ——————————————
241876         1024            2307623          0

To get the object information:

select owner, object_type, object_name, data_object_id
from dba_objects
where object_id = 241876;

OWNER  OBJECT_TYPE  OBJECT_NAME   DATA_OBJECT_ID
—————  ———————————— ————————————  ——————————————
ARUP   TABLE        T1                    241877

The output shows that some row in the T1 table is the point of the row lock contention. But which specific row is locked? That data is available in three V$SESSION view columns—ROW_WAIT_FILE#, ROW_WAIT_BLOCK#, and ROW_WAIT_ROW#—which show the relative file ID, the block ID in that file, and the row’s slot number inside that block, respectively, for that specific row. Using this information, you can identify the ROWID of the row. The ROWID, the physical address of every row in an Oracle Database instance, can be used to uniquely identify a row.
Listing 4 shows a SQL script that enables you to select the specific blocking row from the table with the information gathered so far. Save this script in a file named rowinfo.sql. The script expects the input in the following order: owner, table name, object#, file#, block#, and row#. You can call this script and pass all the requested parameters by copying and pasting the corresponding output from Listing 3.
Code Listing 4: Finding the row information

REM Filename: rowinfo.sql
REM This shows the row from the table when the
REM components of ROWID are passed. Pass the
REM following in this exact order
REM  1. owner 
REM  2. table name 
REM  3. data_object_id
REM  4. relative file ID 
REM  5. block ID
REM  6. row Number 
REM
select *
from &1..&2
where rowid =
        dbms_rowid.rowid_create (
                rowid_type      =>  1, 
                object_number   => &3,
                relative_fno    => &4,
                block_number    => &5,
                row_number      => &6
        )
/

SQL> @rowinfo ARUP T1 241877 1024 2307623 0

COL1  C
————— —
  1   x

The output in Listing 4 shows the specific row on which a lock is being requested but that is locked by another session. So far you have identified not only the source session of the locking but the specific row being locked as well.
Is it possible that the session holding the lock (SID 2832) is somehow disconnected from the client? That can occur in connection pools or when users access the database with thick-client tools such as Oracle SQL Developer. After you identify the session holding the lock, you may want to wait until it commits or rolls back the transaction. Either action releases the lock.
In the case of a dead connection, you may alternatively decide to kill the session, which will force a rollback releasing the locks held by the blocking session and enabling the waiting sessions to continue. Occasionally the problem can be pretty simple: for instance, someone issued an UPDATE statement from a thick-client tool but forgot to commit and thus caused every session to wait for those updated rows. Identifying that blocking session enables you to send a gentle reminder to rectify that situation immediately.

More on the Session

In many troubleshooting situations, just knowing the SID of each session is not enough. You may need to know other details, such as the client machine the session is connecting from, the user (of both the database and the operating system), and the service name. All of this information is also readily available in the same V$SESSION view you have been using. Let’s briefly examine the columns that provide that information, by running the script shown in Listing 5.
Code Listing 5: Sessions from a specific user

select SID, osuser, machine, terminal, service_name, 
       logon_time, last_call_et
from v$session
where username = 'ARUP'; 

SID   OSUSER  MACHINE   TERMINAL  SERVICE_NAME  LOGON_TIME LAST_CALL_ET
————— ——————  ———————   ————————  ————————————  —————————— ————————————
3346  oradb   prodb1    pts/5     SYS$USERS     05-FEB-12          6848
2832  oradb   prodb1    pts/6     SERV1         05-FEB-12          7616
4408  ANANDA  ANLAP     ANLAP     ADHOC         05-FEB-12             0

OSUSER. The operating system user as which the client is connected. The output indicates that session 4408 is connected from the ANLAP machine, where a Windows user, ANANDA, has logged in.
MACHINE. The name of the machine where the client is running. This could be the database server itself. For two of the sessions, the machine name shows up as “prodb1.” Session 4408 runs on a different machine—ANLAP—presumably a laptop.
TERMINAL. If the session is connected from a UNIX server, this is the terminal where it runs.
LOGON_TIME. This shows when the session was first connected to the Oracle Database instance.
Using the columns shown in Listing 5, you can get very detailed information on a user’s sessions.
Suppose you receive a complaint that the applications running on the application server named appsvr1 are experiencing performance issues. Listing 6 shows a query against the V$SESSION view—including columns you’ve used in previous queries in this article—for the sessions connected from that machine and the output.
Code Listing 6: Session waits for a specific machine

col username format a5
col program format a10
col state format a10
col last_call_et head 'Called|secs ago' format 999999
col seconds_in_wait head 'Waiting|for secs' format 999999
col event format a50
select sid, username, program,
        decode(state, 'WAITING', 'Waiting',
                'Working') state,
last_call_et, seconds_in_wait, event
from v$session
where machine = 'appsvr1'
/
                                       Called      Waiting
SID   USERNAME  PROGRAM       STATE    secs ago    for secs   EVENT 
————— ———————   ———————————   ———————  —————————   ————————   ——————————————————
2832  ARUP      sqlplus.exe   Waiting       152         151   SQL*Net message 
                                                              from client
3089  ARUP      sqlplus.exe   Waiting       146         146   enq: TX - row lock 
                                                              contention
3346  ARUP      sqlplus.exe   Working        18          49   SQL*Net message 
                                                              from client

From the output, you can easily see that three sessions are connected from the appsvr1 application server. All of them are running SQL*Plus (as shown in the PROGRAM column). SID 3346 is the only one that is working (indicated by “Working” in the STATE column). Because it’s working, the EVENT column shows the last time the session waited. The wait time in this case is meaningless, because the session is not waiting but actually working. The “Called secs ago” column (representing the “last_call_et” column in V$SESSION) displays 18, which means that the session made a SQL call 18 seconds ago.
The other sessions are waiting. SID 3089 is waiting for a row lock. From the output, you can see that the session has been waiting for 146 seconds and that it also made its last SQL call 146 seconds ago. This indicates that the session has been waiting for that particular lock ever since it made that SQL call.
Finally, session 2832 is also waiting; in this case, it is waiting with a “SQL*Net message from client” event, which means it is idle, waiting to be given some work. The session issued its last SQL statement 152 seconds ago and has been idle for 151 seconds.
Armed with this information, you can diagnose performance issues very accurately. You can tell the complaining user that of the three sessions connected from the appsvr1 application server, one session is idle, one is working, and one is waiting for a lock. The user is probably referring to the slowness of this last session. Now you know the reason and how you can rectify it.

Getting the SQL

Another key piece of performance tuning information is the SQL statement a session is executing, which will provide more insights into the workings of the session. The same V$SESSION view also shows the SQL statement information. The SQL_ID column in the V$SESSION view shows the ID of the last SQL statement executed. You can get the text of that SQL statement from the V$SQL view, using the SQL_ID value. Here is an example of how I have identified the SQL statement executed by the session that appears slow to the user.

select sql_id
from v$session
where sid = 3089;

SQL_ID
—————————————————
g0uubmuvk4uax

set long 99999
select sql_fulltext
from v$sql
where sql_id = 'g0uubmuvk4uax';
SQL_FULLTEXT
————————————————————————————————————————
update t1 set col2 = 'y' where col1 = 1

Data Access Issues

I have used row-level locking as the cause of a slowdown in this article. Although locking-related contention is a very common cause, it is not the only cause of performance problems. Another major cause of contention is disk I/O. When a session retrieves data from the database data files on disk to the buffer cache, it has to wait until the disk sends the data. This wait shows up for that session as “db file sequential read” (for index scans) or “db file scattered read” (for full-table scans) in the EVENT column, as shown below:

select event
from v$session
where sid = 3011;

EVENT
—————————————————————————
db file sequential read

When you see this event, you know that the session is waiting for I/O from the disk to complete. To make the session go faster, you have to reduce that waiting period. There are several ways to reduce the wait:
  1. Reduce the number of blocks retrieved by the SQL statement. Examine the SQL statement to see if it is doing a full-table scan when it should be using an index, if it is using a wrong index, or if it can be rewritten to reduce the amount of data it retrieves.
  2. Place the tables used in the SQL statement on a faster part of the disk.
  3. Consider increasing the buffer cache to see if the expanded size will accommodate the additional blocks, therefore reducing the I/O and the wait.
  4. Tune the I/O subsystem to return data faster.

Next Steps

There are other options as well, but the preceding ones are the most common remediation techniques. The exact activity you undertake depends on your specific situation, but the first technique—reducing the number of blocks retrieved by a SQL statement—almost always works.

When you think about tuning to reduce the number of blocks, you can look at the SQL statement to see which table is being selected from. But what if you see two or more tables in the statement? How do you determine which table is causing the wait?
To find the table causing a wait, you will again use the V$SESSION view. The view’s P1 and P2 columns provide information about the segment the session is waiting for. Listing 7 shows a query of P1 and P2, and the output.
Code Listing 7: Checking data access waits

select SID, state, event, p1, p2
from v$session
where username = 'ARUP'; 

SID  STATE     EVENT                   P1 P2
———— ———————   ——————————————————————— —— ————
2201 WAITING   db file sequential read  5 3011

The P1 column shows the file ID, and the P2 column shows the block ID. From that information in the result in Listing 7, you can get the segment name from the extent information in DBA_EXTENTS, as shown below:

select owner, segment_name
from dba_extents
where file_id = 5
and 3011 between block_id 
and block_id + blocks;

OWNER  SEGMENT_NAME
—————— —————————————
ARUP   T1

This shows that the T1 table, owned by ARUP, is being selected from by the disk in the session. You should direct your attention to this table for tuning. You can move the table to a high-speed disk for faster I/O, or, alternatively, you can focus on making I/O in this table faster by making changes that affect this table, such as creating new indexes, creating materialized views, or building a result cache.

Conclusion

In summary, this article presented the following steps for starting a successful performance tuning session:
  1. Check whether the session is working or waiting. If the latter, determine what it is waiting for and how long it has been waiting.
  2. Compare the waiting period of the session with how long ago it issued a SQL call.
  3. If the cause of the wait is a lock contention, find the session holding the lock and get the details of the session. (If the session holding the lock is an orphan session, you may want to kill it to release the lock.)
  4. Find the SQL statement the session is executing.
  5. If the session is waiting for I/O, find out which segment (table, materialized view, index, and so on) the I/O is waiting for.
The techniques presented in this article will help you resolve about 20 percent of the performance issues you encounter as a DBA. Oracle Database is instrumented to provide information on its inner workings so that you can zero in on the exact cause of an issue—all you have to do is listen.

I sincerely hope that this article has helped you realize how simple it is to diagnose some common but seemingly thorny performance issues in Oracle Database by identifying the right information sources. Happy tuning!

No comments:

Post a Comment