Monday, 27 June 2011

Buffer Busy Waits Tuning

Recently I was involved in resolving an ongoing degrading performance issue for an application. It was at a state whereby the users were just frustrated and the business was not happy at all.
When it ended up in my court, the first thing I did was to identify the window when the user experience was really bad. Hence, I had to liaise with app vendors to undertand the application a bit and what else was running on it.
It was finally understood that there is a small etl job that runs on the database during business hours regularly. It was one of the business requirments to fulfill a reporting need. Hence, stopping that etl process was out of the question. Also, the application performance slows down when the etl kicks off and the response time gets better when the etl is finished. Also, the etl used to run 4 times in the business hours.
After analysing the statspack report of the database since it was 9i, I found that the top wait event when in the slowness window was 'Buffer Busy Waits' apart from IO:

Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read                         2,198,322      13,940    40.34
buffer busy waits                               1,273,203       9,852    28.51
CPU time                                                        4,241    12.27
enqueue                                             1,170       3,414     9.88
latch free                                         97,147       1,542     4.46

Also the buffer busy waits are not in the top 5 wait events when performance issue is not there.
I tried to find whether there was a link when users were complaining about database performance and buffer busy waits and I did find a link. Here is an output of my query from statspack to check this trend:

SNAP_TIME         CLASS                WAIT_COUNT         TIME
----------------- -------------------- ---------- ------------
21-06-11 00:13:06 2nd level bmb                 1            3
21-06-11 00:13:06 data block               74,340      338,940
21-06-11 00:13:06 segment header               18           33
21-06-11 00:13:06 undo block                   42            1
21-06-11 00:43:05 data block               34,519      181,783
21-06-11 00:43:05 undo header                   2            5
21-06-11 01:13:09 data block                2,733        2,744
21-06-11 01:43:12 data block                  144          344
21-06-11 01:43:12 undo header                   2            0
21-06-11 02:13:16 data block                  187          208
21-06-11 02:43:20 data block                   70          155
21-06-11 03:13:25 data block              160,186      318,329
21-06-11 03:13:25 segment header                2            0
21-06-11 03:13:25 undo block                  170           25
21-06-11 03:43:28 data block               40,802       87,378
21-06-11 04:13:32 data block               21,974       19,604
21-06-11 04:13:32 segment header                1            2
21-06-11 04:43:36 data block                   17           47
21-06-11 05:13:40 data block                3,523        2,755
21-06-11 05:13:40 segment header                4            1
21-06-11 05:43:44 data block               15,238       11,911
21-06-11 05:43:44 segment header                6            5
21-06-11 05:43:44 undo header                   7            0
21-06-11 06:13:47 data block                  259          374
21-06-11 06:43:51 data block                  181          402
21-06-11 06:43:51 undo header                   6            2
21-06-11 07:13:56 data block               42,250       22,239
21-06-11 07:13:56 undo header                   3            0
21-06-11 07:43:59 data block              158,719       91,015
21-06-11 08:14:04 1st level bmb                 3            1
21-06-11 08:14:04 2nd level bmb                 1            3
21-06-11 08:14:04 data block              542,143      761,181
21-06-11 08:14:04 segment header                9           14
21-06-11 08:14:04 undo block                   16            2
21-06-11 08:14:04 undo header                  73            0
21-06-11 08:44:09 data block             ########    1,109,580
21-06-11 08:44:09 segment header                2            5
21-06-11 08:44:09 undo block                   65            8
21-06-11 08:44:09 undo header                   9           10
21-06-11 09:14:12 data block             ########      930,479
21-06-11 09:14:12 segment header                3            4
21-06-11 09:14:12 undo block                   20            1
21-06-11 09:14:12 undo header                   4            6
21-06-11 09:44:19 data block             ########      981,900
21-06-11 09:44:19 segment header                1            3
21-06-11 09:44:19 undo block                  115           53
21-06-11 09:44:19 undo header                  34            7
21-06-11 10:14:22 data block              982,942      701,688
21-06-11 10:14:22 segment header                6            7
21-06-11 10:14:22 undo block                   43           32
21-06-11 10:14:22 undo header                   3            5
21-06-11 10:44:26 data block              522,797      342,624
21-06-11 10:44:26 segment header                2            5
21-06-11 10:44:26 undo block                   14            0
21-06-11 10:44:26 undo header                   4            0
21-06-11 11:14:32 1st level bmb                 8            4
21-06-11 11:14:32 2nd level bmb                 6            3
21-06-11 11:14:32 data block              905,206    1,077,409
21-06-11 11:14:32 segment header               46           50
21-06-11 11:14:32 undo block                   62            9
21-06-11 11:14:32 undo header                   7            6
21-06-11 11:44:51 data block             ########    2,316,365
21-06-11 11:44:51 segment header                3           20
21-06-11 11:44:51 undo block                  186          293
21-06-11 11:44:51 undo header                   9           46
21-06-11 12:14:55 data block              153,829      130,860
21-06-11 12:14:55 undo block                   15           10
21-06-11 12:14:55 undo header                   2            1
21-06-11 12:44:59 data block              152,903       84,885
21-06-11 12:44:59 undo block                   13            0
21-06-11 12:44:59 undo header                   2            0
21-06-11 13:15:01 data block               37,928       37,543
21-06-11 13:15:01 segment header               10            4
21-06-11 13:15:01 undo header                   9            7
21-06-11 13:45:06 data block               28,441       12,470
21-06-11 13:45:06 undo header                   4            0
21-06-11 14:15:09 data block              595,511      339,329
21-06-11 14:15:09 undo block                   19            2
21-06-11 14:15:09 undo header                  27            7
21-06-11 14:45:16 data block               24,499       12,706
21-06-11 14:45:16 undo block                   83          130
21-06-11 14:45:16 undo header                  33           18
21-06-11 15:15:18 1st level bmb                 3            3
21-06-11 15:15:18 2nd level bmb                 2            3
21-06-11 15:15:18 data block             ########    2,149,819

With the above output, users were complaining about database performance when when the buffer busy waits was high. Also, the contention in on data blocks.
After running some monitoring script to find out which object was having this, it was identified that there was a massive table. Also, the database default block size was 32K meaning having alot of rows per block. To make it worse, the table was sitting on a local managed tablespace with manual segment space management. Another interesting finding during the analysis was that the reason code for buffer busy wait event was 300 (Block is being read by another session, and no other suitable block image was found, so we wait until the read is completed. This may also occur after a buffer cache assumed deadlock. The kernel can't get a buffer in a certain amount of time and assumes a deadlock. Therefore it will read the CR version of the block. ) It simply indicated we were having hot block contention.

Hence, my recommnedation to fix the issue was to create another 8K blocksize tablespace which had 'automatic segment space managment' and move the table with the issue to it. Once the migration was done, the buffer busy waits disappeared from the top 5 wait events and users were finding the application response time normal even when the etl was running. The business was happy as well.

The buffer busy wait events was also showing better results after the change:

SNAP_TIME         CLASS                WAIT_COUNT         TIME
----------------- -------------------- ---------- ------------
28-06-11 00:04:56 data block               40,959       90,487
28-06-11 00:04:56 undo block                    5            0
28-06-11 00:34:59 data block               59,194      179,264
28-06-11 00:34:59 undo block                   18            9
28-06-11 01:05:03 data block                1,459        2,088
28-06-11 01:35:06 data block                  145          451
28-06-11 02:05:11 data block                  235          503
28-06-11 02:35:14 data block                  144          516
28-06-11 03:05:19 data block               16,663       20,802
28-06-11 03:35:22 2nd level bmb                 1            4
28-06-11 03:35:22 data block              140,303      404,557
28-06-11 03:35:22 segment header                7           10
28-06-11 03:35:22 undo block                   39           58
28-06-11 03:35:22 undo header                   6           17
28-06-11 04:05:26 data block               31,410       24,654
28-06-11 04:35:29 data block                4,376        6,528
28-06-11 05:05:33 data block                9,393       14,982
28-06-11 05:05:33 segment header               12            8
28-06-11 05:05:33 undo block                    2            0
28-06-11 05:35:38 undo header                   2            0
28-06-11 06:05:41 data block                  266          138
28-06-11 06:35:45 data block              147,655      295,869
28-06-11 06:35:45 segment header                3            1
28-06-11 06:35:45 undo block                    8            0
28-06-11 07:05:49 data block                9,572       11,383
28-06-11 07:05:49 segment header                2            2
28-06-11 07:35:53 data block              141,267      280,172
28-06-11 07:35:53 undo block                    5            1
28-06-11 08:05:57 data block                1,239        1,913
28-06-11 08:05:57 segment header                2            2
28-06-11 08:05:57 undo header                   2            0
28-06-11 08:36:00 data block                8,436       15,255
28-06-11 08:36:00 segment header                3            2
28-06-11 08:36:00 undo header                  11            0
28-06-11 09:06:07 1st level bmb                 3            3
28-06-11 09:06:07 2nd level bmb                 2            0
28-06-11 09:06:07 data block              164,876      520,704
28-06-11 09:06:07 segment header               28           39
28-06-11 09:06:07 undo block                   18            3
28-06-11 09:06:07 undo header                   7            0
28-06-11 09:36:10 data block               15,731       45,259
28-06-11 09:36:10 undo block                    4            5
28-06-11 09:36:10 undo header                   1            2
28-06-11 10:06:19 data block               22,449       22,406
28-06-11 10:06:19 undo block                   11            6
28-06-11 10:06:19 undo header                  19           10
28-06-11 10:36:20 data block                9,059       14,184
28-06-11 10:36:20 segment header                1            4
28-06-11 10:36:20 undo block                    8            2
28-06-11 10:36:20 undo header                  28            1
28-06-11 11:06:24 data block               18,346       46,440
28-06-11 11:06:24 segment header                7            8
28-06-11 11:06:24 undo block                    6            0
28-06-11 11:06:24 undo header                   8            1
28-06-11 11:36:26 data block               14,429       15,896
28-06-11 11:36:26 undo block                    6            0
28-06-11 11:36:26 undo header                   9            3
28-06-11 12:06:32 data block                1,967        5,556
28-06-11 12:06:32 undo block                    3            0
28-06-11 12:06:32 undo header                   3            0
28-06-11 12:36:34 data block                1,992        3,037
28-06-11 12:36:34 undo block                    4            0
28-06-11 12:36:34 undo header                   1            5
28-06-11 13:06:40 1st level bmb                 8            4
28-06-11 13:06:40 2nd level bmb                 8            0
28-06-11 13:06:40 data block              183,334      517,622
28-06-11 13:06:40 segment header               34           43
28-06-11 13:06:40 undo block                    7            2
28-06-11 13:36:42 data block               26,026       28,859
28-06-11 13:36:42 undo header                   3            0
28-06-11 14:06:46 data block                2,078        3,912
28-06-11 14:06:46 segment header                1            3
28-06-11 14:06:46 undo block                    4            0
28-06-11 14:06:46 undo header                  15            2
28-06-11 14:36:52 data block                3,131        5,007
28-06-11 14:36:52 undo block                    3            0
28-06-11 14:36:52 undo header                  15            1
28-06-11 15:06:54 data block               20,727       92,361
28-06-11 15:06:54 segment header                8           24
28-06-11 15:06:54 undo block                   10            9
28-06-11 15:06:54 undo header                  23            7


No comments:

Post a Comment