Recently I was involved in resolving an ongoing, steadily degrading performance issue for an application. By that point the users were simply frustrated and the business was not happy at all.
When it ended up in my court, the first thing I did was identify the window during which the user experience was at its worst. To do that, I liaised with the application vendors to understand the application a bit and what else was running against the database.
It turned out that a small ETL job ran regularly against the database during business hours, four times a day. It fulfilled a business reporting requirement, so stopping it was out of the question. Application performance slowed down whenever the ETL kicked off, and response times recovered once it finished.
Since the database was 9i, I analysed its Statspack reports and found that, during the slowness window, the top wait event apart from I/O was 'buffer busy waits':
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~                                                     % Total
Event                                               Waits    Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
db file sequential read                         2,198,322      13,940    40.34
buffer busy waits                               1,273,203       9,852    28.51
CPU time                                                         4,241    12.27
enqueue                                             1,170       3,414     9.88
latch free                                         97,147       1,542     4.46
Buffer busy waits did not appear in the top 5 wait events when the performance problem was absent.
I then checked whether the windows in which users complained about database performance lined up with high buffer busy waits, and they did. The trend came from a query against the Statspack repository; a sketch of the query, followed by its output, is below.
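This is a minimal sketch, not the exact script I ran: it assumes the standard 9i Statspack repository tables STATS$WAITSTAT and STATS$SNAPSHOT, whose counters are cumulative, so each row is reported as the delta from the previous snapshot (TIME, as in V$WAITSTAT, should be in centiseconds).

-- Per-snapshot buffer busy waits by class, from the Statspack repository.
-- STATS$WAITSTAT stores cumulative counters, so deltas are taken between
-- consecutive snapshots with LAG (available in 9i).
SELECT TO_CHAR(sn.snap_time, 'DD-MM-RR HH24:MI:SS') snap_time,
       w.class,
       w.wait_count - LAG(w.wait_count)
           OVER (PARTITION BY w.class ORDER BY w.snap_id) wait_count,
       w.time - LAG(w.time)
           OVER (PARTITION BY w.class ORDER BY w.snap_id) time
  FROM stats$waitstat w, stats$snapshot sn
 WHERE w.snap_id = sn.snap_id
   AND w.dbid = sn.dbid
   AND w.instance_number = sn.instance_number
 ORDER BY sn.snap_time, w.class;

And the trend it produced: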
SNAP_TIME CLASS WAIT_COUNT TIME
----------------- -------------------- ---------- ------------
21-06-11 00:13:06 2nd level bmb 1 3
21-06-11 00:13:06 data block 74,340 338,940
21-06-11 00:13:06 segment header 18 33
21-06-11 00:13:06 undo block 42 1
21-06-11 00:43:05 data block 34,519 181,783
21-06-11 00:43:05 undo header 2 5
21-06-11 01:13:09 data block 2,733 2,744
21-06-11 01:43:12 data block 144 344
21-06-11 01:43:12 undo header 2 0
21-06-11 02:13:16 data block 187 208
21-06-11 02:43:20 data block 70 155
21-06-11 03:13:25 data block 160,186 318,329
21-06-11 03:13:25 segment header 2 0
21-06-11 03:13:25 undo block 170 25
21-06-11 03:43:28 data block 40,802 87,378
21-06-11 04:13:32 data block 21,974 19,604
21-06-11 04:13:32 segment header 1 2
21-06-11 04:43:36 data block 17 47
21-06-11 05:13:40 data block 3,523 2,755
21-06-11 05:13:40 segment header 4 1
21-06-11 05:43:44 data block 15,238 11,911
21-06-11 05:43:44 segment header 6 5
21-06-11 05:43:44 undo header 7 0
21-06-11 06:13:47 data block 259 374
21-06-11 06:43:51 data block 181 402
21-06-11 06:43:51 undo header 6 2
21-06-11 07:13:56 data block 42,250 22,239
21-06-11 07:13:56 undo header 3 0
21-06-11 07:43:59 data block 158,719 91,015
21-06-11 08:14:04 1st level bmb 3 1
21-06-11 08:14:04 2nd level bmb 1 3
21-06-11 08:14:04 data block 542,143 761,181
21-06-11 08:14:04 segment header 9 14
21-06-11 08:14:04 undo block 16 2
21-06-11 08:14:04 undo header 73 0
21-06-11 08:44:09 data block ######## 1,109,580
21-06-11 08:44:09 segment header 2 5
21-06-11 08:44:09 undo block 65 8
21-06-11 08:44:09 undo header 9 10
21-06-11 09:14:12 data block ######## 930,479
21-06-11 09:14:12 segment header 3 4
21-06-11 09:14:12 undo block 20 1
21-06-11 09:14:12 undo header 4 6
21-06-11 09:44:19 data block ######## 981,900
21-06-11 09:44:19 segment header 1 3
21-06-11 09:44:19 undo block 115 53
21-06-11 09:44:19 undo header 34 7
21-06-11 10:14:22 data block 982,942 701,688
21-06-11 10:14:22 segment header 6 7
21-06-11 10:14:22 undo block 43 32
21-06-11 10:14:22 undo header 3 5
21-06-11 10:44:26 data block 522,797 342,624
21-06-11 10:44:26 segment header 2 5
21-06-11 10:44:26 undo block 14 0
21-06-11 10:44:26 undo header 4 0
21-06-11 11:14:32 1st level bmb 8 4
21-06-11 11:14:32 2nd level bmb 6 3
21-06-11 11:14:32 data block 905,206 1,077,409
21-06-11 11:14:32 segment header 46 50
21-06-11 11:14:32 undo block 62 9
21-06-11 11:14:32 undo header 7 6
21-06-11 11:44:51 data block ######## 2,316,365
21-06-11 11:44:51 segment header 3 20
21-06-11 11:44:51 undo block 186 293
21-06-11 11:44:51 undo header 9 46
21-06-11 12:14:55 data block 153,829 130,860
21-06-11 12:14:55 undo block 15 10
21-06-11 12:14:55 undo header 2 1
21-06-11 12:44:59 data block 152,903 84,885
21-06-11 12:44:59 undo block 13 0
21-06-11 12:44:59 undo header 2 0
21-06-11 13:15:01 data block 37,928 37,543
21-06-11 13:15:01 segment header 10 4
21-06-11 13:15:01 undo header 9 7
21-06-11 13:45:06 data block 28,441 12,470
21-06-11 13:45:06 undo header 4 0
21-06-11 14:15:09 data block 595,511 339,329
21-06-11 14:15:09 undo block 19 2
21-06-11 14:15:09 undo header 27 7
21-06-11 14:45:16 data block 24,499 12,706
21-06-11 14:45:16 undo block 83 130
21-06-11 14:45:16 undo header 33 18
21-06-11 15:15:18 1st level bmb 3 3
21-06-11 15:15:18 2nd level bmb 2 3
21-06-11 15:15:18 data block ######## 2,149,819
The output above confirms that users complained about database performance precisely when buffer busy waits were high, and that the contention was on data blocks. (The '########' entries are wait counts too large for the SQL*Plus column format to display.)
After running a monitoring script to find the object behind the waits, I identified a massive table. The database default block size was 32K, which means a lot of rows per block; to make it worse, the table sat in a locally managed tablespace with manual segment space management. Another interesting finding was that the reason code for the buffer busy wait event was 300: 'Block is being read by another session, and no other suitable block image was found, so we wait until the read is completed. This may also occur after a buffer cache assumed deadlock. The kernel can't get a buffer in a certain amount of time and assumes a deadlock. Therefore it will read the CR version of the block.' It simply indicated we had hot block contention.
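A sketch of one common way to pin this down on 9i, where for 'buffer busy waits' the wait parameters are p1 = file#, p2 = block# and p3 = the reason code (the &file# and &block# substitution variables are just illustrative):

-- Sessions currently stuck on buffer busy waits
-- (9i semantics: p1 = absolute file#, p2 = block#, p3 = reason code).
SELECT sid, p1 file#, p2 block#, p3 reason_code
  FROM v$session_wait
 WHERE event = 'buffer busy waits';

-- Map a file#/block# pair back to its owning segment.
SELECT owner, segment_name, segment_type
  FROM dba_extents
 WHERE file_id = &file#
   AND &block# BETWEEN block_id AND block_id + blocks - 1;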
Hence, my recommendation was to create a new tablespace with an 8K block size and automatic segment space management (ASSM), and to move the problem table into it. Once the migration was done, buffer busy waits disappeared from the top 5 wait events, and users found application response times normal even while the ETL was running. The business was happy as well.
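In outline, the fix was along the lines of the sketch below; the tablespace, datafile and object names, and the cache size, are illustrative rather than the real ones. Two caveats worth noting: a db_8k_cache_size buffer pool must be configured before an 8K tablespace can be created in a 32K-block database, and moving a table leaves its indexes UNUSABLE, so they must be rebuilt.

-- A buffer cache for the non-default 8K block size (size is an example).
ALTER SYSTEM SET db_8k_cache_size = 256M;

-- New tablespace: 8K blocks, locally managed, ASSM.
CREATE TABLESPACE data_8k
  DATAFILE '/u01/oradata/PROD/data_8k01.dbf' SIZE 4096M
  BLOCKSIZE 8K
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO;

-- Relocate the hot table, then rebuild its now-UNUSABLE indexes.
ALTER TABLE app_owner.big_table MOVE TABLESPACE data_8k;
ALTER INDEX app_owner.big_table_pk REBUILD;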
The buffer busy wait trend also looked much healthier after the change:
SNAP_TIME CLASS WAIT_COUNT TIME
----------------- -------------------- ---------- ------------
28-06-11 00:04:56 data block 40,959 90,487
28-06-11 00:04:56 undo block 5 0
28-06-11 00:34:59 data block 59,194 179,264
28-06-11 00:34:59 undo block 18 9
28-06-11 01:05:03 data block 1,459 2,088
28-06-11 01:35:06 data block 145 451
28-06-11 02:05:11 data block 235 503
28-06-11 02:35:14 data block 144 516
28-06-11 03:05:19 data block 16,663 20,802
28-06-11 03:35:22 2nd level bmb 1 4
28-06-11 03:35:22 data block 140,303 404,557
28-06-11 03:35:22 segment header 7 10
28-06-11 03:35:22 undo block 39 58
28-06-11 03:35:22 undo header 6 17
28-06-11 04:05:26 data block 31,410 24,654
28-06-11 04:35:29 data block 4,376 6,528
28-06-11 05:05:33 data block 9,393 14,982
28-06-11 05:05:33 segment header 12 8
28-06-11 05:05:33 undo block 2 0
28-06-11 05:35:38 undo header 2 0
28-06-11 06:05:41 data block 266 138
28-06-11 06:35:45 data block 147,655 295,869
28-06-11 06:35:45 segment header 3 1
28-06-11 06:35:45 undo block 8 0
28-06-11 07:05:49 data block 9,572 11,383
28-06-11 07:05:49 segment header 2 2
28-06-11 07:35:53 data block 141,267 280,172
28-06-11 07:35:53 undo block 5 1
28-06-11 08:05:57 data block 1,239 1,913
28-06-11 08:05:57 segment header 2 2
28-06-11 08:05:57 undo header 2 0
28-06-11 08:36:00 data block 8,436 15,255
28-06-11 08:36:00 segment header 3 2
28-06-11 08:36:00 undo header 11 0
28-06-11 09:06:07 1st level bmb 3 3
28-06-11 09:06:07 2nd level bmb 2 0
28-06-11 09:06:07 data block 164,876 520,704
28-06-11 09:06:07 segment header 28 39
28-06-11 09:06:07 undo block 18 3
28-06-11 09:06:07 undo header 7 0
28-06-11 09:36:10 data block 15,731 45,259
28-06-11 09:36:10 undo block 4 5
28-06-11 09:36:10 undo header 1 2
28-06-11 10:06:19 data block 22,449 22,406
28-06-11 10:06:19 undo block 11 6
28-06-11 10:06:19 undo header 19 10
28-06-11 10:36:20 data block 9,059 14,184
28-06-11 10:36:20 segment header 1 4
28-06-11 10:36:20 undo block 8 2
28-06-11 10:36:20 undo header 28 1
28-06-11 11:06:24 data block 18,346 46,440
28-06-11 11:06:24 segment header 7 8
28-06-11 11:06:24 undo block 6 0
28-06-11 11:06:24 undo header 8 1
28-06-11 11:36:26 data block 14,429 15,896
28-06-11 11:36:26 undo block 6 0
28-06-11 11:36:26 undo header 9 3
28-06-11 12:06:32 data block 1,967 5,556
28-06-11 12:06:32 undo block 3 0
28-06-11 12:06:32 undo header 3 0
28-06-11 12:36:34 data block 1,992 3,037
28-06-11 12:36:34 undo block 4 0
28-06-11 12:36:34 undo header 1 5
28-06-11 13:06:40 1st level bmb 8 4
28-06-11 13:06:40 2nd level bmb 8 0
28-06-11 13:06:40 data block 183,334 517,622
28-06-11 13:06:40 segment header 34 43
28-06-11 13:06:40 undo block 7 2
28-06-11 13:36:42 data block 26,026 28,859
28-06-11 13:36:42 undo header 3 0
28-06-11 14:06:46 data block 2,078 3,912
28-06-11 14:06:46 segment header 1 3
28-06-11 14:06:46 undo block 4 0
28-06-11 14:06:46 undo header 15 2
28-06-11 14:36:52 data block 3,131 5,007
28-06-11 14:36:52 undo block 3 0
28-06-11 14:36:52 undo header 15 1
28-06-11 15:06:54 data block 20,727 92,361
28-06-11 15:06:54 segment header 8 24
28-06-11 15:06:54 undo block 10 9
28-06-11 15:06:54 undo header 23 7