long长事务回滚的模拟与定位

大型事务的回滚 大型事务的回滚产生非常大的代价,不仅锁定需要的资源,并且消耗的CPU和IO,尤其是IO将极为密集。尤其在KILL大型事务之前检查事务究竟有多大可能是必要的,同时我们也需要知道回滚已经进行了多少程度。
一、模拟:

1.删除500多万的数据。


SQL> conn hr/hr;
Connected.
SQL>  create table  test as select * from dba_objects;

Table created.

SQL> insert into test select * from test;

87055 rows created.

SQL> insert into test select * from test;

174110 rows created.

SQL> insert into test select * from test;

348220 rows created.

SQL> insert into test select * from test;

696440 rows created.

SQL> insert into test select * from test;

1392880 rows created.

SQL> select count(*) from test;

  COUNT(*)
----------
   2785760

SQL> insert into test select * from test;

2785760 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from test;

  COUNT(*)
----------
   5571520

SQL> analyze table test compute statistics;

Table analyzed.

SQL>  
SQL> select sid from v$mystat where rownum=1;

       SID
----------
        30

SQL>

--模拟删除,不commit
SQL> delete test;
 
5571520 rows deleted.



--另开窗口,查询kill掉会话

SQL>  select sid,serial#,sql_id,event,blocking_session from v$session where sid=30;

       SID    SERIAL# SQL_ID        EVENT                                                            BLOCKING_SESSION
---------- ---------- ------------- ---------------------------------------------------------------- ----------------
        30        165 7qqwcq9td6akt log buffer space                                                               11

SQL> select sql_text from v$sql where sql_id='7qqwcq9td6akt';

SQL_TEXT
----------------------------------------------------------------------
 delete test

SQL> alter system kill session '30,165' immediate;

System altered.

--回到原来窗口验证:
SQL>  select count(*) from test;
 select count(*) from test
        *
ERROR at line 1:
ORA-03135: connection lost contact
Process ID: 28346
Session ID: 30 Serial number: 165


二、定位:
查看回滚进度:
可以通过以下两个视图查看回滚的进度,通过单位时间内恢复的undo block来估算恢复时间:
1.    通过x$ktuxe
alter session set nls_date_format='yyyy-mm-dd hh34:mi:ss';
select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ,sysdate from x$ktuxe where KTUXECFL='DEAD' and KTUXESIZ >0;


SQL> alter session set nls_date_format='yyyy-mm-dd hh34:mi:ss';

Session altered.

SQL> select ADDR,KTUXEUSN,KTUXESLT,KTUXESQN,KTUXESIZ,sysdate from x$ktuxe where KTUXECFL='DEAD' and KTUXESIZ >0;

ADDR               KTUXEUSN   KTUXESLT   KTUXESQN   KTUXESIZ SYSDATE
---------------- ---------- ---------- ---------- ---------- -------------------
00007F170E8AAC20          5         11       7784      84438 2017-12-09 14:19:22

SQL>

可以通过KTUXESLT ,KTUXESQN这两个字段,然后用以下脚本回滚得出大概需要的时间:
set serveroutput on
declare
  l_start number;
  l_end   number;
begin
  select ktuxesiz
    into l_start
    from x$ktuxe
   where KTUXEUSN = 5
     and KTUXESLT = 11;
  dbms_lock.sleep(60);
  select ktuxesiz
    into l_end
    from x$ktuxe
   where KTUXEUSN = 5
     and KTUXESLT = 11;
  dbms_output.put_line('time est Day:' ||
                       round(l_end / (l_start - l_end) / 60 / 24, 2));
end;
/

time est Day:.01

PL/SQL procedure successfully completed.

SQL> SQL>  

2.    通过v$fast_start_trancsations 状态为recovering表示恢复中;
select USN,SLT,SEQ,STATE,UNDOBLOCKSDONE,UNDOBLOCKSTOTAL,CPUTIME,XID,sysdate from V$FAST_START_TRANSACTIONS;

SQL> select USN,SLT,SEQ,STATE,UNDOBLOCKSDONE,UNDOBLOCKSTOTAL,CPUTIME,XID,sysdate from V$FAST_START_TRANSACTIONS;

       USN        SLT        SEQ STATE            UNDOBLOCKSDONE UNDOBLOCKSTOTAL    CPUTIME XID              SYSDATE
---------- ---------- ---------- ---------------- -------------- --------------- ---------- ---------------- -------------------
         5         11       7784 RECOVERING               123491          179829        120 05000B00681E0000 2017-12-09 14:20:25
         
通过如下视图观察回滚是串行还是并行回滚的,如下图应是并行恢复的,
V$FAST_START_SERVERS provides information about all the recovery slaves performing parallel transaction recovery.

通过xid字段与v$fast_start_trancsations关联。
select * from v$fast_start_servers  where xid in (select XID from V$FAST_START_TRANSACTIONS);

SQL> select * from v$fast_start_servers  where xid in (select XID from V$FAST_START_TRANSACTIONS);

STATE       UNDOBLOCKSDONE        PID XID
----------- -------------- ---------- ----------------
RECOVERING          133950         20 05000B00681E0000

3.查看回滚对象
通过dump undo block方式查看回滚的对象:

首先通过v$fast_start_trancsations的usn字段查询到使用的回滚段。

SQL> select * from v$rollname where usn=5;

       USN NAME
---------- ------------------------------
         5 _SYSSMU5_898567397$
         
dump这个undo block,因为dump的文件很大,在查询出对象的object_id后,kill掉这个dump会话。
alter system dump undo block "" XID ;

SQL> alter system dump undo block "_SYSSMU5_898567397$" xid 5 11 7784;
 
。。。。。。。。。


[oracle@wang trace]$ ls -lrt
total 635992
-rw-r----- 1 oracle oinstall        81 Apr 27  2017 DBdb_ora_9045.trm
-rw-r----- 1 oracle oinstall        59 Apr 27  2017 DBdb_mman_9065.trm
-rw-r----- 1 oracle oinstall        60 Apr 27  2017 DBdb_ora_9084.trm
-rw-r----- 1 oracle oinstall       111 Apr 27  2017 DBdb_ora_9099.trm
.......................................
-rw-r----- 1 oracle oinstall     12306 Dec  9 14:23 DBdb_ora_28113.trm
-rw-r----- 1 oracle oinstall 536798705 Dec  9 14:23 DBdb_ora_28113.trc
[oracle@wang trace]$
[oracle@wang trace]$
[oracle@wang trace]$
[oracle@wang trace]$ grep objn DBdb_ora_28113.trc | head -5
* Rec #0x11  slt: 0x0b  objn: 90373(0x00016105)  objd: 90373  tblspc: 4(0x00000004)
* Rec #0x10  slt: 0x0b  objn: 90373(0x00016105)  objd: 90373  tblspc: 4(0x00000004)
* Rec #0xf  slt: 0x0b  objn: 90373(0x00016105)  objd: 90373  tblspc: 4(0x00000004)
* Rec #0xe  slt: 0x0b  objn: 90373(0x00016105)  objd: 90373  tblspc: 4(0x00000004)
* Rec #0xd  slt: 0x0b  objn: 90373(0x00016105)  objd: 90373  tblspc: 4(0x00000004)
[oracle@wang trace]$
[oracle@wang trace]$

可以查询到objn为121192,对应的就是dba_objects的object_id,即mosongtao.rollback_test,正是前边测试的对象。再查询v$session_longops配合username,last_update_time,target,可以大概定位到执行sql_id。

注意:在查询到object_id后手工停掉dump undo block 动作

SQL> alter system dump undo block "_SYSSMU5_898567397$" xid 5 11 7784;
 
^C^C^C^C^C^C
alter system dump undo block "_SYSSMU5_898567397$" xid 5 11 7784
*
ERROR at line 1:
ORA-01013: user requested cancel of current operation

SQL> SQL> SQL> SQL>               


4.查询sql
SQL> select SID,TARGET,SQL_ID,START_TIME,LAST_UPDATE_TIME from v$session_longops where target like '%TEST%' order by LAST_UPDATE_TIME desc;

       SID TARGET                                                           SQL_ID        START_TIME   LAST_UPDATE_
---------- ---------------------------------------------------------------- ------------- ------------ ------------
        30 HR.TEST                                                          7qqwcq9td6akt 09-DEC-17    09-DEC-17

SQL> select sql_text from v$sql where sql_id='7qqwcq9td6akt';

SQL_TEXT
-------------------------------------------------------------------------
 delete test

SQL>









文章题目:long长事务回滚的模拟与定位
本文链接:http://pcwzsj.com/article/gijihs.html