Oracle每天凌晨2点的自动备份策略的导致的一系列问题
1,给安徽的同事安装了一个生产Oracle数据库,最近一段时间 总是在2点-10点之间出现数据库连不上的情况,具体tomcat应用日志如下:
创新互联建站-专业网站定制、快速模板网站建设、高性价比南湖网站开发、企业建站全套包干低至880元,成熟完善的模板库,直接使用。一站式南湖网站制作公司更省心,省钱,快速模板网站建设找我们,业务覆盖南湖地区。费用合理售后完善,十余年实体公司更值得信赖。
08:58:09 ERROR c.d.web.controller.DBAppController - 查询更新版本请求异常org.springframework.dao.DataAcce***esourceFailureException:
### Error querying database. Cause: java.sql.SQLException: Io exception: Connection timed out
### The error may exist in file [/usr/local/tomcat/xx/WEB-INF/classes/mapper/DBAppMapper.xml]
### The error may involve com.dabay.web.dao.DBAppDao.selectProperties-Inline
### The error occurred while setting parameters
### SQL: SELECT KEY,VALUE,DESCRIPTION FROM APP_PROPERTIES WHERE KEY=? AND DATA_STATUS!='9'
### Cause: java.sql.SQLException: Io exception: Connection timed out
; SQL []; Io exception: Connection timed out; nested exception is java.sql.SQLException: Io exception: Connection timed out
08:58:09 ERROR c.d.web.controller.DBAppController - DGW_0922084243406:查询轮播图请求异常org.springframework.dao.DataAcce***esourceFailureException:
### Error querying database. Cause: java.sql.SQLException: Io exception: Connection timed out
### The error may exist in file [/usr/local/tomcat/xx/WEB-INF/classes/mapper/DBAppMapper.xml]
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT TITLE, URL, REMARKS, PNGURL FROM INFO_BANNER WHERE DATA_STATUS!='9' AND ROWNUM<6 ORDER BY ORDERDESC asc,CREATE_TIME desc
### Cause: java.sql.SQLException: Io exception: Connection timed out
; SQL []; Io exception: Connection timed out; nested exception is java.sql.SQLException: Io exception: Connection timed out
2,想到排查ORACLE数据库是否正常,百度到了如下三个结果
一:查看数据库监听是否启动 lsnrctl status 二:查看数据库运行状态,是否open select instance_name,status from v$instance; 三:查看alert日志,查看是否有错误信息 SQL> show parameter background_dump NAME TYPE ------------------------------------ ---------------------- VALUE ------------------------------ background_dump_dest string /u01/app/oracle/diag/rdbms/just_test/test/trace 是的,有alert日志,接下来查看alert日志,如下 db_recovery_file_dest_size of 3882 MB is 45.88% used. This is a user-specified limit on the amount of space that will be used by this database for recovery-related files, and does not reflect the amount of space available in the underlying filesystem or ASM diskgroup. Fri Sep 22 02:01:05 2017 Starting background process CJQ0 Fri Sep 22 02:01:05 2017 CJQ0 started with pid=22, OS id=6797 Fri Sep 22 02:06:05 2017 Starting background process SMCO Fri Sep 22 02:06:05 2017 SMCO started with pid=32, OS id=7393 Fri Sep 22 04:21:10 2017 Thread 1 cannot allocate new log, sequence 221 Private strand flush not complete Current log# 1 seq# 220 mem# 0: /u01/app/oracle/oradata/hsrs_pro/redo01.log Thread 1 advanced to log sequence 221 (LGWR switch) Current log# 2 seq# 221 mem# 0: /u01/app/oracle/oradata/hsrs_pro/redo02.log Fri Sep 22 09:00:35 2017 先看到了 Thread 1 cannot allocate new log, sequence 221,于是又百度了一下,找到了如下结果 (摘自 http://blog.csdn.net/zonelan/article/details/7613519) 这个实际上是个比较常见的错误。通常来说是因为在日志被写满时会切换 日志组,这个时候会触发一次checkpoint,DBWR会把内存中的脏块往数据文件中写,只要没写结束就不会释放这个日志组。如果归档模式被开启的 话,还会伴随着ARCH写归档的过程。如果redo log产生的过快,当CPK或归档还没完成,LGWR已经把其余的日志组写满,又要往当前的日志组里面写redo log的时候,这个时候就会发生冲突,数据库就会被挂起。并且一直会往alert.log中写类似上面的错误信息。 于是有了以下的操作: SQL> select group#,sequence#,bytes,members,status from v$log; #查看每组日志的状态 GROUP# SEQUENCE# BYTES MEMBERS STATUS ---------- ---------- ---------- ---------- -------------------------------- 1 220 52428800 1 INACTIVE ##空闲的 2 221 52428800 1 CURRENT ##当前的 3 219 52428800 1 INACTIVE ##空闲的 SQL> alter database add logfile group 4 ('/u01/app/oracle/oradata/xx/redo04.log') size 500M; 增加日志组 Database altered. SQL> alter database add logfile group 5 ('/u01/app/oracle/oradata/xx/redo05.log') size 500M; Database altered. SQL> alter system switch logfile; 切换日志组
SQL> select group#,sequence#,bytes,members,status from v$log; #查看状态发现有了区别 GROUP# SEQUENCE# BYTES MEMBERS STATUS ---------- ---------- ---------- ---------- -------------------------------- 1 22052428800 1 INACTIVE 2 22152428800 1 ACTIVE 3 21952428800 1 INACTIVE 4 222 524288000 1 ACTIVE 5 223 524288000 1 CURRENT 经理过如上操作,突然看到了alert日志中有一个recovery 并且 tomcat应用日志中也有recovery这个单词,于是又百度了一番。分别执行了如下命令(不懂什么意思) SQL> select * from v$flash_recovery_area_usage; SQL> select * from v$recovery_file_dest; 查看recovery的实际大小: NAME -------------------------------------------------------------------------------- SPACE_LIMIT SPACE_USED SPACE_RECLAIMABLE NUMBER_OF_FILES ----------- ---------- ----------------- --------------- /u01/app/oracle/recovery_area 4070572032 3926630400 2059067392 41 SQL> select * from v$flash_recovery_area_usage 2 ; FILE_TYPE PERCENT_SPACE_USED ---------------------------------------- ------------------ PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES ------------------------- --------------- CONTROL FILE 0 00 REDO LOG 0 00 ARCHIVED LOG 0 00 FILE_TYPE PERCENT_SPACE_USED ---------------------------------------- ------------------ PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES ------------------------- --------------- BACKUP PIECE 53.96 50.58 37 IMAGE COPY 42.5 04 FLASHBACK LOG 0 00 FILE_TYPE PERCENT_SPACE_USED ---------------------------------------- ------------------ PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES ------------------------- --------------- FOREIGN ARCHIVED LOG 0 00 7 rows selected. SQL> show parameter db_recovery_file_dest_size; 最后发现这个才是我要找的 查看当前recovery的限制大小 NAME TYPE ------------------------------------ ---------------------- VALUE ------------------------------ db_recovery_file_dest_size big integer 3882M SQL> alter system set db_recovery_file_dest_size=5882M scope=spfile; 改大一点? System altered. SQL> show parameter db_recovery_file_dest_size; 但是好像并没有用,还是这么大 NAME TYPE ------------------------------------ ---------------------- VALUE ------------------------------ db_recovery_file_dest_size big integer 3882M 好吧,仍然百度:)执行了如下命令好像管用了 SQL> alter system set db_recovery_file_dest_size=10G; System altered. SQL> show parameter db_recovery_file_dest_size; NAME TYPE ------------------------------------ ---------------------- VALUE ------------------------------ db_recovery_file_dest_size big integer 10G
先观察看看吧~应用日志10点好像没有超时报错了~~ 完
补充一下,下面这俩货的区别
scope=both scope=spfile Oracle spfile就是动态参数文件,里面设置了Oracle 的各种参数。所谓的动态, 就是说你可以在不关闭数据库的情况下,更改数据库参数,记录在spfile里面。更改参数 的时候,有4种scope选项,scope就是范围。 scope=spfile 仅仅更改spfile里面的记载,不更改内存,也就是不立即生效,而是等 下次数据库启动生效。 有一些参数只允许用这种方法更改,scope=memory 仅仅更改内存,不改spfile。也就是下次 启动就失效了 scope=both 内存和spfile都更改,不指定scope参数,等同于scope=both。 ========================================================================================= 好吧,问题好像解决了 oracle 在每天凌晨2点自动重启,登录EM 查了一下jobs果然2点是有一个自动备份策略的,具体步骤如下: 1, su oracle 2, source .bash_profile 3, sqlplus /nolog 4, conn /as sysdba 5, emctl status dbconsole 检查EM是否启动,如果没有==》 emctl start dbconsole [oracle@xx ~]$ emctl status dbconsole Oracle Enterprise Manager 11g Database Control Release 11.2.0.1.0 Copyright (c) 1996, 2009 Oracle Corporation. All rights reserved. https://xx:1158/em/console/aboutApplication Oracle Enterprise Manager 11g is running. ------------------------------------------------------------------ Logs are generated in directory /u01/app/oracle/product/11.2.0/dbhome_1/xx/sysman/log 6, 获取到如上的地址(https://xx:1158/em/console/aboutApplication),在浏览器访问 7,点开job
8,删除job
先这么观察一下。明天看结果,另外,上面查看job步骤是可以修改备份策略的。
本文名称:Oracle每天凌晨2点的自动备份策略的导致的一系列问题
当前网址:http://pcwzsj.com/article/ihjode.html