如何排查CPU过载过高的问题
本篇内容主要讲解“如何排查CPU过载过高的问题”,感兴趣的朋友不妨来看看。本文介绍的方法操作简单快捷,实用性强。下面就让小编来带大家学习“如何排查CPU过载过高的问题”吧!
创新互联坚持“要么做到,要么别承诺”的工作理念,服务领域包括:成都网站设计、成都网站建设、企业官网、英文网站、手机端网站、网站推广等服务,满足客户于互联网时代的法库网站设计、移动媒体设计的需求,帮助企业找到有效的互联网解决方案。努力成为您成熟可靠的网络建设合作伙伴!
侦探问题根源
通过top指令发现,当前5511的线程cup和内存的使用率过高:
top 5511 root 20 0 16.841g 6.088g 5584 S 47.0 39.2 4011:41 java 9550 root 20 0 2516200 67892 2436 S 0.7 0.4 204:20.40 java 21271 root 20 0 11.579g 0.987g 5056 S 0.7 6.4 46:29.88 java 13 root 20 0 0 0 0 S 0.3 0.0 0:35.21 ksoftirqd/1 2128 root 20 0 9.833g 1.273g 6656 S 0.3 8.2 33:48.35 java 29464 root 20 0 11.578g 1.030g 5448 S 0.3 6.6 31:42.67 java 31721 root 20 0 157744 2264 1544 R 0.3 0.0 0:00.32 top
查询当前java进程详情,利用 jinfo pid 5511
Attaching to process ID 5511, please wait... Debugger attached successfully. Server compiler detected. JVM version is 25.151-b12 Java System Properties: java.runtime.name = Java(TM) SE Runtime Environment java.vm.version = 25.151-b12 sun.boot.library.path = /usr/local/java/jre/lib/amd64 java.vendor.url = http://java.oracle.com/ java.vm.vendor = Oracle Corporation path.separator = : file.encoding.pkg = sun.io java.vm.name = Java HotSpot(TM) 64-Bit Server VM sun.os.patch.level = unknown sun.java.launcher = SUN_STANDARD user.country = US user.dir = /mnt/app/bdcenter-base/bdcenter-service-base-1.0-SNAPSHOT java.vm.specification.name = Java Virtual Machine Specification PID = 5511 java.runtime.version = 1.8.0_151-b12 java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment os.arch = amd64 java.endorsed.dirs = /usr/local/java/jre/lib/endorsed line.separator = java.io.tmpdir = /tmp java.vm.specification.vendor = Oracle Corporation os.name = Linux io.netty.noKeySetOptimization = true sun.jnu.encoding = UTF-8 java.library.path = /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib spring.beaninfo.ignore = true sun.nio.ch.bugLevel = java.specification.name = Java Platform API Specification java.class.version = 52.0 sun.management.compiler = HotSpot 64-Bit Tiered Compilers os.version = 3.10.0-693.2.2.el7.x86_64 user.home = /root user.timezone = Asia/Shanghai catalina.useNaming = false java.awt.printerjob = sun.print.PSPrinterJob file.encoding = UTF-8 @appId = bdcenter-service-base java.specification.version = 1.8 io.netty.recycler.maxCapacityPerThread = 0 catalina.home = /tmp/tomcat.7050606280722271893.9025 user.name = root java.class.path = bdcenter-service-base-1.0-SNAPSHOT.jar java.vm.specification.version = 1.8 sun.arch.data.model = 64 sun.java.command = bdcenter-service-base-1.0-SNAPSHOT.jar --spring.profiles.active=prod 5 java.home = /usr/local/java/jre user.language = en java.specification.vendor = Oracle Corporation io.netty.noUnsafe = true awt.toolkit = sun.awt.X11.XToolkit java.vm.info = mixed mode java.version = 1.8.0_151 java.ext.dirs = /usr/local/java/jre/lib/ext:/usr/java/packages/lib/ext sun.boot.class.path = /usr/local/java/jre/lib/resources.jar:/usr/local/java/jre/lib/rt.jar:/usr/local/java/jre/lib/sunrsasign.jar:/usr/local/java/jre/lib/jsse.jar:/usr/local/java/jre/lib/jce.jar:/usr/local/java/jre/lib/charsets.jar:/usr/local/java/jre/lib/jfr.jar:/usr/local/java/jre/classes java.awt.headless = true java.vendor = Oracle Corporation catalina.base = /tmp/tomcat.7050606280722271893.9025 file.separator = / java.vendor.url.bug = http://bugreport.sun.com/bugreport/ sun.io.unicode.encoding = UnicodeLittle sun.cpu.endian = little sun.cpu.isalist = VM Flags: Non-default VM flags: -XX:+AggressiveOpts -XX:CICompilerCount=4 -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:InitialHeapSize=6442450944 -XX:MaxDirectMemorySize=1073741824 -XX:MaxHeapSize=6442450944 -XX:MaxNewSize=697892864 -XX:MaxTenuringThreshold=6 -XX:MinHeapDeltaBytes=196608 -XX:NewSize=697892864 -XX:OldPLABSize=16 -XX:OldSize=5744558080 -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:ThreadStackSize=256 -XX:+UseBiasedLocking -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseFastAccessorMethods -XX:+UseParNewGC Command line: -Xms6G -Xmx6G -XX:MaxPermSize=256M -XX:MaxDirectMemorySize=1G -Xss256k -XX:+AggressiveOpts -XX:+UseBiasedLocking -XX:+UseFastAccessorMethods -XX:+DisableExplicitGC -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75 -XX:CMSInitiatingOccupancyFraction=75 -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
Command line 可以看出JVM 用户输入参,对整个jvm的信息进行消息的描述
查看当前gc情况: jstat -gccause 5511
[root@bigdata-service-1:/root]# jstat -gccause 5511 1000 S0 S1 E O M CCS YGC YGCT FGC FGCT GCT LGCC GCC 0.00 8.86 61.50 72.16 95.05 91.62 4577 326.410 8 8.855 335.265 Allocation Failure No GC 0.00 8.86 64.31 72.16 95.05 91.62 4577 326.410 8 8.855 335.265 Allocation Failure No GC 0.00 8.86 65.04 72.16 95.05 91.62 4577 326.410 8 8.855 335.265 Allocation Failure No GC 0.00 8.86 65.17 72.16 95.05 91.62 4577 326.410 8 8.855 335.265 Allocation Failure No GC 0.00 8.86 65.26 72.16 95.05 91.62 4577 326.410 8 8.855 335.265 Allocation Failure No GC 0.00 8.86 65.26 72.16 95.05 91.62 4577 326.410 8 8.855 335.265 Allocation Failure No GC 0.00 8.86 67.80 72.16 95.05 91.62 4577 326.410 8 8.855 335.265 Allocation Failure No GC
发现新生代频繁的进行gc操【YGC:gc 次数, YGCT: gc时间】,老年的几乎没有gc【FGC:full gc次数,FGCT :full gc时间】,可能原因是新生代的设计过小,导致新生代频繁gc
查看当前jvm 内存分配:
jmap -heap 5511 [root@bigdata-service-1:/root]# jmap -heap 5511 Attaching to process ID 5511, please wait... Debugger attached successfully. Server compiler detected. JVM version is 25.151-b12 using parallel threads in the new generation. using thread-local object allocation. Concurrent Mark-Sweep GC Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 6442450944 (6144.0MB) NewSize = 697892864 (665.5625MB) MaxNewSize = 697892864 (665.5625MB) OldSize = 5744558080 (5478.4375MB) NewRatio = 2 SurvivorRatio = 8 MetaspaceSize = 21807104 (20.796875MB) CompressedClassSpaceSize = 1073741824 (1024.0MB) MaxMetaspaceSize = 17592186044415 MB G1HeapRegionSize = 0 (0.0MB) Heap Usage: New Generation (Eden + 1 Survivor Space): capacity = 628162560 (599.0625MB) used = 210724080 (200.96214294433594MB) free = 417438480 (398.10035705566406MB) 33.54610628178795% used Eden Space: capacity = 558432256 (532.5625MB) used = 203572032 (194.14141845703125MB) free = 354860224 (338.42108154296875MB) 36.45420367694519% used From Space: capacity = 69730304 (66.5MB) used = 7152048 (6.8207244873046875MB) free = 62578256 (59.67927551269531MB) 10.256728552337876% used To Space: capacity = 69730304 (66.5MB) used = 0 (0.0MB) free = 69730304 (66.5MB) 0.0% used concurrent mark-sweep generation: capacity = 5744558080 (5478.4375MB) used = 4145940376 (3953.876853942871MB) free = 1598617704 (1524.560646057129MB) 72.17161561016022% used 32720 interned Strings occupying 4008568 bytes.
从JVM的分配可以看出,新生代分配:capacity = 628162560 (599.0625MB),Eden Space:capacity = 558432256 (532.5625MB),老年代分配:capacity = 5744558080 (5478.4375MB)。从数据分析出,新生代设计过小,导致新生代频繁gc。
其次,新生代设计过小,导致大对象无法分配,直接分配到老年代。导致老年代空间占用过大,从而占用整个系统内存空间。可能导致其他服务内存不足。
线上手动GC,查看GC后的JVM详情信息:
[root@bigdata-service-1:/root]# jmap -histo:live 5511 日志过大..... [root@bigdata-service-1:/root]# jstat -gc 5511 1000 S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT 68096.0 68096.0 7588.1 0.0 545344.0 394801.0 5609920.0 4050654.1 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 8.855 336.437 68096.0 68096.0 0.0 0.0 545344.0 0.0 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 46918.8 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 46922.8 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 46937.1 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 46937.1 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 46937.1 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 64190.0 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 64190.0 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 64190.0 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 64190.0 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 64190.0 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 82152.3 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 82152.3 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 82152.3 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 82156.4 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 82156.4 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 101737.5 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 101737.5 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 103581.5 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 103581.5 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 103581.5 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 124812.4 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 124812.4 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578 68096.0 68096.0 0.0 0.0 545344.0 124814.5 5609920.0 4043257.0 87952.0 83637.7 11392.0 10437.9 4592 327.582 9 12.997 340.578
手动强制Full gc 以后发现,内存并没有被释放,说明当前系统可能存在内存泄漏,导致对象不能被清理释放【OC:老年代空间,OU :老年代使用空间(KB)】。 存在内存泄漏,主要是当前的对象引用没有被释放导致,比如僵尸线程的存在。说明当前系统可能存在大量的僵尸线程。进而进行进一步的排查
查看当前系统线程情况:jstack。可以看到当前线程的详细情况
[root@bigdata-service-1:/root]# jstack 5511 | head -50 2018-07-20 11:59:27 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode): "Keep-Alive-Timer" #19437 daemon prio=8 os_prio=0 tid=0x00007f8d749ce000 nid=0x4995 waiting on condition [0x00007f8bb7b2b000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at sun.net.www.http.KeepAliveCache.run(KeepAliveCache.java:172) at java.lang.Thread.run(Thread.java:748) "http-nio-9025-exec-201" #19378 daemon prio=5 os_prio=0 tid=0x00007f8c8e3d3000 nid=0x7ffc waiting on condition [0x00007f8b3a724000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000679f4cbd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) "http-nio-9025-exec-197" #19374 daemon prio=5 os_prio=0 tid=0x00007f8c8e3cc000 nid=0x7ff7 waiting on condition [0x00007f8b3a828000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000679f4cbd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) at java.lang.Thread.run(Thread.java:748) "http-nio-9025-exec-196" #19373 daemon prio=5 os_prio=0 tid=0x00007f8c8a06d000 nid=0x7ff3 waiting on condition [0x00007f8b3a8aa000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000679f4cbd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:103) at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:31) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
查看当前系统的线程总数:
jvm 开启的线程数:
[root@bigdata-service-1:/root]# jstack 5511 | wc -l 254132
当前服务器对不同IP的建立连接数:
[root@bigdata-service-1:/root]# netstat -nat|grep ESTABLISHED|awk '{print$5}'|awk -F : '{print$1}'|sort|uniq -c|sort -rn 10244 10.27.70.185 26 10.27.70.77 26 10.24.237.150 10 10.81.83.121 7 10.29.148.127 4 10.80.101.244 3 10.27.4.125 2 127.0.0.1 2 10.80.112.35 2 10.27.87.167 2 10.25.0.162 1 10.80.112.23 1 106.11.248.20
加载当前的堆信息:
**jmap -dump:live,format=b,file=dump.hprof 5511 **
通过 Eclipse IDE MAT 插件进行堆分析
Eclipse 导入堆信息
查看当前堆内存泄漏情况
堆的对象泄漏可以分析出,当前堆中存在大量的:”io.netty.util.internal.InternalThreadLocalMap“这个是netty的内部线程
查看当前堆中占用最大的对象
通过对堆的对象分析,可以发现,当前系统中大量存在连接池对象,从而定位到问题所在,连接池出现了泄漏。导致系统大量创建连接池,而没有释放。出现了内存泄漏
可以分析出,当前系统对外建立了大量的连接,而且连接没有释放,导致系统开启的线程数过多。通过分析连接最多的ip,定位到出现问题的节点
"elasticsearch[_client_][transport_client_boss][T#4]" #166 daemon prio=5 os_prio=0 tid=0x00007f8c9819e800 nid=0x20e3 runnable [0x00007f8c7403d000] - locked <0x0000000674995e98> (a sun.nio.ch.Util$3) - locked <0x0000000674995e80> (a java.util.Collections$UnmodifiableSet) - locked <0x0000000674986f58> (a sun.nio.ch.EPollSelectorImpl) at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:753) at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:886)
到此,相信大家对“如何排查CPU过载过高的问题”有了更深的了解,不妨来实际操作一番吧!这里是创新互联网站,更多相关内容可以进入相关频道进行查询,关注我们,继续学习!
文章名称:如何排查CPU过载过高的问题
分享网址:http://pcwzsj.com/article/gdipoo.html