[NETVIRT-547] Too many open files while booting VMs in 200 nodes scale setup
Created: 19/Mar/17  Updated: 06/Apr/18  Resolved: 06/Apr/18

Status: Resolved
Project: netvirt
Component/s: General
Affects Version/s: Carbon
Fix Version/s: None

Type: Bug
Priority: Medium
Reporter: Guy Sela
Assignee: Unassigned
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Attachments: odl1logs.tar.gz, odl2logs.tar.gz, odl3logs.tar.gz
External issue ID: 8017

 Description   

Running Carbon code from 16/3/2017, I was scale testing a 3-node cluster. I had an OpenStack deployment with 200 connected compute nodes (200 OVSs) and was booting VMs, aiming for 200 VMs. Somewhere in the process of booting the VMs, the ODLs started throwing "Too many open files" exceptions. These are best seen in the ODL1 logs, in karaf.log.1.

From 20:38:37 to 22:01:10, the only thing this ODL does is take snapshots of the datastore, as a result of actions the other ODLs are performing. Because the snapshot mechanism was only recently replaced as a result of this bug: https://bugs.opendaylight.org/show_bug.cgi?id=7521, I suspect the two are related.

At 22:06:14 it starts suffering from:
2017-03-16 22:06:14,840 | WARN | entLoopGroup-4-1 | DefaultChannelPipeline | 141 - io.netty.common - 4.1.8.Final | An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.unix.Errors$NativeIoException: accept() failed: Too many open files
    at io.netty.channel.unix.Errors.newIOException(Errors.java:117)[147:io.netty.transport-native-epoll:4.1.8.Final]
    at io.netty.channel.unix.Socket.accept(Socket.java:263)[147:io.netty.transport-native-epoll:4.1.8.Final]
    at io.netty.channel.epoll.AbstractEpollServerChannel$EpollServerSocketUnsafe.epollInReady(AbstractEpollServerChannel.java:129)[147:io.netty.transport-native-epoll:4.1.8.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)[147:io.netty.transport-native-epoll:4.1.8.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:299)[147:io.netty.transport-native-epoll:4.1.8.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)[141:io.netty.common:4.1.8.Final]
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)[141:io.netty.common:4.1.8.Final]
    at java.lang.Thread.run(Thread.java:745)[:1.8.0_121]
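
To find when the descriptor exhaustion first shows up in a large karaf.log, something like the following Python sketch might help. It assumes every log entry starts with a pipe-delimited header laid out like the WARN line quoted above (timestamp, level, thread, logger, bundle, message), which is inferred from that single line only. It also assumes the "Too many open files" text may sit on a continuation line, such as the exception line of a stack trace. The names `parse_karaf_header` and `find_fd_exhaustion` are hypothetical helpers, not part of ODL or Karaf.

```python
from datetime import datetime

MARKER = "Too many open files"


def parse_karaf_header(line):
    """Parse a pipe-delimited entry header; return None for continuation lines."""
    parts = [p.strip() for p in line.split(" | ", 5)]
    if len(parts) != 6:
        return None
    try:
        # Timestamp layout assumed from the quoted line: 2017-03-16 22:06:14,840
        ts = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return {"time": ts, "level": parts[1], "thread": parts[2],
            "logger": parts[3], "bundle": parts[4], "message": parts[5]}


def find_fd_exhaustion(lines):
    """Return the headers of all entries whose text mentions the marker."""
    hits = []
    current = None    # header of the entry being read
    reported = False  # whether the current entry was already recorded
    for raw in lines:
        line = raw.rstrip("\n")
        header = parse_karaf_header(line)
        if header is not None:
            current, reported = header, False
        if MARKER in line and current is not None and not reported:
            hits.append(current)
            reported = True
    return hits
```

Opening karaf.log.1 and passing the file object to `find_fd_exhaustion` would return one header per affected entry, so `hits[0]["time"]` is the first occurrence in that file.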



 Comments   
Comment by Guy Sela [ 19/Mar/17 ]

Attachment odl1logs.tar.gz has been added with description: Odl1 Logs

Comment by Guy Sela [ 19/Mar/17 ]

Attachment odl2logs.tar.gz has been added with description: ODL2 Logs

Comment by Guy Sela [ 19/Mar/17 ]

Attachment odl3logs.tar.gz has been added with description: ODL3 Logs

Comment by Guy Sela [ 19/Mar/17 ]

[opensdn@c6-bl-2 ~]$ cat /proc/sys/fs/file-max
13092633
[opensdn@c6-bl-2 ~]$ ulimit -Hn
4096
[opensdn@c6-bl-2 ~]$ ulimit -Sn
1024

[opensdn@c6-bl-2 ~]$ cat /proc/43902/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             514507               514507               processes
Max open files            102400               102400               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       514507               514507               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
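
The output above shows that the limit that matters is the per-process one (102400 open files for PID 43902, assuming that PID is the ODL JVM), not the shell's `ulimit -n` values. A minimal Python sketch of how the current descriptor count could be compared against that limit on Linux is given below. `parse_max_open_files` and `fd_usage` are hypothetical helper names. Reading `/proc/<pid>/fd` needs to be done as the process owner or root.

```python
import os
import re


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return tuple(None if value == "unlimited" else int(value)
                         for value in match.groups())
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a live Linux process."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, _hard = parse_max_open_files(handle.read())
    # Each entry in /proc/<pid>/fd is one open descriptor of that process.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft
```

Calling `fd_usage(43902)` periodically while the VMs boot would show whether the descriptor count climbs steadily toward the soft limit, which would point to a leak rather than a limit that is simply too low.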
