Type: Bug
Resolution: Unresolved
Priority: High
We have sporadic failures in our netvirt 3-node (aka clustering) suites where OpenStack
instances go into the error state. The reason shown is "Failed to allocate network(s)".
In this example, the failure happens after all three nodes have been stopped and started.
The cluster shows as up and operational: syncstatus is True and every shard
has a proper Leader and Followers. The instances that end up in the error state are
booted approximately 10 minutes after the nodes are started.
One ODL node's karaf.log contains several messages like this:
2018-09-16T10:25:39,274 | ERROR | nioEventLoopGroup-7-1 | WebSocketServerHandler | 336 - org.opendaylight.netconf.restconf-nb-bierman02 - 1.7.4.SNAPSHOT | Listener for stream with name 'data-change-event-subscription/neutron:neutron/neutron:ports/datastore=OPERATIONAL/scope=SUBTREE' was not found.
The nova log shows that an operation timed out, possibly because
of the failing websocket:
WARNING nova.virt.libvirt.driver [None req-a139053f-fabc-4c64-b658-0e73c1e4ecc5 admin admin] [instance: e115f123-25e9-4e6c-80be-347564d75af1] Timeout waiting for [('network-vif-plugged', u'8d8e28ca-1b96-41f9-8d12-6b041e5300e9')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds
The other two ODL nodes do not seem to show this websocket error. That may be only
because haproxy sends the requests to just this one ODL node, or the other two
nodes may simply not be affected.
Relates to:
NETVIRT-1461 port create times out, instances go to error state (Resolved)