netvirt / NETVIRT-1460

websocket failing: causes instance creation failures


      We see sporadic failures in our netvirt 3-node (aka clustering) suites where OpenStack
      instances go to error state. The reason is shown as "Failed to allocate network(s)".

      In this example, the failure happens after all three nodes have been stopped and started.
      The cluster shows as up and operational: syncstatus is True and every shard has a
      proper Leader and Followers. Approximately 10 minutes after the nodes are started, the
      instances that end up in error state are booted.

      The karaf.log of one ODL node contains messages like this:

      2018-09-16T10:25:39,274 | ERROR | nioEventLoopGroup-7-1 | WebSocketServerHandler           | 336 - org.opendaylight.netconf.restconf-nb-bierman02 - 1.7.4.SNAPSHOT | Listener for stream with name 'data-change-event-subscription/neutron:neutron/neutron:ports/datastore=OPERATIONAL/scope=SUBTREE' was not found.
      

      The nova log shows a timeout while waiting for the network-vif-plugged event, which
      may be caused by the failing websocket:

      WARNING nova.virt.libvirt.driver [None req-a139053f-fabc-4c64-b658-0e73c1e4ecc5 admin admin] [instance: e115f123-25e9-4e6c-80be-347564d75af1] Timeout waiting for [('network-vif-plugged', u'8d8e28ca-1b96-41f9-8d12-6b041e5300e9')] for instance with vm_state building and task_state spawning.: Timeout: 300 seconds
      

      The other two ODL nodes do not seem to have this websocket error, but that may
      only be because this one ODL node is the only one receiving the requests (via haproxy).
      Or the websocket may simply not be broken on the other two nodes.
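
One way to check which ODL nodes are affected is to count the websocket listener errors in each node's karaf.log. The following is only a rough sketch: the function names and the log paths passed on the command line are hypothetical, and the pattern is based solely on the single error line quoted above, so it may miss variants of the message.

```python
import re
import sys
from collections import Counter

# Matches the WebSocketServerHandler error quoted in this issue and
# captures the stream name. Based on that one sample line only.
LISTENER_NOT_FOUND = re.compile(
    r"WebSocketServerHandler.*Listener for stream with name '([^']+)' was not found"
)


def count_missing_listeners(lines):
    """Return a Counter mapping stream name -> number of 'was not found' errors."""
    counts = Counter()
    for line in lines:
        match = LISTENER_NOT_FOUND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def scan_logs(paths):
    """Scan each karaf.log path (hypothetical locations) and return {path: Counter}."""
    results = {}
    for path in paths:
        with open(path, errors="replace") as handle:
            results[path] = count_missing_listeners(handle)
    return results


if __name__ == "__main__":
    # Usage (paths are placeholders): python scan_karaf.py odl1/karaf.log odl2/karaf.log
    for path, counts in scan_logs(sys.argv[1:]).items():
        print(path, dict(counts) or "no websocket listener errors")
```

A node with zero matches is not necessarily healthy; as noted above, it may simply not have received any requests through haproxy.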

            jluhrsen Jamo Luhrsen