netvirt / NETVIRT-1461

port create times out, instances go to error state


      We have sporadic failures in our 3-node (i.e. clustered) CSIT jobs for netvirt, where
      some OpenStack instances go to the error state instead of active. There seem to be
      multiple causes; a similar issue is tracked in NETVIRT-1460.

      For this one, it seems that after one node is taken down (leaving two active nodes)
      there is a communication problem between networking_odl and ODL. This trace is seen in
      the [ neutron-server log | https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-3node-0cmb-1ctl-2cmp-openstack-queens-upstream-stateful-oxygen/66/control_1/oslogs/neutron-server.log.gz] :

      ERROR networking_odl.common.client [None req-9b0893d1-4602-4008-8926-76975987b9a2 None None] REST request ( post ) to url ( ports ) is failed. Request body : [{u'port': {'port_security_enabled': True, 'binding:host_id': '', 'name': '', 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': u'1c164d7f-7231-4cef-a967-c02a79b7aacc', 'tenant_id': u'4c6d58bff60d4f21a95a606edf297214', 'binding:vif_details': {}, 'binding:vnic_type': 'normal', 'binding:vif_type': 'unbound', 'device_owner': '', 'mac_address': 'fa:16:3e:b2:ed:3f', 'binding:profile': '{}', 'project_id': u'4c6d58bff60d4f21a95a606edf297214', 'fixed_ips': [{'subnet_id': u'8131ce39-4062-4cb5-96eb-e5095b48da21', 'ip_address': u'26.0.0.4'}], 'id': '842361ae-3077-4c64-a71a-6a0704723a2e', 'security_groups': [{'id': u'79a43166-0365-4085-be3a-45efc8c4fd6e'}], 'device_id': u'3a2df104-7865-4399-8b52-89fba5c34208'}}] service: ReadTimeout: HTTPConnectionPool(host='10.30.170.112', port=8181): Read timed out. (read timeout=10)
      ERROR networking_odl.journal.journal [None req-9b0893d1-4602-4008-8926-76975987b9a2 None None] Error while processing (Entry ID: 535) - create port 842361ae-3077-4c64-a71a-6a0704723a2e (Time stamp: 63672772711.3): ReadTimeout: HTTPConnectionPool(host='10.30.170.112', port=8181): Read timed out. (read timeout=10)
      ERROR networking_odl.journal.journal Traceback (most recent call last):
      ERROR networking_odl.journal.journal   File "/opt/stack/networking-odl/networking_odl/journal/journal.py", line 284, in _sync_entry
      ERROR networking_odl.journal.journal     self.client.sendjson(method, urlpath, to_send)
      ERROR networking_odl.journal.journal   File "/opt/stack/networking-odl/networking_odl/common/client.py", line 106, in sendjson
      ERROR networking_odl.journal.journal     'body': obj})
      ERROR networking_odl.journal.journal   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
      ERROR networking_odl.journal.journal     self.force_reraise()
      ERROR networking_odl.journal.journal   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
      ERROR networking_odl.journal.journal     six.reraise(self.type_, self.value, self.tb)
      ERROR networking_odl.journal.journal   File "/opt/stack/networking-odl/networking_odl/common/client.py", line 98, in sendjson
      ERROR networking_odl.journal.journal     self.request(method, urlpath, data))
      ERROR networking_odl.journal.journal   File "/opt/stack/networking-odl/networking_odl/common/client.py", line 91, in request
      ERROR networking_odl.journal.journal     method, url=url, headers=headers, data=data, timeout=self.timeout)
      ERROR networking_odl.journal.journal   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
      ERROR networking_odl.journal.journal     resp = self.send(prep, **send_kwargs)
      ERROR networking_odl.journal.journal   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
      ERROR networking_odl.journal.journal     r = adapter.send(request, **kwargs)
      ERROR networking_odl.journal.journal   File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 521, in send
      ERROR networking_odl.journal.journal     raise ReadTimeout(e, request=request)
      ERROR networking_odl.journal.journal ReadTimeout: HTTPConnectionPool(host='10.30.170.112', port=8181): Read timed out. (read timeout=10)
      ERROR networking_odl.journal.journal
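
      To see how often this timeout occurs in a run and which ODL node it targets, the neutron-server log can be scanned for the ReadTimeout message in the trace above. The following is only a rough sketch, not part of networking_odl: the helper names `strip_ansi` and `summarize_read_timeouts` are made up for illustration, and the message pattern is assumed from the lines quoted in this ticket.

```python
import re
from collections import Counter

# ANSI colour sequences. The ESC byte is optional because copies of the log
# pasted into a ticket often lose it and leave only text such as "[01;35m".
ANSI_RE = re.compile(r"\x1b?\[\d{1,2}(?:;\d{1,2})*m")

# Tail of the ReadTimeout message as it appears in the trace above
# (pattern assumed from this ticket, not taken from any library).
TIMEOUT_RE = re.compile(
    r"ReadTimeout: HTTPConnectionPool\(host='(?P<host>[^']+)', "
    r"port=(?P<port>\d+)\): Read timed out\. "
    r"\(read timeout=(?P<timeout>[\d.]+)\)"
)


def strip_ansi(line):
    """Remove colour codes so the message text can be matched reliably."""
    return ANSI_RE.sub("", line)


def summarize_read_timeouts(lines):
    """Count ReadTimeout messages per (host, port, timeout in seconds)."""
    counts = Counter()
    for raw in lines:
        match = TIMEOUT_RE.search(strip_ansi(raw))
        if match:
            key = (
                match.group("host"),
                int(match.group("port")),
                float(match.group("timeout")),
            )
            counts[key] += 1
    return counts
```

      It could be fed the lines of the decompressed neutron-server.log; each key in the result is one (host, port, timeout) combination, so a single dominant host would point at one ODL node rather than the whole cluster.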
      

      robot log showing the instance state

      ODL3 became the default shard leader after ODL2 was taken down, so I'm assuming ODL3
      was handling the transactions/requests. The ODL3 karaf log contains plenty of
      exceptions, but I'm not sure which of them are acceptable and which are not.
      There are also log entries mentioning "Job still failed on retry", which seems serious.
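
      One way to start sorting the karaf exceptions is to group them by class name and count the "Job still failed on retry" entries separately. This is a minimal sketch under assumptions: the function name `tally_karaf_markers` is hypothetical, the exception pattern is a generic guess at fully qualified Java class names ending in Exception or Error, and nothing here depends on the exact karaf log layout.

```python
import re
from collections import Counter

# Marker text quoted in this ticket.
RETRY_MARKER = "Job still failed on retry"

# Generic guess at a fully qualified Java exception name, for example
# "java.lang.IllegalStateException". This is an assumption, not a karaf format.
EXCEPTION_RE = re.compile(r"\b(?:[A-Za-z_]\w*\.)+[A-Z]\w*(?:Exception|Error)\b")


def tally_karaf_markers(lines):
    """Return (retry_failure_count, Counter of exception class names)."""
    retry_failures = 0
    exceptions = Counter()
    for line in lines:
        if RETRY_MARKER in line:
            retry_failures += 1
        # A line may name several exceptions (e.g. wrapped causes).
        exceptions.update(EXCEPTION_RE.findall(line))
    return retry_failures, exceptions
```

      Comparing the resulting counts between a passing and a failing job would show which exception types only appear when instances go to the error state.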

            skitt Stephen Kitt
            jluhrsen Jamo Luhrsen