netconf / NETCONF-724

Making NETCONF chunk aggregator more robust


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Medium
    • 8.0.0
    • None
    • netconf
    • None

      Currently, if an error occurs while NETCONF chunks are being composed into NETCONF RPC replies (for example, when the number of chunk bytes actually written does not match the declared chunk size), an exception is thrown. However, this exception is never handled on the NETCONF layer through Netty's exceptionCaught(..) handler method (for example, in the last handler of the NETCONF channel pipeline, AbstractNetconfSession or a derived class). This results in:

      • A stalled NETCONF channel: an unhandled exception in the Netty channel pipeline is only logged in DefaultChannelPipeline.onUnhandledInboundException(..), while the state of NetconfChunkAggregator, and possibly of other components, stays corrupted.
      • Client futures in the transaction (from the mountpoint DOMDataBroker) are not completed until they are dropped by a timeout or the underlying session is dropped.

      Possible solutions:

      • Invalidate all requests in the queue of NetconfDeviceCommunicator when an exception is caught, and reset the state of the handlers across the channel pipeline. This is not a clean solution, because it could also drop futures that did not cause the issue (isolation between requests is weakened).
      • When an exception is caught, try to find the message-id of the invalid reply. If it can be found, invalidate only the request with this message-id. The remaining bytes that belong to the corrupted NETCONF reply can then be ignored, and processing continues from the next valid chunk. This solution is more complex, but cleaner.
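      The two options can be compared in a small plain-Java sketch. It assumes a hypothetical PendingRequestsSketch registry that stands in for the request queue of NetconfDeviceCommunicator (the real class and its API are not modeled), with one CompletableFuture per message-id, and it assumes the bytes aggregated so far for the broken reply are still available as text. failAll corresponds to the first option and failMatching to the second; resyncAfterEndOfChunks is a heuristic way to skip the rest of a corrupted reply by scanning for the next RFC 6242 end-of-chunks marker (LF, '#', '#', LF).

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the request queue of NetconfDeviceCommunicator (not the real API). */
public final class PendingRequestsSketch {
    // message-id attribute of an <rpc-reply> start tag, optionally namespace-prefixed
    private static final Pattern MESSAGE_ID = Pattern.compile(
        "<(?:\\w+:)?rpc-reply\\b[^>]*?\\bmessage-id\\s*=\\s*[\"']([^\"']+)[\"']");

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    /** Registers an outgoing RPC and returns the future the client waits on. */
    public CompletableFuture<String> register(String messageId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(messageId, future);
        return future;
    }

    /** Option 1: fail every outstanding request, including unaffected ones. */
    public void failAll(Throwable cause) {
        for (String id : pending.keySet()) {
            CompletableFuture<String> future = pending.remove(id);
            if (future != null) {
                future.completeExceptionally(cause);
            }
        }
    }

    /**
     * Option 2: fail only the request whose message-id appears in the partial reply.
     * Falls back to option 1 when no known message-id is found.
     * Returns true when exactly one request was failed.
     */
    public boolean failMatching(String partialReply, Throwable cause) {
        String id = extractMessageId(partialReply);
        CompletableFuture<String> future = id == null ? null : pending.remove(id);
        if (future == null) {
            failAll(cause);
            return false;
        }
        future.completeExceptionally(cause);
        return true;
    }

    /** Returns the message-id of the first rpc-reply start tag, or null if none is present. */
    public static String extractMessageId(String partialReply) {
        Matcher m = MESSAGE_ID.matcher(partialReply);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Heuristic resynchronization: returns the index just past the next end-of-chunks
     * marker (LF '#' '#' LF) at or after {@code from}, or -1 if none has arrived yet.
     * The same byte sequence could in principle also occur inside chunk data.
     */
    public static int resyncAfterEndOfChunks(byte[] buf, int from) {
        for (int i = Math.max(from, 0); i + 3 < buf.length; i++) {
            if (buf[i] == '\n' && buf[i + 1] == '#' && buf[i + 2] == '#' && buf[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }
}
```

      In this sketch, isolation between requests is preserved only when the message-id can be recovered from the partial reply; otherwise failMatching degrades to the first option, which reflects the trade-off between the two proposals.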

      Motivation:

      IOS XR 6.1.2/6.6.x appears to be buggy: after a get-config request is sent on a path to the Operational datastore, the device sometimes inserts RPC errors into the response, but these RPC errors are not counted in the chunk size. This happens more often when the get RPC is invoked on data with many lines (in the thousands). For example:

      <get>
      <filter xmlns:ns0="urn:ietf:params:xml:ns:netconf:base:1.0" ns0:type="subtree">
      <cfg-hist-gl xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-config-cfgmgr-exec-oper">
      <record-type>
      <record-type>commit</record-type>
      </record-type>
      </cfg-hist-gl>
      </filter>
      </get>

      The RPC error is inserted at a random position between chunks:

      ...
      <rpc-error><error-type>transport</error-type>
       <error-tag>resource-denied</error-tag><error-message>transport
       throttling error</error-message></rpc-error>
       #462
       <record>
       <record>1395</record>
       <timestamp>1578037903</timestamp>
      ...

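      The resulting error message from NetconfChunkAggregator (byte 60 is the ASCII character '<' and byte 10 is a line feed, so the aggregator hit the start of the injected <rpc-error> element where it expected the line feed that begins the next chunk header):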

      Got byte 60 while waiting for 10
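      To see why uncounted bytes produce exactly this kind of failure, here is a minimal plain-Java sketch of the RFC 6242 chunked-framing grammar (chunk = LF '#' chunk-size LF chunk-data, terminated by LF '#' '#' LF). ChunkDecoderSketch and FramingException are hypothetical names; this is not the OpenDaylight NetconfChunkAggregator, and its exception text only mimics the format of the message quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, non-incremental decoder for RFC 6242 chunked framing:
 *   chunk         = LF '#' chunk-size LF chunk-data
 *   end-of-chunks = LF '#' '#' LF
 */
public final class ChunkDecoderSketch {

    /** Raised when the input does not follow the framing grammar. */
    public static final class FramingException extends RuntimeException {
        FramingException(String message) {
            super(message);
        }
    }

    private ChunkDecoderSketch() {
    }

    /** Decodes one complete chunked message and returns the concatenated chunk data. */
    public static String decode(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            pos = expect(in, pos, '\n');
            pos = expect(in, pos, '#');
            if (pos < in.length && in[pos] == '#') {
                // Second '#': this is end-of-chunks, which must be followed by LF
                expect(in, pos + 1, '\n');
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
            // Parse the decimal chunk-size (leading-zero rules are not checked here)
            long size = 0;
            int digits = 0;
            while (pos < in.length && in[pos] >= '0' && in[pos] <= '9' && digits < 10) {
                size = size * 10 + (in[pos] - '0');
                pos++;
                digits++;
            }
            if (digits == 0 || size == 0) {
                throw new FramingException("Invalid chunk size at offset " + pos);
            }
            pos = expect(in, pos, '\n');
            if (pos + size > in.length) {
                throw new FramingException("Truncated chunk at offset " + pos);
            }
            // Trust the declared size: any bytes the sender did not count stay unread
            out.write(in, pos, (int) size);
            pos += (int) size;
        }
    }

    /** Consumes one expected byte or fails with a message in the style quoted above. */
    private static int expect(byte[] in, int pos, char wanted) {
        if (pos >= in.length) {
            throw new FramingException("Unexpected end of input while waiting for " + (int) wanted);
        }
        if (in[pos] != wanted) {
            throw new FramingException("Got byte " + in[pos] + " while waiting for " + (int) wanted);
        }
        return pos + 1;
    }
}
```

      Because the decoder trusts the declared chunk size, bytes injected after a chunk's data are left in the stream and the very next framing check fails on the '<' of the injected element. A production aggregator would additionally need to reset its state at that point, which is the robustness gap this issue describes.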

            jaro0149odl Jaroslav Tóth