  1. controller
  2. CONTROLLER-7

System can go into livelock after forwarding state is established between multiple switches


Details

    • Bug
    • Status: Resolved
    • Resolution: Won't Do
    • 0.4.0
    • None
    • adsal
    • None
    • Operating System: Linux
      Platform: Other

    • 9

    Description

      Currently there is no liveness check based on echo/echo_reply to flag a switch as down. SwitchHandler could treat NIO channel read errors as a hint to start polling the switch at a fixed interval.
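The echo-based liveness check described above could be sketched roughly as follows. This is a minimal illustration, not controller code: `EchoLivenessTracker`, its method names, and the timeout handling are all hypothetical, and timestamps are passed in explicitly so the logic stays independent of how echo requests are actually sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not part of the controller: tracks echo replies per switch. */
final class EchoLivenessTracker {
    private final long timeoutMillis;
    // switch id -> timestamp (ms) of the last echo_reply seen from that switch
    private final Map<Long, Long> lastReply = new ConcurrentHashMap<>();

    EchoLivenessTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Start tracking a switch; the connect time counts as the first sign of life. */
    void switchConnected(long switchId, long nowMillis) {
        lastReply.put(switchId, nowMillis);
    }

    /** Record an echo_reply, but only for switches that are still tracked. */
    void echoReplyReceived(long switchId, long nowMillis) {
        lastReply.computeIfPresent(switchId, (id, old) -> nowMillis);
    }

    /** Remove and return every switch whose last reply is older than the timeout. */
    List<Long> expire(long nowMillis) {
        List<Long> down = new ArrayList<>();
        for (Map.Entry<Long, Long> e : lastReply.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                // remove(key, value) avoids dropping a switch that replied concurrently
                if (lastReply.remove(e.getKey(), e.getValue())) {
                    down.add(e.getKey());
                }
            }
        }
        return down;
    }
}
```

A polling task would call `expire(System.currentTimeMillis())` once per interval and flag each returned switch as down.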

      Procedure:

      Use the SimpleForwarding sample app to provide learning-bridge capability.
      Establish a topology of roughly 15 switches using the mininet simulator.
      Do a pairwise ping to establish flow state.
      Bounce the switches by exiting mininet.

      A number of services keep trying to read from the channel, consuming SwitchEvent and queue entries. References to these objects are never released, which keeps them in OldGen space until the heap is exhausted.
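One way to keep a dead channel from filling the heap is to cap the event queue. The sketch below is an assumption about how that might look, not the controller's actual queueing code: `BoundedEventQueue` and its methods are hypothetical names, and dropping events when full is only one possible policy.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper, not controller code: an event queue with a hard capacity. */
final class BoundedEventQueue<E> {
    private final BlockingQueue<E> queue;
    private final AtomicLong dropped = new AtomicLong();

    BoundedEventQueue(int capacity) {
        // Passing a capacity makes LinkedBlockingQueue bounded; the no-arg
        // constructor allows up to Integer.MAX_VALUE entries.
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Enqueue without blocking; count the event as dropped when the queue is full. */
    boolean publish(E event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    E poll() { return queue.poll(); }

    int size() { return queue.size(); }

    long droppedCount() { return dropped.get(); }
}
```

A steadily rising `droppedCount()` would also serve as an early signal that a consumer has stalled.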

      Heap Histogram

       num  #instances     #bytes  class name
      ----------------------------------------------
        1:    18815125  451563000  java.util.concurrent.LinkedBlockingQueue$Node
        2:    18814799  451555176  o.o.c.p.openflow.core.internal.SwitchEvent
        3:       86428   13728920  <constMethodKlass>
        4:       75675   13373088  [C
        5:       86428   11764480  <methodKlass>
        6:        8782   10267656  <constantPoolKlass>
        7:        8782    6374792  <instanceKlassKlass>

      The heap grows rapidly, filling Eden and OldGen space until no new objects can be created.

      Heap GC

        S0      S1       E       O      P  YGC   YGCT  FGC    FGCT     GCT
      0.00  100.00  100.00  100.00  99.65  306  7.199   40  55.495  62.694
      0.00  100.00  100.00  100.00  99.65  306  7.199   40  55.495  62.694
      0.00  100.00  100.00  100.00  99.65  306  7.199   40  55.495  62.694
      0.00  100.00  100.00  100.00  99.65  306  7.199   40  55.495  62.694
      0.00  100.00  100.00  100.00  99.62  306  7.199   40  55.495  62.694
      0.00  100.00  100.00  100.00  99.62  306  7.199   40  55.495  62.694
      0.00    0.00   67.58  100.00  99.62  306  7.199   40  58.027  65.226
      0.00    5.77  100.00  100.00  99.62  306  7.199   40  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226
      0.00  100.00  100.00  100.00  99.62  306  7.199   41  58.027  65.226

      A timer must be implemented to clean up switch events, and possibly to trigger cleanup of newly queued messages (Statistics, FlowMods), driven by a callback when AsynchronousCloseException is raised.
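A rough sketch of such a timer-driven cleanup is shown below. Everything in it is hypothetical (the `SwitchEventReaper` class, its nested `Event` stand-in, and the method names); it only illustrates the idea of marking a switch as closed from the exception handler and letting a periodic task purge that switch's pending events.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch, not controller code: purges queued events of closed switches. */
final class SwitchEventReaper {
    /** Stand-in for a queued event; only the owning switch id matters here. */
    static final class Event {
        final long switchId;
        Event(long switchId) { this.switchId = switchId; }
    }

    private final Queue<Event> pending = new ConcurrentLinkedQueue<>();
    private final Set<Long> closedSwitches = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    void enqueue(Event e) { pending.add(e); }

    /** Call from the handler that catches AsynchronousCloseException for a switch. */
    void channelClosed(long switchId) { closedSwitches.add(switchId); }

    /** Drop every pending event owned by a closed switch; returns how many remain. */
    int purge() {
        pending.removeIf(e -> closedSwitches.contains(e.switchId));
        return pending.size();
    }

    /** Run purge() periodically on the timer thread. */
    void start(long periodMillis) {
        timer.scheduleAtFixedRate(this::purge, periodMillis, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    void stop() { timer.shutdownNow(); }
}
```

A real implementation would also need to clear a switch id from the closed set when that switch reconnects; the sketch leaves this out for brevity.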


          People

            Unassigned Unassigned
            gaberger@cisco.com Gary Berger