[BGPCEP-494] PCRpt received with bandwidth reoptimization object leads to loop causing OOM Created: 20/Jul/16 Updated: 03/Mar/19 Resolved: 10/Aug/16 |
|
| Status: | Resolved |
| Project: | bgpcep |
| Component/s: | PCEP |
| Affects Version/s: | Bugzilla Migration |
| Fix Version/s: | Bugzilla Migration |
| Type: | Bug | ||
| Reporter: | Ajay L | Assignee: | Ajay L |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Operating System: All |
||
| Attachments: |
|
| External issue ID: | 6242 |
| Description |
|
The issue occurs in a real network environment. If a PCRpt message is received with a bandwidth reoptimization object (ref: https://tools.ietf.org/html/rfc5440#section-7.7), the controller loops and ultimately fails with a heap OutOfMemory error. The crafted packet used to reproduce the issue is attached. It is not clear at this point why the PCC is including the bandwidth reoptimization object in the PCRpt. |
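For readers unfamiliar with the two BANDWIDTH object types, the following is a minimal Python sketch of how the common object header and the BANDWIDTH body from RFC 5440 (sections 7.2 and 7.7) can be decoded. It is an illustration only, not the ODL parser; the names `PcepObject`, `parse_object` and `describe_bandwidth` are hypothetical and do not come from the bgpcep code base.

```python
import struct
from typing import NamedTuple, Tuple

BANDWIDTH_CLASS = 5  # Object-Class of BANDWIDTH per RFC 5440 section 7.7
BANDWIDTH_TYPES = {
    1: "requested bandwidth",
    2: "bandwidth of existing TE LSP being reoptimized",
}


class PcepObject(NamedTuple):
    """Hypothetical container for one decoded PCEP object."""
    object_class: int
    object_type: int
    p_flag: bool
    i_flag: bool
    length: int
    body: bytes


def parse_object(data: bytes, offset: int = 0) -> PcepObject:
    """Decode one object: class (8 bits), type (4), reserved (2), P, I, length (16)."""
    if len(data) - offset < 4:
        raise ValueError("truncated object header")
    object_class, type_flags, length = struct.unpack_from("!BBH", data, offset)
    # Object Length covers the 4-byte header plus the body.
    if length < 4 or offset + length > len(data):
        raise ValueError("invalid object length")
    return PcepObject(
        object_class=object_class,
        object_type=type_flags >> 4,
        p_flag=bool(type_flags & 0x02),
        i_flag=bool(type_flags & 0x01),
        length=length,
        body=data[offset + 4:offset + length],
    )


def describe_bandwidth(obj: PcepObject) -> Tuple[str, float]:
    """Return (meaning, bytes per second) for a BANDWIDTH object."""
    if obj.object_class != BANDWIDTH_CLASS:
        raise ValueError("not a BANDWIDTH object")
    if len(obj.body) != 4:
        raise ValueError("BANDWIDTH body must be 4 bytes")
    # The value is a 32-bit IEEE float in network byte order.
    (value,) = struct.unpack("!f", obj.body)
    meaning = BANDWIDTH_TYPES.get(obj.object_type, "unknown object type")
    return meaning, value
```

With this sketch, the byte sequence `05 20 00 08` followed by a 4-byte float decodes as class 5, type 2, which is the reoptimization variant discussed in this ticket. The example bytes are made up and are not taken from the attached capture.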
| Comments |
| Comment by Ajay L [ 20/Jul/16 ] |
|
Attachment pcrpt-oom-repro.pcap has been added with description: PCRpt used to reproduce the issue |
| Comment by Al Goddard [ 22/Jul/16 ] |
|
Cisco DE is investigating if/how a 0x5 / 0x2 BW object could be sent from the router, but current XR code is only expected to send 0x5 / 0x5 BW Sample. |
| Comment by Ajay L [ 25/Jul/16 ] |
|
Based on OOM heap dump analysis, controller is receiving PCRpt with objects in below sequence:
RFC 5440 (https://tools.ietf.org/html/rfc5440#section-7.7) describes the 2 types of bandwidth objects and specifies under what scenario the reoptimization bandwidth object is used:
" o R (Reoptimization - 1 bit): when set, the requesting PCC specifies
<request>::= <RP>
Stateful PCEP draft (ref: https://tools.ietf.org/html/draft-ietf-pce-stateful-pce-15) introduces PCRpt message and it is defined as below:
<PCRpt Message> ::= <Common Header>
<state-report-list> ::= <state-report>[<state-report-list>]
<state-report> ::= [<SRP>]
Where:
Now attribute-list in RFC 5440 includes the bandwidth object:
<attribute-list>::=[<LSPA>]
<metric-list>::=<METRIC>[<metric-list>]
So technically the bandwidth reoptimization object can be expected within PCRpt message
Proposed fix will make 2 changes: |
| Comment by Ajay L [ 25/Jul/16 ] |
|
(In reply to Al Goddard from comment #1) Thx Al for the update. FYI - in the PCRpt sent by CRS which causes the issue, the bandwidth object has bandwidth value as zero whereas the bandwidth reoptimization object has non-zero bandwidth value |
| Comment by Ajay L [ 25/Jul/16 ] |
|
master: https://git.opendaylight.org/gerrit/42434 |
| Comment by Al Goddard [ 25/Jul/16 ] |
|
(In reply to Ajay L from comment #3) From Cisco PCEP DE: Hi Al, I checked the code, and do not see a way for the PCE report message to ever contain the type 2 BW object. For now, until such a packet is actually seen, I would be disinclined to think that was the trigger. Thanks, |
| Comment by Ajay L [ 25/Jul/16 ] |
|
(In reply to Al Goddard from comment #5) Attaching a couple of screenshots from the heap dump analysis which show the various objects, including bandwidth and reoptimization bandwidth objects, received from the PCRpt (after parsing by ODL code) |
| Comment by Ajay L [ 25/Jul/16 ] |
|
Attachment Processed-pcrpt-objects.png has been added with description: Processed PCRpt objects |
| Comment by Ajay L [ 25/Jul/16 ] |
|
Attachment Pending-pcrpt-objects.png has been added with description: Pending PCRpt objects |
| Comment by Al Goddard [ 26/Jul/16 ] |
|
Additional info/request from Cisco DE: Is this from the heap dump after the heap went OOM? Am I correct that the 0x5/0x5 BW object is not visible here? If it is possible to correlate this back to an actual message, that would help. |
| Comment by Ajay L [ 26/Jul/16 ] |
|
(In reply to Al Goddard from comment #9)
Agree. But analysis so far does not show any issue in ODL parsing logic
> 0x5/0x5 BW object?
Did you mean 0x5/0x1 or 0x5/0x2? I see both of those objects in the heap dump
Attaching a screenshot showing LSP symbolic name which is "DCCRS2_t7"
> Ideally, a procedure to replicate this would be ideal, or was this a one-off?
Agree. But I think this has been seen only once so far |
| Comment by Ajay L [ 26/Jul/16 ] |
|
Attachment lsp-symbolic-path-name.png has been added with description: LSP symbolic path-name |
| Comment by Al Goddard [ 27/Jul/16 ] |
|
Can you answer the two questions from Cisco: The image with the knob to change the bandwidth value is building now and should be available in a few hours. Thanks,
1. Regarding this: "Agree. But analysis so far does not show any issue in ODL parsing logic" My understanding is that it was the parsing of the object that resulted in the eventual looping until OOM state. Was it something else? Can you answer: _______________
2. Do you send an RP object with R-bit=1 in any messages? Spec shows this object as part of a PCRep message, not a PCRpt message. The router implements RFC5440 as well as stateful PCEP drafts, so it can originate and process PCRep messages which do contain the RP object. Can you answer: _______________
3. Let me know if there are any specific debugs to validate the LSP 7 (shown below) would be sending this object. The ‘dump-messages’ debug can provide such low-level debugging of all messages originating from or arriving on the router, if this can be recreated. |
| Comment by Ajay L [ 28/Jul/16 ] |
|
(In reply to Al Goddard from comment #12)
Processing of objects in the PCRpt message caused the loop. I was referring to the fact that the issue was in the processing of objects and not in de-serializing or parsing the data received from the wire into objects. So we still believe that somehow a type=2 bandwidth object was received.
The screenshot is from the OOM heap dump seen in ATT setup, not the crafted one. We do not believe PCRep getting interpreted as PCRpt is happening here. Per RFC 5440: "PCRep is a PCEP message sent by a PCE to a requesting PCC in response to a previously received PCReq message.". So PCRep is supposed to be received by the PCC (router in this case, not the controller) |
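The distinction between parsing and processing can be illustrated with a small Python sketch of one way an object-processing loop can fail to terminate, and a guard against it. This is an assumption-laden illustration, not the ODL code and not the fix from this ticket: the string labels, `take_report`, `split_reports` and `StalledParserError` are all hypothetical. The idea is that a loop which groups already-decoded objects into reports must consume at least one object per pass; if it meets an object it does not recognise and consumes nothing, it would spin forever unless it checks for progress.

```python
from typing import List, Sequence

# Hypothetical labels for already-decoded objects; the reoptimization
# bandwidth object is deliberately missing from the known set.
KNOWN_ATTRIBUTES = {"ERO", "LSPA", "BANDWIDTH", "METRIC"}


class StalledParserError(Exception):
    """Raised when a pass over the object list consumes nothing."""


def take_report(objects: Sequence[str], start: int) -> int:
    """Return the index just past one report, or `start` if none is recognised."""
    pos = start
    if pos < len(objects) and objects[pos] == "SRP":
        pos += 1  # optional leading object
    if pos >= len(objects) or objects[pos] != "LSP":
        return start  # nothing recognised, nothing consumed
    pos += 1
    while pos < len(objects) and objects[pos] in KNOWN_ATTRIBUTES:
        pos += 1
    return pos


def split_reports(objects: Sequence[str]) -> List[List[str]]:
    """Group objects into reports, failing fast instead of looping forever."""
    reports: List[List[str]] = []
    pos = 0
    while pos < len(objects):
        end = take_report(objects, pos)
        if end <= pos:
            # Without this check the loop would repeat on the same object.
            raise StalledParserError(
                f"unexpected object {objects[pos]!r} at index {pos}"
            )
        reports.append(list(objects[pos:end]))
        pos = end
    return reports
```

In this sketch, a list such as `["LSP", "ERO", "BANDWIDTH", "BANDWIDTH_REOPT", "METRIC"]` raises `StalledParserError` at index 3 rather than accumulating state until memory runs out.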
| Comment by Milos Fabian [ 28/Jul/16 ] |