FW: Forwarding issues related to MACs starting with a 4 or a 6 (Was: [c-nsp] Wierd MPLS/VPLS issue)
Hello,

TL;DR: Cisco Nexus 92160 switches cannot forward packets with certain specific payloads. There will be no software fix; the ASIC is poorly designed.

Suppose you have two routers connected to each other via a VLAN across such a Nexus 9000 switch, and you run VPLS between these routers. Packets whose payload is an Ethernet frame inside that VPLS instance, with a destination MAC starting with a 4 or a 6, may then be dropped.

If you know exactly what payloads you send across these switches, you can work around it (in an enterprise environment, for example). But if you are a service provider selling VLANs from A to B, you obviously don't know what the customer will send over the circuit, and then this can bite hard.

Kind regards,

Job

----- Forwarded message from Job Snijders <job@instituut.net> -----

Date: Fri, 2 Dec 2016 15:32:13 +0100
From: Job Snijders <job@instituut.net>
To: nanog@nanog.org
Subject: Forwarding issues related to MACs starting with a 4 or a 6 (Was: [c-nsp] Wierd MPLS/VPLS issue)

Hi all,

Ever since the IEEE started allocating OUIs (MAC address ranges) in a randomly distributed fashion rather than sequentially, the operator community has suffered enormously.

Time after time, issues pop up related to MAC addresses that start with a 4 or a 6. I believe the IEEE changed its strategy to purposefully raise the chance of collisions with MAC squatters, to encourage people to register and pay the fee.

The forwarded email at the bottom is yet another example of a widely deployed but fundamentally broken ASIC. The switch can't forward VPLS frames that contain a payload where the inner packet is destined to a MAC starting with a 4 or a 6. This is with the switch operating in pure layer-2 mode; it doesn't know what MPLS or VPLS even are. The switch is dropping packets on the floor, based on their _payload_.
Try selling such circuits to customers: "discounted layer-2 service, some flows might not be forwarded".

Had the IEEE continued the sequential OUI allocations, it probably would've taken many years before we ever reached MACs starting with a 4 or a 6. Instead, in 2012 the first linecards started rolling out of factories with burned-in MACs that start with a 4 or a 6, and this took some vendors by surprise.

There have been quite a few issues, both in hardware and software:

- Brocade brought a 24x10GE linecard to market in 2013/2014, with limited FIB scale, meant for a BGP-free MPLS core, but the card can't keep flows together on LACP bundles if the inner packets in a pseudowire are destined for a 4 or 6 MAC. The result: out-of-order delivery, hurting performance.
- Cisco ASR 9k's had a bug where, if a payload started with a 6, the router assumed it was an IPv6 packet and compared the calculated packet length with the length field in the packet. That check obviously fails, because an Ethernet frame is not an IPv6 packet. The result: packets dropped on the floor. (Fixed in 4.3(0.32)I)
- The Nexus 9000 issue described at the top of this mail.
- Brocade IronWare had an issue related to packet reordering for flows inside pseudowires, fixed in 2013/2014.

There are probably many more examples out there in the wild, slowly driving operators insane.

At this moment, some issues related to MACs starting with a 4 or a 6 can be mitigated if you enable Pseudowire Control-Word (RFC 4385) _AND_ Flow-Aware Transport (RFC 6391). You need both to mitigate certain issues in multi-vendor networks (for instance if you have Cisco edge + Juniper core). But what to do when the ASIC won't forward the payload? As an ISP you often don't control the payload.

Unfortunately, I don't think we've seen the end of this. The linecards bought in 2012 will trickle down to the grey/second-hand market about now, often without accompanying support contracts.
In a world with increased complexity in our interconnectedness, and a lack of visibility into the underlying infrastructure (think remote peering, cloud connectivity, resellers reselling layer-2), it will hurt when some flows inexplicably fail to arrive.

Dear IEEE, please pause assigning MAC addresses that start with a 4 or a 6 for the next 6 years. Or at least, next time you change the policy, consult the operational community. This 4/6 MAC issue was well documented in BCP128 back in 2007. The control-word drafts mentioned that there would be dragons related to 4 and 6 back in 2004.

Dear Vendors, take this issue more seriously. Realise that for operators these issues are _extremely_ hard to debug; this is an expensive time sink. Some of these issues are only visible under very specific, rare circumstances, much like chasing phantoms. So take every vague report of "mysterious" packet loss or packet reordering at face value and immediately dispatch smart people to delve into whether your software or hardware makes wrong assumptions based on encountering a 4 or a 6 somewhere in the frame.

And you, my fellow operators, please continue to publicly document these issues and possible workarounds.

Kind regards,

Job

resources:

c-nsp thread "Wierd MPLS/VPLS issue": https://puck.nether.net/pipermail/cisco-nsp/2016-December/thread.html
https://www.nanog.org/meetings/nanog57/presentations/Tuesday/tues.general.Sn...
BCP128: https://tools.ietf.org/html/bcp128

----- Forwarded message from Simon Lockhart <simon@slimey.org> -----

Date: Fri, 2 Dec 2016 11:44:21 +0000
From: Simon Lockhart <simon@slimey.org>
To: cisco-nsp@puck.nether.net
Subject: Re: [c-nsp] Wierd MPLS/VPLS issue

On Wed Nov 23, 2016 at 12:01:20PM +0000, Simon Lockhart wrote:
> On Fri Nov 04, 2016 at 03:40:05PM +0000, Simon Lockhart wrote:
> > To me, everything *looks* right, it's just that some VPLS traffic traversing
> > the new link gets lost.
>
> For those who are interested...
>
> Well, I finally got to the bottom of this, and have pushed it to Cisco TAC
> for a fix...

Cisco TAC finally accepted the issue. Bug CSCvc33783 has been logged. Nexus BU has investigated. Response is...

"[...] unfortunately this is an ASIC limitation on the Nexus 9000 switches and is therefore not fixable."

If you want a Layer 2 switch that will forward all valid Ethernet frames, I'd suggest avoiding the Nexus 9000 range...

Simon
_______________________________________________
cisco-nsp mailing list
cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/

----- End forwarded message -----

----- End forwarded message -----
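As a rough illustration of the failure mode described in the message above (not vendor code): some forwarding paths guess what sits below the MPLS label stack by looking only at the first nibble, treating 4 as IPv4 and 6 as IPv6. The Python sketch below assumes exactly that heuristic; the function names and the example MAC addresses are hypothetical. It also shows why a pseudowire control word (RFC 4385), whose first nibble is 0000, keeps an Ethernet payload from being mistaken for IP.

```python
# Hypothetical sketch of the "first nibble" guess; names and MACs are made up.


def guess_payload_type(payload: bytes) -> str:
    """Guess the payload type from the first nibble only (assumed heuristic)."""
    if not payload:
        return "empty"
    first_nibble = payload[0] >> 4
    if first_nibble == 0x4:
        return "ipv4"
    if first_nibble == 0x6:
        return "ipv6"
    if first_nibble == 0x0:
        # RFC 4385: the pseudowire control word starts with 0000.
        return "control-word"
    return "unknown"


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def with_control_word(frame: bytes, sequence: int = 0) -> bytes:
    """Prepend a 4-byte generic control word (flags, FRG and length left at 0)."""
    return (sequence & 0xFFFF).to_bytes(4, "big") + frame


# An Ethernet pseudowire payload begins with the destination MAC.
frame_4 = mac_to_bytes("4c:00:00:00:00:01")
frame_6 = mac_to_bytes("60:00:00:00:00:01")

print(guess_payload_type(frame_4))                     # guessed as IPv4
print(guess_payload_type(frame_6))                     # guessed as IPv6
print(guess_payload_type(with_control_word(frame_6)))  # recognised as control word
```

This only models the guess itself; it says nothing about what a particular ASIC does with the result.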
Hi,

We observed the same problem on an Arista 7150S-52 running EOS 4.12.3.1 (transit MPLS traffic via Layer 2). Arista EOS 4.16.9M fixed it for us.

MACs that were not transported:

4X:XX:XX:XX:XX:XX
6X:XX:XX:XX:XX:XX
X4:XX:XX:XX:XX:XX
X6:XX:XX:XX:XX:XX
XX:XX:XB:XX:6X:XX
XX:XX:XF:XX:4X:XX

There may have been more, but they have not been found yet…

Thanks go to:
Richard van Looijen of Flowmailer for identifying the problem.
Edwin Kalle of 2hip for pointing to the thread below.
And of course Job Snijders for his mail on Friday evening, which made everything fall into place for us in one go and led us to look at the intermediate switch…

Regards,
Robert Heuvel
atom86

On 02/12/2016, 21:45, "NLNOG on behalf of Job Snijders" <nlnog-bounces@nlnog.net on behalf of job@instituut.net> wrote:

> [...]

_______________________________________________
NLNOG mailing list
NLNOG@nlnog.net
http://mailman.nlnog.net/listinfo/nlnog
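The pattern list in the message above can be turned into a quick check of whether a given MAC falls under one of the observed patterns. This is a minimal Python sketch, assuming X stands for any hex digit; the names OBSERVED_PATTERNS and matching_patterns are made up for illustration. The list only contains what was observed, so an empty result does not prove that a MAC is safe.

```python
import re
from typing import List

# Patterns as reported in the message; "X" is assumed to mean "any hex digit".
OBSERVED_PATTERNS = [
    "4X:XX:XX:XX:XX:XX",
    "6X:XX:XX:XX:XX:XX",
    "X4:XX:XX:XX:XX:XX",
    "X6:XX:XX:XX:XX:XX",
    "XX:XX:XB:XX:6X:XX",
    "XX:XX:XF:XX:4X:XX",
]


def pattern_to_regex(pattern: str):
    """Translate a pattern such as '4X:XX:...' into a compiled regex."""
    body = "".join(
        "[0-9a-f]" if ch == "X" else re.escape(ch.lower()) for ch in pattern
    )
    return re.compile(body)


def matching_patterns(mac: str) -> List[str]:
    """Return every observed pattern that the given MAC address matches."""
    normalized = mac.strip().lower().replace("-", ":")
    return [
        p for p in OBSERVED_PATTERNS
        if pattern_to_regex(p).fullmatch(normalized)
    ]


# Example MACs are hypothetical.
print(matching_patterns("4c:00:00:00:00:01"))
print(matching_patterns("00:11:2B:33:60:55"))
print(matching_patterns("00:11:22:33:55:66"))
```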
Thanks for sharing, Robert! The last four entries in the list are especially remarkable. :-)

On 5 Dec 2016, 18:28 +0100, Robert Heuvel <RHeuvel@atom86.net>, wrote:

> MACs that were not transported:
>
> 4X:XX:XX:XX:XX:XX
> 6X:XX:XX:XX:XX:XX
> X4:XX:XX:XX:XX:XX
> X6:XX:XX:XX:XX:XX
> XX:XX:XB:XX:6X:XX
> XX:XX:XF:XX:4X:XX
>
> [...]
participants (3): Job Snijders, job@ntt.net, Robert Heuvel