FW: Forwarding issues related to MACs starting with a 4 or a 6 (Was: [c-nsp] Wierd MPLS/VPLS issue)

2 Dec 2016

Hello all,

TL;DR: Cisco Nexus 92160 switches cannot forward packets with certain
specific payloads. No software fix is coming; the ASIC is simply badly
designed.

Suppose you connect two routers to each other through a VLAN across one
of these Nexus 9000 switches and run VPLS between them. Packets whose
payload is an Ethernet frame inside that VPLS instance, and whose inner
destination MAC starts with a 4 or a 6, may then be dropped.
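To make "destination MAC starting with a 4 or a 6" concrete: in a VPLS pseudowire without a control word, the customer's Ethernet frame follows the bottom-of-stack MPLS label directly, so the first nibble after the label stack is the first hex digit of the inner destination MAC. The Python sketch below builds such a packet as a transit switch would see it. The label values, MAC addresses and helper names are made up for illustration.

```python
import struct

ETH_P_MPLS_UC = 0x8847  # EtherType for MPLS unicast


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def mpls_entry(label: int, bottom: bool, ttl: int = 64, tc: int = 0) -> bytes:
    """One 32-bit label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def vpls_packet(outer_dst: str, outer_src: str, labels: list[int], inner_frame: bytes) -> bytes:
    """Outer Ethernet header + MPLS label stack + inner Ethernet frame (no control word)."""
    header = mac_bytes(outer_dst) + mac_bytes(outer_src) + struct.pack("!H", ETH_P_MPLS_UC)
    stack = b"".join(
        mpls_entry(label, bottom=(i == len(labels) - 1)) for i, label in enumerate(labels)
    )
    return header + stack + inner_frame


# Hypothetical customer frame: DMAC 4c:..., SMAC from the documentation range, IPv4 EtherType.
inner = mac_bytes("4c:00:00:00:00:01") + mac_bytes("00:00:5e:00:53:01") + struct.pack("!H", 0x0800)
pkt = vpls_packet("00:00:5e:00:53:aa", "00:00:5e:00:53:bb", [16001, 24005], inner)

payload_offset = 14 + 4 * 2          # outer Ethernet header + two label entries
first_nibble = pkt[payload_offset] >> 4
print(hex(first_nibble))             # 0x4: indistinguishable from an IPv4 version field
```

Nothing in the MPLS header itself says what follows the stack, which is why that nibble matters to any device that tries to guess.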

If you know exactly what payloads you send over these switches, you can
work around the problem (in an enterprise environment, for example). But
if you sell VLANs from A to B as a service provider, you obviously don't
know what the customer will send over the circuit, and this can bite
hard.

Kind regards,

Job

----- Forwarded message from Job Snijders <job@instituut.net> -----

Date: Fri, 2 Dec 2016 15:32:13 +0100
From: Job Snijders <job@instituut.net>
To: nanog@nanog.org
Subject: Forwarding issues related to MACs starting with a 4 or a 6  (Was: [c-nsp] Wierd MPLS/VPLS
	issue)

Hi all,

Ever since the IEEE started allocating OUIs (MAC address ranges) in a
randomly distributed fashion rather than sequentially, the operator
community has suffered enormously.

Time after time, issues pop up related to MAC addresses that start with
a 4 or a 6. I believe the IEEE changed its strategy to deliberately
increase the chance of collisions with MAC squatters, to encourage
people to register and pay the fee.

The forwarded email at the bottom is yet another example of a widely
deployed but fundamentally broken ASIC. The switch can't forward VPLS
frames whose payload is an inner Ethernet frame destined to a MAC
starting with a 4 or a 6. This happens with the switch operating in
pure layer-2 mode; it doesn't even know what MPLS or VPLS are. The
switch is dropping packets on the floor based on their _payload_. Try
selling such circuits to customers: "discounted layer-2 service, some
flows might not be forwarded".
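The failure class BCP128 (RFC 4928) warns about can be sketched like this: a device that wants to look "deeper" into an MPLS packet, for hashing or sanity checks, walks the label stack to the bottom-of-stack bit and then guesses the payload type from the first nibble. This is a hypothetical illustration of such a heuristic, not Cisco's ASIC logic, which is not public; `guess_mpls_payload` and `build` are made-up names.

```python
import struct


def guess_mpls_payload(frame: bytes) -> str:
    """Naive 'guess the payload' heuristic of the kind RFC 4928 (BCP128) warns about.

    Assumes an untagged outer Ethernet header carrying MPLS unicast (0x8847).
    """
    if len(frame) < 14 or struct.unpack_from("!H", frame, 12)[0] != 0x8847:
        return "not-mpls"
    offset = 14
    while offset + 4 <= len(frame):
        (entry,) = struct.unpack_from("!I", frame, offset)
        offset += 4
        if entry & 0x100:  # S bit set: this was the bottom of the stack
            break
    else:
        return "truncated"
    if offset >= len(frame):
        return "truncated"
    # The guess itself: a first nibble of 4 or 6 is taken to mean IP.
    return {4: "ipv4", 6: "ipv6"}.get(frame[offset] >> 4, "other")


def build(dmac_first_octet: int) -> bytes:
    """Hypothetical test frame: zeroed outer MACs, two labels, minimal inner header."""
    outer = bytes(12) + struct.pack("!H", 0x8847)
    stack = struct.pack("!II", (16001 << 12) | 64, (24005 << 12) | 0x100 | 64)
    inner = bytes([dmac_first_octet]) + bytes(11) + struct.pack("!H", 0x0800)
    return outer + stack + inner


for octet in (0x00, 0x4C, 0x64, 0xA4):
    print(f"{octet:02x}:... -> {guess_mpls_payload(build(octet))}")
# 00 -> other, 4c -> ipv4, 64 -> ipv6, a4 -> other
```

Any later stage that acts on the "ipv4" or "ipv6" verdict (hashing, length checks, dropping) then misbehaves for perfectly valid Ethernet frames.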

Had the IEEE continued sequential OUI allocation, it probably would have
taken many years before we reached MACs starting with a 4 or a 6.
Instead, in 2012 the first linecards started rolling out of factories
with burned-in MACs starting with a 4 or a 6, and this took some
vendors by surprise.

There have been quite a few issues, both in hardware and software:

Brocade brought a 24x10GE linecard to market in 2013/2014, with limited
FIB scale, meant for a BGP-free MPLS core. The card can't keep flows
together on LACP bundles if the inner packets in a pseudowire are
destined to a MAC starting with a 4 or a 6. The result: out-of-order
delivery, hurting performance.
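One plausible way a 4 MAC breaks flow affinity (an assumption; the vendor's hashing details were not published): if the hasher treats the pseudowire payload as IPv4, the "source and destination addresses" it reads at offsets 12..19 actually land on the inner EtherType, the inner IP version/TOS bytes, the total length and the IP identification. The IP ID changes on every packet of a flow, so consecutive packets can hash onto different LAG members. The toy XOR hash and the frame contents below are made up for illustration.

```python
import struct
from functools import reduce
from operator import xor


def toy_hash(data: bytes) -> int:
    """Deliberately simple stand-in for an ASIC hash: XOR of all bytes."""
    return reduce(xor, data, 0)


def lag_member(pw_payload: bytes, links: int) -> int:
    """Pick a LAG member the way a payload-guessing hasher might (hypothetical)."""
    if pw_payload[0] >> 4 == 4:
        # Misparse: bytes 12..19 are taken as IPv4 src/dst addresses.
        return toy_hash(pw_payload[12:20]) % links
    # Otherwise hash the inner MAC pair, which is constant for the flow.
    return toy_hash(pw_payload[0:12]) % links


def inner_frame(dmac: bytes, ip_id: int) -> bytes:
    """Ethernet header + start of an IPv4 header; only the IP ID varies."""
    smac = bytes.fromhex("00005e005301")
    eth = dmac + smac + struct.pack("!H", 0x0800)
    ip_start = struct.pack("!BBHH", 0x45, 0x00, 84, ip_id)  # ver/IHL, TOS, total len, ID
    return eth + ip_start


for dmac_hex in ("4c0000000001", "020000000001"):
    dmac = bytes.fromhex(dmac_hex)
    members = [lag_member(inner_frame(dmac, ip_id), links=2) for ip_id in (0x1C46, 0x1C47)]
    print(dmac_hex, members)
# 4c0000000001 [1, 0]   two packets of one flow split across links
# 020000000001 [1, 1]   same flow stays on one link
```

When the two LAG members have different queue depths, the split flow arrives out of order, which matches the symptom described above.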

Cisco ASR 9ks had a bug where, if a payload started with a 6, the
router assumed it was an IPv6 packet, compared the calculated packet
length with the length field in the packet, and obviously failed,
because an Ethernet frame is not an IPv6 packet. The result: packets
dropped on the floor. (Fixed in 4.3(0.32)I.)
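A sketch of that kind of length check (hypothetical, not IOS XR code): if the first nibble is 6, read the IPv6 Payload Length field at bytes 4..5 and compare it with the number of bytes actually following the 40-byte fixed header. In an Ethernet frame those two bytes are the last two octets of the destination MAC, so the check fails for essentially any real frame.

```python
import struct

IPV6_HEADER_LEN = 40


def passes_ipv6_length_check(payload: bytes) -> bool:
    """Return False if a payload that 'looks like' IPv6 has an inconsistent length."""
    if not payload or payload[0] >> 4 != 6:
        return True  # not treated as IPv6, so this check does not apply
    if len(payload) < IPV6_HEADER_LEN:
        return False
    (claimed,) = struct.unpack_from("!H", payload, 4)  # IPv6 Payload Length field
    return claimed == len(payload) - IPV6_HEADER_LEN


# A minimal genuine IPv6/UDP packet: 40-byte header + 8-byte UDP header.
ipv6 = (
    struct.pack("!IHBB", 6 << 28, 8, 17, 64)  # version/class/flow, payload len, next hdr, hop limit
    + bytes(32)                                # source and destination addresses (zeroed)
    + bytes(8)                                 # UDP header
)
# An Ethernet frame whose destination MAC starts with a 6, padded to the 60-byte minimum.
ether = bytes.fromhex("640000000001" "00005e005301" "0800") + bytes(46)

print(passes_ipv6_length_check(ipv6))   # True
print(passes_ipv6_length_check(ether))  # False: claims 1 byte, but 20 follow the first 40
```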

Then there is the Nexus 9000 issue described at the top of this mail.
Brocade IronWare had an issue related to packet reordering for flows
inside pseudowires, fixed in 2013/2014. There are probably many more
examples out there in the wild, slowly driving operators insane.

At this moment, some issues related to MACs starting with a 4 or a 6
can be mitigated by enabling the Pseudowire Control Word (RFC 4385)
_AND_ Flow-Aware Transport (RFC 6391). You need both to mitigate certain
issues in multi-vendor networks (for instance, Cisco edge + Juniper
core). But what do you do when the ASIC won't forward the payload? As an
ISP, you often don't control the payload.
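Why these two help can be sketched in a few lines. The control word (RFC 4385) puts a word whose first nibble is 0000 between the label stack and the Ethernet frame, so a payload-guessing device never sees a 4 or a 6 there; FAT (RFC 6391) places a flow label at the bottom of the stack, below the PW label, so transit routers can load-balance on labels alone. The label values and the way `flow_label_for` derives a label from the inner MACs are assumptions for illustration.

```python
import struct
import zlib


def mpls_entry(label: int, bottom: bool, ttl: int = 64) -> bytes:
    """32-bit label stack entry: label(20) | TC(3)=0 | S(1) | TTL(8)."""
    return struct.pack("!I", (label << 12) | (int(bottom) << 8) | ttl)


def flow_label_for(inner: bytes) -> int:
    """Hypothetical flow-label derivation: hash the inner MAC pair into the
    unreserved label range 16..2**20-1 (labels 0-15 are reserved)."""
    return 16 + zlib.crc32(inner[:12]) % (2**20 - 16)


def pw_with_cw_and_fat(pw_label: int, inner: bytes) -> bytes:
    """PW label, then flow label (bottom of stack), then control word, then the frame."""
    stack = mpls_entry(pw_label, bottom=False) + mpls_entry(flow_label_for(inner), bottom=True)
    control_word = bytes(4)  # first nibble 0000; flags and sequence number left unused here
    return stack + control_word + inner


inner = bytes.fromhex("4c0000000001" "00005e005301" "0800")
pkt = pw_with_cw_and_fat(24005, inner)
print(pkt[8] >> 4)  # 0: the nibble after the stack no longer comes from the customer's MAC
```

Transport labels would sit above the PW label on the wire; they are left out to keep the sketch short. None of this helps, of course, when the device drops the frame regardless.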

Unfortunately, I don't think we've seen the end of this. The linecards
bought in 2012 will be trickling down to the grey/second-hand market
about now, often without accompanying support contracts. In a world of
increasingly complex interconnection and little visibility into the
underlying infrastructure (think remote peering, cloud connectivity,
resellers reselling layer-2), it will hurt when some flows inexplicably
fail to arrive.

Dear IEEE, please pause assigning MAC addresses that start with a 4 or
a 6 for the next 6 years. Or at least, next time you change the policy,
consult the operational community. This 4/6 MAC issue was well
documented in BCP128 back in 2007, and the control-word drafts warned
of dragons related to 4 and 6 as early as 2004.

Dear Vendors, take this issue more seriously. Realise that for
operators these issues are _extremely_ hard to debug; they are an
expensive time sink. Some of them are only visible under very specific,
rare circumstances, much like chasing phantoms. So take every vague
report of "mysterious" packet loss or packet reordering at face value,
and immediately dispatch smart people to investigate whether your
software or hardware makes wrong assumptions upon encountering a 4 or a
6 somewhere in the frame.
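For operators who want to probe a circuit or a device for this class of bug, one hedged approach is to send test traffic whose destination MACs cover every possible first hex digit, 4 and 6 included, and compare what arrives on the far side. The helper below (a hypothetical name) only generates such addresses. It uses locally administered unicast MACs, with bit 0x02 of the first octet set and bit 0x01 clear, so they cannot collide with IEEE-assigned OUIs.

```python
def probe_macs(suffix: str = "00:00:00:01") -> list[str]:
    """One locally administered unicast MAC per leading hex digit 0-f.

    First octet = (digit << 4) | 0x02: the high nibble is the digit under test,
    0x02 sets the locally-administered bit, and bit 0 stays clear (unicast).
    """
    return [f"{(digit << 4) | 0x02:02x}:00:{suffix}" for digit in range(16)]


for mac in probe_macs():
    print(mac)  # 02:00:00:00:00:01, 12:00:00:00:00:01, ..., f2:00:00:00:00:01
```

The frames themselves would be sent with whatever traffic generator is already in use; the point is only that the 4x and 6x destinations are part of every acceptance test.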

And you, my fellow operators, please continue to publicly document these
issues and possible workarounds.

Kind regards,

Job

resources:

c-nsp thread "Wierd MPLS/VPLS issue": https://puck.nether.net/pipermail/cisco-nsp/2016-December/thread.html
https://www.nanog.org/meetings/nanog57/presentations/Tuesday/tues.general.Sn...
BCP128: https://tools.ietf.org/html/bcp128

----- Forwarded message from Simon Lockhart <simon@slimey.org> -----

Date: Fri, 2 Dec 2016 11:44:21 +0000
From: Simon Lockhart <simon@slimey.org>
To: cisco-nsp@puck.nether.net
Subject: Re: [c-nsp] Wierd MPLS/VPLS issue

On Wed Nov 23, 2016 at 12:01:20PM +0000, Simon Lockhart wrote:
> ...
> On Fri Nov 04, 2016 at 03:40:05PM +0000, Simon Lockhart wrote:
> > ...
> > To me, everything *looks* right, it's just that some VPLS traffic traversing
> > the new link gets lost.
> For those who are interested...
> Well, I finally got to the bottom of this, and have pushed it to Cisco TAC
> for a fix...

Cisco TAC finally accepted the issue. Bug CSCvc33783 has been logged.
Nexus BU has investigated.

Response is...

"[...] unfortunately this is an ASIC limitation on the Nexus 9000
switches and is therefore not fixable."

If you want a Layer 2 switch that will forward all valid Ethernet
frames, I'd suggest avoiding the Nexus 9000 range...

Simon
_______________________________________________
cisco-nsp mailing list  cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/

----- End forwarded message -----

----- End forwarded message -----
