Published 2023-02-19
This article compares MLAG and ESI for EVPN multihoming. Multihoming is the practice of connecting a device to multiple points in the network. For example, a server may connect to two switches for redundancy, maintaining network connectivity in case one of the switches fails. This is better than a server being singlehomed to one point in the network.
With EVPN and VXLAN there are two ways to achieve multihoming. The first is built into the EVPN protocol using Ethernet Segment Identifiers (ESI). By uniquely identifying multihomed ethernet segments, two or more switches can learn that they provide redundant connectivity to the same ethernet segment. This segment can connect to a downstream router, switch or server. The server uplinks are bundled into a Port-Channel (LAG), creating one logical interface.
Another technology is Multi-Chassis Link Aggregation (MLAG), which enables two physical switches to operate as a single logical switch. This technology is usually vendor-proprietary and its implementation details are often a well guarded secret.
The goal of this article is to compare these two technologies to help you decide which one you would prefer when designing your network. Each technology comes with its own benefits and drawbacks; I hope to cover most of them below.
I will be using Arista vEOS images in this article to build the lab topology, so this means that we will be limited to learning about the Arista MLAG implementation. Many vendors have MLAG implementations, Cisco Nexus virtual Port-Channel (vPC) being one example, but I will focus on Arista in this article.
This is the lab topology that we will be configuring in this article. The left side contains two MLAG-pairs, LE03a/b and LE04a/b. Connected to LE03 we have SRV31 and SRV32. Connected to LE04 are SRV41 and SRV42. We will focus on this part of the topology in the MLAG chapter.
On the right we have three standalone switches (LE05, LE06 and LE07) that provide EVPN Multihoming using the ESI method. Each switch connects to one singlehomed server: SRV51, SRV61 and SRV71, respectively. There are also four multihomed servers, for example SRV561 and SRV562 connected to LE05 and LE06.
In the middle we have two spine switches providing inter-leaf connectivity. Each switch, spine or leaf, has a router-ID in the 10.0.0.XX/32 format where XX is the node ID. This router-ID is configured as an IP-address on the Loopback0 interface and will be used for BGP EVPN adjacencies. LE03a/b and LE04a/b have a shared IP-address configured on Loopback1, 10.0.0.3/32 and 10.0.0.4/32, respectively. I will cover why Loopback1 is necessary later in the article. Spines and leaves run OSPF as IGP to advertise their loopback-prefixes.
Since the spine configurations remain unchanged, I display them here. The spines act as BGP Route-Reflectors, reflecting EVPN routes between leaves. The spines do not run any VXLAN features; their main job is to forward packets between leaves as quickly as possible.
SP01:
service routing protocols model multi-agent
!
interface Ethernet1
   description "LE03a"
   no switchport
   ip address 10.1.31.1/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "LE03b"
   no switchport
   ip address 10.1.32.1/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE04a"
   no switchport
   ip address 10.1.41.1/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet4
   description "LE04b"
   no switchport
   ip address 10.1.42.1/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet5
   description "LE05"
   no switchport
   ip address 10.1.5.1/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet6
   description "LE06"
   no switchport
   ip address 10.1.6.1/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet7
   description "LE07"
   no switchport
   ip address 10.1.7.1/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Loopback0
   ip address 10.0.0.1/32
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN route-reflector-client
   neighbor EVPN timers 5 15
   neighbor EVPN send-community
   neighbor 10.0.0.5 peer group EVPN
   neighbor 10.0.0.6 peer group EVPN
   neighbor 10.0.0.7 peer group EVPN
   neighbor 10.0.0.31 peer group EVPN
   neighbor 10.0.0.32 peer group EVPN
   neighbor 10.0.0.41 peer group EVPN
   neighbor 10.0.0.42 peer group EVPN
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

SP02:
service routing protocols model multi-agent
!
interface Ethernet1
   description "LE03a"
   no switchport
   ip address 10.2.31.2/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "LE03b"
   no switchport
   ip address 10.2.32.2/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE04a"
   no switchport
   ip address 10.2.41.2/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet4
   description "LE04b"
   no switchport
   ip address 10.2.42.2/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet5
   description "LE05"
   no switchport
   ip address 10.2.5.2/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet6
   description "LE06"
   no switchport
   ip address 10.2.6.2/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet7
   description "LE07"
   no switchport
   ip address 10.2.7.2/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Loopback0
   ip address 10.0.0.2/32
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN route-reflector-client
   neighbor EVPN timers 5 15
   neighbor EVPN send-community
   neighbor 10.0.0.5 peer group EVPN
   neighbor 10.0.0.6 peer group EVPN
   neighbor 10.0.0.7 peer group EVPN
   neighbor 10.0.0.31 peer group EVPN
   neighbor 10.0.0.32 peer group EVPN
   neighbor 10.0.0.41 peer group EVPN
   neighbor 10.0.0.42 peer group EVPN
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

This is the first multihoming solution we will be looking at in this article. The goal of MLAG is to turn two physical switches into one virtual switch. An MLAG can only consist of two switches. They must be the same model and run the same software version.
MLAG revolves around generating a virtual System MAC-address, also known as System ID. This system MAC-address is used by many parts of the switches as we will demonstrate below:
# "the physical system ID"
LE03a#show version
System MAC address: 5001.0000.003a

# "the virtual system ID generated by MLAG"
LE03a#show mlag
state : Active
system-id : 5001.0000.3333

LE03a#show spanning-tree
MST0
  Spanning tree enabled protocol mstp
  Root ID    Priority 32768
             Address 5001.0000.3333
             This bridge is the root

LE03a#show lacp internal
LACP System-identifier: 8000,5001.0000.003a
MLAG System-identifier: 8000,5001.0000.3333

# "the physical system ID"
LE03b#show version
System MAC address: 5001.0000.003b

# "the virtual system ID generated by MLAG"
LE03b#show mlag
state : Active
system-id : 5001.0000.3333

LE03b#show spanning-tree
MST0
  Spanning tree enabled protocol mstp
  Root ID    Priority 32768
             Address 5001.0000.3333
             This bridge is the root

LE03b#show lacp internal
LACP System-identifier: 8000,5001.0000.003b
MLAG System-identifier: 8000,5001.0000.3333

The above textbox shows that the physical system MAC-address of LE03a is 5001.0000.003a and that of LE03b is 5001.0000.003b, but the virtual system MAC-address generated by MLAG is 5001.0000.3333 on both. Let's look at some of the places where the virtual MAC-address is used:
STP uses the virtual MAC-address as its Bridge ID. As both switches generate the same Bridge ID, they can both be assigned as Root Bridge for the topology. This is fine as MLAG has a proprietary magic sauce to keep the topology loop-free.
LACP is a useful protocol for negotiating and maintaining Port-Channels as it, among other things, sends keepalive messages to verify that every physical link in the LAG is healthy. Another part is establishing who is at the other end of each physical link in the bundle, an important detail when MLAG is used.
If MLAG was not configured, LE03a and LE03b would send different System IDs in their LACPDUs to the downstream server. This stops the server from enabling all links in its Port-Channel, as connecting to multiple switches could cause a network loop. When MLAG is enabled on the LE03-switches, they send the same System ID in their LACPDUs and the server can confidently enable all links in the bundle.
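To make the System ID behaviour concrete, here is a minimal Python sketch of how a server might decide which uplinks to bundle. It is a simplification under my own assumptions: real LACP also compares keys and port state, and the interface names `eth0` and `eth1` are hypothetical. The System IDs are the ones from the lab output.

```python
from collections import defaultdict


def bundle_links(partner_ids):
    """Group server uplinks by the partner System ID seen in received LACPDUs.

    partner_ids maps a local interface name to the (priority, MAC) pair
    advertised by the switch at the other end of that link.
    Simplification: only the largest group of links that share one partner
    System ID is aggregated; ties go to the lowest interface name.
    """
    groups = defaultdict(list)
    for interface, system_id in sorted(partner_ids.items()):
        groups[system_id].append(interface)
    return max(groups.values(), key=len)


# Without MLAG each switch advertises its own physical System ID.
without_mlag = {
    "eth0": ("8000", "5001.0000.003a"),  # link to LE03a
    "eth1": ("8000", "5001.0000.003b"),  # link to LE03b
}
# With MLAG both switches advertise the shared virtual System ID.
with_mlag = {
    "eth0": ("8000", "5001.0000.3333"),
    "eth1": ("8000", "5001.0000.3333"),
}

print(bundle_links(without_mlag))  # only one link can join the bundle
print(bundle_links(with_mlag))     # both links join the bundle
```

The point of the sketch is only that the server keys its decision on the partner System ID, which is exactly the value MLAG makes identical on both switches.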
Since LE01 and LE02 are not using EVPN ESI or MLAG, they cannot provide the same active-active forwarding functionality. Any connected server must use active-passive forwarding where only a single link (in green) is active at a time. As one can imagine, this wastes potential network resources as a server connected with a 10G link to each switch has a potential total bandwidth of 20G, but can only use 10G due to STP blocking all but one link.
All links turn green when MLAG is configured, enabling full link utilization.
There are probably more virtual system ID use cases than those I've mentioned here. The goal is to have any device communicating with the LE03a/b switches believe they are talking to one switch, not two.
Note: Any routed interface on the LE03-switches will continue using the physical system MAC-address. This ensures that return traffic for packets originated by LE03a is not delivered to LE03b. If I created an SVI on LE03a and pinged one of the server IP-addresses, the traffic would be sourced from 5001.0000.003a, ensuring that the return traffic comes back to LE03a.
While MLAG on its own is a great technology, we need to take a moment to examine how it operates together with VXLAN. One problem that has to be solved is how to avoid packet duplication. Imagine a server on SW1 sending an ARP broadcast frame. When VXLAN-flooding, SW1 sends one copy to LE03a (10.0.0.31) and one copy to LE03b (10.0.0.32). Both LE03 switches flood their received copy out on their local switchports, causing packet duplication.
To avoid this problem, we configure Anycast IP-address 10.0.0.3/32 on Loopback1 on both LE03 switches. We then alter the VXLAN flood-list on SW1 to include [10.0.0.3] instead of [10.0.0.31, 10.0.0.32]. When SW1 performs its VXLAN-flooding, it sends one copy destined for 10.0.0.3 to SP01. SP01 has two paths to the destination and this time it sends the packet to LE03b, which receives it and floods it on all local switchports. Since no copy was sent to LE03a, no packet duplication was created. Problem solved!
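The difference between the two flood-lists can be sketched in a few lines of Python. This is a toy model, not how a switch implements it: the `OWNERS` table and the `pick` function (standing in for the spine's ECMP choice) are my own illustrative names.

```python
# Which switches own each VTEP address in the example above.
OWNERS = {
    "10.0.0.31": ["LE03a"],
    "10.0.0.32": ["LE03b"],
    "10.0.0.3": ["LE03a", "LE03b"],  # Anycast address on Loopback1
}


def receivers(flood_list, pick=lambda owners: owners[-1]):
    """Return the switch that receives each flooded copy.

    Ingress replication sends one copy per flood-list entry. The fabric
    delivers each copy to exactly one owner of the destination address;
    `pick` stands in for the ECMP decision made by the spine.
    """
    return [pick(OWNERS[vtep]) for vtep in flood_list]


print(receivers(["10.0.0.31", "10.0.0.32"]))  # two copies, two local floods
print(receivers(["10.0.0.3"]))                # one copy, one local flood
```

Whichever switch the spine happens to pick, the Anycast flood-list can only ever produce one copy for the MLAG-pair.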
It's time to review the MLAG configuration that was applied to LE03 and LE04 MLAG-pairs. I will share the full configuration for completeness, but the important parts will be examined further below.
LE03a:
service routing protocols model multi-agent
!
spanning-tree mode mstp
spanning-tree mst 0 priority 4096
no spanning-tree vlan-id 4094
!
vlan 10
   name VLAN10
!
vlan 4094
   name MLAG
   trunk group PEER-LINK
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel31
   description "SRV31"
   switchport mode trunk
   mlag 31
!
interface Port-Channel32
   description "SRV32"
   switchport mode trunk
   mlag 32
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address 10.1.31.3/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address 10.2.31.3/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE03b"
   channel-group 3 mode active
!
interface Ethernet31
   description "SRV31"
   channel-group 31 mode active
!
interface Ethernet32
   description "SRV32"
   channel-group 32 mode active
!
interface Loopback0
   ip address 10.0.0.31/32
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.3/32
!
interface Vlan4094
   no autostate
   ip address 10.0.3.1/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
!
ip routing
!
mlag configuration
   domain-id LE03
   local-interface Vlan4094
   peer-address 10.0.3.2
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 65000:10
      route-target both 65000:10
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

LE03b:
service routing protocols model multi-agent
!
spanning-tree mode mstp
spanning-tree mst 0 priority 4096
no spanning-tree vlan-id 4094
!
vlan 10
   name VLAN10
!
vlan 4094
   name MLAG
   trunk group PEER-LINK
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel31
   description "SRV31"
   switchport mode trunk
   mlag 31
!
interface Port-Channel32
   description "SRV32"
   switchport mode trunk
   mlag 32
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address 10.1.32.3/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address 10.2.32.3/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE03a"
   channel-group 3 mode active
!
interface Ethernet31
   description "SRV31"
   channel-group 31 mode active
!
interface Ethernet32
   description "SRV32"
   channel-group 32 mode active
!
interface Loopback0
   ip address 10.0.0.32/32
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.3/32
!
interface Vlan4094
   no autostate
   ip address 10.0.3.2/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
!
ip routing
!
mlag configuration
   domain-id LE03
   local-interface Vlan4094
   peer-address 10.0.3.1
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 65000:10
      route-target both 65000:10
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

LE04a:
service routing protocols model multi-agent
!
spanning-tree mode mstp
spanning-tree mst 0 priority 4096
no spanning-tree vlan-id 4094
!
vlan 10
   name VLAN10
!
vlan 4094
   name MLAG
   trunk group PEER-LINK
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel41
   description "SRV41"
   switchport mode trunk
   mlag 41
!
interface Port-Channel42
   description "SRV42"
   switchport mode trunk
   mlag 42
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address 10.1.41.4/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address 10.2.41.4/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE04b"
   channel-group 3 mode active
!
interface Ethernet41
   description "SRV41"
   channel-group 41 mode active
!
interface Ethernet42
   description "SRV42"
   channel-group 42 mode active
!
interface Loopback0
   ip address 10.0.0.41/32
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.4/32
!
interface Vlan4094
   no autostate
   ip address 10.0.4.1/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
!
ip routing
!
mlag configuration
   domain-id LE04
   local-interface Vlan4094
   peer-address 10.0.4.2
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 65000:10
      route-target both 65000:10
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

LE04b:
service routing protocols model multi-agent
!
spanning-tree mode mstp
spanning-tree mst 0 priority 4096
no spanning-tree vlan-id 4094
!
vlan 10
   name VLAN10
!
vlan 4094
   name MLAG
   trunk group PEER-LINK
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Port-Channel41
   description "SRV41"
   switchport mode trunk
   mlag 41
!
interface Port-Channel42
   description "SRV42"
   switchport mode trunk
   mlag 42
!
interface Ethernet1
   description "SP01"
   no switchport
   ip address 10.1.42.4/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet2
   description "SP02"
   no switchport
   ip address 10.2.42.4/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Ethernet3
   description "LE04a"
   channel-group 3 mode active
!
interface Ethernet41
   description "SRV41"
   channel-group 41 mode active
!
interface Ethernet42
   description "SRV42"
   channel-group 42 mode active
!
interface Loopback0
   ip address 10.0.0.42/32
!
interface Loopback1
   description "VXLAN SOURCE-INTERFACE"
   ip address 10.0.0.4/32
!
interface Vlan4094
   no autostate
   ip address 10.0.4.2/30
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
!
ip routing
!
mlag configuration
   domain-id LE04
   local-interface Vlan4094
   peer-address 10.0.4.1
   peer-link Port-Channel3
   reload-delay mlag 60
   reload-delay non-mlag 30
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 65000:10
      route-target both 65000:10
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

As there is quite a lot going on in the configuration above, I will go through it step by step below.
The textbox shows only the parts of the configuration necessary for setting up the MLAG-pair:
no spanning-tree vlan-id 4094
!
vlan 4094
   name MLAG
   trunk group PEER-LINK
!
interface Port-Channel3
   description "PEER-LINK"
   switchport mode trunk
   switchport trunk group PEER-LINK
!
interface Ethernet3
   description "LE03b"
   channel-group 3 mode active
!
interface Vlan4094
   description "LE03a-LE03b routed link"
   ip address 10.0.3.1/30
   no autostate
!
mlag configuration
   domain-id LE03
   local-interface Vlan4094
   peer-address 10.0.3.2
   peer-link Port-Channel3
   reload-delay non-mlag 30
   reload-delay mlag 60

Arista MLAG is built around designating a physical peer-link interface between the two MLAG-switches. The peer-link is used for MLAG synchronization and MAC-learning. The peer-link can be a single Ethernet interface but the recommended setup is using a Port-Channel with multiple links to minimize the risk of the peer-link ever going down. In this configuration I'm using Port-Channel3 (Ethernet3) as my MLAG peer-link.
The peer-link is a switched trunk interface with all VLANs allowed. This is important as it facilitates MAC-learning for single-homed devices. For example, LE03b would not be able to learn the MAC-address of a device connected to LE03a, unless LE03a was able to flood broadcast frames from that device over the peer-link.
The peer-link being a switched trunk is extra important when VXLAN is used as LE03a has no other way to reliably flood the frame to LE03b. If we imagine that no peer-link existed between LE03a and LE03b, any broadcast frame received by LE03a from a downstream device would have to be VXLAN-flooded. Since both LE03-switches share the same VXLAN anycast IP-address (10.0.0.3), the VXLAN packet would have to be sent from 10.0.0.3 to 10.0.0.3. When SP01 or SP02 receive the packet they do a routing-lookup for 10.0.0.3 and find that there are two paths, one via LE03a and one via LE03b. There is therefore a 50% chance that the packet is sent back to LE03a instead of to LE03b, so VXLAN flooding does not work.
But if we're using EVPN, can't LE03b learn the MAC-address from LE03a via BGP? The answer is unfortunately no. We will see below that any EVPN route advertised by LE03a/b will have BGP nexthop 10.0.0.3 set, which is the MLAG Anycast IP-address. When LE03b receives the route from LE03a, BGP will mark the route as invalid because the nexthop is a locally configured IP-address, and installing such a route could create a routing loop. So any EVPN route advertised by LE03a is ignored by LE03b, and vice versa.
LE03a#show bgp evpn route-type mac-ip detail
BGP routing table information for VRF default
Router identifier 10.0.0.31, local AS number 65000
BGP routing table entry for mac-ip 5001.0000.1234, Route Distinguisher: 65000:10
 Paths: 2 available
  Local
    10.0.0.3 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, invalid, internal
      Originator: 10.0.0.32, Cluster list: 10.0.0.1
      Extended Community: Route-Target-AS:65000:10 TunnelEncap:tunnelTypeVxlan
      VNI: 10 ESI: 0000:0000:0000:0000:0000

Looking at the output above, we can see that LE03a received a route from LE03b (Originator: 10.0.0.32) for MAC-address 5001.0000.1234. The route is marked as invalid. Even though the output doesn't say why, it's because the nexthop (10.0.0.3) is a locally configured IP-address on LE03a (Loopback1).
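The check that invalidates the route can be approximated in Python. This is only a sketch of the idea, not the logic EOS actually runs: the function name and the `LE03A` table are illustrative, while the addresses come from the LE03a configuration.

```python
import ipaddress

# Addresses configured on LE03a, taken from the configuration above.
LE03A = {
    "Loopback0": "10.0.0.31/32",
    "Loopback1": "10.0.0.3/32",
    "Vlan4094": "10.0.3.1/30",
}


def next_hop_is_local(next_hop, local_interfaces):
    """Return True if the BGP next-hop is an address this switch owns."""
    nh = ipaddress.ip_address(next_hop)
    return any(
        ipaddress.ip_interface(addr).ip == nh
        for addr in local_interfaces.values()
    )


# Route from LE03b: next-hop is the shared Anycast address, so it is rejected.
print(next_hop_is_local("10.0.0.3", LE03A))
# Route from the LE04 pair: next-hop 10.0.0.4 is remote, so it is accepted.
print(next_hop_is_local("10.0.0.4", LE03A))
```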
Another part of the MLAG configuration is vlan 4094 and its associated VLAN-interface. This VLAN and SVI are dedicated to MLAG communication, giving the switches in the MLAG-pair a routed point-to-point link for MLAG control traffic. By configuring trunk group PEER-LINK on VLAN 4094 and Port-Channel3, the VLAN is guaranteed to only exist on the peer-link and not leak to any other switched interfaces.
Because this VLAN is used for mission-critical MLAG communication, the commands no autostate and no spanning-tree vlan-id 4094 are entered to ensure that the interface always stays active. While not strictly necessary, I have enabled OSPF on the Vlan4094 SVI, giving the switches a backup path in the unlikely event of one of the switches losing connectivity to both spines.
The final step in our MLAG configuration is setting two reload-delay values; 30 seconds for non-mlag and 60 seconds for mlag interfaces. The purpose of these commands is to keep interfaces down while the switch is still loading after booting to avoid it receiving any traffic from downstream devices before it is ready to start forwarding.
With this configuration, a switch will behave like this when it comes online after booting:
At boot: As soon as the interfaces are ready, the peer-link comes up and MLAG can negotiate and synchronize network state. Because I enabled OSPF on the Vlan4094 interface, OSPF will come up, allowing spine BGP adjacencies to establish.
After 30 seconds: Any non-mlag interface will come up. These are typically spine uplinks and interfaces to single-homed devices. Spine OSPF adjacencies are established.
After 60 seconds: The mlag interfaces come up. These are typically Port-Channels to downstream servers, signaling that the switch is ready to forward traffic.
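The boot sequence above can be summarized with a small Python function. It is a simplified model that treats both timers as counting from the moment the interfaces are ready, which is my own assumption for illustration; the default values are the ones from this lab.

```python
def interfaces_up(seconds_since_boot, non_mlag_delay=30, mlag_delay=60):
    """Return which interface classes are forwarding at a given time.

    The peer-link is never held down by a reload-delay timer, so it is
    always listed first.
    """
    up = ["peer-link"]
    if seconds_since_boot >= non_mlag_delay:
        up.append("non-mlag")  # spine uplinks, single-homed ports
    if seconds_since_boot >= mlag_delay:
        up.append("mlag")      # Port-Channels to multihomed servers
    return up


for t in (0, 30, 60):
    print(t, interfaces_up(t))
```

Keeping the mlag delay larger than the non-mlag delay is what guarantees that the uplinks are ready before servers start sending traffic to the switch.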
In reality these reload-delay values are usually much higher. Some Arista hardware platforms require 300 seconds before MLAG-interfaces come up, others need 600 seconds or more. In my tiny virtual lab I'm not too keen on waiting 5-10 minutes, so I set aggressive timers.
Whenever you configure a multihomed Port-Channel, you need to assign a mlag number. For example, I assigned Port-Channel31 with the mlag 31 command. We must use the same ID on both switches in the MLAG-pair, as this information is used to identify the Port-Channel as an MLAG and not a normal LAG interface.
Once identified, secret MLAG sauce is used to synchronize Port-Channel state and MAC-addresses between the two switches. Once this is configured, you can see a "PeerEthernet" interface in the output, shown below:
LE03a#sh run int po31
interface Port-Channel31
   switchport mode trunk
   mlag 31

LE03a#show port-channel
Port-Channel3:
  Active Ports: "Ethernet3"
Port-Channel31:
  Active Ports: "Ethernet31" "PeerEthernet31"
Port-Channel32:
  Active Ports: "Ethernet32"
  Configured, but inactive ports:
    Port                 Reason
    -------------------  -------------------------
    "PeerEthernet32"     waiting for LACP response

LE03b#sh run int po31
interface Port-Channel31
   switchport mode trunk
   mlag 31

LE03b#show port-channel
Port-Channel3:
  Active Ports: "Ethernet3"
Port-Channel31:
  Active Ports: "Ethernet31" "PeerEthernet31"
Port-Channel32:
  Active Ports: "PeerEthernet32"
  Configured, but inactive ports:
    Port                 Reason
    -------------------  -------------------------
    "Ethernet32"         waiting for LACP response

Examining the output above, we can see that Port-Channel3 is not an MLAG, because it has no PeerEthernet interface. Port-Channel31 has two member interfaces, Ethernet31 and PeerEthernet31, so it is an MLAG with one member interface on LE03a and the other on LE03b. Last in the output we can see Port-Channel32 where the Ethernet32 member interface on LE03b has not established correctly due to a lack of LACP messages from the server.
Despite the above configuration specifying Loopback0 as the update-source in the EVPN peer group BGP configuration, the LE03 and LE04 switches will set their Loopback1 IP-address as the BGP next-hop for any EVPN route they advertise. Let's look at an example:
SP01#show bgp evpn route-type mac-ip detail
BGP routing table entry for mac-ip 5001.0000.0041 Route Distinguisher: 65000:10
 Paths: 2 available
  Local (Received from a RR-client)
    10.0.0.4 from 10.0.0.42 (10.0.0.42)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: Route-Target-AS:65000:10 TunnelEncap:tunnelTypeVxlan
      VNI: 10 ESI: 0000:0000:0000:0000:0000
  Local (Received from a RR-client)
    10.0.0.4 from 10.0.0.41 (10.0.0.41)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal
      Extended Community: Route-Target-AS:65000:10 TunnelEncap:tunnelTypeVxlan
      VNI: 10 ESI: 0000:0000:0000:0000:0000
BGP routing table entry for mac-ip 5001.0000.0031 Route Distinguisher: 65000:10
 Paths: 2 available
  Local (Received from a RR-client)
    10.0.0.3 from 10.0.0.31 (10.0.0.31)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: Route-Target-AS:65000:10 TunnelEncap:tunnelTypeVxlan
      VNI: 10 ESI: 0000:0000:0000:0000:0000
  Local (Received from a RR-client)
    10.0.0.3 from 10.0.0.32 (10.0.0.32)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal
      Extended Community: Route-Target-AS:65000:10 TunnelEncap:tunnelTypeVxlan
      VNI: 10 ESI: 0000:0000:0000:0000:0000

As we can see in the output above, LE03a (10.0.0.31) and LE03b (10.0.0.32) both advertise the 5001.0000.0031 MAC-address with the next-hop set to Loopback1 IP-address 10.0.0.3. This is what enables the Anycast functionality, avoiding packet duplication and optimizing BUM traffic flooding. The BGP EVPN next-hop is decided by the vxlan source-interface command in the interface Vxlan1 configuration mode.
We have now examined the configuration of a MLAG switch-pair. We have seen what is needed to avoid packet duplication, why the peer-link must be a switched trunk interface and the limitations of BGP EVPN and Anycast IP-addressing.
Now that we have examined the MLAG solution, let's examine multihoming using capabilities built into EVPN.
EVPN was originally invented as a protocol for L2VPN in Service-Provider (SP) networks as a replacement for VPLS. VPLS has drawbacks similar to standard VXLAN where the dataplane is also the control plane; MAC-learning was performed when frames were flooded between PEs. Another drawback of VPLS is that it can't do active-active multihoming.
So EVPN was invented to fix the shortcomings of VPLS. Using BGP as control plane, MAC-addresses were advertised without the need for flooding frames across the topology. Active-active multihoming was implemented using ESI and EVPN route-types 1 and 4, advertising Auto-Discovery routes and Ethernet Segment Identifiers (ESI), respectively. The former is used for multihoming MAC aliasing, the latter for Designated Forwarder (DF) elections. We will go into greater detail on both below.
One benefit that EVPN ESI has over MLAG is that a downstream device can connect to any combination of switches for multihoming. Whereas MLAG forces the downstream device to connect to switches in the same MLAG pair, with EVPN ESI a server can connect to any switch.
Lab Topology
Going back to our lab topology diagram and focusing on the right side this time, we have three standalone switches (LE05, LE06 and LE07) that provide EVPN Multihoming using the ESI method. Each switch connects to one singlehomed server: SRV51, SRV61 and SRV71, respectively. There are also four multihomed servers, for example SRV561 and SRV562 connected to LE05 and LE06.
Because all multihoming communication is sent via BGP, there is no need for a peer-link. This is what makes ESI more flexible than MLAG.
This EVPN ESI awesomeness does have a drawback compared to MLAG, and that is how packet duplication is avoided. Where MLAG solved this problem using an Anycast IP-address on both switches in the MLAG-pair, EVPN ESI must use a split horizon-based approach which can be quite complex. So buckle up, this is about to get nutty!
When LE05 and LE06 realize that they are both connected to SRV561, they independently run an algorithm to determine the Designated Forwarder (DF). Whoever becomes DF is responsible for forwarding Broadcast, Multicast and unknown Unicast (BUM) traffic to the multihomed server. The other switch or switches are not allowed to forward these frames, thus avoiding packet duplication. The same process occurs for the SRV562 ethernet segment and this time LE06 may be elected as DF. This helps share the BUM-traffic load between the switches.
For simplicity's sake we will assume that LE05 is DF for SRV561 and SRV562, and LE06 is DF for SRV671 and SRV672. Because SRV51, SRV61 and SRV71 are all singlehomed there is no need for a DF election on these ethernet segments.
Let's say SRV31 behind LE03a/b sends an ARP broadcast frame. LE03a is the receiver and performs VXLAN-flooding, sending a copy to each remote VTEP: LE04, LE05, LE06, LE07.
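To give an idea of what such an election algorithm can look like, here is a Python sketch of the default procedure described in RFC 7432 (often called service carving): the candidate switches are ordered by IP-address and the VLAN ID modulo the number of candidates selects the winner. Whether the lab switches use exactly this procedure is an assumption on my part, and the function name is mine.

```python
import ipaddress


def elect_df(pe_addresses, vlan_id):
    """Pick the Designated Forwarder for one VLAN on one ethernet segment.

    RFC 7432 default procedure: sort the candidates by IP-address in
    ascending numeric order, then choose the one at index (VLAN mod N).
    Every switch runs this independently and reaches the same answer.
    """
    ordered = sorted(pe_addresses, key=lambda a: int(ipaddress.ip_address(a)))
    return ordered[vlan_id % len(ordered)]


candidates = ["10.0.0.6", "10.0.0.5"]  # LE06 and LE05 share a segment
print(elect_df(candidates, 10))  # VLAN 10
print(elect_df(candidates, 11))  # a hypothetical VLAN 11
```

Because the result depends only on information every candidate already has from BGP, no extra negotiation between the switches is needed.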
Dotted line means ARP was not flooded out on this port
LE05 floods the ARP to SRV51 (singlehomed) and to SRV561/562 because LE05 is the DF.
LE06 floods the ARP to SRV61 (singlehomed) and to SRV671/672 because LE06 is the DF.
LE07 floods the ARP to SRV71 (singlehomed).
While this solution does work very well, it can't scale as high as the MLAG solution. A limitation of VXLAN scalability is Ingress Replication. A switch can only generate a finite number of copies while VXLAN flooding before reaching some kind of hardware limit. While researching I found this document saying that a Cisco Nexus 9000-switch is limited to 64 peers per VNI with regard to Ingress Replication. In MLAG-terms, 64 peers equal 128 switches thanks to the Anycast VTEP. In EVPN ESI terms, 64 peers equal 64 switches. This suggests that MLAG has twice the scalability of EVPN ESI.
We're not done with Packet Duplication yet. I told you it was about to get nutty and we're getting closer. What happens when a device connected to an EVPN ESI leaf sends out an ARP broadcast frame?
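The arithmetic is easy to check in Python. The sketch below counts flood-list entries for this lab and then applies the 64-peer limit mentioned above; the function names and the `LAB` table are my own, and the limit is just the number quoted from that document.

```python
# Each switch in the lab and the VTEP address it uses as VXLAN source.
LAB = {
    "LE03a": "10.0.0.3", "LE03b": "10.0.0.3",  # MLAG pair, Anycast VTEP
    "LE04a": "10.0.0.4", "LE04b": "10.0.0.4",  # MLAG pair, Anycast VTEP
    "LE05": "10.0.0.5", "LE06": "10.0.0.6", "LE07": "10.0.0.7",
}


def flood_list_size(switch_to_vtep, local_vtep):
    """Number of copies ingress replication creates for one BUM frame."""
    return len({v for v in switch_to_vtep.values() if v != local_vtep})


def max_switches(peer_limit, switches_per_vtep):
    """Switches reachable when every flood-list entry is one VTEP."""
    return peer_limit * switches_per_vtep


print(flood_list_size(LAB, "10.0.0.3"))  # copies sent by the LE03 pair
print(max_switches(64, 2))               # MLAG: two switches per VTEP
print(max_switches(64, 1))               # ESI: one switch per VTEP
```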
Each multihomed server has a unique color to show that it has multiple connections. For example, only SRV561 is blue.
In this example SRV561 sends out an ARP broadcast frame. It happened to be sent to LE06 even though it also connects to LE05. To avoid packet duplication, two split-horizon rules must be used. The first rule goes as follows:
This means that when LE05 receives its VXLAN-flooded copy of the ARP broadcast frame, it will see that the VXLAN packet came from LE06 (10.0.0.6). Based on this, LE05 must not forward this frame to SRV561 or SRV562 as these devices are also connected to LE06. Note that this overrides the default DF behavior. Even though LE05 is the DF for SRV561 and SRV562, because the frame came from LE06 it cannot be forwarded as doing so could cause a network loop. LE05 therefore only floods the frame to SRV51.
LE07 will only forward the frame to singlehomed SRV71. SRV671/672 also connect to LE06, so LE07 must not flood this BUM packet to them. LE07 would not do so anyway as it is not the DF, but the above rule still takes precedence.
The second rule says this:
Because the first rule forced LE05 to override its DF behavior and not forward the BUM packet to SRV561 or SRV562, this rule forces LE06 to override its non-DF behavior and forward the frame to SRV562. LE06 also forwards the frame to SRV61, SRV671 and SRV672 as they too are directly connected. The packet is not forwarded to SRV561 as that would mean sending the frame out on the same interface it was received on.
The ARP packet travel path across the network.

These split-horizon rules effectively stop packet duplication by having the ingress VTEP perform local flooding. The egress VTEP only floods the frame out on ethernet segments that are not shared with the ingress VTEP.
Note: these rules only apply to VXLAN EVPN. MPLS EVPN utilizes labels to influence the split-horizon behavior.
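The flooding decisions walked through above can be modelled in a few lines of Python. This is only a sketch of the behavior as described in this article, not of any vendor implementation: the `ATTACHED` and `DF` tables and the `flood_targets` function are names I made up, and the DF assignments are the ones used in the flooding example (LE05 for SRV561/562, LE06 for SRV671/672).

```python
# Segments attached to each ESI leaf, taken from the lab topology.
ATTACHED = {
    "LE05": {"SRV51", "SRV561", "SRV562"},
    "LE06": {"SRV61", "SRV561", "SRV562", "SRV671", "SRV672"},
    "LE07": {"SRV71", "SRV671", "SRV672"},
}
# Designated Forwarder per multihomed segment (assumed from the example above).
DF = {"SRV561": "LE05", "SRV562": "LE05", "SRV671": "LE06", "SRV672": "LE06"}


def flood_targets(leaf, ingress_vtep, ingress_segment=None):
    """Return the sorted local segments that `leaf` floods a BUM frame to."""
    targets = set()
    for segment in ATTACHED[leaf]:
        if leaf == ingress_vtep:
            # Second rule: the ingress VTEP floods to every local segment,
            # DF or not, except the one the frame arrived on.
            if segment != ingress_segment:
                targets.add(segment)
        elif segment in ATTACHED.get(ingress_vtep, set()):
            # First rule: never flood to a segment shared with the sender.
            continue
        elif DF.get(segment, leaf) == leaf:
            # Singlehomed segments have no DF entry, so the default makes
            # the leaf its own DF; multihomed segments need the real DF.
            targets.add(segment)
    return sorted(targets)


# SRV561 sends a broadcast that happens to arrive on LE06.
print(flood_targets("LE06", "LE06", "SRV561"))
print(flood_targets("LE05", "LE06"))
print(flood_targets("LE07", "LE06"))
```

Calling `flood_targets("LE05", "LE03")` instead reproduces the earlier case where the frame comes from a VTEP that shares no segments with LE05, so the normal DF behavior applies.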
Let's review the configuration necessary to build multihoming with ESI. I will again start by displaying the full configuration, then go into more detail further below.
LE05:

service routing protocols model multi-agent
!
link tracking group EVPN-ESI-MH
   recovery delay 60
!
spanning-tree mode mstp
spanning-tree mst 0 priority 4096
!
vlan 10
   name VLAN10
!
vlan 20
   name VLAN20
!
interface Port-Channel51
   switchport mode trunk
!
interface Port-Channel561
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0005:0006:0561
      route-target import 00:05:00:06:05:61
   lacp system-id 5001.0005.0006
   link tracking group EVPN-ESI-MH downstream
!
interface Port-Channel562
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0005:0006:0562
      route-target import 00:05:00:06:05:62
   lacp system-id 5001.0005.0006
   link tracking group EVPN-ESI-MH downstream
!
interface Ethernet1
   no switchport
   ip address 10.1.5.5/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
   link tracking group EVPN-ESI-MH upstream
!
interface Ethernet2
   no switchport
   ip address 10.2.5.5/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
   link tracking group EVPN-ESI-MH upstream
!
interface Ethernet51
   description "SRV51"
   channel-group 51 mode active
!
interface Ethernet56/1
   description "SRV561"
   channel-group 561 mode active
!
interface Ethernet56/2
   description "SRV562"
   channel-group 562 mode active
!
interface Loopback0
   ip address 10.0.0.5/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 10.0.0.5:10
      route-target both 65000:10
      redistribute learned
   !
   vlan 20
      rd 10.0.0.5:20
      route-target both 65000:20
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

LE06:

service routing protocols model multi-agent
!
link tracking group EVPN-ESI-MH
   recovery delay 60
!
spanning-tree mode mstp
spanning-tree mst 0 priority 4096
!
vlan 10
   name VLAN10
!
vlan 20
   name VLAN20
!
interface Port-Channel61
   switchport mode trunk
!
interface Port-Channel67
   lacp system-id 5001.0000.0067
!
interface Port-Channel561
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0005:0006:0561
      route-target import 00:05:00:06:05:61
   lacp system-id 5001.0005.0006
   link tracking group EVPN-ESI-MH downstream
!
interface Port-Channel562
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0005:0006:0562
      route-target import 00:05:00:06:05:62
   lacp system-id 5001.0005.0006
   link tracking group EVPN-ESI-MH downstream
!
interface Port-Channel671
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0006:0007:0671
      route-target import 00:06:00:07:06:71
   lacp system-id 5001.0006.0007
   link tracking group EVPN-ESI-MH downstream
!
interface Port-Channel672
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0006:0007:0672
      route-target import 00:06:00:07:06:72
   lacp system-id 5001.0006.0007
   link tracking group EVPN-ESI-MH downstream
!
interface Ethernet1
   no switchport
   ip address 10.1.6.6/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
   link tracking group EVPN-ESI-MH upstream
!
interface Ethernet2
   no switchport
   ip address 10.2.6.6/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
   link tracking group EVPN-ESI-MH upstream
!
interface Ethernet56/1
   description "SRV561"
   channel-group 561 mode active
!
interface Ethernet56/2
   description "SRV562"
   channel-group 562 mode active
!
interface Ethernet67/1
   description "SRV671"
   channel-group 671 mode active
!
interface Ethernet67/2
   description "SRV672"
   channel-group 672 mode active
!
interface Ethernet61
   description "SRV61"
   channel-group 61 mode active
!
interface Loopback0
   ip address 10.0.0.6/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 10.0.0.6:10
      route-target both 65000:10
      redistribute learned
   !
   vlan 20
      rd 10.0.0.6:20
      route-target both 65000:20
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

LE07:

service routing protocols model multi-agent
!
link tracking group EVPN-ESI-MH
   recovery delay 60
!
logging console informational
logging synchronous level informational
!
spanning-tree mode mstp
spanning-tree mst 0 priority 4096
!
vlan 10
   name VLAN10
!
vlan 20
   name VLAN20
!
interface Port-Channel71
   switchport mode trunk
!
interface Port-Channel671
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0006:0007:0671
      route-target import 00:06:00:07:06:71
   lacp system-id 5001.0006.0007
   link tracking group EVPN-ESI-MH downstream
!
interface Port-Channel672
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0006:0007:0672
      route-target import 00:06:00:07:06:72
   lacp system-id 5001.0006.0007
   link tracking group EVPN-ESI-MH downstream
!
interface Ethernet1
   no switchport
   ip address 10.1.7.7/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
   link tracking group EVPN-ESI-MH upstream
!
interface Ethernet2
   no switchport
   ip address 10.2.7.7/28
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
   link tracking group EVPN-ESI-MH upstream
!
interface Ethernet67/1
   description "SRV671"
   channel-group 671 mode active
!
interface Ethernet67/2
   description "SRV672"
   channel-group 672 mode active
!
interface Ethernet71
   description "SRV71"
   channel-group 71 mode active
!
interface Loopback0
   ip address 10.0.0.7/32
!
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan udp-port 4789
   vxlan vlan 10 vni 10
   vxlan vlan 20 vni 20
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN send-community
   neighbor 10.0.0.1 peer group EVPN
   neighbor 10.0.0.2 peer group EVPN
   !
   vlan 10
      rd 10.0.0.7:10
      route-target both 65000:10
      redistribute learned
   !
   vlan 20
      rd 10.0.0.7:20
      route-target both 65000:20
      redistribute learned
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected

With the full configuration shown above, let's take a deeper look at the configuration lines that enable the EVPN multihoming functionality:
interface Port-Channel51
   switchport mode trunk
!
interface Port-Channel561
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0005:0006:0561
      route-target import 00:05:00:06:05:61
   lacp system-id 5001.0005.0006
   link tracking group EVPN-ESI-MH downstream
!
interface Port-Channel562
   switchport mode trunk
   !
   evpn ethernet-segment
      identifier 0000:0000:0005:0006:0562
      route-target import 00:05:00:06:05:62
   lacp system-id 5001.0005.0006

Starting with Port-Channel51, this one is very simple as SRV51 is singlehomed to LE05. No special configuration is required. Moving on to Port-Channel561, we see a couple of new commands:
We use the identifier 0000:0000:0005:0006:0561 command to uniquely identify this Ethernet Segment. This is the ESI. By configuring the same value on LE05 and LE06, they understand that they connect to the same ethernet segment, SRV561.
As long as the ESI value starts with 00, the rest of the identifier can contain any combination of hexadecimal characters. I elected to use a 0000:0000:<lower-leaf-ID>:<higher-leaf-ID>:<port-channel-ID> format.
The route-target import 00:05:00:06:05:61 command is used to create an inbound route-filter so that only switches with the ESI configured import the route. I use the <lower-leaf-ID>:<higher-leaf-ID>:<port-channel-ID> format.
The lacp system-id 5001.0005.0006 command is used to make sure LE05 and LE06 send the same LACP system ID to SRV561 when negotiating the port-channel (LAG). SRV561 will not bring both interfaces up if it thinks they connect to different switches. This solves the same problem that MLAG did but at the interface level. I use the 5001.<lower-node-ID>.<higher-node-ID> syntax.
The Port-Channel562 configuration follows the same syntax and procedure. This ensures that each multihomed ethernet segment is uniquely identified.
Note: Even singlehomed ethernet segments have an ESI value assigned, but use the default all-zeroes ESI.
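The numbering scheme I describe above is easy to generate with a script. The sketch below is only a convenience for this lab's convention: `esi_values` is a made-up helper, and it writes the decimal leaf and port-channel IDs straight into the hexadecimal fields, exactly as the configuration in this article does.

```python
def esi_values(leaf_a: int, leaf_b: int, port_channel: int) -> dict:
    """Build the three per-segment values used in this lab's naming scheme."""
    lower, higher = sorted((leaf_a, leaf_b))
    if max(lower, higher, port_channel) > 9999 or min(lower, port_channel) < 0:
        raise ValueError("each ID must fit in four decimal digits")
    # Twelve digits: <lower-leaf-ID><higher-leaf-ID><port-channel-ID>
    digits = f"{lower:04d}{higher:04d}{port_channel:04d}"
    return {
        # ESI: leading 0000:0000 followed by the three 4-digit groups.
        "identifier": "0000:0000:" + ":".join(
            digits[i:i + 4] for i in range(0, 12, 4)
        ),
        # ES-Import route-target: the same twelve digits in 2-digit groups.
        "route_target": ":".join(digits[i:i + 2] for i in range(0, 12, 2)),
        # LACP system ID shared by both leaves of the segment.
        "lacp_system_id": f"5001.{lower:04d}.{higher:04d}",
    }


print(esi_values(5, 6, 561))
```

For LE05/LE06 and Port-Channel561 this yields the same identifier, route-target and LACP system ID that appear in the configuration above.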
The type-1 Auto-Discovery (AD) route is used for ESI multihoming. Its purpose is signaling the link-state of the local Port-Channel interface. One could argue that this is not necessary as the switch could just withdraw any MAC-IP route once an interface goes down. However, we will discover why this is a good thing below.
Note: This route is not advertised if the ESI value is all-zeroes (singlehomed).
LE05#show bgp evpn route-type auto-discovery esi 0000:0000:0005:0006:0561
"Routes originated by LE05:"
          Network                Next Hop   Metric  LocPref  Weight  Path
 * >      RD: 10.0.0.5:1 auto-discovery 0000:0000:0005:0006:0561
                                 -          -       -        0       i
 * >      RD: 10.0.0.5:10 auto-discovery 0 0000:0000:0005:0006:0561
                                 -          -       -        0       i
 * >      RD: 10.0.0.5:20 auto-discovery 0 0000:0000:0005:0006:0561
                                 -          -       -        0       i
"Routes originated by LE06:"
 * >      RD: 10.0.0.6:1 auto-discovery 0000:0000:0005:0006:0561
                                 10.0.0.6   -       100      0       i Or-ID: 10.0.0.6
 * >      RD: 10.0.0.6:10 auto-discovery 0 0000:0000:0005:0006:0561
                                 10.0.0.6   -       100      0       i Or-ID: 10.0.0.6
 * >      RD: 10.0.0.6:20 auto-discovery 0 0000:0000:0005:0006:0561
                                 10.0.0.6   -       100      0       i Or-ID: 10.0.0.6

Focusing on specific lines from the output above:
The 10.0.0.6:1 auto-discovery 0000 route is advertised when the interface is physically up and forwarding. If Port-Channel561 goes down on LE06, this route will be withdrawn.
The 10.0.0.6:10 auto-discovery 0 0000 route is withdrawn when VLAN 10/VNI 10 is no longer available on that interface. If I run the switchport trunk allowed vlan remove 10 command on interface Port-Channel561 on LE06, this route will be withdrawn.
So EVPN is used to signal both the link-state of a physical interface and the state of individual VLANs on that interface. These routes on their own do not accomplish much, but if we keep digging we find references to these ESIs in our MAC-IP routes:
LE07#show bgp evpn vni 10
          Network                Next Hop   Metric  LocPref  Weight  Path
 * >      RD: 10.0.0.5:10 auto-discovery 0 0000:0000:0005:0006:0561
                                 10.0.0.5   -       100      0       i Or-ID: 10.0.0.5
 * >      RD: 10.0.0.6:10 auto-discovery 0 0000:0000:0005:0006:0561
                                 10.0.0.6   -       100      0       i Or-ID: 10.0.0.6
 * >      RD: 10.0.0.5:10 mac-ip 5001.0000.0561
                                 10.0.0.5   -       100      0       i Or-ID: 10.0.0.5

LE07#show bgp evpn route-type mac-ip vni 10 detail
BGP routing table entry for mac-ip 5001.0000.0561, Route Distinguisher: 10.0.0.5:10
 Paths: 2 available
  Local
    10.0.0.5 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Originator: 10.0.0.5, Cluster list: 10.0.0.1
      Extended Community: Route-Target-AS:65000:10 TunnelEncap:tunnelTypeVxlan
      VNI: 10 ESI: 0000:0000:0005:0006:0561

LE07#show vxlan address-table
VLAN  Mac Address     Type  Prt  VTEP
----  -----------     ----  ---  ----
  10  5001.0000.0561  EVPN  Vx1  10.0.0.5 10.0.0.6

In this example LE07 has only learned the MAC-address 5001.0000.0561 from LE05, shown above. Despite this, its VXLAN address table specifies both LE05 and LE06 as valid nexthops for traffic to that MAC-address. This is because both LE05 and LE06 advertise an Auto-Discovery route for ESI 0000:0000:0005:0006:0561. As soon as LE06 withdraws the AD-route for 0000:0000:0005:0006:0561 in VNI 10, LE07 updates its VXLAN-table to only use LE05 as the nexthop.
LE07 is able to do this thanks to what Arista calls a MAC Aliasing mechanism. It covers the case where a MAC-address is learned by only one of the multihoming switches, which according to an Arista document can be quite a common occurrence. One such example is SRV561 deciding to only forward traffic via its LE05-interface.
Additionally, if SRV561 is only sending its traffic to LE05 then LE06 never gets a chance to locally learn the MAC-address. Thanks to MAC Aliasing, LE06 is still able to install the SRV561 MAC-address entry based on the information received from LE05. If SRV561 starts sending traffic on its LE06-interface, LE06 learns the MAC-address locally and starts advertising the MAC-address to its EVPN neighbors.
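To make the aliasing logic concrete, here is a minimal Python sketch of how a remote switch such as LE07 could derive its nexthop list. It is a simplification under my own assumptions: routes are plain dictionaries, only the per-VNI AD routes are considered, and `vxlan_nexthops` is a hypothetical helper rather than anything EOS exposes.

```python
ZERO_ESI = "0000:0000:0000:0000:0000"


def vxlan_nexthops(mac, vni, mac_ip_routes, ad_routes):
    """Return the sorted VTEPs that can be used to reach `mac` in `vni`."""
    nexthops = set()
    for route in mac_ip_routes:
        if (route["mac"], route["vni"]) != (mac, vni):
            continue
        if route["esi"] == ZERO_ESI:
            # Singlehomed: only the advertising VTEP is a valid nexthop.
            nexthops.add(route["vtep"])
        else:
            # Multihomed: every VTEP with an AD route for this ESI qualifies,
            # even if it never advertised the MAC itself (aliasing).
            nexthops.update(
                ad["vtep"] for ad in ad_routes
                if ad["esi"] == route["esi"] and ad["vni"] == vni
            )
    return sorted(nexthops)


ESI_561 = "0000:0000:0005:0006:0561"
mac_ip = [{"mac": "5001.0000.0561", "vni": 10, "vtep": "10.0.0.5", "esi": ESI_561}]
ad = [
    {"esi": ESI_561, "vni": 10, "vtep": "10.0.0.5"},
    {"esi": ESI_561, "vni": 10, "vtep": "10.0.0.6"},
]
print(vxlan_nexthops("5001.0000.0561", 10, mac_ip, ad))      # ['10.0.0.5', '10.0.0.6']
print(vxlan_nexthops("5001.0000.0561", 10, mac_ip, ad[:1]))  # ['10.0.0.5']
```

The second call drops the LE06 AD route, which mirrors LE07 falling back to LE05 as the only nexthop once LE06 withdraws it.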
Another strength of the AD-route is its mass withdrawal feature to improve convergence time. Let's assume 1000 MAC-addresses live on the SRV561 ethernet segment. Let's then imagine that LE06 loses connectivity to SRV561. LE06 must now tell the network that it should no longer be sent any traffic destined for these 1000 MAC-addresses. Withdrawing 1000 MAC-IP routes will take a significant amount of time as LE06 has to generate and send them, the RRs have to reflect them and all other switches must process them to remove each MAC-address from their VXLAN address tables. This can be very slow and resource-intensive, negatively affecting the network convergence time.
Instead, LE06 sends a single AD-route withdrawal first, announcing that it lost connectivity to the SRV561 ethernet segment. This one route is quickly reflected and processed on all switches, allowing them to efficiently remove LE06 as nexthop for all MAC-addresses mapped to the 0000:0000:0005:0006:0561 ESI. The network convergence time is now only a fraction of what it would otherwise be. LE06 can now generate its 1000 MAC-IP route withdrawals at a leisurely pace to be reflected and processed without impacting the network convergence time.
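The reason one withdrawal is enough is the extra level of indirection: MAC-addresses point at an ESI, and the ESI points at the VTEPs. The toy model below illustrates that idea only. The MAC-addresses are generated placeholders and the data structures are my own, not how any switch stores its tables.

```python
ESI = "0000:0000:0005:0006:0561"
LE05, LE06 = "10.0.0.5", "10.0.0.6"

# 1000 placeholder MAC addresses, all learned with the same ESI.
mac_to_esi = {f"5001.00ff.{i:04x}": ESI for i in range(1000)}
# VTEPs that currently advertise an AD route for each ESI.
ad_vteps = {ESI: {LE05, LE06}}


def macs_using(vtep):
    """Count MAC addresses whose nexthop set still contains `vtep`."""
    return sum(vtep in ad_vteps[esi] for esi in mac_to_esi.values())


before = macs_using(LE06)
ad_vteps[ESI].discard(LE06)  # a single AD-route withdrawal from LE06
after = macs_using(LE06)
print(before, after)  # 1000 0
```

None of the 1000 MAC entries had to be touched individually, which is the whole point of withdrawing the AD-route first.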
When you configure EVPN multihoming with ESI, you must ensure that every switch advertises its routes with a unique Route-Distinguisher. This is necessary to ensure that all type-1 Auto-Discovery routes are received correctly. If you use the same RD (65000:10 for example) on all switches then routes coming from other switches will appear identical to one that the switch originates itself. Because of this, BGP will prefer its own locally originated route and discard the others. These auto-discovery routes must be received for multihoming to function correctly, so you must use the Router-ID:VNI Route-Distinguisher format shown below:
LE05#show run sec bgp
router bgp 65000
   !
   vlan 10
      rd 10.0.0.5:10   <-- "Very important"
      route-target both 65000:10
      redistribute learned

LE06#show run sec bgp
router bgp 65000
   !
   vlan 10
      rd 10.0.0.6:10   <-- "Very important"
      route-target both 65000:10
      redistribute learned

The type-4 Ethernet Segment route is used for Designated Forwarder elections. For every multihomed Ethernet segment, a DF must be elected.
LE05#sh run int po561
interface Port-Channel561
   switchport mode trunk
   switchport
   !
   evpn ethernet-segment
      identifier 0000:0000:0005:0006:0561
      designated-forwarder election algorithm preference 50
      route-target import 00:05:00:06:05:61
   lacp system-id 5001.0005.0006

LE05#show bgp evpn route-type ethernet-segment esi 0000:0000:0005:0006:0561 detail
BGP routing table information for VRF default
Router identifier 10.0.0.5, local AS number 65000
BGP routing table entry for ethernet-segment 0000:0000:0005:0006:0561 10.0.0.5
 Route Distinguisher: 10.0.0.5:1
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: TunnelEncap:tunnelTypeVxlan EvpnEsImportRt:00:05:00:06:05:61
      DF Election: Preference 50
BGP routing table entry for ethernet-segment 0000:0000:0005:0006:0561 10.0.0.6
 Route Distinguisher: 10.0.0.6:1
 Paths: 1 available
  Local
    10.0.0.6 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Originator: 10.0.0.6, Cluster list: 10.0.0.1
      Extended Community: TunnelEncap:tunnelTypeVxlan EvpnEsImportRt:00:05:00:06:05:61
      DF Election: Preference 100

LE05#show bgp evpn inst
EVPN instance: VLAN 10
  Local ethernet segment:
    ESI: 0000:0000:0005:0006:0561
      Interface: Port-Channel561
      Mode: all-active
      State: up
      ES-Import RT: 00:05:00:06:05:61
      DF election algorithm: preference
      Designated forwarder: 10.0.0.6
      Non-Designated forwarder: 10.0.0.5
EVPN instance: VLAN 20
    ESI: 0000:0000:0005:0006:0561
      Interface: Port-Channel561
      Mode: all-active
      State: up
      ES-Import RT: 00:05:00:06:05:61
      DF election algorithm: preference
      Designated forwarder: 10.0.0.6
      Non-Designated forwarder: 10.0.0.5

LE06#show bgp evpn route-type ethernet-segment esi 0000:0000:0005:0006:0561 detail
BGP routing table information for VRF default
Router identifier 10.0.0.6, local AS number 65000
BGP routing table entry for ethernet-segment 0000:0000:0005:0006:0561 10.0.0.5
 Route Distinguisher: 10.0.0.5:1
 Paths: 1 available
  Local
    10.0.0.5 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Originator: 10.0.0.5, Cluster list: 10.0.0.1
      Extended Community: TunnelEncap:tunnelTypeVxlan EvpnEsImportRt:00:05:00:06:05:61
      DF Election: Preference 50
BGP routing table entry for ethernet-segment 0000:0000:0005:0006:0561 10.0.0.6
 Route Distinguisher: 10.0.0.6:1
 Paths: 1 available
  Local
    - from - (0.0.0.0)
      Origin IGP, metric -, localpref -, weight 0, valid, local, best
      Extended Community: TunnelEncap:tunnelTypeVxlan EvpnEsImportRt:00:05:00:06:05:61
      DF Election: Preference 100

LE06#show bgp evpn inst
EVPN instance: VLAN 10
  Local ethernet segment:
    ESI: 0000:0000:0005:0006:0561
      Interface: Port-Channel561
      Mode: all-active
      State: up
      ES-Import RT: 00:05:00:06:05:61
      DF election algorithm: preference
      Designated forwarder: 10.0.0.6
      Non-Designated forwarder: 10.0.0.5
EVPN instance: VLAN 20
  Local ethernet segment:
    ESI: 0000:0000:0005:0006:0561
      Interface: Port-Channel561
      Mode: all-active
      State: up
      ES-Import RT: 00:05:00:06:05:61
      DF election algorithm: preference
      Designated forwarder: 10.0.0.6
      Non-Designated forwarder: 10.0.0.5

In the output above we can see that a preference value was used to make sure that LE06 becomes the DF for the SRV561 ethernet segment. All data is carried by BGP as extended community values, ensuring that all switches independently come to the same conclusion about who should be DF. Because LE06 has the higher preference value (100 vs 50), it is elected as DF.
The show bgp evpn instance command shows the DF status for each multihomed ethernet segment. To save space I only include output for the SRV561 segment.
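Because every switch sees the same set of Ethernet Segment routes, each one can run the election locally and arrive at the same winner. A minimal sketch of that idea follows. The highest preference wins, as in the output above, but the tie-break on the lowest VTEP address is purely my assumption for illustration and `elect_df` is a made-up function name.

```python
import ipaddress


def elect_df(candidates: dict) -> str:
    """Pick the DF from a mapping of VTEP IP address -> preference.

    Highest preference wins. Ties fall back to the lowest IP address,
    which is an assumption made for this sketch only.
    """
    return max(
        candidates,
        key=lambda ip: (candidates[ip], -int(ipaddress.ip_address(ip))),
    )


# Preferences taken from the Ethernet Segment routes shown above.
print(elect_df({"10.0.0.5": 50, "10.0.0.6": 100}))  # 10.0.0.6
```

Since the function depends only on its input, LE05 and LE06 running it on the same BGP data will always agree on the result.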
With MLAG we had a nice feature called reload-delay which allowed the operator to set up timers to stagger the process of bringing interfaces online after the switch had finished booting. This avoids traffic blackholing by stopping the switch from receiving traffic before it is ready to forward it. With EVPN ESI we are not using MLAG, so to get the same functionality we use the Arista link tracking feature. The configuration looks like this:
LE05#show run sec EVPN-ESI-MH
link tracking group EVPN-ESI-MH
   recovery delay 60
!
interface Port-Channel561
   description "SRV561"
   link tracking group EVPN-ESI-MH downstream
!
interface Ethernet1
   description "SP01"
   link tracking group EVPN-ESI-MH upstream
interface Ethernet2
   description "SP02"
   link tracking group EVPN-ESI-MH upstream

A link-tracking group EVPN-ESI-MH was created, tracking the spine uplinks. If both Ethernet1 and Ethernet2 interfaces go down, the link-tracking group puts Port-Channel561 into down state, signaling to SRV561 that LE05 cannot forward any traffic. After Ethernet1 or Ethernet2 comes back up, a 60 second timer starts. Once the timer ends, Port-Channel561 is put into up state, signaling to SRV561 that LE05 is ready to forward traffic again.
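The behavior described above can be expressed as a small state machine. This is a sketch of the logic as I described it, not of the EOS implementation: the `LinkTrackingGroup` class, its method names and the explicit `now` timestamps (in seconds) are all invented for the example.

```python
class LinkTrackingGroup:
    """Toy model: downstream links follow the state of the upstream links."""

    def __init__(self, upstream, recovery_delay):
        self.upstream = {name: True for name in upstream}
        self.recovery_delay = recovery_delay
        self.recovered_at = None  # when an uplink returned after total loss
        self.downstream_up = True

    def set_upstream(self, name, is_up, now):
        had_uplink = any(self.upstream.values())
        self.upstream[name] = is_up
        has_uplink = any(self.upstream.values())
        if had_uplink and not has_uplink:
            # Last uplink lost: take the downstream links down immediately.
            self.downstream_up = False
            self.recovered_at = None
        elif not had_uplink and has_uplink:
            # First uplink back: start the recovery delay timer.
            self.recovered_at = now

    def tick(self, now):
        """Re-evaluate the timer and return the downstream state."""
        if (not self.downstream_up and self.recovered_at is not None
                and now - self.recovered_at >= self.recovery_delay):
            self.downstream_up = True
            self.recovered_at = None
        return self.downstream_up


group = LinkTrackingGroup(["Ethernet1", "Ethernet2"], recovery_delay=60)
group.set_upstream("Ethernet1", False, now=0)
group.set_upstream("Ethernet2", False, now=10)   # both uplinks down
group.set_upstream("Ethernet1", True, now=100)   # one uplink returns
print(group.tick(now=130), group.tick(now=160))  # False True
```

Losing only one uplink leaves the downstream links untouched, which matches the requirement that both Ethernet1 and Ethernet2 must fail before Port-Channel561 is taken down.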
With MLAG, the two switches in the pair could synchronize their STP Bridge ID to ensure that downstream switches would see them as one switch. To achieve the same functionality without MLAG, Arista developed an STP super root feature, enabling a switch to send STP BPDUs with Bridge ID 0000.0000.0001 and priority set to 0. This allows L2 switches to be multihomed to LE05/6/7 just like with MLAG. The magic command is spanning-tree root support.
If your particular vendor does not support a similar command, the recommendation is to filter BPDUs on the EVPN ESI port-channel and leave it up to the downstream switch to ensure that the topology is loop-free.
This is a solid alternative (and perhaps the only one) when MLAG cannot be used. It avoids packet duplication using complex split-horizon rules. As EVPN carries all information, there is no need for a peer-link.
Pros:
Cons:
We have now compared MLAG and ESI for EVPN Multihoming. Both solutions must solve the packet duplication problem and they use vastly different methods of doing so.
If you're deploying a data center network, I would recommend the MLAG option. It is simple, solid and less complex than EVPN ESI. Its drawback is that it is proprietary to that vendor. You have to trust that their implementation is stable.
If you're deploying a service provider network you may not be able to deploy MLAG, and in that case EVPN ESI is your only alternative. With EVPN ESI you have a more complex but transparent setup where everything is advertised with BGP. This technically lowers your dependence on a single vendor as everything is using open standards and RFCs. As for real-world vendor inter-operability, only time and testing will tell. Two vendors may implement features or interpret RFCs differently.
I really enjoyed writing this article (although it took me quite a while to put together) and I learned a ton about EVPN ESI Multihoming along the way. I hope you enjoyed reading this article. Thanks for visiting, I hope to see you again soon!
If you want more to read, please consider other posts in my VXLAN series: