Firewall Cluster and EVPN

Published 2024-06-09


This article covers three interesting problems I faced when deploying a FortiGate firewall (FW) cluster in a VXLAN EVPN (MLAG) datacenter topology. The customer had requested that each FW member be connected to a separate leaf-pair for increased redundancy. The cluster was to provide stateful inter-VRF inspection. Let us quickly walk through the topology before we get to the problems:

Our two FW members FW1a and FW1b are deployed to different leaf-pairs, LE03 and LE04 respectively. While this decision makes perfect sense, it brought along a few interesting design decisions that we would not have had to consider had the two FW members been connected to the same leaf-pair.

I have also added two servers to the topology so that we can generate traffic to test the design, both connected to LE05. The servers live in vrfs A and B respectively, forcing traffic between them to traverse our FW. The diagram below displays the logical topology and traffic flow:

The IPv6 range 2001:db8:dc01::/48 was allocated to this data center site. From that range we allocate 2001:db8:dc01:a00::/56 to vrf A and 2001:db8:dc01:b00::/56 to vrf B. As FW1 should inspect traffic traveling between the two vrfs, it needs connectivity to both of them. We set up a linknet on vlan 100 for vrf A and a linknet on vlan 200 for vrf B. SRV6 was placed in vlan 106 and SRV7 in vlan 207.
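As a quick sanity check of the address plan, here is a minimal Python sketch using the standard ipaddress module. The prefixes are the ones quoted in this article and in the interface configurations; the names SITE, VRFS, SUBNETS and check_plan are my own and purely illustrative.

```python
import ipaddress

# Site and vrf ranges as quoted in the article.
SITE = ipaddress.ip_network("2001:db8:dc01::/48")
VRFS = {
    "A": ipaddress.ip_network("2001:db8:dc01:a00::/56"),
    "B": ipaddress.ip_network("2001:db8:dc01:b00::/56"),
}
# /64 subnets derived from the SVI addresses in the leaf configurations.
SUBNETS = {
    "vlan 100 (linknet)": ("A", ipaddress.ip_network("2001:db8:dc01:a00::/64")),
    "vlan 200 (linknet)": ("B", ipaddress.ip_network("2001:db8:dc01:b00::/64")),
    "vlan 106 (SRV6)": ("A", ipaddress.ip_network("2001:db8:dc01:a06::/64")),
    "vlan 207 (SRV7)": ("B", ipaddress.ip_network("2001:db8:dc01:b07::/64")),
}


def check_plan():
    """Return a list of problems found; an empty list means the plan is consistent."""
    problems = []
    for name, net in VRFS.items():
        if not net.subnet_of(SITE):
            problems.append(f"vrf {name} range {net} is outside {SITE}")
    if VRFS["A"].overlaps(VRFS["B"]):
        problems.append("vrf A and vrf B ranges overlap")
    for label, (vrf, net) in SUBNETS.items():
        if not net.subnet_of(VRFS[vrf]):
            problems.append(f"{label} {net} is outside the vrf {vrf} range")
    return problems


if __name__ == "__main__":
    print(check_plan() or "address plan is consistent")
```

Each /56 leaves room for 256 /64 subnets per vrf, which is why the vlan number can be reused as the last byte of the fourth hextet (a06, b07).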

We run OSPF/iBGP in the VXLAN-EVPN underlay using unnumbered Ethernet links. This setup reduces the configuration as interfaces Ethernet1 through Ethernet6 on the spines are identical. It also greatly reduces the number of IP-addresses required for the topology. You can see the full (initial) configuration below:

service routing protocols model multi-agent
!
hostname SP01
!
interface Ethernet1-6
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Loopback0
   ip address 10.0.0.1/32
   ip ospf area 0.0.0.0
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN route-reflector-client
   neighbor EVPN timers 5 15
   neighbor EVPN send-community
   neighbor 10.0.0.31 peer group EVPN
   neighbor 10.0.0.32 peer group EVPN
   neighbor 10.0.0.41 peer group EVPN
   neighbor 10.0.0.42 peer group EVPN
   neighbor 10.0.0.51 peer group EVPN
   neighbor 10.0.0.52 peer group EVPN
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent
!
hostname SP02
!
interface Ethernet1-6
   no switchport
   ip address unnumbered Loopback0
   ip ospf network point-to-point
   ip ospf area 0.0.0.0
!
interface Loopback0
   ip address 10.0.0.2/32
   ip ospf area 0.0.0.0
!
ip routing
!
router bgp 65000
   neighbor EVPN peer group
   neighbor EVPN remote-as 65000
   neighbor EVPN update-source Loopback0
   neighbor EVPN route-reflector-client
   neighbor EVPN timers 5 15
   neighbor EVPN send-community
   neighbor 10.0.0.31 peer group EVPN
   neighbor 10.0.0.32 peer group EVPN
   neighbor 10.0.0.41 peer group EVPN
   neighbor 10.0.0.42 peer group EVPN
   neighbor 10.0.0.51 peer group EVPN
   neighbor 10.0.0.52 peer group EVPN
   !
   address-family evpn
      neighbor EVPN activate
   !
   address-family ipv4
      no neighbor EVPN activate
!
router ospf 1
   redistribute connected
   max-lsa 12000
!
end

service routing protocols model multi-agent!hostname LE03a!vlan 100   name "FW1;VRF=A"!vlan 200   name "FW1;VRF=B"!vlan 4094   name "MLAG"   trunk group PEER-LINK!vrf instance A!vrf instance B!interface Port-Channel3   description "PEER-LINK"   switchport mode trunk   switchport trunk group PEER-LINK!interface Port-Channel4   description "FW1"   switchport mode trunk   mlag 4!interface Ethernet1   description "SP01"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet2   description "SP02"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet3   description "LE03b"   channel-group 3 mode active!interface Ethernet4   description "FW1a"   channel-group 4 mode active!interface Loopback0   description "OSPF/BGP ROUTER-ID"   ip address 10.0.0.31/32   ip ospf area 0.0.0.0!interface Loopback1   description "VXLAN SOURCE-INTERFACE"   ip address 10.0.0.3/32   ip ospf area 0.0.0.0!interface Vlan100   description "FW1;VRF=A"   vrf A   arp aging timeout 290   ipv6 address 2001:db8:dc01:a00::3/64!interface Vlan200   description "FW1;VRF=B"   vrf B   arp aging timeout 290   ipv6 address 2001:db8:dc01:b00::3/64!interface Vlan4094   no autostate   ip address 10.0.3.1/30   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Vxlan1   vxlan source-interface Loopback1   vxlan udp-port 4789   vxlan vlan 100 vni 100   vxlan vlan 200 vni 200   vxlan vrf A vni 5001   vxlan vrf B vni 5002!ip virtual-router mac-address 00:11:22:33:44:55!ip routing!ipv6 unicast-routingipv6 unicast-routing vrf Aipv6 unicast-routing vrf B!mlag configuration   domain-id LE03   local-interface Vlan4094   peer-address 10.0.3.2   peer-link Port-Channel3   reload-delay mlag 60   reload-delay non-mlag 30!router bgp 65000   no bgp default ipv4-unicast   distance bgp 20 200 200   neighbor EVPN peer group   neighbor EVPN remote-as 65000   neighbor EVPN update-source 
Loopback0   neighbor EVPN send-community   neighbor 10.0.0.1 peer group EVPN   neighbor 10.0.0.2 peer group EVPN   !   vlan 100      rd 10.0.0.31:100      route-target both 65000:100      redistribute learned   !   vlan 200      rd 10.0.0.31:200      route-target both 65000:200      redistribute learned   !   address-family evpn      neighbor EVPN activate   !   address-family ipv4      no neighbor EVPN activate!router ospf 1   redistribute connected   max-lsa 12000!end

service routing protocols model multi-agent!hostname LE03b!vlan 100   name "FW1;VRF=A"!vlan 200   name "FW1;VRF=B"!vlan 4094   name "MLAG"   trunk group PEER-LINK!vrf instance A!vrf instance B!interface Port-Channel3   description "PEER-LINK"   switchport mode trunk   switchport trunk group PEER-LINK!interface Port-Channel4   description "FW1"   switchport mode trunk   mlag 4!interface Ethernet1   description "SP01"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet2   description "SP02"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet3   description "LE03a"   channel-group 3 mode active!interface Ethernet4   description "FW1a"   channel-group 4 mode active!interface Loopback0   description "OSPF/BGP ROUTER-ID"   ip address 10.0.0.32/32   ip ospf area 0.0.0.0!interface Loopback1   description "VXLAN SOURCE-INTERFACE"   ip address 10.0.0.3/32   ip ospf area 0.0.0.0!interface Vlan100   description "FW1;VRF=A"   vrf A   arp aging timeout 290   ipv6 address 2001:db8:dc01:a00::4/64!interface Vlan200   description "FW1;VRF=B"   vrf B   arp aging timeout 290   ipv6 address 2001:db8:dc01:b00::4/64!interface Vlan4094   no autostate   ip address 10.0.3.2/30   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Vxlan1   vxlan source-interface Loopback1   vxlan udp-port 4789   vxlan vlan 100 vni 100   vxlan vlan 200 vni 200   vxlan vrf A vni 5001   vxlan vrf B vni 5002!ip virtual-router mac-address 00:11:22:33:44:55!ip routing!ipv6 unicast-routingipv6 unicast-routing vrf Aipv6 unicast-routing vrf B!mlag configuration   domain-id LE03   local-interface Vlan4094   peer-address 10.0.3.1   peer-link Port-Channel3   reload-delay mlag 60   reload-delay non-mlag 30!router bgp 65000   no bgp default ipv4-unicast   distance bgp 20 200 200   neighbor EVPN peer group   neighbor EVPN remote-as 65000   neighbor EVPN update-source 
Loopback0   neighbor EVPN send-community   neighbor 10.0.0.1 peer group EVPN   neighbor 10.0.0.2 peer group EVPN   !   vlan 100      rd 10.0.0.32:100      route-target both 65000:100      redistribute learned   !   vlan 200      rd 10.0.0.32:200      route-target both 65000:200      redistribute learned   !   address-family evpn      neighbor EVPN activate   !   address-family ipv4      no neighbor EVPN activate!router ospf 1   redistribute connected   max-lsa 12000!end

service routing protocols model multi-agent!hostname LE04a!vlan 100   name "FW1;VRF=A"!vlan 200   name "FW1;VRF=B"!vlan 4094   name "MLAG"   trunk group PEER-LINK!vrf instance A!vrf instance B!interface Port-Channel3   description "PEER-LINK"   switchport mode trunk   switchport trunk group PEER-LINK!interface Port-Channel4   description "FW1"   switchport mode trunk   mlag 4!interface Ethernet1   description "SP01"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet2   description "SP02"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet3   description "LE04b"   channel-group 3 mode active!interface Ethernet4   description "FW1a"   channel-group 4 mode active!interface Loopback0   description "OSPF/BGP ROUTER-ID"   ip address 10.0.0.41/32   ip ospf area 0.0.0.0!interface Loopback1   description "VXLAN SOURCE-INTERFACE"   ip address 10.0.0.4/32   ip ospf area 0.0.0.0!interface Vlan100   description "FW1;VRF=A"   vrf A   arp aging timeout 290   ipv6 address 2001:db8:dc01:a00::5/64!interface Vlan200   description "FW1;VRF=B"   vrf B   arp aging timeout 290   ipv6 address 2001:db8:dc01:b00::5/64!interface Vlan4094   no autostate   ip address 10.0.4.1/30   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Vxlan1   vxlan source-interface Loopback1   vxlan udp-port 4789   vxlan vlan 100 vni 100   vxlan vlan 200 vni 200   vxlan vrf A vni 5001   vxlan vrf B vni 5002!ip virtual-router mac-address 00:11:22:33:44:55!ip routing!ipv6 unicast-routingipv6 unicast-routing vrf Aipv6 unicast-routing vrf B!mlag configuration   domain-id LE04   local-interface Vlan4094   peer-address 10.0.4.2   peer-link Port-Channel3   reload-delay mlag 60   reload-delay non-mlag 30!router bgp 65000   distance bgp 20 200 200   neighbor EVPN peer group   neighbor EVPN remote-as 65000   neighbor EVPN update-source Loopback0   neighbor EVPN 
send-community   neighbor 10.0.0.1 peer group EVPN   neighbor 10.0.0.2 peer group EVPN   !   vlan 100      rd 10.0.0.41:100      route-target both 65000:100      redistribute learned   !   vlan 200      rd 10.0.0.41:200      route-target both 65000:200      redistribute learned   !   address-family evpn      neighbor EVPN activate   !   address-family ipv4      no neighbor EVPN activate!router ospf 1   redistribute connected   max-lsa 12000!end

service routing protocols model multi-agent!hostname LE04b!vlan 100   name "FW1;VRF=A"!vlan 200   name "FW1;VRF=B"!vlan 4094   name "MLAG"   trunk group PEER-LINK!vrf instance A!vrf instance B!interface Port-Channel3   description "PEER-LINK"   switchport mode trunk   switchport trunk group PEER-LINK!interface Port-Channel4   description "FW1"   switchport mode trunk   mlag 4!interface Ethernet1   description "SP01"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet2   description "SP02"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet3   description "LE04a"   channel-group 3 mode active!interface Ethernet4   description "FW1a"   channel-group 4 mode active!interface Loopback0   description "OSPF/BGP ROUTER-ID"   ip address 10.0.0.42/32   ip ospf area 0.0.0.0!interface Loopback1   description "VXLAN SOURCE-INTERFACE"   ip address 10.0.0.4/32   ip ospf area 0.0.0.0!interface Vlan100   description "FW1;VRF=A"   vrf A   arp aging timeout 290   ipv6 address 2001:db8:dc01:a00::6/64!interface Vlan200   description "FW1;VRF=B"   vrf B   arp aging timeout 290   ipv6 address 2001:db8:dc01:b00::6/64!interface Vlan4094   no autostate   ip address 10.0.4.2/30   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Vxlan1   vxlan source-interface Loopback1   vxlan udp-port 4789   vxlan vlan 100 vni 100   vxlan vlan 200 vni 200   vxlan vrf A vni 5001   vxlan vrf B vni 5002!ip virtual-router mac-address 00:11:22:33:44:55!ip routingip routing vrf Aip routing vrf B!ipv6 unicast-routingipv6 unicast-routing vrf Aipv6 unicast-routing vrf B!mlag configuration   domain-id LE04   local-interface Vlan4094   peer-address 10.0.4.1   peer-link Port-Channel3   reload-delay mlag 60   reload-delay non-mlag 30!router bgp 65000   distance bgp 20 200 200   neighbor EVPN peer group   neighbor EVPN remote-as 65000   neighbor EVPN 
update-source Loopback0   neighbor EVPN send-community   neighbor 10.0.0.1 peer group EVPN   neighbor 10.0.0.2 peer group EVPN   !   vlan 100      rd 10.0.0.42:100      route-target both 65000:100      redistribute learned      no redistribute host-route   !   vlan 200      rd 10.0.0.42:200      route-target both 65000:200      redistribute learned      no redistribute host-route   !   address-family evpn      neighbor EVPN activate   !   address-family ipv4      no neighbor EVPN activate!router ospf 1   redistribute connected   max-lsa 12000!end

service routing protocols model multi-agent!hostname LE05a!vlan 106   name SRV6!vlan 207   name SRV7!vlan 4094   name "MLAG"   trunk group PEER-LINK!vrf instance A!vrf instance B!interface Port-Channel3   description "PEER-LINK"   switchport mode trunk   switchport trunk group PEER-LINK!interface Ethernet1   description "SP01"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet2   description "SP02"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet3   description "LE05b"   channel-group 3 mode active!interface Ethernet6   description SRV6   switchport access vlan 106!interface Loopback0   description "OSPF/BGP ROUTER-ID"   ip address 10.0.0.51/32   ip ospf area 0.0.0.0!interface Loopback1   description "VXLAN SOURCE-INTERFACE"   ip address 10.0.0.5/32   ip ospf area 0.0.0.0!interface Vlan106   vrf A   arp aging timeout 290   ipv6 address virtual 2001:db8:dc01:a06::1/64!interface Vlan207   vrf B   arp aging timeout 290   ipv6 address virtual 2001:db8:dc01:b07::1/64!interface Vlan4094   no autostate   ip address 10.0.5.1/30   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Vxlan1   vxlan source-interface Loopback1   vxlan udp-port 4789   vxlan vlan 106 vni 106   vxlan vlan 207 vni 207   vxlan vrf A vni 5001   vxlan vrf B vni 5002!ip virtual-router mac-address 00:11:22:33:44:55!ip routingip routing vrf Aip routing vrf B!ipv6 unicast-routingipv6 unicast-routing vrf Aipv6 unicast-routing vrf B!mlag configuration   domain-id LE05   local-interface Vlan4094   peer-address 10.0.5.2   peer-link Port-Channel3   reload-delay mlag 60   reload-delay non-mlag 30!router bgp 65000   router-id 10.0.0.51   neighbor EVPN peer group   neighbor EVPN remote-as 65000   neighbor EVPN update-source Loopback0   neighbor EVPN send-community   neighbor 10.0.0.1 peer group EVPN   neighbor 10.0.0.2 peer group EVPN   !   
vlan 106      rd 10.0.0.51:106      route-target both 65000:106      redistribute learned   !   vlan 207      rd 10.0.0.51:207      route-target both 65000:207      redistribute learned   !   address-family evpn      neighbor EVPN activate   !   address-family ipv4      no neighbor EVPN activate   !   vrf A      rd 10.0.0.51:5001      route-target import evpn 65000:5001      route-target export evpn 65000:5001      router-id 10.0.0.51      bgp default ipv6-unicast      redistribute connected   !   vrf B      rd 10.0.0.51:5002      route-target import evpn 65000:5002      route-target export evpn 65000:5002      router-id 10.0.0.51      bgp default ipv6-unicast      redistribute connected!router ospf 1   redistribute connected   max-lsa 12000!end

service routing protocols model multi-agent!hostname LE05b!vlan 106   name SRV6!vlan 207   name SRV7!vlan 4094   name "MLAG"   trunk group PEER-LINK!vrf instance A!vrf instance B!interface Port-Channel3   description "PEER-LINK"   switchport mode trunk   switchport trunk group PEER-LINK!interface Ethernet1   description "SP01"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet2   description "SP02"   no switchport   ip address unnumbered Loopback0   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Ethernet3   description "LE05a"   channel-group 3 mode active!interface Ethernet7   description SRV7   switchport access vlan 207!interface Loopback0   description "OSPF/BGP ROUTER-ID"   ip address 10.0.0.52/32   ip ospf area 0.0.0.0!interface Loopback1   description "VXLAN SOURCE-INTERFACE"   ip address 10.0.0.5/32   ip ospf area 0.0.0.0!interface Vlan106   vrf A   arp aging timeout 290   ipv6 address virtual 2001:db8:dc01:a06::1/64!interface Vlan207   vrf B   arp aging timeout 290   ipv6 address virtual 2001:db8:dc01:b07::1/64!interface Vlan4094   no autostate   ip address 10.0.5.2/30   ip ospf network point-to-point   ip ospf area 0.0.0.0!interface Vxlan1   vxlan source-interface Loopback1   vxlan udp-port 4789   vxlan vlan 106 vni 106   vxlan vlan 207 vni 207   vxlan vrf A vni 5001   vxlan vrf B vni 5002!ip virtual-router mac-address 00:11:22:33:44:55!ip routingip routing vrf Aip routing vrf B!ipv6 unicast-routingipv6 unicast-routing vrf Aipv6 unicast-routing vrf B!mlag configuration   domain-id LE05   local-interface Vlan4094   peer-address 10.0.5.1   peer-link Port-Channel3   reload-delay mlag 60   reload-delay non-mlag 30!router bgp 65000   router-id 10.0.0.52   neighbor EVPN peer group   neighbor EVPN remote-as 65000   neighbor EVPN update-source Loopback0   neighbor EVPN send-community   neighbor 10.0.0.1 peer group EVPN   neighbor 10.0.0.2 peer group EVPN   !   
vlan 106      rd 10.0.0.52:106      route-target both 65000:106      redistribute learned   !   vlan 207      rd 10.0.0.52:207      route-target both 65000:207      redistribute learned   !   address-family evpn      neighbor EVPN activate   !   address-family ipv4      no neighbor EVPN activate   !   vrf A      rd 10.0.0.52:5001      route-target import evpn 65000:5001      route-target export evpn 65000:5001      router-id 10.0.0.52      bgp default ipv6-unicast      redistribute connected   !   vrf B      rd 10.0.0.52:5002      route-target import evpn 65000:5002      route-target export evpn 65000:5002      router-id 10.0.0.52      bgp default ipv6-unicast      redistribute connected!router ospf 1   redistribute connected   max-lsa 12000!end

config system global    set hostname "FW1a"endconfig system ha    set group-id 1    set group-name "FW1"    set mode a-p    set hbdev "port3" 0    set session-pickup enable    set session-pickup-delay enable    set override enable    set priority 200endconfig system interface    edit "port1"        set mode static    next    edit "LEAF"        set vdom "root"        set type aggregate        set member "port1" "port2"        set lldp-transmission enable        set snmp-index 9    next    edit "LEAF;VRF=A"        set vdom "root"        set device-identification enable        set role lan        set snmp-index 10        config ipv6            set ip6-address 2001:db8:dc01:a00::1/64            set ip6-allowaccess ping        end        set interface "LEAF"        set vlanid 100    next    edit "LEAF;VRF=B"        set vdom "root"        set device-identification enable        set role lan        set snmp-index 12        config ipv6            set ip6-address 2001:db8:dc01:b00::1/64            set ip6-allowaccess ping        end        set interface "LEAF"        set vlanid 200    nextendconfig firewall policy    edit 1        set srcintf "any"        set dstintf "any"        set action accept        set srcaddr "all"        set dstaddr "all"        set srcaddr6 "all"        set dstaddr6 "all"        set schedule "always"        set service "ALL"    nextend

config system global
    set hostname "FW1b"
end
config system ha
    set group-id 1
    set group-name "FW1"
    set mode a-p
    set hbdev "port3" 0
    set session-pickup enable
    set session-pickup-delay enable
    set override enable
    set priority 150
end

Problem One - BGP Adjacencies

We decided that we wanted to use a routing protocol between FW1 and the leaves in each vrf to exchange routes. We chose to use eBGP with FW1 residing in AS 65001 and the leaves in AS 65000. The FW should advertise a default route into each vrf; the leaves should advertise directly-connected subnets.

This leads us to our first interesting design choice. The initial plan was for each FW member to establish adjacencies to the leaf-pair it was physically connected to. FW1a would peer with LE03a/b; FW1b would peer with LE04a/b. One benefit of this plan would be that only the leaf-pair connected to the active FW member would receive the default route, ensuring that all traffic exiting the vrf would hit LE03 when FW1a was active and LE04 when FW1b was active. This plan does not work for multiple reasons. The largest problem is that the configuration is synchronized between FW1a/b, so whichever adjacencies you configure on one member will also exist on the other.

The only viable solution here is to configure FW1a/b to peer with both LE03a/b and LE04a/b, totaling four adjacencies per vrf. This is what that configuration looks like:

FW1a # show router bgp
config router prefix-list
    edit "DEFAULT_ROUTE"
        config rule
            edit 1
                set prefix 0.0.0.0 0.0.0.0
            next
        end
    next
end
config router prefix-list6
    edit "DEFAULT_ROUTE"
        config rule
            edit 1
                set prefix6 ::/0
                unset ge
                unset le
            next
        end
    next
end
config router route-map
    edit "RM_LEAF_OUT"
        config rule
            edit 4
                set match-ip-address "DEFAULT_ROUTE"
            next
            edit 6
                set match-ip6-address "DEFAULT_ROUTE"
            next
        end
    next
end
config router bgp
    set as 65001
    config neighbor
        edit "2001:db8:dc01:a00::3"
            set advertisement-interval 0
            set description "LE03a;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:a00::4"
            set advertisement-interval 0
            set description "LE03b;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:a00::5"
            set advertisement-interval 0
            set description "LE04a;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:a00::6"
            set advertisement-interval 0
            set description "LE04b;VRF=A"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::3"
            set advertisement-interval 0
            set description "LE03a;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::4"
            set advertisement-interval 0
            set description "LE03b;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::5"
            set advertisement-interval 0
            set description "LE04a;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
        edit "2001:db8:dc01:b00::6"
            set advertisement-interval 0
            set description "LE04b;VRF=B"
            set remote-as 65000
            set route-map-out6 "RM_LEAF_OUT"
        next
    end
    config redistribute6 "static"
        set status enable
    end
end
# Default route to be advertised into each VRF.
config router static6
    edit 1
        set blackhole enable
    next
end

interface Vlan100
   description "FW1;VRF=A"
   vrf A
   ipv6 address 2001:db8:dc01:a00::3/64
!
router bgp 65000
   !
   vlan 100
      rd 10.0.0.31:100
      route-target both 65000:100
      redistribute learned
   !
   vlan 200
      rd 10.0.0.31:200
      route-target both 65000:200
      redistribute learned
   !
   vrf A
      rd 10.0.0.31:5001
      route-target import evpn 65000:5001
      route-target export evpn 65000:5001
      router-id 10.0.0.31
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:a00::1 remote-as 65001
      neighbor 2001:db8:dc01:a00::1 description FW1;VRF=A
      neighbor 2001:db8:dc01:a00::1 timers 10 30
      redistribute connected
   !
   vrf B
      rd 10.0.0.31:5002
      route-target import evpn 65000:5002
      route-target export evpn 65000:5002
      router-id 10.0.0.31
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:b00::1 remote-as 65001
      neighbor 2001:db8:dc01:b00::1 description FW1;VRF=B
      neighbor 2001:db8:dc01:b00::1 timers 10 30
      redistribute connected

interface Vlan100
   description "FW1;VRF=A"
   vrf A
   ipv6 address 2001:db8:dc01:a00::6/64
!
router bgp 65000
   !
   vlan 100
      rd 10.0.0.42:100
      route-target both 65000:100
      redistribute learned
   !
   vlan 200
      rd 10.0.0.42:200
      route-target both 65000:200
      redistribute learned
   !
   vrf A
      rd 10.0.0.42:5001
      route-target import evpn 65000:5001
      route-target export evpn 65000:5001
      router-id 10.0.0.42
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:a00::1 remote-as 65001
      neighbor 2001:db8:dc01:a00::1 description FW1;VRF=A
      neighbor 2001:db8:dc01:a00::1 timers 10 30
      redistribute connected
   !
   vrf B
      rd 10.0.0.42:5002
      route-target import evpn 65000:5002
      route-target export evpn 65000:5002
      router-id 10.0.0.42
      bgp default ipv6-unicast
      neighbor 2001:db8:dc01:b00::1 remote-as 65001
      neighbor 2001:db8:dc01:b00::1 description FW1;VRF=B
      neighbor 2001:db8:dc01:b00::1 timers 10 30
      redistribute connected
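The eight neighbor entries on FW1 follow a regular pattern: four leaves times two vrfs. The Python sketch below enumerates them so the list is easy to double-check or feed into a template. The host identifiers (::3 through ::6) come from the leaf SVI addresses in this article; the names LEAF_HOST_IDS, LINKNETS and fw_neighbors are hypothetical and only for illustration.

```python
# Last hextet of each leaf's SVI address on the linknets (from the leaf configs).
LEAF_HOST_IDS = {"LE03a": 3, "LE03b": 4, "LE04a": 5, "LE04b": 6}
# Linknet prefix per vrf: vlan 100 for vrf A, vlan 200 for vrf B.
LINKNETS = {"A": "2001:db8:dc01:a00::", "B": "2001:db8:dc01:b00::"}


def fw_neighbors():
    """Return (neighbor address, description) for every adjacency FW1 needs."""
    return [
        (f"{prefix}{host_id:x}", f"{leaf};VRF={vrf}")
        for vrf, prefix in LINKNETS.items()
        for leaf, host_id in LEAF_HOST_IDS.items()
    ]


if __name__ == "__main__":
    for address, description in fw_neighbors():
        print(f"{address:<24} {description}")
```

Because the configuration is synchronized between FW1a and FW1b, this single list applies to whichever member is currently active.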

Alright, that was relatively easy. Or was it..?

Problem Two - BGP does not establish

When we deployed the configuration above, we noticed that while FW1a was the active member, it would only establish BGP adjacencies to its physically connected neighbors LE03a/b. The adjacencies to LE04a/b would not come up. If we failed the cluster over to FW1b, it would establish adjacencies to LE04a/b but not to LE03a/b. ICMPv6 worked just fine; the FortiGate could reach all leaves in the subnet. The NDP table showed the neighbors' MAC-addresses correctly, also as expected. But BGP just wouldn't establish.

Finding the answer required some information gathering and digging through packet captures. Let's analyze the output below:

LE04b(vrf:A)#show bgp evpn route-type mac-ip vni 100 detail

"L2VNI"
BGP routing table entry for mac-ip 0009.0f09.0100
 RD: 10.0.0.31:100
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
      Originator: 10.0.0.31, Cluster list: 10.0.0.1
      Extended Community:
        Route-Target-AS:65000:100
        TunnelEncap:tunnelTypeVxlan
      VNI: 100

"L2VNI + L3VNI"
BGP routing table entry for mac-ip 0009.0f09.0100 2001:db8:dc01:a00::1
 RD: 10.0.0.32:100
 Paths: 1 available
  Local
    10.0.0.3 from 10.0.0.1 (10.0.0.1)
      Origin IGP, metric -, localpref 100, weight 0, tag 0, valid, internal, best
      Originator: 10.0.0.32, Cluster list: 10.0.0.1
      Extended Community:
        Route-Target-AS:65000:100
        Route-Target-AS:65000:5001
        TunnelEncap:tunnelTypeVxlan
        EvpnRouterMac:5001.0000.0032
      VNI: 100
      L3 VNI: 5001

The above output shows the EVPN routes learned for VNI 100. We have received two routes: one containing only L2VNI information and the other containing both L2VNI and L3VNI information. We can see this because the second entry tells us the IPv6-address in addition to the MAC-address, and it also carries the L3 VNI 5001 and a router MAC. Each route generates its own set of entries in its respective tables. For example, the L2VNI-only route generates these MAC/VXLAN-address table entries:

"L2VNI"
LE04b(vrf:A)#show mac address-table vlan 100
Vlan    Mac Address       Type        Ports      Moves   Last Move
----    -----------       ----        -----      -----   ---------
 100    0009.0f09.0100    DYNAMIC     Vx1        1       1:20:08 ago

LE04b(vrf:A)#show vxlan address-table vlan 100
VLAN  Mac Address     Type      Prt  VTEP             Moves   Last Move
----  -----------     ----      ---  ----             -----   ---------
 100  0009.0f09.0100  EVPN      Vx1  10.0.0.3         1       1:20:08 ago

The L2VNI-only EVPN route generated the MAC-address table entry above. It also created an entry in the VXLAN-address table, telling VXLAN which VTEP the MAC-address lives behind.

Moving on to the L2VNI+L3VNI route, this route generated the following ND-entry and host-route:

"L2VNI + L3VNI"
LE04b(vrf:A)#show ipv6 neighbors 2001:db8:dc01:a00::1
IPv6 Address           Age Hardware Addr   Interface
2001:db8:dc01:a00::1   -   0009.0f09.0100  Vl100, Vxlan1

LE04b(vrf:A)#show ipv6 route 2001:db8:dc01:a00::1
 B I      2001:db8:dc01:a00::1/128 [200/0]
           via VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032
 C        2001:db8:dc01:a00::/64 [0/0]
           via Vlan100, directly connected

The L2VNI+L3VNI route gave our leaf enough information to generate an entry in the IPv6 neighbor table as well as install a /128 host-route pointing behind a VXLAN VTEP. Note that the host-route is flagged B I, so it was learned through BGP rather than configured statically.
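It is worth pausing on what those two routes mean for forwarding. The Python sketch below models a plain longest-prefix-match lookup over the two vrf A routes shown in the show ipv6 route output on LE04b. It is a simplified illustration, not how EOS implements its forwarding table; the names ROUTES and lookup are my own.

```python
import ipaddress

# The two vrf A routes on LE04b, copied from the "show ipv6 route" output.
ROUTES = [
    (ipaddress.ip_network("2001:db8:dc01:a00::1/128"),
     "VTEP 10.0.0.3, VNI 5001, router-mac 5001.0000.0032"),
    (ipaddress.ip_network("2001:db8:dc01:a00::/64"),
     "Vlan100, directly connected"),
]


def lookup(destination):
    """Return the next hop of the most specific matching route, or None."""
    dst = ipaddress.ip_address(destination)
    matches = [(net, next_hop) for net, next_hop in ROUTES if dst in net]
    if not matches:
        return None
    # The longest prefix wins; administrative distance only breaks ties
    # between routes for the same prefix, so it plays no role here.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    print("to FW1  :", lookup("2001:db8:dc01:a00::1"))
    print("to LE03a:", lookup("2001:db8:dc01:a00::3"))
```

Even though the /64 is directly connected with distance 0, the /128 is more specific and therefore wins for traffic destined to FW1.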

Packet captures

I couldn't find anything conclusive based on the above output, so I resorted to performing a packet capture. Let's analyze the packets captured on FW1:

"Request"
  Ethernet II, Src: 0009.0f09.0100, Dst: 5001.0000.0042
  802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::1, Dst: 2001:db8:dc01:a00::6
  Internet Control Message Protocol v6
    Type: Echo request

"Reply"
  Ethernet II, Src: 5001.0000.0032, Dst: 0009.0f09.0100
  802.1Q Virtual LAN, PRI: 0, DEI: 0, ID: 100
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::6, Dst: 2001:db8:dc01:a00::1
  Internet Control Message Protocol v6
    Type: Echo reply

The above capture shows an ICMPv6 request/reply between FW1 and LE04b. In the request we can see that the destination MAC-address is set to 5001.0000.0042, the MAC-address of LE04b. However, the reply was sent from 5001.0000.0032. That MAC-address belongs to LE03b. That's weird.

Let's continue and look at the packets captured between SP01 and LE04b:

"Request"
  Ethernet II, Src: 5001.0000.0001, Dst: 5001.0000.0042
  Internet Protocol Version 4, Src: 10.0.0.3, Dst: 10.0.0.4
  User Datagram Protocol, Src Port: 41830, Dst Port: 4789
  Virtual eXtensible Local Area Network
    VXLAN Network Identifier (VNI): 100
  Ethernet II, Src: 0009.0f09.0100, Dst: 5001.0000.0042
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::1, Dst: 2001:db8:dc01:a00::6
  Internet Control Message Protocol v6
    Type: Echo (ping) request (128)

"Reply"
  Ethernet II, Src: 5001.0000.0042, Dst: 5001.0000.0001
  Internet Protocol Version 4, Src: 10.0.0.4, Dst: 10.0.0.3
  User Datagram Protocol, Src Port: 53928, Dst Port: 4789
  Virtual eXtensible Local Area Network
    VXLAN Network Identifier (VNI): 5001
  Ethernet II, Src: 5001.0000.0042, Dst: 5001.0000.0032
  Internet Protocol Version 6, Src: 2001:db8:dc01:a00::6, Dst: 2001:db8:dc01:a00::1
  Internet Control Message Protocol v6
    Type: Echo (ping) reply (129)
*MAC-address 5001.0000.0001 belongs to SP01*

This time the ICMPv6 packets are VXLAN-encapsulated. The VXLAN-packet was sent from LE03 (10.0.0.3) to LE04 (10.0.0.4) on L2VNI 100. The inner packet has FW1 as the source MAC-address and LE04b as the destination (5001.0000.0042). So far so good.

When looking at the reply from LE04b back to FW1, we see a few differences. One difference is that LE04b decided to use the L3VNI 5001. We can also see that the inner packet destination MAC-address is set to LE03b (5001.0000.0032) instead of FW1.
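To make the VNI difference between request and reply concrete, here is a minimal Python sketch of the 8-byte VXLAN header as defined in RFC 7348. The VNI values come from the Vxlan1 configuration in this article; the function names and the VNI_ROLE table are my own and only illustrative.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid (RFC 7348)

# Roles taken from the leaf config: "vxlan vlan 100 vni 100", "vxlan vrf A vni 5001".
VNI_ROLE = {100: "L2VNI, bridged in vlan 100", 5001: "L3VNI, routed in vrf A"}


def build_vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits), reserved (24), VNI (24) and reserved (8) into 8 bytes."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Return the VNI found in the first 8 bytes of the VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


if __name__ == "__main__":
    for vni in (100, 5001):  # request and reply from the capture
        header = build_vxlan_header(vni)
        print(header.hex(), "->", VNI_ROLE[parse_vni(header)])
```

The receiving VTEP uses nothing but this 24-bit value to decide whether the inner frame should be bridged in a vlan or routed in a vrf, which is why the change from 100 to 5001 matters so much.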

Packet capture diagrams

In case the output above was difficult to parse, I took the liberty of creating two diagrams to help visualize what's happening. I also simplified the MAC-addresses to instead show hostnames. Let's see if we can spot what's going wrong:

ICMP packet walk from FW1 to LE04b

The above diagram has pieced together the packet captures we did on the FW1-LE03b and SP01-LE04b links, showing the ICMP request as it travels from FW1 to LE04b. Everything looks correct, so let us review the opposite direction:

ICMP reply from LE04b back to FW1

As we can see in the diagram above, there are lots of things going wrong. One example is that the inner packet generated by LE04b has LE03b as its destination MAC-address, not FW1. But this is just a symptom. The root cause is that LE04b is using L3VNI 5001 to get the packet to FW1, not L2VNI 100.

We figured it out!

The above output tells us everything we need to know! As we learned in the L3VPN article, L3VNIs require some gymnastics to be performed by the sending and receiving leaf:

Sending leaf (LE04b)

LE04b(vrf:A)#show ipv6 route 2001:db8:dc01:a00::1
 B I      2001:db8:dc01:a00::1/128 [200/0]
           via VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032 (LE03b)

According to the host-route installed on LE04b, for a packet to reach FW1, LE04b must VXLAN-encapsulate the packet with VNI 5001 and set LE03b as the destination MAC-address. LE04b is a good boy and does as requested, causing all sorts of problems.
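Why does the host-route win even though LE04b also has the linknet directly connected? A minimal Python sketch of the lookup, assuming ordinary longest-prefix matching: the `ROUTES` table is a hand-written simplification of vrf A on LE04b (the connected /64 entry is my assumption based on the linknet; only the /128 comes from the output above), and `lookup` is a hypothetical helper, not anything EOS exposes.

```python
import ipaddress

# Simplified vrf A routing table on LE04b.
# The /128 is from the "show ipv6 route" output; the connected /64 is assumed.
ROUTES = [
    ("2001:db8:dc01:a00::/64", "connected Vlan100 (bridged via L2VNI 100)"),
    ("2001:db8:dc01:a00::1/128", "VTEP 10.0.0.3 VNI 5001 router-mac 5001.0000.0032"),
]


def lookup(dst: str) -> str:
    """Return the next hop of the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    candidates = []
    for prefix, next_hop in ROUTES:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            candidates.append((network.prefixlen, next_hop))
    if not candidates:
        raise LookupError(f"no route to {dst}")
    # Longest prefix wins: /128 beats /64.
    return max(candidates)[1]


if __name__ == "__main__":
    print(lookup("2001:db8:dc01:a00::1"))  # FW1: routed via the L3VNI host-route
    print(lookup("2001:db8:dc01:a00::7"))  # any other linknet address: connected
```

In this model, only FW1's address is pulled onto the L3VNI path; every other address on the linknet still resolves to the connected route.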

Receiving leaf (LE03b)

LE03b receives the VXLAN packet and sees that VNI 5001 matches vrf A. When the inner Ethernet frame is processed, LE03b realizes that the destination MAC-address is its own MAC-address, so it knows it is the intended destination for the frame. At this point the Ethernet frame has done its job and its header is discarded.

Now LE03b examines the IP header. It sees that the destination IP 2001:db8:dc01:a00::1 (FW1) is not a local IP-address, so LE03b is not the intended destination for the IP packet. A routing lookup is performed to learn that the destination is directly connected via the Vlan100 interface.

LE03b generates a new Ethernet header, setting itself (5001.0000.0032) as the source and FW1 (0009.0f09.0100) as the destination. The Ethernet header is prepended to the IP packet and the frame is sent to FW1.
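The receiving-leaf steps above can be sketched in a few lines of Python. This is a toy model, not how the switch is implemented: the `Frame` class, the lookup tables, and `receive_on_l3vni` are all names I made up for illustration, and the tables hold only the entries mentioned in this article.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    dst_ip: str


LE03B_MAC = "5001.0000.0032"
FW1_MAC = "0009.0f09.0100"

VNI_TO_VRF = {5001: "A"}                                   # L3VNI -> vrf
CONNECTED = {"A": {"2001:db8:dc01:a00::/64": "Vlan100"}}   # connected routes per vrf
NEIGHBORS = {"Vlan100": {ipaddress.ip_address("2001:db8:dc01:a00::1"): FW1_MAC}}


def receive_on_l3vni(vni: int, inner: Frame) -> Optional[Tuple[str, Frame]]:
    """Model LE03b handling a decapsulated frame; return (egress interface, new frame)."""
    vrf = VNI_TO_VRF[vni]                 # 1. the VNI selects the vrf
    if inner.dst_mac != LE03B_MAC:        # 2. frame must be addressed to LE03b itself
        return None
    dst = ipaddress.ip_address(inner.dst_ip)
    # 3. inner Ethernet header is discarded, route the IP packet inside the vrf
    for prefix, interface in CONNECTED[vrf].items():
        if dst in ipaddress.ip_network(prefix):
            # 4. build a new header: LE03b as source, resolved neighbor as destination
            return interface, Frame(LE03B_MAC, NEIGHBORS[interface][dst], inner.dst_ip)
    return None


if __name__ == "__main__":
    reply = Frame("5001.0000.0042", "5001.0000.0032", "2001:db8:dc01:a00::1")
    print(receive_on_l3vni(5001, reply))
```

The frame that leaves towards FW1 has LE03b as its source MAC-address, which matches what we saw in the first capture.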

Not the solution

I had to spend a lot of time thinking about how to solve this. My initial plan was to utilize the no redistribute host-route command on the FW1 vlans. This would stop the L2VNI+L3VNI EVPN route from being advertised into the VRF, which would in turn stop the other leaves from generating that pesky FW1 ::/128 host-route:

router bgp 65000
   !
   vlan 100
      no redistribute host-route
   !
   vlan 200
      no redistribute host-route

However, not advertising the host-route creates a problem on LE05, as this leaf-pair relies on the host-route to figure out which leaf-pair FW1 is currently connected to. Without the host-route, LE05 only knows how to reach 2001:db8:dc01:a00::/64. This route is advertised by both LE03 and LE04, so traffic would be load-balanced between the two leaf-pairs. This would cause suboptimal routing: whenever FW1a was active, traffic along the LE05-SP01-LE04 path would have to be VXLAN-forwarded back via LE04-SP01-LE03 before it could be sent to FW1.

We need a way to selectively block the L3VNI routes on LE03 and LE04 while still advertising the host-routes to LE05.

Problem Two Solution

I ended up creating an inbound EVPN route-map on LE03 and LE04 to reject any route containing both the 65000:100 and 65000:5001 extended communities. Let's look at the config below:

ip extcommunity-list ECL_FW1_L2VNI permit 65000:100
ip extcommunity-list ECL_FW1_L2VNI permit 65000:200
ip extcommunity-list ECL_FW1_L3VNI permit 65000:5001
ip extcommunity-list ECL_FW1_L3VNI permit 65000:5002

route-map "RM_EVPN_IN" deny 10
   match extcommunity ECL_FW1_L2VNI
   sub-route-map "RM_MATCH_FW1_L3VNI"
!
route-map "RM_EVPN_IN" permit 100

route-map "RM_MATCH_FW1_L3VNI" permit 10
   match extcommunity ECL_FW1_L3VNI
!
route-map "RM_MATCH_FW1_L3VNI" deny 100

router bgp 65000
   neighbor EVPN route-map "RM_EVPN_IN" in

To make this work I had to combine two route-maps as it was the only way that would allow me to match only if two communities were present on the same route. Let's start with RM_EVPN_IN that is applied inbound on our EVPN bgp neighbors SP01 and SP02.

Whenever a route was received from these neighbors then the route-map RM_EVPN_IN deny 10 would first be checked. This entry would only match if the route contained extended communities 65000:100 or 65000:200, as configured in ECL_FW1_L2VNI. If there was a match, we would then use the sub-route-map command to call RM_MATCH_FW1_L3VNI which would check if the route contained extended communities 65000:5001 or 65000:5002. If both route-maps found a match, the route would be rejected and no ::/128 host-route would be installed.
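The two-stage match boils down to "any L2VNI community AND any L3VNI community". Here is a minimal Python sketch of that logic as I understand it; the function names mirror the route-map names for readability, but this is a model of the policy, not EOS code, and the example routes in the usage section are made up.

```python
from typing import Set

ECL_FW1_L2VNI = {"65000:100", "65000:200"}
ECL_FW1_L3VNI = {"65000:5001", "65000:5002"}


def rm_match_fw1_l3vni(ext_communities: Set[str]) -> bool:
    """Sub-route-map: permit 10 matches any L3VNI community, deny 100 catches the rest."""
    return bool(ext_communities & ECL_FW1_L3VNI)


def rm_evpn_in(ext_communities: Set[str]) -> bool:
    """Return True if the route is accepted, False if it is rejected."""
    # Sequence 10 (deny): both the L2VNI match and the sub-route-map must succeed.
    if ext_communities & ECL_FW1_L2VNI and rm_match_fw1_l3vni(ext_communities):
        return False
    # Sequence 100 (permit): everything else is accepted.
    return True


if __name__ == "__main__":
    mac_only = {"65000:100"}                  # MAC route, L2VNI only
    mac_ip = {"65000:100", "65000:5001"}      # MAC+IP route carrying both communities
    l3_only = {"65000:5001"}                  # route carrying only the L3VNI community
    for name, route in [("mac_only", mac_only), ("mac_ip", mac_ip), ("l3_only", l3_only)]:
        print(name, "accepted" if rm_evpn_in(route) else "rejected")
```

Only the route that carries both kinds of community is rejected, so the plain MAC routes needed for bridging on the L2VNI are left alone.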

This solution stops LE03 and LE04 from installing the FW1 host-route while still allowing LE05 to install it and avoid suboptimal routing. We can see the routes from LE03 being rejected on LE04b below:

LE04b#show bgp evpn route-type mac-ip
          Network                Next Hop
 * >      RD: 10.0.0.31:100 mac-ip 0009.0f09.010
                                 10.0.0.3
 * >      RD: 10.0.0.31:200 mac-ip 0009.0f09.010
                                 10.0.0.3
 * >      RD: 10.0.0.32:100 mac-ip 0009.0f09.010
                                 10.0.0.3
          RD: 10.0.0.31:100 mac-ip 0009.0f09.0100 2001:db8:dc01:a00::
                                 PolicyReject
          RD: 10.0.0.32:100 mac-ip 0009.0f09.0100 2001:db8:dc01:a00::
                                 PolicyReject
          RD: 10.0.0.31:200 mac-ip 0009.0f09.0100 2001:db8:dc01:b00::
                                 PolicyReject

Note: I must confess that I do not know if there is a better way to achieve this. The EVPN route filtering options are limited, at least on the Arista Lab platform that I'm using. I get the feeling that the design I'm going for is not something you're supposed to do, that I'm doing something wrong. Anyway, whatever.

We now have a fully functioning topology where FW1 can establish BGP adjacencies to the LE03 and LE04 leaves. LE05 receives the host-route, allowing it to forward traffic to the leaf closest to FW1. Everything is great, but then a FW1 HA failover is triggered...

Problem Three - HA failover Convergence

You are doing a great job keeping up thus far. We're getting close to the end but we have one more big problem to solve.

What happens to the routing when the FW1 cluster performs a failover? If I perform a redundancy test and kill FW1a while pinging from SRV6 to SRV7, I can see that traffic continues to flow for about 10 seconds. After that point no traffic is getting through. Then, after around 180 seconds, traffic starts flowing again. This is of course unacceptable in this day and age; an HA failover should be seamless.

To understand this problem we need to explain a few things. The first thing we need to know is what state is synchronized between the two FW1 members:

  • TCP sessions flowing through the cluster are synchronized thanks to the set session-pickup enable command under config system ha. This allows FW1b to seamlessly continue traffic inspection when it becomes the active member. As the Fortigate is a stateful device, FW1b would not allow traffic for an established TCP session if there was no matching entry in its session table. So synchronizing the session table is vital for a failover to not impact active sessions flowing through it.

  • The kernel routing table on the active member is also synchronized. This table is the FIB and is used to make routing decisions for incoming packets. Synchronizing this table ensures that FW1b can continue forwarding packets after a failover, before it has had a chance to establish any routing protocol adjacencies. The default route-TTL is 10 seconds.
    When this timer expires, the synchronized kernel routes are removed. This essentially gives FW1b a 10 second window to establish new routing protocol adjacencies, learn new routes, and replace the ones learned from FW1a.

  • BGP sessions are not synchronized to the passive member. This means that any established BGP session has to be reestablished on failover. The reason this is a problem for us is that, by default, BGP will not accept more than one active session per adjacency. If LE03a has an established adjacency to FW1 and it suddenly receives a new BGP OPEN from the same neighbor, LE03a will ignore it. Not until the current BGP session has timed out will LE03a allow a new BGP session to establish.

Based on this information we can figure out what's happening here. In the first ten seconds after failover, the new primary firewall can keep forwarding traffic based on the synchronized kernel routes. During this time the firewall attempts to establish the BGP adjacencies it finds in its configuration, but receives no response from the leaves. After ten seconds, these routes time out and are removed from the kernel routing table. As the BGP adjacencies on the leaves are still active, they won't allow the firewall to establish a new session until after 180 seconds, when their current BGP session finally times out.

One way to solve this could be to make sure the route-TTL is set to 200 seconds so that BGP with its 180 second default hold timer has a chance to establish before the cluster-learned kernel routes expire on FW1b. Alternatively, we could lower the BGP hold time to 9 seconds or less to accomplish the same goal.
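The timer arithmetic is simple enough to sketch in Python. This is a rough model under two assumptions of mine: the leaves received their last keepalive right at the moment of failover (worst case), and the new session comes up the instant the old one expires. The function name `fw_route_gap` is hypothetical, and the model only looks at the kernel routes on FW1b; it says nothing about what the leaves do with the routes they learned from FW1.

```python
from typing import Optional, Tuple


def fw_route_gap(route_ttl: float, leaf_hold_time: float) -> Optional[Tuple[float, float]]:
    """Return the (start, end) window in seconds after failover where FW1b has no routes.

    route_ttl:      how long the synchronized kernel routes survive on FW1b
    leaf_hold_time: how long the leaves keep the stale session to FW1a alive
    Returns None when BGP can re-establish before the synchronized routes expire.
    """
    if leaf_hold_time <= route_ttl:
        return None
    return (route_ttl, leaf_hold_time)


if __name__ == "__main__":
    print(fw_route_gap(route_ttl=10, leaf_hold_time=180))   # defaults described above
    print(fw_route_gap(route_ttl=200, leaf_hold_time=180))  # raise the route-TTL
    print(fw_route_gap(route_ttl=10, leaf_hold_time=9))     # lower the hold time
```

With the default values the model reproduces the outage I observed, from roughly 10 to 180 seconds after the failover, and both tuning options close the gap on the firewall side.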

However, neither approach fully works. The reason is that when the BGP session times out, LE03a will withdraw the default route it learned from FW1a. This route withdrawal is enough to create a small window in time where no default route exists inside the VRF. Traffic from any server that can't be routed by any of the leaves due to missing routes will yield an ICMP Host unreachable message back to that server. The server OS will then terminate that TCP session.

Any subsequent packet that was received for that TCP session will be dropped as there's no matching connection. If that was a long-running database synchronization process, the application now has to handle the fact that the TCP session has to reestablish. Some applications handle this better than others. So even if the route is gone for only a second, the impact may be great.

Problem Three Solution

To the rescue comes BGP Graceful Restart. This is a feature that allows an established BGP adjacency to gracefully restart. By configuring set graceful-restart enable on FW1 and utilizing the enabled-by-default graceful-restart-helper command on the Arista leaves, LE03a and its leaf buddies will allow FW1b to establish a BGP adjacency before the stale adjacency to FW1a times out. As the same session is restarted there is no withdrawal/re-advertisement of the default route, avoiding the ICMP Host unreachable problem.

config router bgp
   set graceful-restart enable
end
*Graceful restart capability is automatically advertised to all BGP neighbors.*

Note: Applying this will hard restart all adjacencies. Also, this command unfortunately means that neither BGP neighbor-ranges nor neighbor-groups can be used on FW1. This is because the firewall has to initiate the adjacencies itself when it comes online. When using a neighbor-range, it can only accept BGP adjacencies from neighbors, not initiate its own sessions.

The BGP sessions usually come up within a couple of seconds after a failover, so there is generally little reason to increase the route-TTL, but you may do so if it is required in your environment. Graceful Restart should not be used together with BFD, as BFD would take down the old adjacency and cause a route withdrawal before a new adjacency even has a chance to establish.
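To summarize how Graceful Restart and BFD interact in this design, here is a deliberately simplified Python model of what a leaf does with FW1's default route during a failover. It encodes the behaviour described in this article rather than the full protocol: the function name, its parameters, and the 120 second `restart_time` default are illustrative assumptions of mine, so check the timers your platforms actually negotiate.

```python
def default_route_withdrawn(
    graceful_restart: bool,
    bfd: bool,
    reconnect_after: float,
    restart_time: float = 120.0,  # assumed value, for illustration only
) -> bool:
    """Return True if the leaf withdraws FW1's default route during the failover.

    reconnect_after: seconds until FW1b manages to bring the BGP session back up
    restart_time:    how long the helper leaf keeps the stale routes around
    """
    if bfd:
        # As described above: BFD tears down the old adjacency before the new
        # one can establish, so the route is withdrawn.
        return True
    if not graceful_restart:
        # The old session has to time out first, which withdraws the route.
        return True
    # Helper mode: routes are kept as long as FW1b returns in time.
    return reconnect_after > restart_time


if __name__ == "__main__":
    print(default_route_withdrawn(graceful_restart=True, bfd=False, reconnect_after=3))
    print(default_route_withdrawn(graceful_restart=True, bfd=True, reconnect_after=3))
    print(default_route_withdrawn(graceful_restart=False, bfd=False, reconnect_after=3))
```

Only the first combination, Graceful Restart without BFD and a session that returns within the restart window, keeps the default route in place throughout the failover.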

Conclusion

The seemingly simple customer requirement of connecting firewall members to different leaf-pairs ended up sending me down a deep EVPN-shaped rabbit hole. From figuring out how L3VNIs take priority over L2VNIs to learning why Graceful Restart is a better option than BFD in this topology, I had a lot of new ground to cover.

I want to thank you for getting all the way to the end of this article. I hope you learned something and that reading it was worth your time.

Copyright 2021-2026, Emil Boklund.
All Rights Reserved.