NSX-T Logical Routing

Logical Routing in NSX-T

In NSX-T, logical routing provides an optimised and scalable way of handling east-west and north-south traffic. Logical routing is used in many ways, including:
  • Support for single-tenant or multi-tenant deployment models, such as containerised workloads and multi-cloud environments.
  • Separation of tenants and networks.
  • Optimised routing paths and centralised services in the data centre.
  • Extension of logical networks to physical environments.
The distributed routing architecture provides optimal routing paths because routing is done closest to the source. For example, traffic between two VMs on different subnets residing on the same host can be routed in the kernel. The traffic does not need to leave the host to get routed, which helps avoid hairpinning.
Logical routing is distributed and decoupled from the underlying hardware. Basic forwarding decisions are made locally on the prepared transport nodes.
North-south routing enables tenants to access public networks. The traffic enters or leaves the tenant administrative domain.
East-west traffic flows between various networks within the same tenant. The traffic is sent between logical networks such as the logical switches under the same administrative domain.
There are two types of gateways in NSX-T that provide a tiered architecture: Tier-0 and Tier-1. Although networking can work using only a Tier-0 gateway, adding a Tier-1 gateway has the following benefits:
  • Supports tenant isolation
  • Includes separate controls for different administrative domains
  • Eliminates physical dependency when new tenants are introduced

Prerequisites for Logical Routing

For logical routing to work, the following requirements must be met:
  • The NSX management cluster must be formed and available.
  • Transport zones and N-VDS/VDS should be created.
  • Hypervisors must be prepared as NSX-T transport nodes and added to the management plane.
  • Transport nodes must be attached to the appropriate transport zones.
  • An N-VDS/VDS instance must be created on each transport node.
  • The NSX Edge nodes must be deployed and preconfigured according to the requirements.

Tier-0 Gateways

Gateways are distributed across the kernel of each host. A gateway can be deployed as either a Tier-0 or a Tier-1 gateway:
  • Tier-0 gateways provide north-south connectivity.
  • Tier-1 gateways provide east-west connectivity.
Both Tier-0 and Tier-1 gateways support stateful services, such as NAT. Stateful services are centralised on Edge nodes.
  • A Tier-0 gateway performs the functions of a Tier-0 logical router. It processes traffic between the logical and physical networks.
  • Handles north-south traffic. It can also route east-west traffic if only a Tier-0 gateway is used, without a Tier-1 gateway.
  • An Edge node can support only one Tier-0 gateway or logical router. When you create a Tier-0 gateway or logical router, make sure you do not create more Tier-0 gateways or logical routers than the number of Edge nodes in the NSX Edge cluster.
  • A Tier-0 gateway is usually configured by the provider.
  • An Edge cluster must be deployed.
  • A Tier-0 gateway has downlink connections to Tier-1 gateways and uplink connections to physical networks.
  • A Tier-0 gateway runs BGP and peers with physical routers.
  • Can have the following interfaces:
    • External Interface
      • Connected to the physical infrastructure or router. Static routing and BGP are supported on this interface. (This was referred to as an uplink in earlier releases.)
    • Service Interface
      • Interface for connecting VLAN segments to provide connectivity to VLAN-backed physical or virtual workloads. A service interface can also be connected to overlay segments for Tier-1 standalone load balancer use cases.
    • Intra-Tier Transit Link
      • Internal link between the distributed router (DR) and the service router (SR). A transit overlay segment is automatically plumbed between the DR and the SR, and each gets an IP address assigned from a default subnet.
    • Linked Segments
      • Interface for connecting to an overlay segment. This interface was referred to as a downlink interface in previous releases.
  • Supports equal-cost multipath (ECMP) routing to upstream physical gateways.

Tier-1 Gateways

  • A Tier-1 gateway has downlink connections to segments and uplink connections to Tier-0 gateways.
  • You can configure route advertisements and static routes on a Tier-1 gateway. Recursive static routes are supported.
  • A Tier-1 gateway is typically connected to a Tier-0 gateway in the northbound direction and to segments in the southbound direction.
  • A Tier-1 gateway cannot access the outside world without a Tier-0 gateway.
  • Load balancing can only be configured on Tier-1 gateways.
  • Used for east-west traffic.
  • Downlinks connect segments to the gateway.
  • It is recommended not to add the Tier-1 gateway to an Edge cluster unless it needs centralised services. Doing so automatically creates an SR on the Edge cluster for the Tier-1 gateway, which might lead to sub-optimal routing and void any ECMP that would otherwise be available on the Tier-0 gateway. Leaving this option out allows ECMP on the Tier-0 gateway.
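The points above can be sketched as a request body for a Tier-1 gateway that is linked northbound to a Tier-0 gateway and has no Edge cluster attached. This is only an illustration: the field names (tier0_path, route_advertisement_types, TIER1_CONNECTED) are assumptions based on the NSX-T Policy API and should be checked against the API reference for your release, and the gateway names are hypothetical. No request is sent.

```python
import json
from typing import Dict, List


def build_tier1_payload(display_name: str, tier0_id: str,
                        advertise_connected: bool = True) -> Dict[str, object]:
    """Build a body for a Tier-1 gateway linked to a Tier-0 gateway.

    Field names are assumptions based on the NSX-T Policy API; verify them
    against the API reference for your release before use.
    """
    advertisements: List[str] = []
    if advertise_connected:
        # Advertise the subnets of connected segments to the Tier-0 gateway.
        advertisements.append("TIER1_CONNECTED")
    return {
        "display_name": display_name,
        # Northbound link to the Tier-0 gateway.
        "tier0_path": f"/infra/tier-0s/{tier0_id}",
        "route_advertisement_types": advertisements,
        # No Edge cluster is referenced here, so no SR would be created
        # for this Tier-1 gateway (the recommendation in the list above).
    }


# Hypothetical names, used only for illustration.
payload = build_tier1_payload("tenant-a-t1", "provider-t0")
print(json.dumps(payload, indent=2))
```

Leaving the Edge cluster out of the body mirrors the recommendation above; a gateway that needs NAT or load balancing would need one.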

Single-Tier Topology

In a single-tier deployment, network segments are connected directly to the Tier-0 gateway.

Two-Tier Routing

The concept of multi-tenancy is built into the routing model. The top-tier gateway is referred to as the Tier-0 gateway, whilst the bottom-tier gateway is the Tier-1 gateway. This structure gives both provider and tenant administrators complete control over their services and policies.
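To make the difference between the single-tier and two-tier topologies concrete, the following minimal sketch models gateways as linked objects and walks the northbound path from the gateway a segment is attached to. The class, function, and gateway names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gateway:
    name: str
    tier: int                            # 0 for Tier-0, 1 for Tier-1
    uplink: Optional["Gateway"] = None   # a Tier-1 links up to a Tier-0


def northbound_path(first_hop: Gateway) -> List[str]:
    """Return the gateways crossed on the way to the physical network."""
    path: List[str] = []
    gateway: Optional[Gateway] = first_hop
    while gateway is not None:
        path.append(gateway.name)
        gateway = gateway.uplink
    path.append("physical network")
    return path


provider_t0 = Gateway("provider-t0", tier=0)
tenant_t1 = Gateway("tenant-a-t1", tier=1, uplink=provider_t0)

# Single-tier: the segment is attached directly to the Tier-0 gateway.
print(northbound_path(provider_t0))
# Two-tier: the segment is attached to a tenant Tier-1 gateway.
print(northbound_path(tenant_t1))
```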

Gateway Components

Distributed Router (DR)

A DR is always created when you create a gateway.
  • Provides distributed east-west routing functionality
  • Provides basic packet-forwarding functionality
  • Spans all transport nodes (hypervisors and Edge nodes)
  • First-hop routing is performed on the hypervisors

Service Router (SR)

An SR is automatically created on the Edge node when you configure the gateway with an Edge cluster.
  • Provides north-south routing functionality.
  • Provides routing and centralised services, such as NAT and load balancing.
  • Created only on NSX Edge nodes that are part of an Edge cluster, not on hypervisors.
  • Required if the Tier-0 gateway is configured with uplinks.
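The relationship between the DR, the SR, and the Edge cluster can be summarised in a few lines. This is a sketch of the rules stated in this section, not of how NSX-T implements them; the function names are hypothetical.

```python
from typing import List


def gateway_components(has_edge_cluster: bool) -> List[str]:
    """A DR always exists; an SR is added once an Edge cluster is configured."""
    components = ["DR"]
    if has_edge_cluster:
        components.append("SR")
    return components


def needs_edge_cluster(has_uplinks: bool, uses_stateful_services: bool) -> bool:
    """Uplinks and centralised (stateful) services both require an SR."""
    return has_uplinks or uses_stateful_services


print(gateway_components(has_edge_cluster=False))
print(gateway_components(has_edge_cluster=True))
print(needs_edge_cluster(has_uplinks=True, uses_stateful_services=False))
```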

Edge Nodes and Edge Clusters

Edge nodes run services that cannot be distributed to hypervisors.
  • An Edge node can be deployed as a bare-metal server or as a virtual machine.
  • An Edge node can only be part of one cluster.
  • An Edge node can host only one Tier-0 gateway.
An Edge cluster is a group of Edge nodes.
  • Allows for resilience.
  • Compulsory if you need to create a Tier-0 with uplinks.
  • Must exist if you want to create a Tier-0 or a Tier-1 gateway with stateful services, such as NAT, load balancing or DHCP.
  • A cluster can contain a minimum of one and a maximum of 10 Edge nodes.
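The Edge cluster constraints above can be expressed as a small validation sketch. The class and method names are hypothetical; the limits (one cluster per node, at most 10 nodes per cluster, at most one Tier-0 gateway per Edge node) are the ones stated in these notes.

```python
from typing import Dict, List

MAX_EDGE_NODES_PER_CLUSTER = 10


class EdgeInventory:
    """Tracks which Edge node belongs to which Edge cluster."""

    def __init__(self) -> None:
        self._cluster_of_node: Dict[str, str] = {}
        self._nodes_of_cluster: Dict[str, List[str]] = {}

    def add_node(self, cluster: str, node: str) -> None:
        # An Edge node can only be part of one cluster.
        if node in self._cluster_of_node:
            raise ValueError(f"{node} already belongs to {self._cluster_of_node[node]}")
        members = self._nodes_of_cluster.setdefault(cluster, [])
        # A cluster holds at most 10 Edge nodes.
        if len(members) >= MAX_EDGE_NODES_PER_CLUSTER:
            raise ValueError(f"{cluster} is full")
        members.append(node)
        self._cluster_of_node[node] = cluster

    def max_tier0_gateways(self, cluster: str) -> int:
        # One Tier-0 gateway per Edge node, so the node count is the cap.
        return len(self._nodes_of_cluster.get(cluster, []))


inventory = EdgeInventory()
inventory.add_node("edge-cluster-1", "edge-01")
inventory.add_node("edge-cluster-1", "edge-02")
print(inventory.max_tier0_gateways("edge-cluster-1"))
```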

Edge node VM sizing

Small  –  4GB memory  –  2 vCPU  –  200GB Disk space
Medium  –  8GB memory  –  4 vCPU  –  200GB Disk space
Large  –  32GB memory  –  8 vCPU  –  200GB Disk space
Extra large  –  64GB memory  –  16 vCPU  –  200GB Disk space
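
As a small worked example, the sizing figures above can be kept in a lookup table and used to pick the smallest form factor that meets a requirement. The figures are copied from the list above; the function name and the example requirement are hypothetical.

```python
from typing import List, NamedTuple, Optional


class FormFactor(NamedTuple):
    name: str
    memory_gb: int
    vcpus: int
    disk_gb: int


# Ordered from smallest to largest, using the values listed above.
FORM_FACTORS: List[FormFactor] = [
    FormFactor("Small", 4, 2, 200),
    FormFactor("Medium", 8, 4, 200),
    FormFactor("Large", 32, 8, 200),
    FormFactor("Extra large", 64, 16, 200),
]


def smallest_form_factor(min_memory_gb: int, min_vcpus: int) -> Optional[FormFactor]:
    """Return the first form factor that satisfies both minimums, if any."""
    for form_factor in FORM_FACTORS:
        if form_factor.memory_gb >= min_memory_gb and form_factor.vcpus >= min_vcpus:
            return form_factor
    return None


print(smallest_form_factor(min_memory_gb=16, min_vcpus=4))
```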


Equal Cost Multipath Routing (ECMP)
  • Increases the north-south communication bandwidth by combining multiple uplinks.
  • Provides fault tolerance for any failed paths.
  • A maximum of eight ECMP paths is supported.
  • Hashing is based on the 2-tuple of source and destination IP addresses.
  • ECMP routing is only available on Tier-0 gateways.
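
The 2-tuple hashing described above can be illustrated as follows. This is a sketch only: the hash function used here (SHA-256 over the packed addresses) is an assumption chosen to keep the example deterministic and is not the hash NSX-T uses. The uplink names are hypothetical.

```python
import hashlib
import ipaddress
from typing import List

MAX_ECMP_PATHS = 8


def pick_path(src_ip: str, dst_ip: str, paths: List[str]) -> str:
    """Choose one of the equal-cost paths from the source/destination IP pair."""
    if not paths:
        raise ValueError("at least one path is required")
    if len(paths) > MAX_ECMP_PATHS:
        raise ValueError("a maximum of eight ECMP paths is supported")
    # 2-tuple key: only the source and destination addresses are hashed.
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


uplinks = ["uplink-1", "uplink-2", "uplink-3", "uplink-4"]
# The same address pair always maps to the same uplink.
print(pick_path("10.1.1.10", "203.0.113.5", uplinks))
print(pick_path("10.1.1.10", "203.0.113.5", uplinks))
```

Because only the two addresses are hashed, every flow between the same pair of hosts follows the same path, whatever the ports or protocol.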


Multicast

Unicast – one to one
Broadcast – one to all
Multicast – one to many (only to those who want the traffic)
Multicast avoids unnecessary broadcasting of traffic.
Multicast only runs on Tier-0 gateways, and multicast must already be configured in the physical network. NSX-T does not set up a multicast network itself; it is there to consume the multicast that is already on the network.
Multicast protocols include:
  • IGMP
    • Internet Group Management Protocol
      • A protocol used to establish multicast group memberships between hosts and adjacent routers on the same network segment
  • IGMPv2
    • IGMP version 2 enables hosts to signal leaving a multicast group
  • IGMP Snooping
    • This mechanism is implemented in layer 2 devices to maintain tables of the clients that have asked to join multicast groups, so that multicast traffic is not broadcast to all links.
  • Protocol-Independent Multicast (PIM)
    • Layer 3 routing protocol to route multicast traffic between different networks.
  • PIM Sparse Mode (PIM-SM)
    • A type of PIM protocol where multicast traffic is not forwarded until a request from a downstream router is received. PIM-SM uses the Rendezvous Point (RP) as a meeting point for sources and receivers of multicast data.
    • Rendezvous Points exist only in the physical network, not in NSX.
  • PIM Bootstrap
    • A protocol that ensures that all routers in the PIM domain have the same RP configured without requiring manual configuration.
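The IGMP snooping behaviour described in the list above can be sketched as a per-group membership table: ports are added when a client joins a group, removed when it leaves (as IGMPv2 allows), and multicast traffic is sent only to the ports that joined. The class, port names, and group address are hypothetical, and real switches also age entries out, which is omitted here.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class SnoopingTable:
    """Maps each multicast group to the ports that asked to receive it."""

    def __init__(self) -> None:
        self._members: DefaultDict[str, Set[str]] = defaultdict(set)

    def join(self, group: str, port: str) -> None:
        self._members[group].add(port)

    def leave(self, group: str, port: str) -> None:
        ports = self._members.get(group)
        if ports is None:
            return
        ports.discard(port)
        if not ports:
            # Nobody is listening any more, so forget the group.
            del self._members[group]

    def egress_ports(self, group: str) -> Set[str]:
        # Traffic goes only to interested ports, never to all links.
        return set(self._members.get(group, set()))


table = SnoopingTable()
table.join("239.1.1.1", "port-1")
table.join("239.1.1.1", "port-3")
table.leave("239.1.1.1", "port-1")
print(table.egress_ports("239.1.1.1"))
```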
Multicast limitations:
  • Multicast is not supported on KVM.
  • Cannot be configured on Tier-1 gateways or VRF gateways.
  • Can only be activated in active/standby mode.
  • Not supported on layer 2 bridges.
  • Tier-0 gateways cannot be configured as the RP or BSR (bootstrap router).
  • PIM can only be enabled on one uplink per SR.

Unicast traffic

The N-VDS maintains a MAC address table for each segment/logical switch it is attached to. A MAC address can be associated either with the virtual NIC of a locally attached VM or with a remote TEP, when the MAC address is located on a remote transport node that is reached via the tunnel identified by that TEP.
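
The per-segment table described above can be sketched as a mapping from MAC address to either a local virtual NIC or a remote TEP. All class names, port identifiers, and addresses below are hypothetical, and the handling of unknown MAC addresses is deliberately left out because these notes do not describe it.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class LocalVnic:
    port_id: str   # virtual NIC of a VM attached to this transport node


@dataclass(frozen=True)
class RemoteTep:
    tep_ip: str    # tunnel endpoint of the remote transport node


Entry = Union[LocalVnic, RemoteTep]


class SegmentMacTable:
    """One table per segment/logical switch."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def learn(self, mac: str, entry: Entry) -> None:
        self._entries[mac.lower()] = entry

    def lookup(self, mac: str) -> Optional[Entry]:
        return self._entries.get(mac.lower())


def describe(entry: Optional[Entry]) -> str:
    if isinstance(entry, LocalVnic):
        return f"deliver locally on {entry.port_id}"
    if isinstance(entry, RemoteTep):
        return f"encapsulate and tunnel to TEP {entry.tep_ip}"
    return "MAC address not in table"


web_segment = SegmentMacTable()
web_segment.learn("00:50:56:AA:BB:01", LocalVnic("vnic-web-01"))
web_segment.learn("00:50:56:AA:BB:02", RemoteTep("172.16.10.12"))
print(describe(web_segment.lookup("00:50:56:aa:bb:01")))
print(describe(web_segment.lookup("00:50:56:aa:bb:02")))
```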


BGP

BGP is enabled by default, but you must set the local AS and configure neighbours.
Configure BGP on the Tier-0 gateway by adding the neighbouring physical routers that it should communicate with: which routers are neighbours, and what their autonomous system numbers (ASNs) are.
If a neighbour's ASN is different from that of the NSX Tier-0 gateway, the session is considered external BGP (eBGP); if the ASN is the same, it is considered internal BGP (iBGP).
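
The internal/external distinction comes down to comparing two AS numbers, as this short sketch shows. The function name, the neighbour addresses, and the private ASNs used here are hypothetical examples.

```python
from typing import Dict


def bgp_session_type(local_as: int, neighbour_as: int) -> str:
    """Same AS on both sides means iBGP; different AS numbers mean eBGP."""
    return "iBGP" if local_as == neighbour_as else "eBGP"


LOCAL_AS = 65001  # hypothetical AS of the Tier-0 gateway

# Hypothetical neighbours: IP address of the physical router -> its AS number.
neighbours: Dict[str, int] = {
    "192.168.100.1": 65000,
    "192.168.101.1": 65001,
}

for address, remote_as in neighbours.items():
    print(address, bgp_session_type(LOCAL_AS, remote_as))
```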
Advanced BGP

Inter-SR routing increases resiliency by avoiding a traffic black hole when only a single uplink is faulty. It is only available if the Tier-0 gateway is active/active. For example, SR1 can hand traffic to SR2, which continues to route it.
Bidirectional Forwarding Detection (BFD) is an end-to-end protocol that detects failures in forwarding paths.
  • Fast detection of a failed node (Edge or physical) or an uplink failure.
  • Protects both static routes and BGP peers.

Autonomous System (AS)

An AS is also referred to as a routing domain. An autonomous system is assigned a globally unique number, called an Autonomous System Number (ASN). On the internet, an AS is the unit of routing policy: either a single network or a group of networks controlled by a common network administrator on behalf of a single administrative entity, such as a university, ISP or business enterprise.
Networks within an AS communicate routing information to each other using an Interior Gateway Protocol (IGP). An AS shares routing information with other autonomous systems using BGP.

Allow AS-In

Allow AS-In tells BGP to accept a route that contains its own ASN, rather than ignoring it. By default, BGP drops received routes whose AS path contains the local ASN, which helps prevent routing loops, so Allow AS-In is not enabled by default. A typical use case is a customer with two sites that use the same ASN and are interconnected through the same ISP: routes received from the BGP peer can then contain the local ASN, and the BGP Allow AS-In configuration can be used to accept those routes.
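
The loop check and the effect of Allow AS-In can be sketched as a single decision function. This is a simplification under stated assumptions: real implementations may also limit how many times the local ASN is allowed to appear, and the function name and AS numbers below are hypothetical.

```python
from typing import Sequence


def accept_route(as_path: Sequence[int], local_as: int, allow_as_in: bool = False) -> bool:
    """Decide whether a received route passes the AS path loop check."""
    if local_as in as_path:
        # The route has already been through our own AS: normally a loop,
        # so it is dropped unless Allow AS-In is configured.
        return allow_as_in
    return True


# Two customer sites share AS 65010 and connect through an ISP in AS 64500.
# Site B receives site A's prefix with the AS path [64500, 65010].
path_from_site_a = [64500, 65010]

print(accept_route(path_from_site_a, local_as=65010))                    # default behaviour
print(accept_route(path_from_site_a, local_as=65010, allow_as_in=True))  # Allow AS-In enabled
```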

IP Prefix list

IP prefix lists are used to filter routes, both those advertised into BGP and those learned from BGP.
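
A prefix list can be thought of as an ordered set of permit and deny entries that each route is checked against. The sketch below assumes conventional prefix-list semantics (first match wins, optional ge/le bounds on the prefix length, implicit deny when nothing matches); these semantics, the class name, and the example prefixes are assumptions for illustration and are not taken from NSX-T documentation.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PrefixEntry:
    network: str
    action: str                 # "PERMIT" or "DENY"
    ge: Optional[int] = None    # minimum prefix length, if set
    le: Optional[int] = None    # maximum prefix length, if set

    def matches(self, route: str) -> bool:
        entry_net = ipaddress.ip_network(self.network)
        candidate = ipaddress.ip_network(route)
        if candidate.version != entry_net.version:
            return False
        if not candidate.subnet_of(entry_net):
            return False
        if self.ge is None and self.le is None:
            # Without bounds, only the exact prefix matches.
            return candidate.prefixlen == entry_net.prefixlen
        low = self.ge if self.ge is not None else entry_net.prefixlen
        high = self.le if self.le is not None else candidate.max_prefixlen
        return low <= candidate.prefixlen <= high


def permits(route: str, entries: List[PrefixEntry]) -> bool:
    """First matching entry decides; no match means the route is filtered."""
    for entry in entries:
        if entry.matches(route):
            return entry.action == "PERMIT"
    return False


prefix_list = [
    PrefixEntry("10.1.0.0/16", "DENY", le=32),
    PrefixEntry("10.0.0.0/8", "PERMIT", le=24),
]
for route in ["10.1.2.0/24", "10.2.0.0/16", "192.168.0.0/24"]:
    print(route, "permitted" if permits(route, prefix_list) else "filtered")
```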

VRF Lite

VRF Lite is new in NSX-T 3.0. It is used to subdivide a Tier-0 gateway into smaller VRF gateways, which minimises the number of Tier-0 gateways and Edge nodes required. It is useful for separating tenants without deploying additional Tier-0 gateways and Edge nodes.
Virtual Routing and Forwarding (VRF) routing technology allows the coexistence of multiple routing instances in one routing device. VLAN tagging is used to separate the VRFs in the uplink segment that connects with the external devices.
VRF has a couple of limitations: VPN and load balancers are not compatible with VRF gateways.
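
The role of VLAN tagging in separating VRFs on a shared uplink can be sketched as a one-to-one mapping between VRF and VLAN ID. The class, the tenant names, and the VLAN IDs are hypothetical; the sketch only shows that each VRF needs its own tag so that traffic arriving from the external device can be matched to the right routing instance.

```python
from typing import Dict, Optional


class VrfUplink:
    """Maps each VRF on a shared uplink segment to its own VLAN ID."""

    def __init__(self) -> None:
        self._vlan_of_vrf: Dict[str, int] = {}

    def add_vrf(self, vrf_name: str, vlan_id: int) -> None:
        if not 1 <= vlan_id <= 4094:
            raise ValueError("VLAN ID must be between 1 and 4094")
        if vlan_id in self._vlan_of_vrf.values():
            raise ValueError(f"VLAN {vlan_id} is already used by another VRF")
        if vrf_name in self._vlan_of_vrf:
            raise ValueError(f"{vrf_name} is already configured")
        self._vlan_of_vrf[vrf_name] = vlan_id

    def vrf_for_vlan(self, vlan_id: int) -> Optional[str]:
        # Incoming tagged traffic is matched to the VRF that owns the tag.
        for vrf_name, vlan in self._vlan_of_vrf.items():
            if vlan == vlan_id:
                return vrf_name
        return None


uplink = VrfUplink()
uplink.add_vrf("tenant-a", 101)
uplink.add_vrf("tenant-b", 102)
print(uplink.vrf_for_vlan(102))
```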