Thursday, July 9, 2015

What CSPs want from Neutron! (Part 1: DVR)


The OpenStack Neutron project provides an API abstraction for managing network elements in a cloud environment. CSPs (communication service providers) such as AT&T and Verizon have shown interest in deploying a Telco Cloud based on OpenStack orchestration. They have certain key requirements that Neutron must meet to deliver carrier-grade performance and scale.

The key requirements are:

1)      DVR (distributed virtual router)

2)      Dynamic Routing

3)      VLAN trunking

DVR (distributed virtual router)

To understand DVR, we need to understand:  

-        Source NAT: Network Address Translation (NAT) is an Internet standard that allows hosts on a local area network to use one set of IP addresses for internal communication and another set for external communication. A LAN that uses NAT is referred to as a natted network. Source NAT (SNAT) is performed on packets that originate from a natted network: the NAT router replaces the private source address of an IP packet with a public IP address as the packet passes through the router, and applies the reverse operation to packets travelling in the other direction. In this way the network administrator hides the private source IP address before the packet enters the public network.

 
-        Destination NAT: Destination NAT (DNAT) is performed on packets that are destined for the natted network. A NAT router performing destination NAT replaces the destination IP address of an IP packet as it travels through the router towards the private network. In this way the network administrator keeps the real (private) destination IP address hidden from hosts on the public network.

The IP packet format is shown below. When a firewall hides the source IP address, it is called SNAT; when it hides the destination IP address, it is called DNAT.
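
The two rewrites can be sketched in a few lines of Python. This is only a toy model of the address fields: the `Packet` class, the function names, and the example addresses are made up for illustration, and a real NAT router also tracks ports and connection state.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str  # source IP address
    dst: str  # destination IP address


def snat(pkt: Packet, public_ip: str) -> Packet:
    """Outbound: hide the private source behind the router's public IP."""
    return replace(pkt, src=public_ip)


def dnat(pkt: Packet, private_ip: str) -> Packet:
    """Inbound: rewrite the public destination to the private host."""
    return replace(pkt, dst=private_ip)


# Example addresses are made up (private and documentation ranges).
outbound = snat(Packet(src="10.0.0.5", dst="198.51.100.7"), "203.0.113.10")
inbound = dnat(Packet(src="198.51.100.7", dst="203.0.113.10"), "10.0.0.5")
print(outbound)
print(inbound)
```

Note that SNAT touches only the source field and DNAT only the destination field; the other field passes through unchanged.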

-        Floating IP: Floating IPs are publicly routable IP addresses that you typically buy from an ISP (like the public address on the NAT firewall described above). Users can allocate them to their instances, making the instances reachable from the outside world. Floating IPs are not allocated to instances by default. If an instance dies for some reason, the user does not lose the floating IP; it remains the user's own resource, ready to be attached to another instance. The router performs destination NAT (DNAT) to rewrite packets from the floating IP address (chosen from a subnet on the external network) to the internal fixed IP address (chosen from a private subnet behind the router).
-        East-West Traffic: East-west traffic consists primarily of communication between applications hosted on physical and virtual machines, and of VM-to-VM interactions within the data center (DC). North-south traffic consists primarily of traffic that enters and exits the DC, and generally includes queries, commands, and specific data being retrieved or stored.
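
As a rough illustration of the east-west / north-south distinction, the sketch below classifies a flow by checking whether both endpoints sit inside the data center's address space. The `DC_NETWORKS` range and the function names are assumptions made up for this example.

```python
import ipaddress

# Hypothetical data-center address space, chosen only for this example.
DC_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def in_dc(addr: str) -> bool:
    """True if the address belongs to one of the data-center networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in DC_NETWORKS)


def classify(src: str, dst: str) -> str:
    """East-west if both endpoints are inside the DC, otherwise north-south."""
    return "east-west" if in_dc(src) and in_dc(dst) else "north-south"


print(classify("10.1.0.4", "10.2.0.9"))      # VM to VM inside the DC
print(classify("10.1.0.4", "198.51.100.7"))  # VM to an outside host
```
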
Problem Statement
Today, Neutron L3 routers are deployed on specific nodes (Network Nodes) through which all compute traffic flows. This leads to the following bottlenecks:
-        East West Traffic
Traffic between VMs that belong to the same tenant and the same subnet is switched at Layer 2 by the hypervisor's native L2 agent, but traffic to a different subnet has to reach the Network Node to be routed between the subnets, because the L2 agent cannot route on Layer 3 IP addresses. Hence traffic between subnets has to be forwarded to the Network Node, where the L3 agent resides, even if the destination VM is on the same physical server. This hurts performance.
-        North South Traffic
As mentioned earlier, floating IPs are publicly routable IP addresses that are mapped to private IP addresses. Today, floating IP (DNAT) translation is done at the Network Node, and the external network gateway port is available only there. So north-south traffic, i.e. traffic from the VMs intended for the external network, has to go through the Network Node. The Network Node therefore becomes a single point of failure (SPOF) and carries a heavy traffic load, which affects performance and scalability.
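
The floating IP mapping that the Network Node holds today can be pictured as a small lookup table. The sketch below is a simplified model, not Neutron code: the `FloatingIpTable` class and its methods are hypothetical names, and the addresses are made up.

```python
class FloatingIpTable:
    """Toy model of floating IP allocation and DNAT lookup."""

    def __init__(self, pool):
        self.free_pool = list(pool)  # unallocated public addresses
        self.owned = set()           # floating IPs allocated to the user
        self.bindings = {}           # floating IP -> fixed (private) IP

    def allocate(self):
        """Hand one public address to the user; it is not bound yet."""
        fip = self.free_pool.pop(0)
        self.owned.add(fip)
        return fip

    def associate(self, fip, fixed_ip):
        """Bind an owned floating IP to an instance's fixed IP."""
        if fip not in self.owned:
            raise KeyError(f"{fip} is not allocated")
        self.bindings[fip] = fixed_ip

    def disassociate(self, fip):
        """Unbind the floating IP; the user still owns it."""
        self.bindings.pop(fip, None)

    def dnat(self, dst):
        """Translate an inbound destination; unbound addresses pass through."""
        return self.bindings.get(dst, dst)


table = FloatingIpTable(["203.0.113.10", "203.0.113.11"])
fip = table.allocate()
table.associate(fip, "10.0.0.5")
print(table.dnat(fip))            # bound: translated to the fixed IP
table.disassociate(fip)           # e.g. the instance died
table.associate(fip, "10.0.0.6")  # same floating IP, new instance
print(table.dnat(fip))
```

In the centralized design, every lookup in this table happens on the Network Node, which is why all north-south traffic has to pass through it.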
 
Solution
L3 agents with DNAT functionality and the floating IP namespace should be part of the Compute Node. The Distributed Virtual Router (DVR) implements the L3 agents across the Compute Nodes, so that a tenant's inter-VM communication (east-west traffic) occurs without hitting the Network Node. Neutron DVR also implements the floating IP namespace on every Compute Node where the VMs are located, so VMs with floating IPs can forward traffic to the external network without going through the Network Node (north-south routing).
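
The difference in forwarding paths can be sketched as a small function that lists the nodes a routed packet visits. This is a simplified model with hypothetical node names. It also assumes that VMs without a floating IP still reach the external network through the Network Node, which this post does not cover.

```python
def forwarding_path(src_host, dst_host=None, *, dvr, has_floating_ip=True):
    """Return the nodes visited by a routed packet.

    dst_host=None means the destination is on the external network.
    East-west cases assume source and destination are on different subnets.
    """
    if dst_host is None:  # north-south
        if dvr and has_floating_ip:
            return [src_host, "external"]
        # Assumption: without DVR, or without a floating IP, traffic
        # still goes through the Network Node.
        return [src_host, "network-node", "external"]
    if dvr:  # east-west, routed locally on the compute node
        return [src_host] if src_host == dst_host else [src_host, dst_host]
    # East-west without DVR: hairpin through the Network Node.
    return [src_host, "network-node", dst_host]


# Two VMs on the same server but on different subnets:
print(forwarding_path("compute-1", "compute-1", dvr=False))
print(forwarding_path("compute-1", "compute-1", dvr=True))
# A VM with a floating IP talking to the outside world:
print(forwarding_path("compute-1", dvr=False))
print(forwarding_path("compute-1", dvr=True))
```
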
See figure below:
 
Solution Implementation
-        The current L3 agents (used alongside the ML2 plugin) should run on each and every Compute Node. The existing L3 agent needs to be made DVR-aware: the enhanced L3 agent should work in both “centralized” mode (on the existing Network Node) and “dvr” mode (on a Compute Node).
-        An enhanced L2 agent, e.g. an L3 plugin for Open vSwitch. Open vSwitch should interface with the L3 plugin to acquire routing capabilities.
-        An enhanced Neutron REST API for DVR.
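
To make the agent modes and the API change a little more concrete, here is a hedged sketch that renders a minimal L3 agent configuration and builds a router-create request body. The `agent_mode` option (with values `legacy`, `dvr`, and `dvr_snat`) and the `distributed` router attribute are not named in this post. They are how we understand Neutron to expose DVR, so verify them against the documentation for your release. The helper function names are made up.

```python
import configparser
import io
import json

# Assumed agent_mode values; check them against your Neutron release.
AGENT_MODES = ("legacy", "dvr", "dvr_snat")


def l3_agent_config(agent_mode):
    """Render a minimal l3_agent.ini for the given agent mode."""
    if agent_mode not in AGENT_MODES:
        raise ValueError(f"unknown agent_mode: {agent_mode}")
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = {"agent_mode": agent_mode}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def create_router_body(name, distributed=True):
    """Build the body for a router-create request (assumed attribute name)."""
    return {"router": {"name": name, "distributed": distributed}}


print(l3_agent_config("dvr"))  # for a compute node
print(json.dumps(create_router_body("tenant-router")))
```
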
(VLAN trunking and BGP routing will be explained in upcoming posts.)