Monday, July 10, 2017

Telco Cloud Capacity Planning (Flavors & Host Aggregates)

Telco Cloud capacity planning is a complex task. It involves designing capacity for various types of workloads, each with specific requirements from the underlying hardware. Capacity design also includes distributing the datacenter server footprint (the number of physical servers/hosts a single datacenter can house, given its space limitations).

VM placement strategy is a key item in Cloud capacity design.

VM placement framework 
Cloud orchestration systems use VM placement strategies and tools to place each VM on the correct physical host. In the Telco Cloud context, CSPs want their VMs placed on physical hosts that have enough capability to offer Telco-grade performance and resiliency.
The Cloud should support various workloads with different resource (CPU, RAM, disk) and performance (latency, throughput) requirements.

When a user instantiates a VM, the Openstack Nova scheduler finds a physical host that meets the user's workload requirements. To achieve precise VM placement, Openstack Nova has a detailed filter framework (see https://docs.openstack.org/mitaka/config-reference/compute/scheduler.html).
Based on filtering and weighing (prioritization of filters), the Nova scheduler selects a physical host with the capabilities the Telco application needs.
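As a sketch of how that framework is wired up, the active filter list lives in nova.conf; the option names below follow the Mitaka-era documentation linked above and vary by release, so treat this as illustrative rather than authoritative:

```ini
# nova.conf excerpt (Mitaka-era option names; check your release's docs)
[DEFAULT]
scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter
```

The NUMA and host-aggregate filters at the end of the list are the ones that matter most for the Telco placement needs discussed below.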

Telco workloads have mainly three types of VM placement needs:
1) Resource Needs:
CPU, Memory & Disk: these are the most basic requirements; the selected host should have enough of each resource available to offer.
2) Deterministic Performance Needs:
Media (voice/video) processing VMs are latency sensitive. Such VMs should be hosted on physical servers with CPU pinning, NUMA awareness and Huge Pages support.
  •  CPU pinning: In a hyper-threaded architecture, each physical CPU core has two threads. In a Cloud environment, each thread can be shared among 2, 4 or more virtual CPUs, i.e. an oversubscription ratio of 2:1 or 4:1.
For Telco workloads, to avoid latency and to meet high CPU processing requirements, the user wants to make sure that each virtual CPU is pinned to a physical CPU, i.e. an oversubscription ratio of 1:1 (shown in the figure below).

 
This mapping table between virtual CPUs and physical CPUs ensures that packets from a vCPU are always processed on a particular physical CPU. Without CPU pinning, the hypervisor selects any available CPU, increasing packet processing latency. Pinning reduces latency, ensures the Telco workload gets enough CPU processing power, and mitigates the noisy-neighbour problem. There are various options to implement CPU pinning, as described at https://networkbuilders.intel.com/docs/CPU_Pinning_With_Openstack_nova.pdf.
  •  NUMA awareness: Modern multi-socket x86 systems use a shared memory architecture that describes the placement of the main memory modules with respect to the processors. In a NUMA-based system, each processor has its own local memory controller that it can access directly, with a distinct performance advantage. It is therefore advisable to select NUMA-aligned physical CPUs for the virtual CPUs: in the CPU pinning mapping table, the designer has to make sure that the mapped physical CPUs and the VM's memory sit on the same NUMA node. NUMA topology is a hardware feature, and during placement the Nova scheduler has to select a NUMA-aligned physical host; NUMA awareness is one of the Nova scheduler filters. (Ref: http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/ )
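In Nova, both pinning and NUMA alignment are requested through flavor extra specs, which the corresponding scheduler filters then enforce. A minimal sketch, assuming an illustrative flavor named m1.telco (pinned hosts are normally also grouped into their own host aggregate, as discussed later):

```shell
# 1:1 vCPU-to-pCPU pinning (no oversubscription)
openstack flavor set m1.telco --property hw:cpu_policy=dedicated
# Confine the guest's vCPUs and memory to a single host NUMA node
openstack flavor set m1.telco --property hw:numa_nodes=1
# Huge pages are allocated from per-NUMA-node pools, so they pair naturally
openstack flavor set m1.telco --property hw:mem_page_size=large
```

Any VM booted from this flavor will then only land on hosts where the NUMATopologyFilter can satisfy all three properties.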


3) Throughput Needs:
  •  Telco workloads, specifically Media VMs (VMs which carry live voice/video traffic), may have high throughput requirements such as 3 Mpps. Technologies such as DPDK & SR-IOV are able to meet these requirements (details: http://www.metaswitch.com/the-switch/accelerating-the-nfv-data-plane ). DPDK & SR-IOV are hardware features, and the Nova scheduler has to select a DPDK/SR-IOV-enabled host when required by tenants.
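SR-IOV, in particular, is requested per Neutron port rather than per flavor. A hedged sketch, with the network, image, flavor and server names purely illustrative:

```shell
# Create a port backed by an SR-IOV virtual function
openstack port create --network provider-net --vnic-type direct sriov-port
# Boot the VM on that port; the PCI passthrough filter steers it to a host with free VFs
openstack server create --flavor media-flavor --image media-image \
  --nic port-id=sriov-port media-vm-01
```

This assumes the compute hosts have SR-IOV-capable NICs with virtual functions already configured.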


VM Placement Enablers

Flavors
To indicate workload needs, Openstack has "Flavors". A flavor is a resource template that tells the Nova scheduler about a VM's placement needs, e.g. flavor m1.tiny indicates resource requirements of 1 vCPU, 512 MB memory and 1 GB disk. When the Nova scheduler reads flavor m1.tiny in a 'Create VM' request, it searches for a physical host with the required vCPU, memory and disk available. (See https://docs.openstack.org/admin-guide/compute-flavors.html)
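Creating such a flavor is a single CLI call; as a sketch, reproducing the m1.tiny template (RAM is given in MB, disk in GB):

```shell
# m1.tiny: 1 vCPU, 512 MB RAM, 1 GB root disk
openstack flavor create --vcpus 1 --ram 512 --disk 1 m1.tiny
```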
Apart from resources, the Nova scheduler needs to select hosts that meet the performance (compute-intensive) and throughput (network-intensive) requirements of the tenant. Openstack uses Host Aggregates (HA) to give the Nova scheduler the ability to select the hardware configuration indicated in the "Create VM" request. (Ref: https://www.datadoghq.com/blog/openstack-host-aggregates-flavors-availability-zones/ )
Host Aggregates
An HA is a grouping of physical hosts with similar capabilities, e.g. all NUMA-aware hosts grouped under NA and all DPDK-enabled hosts grouped under DK. Once an aggregate is created, administrators can define specific flavors from which tenants can choose to place their virtual machines. Tenants use flavors to indicate to the Nova scheduler the type of hardware that should host their instance. (Ref: https://www.datadoghq.com/blog/openstack-host-aggregates-flavors-availability-zones/ )
For example, flavor name DK1c1r512d indicates that the tenant wants to place the VM on a DPDK-enabled physical host with 1 vCPU, 1 GB memory & 512 GB disk space available.
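The link between a flavor and a host aggregate is matching metadata, checked by the AggregateInstanceExtraSpecsFilter. A hedged sketch of the DK example above (the compute host name and the dpdk=true key are illustrative conventions, not standard names):

```shell
# Group DPDK-enabled hosts into aggregate DK and tag it
openstack aggregate create DK
openstack aggregate set --property dpdk=true DK
openstack aggregate add host DK compute-01
# Flavor DK1c1r512d: 1 vCPU, 1 GB RAM, 512 GB disk, restricted to the DK aggregate
openstack flavor create --vcpus 1 --ram 1024 --disk 512 DK1c1r512d
openstack flavor set DK1c1r512d --property aggregate_instance_extra_specs:dpdk=true
```

A 'Create VM' request with flavor DK1c1r512d can then only be scheduled onto hosts inside the DK aggregate.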

HA capacity planning
An Openstack controller can support n compute nodes, and each datacenter has a certain footprint limitation, e.g. a datacenter that can host only 500 servers. Once that capacity is identified, the Cloud administrator can start HA capacity planning. Assume that the CSP's DC has a limit of 500 physical hosts, and the Cloud administrator has the following requirements:
-        Two AZs in total
-        Three HAs in total: general (GE), NUMA-aware (NA) & DPDK (DK)
Based on these requirements, the Cloud admin will divide the 500 physical hosts as:
-        250 hosts per AZ (500/2)
The table below shows two scenarios of host distribution within one AZ.
Scenario 1 (per AZ)
Flavor   Capacity Forecast   Designed Physical Hosts
GE       50%                 125
NA       20%                 50
DK       30%                 75
Total                        250

Scenario 2 (per AZ)
Flavor   Capacity Forecast   Designed Physical Hosts
GE       33%                 83
NA       33%                 83
DK       33%                 84
Total                        250
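The host counts above are simple percentages of the per-AZ pool; a quick sketch of the first scenario's 50/20/30 split:

```shell
# 500-host DC split across 2 AZs, then across aggregates by forecast share
total=500; azs=2
per_az=$((total / azs))        # 250 hosts per AZ
ge=$((per_az * 50 / 100))      # GE at 50% -> 125 hosts
na=$((per_az * 20 / 100))      # NA at 20% -> 50 hosts
dk=$((per_az - ge - na))       # DK takes the remaining 30% -> 75 hosts
echo "$per_az $ge $na $dk"
```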
Cloud Capacity Planning Essentials
Two essential design requirements a CSP should consider for optimum VM placement:
1)      Design their Cloud flavors based on the various types of workload they are going to support.
2)      Design Host Aggregates based on their capacity forecast. An incorrect capacity forecast may lead to imbalanced hardware resource utilization: e.g. if the GE flavor, designed at 33% (83 hosts), actually consumes only 20%, then 33 of its physical hosts are overprovisioned (lower utilization), while if NA flavor consumption (50%) exceeds the designed 33%, NA flavor tenants will feel a capacity crunch, as the table below shows.
Flavor   Capacity Forecast   Designed Physical Hosts   Actual Usage   Required Physical Hosts   Gap (Physical Hosts)
GE       33%                 83                        20%            50                        33 surplus
NA       33%                 83                        50%            125                       42 shortfall
DK       33%                 83                        30%            75                        8 surplus
Total                        250
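Each gap row is just the hosts required by actual usage minus the hosts designed in; a sketch for the NA row:

```shell
# NA aggregate: designed for 33% of a 250-host AZ, actually consuming 50%
per_az=250
designed=83                         # 33% forecast
required=$((per_az * 50 / 100))     # 50% actual -> 125 hosts needed
gap=$((required - designed))        # positive value = shortfall
echo "NA shortfall: $gap hosts"
```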





The Cloud capacity design process can be summarized in the figure below:

Thanks
Vadan

(This blog represents Author's personal understanding of the subject)