Monday, July 10, 2017

Telco Cloud Capacity Planning (Flavors & Host Aggregates)

Telco Cloud capacity planning is a complex task. It involves designing capacity for various types of workloads, each with specific requirements from the underlying hardware. Capacity design also includes distributing the datacenter server footprint (the number of physical servers/hosts a single datacenter can house, given its space limitations).

VM placement strategy is a key item in Cloud capacity design.

VM placement framework 
Cloud orchestration systems use VM placement strategies and tools to place each VM on the correct physical host. In the Telco Cloud context, CSPs want their VMs placed on physical hosts that have enough capability to offer Telco-grade performance and resiliency.
The Cloud should support various workloads with different resource (CPU, RAM, disk) and performance (latency, throughput) requirements.

When a user instantiates a VM, the Openstack Nova scheduler finds a physical host that meets the user's workload requirements. To achieve precise VM placement, Openstack Nova has a detailed filter framework (see https://docs.openstack.org/mitaka/config-reference/compute/scheduler.html).
Based on filtering and weighing (prioritization of filters), the Nova scheduler selects a physical host with the capabilities the Telco application needs.
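As a sketch of how that framework is wired up, the active filter list lives in nova.conf; the option names below follow the Mitaka-era documentation linked above and vary by release, so treat this as illustrative rather than authoritative:

```ini
# nova.conf excerpt (Mitaka-era option names; check your release's docs)
[DEFAULT]
scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler
scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,NUMATopologyFilter,AggregateInstanceExtraSpecsFilter
```

The NUMA and host-aggregate filters at the end of the list are the ones that matter most for the Telco placement needs discussed below.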

Telco workloads have mainly three types of VM placement needs:
1) Resource Needs:
CPU, Memory & Disk: these are the most basic requirements; the selected host should have enough of each resource available to offer.
2) Deterministic Performance Needs:
Media (voice/video) processing VMs are latency sensitive. Such VMs should be hosted on physical servers with CPU pinning, NUMA awareness and Huge Pages support.
  •  CPU pinning: In a hyper-threaded architecture, each physical CPU core has two threads. In a Cloud environment, each thread can be shared among 2, 4 or more virtual CPUs, i.e. an oversubscription ratio of 2:1 or 4:1.
For Telco workloads, to avoid latency and to meet high CPU processing requirements, the user wants to make sure that each virtual CPU is pinned to a physical CPU, i.e. an oversubscription ratio of 1:1 (shown in the figure below).

 
This mapping table between virtual CPUs and physical CPUs ensures that packets from a vCPU are always processed on a particular physical CPU. Without CPU pinning, the hypervisor selects any available CPU, increasing packet processing latency. Pinning reduces latency, ensures the Telco workload gets enough CPU processing power, and mitigates the noisy-neighbour problem. There are various options to implement CPU pinning, as described at https://networkbuilders.intel.com/docs/CPU_Pinning_With_Openstack_nova.pdf.
  •  NUMA awareness: Modern multi-socket x86 systems use a shared memory architecture that describes the placement of the main memory modules with respect to the processors. In a NUMA-based system, each processor has its own local memory controller that it can access directly, with a distinct performance advantage. It is therefore advisable to select NUMA-aligned physical CPUs for the virtual CPUs: in the CPU pinning mapping table, the designer has to make sure that the mapped physical CPUs and the VM's memory sit on the same NUMA node. NUMA topology is a hardware feature, and during placement the Nova scheduler has to select a NUMA-aligned physical host; NUMA awareness is one of the Nova scheduler filters. (Ref: http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/ )
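In Nova, both pinning and NUMA alignment are requested through flavor extra specs, which the corresponding scheduler filters then enforce. A minimal sketch, assuming an illustrative flavor named m1.telco (pinned hosts are normally also grouped into their own host aggregate, as discussed later):

```shell
# 1:1 vCPU-to-pCPU pinning (no oversubscription)
openstack flavor set m1.telco --property hw:cpu_policy=dedicated
# Confine the guest's vCPUs and memory to a single host NUMA node
openstack flavor set m1.telco --property hw:numa_nodes=1
# Huge pages are allocated from per-NUMA-node pools, so they pair naturally
openstack flavor set m1.telco --property hw:mem_page_size=large
```

Any VM booted from this flavor will then only land on hosts where the NUMATopologyFilter can satisfy all three properties.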


3) Throughput Needs:
  •  Telco workloads, specifically Media VMs (VMs which carry live voice/video traffic), may have high throughput requirements such as 3 Mpps. Technologies such as DPDK & SR-IOV are able to meet these requirements (details: http://www.metaswitch.com/the-switch/accelerating-the-nfv-data-plane ). DPDK & SR-IOV are hardware features, and the Nova scheduler has to select a DPDK/SR-IOV-enabled host when required by tenants.
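SR-IOV, in particular, is requested per Neutron port rather than per flavor. A hedged sketch, with the network, image, flavor and server names purely illustrative:

```shell
# Create a port backed by an SR-IOV virtual function
openstack port create --network provider-net --vnic-type direct sriov-port
# Boot the VM on that port; the PCI passthrough filter steers it to a host with free VFs
openstack server create --flavor media-flavor --image media-image \
  --nic port-id=sriov-port media-vm-01
```

This assumes the compute hosts have SR-IOV-capable NICs with virtual functions already configured.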


VM Placement Enablers

Flavors
To indicate workload needs, Openstack has "Flavors". A flavor is a resource template that tells the Nova scheduler about a VM's placement needs, e.g. flavor m1.tiny indicates resource requirements of 1 vCPU, 512 MB memory and 1 GB disk. When the Nova scheduler reads flavor m1.tiny in a 'Create VM' request, it searches for a physical host with the required vCPU, memory and disk available. (See https://docs.openstack.org/admin-guide/compute-flavors.html)
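Creating such a flavor is a single CLI call; as a sketch, reproducing the m1.tiny template (RAM is given in MB, disk in GB):

```shell
# m1.tiny: 1 vCPU, 512 MB RAM, 1 GB root disk
openstack flavor create --vcpus 1 --ram 512 --disk 1 m1.tiny
```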
Apart from resources, the Nova scheduler needs to select hosts that meet the performance (compute-intensive) and throughput (network-intensive) requirements of the tenant. Openstack uses Host Aggregates (HA) to give the Nova scheduler the ability to select the hardware configuration indicated in the "Create VM" request. (Ref: https://www.datadoghq.com/blog/openstack-host-aggregates-flavors-availability-zones/ )
Host Aggregates
An HA is a grouping of physical hosts with similar capabilities, e.g. all NUMA-aware hosts grouped under NA and all DPDK-enabled hosts grouped under DK. Once an aggregate is created, administrators can define specific flavors from which tenants can choose to place their virtual machines. Tenants use flavors to indicate to the Nova scheduler the type of hardware that should host their instance. (Ref: https://www.datadoghq.com/blog/openstack-host-aggregates-flavors-availability-zones/ )
For example, flavor name DK1c1r512d indicates that the tenant wants to place the VM on a DPDK-enabled physical host with 1 vCPU, 1 GB memory & 512 GB disk space available.
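The link between a flavor and a host aggregate is matching metadata, checked by the AggregateInstanceExtraSpecsFilter. A hedged sketch of the DK example above (the compute host name and the dpdk=true key are illustrative conventions, not standard names):

```shell
# Group DPDK-enabled hosts into aggregate DK and tag it
openstack aggregate create DK
openstack aggregate set --property dpdk=true DK
openstack aggregate add host DK compute-01
# Flavor DK1c1r512d: 1 vCPU, 1 GB RAM, 512 GB disk, restricted to the DK aggregate
openstack flavor create --vcpus 1 --ram 1024 --disk 512 DK1c1r512d
openstack flavor set DK1c1r512d --property aggregate_instance_extra_specs:dpdk=true
```

A 'Create VM' request with flavor DK1c1r512d can then only be scheduled onto hosts inside the DK aggregate.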

HA capacity planning
An Openstack controller can support n compute nodes, and each datacenter has a certain footprint limitation, e.g. a datacenter that can host only 500 servers. Once that capacity is identified, the Cloud administrator can start HA capacity planning. Assume that the CSP's DC has a limit of 500 physical hosts, and the Cloud administrator has the following requirements:
-        Two AZs in total
-        Three HAs in total: general (GE), NUMA-aware (NA) & DPDK (DK)
Based on these requirements, the Cloud admin will divide the 500 physical hosts as:
-        250 hosts per AZ (500/2)
The table below shows two scenarios of host distribution within one AZ.
Scenario 1 (per AZ)
Flavor   Capacity Forecast   Designed Physical Hosts
GE       50%                 125
NA       20%                 50
DK       30%                 75
Total                        250

Scenario 2 (per AZ)
Flavor   Capacity Forecast   Designed Physical Hosts
GE       33%                 83
NA       33%                 83
DK       33%                 84
Total                        250
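The host counts above are simple percentages of the per-AZ pool; a quick sketch of the first scenario's 50/20/30 split:

```shell
# 500-host DC split across 2 AZs, then across aggregates by forecast share
total=500; azs=2
per_az=$((total / azs))        # 250 hosts per AZ
ge=$((per_az * 50 / 100))      # GE at 50% -> 125 hosts
na=$((per_az * 20 / 100))      # NA at 20% -> 50 hosts
dk=$((per_az - ge - na))       # DK takes the remaining 30% -> 75 hosts
echo "$per_az $ge $na $dk"
```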
Cloud Capacity Planning Essentials
Two essential design requirements a CSP should consider for optimum VM placement:
1)      Design their Cloud flavors based on the various types of workload they are going to support.
2)      Design Host Aggregates based on their capacity forecast. An incorrect capacity forecast may lead to imbalanced hardware resource utilization: e.g. if the GE flavor, designed at 33% (83 hosts), actually consumes only 20%, then 33 of its physical hosts are overprovisioned (lower utilization), while if NA flavor consumption (50%) exceeds the designed 33%, NA flavor tenants will feel a capacity crunch, as the table below shows.
Flavor   Capacity Forecast   Designed Physical Hosts   Actual Usage   Required Physical Hosts   Gap (Physical Hosts)
GE       33%                 83                        20%            50                        33 surplus
NA       33%                 83                        50%            125                       42 shortfall
DK       33%                 83                        30%            75                        8 surplus
Total                        250
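Each gap row is just the hosts required by actual usage minus the hosts designed in; a sketch for the NA row:

```shell
# NA aggregate: designed for 33% of a 250-host AZ, actually consuming 50%
per_az=250
designed=83                         # 33% forecast
required=$((per_az * 50 / 100))     # 50% actual -> 125 hosts needed
gap=$((required - designed))        # positive value = shortfall
echo "NA shortfall: $gap hosts"
```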





The Cloud capacity design process can be summarized in the figure below:

Thanks
Vadan

(This blog represents Author's personal understanding of the subject)