Telco Cloud capacity planning is a complex task. It involves designing capacity for various types of workloads, each with specific requirements of the underlying hardware. Capacity design also has to account for the distributed datacenter server footprint (the number of physical servers/hosts a single datacenter can house, given its space limitations).
VM placement strategy is a key item in Cloud capacity design.
VM placement framework
A Cloud orchestration system uses VM placement strategies and tools to place each VM on the correct physical host. In the Telco Cloud context, CSPs want their VMs placed on physical hosts that have the capabilities to offer Telco-grade performance and resiliency.
The Cloud should support various workloads with different resource (CPU, RAM, disk) and performance (latency, throughput) requirements.
When a user instantiates a VM, the OpenStack Nova scheduler finds a physical host that meets the user’s workload requirements. To achieve precise VM placement, OpenStack Nova has a detailed filter framework (see https://docs.openstack.org/mitaka/config-reference/compute/scheduler.html).
Based on filtering and weighing (ranking the hosts that pass the filters), the Nova scheduler selects a physical host that has the capabilities the Telco application needs.
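The filter-then-weigh flow can be sketched in a few lines of Python. This is a simplified model and not Nova's code: the `Host` and `Request` classes, the feature labels and the "most free RAM wins" weigher are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_vcpus: int
    free_ram_mb: int
    free_disk_gb: int
    features: set = field(default_factory=set)  # hypothetical labels, e.g. {"numa", "dpdk"}


@dataclass
class Request:
    vcpus: int
    ram_mb: int
    disk_gb: int
    required_features: set = field(default_factory=set)


def passes_filters(host, req):
    """Filtering step: the host must have the resources and every required feature."""
    return (host.free_vcpus >= req.vcpus
            and host.free_ram_mb >= req.ram_mb
            and host.free_disk_gb >= req.disk_gb
            and req.required_features <= host.features)


def weigh(host):
    """Weighing step (assumed policy): prefer the host with the most free RAM."""
    return host.free_ram_mb


def select_host(hosts, req):
    """Return the best host that passes all filters, or None if none qualifies."""
    candidates = [h for h in hosts if passes_filters(h, req)]
    return max(candidates, key=weigh) if candidates else None


hosts = [
    Host("host-a", 8, 16384, 200, {"numa"}),
    Host("host-b", 16, 65536, 500, {"numa", "dpdk"}),
    Host("host-c", 32, 131072, 1000),
]
chosen = select_host(hosts, Request(4, 8192, 100, {"dpdk"}))
print(chosen.name if chosen else "no valid host")
```

Here host-c has the most free resources, but it is filtered out because it lacks the required feature. The weigher only ranks hosts that survive the filters.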
Telco workloads have mainly three types of VM placement needs:
1) Resource Needs:
• CPU, Memory & Disk: These are the most basic requirements. The selected host should have enough free resources to offer.
2) Deterministic Performance Needs:
Media (voice/video) processing VMs are latency sensitive. Such VMs should be hosted on physical servers that support CPU pinning, NUMA awareness and huge pages.
- CPU pinning: In a hyper-threaded architecture, each physical CPU core has two hardware threads. In a Cloud environment, each thread can be shared among 2, 4 or more virtual CPUs, e.g. an oversubscription ratio of 2:1 or 4:1. For Telco workloads, which are latency sensitive and CPU intensive, the user wants to make sure that each virtual CPU is fixed to a physical CPU, e.g. a ratio of 1:1 (shown in figure below). This mapping between virtual CPU and physical CPU ensures that work from a vCPU always runs on one particular physical CPU. Without CPU pinning, the hypervisor is free to schedule a vCPU on any available CPU and to move it between CPUs, which increases packet processing latency. Pinning reduces latency, ensures that the Telco workload gets enough CPU processing power, and mitigates the noisy neighbor problem. There are various options to implement CPU pinning, as described at https://networkbuilders.intel.com/docs/CPU_Pinning_With_Openstack_nova.pdf.
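As a rough illustration of the 1:1 mapping, the following Python sketch builds a vCPU-to-pCPU pinning table. The function name `pin_vcpus` and the CPU numbering are hypothetical and only model the idea; this is not how the hypervisor or Nova implements it. In OpenStack, pinning is typically requested through the flavor extra spec `hw:cpu_policy=dedicated`.

```python
def pin_vcpus(free_pcpus, vcpu_count):
    """Return {vcpu_index: pcpu_id}, dedicating one physical CPU to each vCPU."""
    if vcpu_count > len(free_pcpus):
        raise ValueError("not enough free physical CPUs for 1:1 pinning")
    chosen = sorted(free_pcpus)[:vcpu_count]
    return dict(enumerate(chosen))


# Assumption for this example: pCPUs 0 and 1 are kept for the host itself.
free = {2, 3, 4, 5, 6, 7}
mapping = pin_vcpus(free, 4)
print(mapping)  # {0: 2, 1: 3, 2: 4, 3: 5}

# Pinned pCPUs are removed from the pool, so no other VM can share them.
free -= set(mapping.values())
print(sorted(free))  # [6, 7]
```

With a 1:1 ratio the host runs out of capacity as soon as the free pool is empty, which is the price paid for predictable latency.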
3) Throughput Needs:
- Telco workloads, specifically media VMs (VMs that carry live voice/video traffic), may have high throughput requirements such as 3 Mpps. Technologies such as DPDK and SR-IOV can meet these high throughput requirements (details: http://www.metaswitch.com/the-switch/accelerating-the-nfv-data-plane). Both depend on host capabilities (SR-IOV is a NIC hardware feature, while DPDK is a packet-processing library that the host must be set up for), so the Nova scheduler has to select a DPDK/SR-IOV enabled host when tenants require it.
VM Placement Enablers
Flavors
To indicate workload needs, OpenStack has “Flavors”. Flavors are resource templates that tell the Nova scheduler about VM placement needs, e.g. the default flavor m1.tiny indicates resource requirements of 1 vCPU, 512 MB memory and 1 GB disk. When the Nova scheduler reads flavor m1.tiny in a ‘Create VM’ request, it searches for a physical host that has the required vCPU, memory and disk available (see https://docs.openstack.org/admin-guide/compute-flavors.html).
Apart from resources, the Nova scheduler needs to select a host that meets the tenant's performance (compute intensive) and throughput (network intensive) requirements. OpenStack uses Host Aggregates (HA) to give the Nova scheduler the ability to select the required hardware configuration, as indicated in the “Create VM” request (Ref: https://www.datadoghq.com/blog/openstack-host-aggregates-flavors-availability-zones/).
Host Aggregates
An HA is a grouping of physical hosts with similar capabilities, e.g. all NUMA aware hosts are grouped under NA and all DPDK aware hosts are grouped under DK. Once an aggregate is created, administrators can define specific flavors from which tenants can choose when placing their virtual machines. Tenants use flavors to indicate to the Nova scheduler the type of hardware that should host their instance (Ref: https://www.datadoghq.com/blog/openstack-host-aggregates-flavors-availability-zones/).
For example, the flavor name DK1c1r512d indicates that the tenant wants to place the VM on a DPDK aware physical host that has 1 CPU, 1 GB memory and 512 GB disk space available.
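The naming convention can be decoded mechanically. The sketch below assumes the pattern is always a two-letter aggregate code followed by `<n>c<n>r<n>d` (CPU count, memory in GB, disk in GB); the `AGGREGATES` table and host names are made up for the example. In a real deployment the binding between flavor and aggregate is usually done by matching flavor extra specs against aggregate metadata (for instance with Nova's `AggregateInstanceExtraSpecsFilter`), not by parsing the name.

```python
import re

# Assumed convention: <2-letter aggregate><vCPUs>c<RAM in GB>r<disk in GB>d
FLAVOR_NAME = re.compile(r"^(?P<agg>[A-Z]{2})(?P<cpu>\d+)c(?P<ram>\d+)r(?P<disk>\d+)d$")

# Hypothetical aggregate membership, for illustration only.
AGGREGATES = {
    "NA": ["host-01", "host-02"],  # NUMA aware hosts
    "DK": ["host-03", "host-04"],  # DPDK aware hosts
}


def parse_flavor_name(name):
    """Split a flavor name such as 'DK1c1r512d' into its parts."""
    match = FLAVOR_NAME.match(name)
    if match is None:
        raise ValueError(f"flavor name does not follow the convention: {name!r}")
    return {
        "aggregate": match["agg"],
        "vcpus": int(match["cpu"]),
        "ram_gb": int(match["ram"]),
        "disk_gb": int(match["disk"]),
    }


def candidate_hosts(name):
    """Hosts of the aggregate named by the flavor (before any resource check)."""
    return AGGREGATES.get(parse_flavor_name(name)["aggregate"], [])


print(parse_flavor_name("DK1c1r512d"))
print(candidate_hosts("DK1c1r512d"))
```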
HA capacity planning
An OpenStack controller can support n compute nodes, and each datacenter has a certain footprint limitation, e.g. a datacenter can host only 500 servers. Once that capacity is identified, the Cloud administrator can start HA capacity planning. Assume that the CSP DC has a limit of 500 physical hosts. The Cloud administrator has the following requirements:
- Total two AZs
- Total two HAs: general (GE) & special (SP)
Based on these requirements, the Cloud admin will divide the 500 physical hosts as:
- 250 hosts per AZ (500/2)
The table below shows two scenarios of host distribution for one AZ (250 hosts).
| Flavor | Capacity Forecast | Designed Physical Hosts | Flavor | Capacity Forecast | Designed Physical Hosts |
|---|---|---|---|---|---|
| GE | 50% | 125 | GE | 33% | 83 |
| NA | 20% | 50 | NA | 33% | 83 |
| DK | 30% | 75 | DK | 33% | 84 |
| | Total | 250 | | Total | 250 |
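The designed host counts follow directly from the forecast shares. A minimal Python sketch of that arithmetic is given below; the function name `distribute_hosts` and the rule of giving any rounding leftover to the last flavor are assumptions chosen so that an equal three-way split of 250 hosts comes out as 83, 83 and 84.

```python
from fractions import Fraction


def distribute_hosts(total_hosts, shares):
    """Split total_hosts across flavors by forecast share.

    Each flavor gets its share rounded down; any leftover host goes to the
    last flavor listed (an assumed tie-break rule).
    """
    counts = {name: int(total_hosts * share) for name, share in shares.items()}
    leftover = total_hosts - sum(counts.values())
    last_flavor = list(shares)[-1]
    counts[last_flavor] += leftover
    return counts


# Scenario 1: forecast of 50% / 20% / 30%
scenario_1 = {"GE": Fraction(50, 100), "NA": Fraction(20, 100), "DK": Fraction(30, 100)}
print(distribute_hosts(250, scenario_1))  # {'GE': 125, 'NA': 50, 'DK': 75}

# Scenario 2: equal thirds
scenario_2 = {"GE": Fraction(1, 3), "NA": Fraction(1, 3), "DK": Fraction(1, 3)}
print(distribute_hosts(250, scenario_2))  # {'GE': 83, 'NA': 83, 'DK': 84}
```

`Fraction` is used instead of floats so that shares such as one third do not introduce rounding surprises.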
There are two essential design requirements a CSP should consider for optimum VM placement:
1) Design the Cloud flavors based on the types of workload the Cloud is going to support.
2) Design Host Aggregates based on the capacity forecast. An incorrect capacity forecast may lead to imbalanced hardware utilization, e.g. if the GE flavor consumes only 20% against a forecast of 33% (Table 1), 33 physical hosts are overprovisioned (lower utilization), while if NA flavor consumption (50%) is more than designed (33%), NA flavor tenants will feel a capacity crunch.
| Flavor | Capacity Forecast | Designed Physical Hosts | Actual usage | Required P Host | Gap P host |
|---|---|---|---|---|---|
| GE | 33% | 83 | 20% | 50 | 33 more |
| NA | 33% | 83 | 50% | 125 | 42 less |
| DK | 33% | 83 | 30% | 75 | 8 more |
| | Total | 250 | | | |
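The gap column is simply designed hosts minus the hosts that actual usage would require. The short Python sketch below reproduces it; the function name `capacity_gap` is hypothetical, and a positive result means surplus hosts ("more" than needed) while a negative result means a shortfall ("less" than needed).

```python
def capacity_gap(total_hosts, designed, actual_pct):
    """Return {flavor: designed - required}, where required is derived from actual usage.

    actual_pct holds whole-number percentages; the division rounds down.
    """
    gaps = {}
    for flavor, designed_hosts in designed.items():
        required = total_hosts * actual_pct[flavor] // 100
        gaps[flavor] = designed_hosts - required
    return gaps


designed = {"GE": 83, "NA": 83, "DK": 83}   # designed hosts per flavor
actual_pct = {"GE": 20, "NA": 50, "DK": 30}  # observed usage share in percent
print(capacity_gap(250, designed, actual_pct))  # {'GE': 33, 'NA': -42, 'DK': 8}
```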
The Cloud capacity design process can be summarized as in the figure below:
Thanks
Vadan
(This blog represents Author's personal understanding of the subject)