Right-Sizing: What It Is, How To Do It, and How It Saved One Company 40% on Its Cloud Costs
Goldilocks had it easy. The fairy tale character walked into a house, tasted a few bowls of porridge, and found her "just right" fit on her third try. But for organizations migrating their infrastructure to the cloud, finding the "just right" fit is a lot more complicated. Unlike Goldilocks, organizations migrating to the cloud face over 25 million cloud configuration options. These options include different combinations of CPU, memory, storage, and capacity, and it’s quite challenging to figure out which is right for you.
That’s where right-sizing comes in.
What exactly is right-sizing? Right-sizing means achieving your best-fit cloud configuration: the optimal compute, storage, and network settings, as well as the best pricing plan, that meet your peak performance requirements at the lowest possible cost. In a right-sized environment, the compute, storage, and network resources you provision match their real-world usage, in contrast to traditional on-premises infrastructure, which is typically over-provisioned. This is a major reason why the cloud can help you significantly reduce costs. Once you move to the cloud, you can buy only what you need at the time and provision more as you need it.
Right-sizing is an essential part of every stage of your cloud journey. It’s essential when calculating your total cost of ownership (TCO) as you think about moving to the cloud and when you plan your cloud migration. Once you’re in the cloud, right-sizing must be done on an ongoing basis to continually ensure cost-performance optimization.
The Analytics Essential to Accurately Right-Sizing Your Cloud
If you don’t accurately right-size your cloud workloads, your cloud costs could quickly spiral out of control, and you’ll also suffer serious performance issues. The problem is that it isn't easy - or even possible - to right-size without having highly precise analytics to guide you.
To right-size your cloud, you need a clear picture of your existing infrastructure, its workload performance profiles, and its usage patterns. Once this analysis is complete, you’ll need to perform apples-to-apples comparisons of how various cloud configuration options will serve your requirements.
1. Inventory Analysis: Identification of all the physical and virtual machines and applications running in your infrastructure. You need to be aware of everything you have in order to run an accurate performance and usage pattern analysis.
2. Infrastructure Performance Analysis: Assess performance metrics for compute, storage, and network resources over a period of at least two weeks. These include:
- Peak CPU Utilization
- Allocated and Peak RAM usage
- Observed Storage On-Premises (capacity and current occupancy)
- Disk IOPS and Bandwidth
3. Usage Patterns: Identify idle compute resources and unused storage volumes for each node so you can turn off what you’re not using. Keep track of all instances: how many times each instance is turned on and off, how often it is accessed, and at what times of day it is accessed the most and the least. You will see whether the instance is idle most of the time or actively used. From there you can determine whether it can be turned off at certain times or taken down altogether. Together, usage patterns and performance analysis enable you to optimally provision your capacity so you’re only ever paying for what you’re actually using.
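As a rough illustration of steps 2 and 3, the sketch below summarizes one node's samples into peak metrics and an idle fraction, then flags nodes that look like shutdown candidates. The NodeSamples structure, the 5% idle CPU threshold, and the 90% idle cutoff are assumptions made for this example, not values from the analysis described above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class NodeSamples:
    """Samples for one node over the assessment window (hypothetical structure)."""
    name: str
    cpu_percent: List[float]   # one CPU utilization sample per interval, 0-100
    ram_used_gb: List[float]   # RAM in use per interval
    ram_allocated_gb: float    # RAM allocated to the node


def summarize(node: NodeSamples, idle_cpu_threshold: float = 5.0) -> Dict[str, object]:
    """Return peak metrics plus the fraction of intervals in which the node looked idle."""
    idle_intervals = sum(1 for c in node.cpu_percent if c < idle_cpu_threshold)
    return {
        "name": node.name,
        "peak_cpu_percent": max(node.cpu_percent),
        "avg_cpu_percent": mean(node.cpu_percent),
        "peak_ram_gb": max(node.ram_used_gb),
        "allocated_ram_gb": node.ram_allocated_gb,
        "idle_fraction": idle_intervals / len(node.cpu_percent),
    }


def shutdown_candidates(nodes: List[NodeSamples], min_idle_fraction: float = 0.9) -> List[str]:
    """Names of nodes idle in at least min_idle_fraction of the sampled intervals."""
    return [n.name for n in nodes if summarize(n)["idle_fraction"] >= min_idle_fraction]
```

In practice the samples would come from whatever monitoring tool covers the two-week window, and the idle test would likely also consider network and disk activity rather than CPU alone.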
A performance analysis of your workloads is essential to identifying the instances that will ensure your capacity matches your usage and therefore meet performance requirements. Key metrics to consider for understanding your compute requirements include peak CPU utilization, allocated and peak RAM usage, and usage patterns.
There are over 60 instance types for compute on AWS, ranging all the way from t2.nano to x1s. By measuring peak CPU utilization, allocated and peak RAM usage, and usage patterns, you can find the best instance types for each of your workloads.
For instance, in the following chart, the target CPU threshold is 60%. The dark blue line is the observed CPU utilization on the on-premises box, which has 8 cores. If the company purchases an instance with 8 cores, CPU utilization sits at 25%, well below the 60% target, which means the instance is significantly over-provisioned.
As shown in the following chart, the company can reduce the cores and get a smaller instance size, moving CPU utilization up to 45%. It could reduce the cores further, but then the instance would no longer meet the performance target. In this case, the m2.xlarge instance gives the company 33% cost savings. Its CPU and memory are just the right size, and it can purchase more capacity when needed in the future. See more here on how to provision compute resources.
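The core-count reasoning in this example can be sketched with a simple linear model. This is only an approximation under a stated assumption: it treats every core as equally fast and scales utilization inversely with core count, so its numbers will not necessarily match the chart, which reflects the actual instance hardware. The function names are illustrative.

```python
import math


def cores_needed(observed_cores: int, observed_peak_percent: float,
                 target_percent: float) -> int:
    """Smallest core count that keeps projected peak utilization at or below the target.

    Assumes utilization scales linearly with core count (a simplification).
    """
    return max(1, math.ceil(observed_cores * observed_peak_percent / target_percent))


def projected_utilization(observed_cores: int, observed_peak_percent: float,
                          new_cores: int) -> float:
    """Projected peak CPU percent after moving the same load onto new_cores."""
    return observed_cores * observed_peak_percent / new_cores


# The example above: 8 cores observed at a 25% peak, with a 60% target.
print(cores_needed(8, 25, 60))             # 4
print(projected_utilization(8, 25, 4))     # 50.0
print(projected_utilization(8, 25, 2))     # 100.0, which would exceed the target
```

Under this model, halving the cores stays below the 60% target, while halving them again does not, which mirrors the article's point that the instance can only shrink so far.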
Without a deep understanding of your current storage performance profile and how that profile translates into available cloud options, you risk over- or under-provisioning your cloud storage or choosing the wrong storage altogether. The options can be confusing, since you have a choice between 4 storage options on Azure, 5 on AWS, and 3 on Google, and choosing the wrong one can cause significant performance and cost issues immediately.
A common example of over-provisioning storage happens when organizations with bursty IOPS provision for the highest possible level. For example, AWS offers General Purpose SSD (GP-SSD) and Provisioned IOPS storage options. Rather than provisioning for the peak of that bursty behavior, you’ll want GP-SSD, which can burst to higher IOPS when you need them and fall back when you don’t. With this configuration, you only pay for the storage that you need while still making sure that you have enough. In many cases, the cost difference between GP-SSD and Provisioned IOPS is over 91%.
To avoid over-provisioning, conduct a thorough performance analysis that measures IOPS, throughput, and other variables, so that you understand your storage needs and can choose an option that delivers just enough performance without overpaying.
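One way to check whether a volume's IOPS are bursty rather than sustained is to compare its peak against its average and measure how often it exceeds a baseline. The sketch below is a heuristic only: the baseline value, the 10% cutoff, and the "peak more than twice the average" rule are assumptions chosen for illustration, not vendor limits or thresholds from the analysis above.

```python
from statistics import mean
from typing import Dict, List


def burst_profile(iops_samples: List[float], baseline_iops: float) -> Dict[str, float]:
    """Summarize an IOPS time series against a baseline (hypothetical threshold)."""
    over = [s for s in iops_samples if s > baseline_iops]
    return {
        "avg_iops": mean(iops_samples),
        "peak_iops": max(iops_samples),
        "fraction_over_baseline": len(over) / len(iops_samples),
    }


def looks_bursty(profile: Dict[str, float], max_fraction_over: float = 0.1) -> bool:
    """Heuristic: rarely above baseline, yet the peak is far above the average."""
    return (profile["fraction_over_baseline"] <= max_fraction_over
            and profile["peak_iops"] > 2 * profile["avg_iops"])


# Mostly quiet volume with one short spike versus a volume under constant load.
spiky = burst_profile([200] * 19 + [3000], baseline_iops=600)
steady = burst_profile([2500] * 20, baseline_iops=600)
print(looks_bursty(spiky), looks_bursty(steady))   # True False
```

A volume flagged as bursty is a candidate for a burstable storage option, while a steadily loaded one may genuinely need provisioned performance. The final choice should still be checked against the provider's current volume specifications.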
Finding the Right Pricing Plan
The pricing plans offered by cloud vendors vary widely. For example, AWS offers an on-demand pricing plan and different types of reservation plans. Reserved Instance (RI) plans range from no up-front 1-year to all up-front 3-year RIs and can provide savings from 15% all the way up to 75% compared with on-demand pricing. Microsoft Azure offers pay-as-you-go subscriptions, and you can receive additional discounts based on your enterprise agreements. Meanwhile, Google offers a sustained usage model. If you know your performance profile, you will be able to identify the pricing plan that best matches your specific needs, which can help you cut costs significantly.
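The link between a usage profile and the choice of plan can be sketched as a break-even calculation. The model below is a simplification under stated assumptions: on-demand is billed only for hours used, a reservation is billed for every hour of the year at a discounted rate, and the hourly rate of 0.10 is made up for the example. Only the 15% and 75% discount bounds come from the text above.

```python
HOURS_PER_YEAR = 8760


def on_demand_cost(hourly_rate: float, utilization: float) -> float:
    """Annual cost when paying only for the fraction of hours the instance runs."""
    return hourly_rate * HOURS_PER_YEAR * utilization


def reserved_cost(hourly_rate: float, discount: float) -> float:
    """Annual cost of a reservation: every hour is paid for, at a discounted rate."""
    return hourly_rate * HOURS_PER_YEAR * (1 - discount)


def break_even_utilization(discount: float) -> float:
    """Fraction of hours above which the reservation becomes the cheaper option."""
    return 1 - discount


def cheaper_plan(hourly_rate: float, utilization: float, discount: float) -> str:
    """Return which of the two simplified plans costs less for this usage pattern."""
    if reserved_cost(hourly_rate, discount) < on_demand_cost(hourly_rate, utilization):
        return "reserved"
    return "on-demand"


# An instance that runs half the time, at a hypothetical rate of 0.10 per hour.
print(cheaper_plan(0.10, 0.5, 0.75))   # reserved
print(cheaper_plan(0.10, 0.5, 0.15))   # on-demand
```

Under this model a 75% discount pays off once an instance runs more than a quarter of the time, whereas a 15% discount only pays off for instances that run almost continuously, which is why the usage patterns gathered earlier matter when choosing a plan.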
Case Study: Right-Sizing = Big Cost Savings
For one company, using comprehensive performance analysis and usage patterns to right-size its cloud saved nearly 40% on annual cloud costs compared with simply forklifting its environment into the cloud without changing any hardware requirements.
A large asset management company was planning to migrate 840 servers and 180 applications to AWS, and it wanted to understand the cost difference between simply forklifting its infrastructure as is to the cloud and right-sizing it when migrating. Here’s what it found after analyzing its performance metrics and usage patterns to inform its optimal AWS configuration options:
If the company were to forklift its environment and put it in the cloud without changing any hardware requirements, then its annual cloud costs would be $4.2 million.
If the company were to right-size its compute and storage resources in the cloud based on its workload performance profiles, then its annual cloud cost would be $2.6 million, a 38% cost savings.
If the company then purchased 3-yr RIs to optimize costs further, then its cost would be $1.7 million, a 60% cost savings.
If the company further optimized its environment for cloud elasticity, such as turning off instances when it’s not using them, or using autoscaling, it could realize 74% cost savings.
In the case of this company, its on-premises cost was $5 million. Simply moving its infrastructure to the cloud without any modifications would have reduced its infrastructure costs. However, by analyzing the right metrics, it could right-size its cloud infrastructure to realize anywhere from 38% to 74% in additional cost savings annually.
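The percentages in this case study can be checked with simple arithmetic, taking the $4.2 million forklift cost as the baseline. The sketch below uses only figures quoted above. The dollar amount for the 74% scenario is not stated in the case study, so it is derived here from the percentage and should be read as an implied estimate.

```python
def savings_percent(baseline: float, optimized: float) -> float:
    """Percentage saved when annual cost drops from baseline to optimized."""
    return (baseline - optimized) / baseline * 100


def cost_after_savings(baseline: float, percent_saved: float) -> float:
    """Annual cost implied by a given percentage saving off the baseline."""
    return baseline * (1 - percent_saved / 100)


ON_PREM = 5.0    # annual on-premises cost, in millions of dollars
FORKLIFT = 4.2   # annual cloud cost with no hardware changes, in millions

print(f"{savings_percent(ON_PREM, FORKLIFT):.0f}%")   # 16% saved by forklifting alone
print(f"{savings_percent(FORKLIFT, 2.6):.0f}%")       # 38% with right-sizing
print(f"{savings_percent(FORKLIFT, 1.7):.0f}%")       # 60% with right-sizing plus 3-yr RIs
print(f"${cost_after_savings(FORKLIFT, 74):.2f}M")    # $1.09M implied by the 74% figure
```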
Right-Sizing is the Key to Realizing the Promise of the Cloud
Many organizations don’t fully realize the cost benefits of the cloud because they don’t accurately right-size their workloads on an ongoing basis. The only way to accurately right-size your cloud before, during, and after your cloud migration is to conduct continual, comprehensive, automated data analysis. The more precise your analytics, the greater the ROI of your cloud investments.