AWS Cloud Cost Optimization: Strategies & Best Practices

Jun 19, 2024

Cloud cost optimization is a key element in the long-term success of any organization using AWS. For many startups and early-stage companies, the stakes are even higher: containing cloud costs is a critical survival factor.

While it isn’t that difficult to shut down some unused EC2 instances or to reduce the size of a few resources, leveraging cloud native efficiencies for major financial benefit is a more precise process. Whether you need to extend your runway or just be more cost-conscious, it should be approached as a project that can have a massive impact on the bottom line and, therefore, handled by a team with the experience to get every last dollar of value.

For more than a decade, I have worked with dozens of organizations to save money on and improve the performance of their AWS accounts. This experience ranges from companies with a single AWS account to those with dozens, with monthly spends up to the tens of thousands of dollars. In working with all these environments, I have found that more can almost always be done to save money on your cloud vendor bill, no matter how much has already been done to reduce costs.

It was important to me to share my insights to allow others to learn from my experience. My goal is for this post to act as a launching pad for organizations looking to increase efficiency and improve ROI. If your team is in need of more direct support, consider contacting me for a free consultation to learn more about Stern Devops Group’s Cloud Cost Optimization Services.

An Introduction to AWS Cost Saving Strategies

Before you start looking at where to cut costs, any significant project of this type requires capacity planning. You must fully understand your current and projected cloud usage to successfully implement cost-saving measures, especially when committing to long-term contracts for added discounts. Projections are notoriously difficult to get right, but even a rough idea of business plans for the next 12-36 months will inform your decisions and prevent overcommitting to resources you won’t fully utilize.

When you have an idea of utilization needs and projections, you can start looking at where you can save money. Managing cloud expenses in AWS (and other providers) requires three general strategies:

  1. Rightsize computing resources
  2. Commit to longer-term usage
  3. Rearchitect cloud infrastructure

Each of these will be covered in more detail below to give you a sense of which is right for you. In most cases, the best strategy is a combination of all three. Each application, department, and workload has different use cases and needs, so you will apply one or more of these strategies differently for each isolated service or environment in your system.

There are also automated cost management tools that will analyze resource usage and recommend areas to reduce spend. For larger businesses with an extensive amount of cloud infrastructure, these may be a more efficient starting point than human analysis, but they are not a substitute for actual people reviewing expenditures based on real company initiatives. For startups and smaller organizations, consider these types of services carefully. In many cases, it’s cheaper and at least as effective to evaluate a relatively small set of cloud resources on your own using a guide like this one.

An Overview of Cloud Pricing Models

Cloud usage is mostly charged in one of three ways:

  1. For file storage resources such as S3 or EBS, you’ll pay for the data stored.
  2. For compute resources, you’ll pay for CPU and memory.
  3. For egress, you’ll pay for the amount of data transferred out of AWS.

Services such as EC2, ECS, EKS, RDS and ElastiCache charge for both compute and storage since you need both to operate them, but each attribute can be scaled independently.

There are many tiers within these pricing structures. S3, for example, offers a variety of storage classes with different levels of redundancy and availability. More availability for critical data costs more, but archiving files in Glacier is very inexpensive for data only needed in an emergency. This is where your capacity planning exercises will greatly inform the types of storage you need for each data set.

Egress costs are less flexible than file storage and compute, so we will focus on the first two.

Cloud Cost Optimization Best Practices

To create the most performant cloud architecture for the cheapest price, we need to understand how to measure and configure the right amounts of storage and compute, both now and in the future.

1. Capacity Planning

Before you can apply a strategy to reduce cloud expenditure, it’s important to be aware of how your cloud services are used and the expectations for future usage. To do this correctly, we always advise technical teams to work with the product, financial and management teams to understand their future plans. Every department in the organization can affect resources, so accurately estimating future needs requires collaboration. The feature roadmap and planned promotions, for example, have a major impact on computing resources.

2. Rightsize Computing Resources

Rightsizing simply means evaluating your current and planned resource usage, then moving to smaller or fewer instances. The difficult part can be knowing how to evaluate usage. Too often, teams look at a short time horizon like a week or a month. It’s much more advantageous, however, to survey a longer time scale. If, for instance, your application is a product in high demand around the holidays, you won’t see an actionable pattern if you assess the past quarter during a July review.

Additionally, capacity planning demands that you understand the maximum values your resources can reach, even if they happen infrequently. Your ECS cluster may run at an average 10% CPU throughout the year, but if it spikes to 80% once a month during a special event or to generate reports, you must plan for this. Constraining resources to sizes that are too small to handle such peaks, or not opting for burstable instances (which can use more resources during spikes), could mean outages and a negative customer experience.
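The difference between planning for the average and planning for the peak can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function names, the 90% ceiling, and the assumption that utilization scales linearly with capacity are all hypothetical simplifications, and the sample data simply mirrors the 10% average and 80% spike described above.

```python
from statistics import mean

def projected_utilization(samples, capacity_ratio):
    """Scale observed CPU percentages to a new capacity.

    capacity_ratio is new capacity / current capacity (0.5 means half the size).
    Assumes utilization scales linearly with capacity, which is a simplification.
    """
    return [s / capacity_ratio for s in samples]

def is_safe_downsize(samples, capacity_ratio, ceiling=90.0):
    """Return True only if the projected PEAK stays at or below the ceiling."""
    return max(projected_utilization(samples, capacity_ratio)) <= ceiling

# Hypothetical month of daily CPU readings: mostly 10%, one spike to 80%.
samples = [10.0] * 29 + [80.0]

# Judged by the average, halving capacity looks harmless (about 25% projected).
average_after_halving = mean(projected_utilization(samples, 0.5))

# Judged by the peak, halving capacity is not safe (80% becomes 160%).
halving_is_safe = is_safe_downsize(samples, 0.5)

# A more modest 10% reduction keeps the spike under the ceiling.
small_cut_is_safe = is_safe_downsize(samples, 0.9)
```

The longer the window of samples you feed in, the more likely it is that seasonal peaks like a holiday rush are included in the decision.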

3. Commit to Longer Terms

All cloud providers offer discounts to users who commit to a predefined amount of expenditure. AWS has Savings Plans and Reserved Instances for generalized computing and specific instance types and sizes, respectively. These types of discounts are frequently overlooked, and they should always be on your radar as a way to save money.

Once you understand the current utilization of your cloud resources and the minimum amount expected over the next 12, 24, and 36 months, signing a contract is hugely beneficial. Organizations can save up to 40% on services they would use anyway. For early-stage companies, this can be a massive amount of savings, frequently in the thousands of dollars per month.
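To see why the commitment should track the minimum expected usage, here is a rough sketch of the arithmetic. It assumes a deliberately simplified billing model (the committed amount is paid every month at a discount, and anything above it is billed at the normal rate), a hypothetical function name, and the 40% figure from the text as an illustrative discount rather than an actual AWS rate.

```python
def monthly_cost(usage, commitment, discount_pct=40):
    """Estimate one month's bill under a simplified commitment model.

    usage and commitment are both expressed in undiscounted (on-demand) dollars.
    The committed amount is always paid, at a discount, even if it goes unused.
    Usage above the commitment is billed at the full rate.
    """
    committed_cost = commitment * (100 - discount_pct) / 100
    overflow = max(0, usage - commitment)
    return committed_cost + overflow

# Hypothetical projected spend for three future months, before discounts.
projected = [5000, 6000, 7500]

# Commit to the lowest month so the commitment is always fully used.
commitment = min(projected)

# Month with 6000 of usage: 3000 for the commitment plus 1000 overflow.
cost_normal_month = monthly_cost(6000, commitment)

# If usage drops to 2000, the 3000 commitment is still owed: overcommitting loses money.
cost_slow_month = monthly_cost(2000, commitment)
```

In this model the savings on a normal month are the difference between 6000 and 4000, while the slow month costs more than paying as you go, which is why the projections matter before signing.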

4. Rearchitect Cloud Infrastructure

The most difficult strategy for cloud cost management is rearchitecting existing infrastructure. Doing this, however, can have several additional benefits. For example, many companies that do a lift-and-shift migration into AWS from a data center leave money on the table by using traditional on-premises design methodologies. Optimizing for a cloud native architecture creates efficiencies that almost always improve performance and reduce costs.

Rebuilding existing architecture requires expertise in not only how to build cloud native resources, but also how to migrate seamlessly to the new architecture with no downtime or disruption to users. The savings also have to be worth the expense of having your DevOps and development personnel design, test and execute the migration. While this is usually not a trivial project, it is frequently worth the time when planned well.
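One way to judge whether the savings are worth the expense is a simple payback calculation. The sketch below is illustrative only: the function names and all dollar figures are hypothetical, and it ignores factors such as ongoing maintenance and the performance benefits mentioned above.

```python
def payback_months(migration_cost, monthly_savings):
    """Months until cumulative savings cover the one-time migration cost.

    Returns None when there are no savings, since the project never pays back.
    """
    if monthly_savings <= 0:
        return None
    return migration_cost / monthly_savings

def net_savings(migration_cost, monthly_savings, horizon_months):
    """Total savings over the planning horizon, minus the migration cost."""
    return monthly_savings * horizon_months - migration_cost

# Hypothetical project: 30000 of engineering time, saving 2500 per month.
months_to_break_even = payback_months(30000, 2500)
three_year_net = net_savings(30000, 2500, 36)
```

With these example numbers the project breaks even after a year and nets a positive return over a 36-month horizon; a payback period longer than your planning horizon is a sign the rearchitecture may not be worth it on cost alone.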

A Framework for Developing a Cost Containment Strategy

  1. Start with your monthly bill. Find the most recent representative full-month bill to analyze. It’s important to start with a bill that represents standard usage. If you are doing your review in November and you had an outage or an unusual spike in October, go back to September and use that utilization as a baseline. If your utilization varies, find the monthly average for the services you use.
  2. Capacity plan for the minimum utilization needed for the next 12, 24, and 36 months.
  3. Rearchitect if necessary and if it has a significant return on the investment.
  4. Reduce the number and size of instances and compute if possible.
  5. For the remaining resources, decide on a budget to invest in commitments like Reserved Instances and Savings Plans.
  6. Purchase and apply the Reserved Instances and Savings Plans.
  7. Review utilization every quarter or two, particularly for early stage companies. Utilization can change quickly and taking even a small amount of time to adjust can have a long-term financial impact.
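The first step in the list above, choosing a representative month, can be sketched as follows. This is one possible approach rather than a prescribed method: the function name, the use of the median as the "typical" bill, and the 15% tolerance are assumptions made for illustration, and the bill totals are made up.

```python
from statistics import median

def representative_month(bills, tolerance=0.15):
    """Pick the most recent month whose total is close to the typical bill.

    bills: list of (label, total) tuples ordered oldest to newest.
    tolerance: allowed deviation from the median, as a fraction (assumed value).
    Returns the (label, total) tuple, or None if no month qualifies.
    """
    typical = median(total for _, total in bills)
    for label, total in reversed(bills):
        if abs(total - typical) <= tolerance * typical:
            return label, total
    return None

# Hypothetical bills for a November review; October had an unusual spike.
bills = [("Jul", 10000), ("Aug", 10200), ("Sep", 10100), ("Oct", 14500)]

baseline = representative_month(bills)
```

Here October is skipped because of the spike and September becomes the baseline, matching the example in step 1.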

Outsourcing Cloud Cost Optimization

Cloud cost optimization strategies are not rocket science. Some technical skill is needed to squeeze more value out of your cloud account, but it doesn’t require years of experience. Optimizing for maximum dollar value, however, requires cloud expertise and can yield far greater savings for your company. Outsourcing this to a qualified technical team can easily pay for itself. If the net outcome is positive cash flow for the business, finding someone who can bring this value is well worth the expense.

Stern Devops Group has engineered massive savings for many clients, in some cases reducing their AWS expenditure by 50%. More information is available here: https://www.sterndevopsgroup.com/aws-cost-optimization-service/.

Dave Stern, President

My name is Dave Stern. I have been a developer and a systems and network engineer for over 20 years, and a cloud architect in AWS for almost a decade. With a wide range of experience in Linux, web application architecture, automation, security and software development, I have worked with multiple venture-backed startups and large organizations alike to build and improve their cloud presence.

My company Stern Devops Group provides devops, cloud architecture, automation and security consulting primarily in AWS. Our focus is scaling platforms for massive growth, increasing developer productivity and driving infrastructure costs down.