Cloud Cost Optimisation: Lessons from 50+ Deployments

Cloud infrastructure costs have a tendency to grow quietly. What starts as a reasonable monthly bill can double or triple within a year as teams spin up resources without tearing them down, choose instance sizes based on guesswork, and neglect the pricing models that providers offer for predictable workloads. After optimising cloud spend across more than fifty client deployments, we have identified the patterns that consistently deliver meaningful savings without compromising performance or reliability.

Right-Sizing: The Single Biggest Win

The most common source of cloud waste is oversized instances. Development teams typically provision for peak load during initial setup and never revisit the decision. We routinely find instances running at 5-15% average CPU utilisation - meaning 85-95% of the compute being paid for sits idle most of the time.

Monitor before you resize

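Before changing any instance sizes, collect at least two weeks of utilisation data. Use CloudWatch, Grafana, or your provider's native monitoring to track CPU, memory, network, and disk I/O at the instance level. Note that on AWS, EC2 does not publish memory or disk-space metrics to CloudWatch by default; you need the CloudWatch agent (or another in-guest collector) for those. Look at the P95 (95th percentile) values, not averages - you need headroom for traffic spikes.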

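# AWS CLI - get hourly P95 CPU utilisation for the past 14 days
# (GNU date syntax; percentiles go in --extended-statistics, not --statistics)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0abc123def456 \
  --start-time "$(date -u -d '14 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --extended-statistics p95
# For Average and Maximum, run the same command with
# "--statistics Average Maximum" in place of "--extended-statistics p95"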

If the P95 CPU is below 40%, the instance is a strong candidate for downsizing. If memory utilisation is consistently below 50%, consider moving to a compute-optimised family rather than a general-purpose one.
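The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function names and the sample data are hypothetical, the percentile uses the simple nearest-rank method, and "consistently below 50%" is interpreted here as "every sample below 50%".

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * 95 / 100)  # 1-based nearest rank
    return ordered[rank - 1]

def sizing_hint(cpu_samples, mem_samples):
    """Apply the 40% P95 CPU and 50% memory thresholds from the text."""
    hints = []
    if p95(cpu_samples) < 40:
        hints.append("downsize")
    # Assumption: "consistently below 50%" means no sample reaches 50%.
    if max(mem_samples) < 50:
        hints.append("consider compute-optimised")
    return hints or ["leave as is"]

# Hypothetical hourly utilisation samples (percent): mostly idle, a few spikes.
cpu = [8] * 90 + [35] * 8 + [70] * 2
mem = [30] * 100

print("average CPU:", sum(cpu) / len(cpu))
print("P95 CPU:", p95(cpu))
print("hint:", sizing_hint(cpu, mem))
```

The sample data shows why the average alone is misleading: it sits far below the P95, which is the figure that tells you how much headroom the smaller instance would really have.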

Use the right instance family

Cloud providers offer dozens of instance families optimised for different workload profiles. A web application server does not need the same resource balance as a machine learning training job or a database server. Matching the instance family to the workload profile typically saves 20-35% compared to using general-purpose instances for everything.
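One rough way to shortlist a family is to compare the memory-to-vCPU ratio the workload needs against the ratio each family offers. The sketch below assumes the approximate ratios of current-generation AWS families (about 2 GiB per vCPU for compute-optimised, 4 for general-purpose, 8 for memory-optimised); the function name and categories are illustrative, and other providers or specialised families (GPU, storage-optimised, burstable) are not covered.

```python
# Approximate GiB of memory per vCPU (assumed values for AWS c/m/r families).
FAMILY_GIB_PER_VCPU = [
    ("compute-optimised", 2),
    ("general-purpose", 4),
    ("memory-optimised", 8),
]

def suggest_family(needed_vcpu, needed_memory_gib):
    """Pick the leanest family whose memory-per-vCPU ratio covers the need."""
    needed_ratio = needed_memory_gib / needed_vcpu
    for family, gib_per_vcpu in FAMILY_GIB_PER_VCPU:
        if needed_ratio <= gib_per_vcpu:
            return family
    # Needs more memory per vCPU than any listed family offers:
    # fall back to the most memory-heavy one.
    return FAMILY_GIB_PER_VCPU[-1][0]

print(suggest_family(4, 6))    # CPU-bound web tier
print(suggest_family(4, 14))   # balanced application server
print(suggest_family(4, 30))   # in-memory cache or database
```

Feed it the P95 figures gathered during monitoring rather than the currently provisioned size, otherwise it simply confirms the existing choice.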

Reserved Capacity and Savings Plans

On-demand pricing is the most expensive way to run cloud infrastructure. For workloads that run continuously - databases, application servers, monitoring systems - reserved instances or savings plans offer 30-60% discounts in exchange for a one- or three-year commitment.

Start with a 1-year no-upfront commitment

Many teams avoid reserved instances because they fear committing to the wrong configuration. The safest entry point is a one-year, no-upfront reserved instance for your most stable workloads. You get roughly 30% savings with no capital outlay and, depending on the reservation type, some flexibility to modify or exchange the reservation if your needs change.

// Cost comparison for a single m6i.xlarge instance (EU West)
const pricing = {
  onDemand:   { monthly: '$140.16', annual: '$1,681.92' },
  reserved1y: { monthly: '$89.79',  annual: '$1,077.48', saving: '36%' },
  reserved3y: { monthly: '$56.94',  annual: '$683.28',   saving: '59%' }
};
// For 10 instances, the 1-year reservation saves ~$6,000/year

Only move to three-year commitments once you have at least six months of stable usage data and confidence that the workload will persist. The deeper discount is attractive but the commitment risk is real.
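The commitment risk can be put into numbers with a simple break-even calculation: how many months would the workload have to run on demand before it costs as much as the whole reservation? The sketch below reuses the monthly figures from the comparison above and assumes the on-demand price stays constant and the reservation cannot be resold or exchanged; the function name is hypothetical.

```python
def break_even_months(on_demand_monthly, reserved_monthly, term_months):
    """Months of on-demand usage that cost the same as the full reserved term."""
    committed_total = reserved_monthly * term_months
    return committed_total / on_demand_monthly

# Monthly prices for a single m6i.xlarge, taken from the comparison above.
one_year = break_even_months(140.16, 89.79, 12)
three_year = break_even_months(140.16, 56.94, 36)

print(f"1-year term pays off after {one_year:.1f} of 12 months")
print(f"3-year term pays off after {three_year:.1f} of 36 months")
```

Under these assumptions, a reservation only loses money if the workload disappears before the break-even point. The three-year term breaks even relatively early, but it also leaves far more time for the workload to change shape, which is why the usage history matters.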

Edge Caching: Reduce Origin Load by 80%+

Cloudflare, CloudFront, and similar CDN services can dramatically reduce both your bandwidth costs and origin server load. For content-heavy applications, edge caching is often the single most impactful cost optimisation.

Cache aggressively with proper invalidation

Static assets (images, CSS, JavaScript, fonts) should have cache TTLs of at least 30 days. Use content-based hashing in filenames so you can cache indefinitely without worrying about stale content.
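Most build tools generate content-hashed filenames for you, but the idea is simple enough to sketch by hand. The example below is a minimal illustration: the function name, the eight-character hash length, and the one-year `max-age` are assumptions, not requirements.

```python
import hashlib
from pathlib import PurePosixPath

# Safe only because the URL changes whenever the content changes.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"

def hashed_name(path, content):
    """Insert a short content hash before the extension: site.css -> site.<hash>.css"""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))

old = hashed_name("css/site.css", b"body { color: black; }")
new = hashed_name("css/site.css", b"body { color: navy; }")

print(old)
print(new)
print("Cache-Control:", IMMUTABLE_CACHE_CONTROL)
```

Because any change to the file produces a new filename, the old URL can stay cached forever while the HTML that references it (served with a short TTL) points browsers at the new one.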

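For API responses, identify endpoints that return data which changes infrequently - product catalogues, configuration, public content - and cache them at the edge with shorter TTLs (1-5 minutes). Even a 60-second cache on a high-traffic endpoint can reduce origin requests by 90% or more, because each edge location then fetches a given response from the origin roughly once per minute, however many requests arrive in between.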
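The arithmetic behind short TTLs is worth making explicit. The sketch below is a best-case model for a single cache key: it assumes steady traffic spread evenly across edge locations, with each location fetching from the origin once per TTL window. Real hit ratios will be lower when traffic is bursty or keys are fragmented by query strings or headers. The function name is hypothetical.

```python
def origin_reduction(requests_per_minute, ttl_seconds, edge_locations=1):
    """Best-case fraction of requests served from the edge for one cache key."""
    requests_per_window = requests_per_minute * ttl_seconds / 60
    if requests_per_window == 0:
        return 0.0
    # Each edge location needs at most one origin fetch per TTL window,
    # and never more fetches than there are requests.
    origin_fetches = min(edge_locations, requests_per_window)
    return 1 - origin_fetches / requests_per_window

print(f"{origin_reduction(10, 60):.1%}")                       # 10 req/min, one location
print(f"{origin_reduction(600, 60):.1%}")                      # 600 req/min, one location
print(f"{origin_reduction(600, 60, edge_locations=20):.1%}")   # same traffic, 20 locations
```

In this model a 60-second TTL reaches a 90% reduction once an endpoint receives about ten requests per minute per edge location; below that rate the cache has little to absorb.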

Container Optimisation

Containerised workloads introduce their own cost dynamics. The most common waste patterns are oversized container resource requests, idle replicas, and bloated container images that slow deployments and increase registry storage costs.

Set resource requests based on actual usage

Kubernetes resource requests determine how much capacity is reserved for each pod. If you request 1 CPU and 2GB RAM but your container typically uses 0.2 CPU and 400MB, you are reserving five times the capacity it needs. Multiply this by dozens of services and the waste compounds quickly.

# Before optimisation - based on guesswork
resources:
  requests:
    cpu: "1000m"
    memory: "2Gi"
  limits:
    cpu: "2000m"
    memory: "4Gi"

# After optimisation - based on P95 usage data
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "1Gi"
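One way to get from usage data to values like those above is to add a fixed headroom to the P95 figures and round up to a tidy step. The sketch below is an illustration under stated assumptions: the 25% headroom, the rounding steps (50m CPU, 128Mi memory), the 2x limit factor, and the function name are all choices made for this example, and the 0.2 CPU / 400MB usage from the text is treated as the P95 value.

```python
import math

def suggest_resources(p95_cpu_millicores, p95_memory_mib,
                      headroom=1.25, limit_factor=2):
    """Derive Kubernetes requests and limits from P95 usage (assumed policy)."""
    def round_up(value, step):
        return math.ceil(value / step) * step

    cpu_request = round_up(p95_cpu_millicores * headroom, 50)
    memory_request = round_up(p95_memory_mib * headroom, 128)
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{memory_request}Mi"},
        "limits": {
            "cpu": f"{cpu_request * limit_factor}m",
            "memory": f"{memory_request * limit_factor}Mi",
        },
    }

# 0.2 CPU and roughly 400 MiB, as in the example in the text.
print(suggest_resources(200, 400))
```

With these inputs the sketch arrives at the same requests as the "after" manifest (1024Mi is the same quantity as 1Gi). Treat the memory limit with particular care: a container that exceeds it is killed, whereas exceeding the CPU limit only throttles it.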

Monitoring and Cost Alerts

Cost optimisation is not a one-time project. Without ongoing monitoring, costs will drift upward as teams add resources and usage patterns change. Set up automated alerts that notify you when spending exceeds expected thresholds.
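Provider-native budget tools can raise these alerts for you; the underlying check is simple. The sketch below projects month-to-date spend linearly to the end of the month and compares it with a budget. The function names, the linear projection, and the figures are assumptions for illustration, and it expects you to supply the month-to-date spend from your billing export.

```python
import calendar
from datetime import date

def projected_month_end_spend(spend_to_date, today):
    """Linear projection of month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month

def should_alert(spend_to_date, monthly_budget, today, threshold=1.0):
    """True when projected spend exceeds threshold x budget."""
    projected = projected_month_end_spend(spend_to_date, today)
    return projected > monthly_budget * threshold

# Hypothetical figures: $6,000 spent by 10 June against a $15,000 budget.
today = date(2024, 6, 10)
print(projected_month_end_spend(6000, today))
print(should_alert(6000, 15000, today))
```

Alerting on the projection rather than on actual spend gives you warning in the second week of the month instead of the last, while there is still time to act.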

Tag everything

Implement a mandatory tagging policy for all cloud resources. At minimum, tag by environment (production, staging, development), team, and project. This lets you attribute costs accurately and identify which teams or projects are driving spend increases.
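A tagging policy only helps if it is enforced, ideally in the deployment pipeline. The check below is a minimal sketch of what such enforcement might look like: the tag keys and environment values come from the policy described above, while the function name and the plain-dictionary input are assumptions (in practice the tags would come from your infrastructure-as-code plan or the provider's API).

```python
REQUIRED_TAGS = ("environment", "team", "project")
ALLOWED_ENVIRONMENTS = {"production", "staging", "development"}

def tag_violations(tags):
    """Return a list of policy problems for one resource's tags (empty if compliant)."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    environment = tags.get("environment")
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {environment}")
    return problems

print(tag_violations({"environment": "production", "team": "payments", "project": "checkout"}))
print(tag_violations({"environment": "prod", "team": "payments"}))
```

Restricting the environment tag to a fixed set of values matters as much as requiring it: "prod", "Prod" and "production" would otherwise show up as three separate lines in a cost report.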

Kill zombie resources

Unattached EBS volumes, unused Elastic IPs, orphaned snapshots, and idle load balancers accumulate quietly. Run a monthly audit to identify and terminate resources that are no longer serving a purpose. In our experience, zombie resources account for 5-12% of total cloud spend in organisations that do not actively manage them.
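Part of the monthly audit is easy to automate. The sketch below filters unattached EBS volumes from a list of dictionaries shaped like the entries returned by the EC2 DescribeVolumes API, where a state of "available" means the volume is not attached to any instance. The function name and sample data are hypothetical, and the code deliberately makes no API calls and deletes nothing; it only produces a list for a human to review.

```python
def unattached_volumes(volumes):
    """Return (volume id, size in GiB) for volumes not attached to any instance."""
    return [
        (v["VolumeId"], v.get("Size", 0))
        for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]

# Hypothetical sample in the shape of a DescribeVolumes response.
sample = [
    {"VolumeId": "vol-0aaa", "State": "in-use", "Size": 100,
     "Attachments": [{"InstanceId": "i-0abc123def456"}]},
    {"VolumeId": "vol-0bbb", "State": "available", "Size": 500, "Attachments": []},
    {"VolumeId": "vol-0ccc", "State": "available", "Size": 50, "Attachments": []},
]

zombies = unattached_volumes(sample)
print(zombies)
print("unattached GiB:", sum(size for _, size in zombies))
```

Before deleting anything the audit turns up, check the tags and take a final snapshot where the data might still matter; an unattached volume is sometimes a deliberate backup.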

The Bottom Line

Cloud cost optimisation does not require exotic techniques or expensive third-party tools. The fundamentals - right-sizing instances, using reserved capacity for stable workloads, caching at the edge, and maintaining cost visibility - consistently deliver 30-50% savings across our client base. The key is treating cost as an engineering metric, not just a finance problem, and building optimisation into your regular operational cadence.
