Introduction

Amazon Simple Storage Service (S3) is a cloud storage solution known for its scalability, reliability, and security. S3 can be used in many ways and for many use cases; however, they all have one thing in common: the need for cost management.

 

Managing costs on S3 may be demanding for companies that migrate their on-premises solutions to the AWS cloud or for those with non-standardized storage management policies. Improperly managing S3 costs can cause expenses to rapidly escalate and have a serious impact on budgets.

 

This article gives a concise overview of how to optimize costs on S3 by implementing appropriate standards. I will explore strategies such as selecting the right storage classes, leveraging lifecycle policies, and managing data efficiently to help you keep your S3 expenditures under control. We will focus on best practices for storing and processing data files. By understanding the nuances of data file management in S3, you will be better prepared to reduce costs.

Ways to optimize costs

Storage classes

 

Amazon Simple Storage Service (S3) offers multiple storage classes that are designed for different storage needs, optimizing costs and performance. Each storage class is intended for a different use case, providing flexibility in how data is stored and accessed. Understanding these storage classes is crucial for effective data management and cost optimization in the cloud.

 

  • S3 Standard
    Should be used for frequently accessed data. It offers high durability, availability, and low latency.
  • S3 Intelligent-Tiering
    Automatically moves data between access tiers based on changing access patterns.
  • S3 Standard-IA (Infrequent Access)
    Should be used for infrequently accessed data. It offers lower storage costs with a retrieval fee.
  • S3 One Zone-IA
    Similar to Standard-IA but stored in a single availability zone. It offers lower costs for infrequently accessed data with slightly less durability.
  • S3 Glacier Instant Retrieval
    Designed for archival storage with immediate access to data. Data retrieval is instant.
  • S3 Glacier Flexible Retrieval
    Designed for long-term archival with retrieval times ranging from minutes to hours, offering very low storage costs for data that is rarely accessed.
  • S3 Glacier Deep Archive
    The lowest-cost storage option for data that is accessed very infrequently. Retrieval times range from 12 to 48 hours, making it ideal for data that needs to be preserved for years or decades.

 

You can find detailed descriptions of the storage classes in the AWS S3 documentation.

 

Lifecycle Policies

 

Lifecycle policies are sets of rules that define the actions Amazon S3 performs on a group of objects. These rules are used to move objects into different storage classes and to archive or delete them after a certain time. Lifecycle rules are divided into “Transition actions” and “Expiration actions”. For current object versions, actions are based on the creation date; for noncurrent versions, you define the number of days after which the action should be performed.

 

Transition actions

 

Transition actions are responsible for moving objects to another storage class based on their age, i.e. the number of days since they were created. Examples of transition actions:

  1. Move objects from S3 Standard to S3 Standard-IA 30 days after creation
  2. Move objects from S3 Standard to S3 One Zone-IA 60 days after creation
  3. Move objects from S3 Standard-IA to S3 Glacier 90 days after creation
  4. Move objects from S3 Glacier Flexible Retrieval to S3 Glacier Deep Archive 180 days after creation
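For illustration only, below is a minimal boto3 sketch of the first rule. The bucket name, rule ID, and prefix are placeholder assumptions, not part of any existing setup, and note that this call replaces the bucket's entire lifecycle configuration, so in practice all rules would be submitted together:

import boto3

s3 = boto3.client("s3")

# Move current object versions to S3 Standard-IA 30 days after creation
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",                      # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "standard-to-standard-ia-after-30-days",
                "Filter": {"Prefix": ""},         # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)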

 

Expiration actions

 

Expiration actions are responsible for permanently deleting objects, thus freeing up storage space and reducing costs. Examples of expiration actions:

  1. Delete objects 100 days after creation
  2. Permanently delete previous versions of objects older than 15 days
  3. Delete objects under a given prefix 15 days after creation
  4. Delete objects with a given tag 15 days after creation
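A similar minimal boto3 sketch for expiration rules, again with a placeholder bucket name and prefix (and the same caveat that the call overwrites the existing lifecycle configuration):

import boto3

s3 = boto3.client("s3")

# Delete objects 100 days after creation and remove noncurrent versions after 15 days
s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",                      # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-after-100-days",
                "Filter": {"Prefix": "temp/"},    # placeholder prefix
                "Status": "Enabled",
                "Expiration": {"Days": 100},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 15},
            }
        ]
    },
)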

 

Efficient Data Management

 

Aggregation of small files

 

Small files can cause inefficiencies and higher costs in Amazon S3 due to the overhead associated with storing and retrieving each individual object. Aggregating small files into larger objects can significantly reduce these costs and improve performance.

  • Reduced Storage Overhead: Amazon S3 charges for the storage used, and every object adds its own metadata and request overhead. Combining small files into larger objects reduces the number of objects and, with it, the associated overhead
  • Lower Request Costs: Each PUT, GET, and LIST request incurs a cost. Aggregating files reduces the number of requests needed to manage and access your data
  • Improved Performance: Fewer objects mean fewer metadata operations, which can lead to faster access times and more efficient data retrieval
  • Potentially fewer requests elsewhere:
    • Other services: e.g. with KMS, you reduce the number of encrypt and decrypt requests
    • Data transfer
    • S3 Replication
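As a simple illustration of such aggregation, here is a minimal PySpark sketch that reads many small objects and rewrites them as a few larger ones. It assumes a SparkSession is already available, uses placeholder bucket and prefix names, assumes Parquet input, and the target file count would need to be tuned to the actual data volume:

# Minimal aggregation sketch – paths and the output file count are placeholders
small_files = spark.read.parquet("s3://example-bucket/small-files/")

(
    small_files
    .coalesce(8)                                  # write far fewer, larger files
    .write
    .mode("overwrite")
    .parquet("s3://example-bucket/aggregated/")
)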

 

Data compression

 

Data compression is another effective strategy for reducing storage costs and improving data transfer efficiency. Compressing data before storing it in Amazon S3 reduces the amount of storage space required and can lower data transfer costs. Currently, Amazon S3 does not offer native data compression, so data must be compressed by the data provider or the user before files are uploaded to S3 (and decompressed after they are downloaded).

  • Reduced Storage Costs: Compressed data occupies less space, which directly translates to lower storage costs
  • Faster Data Transfers: Compressed files are smaller, resulting in faster uploads and downloads, and reduced data transfer costs
  • Improved Performance: Smaller file sizes mean quicker retrieval times, enhancing the overall performance of data access

 

Example of data compression:

 

When you work with CSV files, it is beneficial to store them in a compressed .gz format. PySpark can decompress these files when loading data, allowing compressed CSV files to be read directly.

 

# Spark reads gzip-compressed CSV files transparently; no manual decompression is needed
df = spark.read.format("csv").option("header", "true").load("datafile.csv.gz")
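The reverse direction is also possible: if you produce CSV output yourself, Spark can gzip it on write. A minimal sketch, assuming the same df as above and a placeholder output path:

# Write the DataFrame back as gzip-compressed CSV (output path is a placeholder)
(
    df.write
    .option("header", "true")
    .option("compression", "gzip")
    .mode("overwrite")
    .csv("s3://example-bucket/compressed-output/")
)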

 

Algorithm for decreasing S3 costs

 

The diagram below shows how to approach effective cost management on S3 from a data storage perspective.

 


Use cases

Introduction

 

The presented cases are among the most common. They show typical storage management problems, possible solutions, and real benefits resulting from properly applied cost management practices.

 

Case #1

 

Brief introduction

 

A company needs to reduce costs and is looking for solutions. It uses many AWS services and has conducted an audit, which showed that S3 is one of the most expensive ones. The company asked for a cost-effective S3 storage management policy.

 

As-is state:

  • All S3 objects are stored in the S3 Standard storage class
  • No lifecycle policies are set
  • The company receives 3k files per month
  • Expected average new file size for the next 12 months: ~1 GB
  • There is one bucket to optimize
    • Raw bucket – where source files are stored
      • Files are accessed for 30 days and then only once or twice a year, if at all
      • Files should be available immediately when needed
      • Data cannot be deleted
      • CSV file format
      • Region: us-east-1
      • 600k objects
      • Total size: 60 TB
      • Storage cost: $1370 per month
  • Files are not divided into many small parts

 

Possibilities to reduce costs

 

In this scenario, the best storage classes to use are S3 Standard and S3 Glacier Instant Retrieval. S3 Standard will hold files during the first 30 days, while they are actively accessed; after that time, they should be transitioned to S3 Glacier Instant Retrieval. Changing the file format from CSV to Parquet will reduce the total file size by 2 to 10 times. This solution provides high durability, availability, and low latency, and is more cost-efficient.

 

To achieve this, there are two possible steps to reduce costs:

  • Create a lifecycle rule to move objects to the Glacier Instant Retrieval storage class 30 days after object creation
  • Change the file format to Parquet – prepare a job to convert CSV files to Parquet (a minimal conversion sketch is shown below)
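A minimal sketch of such a conversion job is shown below. It assumes Spark with S3 access is already configured and uses placeholder bucket and prefix names; a real job would add schema handling, partitioning, and error handling:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Read the raw CSV files (gzip-compressed CSVs are decompressed transparently)
raw = spark.read.option("header", "true").csv("s3://raw-bucket/incoming/")

# Rewrite them as Parquet, which is typically several times smaller
raw.write.mode("overwrite").parquet("s3://raw-bucket/parquet/")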

 

Costs were calculated in June 2024. Current pricing: S3 pricing documentation.

 

Solution | Objects number | Total size | Storage class | Cost per month
As-is | 600k | 60TB | S3 Standard | 1 370,00 USD
CSV to Parquet | 600k | 6TB-30TB | S3 Standard | 138,00 – 690,00 USD
Glacier Instant Retrieval | 600k | 57TB + 3TB | Glacier Instant Retrieval + S3 Standard | 297,00 USD
Parquet + Glacier Instant Retrieval | 600k | 5,7TB-28,5TB + 3TB | Glacier Instant Retrieval + S3 Standard | 91,80 – 183,00 USD

 

We can see the difference between the variants from a cost-per-month perspective. The file format transformation significantly reduces costs even with the S3 Standard storage class, as does moving unused objects to Glacier Instant Retrieval. The combination of the two gives the best result.

 

Let’s compare costs from a 12-month perspective, including transition costs and data delivery.

 

Month | Objects number | CSV total size | Parquet total size | As-is solution | Parquet + Glacier Instant Retrieval (lowest) | Parquet + Glacier Instant Retrieval (highest) | Transition costs
1 | 600 000 | 60TB | 6TB-30TB | 1 370,00 USD | 91,80 USD | 183,00 USD | 5,97 USD
2 | 603 000 | 63TB | 6,3TB-31,5TB | 1 436,00 USD | 93,00 USD | 189,00 USD | 0,03 USD
3 | 606 000 | 66TB | 6,6TB-33TB | 1 502,00 USD | 94,20 USD | 195,00 USD | 0,03 USD
4 | 609 000 | 69TB | 6,9TB-34,5TB | 1 568,00 USD | 95,40 USD | 201,00 USD | 0,03 USD
5 | 612 000 | 72TB | 7,2TB-36TB | 1 634,00 USD | 96,60 USD | 207,00 USD | 0,03 USD
6 | 615 000 | 75TB | 7,5TB-37,5TB | 1 700,00 USD | 97,80 USD | 213,00 USD | 0,03 USD
7 | 618 000 | 78TB | 7,8TB-39TB | 1 766,00 USD | 99,00 USD | 219,00 USD | 0,03 USD
8 | 621 000 | 81TB | 8,1TB-40,5TB | 1 832,00 USD | 100,20 USD | 225,00 USD | 0,03 USD
9 | 624 000 | 84TB | 8,4TB-42TB | 1 898,00 USD | 101,40 USD | 231,00 USD | 0,03 USD
10 | 627 000 | 87TB | 8,7TB-43,5TB | 1 964,00 USD | 102,60 USD | 237,00 USD | 0,03 USD
11 | 630 000 | 90TB | 9TB-45TB | 2 030,00 USD | 103,80 USD | 243,00 USD | 0,03 USD
12 | 633 000 | 93TB | 9,3TB-46,5TB | 2 096,00 USD | 105,00 USD | 249,00 USD | 0,03 USD
SUM | | | | 20 796,00 USD | 1 180,80 USD | 2 592,00 USD | 6,30 USD

 

Summary

 

The predicted costs over 12 months show a huge difference: the company can save roughly 18 000–19 600 USD in 12 months compared with the current solution.

 

Case #2

 

Brief introduction

 

A company is looking for cost improvements for the S3 service and has requested an audit.

 

As-is state:

  • All S3 objects are stored in appropriate storage classes
  • Lifecycle policies are set:
    • Transition actions – move objects to the appropriate storage class
    • Expiration actions – remove objects at the appropriate time
  • The company receives ~5k files per month
    • On average, each logical file is delivered in ~20 parts, so the ~5k objects correspond to ~250 files
      • Average file (part) size: ~5 MB
  • The company uses one bucket
    • Parquet file format
    • Region: us-east-1
  • There is no aggregation job, but it would be possible to create one

 

Possibilities to reduce costs

 

In this scenario, the lifecycle policies are used correctly, so no improvement is needed there, and the file format cannot be changed. It is, however, possible to aggregate data by combining many small files into one, which will reduce the number of requests and be a more cost-effective solution.

 

To achieve this, there are two possible options to reduce costs:

 

  • Create a job to combine files
  • Instruct data providers to send data in one file

 

Creating a job to combine files is not the most cost-effective option, but sometimes it is the only way. You have to build and maintain the new solution, which increases operational costs, and it does not reduce costs as much as the second approach. The better option is to shift this work directly to the data providers, but their cooperation and flexibility are required. The two approaches are compared below:

 

Aspect | Aggregate files in AWS | Instruct data providers to send a single file
Implementation Complexity | Moderate | Low
Cost | Medium | Low
Maintenance | High | Low
Operational Overhead | High | Low
Time to Implement | Moderate to High | Low to Moderate (could be longer)
Integration Complexity | High | Low
Cost Predictability | Moderate | High
Provider Cooperation | Not Required | Essential

 

Summary

 

Our recommendation is to instruct data providers to send the data in a single file. This solution results in lower costs and effort, but cooperation with the data providers is crucial.

 

Case #3

 

Brief introduction

 

A company is looking for cost improvements for the S3 service and has requested an audit.

 

As-is state:

  • All S3 objects are stored in appropriate storage classes
  • Lifecycle rules are set up properly
  • The company uses one bucket
    • Region: us-east-1
    • Parquet file format
    • ~28 million objects
    • Total size: ~3 TB
    • Storage cost: ~ $80 per month
    • Number of API requests: ~1 000 000 000 000
      • Cost: ~ $600 per month
    • Number of requests to KMS: ~230 000 000
      • Cost: ~ $700 per month
    • Number of requests to GuardDuty: ~1 000 000 000 000
      • Cost: ~ $500 per month
    • The company has too many small files: ~100 KB per object

 

Possibilities to reduce costs

 

Implementing Delta Lake can help reduce the costs associated with excessive API requests and inefficient storage usage caused by too many small files. Delta Lake uses compaction techniques that combine smaller files into larger ones. To realize these savings, you need to integrate Delta Lake into your existing environment.
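A minimal sketch of such a compaction run is shown below. It assumes Spark with the Delta Lake library (delta-spark) is already configured and that the table path is a placeholder for an existing Delta table:

from delta.tables import DeltaTable

# Placeholder path to an existing Delta table on S3
delta_table = DeltaTable.forPath(spark, "s3://example-bucket/delta/events")

# Compact many small files into fewer, larger ones
delta_table.optimize().executeCompaction()

# Optionally remove data files no longer referenced by the table (168 hours = 7 days)
delta_table.vacuum(168)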

 

Cost Category | Before Delta Lake | After Delta Lake | Savings
S3 Storage | 80,00 USD | ~ 80,00 USD | 0,00 USD
S3 API Requests | 600,00 USD | 240,00 USD | 360,00 USD
KMS Requests | 700,00 USD | 280,00 USD | 420,00 USD
GuardDuty Requests | 500,00 USD | 200,00 USD | 300,00 USD
Total Monthly Cost | 1 880,00 USD | 800,00 USD | 1 080,00 USD

 

It is worth mentioning that the processing itself is also 5 times faster than before (previously it took around 12 hours).

 

Summary

 

Implementing the Delta Lake solution has resulted in savings of approximately $1 080 per month.

Summary

Cost-effective S3 management can be implemented with a few simple rules or with S3 Intelligent-Tiering, or with very complex configurations containing dedicated rules for specific locations. Proper S3 management is very important for companies dealing with Big Data, as data volumes reach enormous proportions. Failing to manage S3, or managing it in the wrong way, can have a serious impact on budgets.

 

Each case is different, even if they look very similar. The key factors are constraints, client and environment requirements, and, finally, cooperation during the analysis. Openness and cooperation are key to a reliable analysis, which in turn has a huge impact on further actions.