S3 Replication: Keep Your Data Secure and Always Available

Introduction

In the previous article of this series, I showed you how to use Presigned URLs to grant temporary access to your objects. Today, we’re taking a step up in data architecture and talking about S3 Replication. This is one of those features that seem simple to activate but have technical details that can save your life (or give you a headache) if you don’t know them well.

Context

Imagine you work for a company that stores critical customer documents in an S3 bucket in the Virginia (us-east-1) region. Everything is going great until, because of a legal requirement or a Disaster Recovery strategy, you’re told that this data must also physically exist in Ireland (eu-west-1).

Or maybe you have a data analysis team in the same region that needs to work with an exact copy of production data without the risk of deleting it or affecting performance. How do you keep two buckets automatically synchronized without writing complex scripts? The answer is replication.

Replication Types: CRR vs SRR

AWS offers two types of replication, depending on where the destination bucket is located:

  • Cross-Region Replication (CRR): Used to copy objects between buckets in different AWS regions. It’s ideal for complying with regulations that require geographical distance or for minimizing latency for users on different continents.
  • Same-Region Replication (SRR): Here, the destination is in the same region as the source. It’s very useful for aggregating logs from several buckets into one or for synchronizing development and test environments with real data.
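To make the distinction concrete, here is a minimal sketch that tells you which type a given pair of buckets would use. It assumes a boto3-style S3 client is passed in; the function names are my own and not part of any AWS API. One detail worth knowing: get_bucket_location reports us-east-1 as an empty (None) LocationConstraint, and some old buckets report eu-west-1 as "EU".

```python
def normalize_region(location_constraint):
    """Map the LocationConstraint from get_bucket_location to a region name."""
    if location_constraint is None:   # S3 reports us-east-1 as null
        return "us-east-1"
    if location_constraint == "EU":   # legacy alias for eu-west-1
        return "eu-west-1"
    return location_constraint


def replication_type(s3_client, source_bucket, destination_bucket):
    """Return "SRR" if both buckets are in the same region, otherwise "CRR"."""
    regions = [
        normalize_region(
            s3_client.get_bucket_location(Bucket=name).get("LocationConstraint")
        )
        for name in (source_bucket, destination_bucket)
    ]
    return "SRR" if regions[0] == regions[1] else "CRR"
```

With real credentials you would call it as replication_type(boto3.client("s3"), "docs-prod", "docs-replica"), where both bucket names are hypothetical.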

Fundamental Requirements

For the magic to happen, just clicking a button isn’t enough. There are two golden rules you must follow:

  1. Versioning: Both the source and destination buckets must have versioning enabled. Replication works on object versions, so without it S3 cannot track which object to copy or how to handle changes.
  2. IAM Role: You need an IAM role that the S3 service can assume, with permissions to read object versions from the source and write replicas to the destination.
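The two rules above can be sketched in a few lines of Python. This is a minimal example, not a production policy: the helper names are my own, the bucket names are placeholders you pass in, and the action list follows the basic setup for unencrypted objects (KMS-encrypted objects need additional permissions). The versioning check assumes a boto3-style S3 client.

```python
def versioning_enabled(s3_client, bucket):
    """True only when the bucket's versioning status is exactly "Enabled"."""
    # A never-versioned bucket returns no "Status" key at all.
    return s3_client.get_bucket_versioning(Bucket=bucket).get("Status") == "Enabled"


def replication_trust_policy():
    """Trust policy that lets the S3 service assume the replication role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "s3.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def replication_permissions_policy(source_bucket, destination_bucket):
    """Permissions policy: read versions from the source, write replicas to the destination."""
    src = f"arn:aws:s3:::{source_bucket}"
    dst = f"arn:aws:s3:::{destination_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # bucket-level reads on the source
                "Effect": "Allow",
                "Action": ["s3:GetReplicationConfiguration", "s3:ListBucket"],
                "Resource": src,
            },
            {   # object-level reads on the source
                "Effect": "Allow",
                "Action": [
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                ],
                "Resource": f"{src}/*",
            },
            {   # object-level writes on the destination
                "Effect": "Allow",
                "Action": ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"],
                "Resource": f"{dst}/*",
            },
        ],
    }
```

You would serialize these dictionaries with json.dumps and hand them to IAM (for example through create_role and put_role_policy) when creating the role.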

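Important: Data replication is asynchronous. This means that when you upload a file, it doesn’t appear instantly in the destination. Most objects replicate within 15 minutes, but large objects or heavy load can make it take longer. If you need a predictable window, S3 Replication Time Control (RTC) is designed to replicate 99.99% of new objects within 15 minutes.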
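Because replication is asynchronous, a script that needs the replica has to wait for it. One way to do that is to poll the ReplicationStatus that head_object returns for the source object. The sketch below assumes a boto3-style S3 client; the function name, the default timeout, and the injectable sleep parameter are my own choices for illustration.

```python
import time


def wait_for_replication(s3_client, bucket, key, timeout_s=900, poll_s=10,
                         sleep=time.sleep):
    """Poll the source object until it is no longer PENDING or the timeout expires.

    Returns the last status seen: typically "COMPLETED" or "FAILED",
    "PENDING" if the timeout expired, or None if no replication rule
    applies to the object.
    """
    waited = 0
    while True:
        status = s3_client.head_object(Bucket=bucket, Key=key).get("ReplicationStatus")
        if status != "PENDING" or waited >= timeout_s:
            return status
        sleep(poll_s)
        waited += poll_s
```

Polling is fine for a one-off check. For anything continuous, S3 event notifications or replication metrics are probably a better fit than a loop like this.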

What’s Replicated and What’s Not?

This is where many get confused, so pay attention to these details:

New vs. Existing Objects

By default, replication only works for objects you upload after enabling the rule. If you already have 1 TB of data and want to replicate it, S3 won’t do it automatically. For that, you’ll need to use S3 Batch Replication.
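As a rough sketch of what a Batch Replication job looks like, the function below builds the arguments for the S3 Control create_job call, asking S3 to generate the manifest itself from objects that were never replicated or that failed. The function name is my own, and I’m assuming the role you pass in is one that S3 Batch Operations can assume (it is not the same trust relationship as the regular replication role). Treat the exact field values as a starting point to verify against the boto3 reference before running it.

```python
import uuid


def batch_replication_job_kwargs(account_id, source_bucket, batch_role_arn):
    """Build keyword arguments for s3control.create_job (Batch Replication)."""
    return {
        "AccountId": account_id,
        "ConfirmationRequired": False,          # start without manual confirmation
        "Operation": {"S3ReplicateObject": {}},
        "Priority": 1,
        "RoleArn": batch_role_arn,
        "ClientRequestToken": str(uuid.uuid4()),  # idempotency token
        "Report": {"Enabled": False},           # no completion report in this sketch
        "ManifestGenerator": {
            "S3JobManifestGenerator": {
                "ExpectedBucketOwner": account_id,
                "SourceBucket": f"arn:aws:s3:::{source_bucket}",
                "EnableManifestOutput": False,
                "Filter": {
                    "EligibleForReplication": True,
                    # objects with no status yet, plus earlier failures
                    "ObjectReplicationStatuses": ["NONE", "FAILED"],
                },
            }
        },
    }
```

The call itself would then be boto3.client("s3control").create_job(**kwargs). A replication rule must already exist on the source bucket, because the job only replicates what the rules cover.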

Handling Deletions

  • Delete Markers: If you delete an object without specifying a version ID, S3 creates a delete marker. You can configure replication so that this marker is also copied to the destination.
  • Deletions with VersionID: If you decide to permanently delete a specific version of an object (using its versionId), that action is NOT replicated. This is an AWS security measure to prevent an accidental or malicious deletion at the source from destroying the backups at the destination.
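Here is how the delete marker choice shows up in an actual replication rule. This is a minimal sketch of the configuration you would pass to put_bucket_replication: the function name and the rule ID are hypothetical, and the rule replicates the whole bucket (empty prefix). Notice there is no equivalent switch for version-specific deletions, since those are never replicated.

```python
def replication_configuration(role_arn, destination_bucket,
                              replicate_delete_markers=True):
    """Build a one-rule ReplicationConfiguration for put_bucket_replication."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "replicate-everything",        # hypothetical rule name
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},            # empty prefix = whole bucket
            "DeleteMarkerReplication": {
                "Status": "Enabled" if replicate_delete_markers else "Disabled",
            },
            "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
        }],
    }
```

To apply it, you would call boto3.client("s3").put_bucket_replication(Bucket=source_bucket, ReplicationConfiguration=config) against the source bucket.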

No Chaining

If Bucket A replicates to Bucket B, and Bucket B has a rule to replicate to Bucket C, the objects that arrived in B from A will not be sent to C. Replication is not transitive.
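A toy model can make this easier to picture. The code below is not AWS code and talks to no API; it just simulates the rule that an object which arrived as a replica (status REPLICA) is skipped by the rules of the bucket it landed in. The data layout and function name are invented for the illustration.

```python
def run_live_replication(buckets, rules):
    """Simulate one pass of live replication.

    buckets: {bucket_name: {key: status}}, where status is None for a
             freshly uploaded object.
    rules:   {source_bucket: destination_bucket}
    """
    for source, destination in rules.items():
        for key, status in list(buckets[source].items()):
            if status in ("REPLICA", "COMPLETED"):
                continue  # replicas and already-replicated objects are skipped
            buckets[destination][key] = "REPLICA"
            buckets[source][key] = "COMPLETED"
    return buckets


buckets = {"A": {"report.pdf": None}, "B": {"direct.txt": None}, "C": {}}
run_live_replication(buckets, {"A": "B", "B": "C"})
# report.pdf reaches B as a replica and stops there;
# direct.txt was uploaded to B directly, so B's rule sends it on to C.
```

If you really need the data in all three buckets, the practical takeaway is to give A a rule for each destination instead of relying on B to pass things along.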

Use Cases

  • Compliance and Data Sovereignty: Keeping copies at a mandatory minimum distance.
  • Security: Replicating to a different AWS account to protect data against a total compromise of the source account.
  • Efficiency: Bringing data closer to your end users or your compute instances in other regions.
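The cross-account case in the Security bullet needs two extra pieces, sketched below under some assumptions: the destination account attaches a bucket policy that trusts the source account’s replication role, and the rule’s destination hands ownership of the replicas to the destination account. The function names are my own, and the action list follows the commonly documented cross-account pattern, so check it against your own requirements.

```python
def destination_bucket_policy(source_role_arn, destination_bucket):
    """Bucket policy for the destination account, trusting the source replication role."""
    bucket_arn = f"arn:aws:s3:::{destination_bucket}"
    principal = {"AWS": source_role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # allow the source role to write replicas and change their owner
                "Sid": "AllowReplicaWrites",
                "Effect": "Allow",
                "Principal": principal,
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ObjectOwnerOverrideToBucketOwner",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {   # allow the source role to inspect the destination bucket
                "Sid": "AllowBucketChecks",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:List*", "s3:GetBucketVersioning", "s3:PutBucketVersioning"],
                "Resource": bucket_arn,
            },
        ],
    }


def cross_account_destination(destination_bucket, destination_account_id):
    """Destination block of a replication rule that transfers replica ownership."""
    return {
        "Bucket": f"arn:aws:s3:::{destination_bucket}",
        "Account": destination_account_id,
        "AccessControlTranslation": {"Owner": "Destination"},
    }
```

Transferring ownership matters for the security argument: if the replicas stayed owned by the source account, compromising that account could still put them at risk.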

See You Soon

Setting up replication is a fundamental step for any self-respecting cloud architect. Always remember to check the versioning status before you start.

This is the seventh article in this series. In the next post, we’ll explore S3 performance topics using Multipart Upload and S3 Transfer Acceleration. I’ll see you there to continue diving deeper into the world of S3!

