S3 Core Concepts

Amazon Simple Storage Service (S3) launched in 2006 as one of the first AWS services. S3 provides general-purpose object storage in the cloud. Objects can be files, source code, media, or data. S3 is generally cheap, highly available, highly durable, and integrates with many other AWS services. It is useful for website hosting, database backups, data processing, and storing software artifacts. The notes below cover S3 core concepts.

S3 AWS documentation

Short video on S3 basics

Core concepts

objects are stored in buckets; bucket names are globally unique
the 5TB maximum size applies to a single object; a bucket has no total size limit
a bucket is a flat key-value object store, not a real file system; key prefixes such as logs/2024/ act like folders
each object is identified by a key that is unique within its bucket
create bucket and structure, put objects in bucket, retrieve objects from bucket
retrieve objects via http url, only for public objects; buckets are private by default
retrieve object via boto3 s3client.get_object()
retrieve object via aws console
control bucket access and operations with bucket policy or IAM policy
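
The put and retrieve steps above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: put_text, get_text, and InMemoryS3 are hypothetical helper names, and the bucket and key are made up. InMemoryS3 only mimics the two boto3 client calls used here (put_object and get_object) so the example runs without AWS credentials.

```python
import io


def put_text(client, bucket: str, key: str, text: str) -> None:
    """Store a UTF-8 string as an object under bucket/key."""
    client.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))


def get_text(client, bucket: str, key: str) -> str:
    """Fetch an object and decode its body as UTF-8."""
    response = client.get_object(Bucket=bucket, Key=key)
    # The "Body" value is a stream; read() returns the raw bytes.
    return response["Body"].read().decode("utf-8")


class InMemoryS3:
    """Hypothetical stand-in for an S3 client, used only for local runs."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = Body
        return {}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


if __name__ == "__main__":
    # With real credentials, swap in: import boto3; client = boto3.client("s3")
    client = InMemoryS3()
    put_text(client, "example-notes-bucket", "notes/hello.txt", "hello s3")
    print(get_text(client, "example-notes-bucket", "notes/hello.txt"))
```

Passing the client in as a parameter keeps the helpers easy to test and leaves credential handling to the caller.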

Storage classes

storage class is set per object, so objects in the same bucket can use different classes
colder storage classes lower the storage cost but add retrieval latency and retrieval fees
standard, intelligent-tiering, infrequent access, glacier
different pricing, latency, and availability
hot data, infrequent access, and cold data cycle through storage classes
lifecycle rules automate storage tier cycling
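
A rule that moves objects from hot to colder classes might look like the sketch below. The helper names, the rule ID, and the 30 and 90 day thresholds are assumptions chosen for illustration; the dictionary follows the shape that the boto3 call put_bucket_lifecycle_configuration accepts.

```python
def build_lifecycle_rules(prefix: str = "", ia_after_days: int = 30,
                          glacier_after_days: int = 90) -> dict:
    """Return a lifecycle configuration that moves objects to colder classes."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier transition must come after infrequent access")
    return {
        "Rules": [
            {
                "ID": "hot-to-cold",           # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix matches every object
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(client, bucket: str, config: dict) -> None:
    """Attach the rules to a bucket using a boto3 S3 client."""
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Building the configuration separately from applying it makes the rule easy to review before it touches a real bucket.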

Security

misconfigured S3 buckets can easily leak data
public access is blocked by default
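objects can be encrypted in transit (https) and at rest; new objects get server-side encryption by default, while https-only access and KMS keys have to be configured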
access and resource controls with AWS IAM
logging and alarms integrate with cloudwatch, has to be configured
AWS secures the underlying cloud infrastructure; bucket configuration is your responsibility (shared responsibility model)
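
One common way to enforce encryption in transit is a bucket policy that denies any request not sent over https. The sketch below builds such a policy as JSON; tls_only_policy is a hypothetical helper name and the bucket name is a placeholder. The policy string could then be attached with the boto3 call put_bucket_policy(Bucket=..., Policy=...).

```python
import json


def tls_only_policy(bucket: str) -> str:
    """Return a bucket policy (JSON string) that denies non-https requests."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # aws:SecureTransport is "false" when the request is plain http.
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(tls_only_policy("example-notes-bucket"))
```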

S3 in action

data ingestion pipeline
- API based stock data stream
- kinesis firehose batching of independent events
- deliver batches to s3
- batch event triggers lambda
- batch object is used by lambda
S3 events for object creation, deletion, modification
- trigger event to invoke lambda, pass bucket name and object key
- pull object, do operation, return object to bucket, or do something else
s3 and lambda are a useful combination
analytics and dashboarding
- athena analytics service on bucket contents
- bucket objects have to follow a schema
- pay per use, no database provisioning
- athena, s3, quicksight
event driven architectures
- image uploading pipeline
- customer, s3, lambda, appsync
- s3 to lambda put notification, run process on lambda
- notify customer download is ready with pub/sub model
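
The S3 event to lambda flow above, where the event passes bucket name and object key, can be sketched as a small handler. This is a minimal sketch under assumptions: extract_s3_objects is a hypothetical helper, and the handler only reports which objects arrived instead of processing them. The field names follow the S3 event notification payload.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (spaces become "+"), so decode them first.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Lambda entry point: list the objects that triggered this invocation."""
    objects = extract_s3_objects(event)
    # A real pipeline would fetch each object here (for example with
    # get_object), process it, and write the result back or elsewhere.
    return {"processed": [f"{bucket}/{key}" for bucket, key in objects]}
```

Decoding the key matters in practice: an upload named "my photo.jpg" shows up in the event as "my+photo.jpg", and a lookup with the raw value would fail.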

Pricing

dependent on storage classes
three main factors
- stored object size and number
- accessing objects over network with requests such as GET and PUT
- transferring objects between buckets and services
example monthly cost for 100GB storage, 10K PUT, 10 read
- standard 6.76 / month
- infrequent access 6.27 / month
- glacier 6.64 / month
free tier
- 5GB storage, 20K GET, 2K PUT
- 12 month max
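
The three pricing factors above combine into a simple sum. The sketch below shows only the arithmetic: monthly_cost is a hypothetical helper, and the rates in EXAMPLE_RATES are placeholders for illustration, not real AWS prices, so the result will not match the monthly figures listed above.

```python
def monthly_cost(storage_gb, put_requests, get_requests, transfer_out_gb, rates):
    """Sum the three cost factors: storage, requests, and data transfer."""
    storage = storage_gb * rates["storage_per_gb"]
    # Request pricing is assumed to be quoted per 1,000 requests.
    requests = ((put_requests / 1000) * rates["put_per_1k"]
                + (get_requests / 1000) * rates["get_per_1k"])
    transfer = transfer_out_gb * rates["transfer_per_gb"]
    return round(storage + requests + transfer, 2)


# Placeholder rates for illustration only; NOT real AWS prices.
EXAMPLE_RATES = {
    "storage_per_gb": 0.02,
    "put_per_1k": 0.005,
    "get_per_1k": 0.0004,
    "transfer_per_gb": 0.09,
}

if __name__ == "__main__":
    print(monthly_cost(100, 10_000, 10, 0, EXAMPLE_RATES))
```

With rates like these, storage dominates the total and a handful of requests barely registers, which is why the storage class choice tends to matter most for data that is rarely read.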