Learn Amazon Kinesis Data Streams for Developers

Aisha's team built a real-time fraud detection system. Every payment generated a stream of events — card details (hashed), location, amount, device fingerprint — at peak about 8,000 events per second. They needed multiple downstream systems to read this stream: a fraud model, a fraud analyst dashboard, and an archival pipeline writing to S3. SNS couldn't do it. EventBridge couldn't sustain that throughput. The right answer is Amazon Kinesis Data Streams.

This chapter is about when Kinesis is the right tool, how its model differs from SNS and SQS, and the pieces a developer touches.

What Kinesis Data Streams Is

Kinesis Data Streams is a real-time streaming service. Producers write records to a stream; consumers read those records in order. Unlike SQS, records are not deleted when consumed — they stay in the stream for a retention period and can be re-read by multiple consumers independently.

The core differences from messaging services:

Order is preserved within a shard (a partition of the stream);
Records persist for a configurable retention — 24 hours by default, up to 365 days;
Multiple consumers can read the same data independently, at their own pace;
Designed for high throughput — thousands to millions of records per second;
Consumers track their own position with a sequence number or shard iterator.

The mental model is closer to a log file than a queue.
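The log-like model can be illustrated with a small in-memory sketch. This is not the Kinesis API: `MiniLog` and its methods are hypothetical names, and a plain list index stands in for a sequence number.

```python
class MiniLog:
    """Toy append-only log: records stay put, each reader keeps its own position."""

    def __init__(self):
        self._records = []     # records are never removed when read
        self._positions = {}   # consumer name -> index of the next record to read

    def put(self, data):
        self._records.append(data)
        return len(self._records) - 1  # stand-in for a sequence number

    def read(self, consumer, limit=10):
        start = self._positions.get(consumer, 0)
        batch = self._records[start:start + limit]
        self._positions[consumer] = start + len(batch)
        return batch

    def rewind(self, consumer, position=0):
        # Replay: move this consumer's position back without affecting others.
        self._positions[consumer] = position


log = MiniLog()
for event in ["pay-1", "pay-2", "pay-3"]:
    log.put(event)

fraud_batch = log.read("fraud-model")              # reads everything so far
dashboard_batch = log.read("dashboard", limit=2)   # separate, slower reader
log.rewind("fraud-model")
replayed = log.read("fraud-model")                 # same records, read again
```

A queue would have handed each record to one reader and then dropped it; here both readers see the same records and either can go back.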

Shards: The Unit of Parallelism

A stream is divided into shards. Each shard provides:

Up to 1 MB/second of write throughput and up to 1,000 records/second, whichever limit is reached first;
2 MB/second of read throughput shared by all standard consumers, or a dedicated 2 MB/second per consumer with Enhanced Fan-Out.

You provision the shard count, or use on-demand mode for automatic scaling. Adding shards adds parallelism, but also cost.
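As a rough illustration of the write-side shard math, the sketch below estimates a shard count from an event rate and an average record size. The function name and the record sizes are assumptions for illustration, and 1 MB is treated as 1,000,000 bytes to keep the arithmetic simple.

```python
WRITE_BYTES_PER_SHARD = 1_000_000   # 1 MB/second per shard (decimal MB assumed)
WRITE_RECORDS_PER_SHARD = 1_000     # 1,000 records/second per shard


def ceil_div(a, b):
    return -(-a // b)


def shards_for_writes(records_per_sec, avg_record_bytes):
    """Smallest shard count that satisfies both per-shard write limits."""
    by_bytes = ceil_div(records_per_sec * avg_record_bytes, WRITE_BYTES_PER_SHARD)
    by_count = ceil_div(records_per_sec, WRITE_RECORDS_PER_SHARD)
    return max(by_bytes, by_count, 1)


# 8,000 events/second at an assumed 500 bytes each: the record-count limit dominates.
small_records = shards_for_writes(8_000, 500)
# The same rate at an assumed 2,000 bytes each: the byte limit dominates.
large_records = shards_for_writes(8_000, 2_000)
```

This assumes traffic is spread evenly across shards; a skewed partition key would need headroom on top of the estimate.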

Partition Keys

When you write a record, you supply a partition key — a string. Kinesis hashes the key to decide which shard the record lands on. Records with the same partition key always go to the same shard, preserving order for that key.

For Aisha's fraud system, the partition key was the card hash, so all transactions for a given card stayed in order on one shard. A partition key with too few distinct values concentrates traffic on a hot shard. A key with many distinct values spreads load evenly, but ordering is only guaranteed among records that share the same key.
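The key-to-shard mapping can be sketched locally. Kinesis uses an MD5 hash to map a partition key to a 128-bit integer; the sketch below assumes the shards split that hash space into equal ranges, which holds for a freshly created stream but not necessarily after resharding. The function names and sample keys are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key):
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_index(partition_key, shard_count):
    """Index of the shard whose hash range contains the key (equal ranges assumed)."""
    return hash_key(partition_key) * shard_count // HASH_SPACE


# Many distinct keys (for example, card hashes) spread across all shards.
spread = Counter(shard_index(f"card-{i}", 8) for i in range(10_000))

# Two distinct keys (for example, a country code) can use at most two shards.
hot = Counter(shard_index(key, 8) for key in ["US", "GB"] * 5_000)
```

Comparing `spread` and `hot` shows the hot-shard effect: the second counter puts all 10,000 records on one or two shards regardless of how many shards the stream has.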

Producers and Consumers

Producers write records using:

PutRecord / PutRecords API calls;
The Kinesis Producer Library (KPL) for batching and retries at high throughput;
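Built-in AWS service integrations, such as CloudWatch Logs subscription filters or DynamoDB change data capture, which write into a stream without custom producer code.

Consumers read records using:

Amazon Data Firehose, which can read from a stream and deliver the data to S3 or other destinations;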
The Kinesis Client Library (KCL) — the standard for production consumers;
Direct GetRecords API calls;
AWS Lambda as a consumer, via event source mapping;
Enhanced Fan-Out for low-latency, high-throughput dedicated consumers.

Lambda as a Kinesis consumer is the easy default for most workloads.
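A minimal sketch of a Lambda consumer follows. When Lambda reads from a stream, each record in the batch carries its payload base64-encoded under `kinesis.data`. The JSON payload shape, the `decode_records` helper, and the sample event are assumptions for illustration.

```python
import base64
import json


def decode_records(event):
    """Decode every Kinesis record in a Lambda batch into a dict."""
    payments = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        payload = base64.b64decode(kinesis["data"])  # payload arrives base64-encoded
        payment = json.loads(payload)                # assumes producers wrote JSON
        payment["_partition_key"] = kinesis["partitionKey"]
        payment["_sequence_number"] = kinesis["sequenceNumber"]
        payments.append(payment)
    return payments


def handler(event, context):
    payments = decode_records(event)
    print(f"processed {len(payments)} records")
    return {"processed": len(payments)}


# Hypothetical event with one record, shaped like a Kinesis batch.
sample_event = {
    "Records": [
        {
            "kinesis": {
                "partitionKey": "card-abc",
                "sequenceNumber": "1",
                "data": base64.b64encode(json.dumps({"amount": 42}).encode()).decode(),
            }
        }
    ]
}
result = handler(sample_event, None)
```

Records within a batch arrive in shard order, so processing them in a plain loop preserves per-key ordering.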

Enhanced Fan-Out

Standard consumers share the 2 MB/second per shard. With many consumers, they compete and slow each other down. Enhanced Fan-Out gives each consumer its own dedicated 2 MB/second per shard:

Push-based delivery — Kinesis pushes records to the consumer, instead of the consumer polling;
Propagation delay that typically averages around 70 ms, compared with roughly 200 ms or more for standard polling consumers;
Costs extra per consumer-shard hour, plus a charge per GB of data retrieved.

For Aisha's three consumers (fraud model, dashboard, archival), Enhanced Fan-Out kept latency at 80 ms even at peak load.
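The throughput side of that trade-off can be worked through with simple arithmetic. The sketch below assumes standard consumers split a shard's 2 MB/second evenly, which is a best case, and ignores other limits such as the cap on polling calls per shard. The function names are hypothetical.

```python
SHARD_READ_MB = 2.0    # read throughput per shard, in MB/second
SHARD_WRITE_MB = 1.0   # maximum write throughput per shard, in MB/second


def read_mb_per_consumer(consumers, enhanced_fan_out):
    """Per-shard read throughput available to each consumer."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        return SHARD_READ_MB            # dedicated pipe per consumer
    return SHARD_READ_MB / consumers    # one shared pipe, split evenly (assumed)


def can_keep_up(consumers, enhanced_fan_out, write_mb=SHARD_WRITE_MB):
    """True if each consumer can read at least as fast as the shard is written."""
    return read_mb_per_consumer(consumers, enhanced_fan_out) >= write_mb


shared_three = read_mb_per_consumer(3, enhanced_fan_out=False)
dedicated_three = read_mb_per_consumer(3, enhanced_fan_out=True)
```

With a shard written at its full 1 MB/second, two standard consumers can just keep pace, while a third pushes each one below the write rate. That is the point at which Enhanced Fan-Out starts to pay for itself.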

Retention and Replay

A Kinesis stream retains records for a configurable period:

Default 24 hours;
Extended retention up to 7 days at additional cost;
Long-term retention up to 365 days at additional cost.

Retention enables replay: if the fraud model has a bug and miscategorizes 30 minutes of transactions, the team can rewind the consumer and reprocess that period after the fix.
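One way to rewind is to request a shard iterator of type `AT_TIMESTAMP`. The sketch below only builds the arguments for such a request and checks them against the retention window; it does not call AWS. The helper name, stream name, and shard ID are hypothetical, and a 24-hour retention is assumed by default.

```python
from datetime import datetime, timedelta, timezone


def replay_iterator_args(stream_name, shard_id, rewind,
                         retention=timedelta(hours=24), now=None):
    """Arguments for a shard iterator that starts `rewind` ago."""
    now = now or datetime.now(timezone.utc)
    if rewind > retention:
        # Records older than the retention period are no longer in the stream.
        raise ValueError("cannot rewind past the retention period")
    return {
        "StreamName": stream_name,
        "ShardId": shard_id,
        "ShardIteratorType": "AT_TIMESTAMP",
        "Timestamp": now - rewind,
    }


# Rewind 30 minutes on one shard of a hypothetical "payments" stream.
args = replay_iterator_args(
    "payments",
    "shardId-000000000000",
    rewind=timedelta(minutes=30),
    now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
```

With boto3, these arguments would be passed as `get_shard_iterator(**args)` on a Kinesis client, once per shard, and the returned iterator used for subsequent `get_records` calls.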

Kinesis Data Streams vs Firehose

Two Kinesis products often get confused:

Kinesis Data Streams: the raw stream, with custom consumers, custom logic, and real-time processing. This is what the chapter covers;
Amazon Data Firehose (formerly Kinesis Data Firehose): a managed delivery service that takes streaming data and writes it to S3, Redshift, OpenSearch, or HTTP endpoints. No custom consumer code.

Use Firehose when you just need to land streaming data in storage. Use Data Streams when you need custom real-time processing.

Kinesis vs SQS vs SNS vs EventBridge

The full lineup for moving data between services:

SQS: simple queues, one consumer or consumer group, messages deleted after processing;
SNS: pub/sub fan-out, push-based, no replay;
EventBridge: content-based routing across AWS and SaaS, schema-aware, with archive and replay;
Kinesis Data Streams: ordered, durable, high-throughput stream with multiple independent consumers and replay.

The exam tests this matrix relentlessly. Memorize the discriminators:

"Need to replay?" → Kinesis or EventBridge with archive, not SNS or SQS;
"Need ordering?" → FIFO SQS, FIFO SNS, or Kinesis;
"Need 10,000+ events/second to multiple consumers?" → Kinesis;
"Routing events from many AWS services?" → EventBridge.

For the Exam

DVA-C02 hits:

Shard math — write 1 MB/s, read 2 MB/s;
Partition keys and hot shards;
The difference between Kinesis Data Streams and Data Firehose;
Enhanced Fan-Out — when to use it;
Lambda as a Kinesis consumer with event source mapping (batch size, batch window, parallelization factor).
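The three event source mapping settings named above can be sketched as a small argument builder. It only assembles and validates a dictionary and does not call AWS. The helper names, the ARN, and the function name are hypothetical; the keys match the parameters of Lambda's `CreateEventSourceMapping` operation, and the ranges reflect the documented limits for Kinesis sources as I understand them.

```python
def kinesis_mapping_args(stream_arn, function_name, batch_size=100,
                         batch_window_seconds=0, parallelization_factor=1):
    """Arguments for mapping a Kinesis stream to a Lambda function."""
    if not 1 <= batch_size <= 10_000:
        raise ValueError("batch size must be between 1 and 10,000")
    if not 0 <= batch_window_seconds <= 300:
        raise ValueError("batch window must be between 0 and 300 seconds")
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("parallelization factor must be between 1 and 10")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",
        "BatchSize": batch_size,                                 # max records per invocation
        "MaximumBatchingWindowInSeconds": batch_window_seconds,  # max wait to fill a batch
        "ParallelizationFactor": parallelization_factor,         # concurrent batches per shard
    }


def max_concurrent_batches(shard_count, parallelization_factor):
    """Upper bound on batches processed at once across the stream."""
    return shard_count * parallelization_factor


mapping = kinesis_mapping_args(
    "arn:aws:kinesis:us-east-1:123456789012:stream/payments",  # hypothetical ARN
    "fraud-model",                                             # hypothetical function
    batch_size=500,
    batch_window_seconds=5,
    parallelization_factor=2,
)
```

With boto3, the dictionary would be passed as `create_event_source_mapping(**mapping)` on a Lambda client. Raising the parallelization factor increases concurrency per shard while still keeping records with the same partition key in order.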

Section 1. Chapter 9
