The DataStream API is Flink’s core API for building data stream processing programs. It lets you express transformations on potentially unbounded streams of data — filtering, mapping, aggregating, windowing — and connect sources to sinks to produce results.
A DataStream is an immutable, distributed collection of data. You cannot inspect its elements directly or mutate them in place. Instead, you chain API operations that describe a computation graph, then trigger execution with env.execute().
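As a minimal sketch of this execution model (a hypothetical job; the element values and job name are illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LazyPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Each call below only adds a node to the dataflow graph; nothing runs yet.
        env.fromElements(1, 2, 3, 4, 5)
           .filter(n -> n % 2 == 1)   // keep odd numbers
           .map(n -> n * n)           // square them
           .print();                  // sink: write results to stdout

        // Only now is the graph submitted and executed.
        env.execute("lazy-pipeline");
    }
}
```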
Program structure
Every DataStream program follows the same five-step pattern:

1. Obtain a StreamExecutionEnvironment

The StreamExecutionEnvironment is the entry point for every Flink program. It holds configuration and creates data sources. Use getExecutionEnvironment() in almost all cases. It detects context automatically: it creates a local, single-JVM environment when you run from an IDE, and connects to a cluster when you submit a JAR with bin/flink run. Two other factory methods, createLocalEnvironment() and createRemoteEnvironment(), exist for specific cases.

2. Load or create initial data

Attach a source to the environment to get a DataStream. Flink provides built-in sources for collections, files, sockets, and sequences, plus a connector ecosystem for Kafka, Kinesis, and more.

3. Apply transformations

Transformations produce new DataStream instances from existing ones. They are lazy — nothing executes until you call execute().

4. Write results to a sink

Sinks consume a DataStream and write records to external systems. For production use, prefer sinkTo() with the Sink API over the older addSink().

5. Trigger execution

Call env.execute() to submit the assembled dataflow graph and run the program.

Complete example: streaming word count
The following program reads words from a socket, counts them in 5-second tumbling windows, and prints results to stdout.
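A sketch of that program, following the classic WindowWordCount example (assumes a text server such as netcat listening on localhost:9999; the Splitter helper name is illustrative):

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class WindowWordCount {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, Integer>> counts = env
            // Source: one record per line of text from the socket.
            .socketTextStream("localhost", 9999)
            // Split each line into (word, 1) pairs.
            .flatMap(new Splitter())
            // Partition the stream by word.
            .keyBy(value -> value.f0)
            // Count within 5-second tumbling windows.
            .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
            .sum(1);

        // Sink: print each windowed count to stdout.
        counts.print();

        env.execute("Window WordCount");
    }

    public static class Splitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String sentence, Collector<Tuple2<String, Integer>> out) {
            for (String word : sentence.split(" ")) {
                out.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
```

Run `nc -lk 9999` in a terminal first, then submit the job and type words into the netcat session to see counts appear every five seconds.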
Key concepts
Lazy evaluation: Calling .map(), .filter(), or .keyBy() builds a dataflow graph in memory. No data moves until execute() is called. This lets Flink optimize the full graph before execution begins.
Parallelism: Each operator runs as one or more parallel instances. Set job-level parallelism with env.setParallelism(n) or per-operator with .map(...).setParallelism(n).
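For example, a job-level default can be combined with a per-operator override (a minimal sketch; the values and job name are illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // default parallelism for every operator in this job

        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)
           .setParallelism(2)  // override: this map runs with 2 parallel instances
           .print();

        env.execute("parallelism-example");
    }
}
```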
Operator chaining: Flink automatically chains adjacent operators that can share a thread (for example, consecutive map() calls) to reduce serialization overhead. You can disable chaining globally or per operator.
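Chaining can be controlled at both scopes (a sketch; the operator bodies are illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Global switch: no operators in this job will be chained.
        // env.disableOperatorChaining();

        env.fromElements(1, 2, 3)
           .map(n -> n + 1)
           .startNewChain()     // begin a new chain at this operator
           .map(n -> n * 2)
           .disableChaining()   // this operator will not chain with neighbors
           .print();

        env.execute("chaining-example");
    }
}
```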
Buffer timeout: By default, Flink buffers records for up to 100 ms before flushing to downstream operators. Lower this for latency-sensitive pipelines:
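A minimal sketch of lowering the timeout (the 10 ms value is illustrative; pick it against your latency budget):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Flush network buffers after at most 10 ms instead of the default 100 ms.
        // Lower values reduce latency at the cost of throughput.
        env.setBufferTimeout(10);

        env.fromElements("event").print();
        env.execute("buffer-timeout-example");
    }
}
```

A timeout of 0 flushes after every record (lowest latency, worst throughput); -1 flushes only when buffers are full.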
Explore further
Execution Mode: Switch between STREAMING and BATCH execution modes for bounded data.
Data Sources: Built-in sources, the FLIP-27 Source API, and FileSource.
Operators: map, flatMap, filter, keyBy, window, connect, and more.
Data Sinks: FileSink, print, custom sinks, and exactly-once delivery.
Event Time & Watermarks: WatermarkStrategy, TimestampAssigner, and windowing with event time.
Fault Tolerance: Checkpointing, restart strategies, and exactly-once guarantees.
Working with State: ValueState, ListState, MapState, and state TTL.
Side Outputs: Route records to multiple output streams from a single operator.

