Apache Hadoop YARN is a popular resource manager for data processing workloads. When you deploy Flink on YARN, Flink’s JobManager and TaskManagers run inside YARN containers, and Flink dynamically requests and releases containers from YARN’s ResourceManager.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/apache/flink/llms.txt
Use this file to discover all available pages before exploring further.
Prerequisites
- A running YARN cluster (version 2.10.2 or later). Managed services like Amazon EMR or Google Cloud Dataproc work well.
- Flink distribution downloaded and unpacked.
HADOOP_CLASSPATHenvironment variable set:
Deployment modes
Application mode (recommended for production)
Application mode creates a dedicated Flink cluster on YARN for one application. The application’smain() method runs on the JobManager inside YARN. The cluster shuts down when the application finishes.
Session mode
Session mode starts a long-running Flink cluster on YARN that accepts multiple job submissions.- Attached (default): The
yarn-session.shclient keeps running and tracks cluster state. If the cluster fails, the client reports the error. - Detached (
--detachedor-d): The client submits the cluster and returns immediately.
YARN configuration
Key YARN-specific configuration options to set inconf/config.yaml:
-D:
Resource allocation
Flink on YARN requests additional TaskManagers when running jobs require more resources than are currently available. In Session mode, unused TaskManagers are released after a configurable timeout. Failed containers (including the JobManager) are automatically replaced by YARN. The number of allowed JobManager restarts is controlled byyarn.application-attempts, bounded by YARN’s yarn.resourcemanager.am.max-attempts setting (defaults to 2 in both).
High availability on YARN
HA on YARN combines YARN’s container restart capability with a Flink HA service (ZooKeeper or Kubernetes). YARN handles restarting the JobManager container; the HA service persists metadata and manages leader election.YARN version notes
- YARN < 2.4.0: All containers restart when the application master fails.
- YARN 2.4.0–2.6.0: TaskManager containers survive application master failure (faster recovery).
- YARN ≥ 2.6.0: Attempt failure validity interval is set to Flink’s Pekko timeout, preventing long-running jobs from depleting their restart attempts.
Firewall configuration
If your cluster is behind a firewall, configure a port range for Flink’s REST endpoint so the client can submit jobs from outside the cluster network:Hadoop configuration files
To pass additional Hadoop configuration to Flink:HADOOP_CLASSPATH by default.
User JARs and classpath
| Mode | User JAR recognition |
|---|---|
| Session mode | Only the JAR specified in the flink run command |
| Application mode | JAR specified in the command + all JARs in $FLINK_HOME/usrlib/ |
yarn.classpath.include-user-jar:
ORDER(default): added based on lexicographic orderFIRST: added at the beginning of the system classpathLAST: added at the end of the system classpathDISABLED: added to the user classpath instead

