YARN (Yet Another Resource Negotiator) is the resource-management layer of Apache Hadoop. It schedules jobs, allocates CPU and memory across the cluster, and tracks which work is running on which node. YARN sits above the storage layer (HDFS) and below the computation framework (MapReduce, or a later framework such as Spark), and its job is to keep the cluster’s compute resources fairly and efficiently distributed across whatever jobs are running.

The YARN architecture has three main components:

  • A central ResourceManager runs once per cluster. It receives job submissions, allocates resources, and tracks the global state.
  • Per-node NodeManagers run on each worker machine. They launch and monitor the containers that actually run computation, and report their available resources back to the ResourceManager.
  • A per-application ApplicationMaster is spawned for each job. It negotiates resources from the ResourceManager and coordinates the application’s tasks on the NodeManagers. This is the piece that knows what the job actually wants to do — MapReduce, Spark, and Tez each ship their own ApplicationMaster implementation.
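
The reporting relationship between NodeManagers and the ResourceManager can be sketched with a toy model. This is not YARN's API: the names NodeReport, ToyResourceManager, heartbeat, cluster_free and nodes_that_fit are hypothetical and chosen only for illustration, and the node sizes are arbitrary. The sketch assumes the ResourceManager builds its global view purely from the most recent report of each node.

```python
from dataclasses import dataclass, field


@dataclass
class NodeReport:
    """What a (toy) NodeManager tells the ResourceManager about its node."""
    node_id: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ToyResourceManager:
    """Keeps the cluster-wide view, built only from node reports."""
    nodes: dict = field(default_factory=dict)

    def heartbeat(self, report: NodeReport) -> None:
        # The latest report from a node replaces the previous one.
        self.nodes[report.node_id] = report

    def cluster_free(self) -> tuple:
        # Global state is the sum of what each node last reported.
        memory = sum(r.free_memory_mb for r in self.nodes.values())
        vcores = sum(r.free_vcores for r in self.nodes.values())
        return memory, vcores

    def nodes_that_fit(self, memory_mb: int, vcores: int) -> list:
        # Candidate nodes for a container request of the given size.
        return sorted(
            r.node_id
            for r in self.nodes.values()
            if r.free_memory_mb >= memory_mb and r.free_vcores >= vcores
        )


rm = ToyResourceManager()
rm.heartbeat(NodeReport("worker-1", free_memory_mb=8192, free_vcores=4))
rm.heartbeat(NodeReport("worker-2", free_memory_mb=2048, free_vcores=2))
print(rm.cluster_free())           # (10240, 6)
print(rm.nodes_that_fit(4096, 2))  # ['worker-1']
```

A request for 4096 MB and 2 vcores fits only on worker-1 here, because worker-2 last reported 2048 MB free. The real ResourceManager applies a pluggable scheduler and queue policies on top of this kind of bookkeeping, which the sketch leaves out.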

When a user submits a MapReduce job (or a Spark application), the ResourceManager first allocates a container for the job’s ApplicationMaster and launches it there. The ApplicationMaster then requests further containers (allotments of CPU and memory on a particular node) and runs its tasks inside them on whichever nodes have capacity. When the job finishes or fails, YARN reclaims the containers and returns their resources to the pool.
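
The submit, allocate, run, reclaim cycle can be sketched as a small simulation. Everything here is hypothetical and simplified: ToyCluster, Node, Container and run_application are illustrative names rather than YARN classes, only memory is tracked, the 1024 MB ApplicationMaster size is an arbitrary choice, and a task that cannot be placed is simply skipped (a real ApplicationMaster would keep its request pending until capacity frees up).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int


@dataclass
class Container:
    app_id: str
    node: str
    memory_mb: int


@dataclass
class ToyCluster:
    nodes: list
    running: list = field(default_factory=list)

    def allocate(self, app_id: str, memory_mb: int):
        # Place the container on the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                container = Container(app_id, node.name, memory_mb)
                self.running.append(container)
                return container
        return None  # no capacity right now

    def release_all(self, app_id: str) -> int:
        # Reclaim every container of a finished (or failed) application.
        by_name = {n.name: n for n in self.nodes}
        finished = [c for c in self.running if c.app_id == app_id]
        for c in finished:
            by_name[c.node].free_mb += c.memory_mb
        self.running = [c for c in self.running if c.app_id != app_id]
        return len(finished)


def run_application(cluster, app_id, task_count, task_mb, am_mb=1024):
    # Step 1: the first container hosts the ApplicationMaster.
    am = cluster.allocate(app_id, am_mb)
    if am is None:
        return []
    # Step 2: the ApplicationMaster asks for one container per task.
    placements = []
    for _ in range(task_count):
        c = cluster.allocate(app_id, task_mb)
        if c is not None:
            placements.append(c.node)
    # Step 3: when the application ends, resources go back to the pool.
    cluster.release_all(app_id)
    return placements


cluster = ToyCluster([Node("worker-1", 4096), Node("worker-2", 4096)])
print(run_application(cluster, "app-1", task_count=3, task_mb=2048))
# ['worker-1', 'worker-2', 'worker-2']
print([n.free_mb for n in cluster.nodes])
# [4096, 4096]
```

The first task lands next to the ApplicationMaster on worker-1, the other two spill over to worker-2 once worker-1 is too full, and after the application ends both nodes are back to their full 4096 MB.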

YARN exists because resource management in the original Hadoop was tightly coupled to MapReduce: a single JobTracker on the master combined resource allocation with MapReduce-specific job scheduling, and a TaskTracker on each worker ran the map and reduce tasks. As Hadoop matured (YARN landed with Hadoop 2.x around 2013) and people wanted to run other things on the same cluster, such as Spark, Hive, Tez, or applications built on none of these, the tight coupling became a bottleneck. YARN factored resource management out of the framework so the same cluster could host many computational frameworks simultaneously: the global ResourceManager handles allocation, and per-application ApplicationMasters handle framework-specific scheduling.

YARN is one of the core components of Apache Hadoop, alongside HDFS (storage) and MapReduce (computation). On a running cluster, the YARN-related daemons listed by jps are the ResourceManager (on the master) and the NodeManager (on every worker).
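
A quick way to check which YARN daemons are running on a host is to scan the jps output for those two names. The sketch below assumes jps (shipped with the JDK) is on the PATH and prints one "pid name" pair per line. The helper names yarn_daemons_in and yarn_daemons_on_this_host are hypothetical, and the sample output is illustrative with made-up PIDs.

```python
import subprocess

YARN_DAEMONS = {"ResourceManager", "NodeManager"}


def yarn_daemons_in(jps_output: str) -> set:
    """Return the YARN daemon names found in jps output."""
    found = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <main class name>"; skip anything else.
        if len(parts) >= 2 and parts[1] in YARN_DAEMONS:
            found.add(parts[1])
    return found


def yarn_daemons_on_this_host() -> set:
    # Requires a JDK on PATH; jps only lists JVMs visible to the current user.
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return yarn_daemons_in(result.stdout)


# Illustrative output from a single-node setup; the PIDs are made up.
sample = """4821 NameNode
4973 DataNode
5310 ResourceManager
5468 NodeManager
5790 Jps
"""
print(sorted(yarn_daemons_in(sample)))  # ['NodeManager', 'ResourceManager']
```

On a multi-node cluster the master would normally show only ResourceManager and each worker only NodeManager. NameNode and DataNode in the sample belong to HDFS, so the function ignores them.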