Node (computing)

A node in a distributed system is an individual machine that participates in the cluster. It can be a physical machine sitting in a rack (its own CPU, RAM, disks, network card) or a virtual instance running in the cloud. From outside the cluster the distinction usually doesn’t matter; what matters is that the node has its own compute and storage resources, and it communicates with other nodes over a network.

When we connect a group of nodes together so they cooperate as if they were one machine, we call the group a cluster. The cluster’s storage capacity is roughly the sum of its nodes’ disks (modulo replication); its computational throughput is roughly the sum of its nodes’ CPUs (modulo coordination overhead).

In a Hadoop cluster running HDFS, nodes split into two roles:

The NameNode is the bookkeeper, one (or a few, for high availability) per cluster, that maintains the metadata mapping file paths to block locations.
The DataNodes are the workers, many per cluster, that actually store and serve the blocks.

Other distributed systems have analogous role splits: a small number of coordinator nodes that track what’s where, and a much larger number of worker nodes that hold the data or run the computation.

At cluster scale, individual node failure is the rule, not the exception. With a thousand nodes, some node is almost always offline, rebooting, or misbehaving. Distributed systems are designed around this — data is replicated (in HDFS, every block has a Replication factor of 3), tasks are retried elsewhere when they fail, monitoring keeps watch continuously.

Idriss Rami — Notes

Explorer

Node (computing)

Graph View

Backlinks