An Amazon EMR stands for Amazon Elastic MapReduce. It is a web service used to process the large amounts of data in a cost-effective manner. The central component of an Amazon EMR is a cluster. Each cluster is a collection of EC2 instances and an instance in a cluster is known as node. Each node has a specified role attached to it known as a node type, and an Amazon EMR installs the software components on node type.
Following are the node types:
- Master node
A master node runs the software components to distribute the tasks among other nodes in a cluster. It tracks the status of all the tasks and monitors the health of a cluster. - Core node
A core node runs the software components to process the tasks and stores the data in Hadoop Distributed File System (HDFS). Multi-node clusters will have at least one core node. - Task node
A task node with software components processes the task but does not store the data in HDFS. Task nodes are optional.