Stream AWS EMR logs to AWS CloudWatch

pavan kumar ceemala
Mar 8, 2021

AWS EMR:

Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. It can also be used to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.

AWS CloudWatch:

Amazon CloudWatch is a monitoring and management service that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources.

AWS Elasticsearch:

Amazon Elasticsearch Service is a fully managed Elasticsearch service that we can use to build, monitor, and troubleshoot our applications. The service provides support for open source Elasticsearch APIs, managed Kibana, integration with Logstash and other AWS services, and built-in alerting and SQL querying.

Steps to send AWS EMR logs to CloudWatch:

Step 1: An EMR cluster needs an IAM role for its EC2 instances (the instance profile role). The SSM and CloudWatch agents run on those instances, so attach these three AWS managed IAM policies to that role:

arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
arn:aws:iam::aws:policy/CloudWatchAgentAdminPolicy
arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM

These policies allow the EMR nodes to communicate with the SSM and CloudWatch services.
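As a rough sketch, the three policies could be attached with the AWS CLI as shown here. The function name attach_policies and the default role name EMR_EC2_DefaultRole (the instance profile role that EMR creates by default) are assumptions for illustration; substitute the role your cluster actually uses.

```shell
#!/bin/bash
# Attach the three managed policies from Step 1 to a role.
# $1 = role name (assumed default: EMR_EC2_DefaultRole; replace with your own)
attach_policies() {
  role="${1:-EMR_EC2_DefaultRole}"
  for arn in \
    "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy" \
    "arn:aws:iam::aws:policy/CloudWatchAgentAdminPolicy" \
    "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM"
  do
    # Stop at the first failure so a missing role is noticed immediately.
    aws iam attach-role-policy --role-name "$role" --policy-arn "$arn" || return 1
  done
}
```

Calling attach_policies my-emr-ec2-role (a placeholder name) issues one attach-role-policy call per policy. The caller needs IAM permissions to modify the role.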

Step 2: Create an SSM parameter and provide the agent file content below as a string value. The parameter name should start with AmazonCloudWatch-, because CloudWatchAgentServerPolicy only grants read access to parameters with that prefix.

In the sample CloudWatch agent file here, we stream the CloudWatch agent's own log, the Hadoop YARN application stdout and stderr logs, the Hadoop YARN node logs, and the Hadoop HDFS logs.

{
  "agent": {
    "metrics_collection_interval": 60,
    "region": "eu-west-1",
    "logfile": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
    "debug": false,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
            "log_group_name": "/emr/cwlogs",
            "log_stream_name": "emr-cwlogs",
            "timezone": "UTC"
          },
          {
            "file_path": "/var/log/hadoop-yarn/*.log",
            "log_group_name": "/emr/hadoop-yarn",
            "log_stream_name": "emr-hadoop-yarn-logs",
            "multi_line_start_pattern": "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}",
            "timezone": "UTC"
          },
          {
            "file_path": "/var/log/hadoop-yarn/**/**/**/stdout",
            "log_group_name": "/emr/app-logs",
            "log_stream_name": "app-stdout",
            "multi_line_start_pattern": "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}.\\d{3}",
            "timezone": "UTC"
          },
          {
            "file_path": "/var/log/hadoop-yarn/**/**/**/stderr",
            "log_group_name": "/emr/app-logs",
            "log_stream_name": "app-stderr",
            "multi_line_start_pattern": "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}.\\d{3}",
            "timezone": "UTC"
          },
          {
            "file_path": "/var/log/hadoop-hdfs/*.log",
            "log_group_name": "/emr/hadoop-hdfs",
            "log_stream_name": "emr-hadoop-hdfs-logs",
            "multi_line_start_pattern": "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}",
            "timezone": "UTC"
          }
        ]
      }
    }
  }
}
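One way to store the agent file above as an SSM parameter is sketched here with the AWS CLI. The function name put_agent_config and the local file name cw-agent-config.json are hypothetical; the parameter name AmazonCloudWatch-cw-config is the one the bootstrap script in Step 3 reads.

```shell
#!/bin/bash
# Store a CloudWatch agent config file as a String parameter in SSM.
# $1 = parameter name, $2 = path to a local copy of the agent JSON
put_agent_config() {
  # Reject names without the AmazonCloudWatch- prefix described in Step 2.
  case "$1" in
    AmazonCloudWatch-*) ;;
    *) echo "parameter name must start with AmazonCloudWatch-" >&2; return 1 ;;
  esac
  # file:// tells the AWS CLI to read the parameter value from the local file.
  aws ssm put-parameter \
    --name "$1" \
    --type String \
    --value "file://$2" \
    --overwrite
}
```

For example, put_agent_config AmazonCloudWatch-cw-config cw-agent-config.json uploads the file, and --overwrite lets the same command update the parameter later.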

Step 3: Install the CloudWatch agent on all nodes (master, core, task).

To install a custom application or run a script on the EMR cluster during provisioning, EMR lets you declare bootstrap actions in a shell script. EMR fetches this script from an S3 bucket, and the script location is provided when the cluster is launched.

Add these commands to the bootstrap script:

#!/bin/bash

########################
## 1. Install SSM agent
########################
sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
sudo systemctl status amazon-ssm-agent
sudo systemctl enable amazon-ssm-agent
sudo systemctl restart amazon-ssm-agent

########################################################
## 1. Install cw agent
## 2. Use SSM parameter to load the cw agent config file
########################################################
sudo yum install -y amazon-cloudwatch-agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:AmazonCloudWatch-cw-config
sudo amazon-cloudwatch-agent-ctl -a stop
sudo amazon-cloudwatch-agent-ctl -a start

Lines 6–9 install the SSM agent on the cluster nodes (master, core, task), so that the instances can be accessed without a bastion host or SSH key.

Lines 15–18 install the CloudWatch agent, which streams the logs from each node to CloudWatch. The fetch-config command loads the agent configuration from the SSM parameter and starts the agent (the -s flag); the last two commands then stop and start the agent again.
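The bootstrap script still has to be uploaded to S3 and referenced at launch. A minimal sketch with the AWS CLI follows. The function name launch_cluster, the cluster name, release label, instance type and count, and the bootstrap/ key prefix are all placeholder assumptions, not values from this article.

```shell
#!/bin/bash
# Upload a bootstrap script to S3 and launch a cluster that runs it.
# $1 = S3 bucket name, $2 = local path of the bootstrap script
launch_cluster() {
  key="bootstrap/$(basename "$2")"
  aws s3 cp "$2" "s3://$1/$key" || return 1
  # --use-default-roles uses EMR_DefaultRole and EMR_EC2_DefaultRole;
  # the policies from Step 1 must be attached to the EC2 role.
  aws emr create-cluster \
    --name "emr-cloudwatch-logs" \
    --release-label "emr-6.2.0" \
    --applications Name=Hadoop Name=Spark \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --use-default-roles \
    --bootstrap-actions "Path=s3://$1/$key,Name=install-cw-agent"
}
```

A call such as launch_cluster my-emr-bucket bootstrap.sh (both names are placeholders) runs the script on every node as it is provisioned, which is why the agent ends up on master, core, and task nodes alike.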

Step 4: Validate that the logs arrive in CloudWatch by checking for the log group names and log stream names provided in Step 2 above.
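The same check can be done from a terminal. This sketch lists the streams in each log group named in the Step 2 configuration. The function name check_log_groups is hypothetical, and the default region is the eu-west-1 value from the sample agent file.

```shell
#!/bin/bash
# List the log streams in each log group from the Step 2 agent configuration.
# $1 = AWS region (defaults to eu-west-1, the region in the sample agent file)
check_log_groups() {
  region="${1:-eu-west-1}"
  for group in /emr/cwlogs /emr/hadoop-yarn /emr/app-logs /emr/hadoop-hdfs; do
    echo "== $group"
    aws logs describe-log-streams \
      --region "$region" \
      --log-group-name "$group" \
      --query 'logStreams[].logStreamName' \
      --output text || echo "   (log group not found or not readable)"
  done
}
```

Running check_log_groups after the cluster has been up for a few minutes should show stream names such as emr-cwlogs, app-stdout, and app-stderr under their groups if the agent is shipping logs.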

Following the steps above, we can easily send AWS EMR logs to CloudWatch.
