Situation: Someone on my team ran a Spark application on EMR and the job failed. The user is new to EMR and does not know how to check the Spark logs.
He has asked me to debug it and find the error. The only information I have is the YARN application_id.
In this post, I will share a step-by-step approach to checking the Spark logs in EMR and identifying the error.
The first thing I will do is log in to the Amazon EMR console.
I will make a note of the following information from the Summary tab (a quick AWS CLI sketch for fetching the same details follows this list):
- EMR cluster ID: required to identify the logs on Amazon S3 if multiple EMR clusters exist.
- Master public DNS: required if I have to connect to the EMR master node via the CLI to get the logs.
- I will also click the “Connect to the Master Node Using SSH” link to get the command to connect to the cluster.
- In the “Configuration details” section, I will also note the “Log URI”, which is the S3 path where the log files are written.
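Most of this can also be retrieved with the AWS CLI instead of the console. A minimal sketch, assuming a hypothetical cluster id j-XXXXXXXXXXXX and key pair file key.pem:

# Cluster name, state, master public DNS and the S3 Log URI
aws emr describe-cluster --cluster-id j-XXXXXXXXXXXX --query 'Cluster.{Name:Name,State:Status.State,MasterDNS:MasterPublicDnsName,LogUri:LogUri}'

# Connect to the master node (the default EMR login user is hadoop)
ssh -i ~/key.pem hadoop@<master-public-dns>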
- From “Application user interfaces”, I will click “YARN timeline server” to open the YARN application history.
*I could also have clicked “Spark history server” to see if there was any error during Spark execution. However, I want to look at the stdout file first to find the error. You can also reach the Spark history server from the YARN application history.
- In the application list, I will click on the application_id.
- If I click “Tracking URL: History”, it opens the Spark History Server for this job.
- At the bottom, I can see logs for each attempt. I will click the Logs hyperlink for the first attempt, i.e. the one ending with 000001.
- Once I click it, I see the page below.
- I will copy the Log Type path for the stdout.gz file.
- I will click the link that says “Click here for the full log” for stdout.gz.
This shows the entire stdout log, and I can clearly see the error. In this case, it was a syntax error in the PySpark code.
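Since I noted the master public DNS, I could also have pulled these logs from the command line by SSH-ing to the master node and using the YARN CLI. A rough sketch, assuming YARN log aggregation is enabled (the EMR default) and the same application_id:

# Run on the EMR master node: dump all aggregated container logs for the application
yarn logs -applicationId application_<appid> > application_<appid>.log
# Search the dump for errors
grep -in "error" application_<appid>.log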
Next time, I will not follow the same process. I will simply use the AWS CLI to fetch the log file directly from the S3 path.
The S3 path for the log is: <Log URI> (noted from Configuration details) / <EMR cluster ID> (noted from the Summary tab) / <Log Type path> (copied for stdout.gz from the YARN logs page).
*Change the application_id in the path to match your own.
Example:
aws s3 cp s3://<s3bucket>/<prefix>/<emrID>/containers/application_<appid>/container_<appid>_01_000001/stdout.gz ./
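If I am not sure of the exact container directory name, I can list the application's log prefix first (same hypothetical bucket and prefix placeholders as above):

aws s3 ls s3://<s3bucket>/<prefix>/<emrID>/containers/application_<appid>/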
The next step is to gunzip stdout.gz and read the contents of the file.
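For example (assuming stdout.gz was downloaded to the current directory as above):

# Decompress stdout.gz into a plain file named stdout
gunzip stdout.gz
# Read it, or jump straight to the error lines
less stdout
grep -in "error" stdout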