The most common reason for the namenode to go into safemode is under-replicated blocks. These are generally caused by storage issues on HDFS, or by jobs such as Spark applications that are aborted suddenly and leave behind under-replicated temp files. While the namenode is in safemode, your Hadoop cluster is in read-only mode until enough blocks meet the minimum replication requirement or the CORRUPT blocks/files are removed.
You can also force the namenode to leave safemode and then manually delete the CORRUPT files to resolve the issue.
How to identify if the namenode is in safemode?
Use the command below to check the status of the namenode:
hdfs dfsadmin -safemode get
If the output is “Safe mode is ON”, your Hadoop cluster is in read-only mode.
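If you want to run this check from a script, the output text can be classified with a small helper. This is only a minimal sketch: the function name safemode_state is hypothetical, and it assumes the command prints the “Safe mode is ON” / “Safe mode is OFF” wording described above.

```shell
# Hypothetical helper (not part of Hadoop): classify the text printed by
# `hdfs dfsadmin -safemode get`. Reads that output on stdin and prints
# ON, OFF, or UNKNOWN. ON is checked first, so if any line reports ON the
# result is ON.
safemode_state() {
  output="$(cat)"
  case "$output" in
    *"Safe mode is ON"*)  echo "ON" ;;
    *"Safe mode is OFF"*) echo "OFF" ;;
    *)                    echo "UNKNOWN" ;;
  esac
}
```

A typical call would be: hdfs dfsadmin -safemode get | safemode_state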
How to turn off safemode?
Use the command below to turn off safemode:
hdfs dfsadmin -safemode leave
The output of the above command should be “Safe mode is OFF”. If you still can’t get the namenode out of safemode after running it, try restarting your master node with the “sudo reboot” command. Make sure that you have a backup of important files before you issue the reboot command.
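The leave-then-verify step can be combined into one function, so a script fails loudly if the namenode is still in safemode. This is a sketch under assumptions: the name leave_safemode_checked is hypothetical, and it relies on the same “Safe mode is ON/OFF” wording shown above.

```shell
# Hypothetical wrapper (not part of Hadoop): ask the namenode to leave
# safemode, then query the state again to confirm.
# Returns 0 if safemode is reported OFF afterwards, 1 otherwise.
leave_safemode_checked() {
  hdfs dfsadmin -safemode leave >/dev/null || return 1
  state="$(hdfs dfsadmin -safemode get)" || return 1
  case "$state" in
    *"Safe mode is ON"*)
      echo "still in safemode: $state" >&2
      return 1 ;;
    *"Safe mode is OFF"*)
      echo "safemode is off"
      return 0 ;;
    *)
      echo "unexpected output: $state" >&2
      return 1 ;;
  esac
}
```

Checking the state after the leave command, rather than trusting its exit code alone, is a deliberate choice here: it catches the case described above where the namenode does not actually come out of safemode.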
How to identify CORRUPT blocks?
If you do not remove the CORRUPT blocks/files, the namenode will go into safemode again. You must identify and remove the corrupt files manually. To check for corrupt blocks/files, use the command below:
hdfs fsck / | egrep -v '^\.+' | grep -v replica | grep -v Replica
The output will show you the list of CORRUPT files. Refer to the image below:
If the files are not important, you can delete them with the command below:
hdfs fsck / -delete
If the above command does not work, you may want to manually delete the identified files. You can use the command below:
hdfs dfs -rm -r -skipTrash PATH_OF_CORRUPT_BLOCKS_FILES
Replace PATH_OF_CORRUPT_BLOCKS_FILES with the actual HDFS path of the corrupt files. Once this is done, run the “hdfs fsck…” command mentioned above again to check the status.
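When there are many corrupt files, the manual delete can be looped over a list of paths. The sketch below is illustrative only: delete_corrupt_paths and the DRY_RUN variable are hypothetical names, and the function defaults to printing the commands instead of running them, because -skipTrash deletes cannot be undone.

```shell
# Hypothetical helper (not part of Hadoop): read HDFS paths from stdin,
# one per line, and delete each with the manual delete command shown above.
# DRY_RUN defaults to 1, which only prints what would be run.
delete_corrupt_paths() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue          # skip blank lines
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would run: hdfs dfs -rm -r -skipTrash $path"
    else
      # </dev/null keeps the command from consuming the remaining path list
      hdfs dfs -rm -r -skipTrash "$path" </dev/null
    fi
  done
}
```

For example, with the paths saved one per line in a file (here called corrupt_paths.txt, an assumed name), review the dry run first with: delete_corrupt_paths < corrupt_paths.txt and then rerun with DRY_RUN=0 set to actually delete.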
If you see “The filesystem under path ‘/’ is HEALTHY”, you have successfully removed the corrupt blocks, and the namenode should not go back into safemode for this reason.
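The final health check can also be scripted by looking at the summary line of the fsck output. This is a rough sketch: fsck_verdict is a hypothetical name, and the “is CORRUPT” wording for the unhealthy case is an assumption that mirrors the HEALTHY message quoted above.

```shell
# Hypothetical helper (not part of Hadoop): read `hdfs fsck /` output on
# stdin and print HEALTHY, CORRUPT, or UNKNOWN based on the last
# "The filesystem under path ..." summary line.
fsck_verdict() {
  summary="$(grep 'The filesystem under path' | tail -n 1)"
  case "$summary" in
    *"is HEALTHY"*) echo "HEALTHY" ;;
    *"is CORRUPT"*) echo "CORRUPT" ;;   # assumed wording for the failing case
    *)              echo "UNKNOWN" ;;
  esac
}
```

A typical call would be: hdfs fsck / | fsck_verdict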