A study on MapReduce job failures in Hadoop

Ehsan Shirzad, Hamid Saadatfar
Faculty of Electrical and Computer Engineering, University of Birjand, Daneshgah Blvd, Birjand, Iran
COMPUTER MODELLING & NEW TECHNOLOGIES 2019 23(1) 7-21

Today, many large companies such as Facebook, Yahoo, and Google use Hadoop for a variety of purposes. Hadoop is an open-source software framework based on the MapReduce parallel programming model for processing big data. Because of the importance of big data systems such as Hadoop, many studies have been conducted on these systems to achieve goals such as efficient resource management, effective scheduling, and understanding the causes of failure. By studying failure causes, we can identify and resolve them, increase system efficiency, and prevent the waste of resources and time. In this paper, we studied the log files of a research cluster named OpenCloud in order to characterize job failures. OpenCloud has a long history of running the Hadoop framework and has been used by researchers in various fields. Our study showed that factors such as execution duration, number of executor hosts, volume of input/output data, and configuration settings affect the success or failure rate of MapReduce jobs in Hadoop.
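To make the kind of analysis the abstract describes concrete, the following is a minimal sketch (not the authors' code) of grouping MapReduce job records by a candidate factor and comparing failure rates across groups. The input file name (opencloud_jobs.csv), the column names (status, duration_s), and the 600-second bucket boundary are all hypothetical assumptions for illustration.

```python
import csv
from collections import defaultdict

def failure_rate_by_bucket(rows, bucket_fn):
    """Return {bucket: (failed, total)} over job records."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        bucket = bucket_fn(row)
        counts[bucket][1] += 1
        # "FAILED" as the failure marker is an assumption about the log schema.
        if row["status"].upper() == "FAILED":
            counts[bucket][0] += 1
    return {b: (f, t) for b, (f, t) in counts.items()}

# Hypothetical CSV export of the cluster's job history logs.
with open("opencloud_jobs.csv", newline="") as fh:
    jobs = list(csv.DictReader(fh))

# Example factor: execution duration, split into short vs. long jobs.
by_duration = failure_rate_by_bucket(
    jobs, lambda r: "long" if float(r["duration_s"]) > 600 else "short"
)
for bucket, (failed, total) in sorted(by_duration.items()):
    print(f"{bucket}: {failed}/{total} failed ({failed / total:.1%})")
```

The same bucketing function can be swapped for any of the other factors the paper examines, such as number of executor hosts or input/output data volume.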