Paper Title
Data Position System to Enhance Performance of Big Data Processing in Hadoop Map-Reduce
Abstract
The Map-Reduce framework is widely used in data science applications for high-performance computing on
terabytes of data or more. Hadoop Map-Reduce, an open-source implementation, is broadly used together with the
Hadoop Distributed File System (HDFS) for cluster processing jobs that require low response times. The current
Hadoop implementation assumes a homogeneous cluster of nodes and does not consider data locality or data position
when launching speculative tasks. Nodes in the cluster are expected to have the same I/O speeds, even when some
nodes are configured with a newer generation of storage hardware. Additionally, network delays are disregarded in
the current Hadoop implementation. Unfortunately, both the homogeneity and data locality assumptions in Hadoop are
optimistic at best and unachievable at worst, potentially introducing performance problems in data servers. This
paper investigates a modification to the block position approach that enhances the performance of big data
processing in the Hadoop Map-Reduce framework while maintaining competitive test accuracy. The algorithm achieves a
significant speedup in big data analytics using Map-Reduce: allocating more of the cluster's data to nodes with
higher I/O capacity reduces data movement traffic between nodes. This modification to the Hadoop Map-Reduce
framework improves the overall speedup relative to the default data position system of HDFS, and further when
combined with the Hadoop balancer.
Keywords- Balancer, Hadoop, Hadoop Distributed File System, I/O Subsystems, MapReduce.
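
The idea of giving more data to nodes with higher I/O capacity can be illustrated with a minimal sketch. This is not the algorithm evaluated in this paper and it does not use any HDFS API. It only assumes that each node has a known relative I/O capacity and splits a number of blocks in proportion to it. The function name `allocate_blocks`, the node names, and the capacity values are all hypothetical.

```python
def allocate_blocks(num_blocks, io_capacity):
    """Split num_blocks across nodes in proportion to relative I/O capacity.

    io_capacity maps a (hypothetical) node name to a positive number that
    stands for its relative I/O speed. Returns a dict of node -> block count.
    """
    total = sum(io_capacity.values())
    if total <= 0:
        raise ValueError("total I/O capacity must be positive")

    # Ideal fractional share of blocks for each node.
    shares = {node: num_blocks * cap / total for node, cap in io_capacity.items()}

    # Start from the integer part of each share.
    alloc = {node: int(share) for node, share in shares.items()}
    leftover = num_blocks - sum(alloc.values())

    # Give the remaining blocks to the nodes with the largest fractional remainders.
    by_remainder = sorted(shares, key=lambda n: shares[n] - alloc[n], reverse=True)
    for node in by_remainder[:leftover]:
        alloc[node] += 1
    return alloc


# Hypothetical cluster: one node with twice the I/O capacity of the other.
placement = allocate_blocks(10, {"fast": 2.0, "slow": 1.0})
print(placement)
```

In this sketch the faster node receives the larger share of blocks, so under the stated assumption fewer blocks would need to be moved to it later during processing.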