Availability of the JobTracker Machine in Hadoop/MapReduce Implementations

ABSTRACT

Due to the growing demand for Cloud Computing services, the need for and importance of Distributed Systems cannot be overstated.

However, it is difficult to use the traditional Message Passing Interface (MPI) approach to implement synchronization and coordination, and to prevent deadlocks, in distributed systems.

This difficulty is lessened by the use of Apache's Hadoop/MapReduce and ZooKeeper to provide Fault Tolerance in a homogeneously distributed hardware/software environment.

In this thesis, a mathematical model for the availability of the JobTracker in Hadoop/MapReduce using ZooKeeper's Leader Election Service is examined.
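To make the role of the Leader Election Service concrete, the sketch below follows the standard ZooKeeper election recipe: each candidate creates an ephemeral, sequential znode under an election path, and the candidate holding the lowest sequence number leads. This is a minimal illustration only, not code from the thesis; the connection string, the /election parent path, and the class name are assumptions, and production code would additionally wait for the session to connect and watch its predecessor znode.

```java
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class JobTrackerElection {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble (connection string assumed;
        // the /election parent znode must already exist).
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> {});

        // Each JobTracker candidate registers an ephemeral, sequential znode.
        // ZooKeeper deletes it automatically if the candidate's session dies.
        String me = zk.create("/election/candidate-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // The candidate whose znode has the lowest sequence number leads.
        List<String> candidates = zk.getChildren("/election", false);
        Collections.sort(candidates);
        boolean leader = me.endsWith(candidates.get(0));

        System.out.println(leader ? "Elected: acting as the JobTracker"
                                  : "Not elected: standing by as a backup");
    }
}
```

Because the znodes are ephemeral, a crashed candidate's node disappears on session expiry, and the survivors can re-run the check to elect a replacement JobTracker.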

Though the availability is less than what is expected of a k-Fault-Tolerant system at higher hardware failure rates, this approach makes coordination and synchronization easy, reduces the effect of crash failures, and provides Fault Tolerance for distributed systems.
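For context, ZooKeeper's quorum rule (a standard property of the system, not a result of this thesis) is that an ensemble of N servers remains available as long as a majority survives, so it tolerates at most

\[
f_{\max} = \left\lfloor \frac{N-1}{2} \right\rfloor
\]

server crashes: an ensemble of 3 tolerates 1 failure, and an ensemble of 5 tolerates 2.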

The availability model starts with a Markov state diagram for a general case of N ZooKeeper servers, followed by specific cases of 3, 4, and 5 servers. Both software and hardware faults are considered, in addition to the effect of hardware and software repair rates.
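As a warm-up for the Markov treatment, consider the textbook two-state model of a single repairable host with a lumped failure rate \(\lambda\) and repair rate \(\mu\) (a minimal illustration; the thesis's full N-server model separates hardware and software failure and repair rates). Starting from the up state, the instantaneous availability is

\[
A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu)t},
\qquad
\lim_{t \to \infty} A(t) = \frac{\mu}{\lambda + \mu},
\]

so availability rises with the repair rate and falls with the failure rate, the same qualitative behaviour the full model exhibits.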

Comparisons show that the system availability changes with the number of ZooKeeper servers, with 3 servers giving the highest availability.

The model presented in this study can be used to decide how many servers are optimal for maximum availability and from which vendor they should be purchased. It can also help determine when a ZooKeeper-coordinated Hadoop cluster should be used to perform critical tasks.

TABLE OF CONTENTS

Declaration

Acknowledgement

List of Tables

List of Figures

Abstract

  • Introduction
    • Problem Statement
    • Objectives
    • Thesis Organization
  • Cloud Computing and Fault Tolerance
    • Cloud Computing
    • Types of Clouds
    • Virtualization in the Cloud
      • Advantages of Virtualization
    • Fault, Error and Failure
    • Fault Tolerance
      • Fault-Tolerance Properties
      • K Fault Tolerant Systems
      • Hardware Fault Tolerance
      • Software Fault Tolerance
    • Properties of a Fault Tolerant Cloud
      • Availability
      • Reliability
      • Scalability
  • Hadoop/MapReduce Architecture
    • Hadoop/MapReduce
    • MapReduce
    • Hadoop/MapReduce versus Other Systems
      • Relational Database Management Systems (RDBMS)
      • Grid Computing
      • Volunteer Computing
    • Features of MapReduce
      • Automatic Parallelization and Distribution of Work
      • Fault Tolerance in Hadoop/MapReduce
      • Cost Efficiency
      • Simplicity
    • Limitations of Hadoop/MapReduce
    • Apache’s ZooKeeper
      • ZooKeeper Data Model
      • ZooKeeper Guarantees
      • ZooKeeper Primitives
      • ZooKeeper Fault Tolerance
    • Related Work
  • Availability Model
    • JobTracker Availability Model
    • Model Assumptions
    • Markov Model for a Multi-Host System
      • The Parameter λs(t)
    • Markov Model for a Three-Host (N = 3) Hadoop/MapReduce Cluster Using ZooKeeper as Coordinating Service
    • Numerical Solution to the System of Differential Equations
    • Interpretation of the Availability Plot of the JobTracker
    • Discussion of Results
      • Sensitivity Analysis
  • Conclusion and Future Work
    • Conclusion
    • Future Work

Appendix

INTRODUCTION

The effectiveness of most modern information (data) processing depends on the ability to process huge datasets in parallel to meet stringent time constraints and organizational needs. A major challenge facing organizations today is the ability to organize and process the large volumes of data generated by their customers.

According to Nielsen Online [1], there are more than 1,733,993,741 internet users. How much data these users generate, and how it is processed, largely determines the success of the organization concerned.

Consider the social networking site Facebook: as of August 2011, it had over 750 million active users [2], who spend 700 billion minutes per month on the network.

They install over 20 million applications every day and interact with 30 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) each month. Since social plugins were launched in April 2010, an average of 10,000 new websites have integrated with Facebook.

The amount of data generated in Facebook is estimated as follows [3]:

  • 12 TB of compressed data added per day
  • 800 TB of compressed data scanned per day
  • 25,000 MapReduce jobs per day
  • 65 million files in HDFS
  • 30,000 simultaneous clients to the HDFS NameNode

It was a similar demand to process large datasets at Google that inspired its engineers to introduce MapReduce.

At Google, MapReduce is used to build the index for Google Search, cluster articles for Google News, and perform statistical machine translation. At Yahoo!, it is used to build the index for Yahoo! Search and for spam detection.
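To give a flavour of what such a job looks like, here is the canonical word-count example written against Hadoop's org.apache.hadoop.mapreduce API (a standard illustration, not code from the thesis): mappers emit (word, 1) pairs, and reducers sum the counts for each word.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line into words and emit (word, 1).
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            // Sum all the counts observed for this word.
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The framework handles the parallelization: input splits are assigned to mappers across the cluster, intermediate pairs are shuffled by key to the reducers, and failed tasks are re-executed automatically, which is precisely the Fault Tolerance property this thesis studies at the JobTracker level.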

REFERENCES

[1] Nielsen Online, http://www.nielsen.com/us/en.html; internet-user figures summarized at http://hadoop-karma.blogspot.com/2010/03/how-much-data-is-generated-on-internet.html

[2] Facebook Press Room Statistics, http://www.facebook.com/press/info.php?statistics

[3] Facebook has the world's largest Hadoop cluster, http://hadoopblog.blogspot.com/2010/05/facebook-has-worlds-largest-hadoop.html

[4] Matei Zaharia, A Presentation on Cloud Computing with MapReduce and Hadoop, UC Berkeley AMP Lab, 2010.

[5] Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Google, Inc., 2004.

[6] Grant Mackey, Saba Sehrish, John Bent, Julio Lopez, Salman Habib, and Jun Wang, Introducing Map-Reduce to High End Computing, University of Central Florida, Los Alamos National Lab, Carnegie Mellon University.
