Hadoop: The Definitive Guide

Author: Tom White
Publisher: O'Reilly Media
Category: Book

List Price: $44.99
Buy New: $24.00
as of 9/7/2010 15:49 CDT
You Save: $20.99 (47%)

In Stock


New (28) Used (13) from $19.74

Seller: bestever
Rating: 4.0 out of 5 stars 10 reviews
Sales Rank: 49,420

Media: Paperback
Edition: 1
Pages: 528
Number Of Items: 1
Shipping Weight (lbs): 1.6
Dimensions (in): 9.1 x 7 x 1.1

ISBN: 0596521979
Dewey Decimal Number: 005.74
EAN: 9780596521974
ASIN: 0596521979

Publication Date: June 5, 2009
Availability: Usually ships in 1-2 business days

Features:
  • ISBN13: 9780596521974
  • Condition: New
  • Notes: BUY WITH CONFIDENCE, Over one million books sold! 98% Positive feedback. Compare our books, prices and service to the competition. 100% Satisfaction Guaranteed

Also Available In:

  • Paperback - Hadoop: The Definitive Guide

Editorial Reviews:

Product Description

Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.

Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce
  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence
  • Discover common pitfalls and advanced features for writing real-world MapReduce programs
  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
  • Use Pig, a high-level query language for large-scale data processing
  • Take advantage of HBase, Hadoop's database for structured and semi-structured data
  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.

"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!




Customer Reviews:
Showing reviews 1-5 of 10



5 out of 5 stars Pigs and Elephants on the road to World Domination   July 13, 2009
Techie Evan
23 out of 25 found this review helpful

These days, one can't seem to attend technical conferences without hearing marketing-oriented speakers' world domination plans for their products. So imagine this: what if pigs and elephants are involved? Elephants would be Hadoop installations, and Pigs would be one of those animal-themed tools, smarter cousins of the elephants really, riding on top of Hadoops, directing them on how to perform their jobs. Would the world be a better place?

Hadoop is the brainchild of Doug Cutting, who named his creation after his kid's stuffed yellow elephant. Hadoop enables large datasets distributed over a cluster of machines to be processed in parallel. One machine or node in that cluster would usually house a JobTracker and a NameNode. The JobTracker schedules and manages processing jobs to be executed on the other machines, and the NameNode manages the metadata (e.g., file names and locations) of the datasets to be processed. The processing jobs are programmed in the form of Map and Reduce functions. Inputs are usually split into blocks to be processed in parallel by two or more identical mappers. The intermediate outputs are then fed to one or more identical reducers, whose job is to perform any final transformations on the intermediate data to produce data summaries in the expected format. Several companies are using Hadoop to extract knowledge from their extensive data.
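
To make the map and reduce flow described above a little more concrete, here is a minimal in-memory sketch in Python. It does not use Hadoop or its API at all: the names map_fn, reduce_fn and run_job are hypothetical, the word-count task and the input lines are made up for illustration, and the "cluster" is just a loop over a list of splits.

```python
from collections import defaultdict

def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: collapse all counts seen for one word into a single total."""
    return word, sum(counts)

def run_job(splits):
    """Simulate one job: map every split, group by key, then reduce."""
    # Map phase: on a real cluster each split would go to its own mapper.
    intermediate = [
        pair for split in splits for line in split for pair in map_fn(line)
    ]
    # Shuffle and sort: gather all intermediate values under their key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one call per key, in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))

# Two hypothetical input splits, each a list of text lines.
splits = [
    ["pigs ride elephants"],
    ["elephants process data", "pigs direct elephants"],
]
result = run_job(splits)
print(result)
```

The point of the shape is that map_fn only ever sees one record and reduce_fn only ever sees one key, which is what lets a framework run many copies of each in parallel.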

I've read this book and Jason Venner's Pro Hadoop book. Although I like both, I like this book better for the following reasons: more comprehensive coverage of topics, and more insiders' information on design rationales and how certain Hadoop features really work behind the scenes.

Here's a breakdown of and some commentaries on the book's contents:

Chapter One introduces Hadoop, its history and how it's different from similar tools or frameworks. Kinda dry. Chapter Two introduces the MapReduce programming model and its benefits when compared to, say, the use of Unix tools for achieving parallel processing of text files. This is also where readers are introduced to the concepts of map, combiner, and reduce functions, shuffle and sort, streaming, etc. Chapters Three and Four are all about the Hadoop Distributed File System and I/O, and the design decisions that were made to address performance, reliability, and safety concerns.
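
As a rough illustration of what a combiner adds to the map and reduce functions mentioned above, the following Python sketch finds the maximum reading per year and counts how many pairs would cross the shuffle with and without a combiner. It is a plain in-memory simulation, not Hadoop code: run_job, combine_fn and the other names are hypothetical, and the temperature values are made up.

```python
from collections import defaultdict

def map_fn(record):
    """Map: turn one (year, temperature) record into a key/value pair."""
    year, temperature = record
    yield year, temperature

def combine_fn(year, temperatures):
    """Combine: take the local maximum of one mapper's output for a year."""
    return year, max(temperatures)

# max is associative and commutative, so the same function can also
# serve as the reducer without changing the final answer.
reduce_fn = combine_fn

def group(pairs):
    """Group values by key, standing in for shuffle and sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run_job(splits, use_combiner):
    """Return (result, number of pairs sent through the shuffle)."""
    shuffled = []
    for split in splits:
        mapped = [pair for record in split for pair in map_fn(record)]
        if use_combiner:
            # Pre-aggregate on the map side before anything is shuffled.
            mapped = [combine_fn(k, v) for k, v in group(mapped).items()]
        shuffled.extend(mapped)
    result = dict(reduce_fn(k, v) for k, v in group(shuffled).items())
    return result, len(shuffled)

# Two hypothetical splits of (year, temperature) records.
splits = [
    [(1950, 0), (1950, 20), (1950, 10)],
    [(1950, 25), (1950, 15)],
]
plain_result, plain_pairs = run_job(splits, use_combiner=False)
combined_result, combined_pairs = run_job(splits, use_combiner=True)
print(plain_result, plain_pairs, combined_pairs)
```

Both runs give the same answer; the combiner only reduces how much intermediate data has to move between the map and reduce sides. This works here because taking a maximum can be applied in stages; a function such as a mean could not be reused as a combiner in the same way.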

Chapter Five shows you how to develop, configure, test, run and tune a MapReduce application. A good chapter, but Jason Venner's book has better material on testing and debugging MapReduce applications.

Chapters Six through Eight discuss how MapReduce really works behind the scenes, including advanced features. This is where you'll learn how flexible Hadoop is when it comes to handling different types of inputs and outputs in terms of numbers, sizes, formats, and usage scenarios. Excellent!

Chapters Nine and Ten are really good. They teach you how to set up and administer Hadoop clusters. There's even a brief but informative section on how to use Hadoop with Amazon EC2 servers.

Chapters 11-13 devote one chapter each to installing and interacting with frameworks built on top of Hadoop: Pig, HBase, and ZooKeeper. Chapter 14 provides case studies (e.g., how Facebook uses Hadoop to analyze ad campaign effectiveness).

Appendices A and B provide instructions on how to install Apache's Hadoop and Cloudera's distribution, respectively, and Appendix C gives you a run-through of the steps to take when preparing to use the NCDC weather data used in the book.

A very thorough and well-written book. 4.5 stars.



5 out of 5 stars Brilliant book to get started and keep going   May 19, 2010
Simon Reavely (Boston, MA United States)
1 out of 1 found this review helpful

I really enjoyed the book. It has everything you need to:
a) Get started running your own cluster and writing your own MR jobs
b) Understand how to administer the cluster
c) Troubleshoot your programs
d) Learn about really important side projects like Pig, Hive, ZooKeeper and HBase (of which I think Hive is the most amazing)

One thing I wish I'd done is go through the Cloudera online tutorials BEFORE reading this book. If I'd done that (instead of doing so afterwards) I think I'd have got through certain sections of the book much quicker; basically I would have 'got it' quicker. See [...]



5 out of 5 stars Don't understand all the other negative reviews   July 23, 2009
Timothy T. Wee (Chicago, IL USA)
3 out of 4 found this review helpful

This is the book to get if you are actually doing something with Hadoop. It's been a lifesaver, and has answered all our questions of, "I wonder if I can do x in Hadoop?"
It gives a lot of information about the internals of Hadoop, which you will want to know when things go wrong or when you just want to get more out of Hadoop.
I don't normally post reviews, but I think Tom White and this book deserve way more than 5 stars, so I'm not sure why it only has 3 stars on Amazon.



5 out of 5 stars First 25 Pages Have You Up And Running!   August 24, 2009
Jonathan Zdziarski (NH, USA)
5 out of 7 found this review helpful

I picked up this book to catch up on Hadoop, which the rest of my team has been using for several months. Unfortunately I was too busy with other projects to spend any time on MapReduce and thought it'd be a grueling process to be brought up to speed on it. Within the first 25 pages and about 3 hours, Tom had me up and running my first MapReduce job which I successfully adapted for a specific metric we were trying to generate. The book does a great job of breaking down Hadoop's complex pieces into easy to understand components, but doesn't try and pump you full of conceptual BS before it lets you touch real code.

If I were to make any suggestions it would be to start the book off with some simple instructions for installing and getting Hadoop up and running on a local machine, followed by some simple explanations of DFS and Hadoop's commands for managing the file system. I would also explain much earlier how to get your classes recognized by Hadoop for those a bit rusty at Java. Fortunately, the online Wiki was very good about providing instructions to get me going on a Mac, and that took a majority of OS-specific needs off the burden of the book. You will, no doubt, have to be intelligent to read this book, but if you're using Hadoop, there is already a prerequisite for technical proficiency you'll need to satisfy. Overall good job, Tom.



5 out of 5 stars Excellent book on all aspects of Hadoop   August 5, 2009
Miles Trebilco (Japan)
1 out of 3 found this review helpful

Excellent book. Covers a lot of ground on all aspects of Hadoop.

This book was my point of reference for setting up and testing a small cluster. It has the best detailed description I've found yet of the flow of data through a map and reduce job.

A small negative is that the content is a little scattered; you need to flip back and forth between chapters.

Strongly recommend.




