Introduction to Amazon Elastic MapReduce

Whatever kind of industry you're in, being able to draw insight from data coming from a wide variety of sources can help you make transformational decisions. To make those decisions based on data of any scale, you need access to the right tools to process and analyze your data.

Software frameworks like Hadoop can help you store and process large amounts of data by distributing both the data and the processing across many computers. But deploying, configuring, and managing Hadoop clusters can be difficult, expensive, and time-consuming. Traditionally, you had to purchase the underlying server and storage hardware, provision that hardware, and then deploy and manage the software. That's even before you had a chance to do anything with your data.

Wouldn't it be great if there were an easier way? Amazon Elastic MapReduce (Amazon EMR) can make things much easier. Using the elastic infrastructure of Amazon EC2 and Amazon S3, Amazon EMR provides a managed Hadoop framework that distributes the computation on your data across multiple Amazon EC2 instances.

Amazon EMR is easy to use. To get started, you can load your data and processing
applications into Amazon S3. Then you can launch an Amazon EMR cluster in minutes, and the cluster starts processing your data. You don't need to worry about setting up, running, or tuning the cluster; we take care of that, so you can focus on analyzing your data. When your job is complete, you can retrieve the output from Amazon S3. Amazon EMR monitors your job and, when it's complete, shuts down the cluster so you stop paying. Alternatively, you can leave the cluster running so it's available for additional processing or queries.

You can easily expand
or shrink your clusters to handle more or less data, or to get answers more quickly. And if you store your data in Amazon S3, it can be accessed by multiple EMR clusters. This means users can quickly spin up as many clusters as they need to test new ideas, then terminate those clusters when they're no longer needed. This can help speed up innovation and lower the cost of experimentation. You can even optimize each cluster for a particular application.

Amazon EMR is low cost and provides
a range of pricing options, including hourly On-Demand pricing, the ability to reserve capacity for a lower hourly rate, and the option to name your own price for the resources you need with Spot Instances. And with Amazon EMR, you pay only for the resources that you use.

Amazon EMR automatically configures
security groups for the cluster and makes it easy
to control access and permissions. You can also launch clusters in
an Amazon Virtual Private Cloud (Amazon VPC), a logically isolated network that you define.

With Amazon EMR,
you can run custom MapReduce code or use powerful applications and frameworks such as Hive, Pig, HBase, Impala, Cascading, and Spark. You can work in a variety of programming languages, and we provide code samples and tutorials to help get you up and running quickly. Amazon EMR also supports multiple Hadoop distributions and integrates with popular third-party tools. You can also install additional software or further customize clusters for your specific use case.

To get started, you can visit
our Getting Started page or follow the step-by-step examples
in our documentation to launch your first cluster.
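
The workflow described above maps onto the parameters of EMR's RunJobFlow API. In that workflow you load code and data into Amazon S3, launch a cluster, and either let it shut down when the job finishes or keep it running. The following is a minimal, hypothetical sketch that only builds the request for boto3's `run_job_flow` call and never contacts AWS. It assumes a Spark job. The bucket paths, the instance type (`m5.xlarge`), the release label (`emr-6.15.0`), the Spot price, and the default IAM role names are all placeholder assumptions. The current API also differs from what was available when this overview was written.

```python
# Hypothetical sketch: build keyword arguments for EMR's RunJobFlow API
# (boto3: boto3.client("emr").run_job_flow(**request)). Nothing here
# contacts AWS; all names, paths, and sizes are placeholder assumptions.
from typing import Any, Dict, List


def build_cluster_request(
    name: str,
    log_uri: str,
    script_uri: str,
    input_uri: str,
    output_uri: str,
    core_nodes: int = 2,
    auto_terminate: bool = True,
    task_spot_nodes: int = 0,
    spot_bid_price: str = "0.10",
) -> Dict[str, Any]:
    instance_type = "m5.xlarge"  # placeholder instance type
    groups: List[Dict[str, Any]] = [
        {"Name": "Primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": 1},
        {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": instance_type, "InstanceCount": core_nodes},
    ]
    if task_spot_nodes > 0:
        # Optional Spot task nodes: BidPrice is the maximum hourly price (USD) you name.
        groups.append(
            {"Name": "Task (Spot)", "InstanceRole": "TASK", "Market": "SPOT",
             "BidPrice": spot_bid_price, "InstanceType": instance_type,
             "InstanceCount": task_spot_nodes}
        )
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # placeholder; choose a current release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": groups,
            # False: terminate after the last step finishes, so you stop paying.
            # True: keep the cluster up for more processing or queries.
            "KeepJobFlowAliveWhenNoSteps": not auto_terminate,
        },
        "Steps": [
            {
                "Name": "Process data from Amazon S3",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar runs a command such as spark-submit on the cluster.
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", script_uri, input_uri, output_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default instance profile name
        "ServiceRole": "EMR_DefaultRole",  # default EMR service role name
    }


if __name__ == "__main__":
    request = build_cluster_request(
        name="example-cluster",
        log_uri="s3://example-bucket/logs/",
        script_uri="s3://example-bucket/code/job.py",
        input_uri="s3://example-bucket/input/",
        output_uri="s3://example-bucket/output/",
    )
    print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])  # False
```

Passing `auto_terminate=False` keeps the cluster available after the step, and `task_spot_nodes` adds Spot capacity alongside the On-Demand nodes. Submitting the request for real (for example with `boto3.client("emr").run_job_flow(**request)`) requires AWS credentials and incurs charges.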

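The custom MapReduce code mentioned above does not have to be written in Java. One common route is Hadoop Streaming, which runs any executable as the mapper and reducer, passing data over stdin and stdout. Below is a hypothetical word count in Python written in that style. The word-count task and the `map`/`reduce` command-line arguments are illustrative assumptions, not part of Amazon EMR. The `run_locally` helper imitates Hadoop's map, sort, and reduce phases in a single process, so you can check the logic without a cluster.

```python
# Hypothetical word-count job in the Hadoop Streaming style: the mapper and
# reducer read lines on stdin and write tab-separated key/value lines on
# stdout. Hadoop sorts mapper output by key before the reducer runs.
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; lines for the same word must be adjacent (sorted)."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort (shuffle) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    phase = sys.argv[1] if len(sys.argv) > 1 else ""
    if phase == "map":
        for out in mapper(sys.stdin):
            print(out)
    elif phase == "reduce":
        for out in reducer(sys.stdin):
            print(out)
    else:
        # No phase given: run a small local demo instead of reading stdin.
        print("\n".join(run_locally(["the cat", "the dog"])))
```

On a cluster, Hadoop would run the map phase on many instances in parallel, sort and group the intermediate lines by key, and then run the reduce phase. See the Amazon EMR documentation for how to submit such a script as a streaming step.
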