On-prem to Cloud — Hadoop
Disclaimer: I have worked for Hortonworks, a Hadoop vendor that is now part of Cloudera. In my technical pre-sales role there, I worked with many customers, selling Hadoop and also helping them compare Hadoop with EMR and Dataproc.
Let’s talk about the Hadoop offerings from the cloud giants: EMR from AWS, Dataproc from Google and HDInsight from Microsoft Azure.
EMR — https://aws.amazon.com/emr/
Dataproc — https://cloud.google.com/dataproc/
HDInsight — https://azure.microsoft.com/en-us/services/hdinsight/
Why do we need to move to a cloud-based Hadoop solution?
To reduce CapEx and OpEx. There is no need to keep servers up and running when there is no workload. Also, a centralized data storage layer, with the flexibility to spin up the compute layer without moving the data around, is a powerful idea if done right.
Imagine being able to shut down your Hadoop cluster during off-peak hours or idle time instead of keeping on-prem servers running around the clock.
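To make the idle-time argument concrete, here is a back-of-the-envelope sketch in Python. The cluster size, schedule, and per-node-hour rate are hypothetical placeholders, not real prices from any cloud provider, and the calculation ignores storage, which is still billed while the compute layer is switched off.

```python
from dataclasses import dataclass


@dataclass
class ClusterUsage:
    nodes: int            # number of nodes in the cluster
    hours_per_day: float  # hours per day the cluster is running
    days: int             # days per billing period the cluster is running


def node_hours(usage: ClusterUsage) -> float:
    """Billable compute node-hours for one billing period."""
    return usage.nodes * usage.hours_per_day * usage.days


def compute_cost(usage: ClusterUsage, rate_per_node_hour: float) -> float:
    """Compute-only cost; storage costs are deliberately left out."""
    return node_hours(usage) * rate_per_node_hour


# Hypothetical scenarios: always-on vs. running only 10 hours on 22 working days.
always_on = ClusterUsage(nodes=10, hours_per_day=24, days=30)
business_hours = ClusterUsage(nodes=10, hours_per_day=10, days=22)

RATE = 0.50  # hypothetical USD per node-hour, NOT a real cloud price

saving = 1 - node_hours(business_hours) / node_hours(always_on)

if __name__ == "__main__":
    for name, usage in [("Always on", always_on), ("Business hours", business_hours)]:
        print(f"{name}: {node_hours(usage):.0f} node-hours, "
              f"${compute_cost(usage, RATE):,.2f}")
    print(f"Compute saved: {saving:.0%}")
```

With these made-up numbers, the scheduled cluster uses 2,200 node-hours instead of 7,200, roughly 69% less compute. Real savings depend on your provider's billing granularity, service fees, and how long clusters take to start.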
In this article, let’s take a look at GCP.
1 — Move data to cloud buckets.