Elastic MapReduce

#aws #data #spark

Managed Hadoop cluster on EC2, including Spark, HBase, Presto, Flink, Hive and more.

  • Master node: coordinates the cluster and schedules work
  • Core node: stores HDFS data and also runs tasks
  • Task node: processing only, no HDFS storage
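
A minimal sketch of how the three node roles could be written down as instance groups, in the shape that boto3's `run_job_flow` takes under `Instances["InstanceGroups"]`. The helper name, group names, counts and instance type are assumptions for illustration; nothing here calls AWS.

```python
def build_instance_groups(core_count=2, task_count=2, instance_type="m5.xlarge"):
    """Build an InstanceGroups list with one group per EMR node role.

    The instance type and counts are placeholder assumptions.
    """
    def group(name, role, count):
        return {
            "Name": name,
            "InstanceRole": role,  # "MASTER", "CORE" or "TASK"
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    groups = [
        group("master", "MASTER", 1),      # coordinates the cluster
        group("core", "CORE", core_count)  # storage plus processing
    ]
    if task_count > 0:
        # Task nodes are optional and hold no HDFS data.
        groups.append(group("task", "TASK", task_count))
    return groups


if __name__ == "__main__":
    for g in build_instance_groups(core_count=2, task_count=4):
        print(g["InstanceRole"], g["InstanceCount"])
```

Since task nodes hold no HDFS data, the task group is the natural one to resize or leave out.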

Storage

  • HDFS
  • EMRFS (S3 as HDFS)
  • Local FS
  • EBS
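
In practice the storage layer is usually picked by the URI scheme of a path. A small sketch of that mapping, assuming `s3://` goes through EMRFS and a path without a scheme falls back to the cluster default filesystem. EBS is left out because it is block storage attached to the instances, not something addressed by URI. The function name and sample paths are made up.

```python
from urllib.parse import urlparse

# Assumed mapping from URI scheme to the storage layers listed above.
SCHEME_TO_BACKEND = {
    "hdfs": "HDFS",
    "s3": "EMRFS (S3)",
    "file": "Local FS",
}


def storage_backend(uri):
    """Return the storage layer a path would use, judged by its scheme."""
    scheme = urlparse(uri).scheme
    if not scheme:
        # No scheme: resolved against the cluster default filesystem.
        return "cluster default FS"
    return SCHEME_TO_BACKEND.get(scheme, "unknown")


if __name__ == "__main__":
    for path in ("hdfs:///user/hadoop/in", "s3://example-bucket/in", "file:///mnt/tmp/in"):
        print(path, "->", storage_backend(path))
```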

Spark

  • Spark Streaming (w/ Kinesis)
  • Spark SQL
  • MLlib
  • GraphX
  • Spark Core
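
Spark jobs are commonly submitted to a running cluster as a step that runs `spark-submit` through `command-runner.jar`. A sketch of building such a step definition, in the shape boto3's `add_job_flow_steps` accepts. The helper name, step name, script location and failure policy are placeholder assumptions; nothing here calls AWS.

```python
def spark_step(name, script_uri, *script_args):
    """Build an EMR step definition that runs spark-submit on a script.

    script_uri is a hypothetical location such as an S3 path.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # assumption: keep the cluster running on failure
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                *script_args,
            ],
        },
    }


if __name__ == "__main__":
    step = spark_step("example-job", "s3://example-bucket/jobs/job.py", "--date", "2024-01-01")
    print(step["HadoopJarStep"]["Args"])
```

The resulting dict could be passed as one element of `Steps` to `add_job_flow_steps`, together with the cluster's `JobFlowId`.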

Notebooks

  • Zeppelin
  • EMR Notebooks (Jupyter)