# Apache Spark Standalone Cluster on Docker
The project was featured in an article on the official MongoDB tech blog! 😱

The project also got its own article on the Towards Data Science Medium blog! ✨
## Introduction
This project gives you an Apache Spark cluster in standalone mode with a JupyterLab interface, built on top of Docker. Learn Apache Spark through its Scala, Python (PySpark) and R (SparkR) APIs by running the Jupyter notebooks with examples on how to read, process and write data.
## TL;DR
```bash
curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
docker-compose up
```
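With the cluster running, open JupyterLab at localhost:8888 and connect a notebook to the Spark master. The snippet below is a minimal PySpark sketch, not part of the project itself; it assumes the master is reachable from the JupyterLab container at `spark://spark-master:7077` (the usual service name in this compose setup), so adjust the URL if your setup differs.

```python
from pyspark.sql import SparkSession

# Connect to the standalone cluster. The master URL below is an assumption
# based on the default service name in this stack's docker-compose.yml.
spark = (
    SparkSession.builder
    .appName("hello-spark-standalone")
    .master("spark://spark-master:7077")
    .getOrCreate()
)

# Build a tiny DataFrame, run a transformation, and print the result.
df = spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "label"])
df.filter(df.id > 1).show()

spark.stop()
```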
## Contents

- Quick Start

## Quick Start
### Cluster overview
| Application     | URL            | Description                                                 |
|-----------------|----------------|-------------------------------------------------------------|
| JupyterLab      | localhost:8888 | Cluster interface with built-in Jupyter notebooks           |
| Spark Driver    | localhost:4040 | Spark Driver web UI                                         |
| Spark Master    | localhost:8080 | Spark Master node                                           |
| Spark Worker I  | localhost:8081 | Spark Worker node with 1 core and 512m of memory (default)  |
| Spark Worker II | localhost:8082 | Spark Worker node with 1 core and 512m of memory (default)  |
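As a quick sanity check after starting the cluster, you can poll the web UIs listed above. The sketch below is a convenience script, not part of the project; it uses only Python's standard library and the default ports from the table. The Spark Driver UI on 4040 is skipped because it only exists while an application is running.

```python
import urllib.request

# Default ports from the cluster overview table; adjust if you remapped them
# in docker-compose.yml. The driver UI (4040) only appears while a job runs.
endpoints = {
    "JupyterLab": "http://localhost:8888",
    "Spark Master": "http://localhost:8080",
    "Spark Worker I": "http://localhost:8081",
    "Spark Worker II": "http://localhost:8082",
}

for name, url in endpoints.items():
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(f"{name}: HTTP {resp.status}")
    except Exception as exc:
        print(f"{name}: not reachable ({exc})")
```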
### Prerequisites
- Install Docker and Docker Compose, then check the infra supported versions
### Download from Docker Hub (easier)
- Download the docker compose file;

  ```bash
  curl -LO https://raw.githubusercontent.com/cluster-apps-on-docker/spark-standalone-cluster-on-docker/master/docker-compose.yml
  ```
- Edit the docker compose file with your favorite tech stack version, then check the apps supported versions;
- Start the cluster;
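  ```bash
  docker-compose up
  ```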