Install Apache Spark on macOS

Foolish assumptions:

  • you have a Mac running a reasonably recent version of macOS (Sierra, High Sierra or Mojave).
  • you know how to use a shell, e.g., bash or zsh.
  • you have Homebrew installed. If not, follow the instructions at https://brew.sh/
  • you have the Xcode command line tools installed. If not, type xcode-select --install in your terminal.
  • you have Java installed on your machine. To check, type java -version. If you get an error, install it, e.g., using brew cask install java.
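
The checks above can be run in one go. The following is a minimal sketch, not part of the original instructions: check is a helper name made up for this example, and it treats a tool as present if the given command exits successfully.

```shell
# check LABEL COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND quietly and reports whether it succeeded.
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok: $label"
  else
    echo "missing: $label"
  fi
}

check "Homebrew" brew --version
# xcode-select -p prints the developer directory and fails if none is set up.
check "Xcode command line tools" xcode-select -p
check "Java" java -version
```

Anything reported as missing can be installed with the command given in the matching bullet.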

Steps:

  • install scala: brew install scala
  • install apache-spark: brew install apache-spark
  • test that it worked: type spark-shell. You should see the Spark logo and the Spark prompt. Type :quit to exit spark-shell.
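
Seeing the logo only shows that the shell starts. As a slightly stronger check, a small job can be piped into spark-shell without typing anything by hand. This is a sketch under assumptions: spark_smoke_test is a hypothetical helper name, and it relies on spark-shell reading Scala code from standard input and on the predefined Spark context sc.

```shell
# spark_smoke_test [COMMAND]
# Hypothetical helper: sends a one-line Scala job to COMMAND
# (default: spark-shell) on standard input.
spark_smoke_test() {
  shell_cmd=${1:-spark-shell}
  if ! command -v "$shell_cmd" >/dev/null 2>&1; then
    echo "$shell_cmd not found on PATH" >&2
    return 1
  fi
  # Count the elements of a 100-element RDD built on the local Spark context.
  echo 'println(sc.parallelize(1 to 100).count())' | "$shell_cmd"
}
```

Running spark_smoke_test with no argument should print 100 somewhere among the startup messages if the installation works.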

To use pyspark:

  • install Python 3 (if you prefer it to Python 2): brew install python
  • add the line export PYSPARK_PYTHON=python3 to your shell profile, e.g., ~/.bash_profile.
  • restart the Terminal and type pyspark. You should see the Python 3 version (3.7.0 at the time of writing) printed in the first line of the output and the same Spark logo as above. Type quit() to exit.
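
The export line can also be added from the command line instead of a text editor. The sketch below is one way to do it without creating duplicates when run twice; add_line_once is a hypothetical helper name, not something Spark or Homebrew provides.

```shell
# add_line_once FILE LINE
# Hypothetical helper: appends LINE to FILE unless an identical line
# is already present. Creates FILE if it does not exist.
add_line_once() {
  file=$1
  line=$2
  touch "$file"
  # -x matches whole lines only, -F treats LINE as a fixed string.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, add_line_once ~/.bash_profile 'export PYSPARK_PYTHON=python3' covers the bash case. With zsh the file would more likely be ~/.zshrc.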

To install Hadoop as well: brew install hadoop

16/09/2018