In this post I’ll share the steps to install Apache Spark on Ubuntu 18.04. To install Spark, you need Java and Scala installed on your machine.
Ensure Java is installed
Follow the steps here to install Java.
Ensure Scala is installed
If Scala is not installed, follow the steps here to install Scala.
Download Apache Spark from the official downloads page. I’m using version 2.4.3, the latest release at the time of writing.
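If you prefer the command line, the tarball can be fetched from the Apache archive. The URL below follows the standard archive layout; adjust the version number if you are installing a different release:

```shell
# Spark release used in this post; adjust for a different version
SPARK_VERSION=2.4.3
SPARK_PACKAGE=spark-${SPARK_VERSION}-bin-hadoop2.7
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PACKAGE}.tgz"

# Download the release tarball
wget "$SPARK_URL" || echo "Download failed - check your network and the URL"
```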
Extract the file
tar xvf spark-2.4.3-bin-hadoop2.7.tgz
Move the extracted directory to the location you want. I’m moving it to /usr/local and renaming the directory to spark. Since /usr/local is usually owned by root, you may need sudo:
sudo mv spark-2.4.3-bin-hadoop2.7 /usr/local/spark
Now we need to tell the shell where the Spark binaries are. We can do that by exporting the path to the binaries in the .bashrc file. If .bashrc is not present in your home directory, check for .bash_profile. If neither exists, create .bashrc in your home directory with the command “touch ~/.bashrc”.
If .bashrc is present, open it using any editor (I use vim) and add the following line to it.
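Assuming Spark was moved to /usr/local/spark as above, the line to add appends Spark’s bin directory to your PATH:

```shell
# Append the Spark binaries to PATH (assumes Spark lives in /usr/local/spark)
export PATH=$PATH:/usr/local/spark/bin
```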
Save and exit
Run the source command so the current shell picks up the new PATH.
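A sketch of that step, with a fallback to ~/.bash_profile in case that is where you added the export line:

```shell
# Re-read the shell configuration so the updated PATH takes effect in the
# current session (new terminal windows pick it up automatically)
if [ -f ~/.bashrc ]; then
  source ~/.bashrc
elif [ -f ~/.bash_profile ]; then
  source ~/.bash_profile
fi
```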
You are all set now. Run the following command to open the Spark shell.
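With the PATH set up, launching the shell is a single command (spark-shell is the launcher script that ships in Spark’s bin directory):

```shell
# Launch the interactive Spark shell: a Scala REPL with a SparkContext
# preloaded as 'sc' and a SparkSession preloaded as 'spark'
spark-shell
```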
If the installation was successful, you will see output like the following:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://ip-60-10-10-9.us-west-2.compute.internal:4040
Spark context available as 'sc' (master = local[*], app id = local-1562066148909).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_212)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
While the shell is running, you can access the Spark web UI on port 4040 (e.g. http://localhost:4040).