Writing a Hello World Spark application with IntelliJ IDEA and Python 3 on Windows 10

Introduction and context
In this tutorial, I want to show you how to set up a minimal working environment for developing Apache Spark applications on your Windows machine. So, without further ado, let’s go!
Step 1: installing Java SE Development Kit 11
First, go to Oracle’s website and download the Java SE Development Kit 11 (JDK 11) installer for 64-bit Windows. Then run the file you just downloaded to install JDK 11 on your computer. After the installation finishes, you can check whether Java 11 is available by executing `java -version` in the Windows Command Prompt or PowerShell.

Step 2: installing Python 3
Go to Python’s website and download the Windows installer for Python 3.9. Then run the downloaded file to install Python 3. After the installation finishes, you should be able to start Python’s interactive shell by executing `python` in the Windows Command Prompt or PowerShell.
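
If you want a quick sanity check that the `python` command launches the interpreter you just installed, you can print its version from inside the shell:

```python
import sys

# Print the version of the interpreter the `python` command launched.
# It should report 3.9.x if the installation above succeeded.
print(sys.version)
```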

Step 3: downloading and configuring Spark 3
First, create a folder named `BigData` at the root of the `C:` drive on your Windows machine. Go to Apache Spark’s website and download Spark 3’s binaries (pre-built for Hadoop 2.7) to `C:\BigData` on your computer. Then extract the downloaded archive in this directory (that is, in `C:\BigData`). After this step, the content of the `BigData` folder should look like this:

Now rename the `spark-3.*` folder to `spark3`. Then go to https://github.com/cdarlint/winutils and download the files in this Git repository as a ZIP file using the green `Code` button in the top right corner. (By default, the downloaded file will be named `winutils-master.zip`.)

Move the file you downloaded to `C:\BigData` and extract it there.

Now you must set a few environment variables in Windows 10. To do so, type `edit the system environment variables` in the Windows search bar (in the bottom left corner) and press ENTER. You should now see the `System Properties` window (see the picture below).

Press the `Environment Variables` button in the `System Properties` window. You should now see the `Environment Variables` window (see the picture below).

Now press the `New` button in the top panel (`User variables`) and add the following environment variables.
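
Assuming the paths used earlier in this tutorial, the two variables should look like this (the exact Hadoop version folder depends on what was inside the `winutils-master.zip` you extracted; pick the one that contains `bin\winutils.exe`):

- `SPARK_HOME` = `C:\BigData\spark3`
- `HADOOP_HOME` = the Hadoop 2.7 folder inside `C:\BigData\winutils-master` (for example, `C:\BigData\winutils-master\hadoop-2.7.7`)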


Now select the `Path` environment variable in the top panel (or list) and press the `Edit` button.

Now add `%HADOOP_HOME%\bin` and `%SPARK_HOME%\bin` to the `Path` environment variable, so that Windows can find Spark’s scripts and `winutils.exe` from any shell.

Now open PowerShell, type `spark-shell` in it, and press ENTER. Wait until you enter Spark’s interactive shell environment, then open http://127.0.0.1:4040 in your web browser; you should see the Spark web UI for the running shell.


Step 4: installing IntelliJ IDEA with the Python plugin
Go to IntelliJ IDEA’s website and download the IDE. You can download and use either the free Community edition or the Ultimate edition. (Students can get a free license for the Ultimate edition; they only need a university email address.) After the installation finishes, open the Settings window under the File menu and go to the Plugins section. Make sure the Python plugin for IntelliJ is installed.

Step 5: writing and executing a Hello World Spark application
In IntelliJ IDEA, create a new Python project (go to `File/New/Project`) and select the Python 3.9 interpreter you installed in Step 2 of this tutorial as the `Project SDK`. Then press the `Next` button (then press `Next` again).

Now pick a name for the project and a path to save the project files, and press the `Finish` button. (I chose the name PySparkHelloWorld for my project and saved it in my Documents directory under a folder named IntelliJ, as you can see in the picture.)

Open the Terminal window in the bottom left corner of the project’s main window and execute `pip install pyspark findspark` in it. Wait until `pip` finishes installing these two packages.
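
As an optional check that both packages are importable, you can run the following in that terminal with `python`:

```python
# Confirm that the two packages installed above can be imported,
# and print the bundled Spark version.
import findspark
import pyspark

print(pyspark.__version__)  # should print a 3.x version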

Now create a folder named `src` in the project and create a new Python script called `main.py` inside the `src` folder. Then copy the Python code you see in the box below into the `main.py` file and save it.
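
The code box here is a minimal Hello World sketch; it assumes the `SPARK_HOME` variable from Step 3 is set and the packages from the previous step are installed:

```python
import findspark

# findspark locates the Spark installation pointed to by SPARK_HOME
# and puts pyspark on sys.path before we import it.
findspark.init()

from pyspark.sql import SparkSession

# Start a local SparkSession; local[*] runs Spark on this machine
# using all available CPU cores.
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("HelloWorld") \
    .getOrCreate()

# Build a tiny DataFrame and print its contents to the console.
df = spark.createDataFrame([("Hello",), ("World",)], ["word"])
df.show()

spark.stop()
```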
Run (or execute) the `main.py` script. You should now be able to see the results in the output.

Congratulations! You developed your first Spark application in Python. Happy making (more!) Spark applications. :0)