Help! My Apache Spark configuration on Windows isn't working (I think)
Steps needed
Getting Apache Spark running on Windows involves:
- Installing a JRE 8 (Java 1.8/OpenJDK 8)
- Downloading and extracting Spark and setting SPARK_HOME
- Downloading winutils.exe and setting HADOOP_HOME
- If you are using the .NET driver and are going to use UDFs, downloading Microsoft.Spark.Worker and setting DOTNET_WORKER_DIR
- Making sure java and %SPARK_HOME%\bin are on your PATH
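The SPARK_HOME and PATH items on that list are easy to check programmatically. Here is a minimal sketch in Python. It is not the logic of the PowerShell script mentioned below, just an illustration that assumes PATH entries are separated by os.pathsep (";" on Windows). The helper name check_path_setup is hypothetical.

```python
import os
import shutil
from typing import Mapping


def _norm(path: str) -> str:
    """Normalise a path so C:\\Spark\\bin\\ and c:\\spark\\bin compare equal on Windows."""
    return os.path.normcase(os.path.normpath(path))


def check_path_setup(env: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: return a list of problems with SPARK_HOME and PATH.

    Takes a mapping (e.g. os.environ) so it can be tested without changing
    the machine's real settings.
    """
    problems = []
    path_entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]

    # java must be resolvable from PATH (on Windows shutil.which also tries PATHEXT, e.g. .exe).
    if shutil.which("java", path=os.pathsep.join(path_entries)) is None:
        problems.append("java was not found on PATH")

    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append(f"SPARK_HOME does not exist: {spark_home}")
    else:
        # %SPARK_HOME%\bin must appear as one of the PATH entries.
        spark_bin = _norm(os.path.join(spark_home, "bin"))
        if spark_bin not in {_norm(p) for p in path_entries}:
            problems.append(f"{os.path.join(spark_home, 'bin')} is not on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_path_setup(os.environ):
        print("PROBLEM:", problem)
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, keeps the check easy to test; running the file directly checks the current machine.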
There are some pretty common mistakes people make (myself included!). The most common ones I have seen recently are having a semicolon in JAVA_HOME, SPARK_HOME or HADOOP_HOME, and having HADOOP_HOME point to a directory that does not contain a bin folder with winutils.exe in it.
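Both of those mistakes can be detected mechanically. A minimal sketch, again in Python and with a hypothetical helper name (find_common_mistakes), assuming winutils.exe is expected at %HADOOP_HOME%\bin\winutils.exe:

```python
import os
from typing import Mapping

HOME_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def find_common_mistakes(env: Mapping[str, str]) -> list[str]:
    """Hypothetical helper: flag the two mistakes described above."""
    problems = []

    # Mistake 1: a semicolon in a *_HOME variable, often left over from
    # copying a PATH-style value such as "C:\\spark;".
    for name in HOME_VARS:
        value = env.get(name, "")
        if ";" in value:
            problems.append(f"{name} contains a semicolon: {value!r}")

    # Mistake 2: HADOOP_HOME must be the folder *above* bin, with bin\winutils.exe inside it.
    # A stray semicolon is stripped here so this check looks at the intended folder
    # (the semicolon itself has already been reported above).
    hadoop_home = env.get("HADOOP_HOME", "").strip().rstrip(";")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    elif not os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe")):
        hint = ""
        if os.path.isfile(os.path.join(hadoop_home, "winutils.exe")):
            hint = " (HADOOP_HOME seems to point at the bin folder itself; use its parent)"
        problems.append(f"{hadoop_home} has no bin\\winutils.exe{hint}")

    return problems
```

The "points at the bin folder" hint covers the variant of the second mistake where winutils.exe exists but HADOOP_HOME is one level too deep.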
To help, I have written a small PowerShell script that a) validates that the setup is correct and then b) runs one of the Spark examples to prove that everything is set up correctly.
If you have a problem running Spark on Windows, run this script and it should tell you what is wrong:
https://raw.githubusercontent.com/GoEddie/dotnet-spark-examples/master/utils/checkconfig.ps1
If you hit an issue that isn’t covered, please let me know and I will add it in.
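The second half of what the script does, running one of the Spark examples as a smoke test, can be sketched like this. The sketch assumes the SparkPi example launched through the run-example launcher that ships in %SPARK_HOME%\bin (run-example.cmd on Windows). Which example checkconfig.ps1 actually runs isn't stated here, and the helper names are hypothetical.

```python
import os
import subprocess
from typing import Mapping


def spark_pi_command(spark_home: str, slices: int = 10) -> list[str]:
    """Build the command line for the SparkPi example bundled with Spark.

    On Windows the launcher is bin\\run-example.cmd; elsewhere it is bin/run-example.
    """
    script = "run-example.cmd" if os.name == "nt" else "run-example"
    return [os.path.join(spark_home, "bin", script), "SparkPi", str(slices)]


def spark_pi_succeeded(output: str) -> bool:
    # SparkPi prints a line starting "Pi is roughly" when the job completes.
    return "Pi is roughly" in output


def run_spark_pi(env: Mapping[str, str]) -> bool:
    """Run SparkPi with the given environment and report whether it worked."""
    cmd = spark_pi_command(env["SPARK_HOME"])
    result = subprocess.run(cmd, capture_output=True, text=True, env=dict(env))
    # Check both streams: the result goes to stdout, Spark's own logging to stderr.
    return result.returncode == 0 and spark_pi_succeeded(result.stdout + result.stderr)


if __name__ == "__main__":
    print("SparkPi ran OK" if run_spark_pi(os.environ) else "SparkPi failed")
```

Running a real job matters because the variable checks only prove that files are where they should be; SparkPi also exercises the JVM, the launcher scripts and winutils together.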