Help! My Apache Spark configuration on Windows isn't working (I think)

Steps needed

Getting Apache Spark running on Windows involves:

  • Installing a JRE 8 (Java 1.8/OpenJDK 8)
  • Downloading and extracting Spark and setting SPARK_HOME
  • Downloading winutils.exe and setting HADOOP_HOME
  • If you are using the dotnet driver and are going to use UDFs, also downloading Microsoft.Spark.Worker and setting DOTNET_WORKER_DIR
  • Making sure java and %SPARK_HOME%\bin are on your path
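As a rough illustration of the checklist above, here is a minimal Python sketch that reports which steps look incomplete. It is only an assumption about how such a check could be written: the function name `check_steps` is hypothetical, and it only looks at the environment variables and path entries named in the list (it does not check the Java version or DOTNET_WORKER_DIR).

```python
import os
import shutil


def check_steps(env=None):
    """Return a list of setup steps that look incomplete (empty if none found).

    Hypothetical helper: `env` is any mapping of environment variables and
    defaults to the real environment of the current process.
    """
    env = os.environ if env is None else env
    problems = []

    # SPARK_HOME and HADOOP_HOME are the variables named in the steps above.
    for name in ("SPARK_HOME", "HADOOP_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # shutil.which searches the supplied path string for a "java" executable.
    path_value = env.get("PATH", "")
    if shutil.which("java", path=path_value) is None:
        problems.append("java was not found on the path")

    # Compare normalised path entries so trailing slashes do not matter.
    spark_home = env.get("SPARK_HOME")
    if spark_home:
        wanted = os.path.normcase(os.path.normpath(os.path.join(spark_home, "bin")))
        entries = [
            os.path.normcase(os.path.normpath(entry))
            for entry in path_value.split(os.pathsep)
            if entry
        ]
        if wanted not in entries:
            problems.append("the bin folder under SPARK_HOME is not on the path")

    return problems


if __name__ == "__main__":
    for problem in check_steps():
        print("Problem:", problem)
```

Passing the environment in as a mapping keeps the check easy to try against made-up values before pointing it at a real machine.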

There are some pretty common mistakes people make (myself included!). The most common ones I have seen recently are having a semicolon in JAVA_HOME, SPARK_HOME or HADOOP_HOME, or having HADOOP_HOME point to a directory that does not have a bin folder containing winutils.exe.
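The two mistakes described above are easy to test for mechanically. The following is a minimal Python sketch of that idea, not a definitive implementation: the function name `find_common_mistakes` is hypothetical, and it assumes winutils lives at `bin\winutils.exe` under HADOOP_HOME.

```python
import os


def find_common_mistakes(env=None):
    """Return descriptions of the common mistakes found (empty if none).

    Hypothetical helper: `env` is any mapping of environment variables and
    defaults to the real environment of the current process.
    """
    env = os.environ if env is None else env
    problems = []

    # Mistake 1: a semicolon anywhere in one of the *_HOME variables.
    for name in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME"):
        value = env.get(name, "")
        if ";" in value:
            problems.append(f"{name} contains a semicolon: {value!r}")

    # Mistake 2: HADOOP_HOME does not contain bin\winutils.exe.
    hadoop_home = env.get("HADOOP_HOME", "")
    if hadoop_home:
        winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
        if not os.path.isfile(winutils):
            problems.append(f"winutils.exe was not found at {winutils}")

    return problems


if __name__ == "__main__":
    for problem in find_common_mistakes():
        print("Problem:", problem)
```

An unset variable is deliberately not reported here, since this sketch only looks for the two specific mistakes and not for missing setup steps.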

To help, I have written a small PowerShell script that a) validates that the setup is correct and then b) runs one of the Spark examples to prove that everything is set up correctly.

If you have a problem running Spark on Windows, run this script and it should tell you what is wrong:

https://raw.githubusercontent.com/GoEddie/dotnet-spark-examples/master/utils/checkconfig.ps1

If you get an issue that isn’t covered, please let me know and I will add it in.