Help! My Apache Spark configuration on Windows isn't working (I think)

Steps needed

Getting Apache Spark running on Windows involves:

  • Installing a JRE 8 (Java 1.8/OpenJDK 8)
  • Downloading and extracting Spark and setting SPARK_HOME
  • Downloading winutils.exe and setting HADOOP_HOME
  • If using the dotnet driver and you are going to use UDFs, also downloading Microsoft.Spark.Worker and setting DOTNET_WORKER_DIR
  • Making sure java and %SPARK_HOME%\bin are on your PATH
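
As a rough illustration of the last step, here is a minimal Python sketch that reports which of the variables above are missing and whether java and %SPARK_HOME%\bin can be found on the PATH. The function names (missing_variables, spark_bin_on_path) are made up for this example, and it assumes the usual Windows conventions: PATH entries separated by semi-colons and case-insensitive paths.

```python
import ntpath  # Windows path rules, importable on any platform
import os
import shutil
from typing import List, Mapping

# Variables named in the steps above. DOTNET_WORKER_DIR is left out
# because it only matters when using .NET UDFs.
REQUIRED_VARS = ("SPARK_HOME", "HADOOP_HOME")


def _normalise(path: str) -> str:
    """Strip quotes and trailing slashes, and lower-case, so paths compare equal."""
    return ntpath.normcase(ntpath.normpath(path.strip().strip('"')))


def missing_variables(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def spark_bin_on_path(env: Mapping[str, str]) -> bool:
    """Return True if %SPARK_HOME%\\bin is one of the PATH entries."""
    spark_home = env.get("SPARK_HOME", "").strip()
    if not spark_home:
        return False
    wanted = _normalise(ntpath.join(spark_home, "bin"))
    # Windows separates PATH entries with semi-colons.
    entries = env.get("PATH", "").split(";")
    return any(_normalise(entry) == wanted for entry in entries if entry.strip())


if __name__ == "__main__":
    print("Missing variables:", missing_variables(os.environ) or "none")
    print("java on PATH:", shutil.which("java") is not None)
    print("%SPARK_HOME%\\bin on PATH:", spark_bin_on_path(os.environ))
```

Both functions take the environment as a plain mapping, so they can be tried against a hand-written dictionary as well as os.environ.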

There are some pretty common mistakes people make (myself included!). The most common I have seen recently are having a semi-colon in JAVA_HOME, SPARK_HOME or HADOOP_HOME, or having HADOOP_HOME point to a directory that does not have a bin folder containing winutils.exe.
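
To make those two mistakes concrete, here is a minimal Python sketch of how they could be detected. It is only an illustration and is not the checkconfig.ps1 script; the function name find_problems and the wording of the messages are made up for this example.

```python
import os
from pathlib import Path
from typing import List, Mapping

# The three variables that should each hold a single directory.
HOME_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def find_problems(env: Mapping[str, str]) -> List[str]:
    """Return a human-readable message for each mistake found."""
    problems = []

    # Mistake 1: a semi-colon in a variable that should be one directory.
    for name in HOME_VARS:
        value = env.get(name, "")
        if ";" in value:
            problems.append(f"{name} contains a semi-colon: {value!r}")

    # Mistake 2: HADOOP_HOME does not contain bin\winutils.exe.
    hadoop_home = env.get("HADOOP_HOME", "")
    if hadoop_home and ";" not in hadoop_home:
        winutils = Path(hadoop_home) / "bin" / "winutils.exe"
        if not winutils.is_file():
            problems.append(f"HADOOP_HOME has no bin\\winutils.exe (looked for {winutils})")

    return problems


if __name__ == "__main__":
    for problem in find_problems(os.environ) or ["No problems found"]:
        print(problem)
```

An unset variable is not reported here; the sketch only looks for the two mistakes described above.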

To help, I have written a small PowerShell script that a) validates that the setup is correct and then b) runs one of the Spark examples to prove that everything is set up correctly.

If you have a problem running Spark on Windows, run this script and it should tell you what is wrong:

https://raw.githubusercontent.com/GoEddie/dotnet-spark-examples/master/utils/checkconfig.ps1

If you get an issue that isn’t covered, please let me know and I will add it in.
