How to connect spark to ms sql server without '[Error] [JvmBridge] java.sql.SQLException: No suitable driver'?
“[Error] [JvmBridge] java.sql.SQLException: No suitable driver” - unable to connect spark to Microsoft SQL Server.
In spark when you want to connect to a database you use Read()
passing in the format “jdbc” and include the options of url, driver and either dbtable or query.
DataFrame dataFrame = spark.Read()
.Format("jdbc")
.Option("url","jdbc:sqlserver://localhost;databaseName=dName;")
.Option("user", "user_name")
.Option("password", "password or secret")
.Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.Option("dbtable", "schema.table_name")
.Load();
The url tells jdbc that we want to connect to sqlserver (jdbc:sqlserver) and then the details of the server to connect to. For spark to be able to find a driver for “sqlserver” you need to do two things, firstly you need to pass the jar to the driver to spark and secondly pass in the name of the driver that can implement a connection to “sqlserver”.
The JAR
You can download the JAR from: https://docs.microsoft.com/en-us/sql/connect/jdbc/download-microsoft-jdbc-driver-for-sql-server?view=sql-server-2017
or you can use maven:
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc</artifactId>
<version>7.4.1.jre8</version>
</dependency>
https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc/7.4.1.jre8
Currently (October 2019) you can only use a jre 8 version even though Microsoft also provides jre 11 versions.
When you download it, the jar comes in two flavours, Windows and Linux (mac). There isn’t any difference, it is just the Linux (mac) version is a tar/gz rather than an exe installer that copies the files somewhere. I use the tar/gz version even on windows.
If you extract the driver somewhere, the jar for spark we need is called “mssql-JDBC-7.0.0.jre8.jar”, and so when we start our spark job (spark-shell or spark-submit) we need to pass in the full path to the jar file:
--jars "file:/C:/spark/drivers%20here/sqljdbc_7.0/chs/mssql-jdbc-7.0.0.jre8.jar"
This means that spark can now look inside that jar and find the driver for " com.microsoft.sqlserver.jdbc.SQLServerDriver".
Windows Authentication
If you are on windows you can also get windows authentication to work if you don’t want to use a username and password with SQL authentication. To get it to work you need to do two things, firstly include in the url that you want to use integrated security and remove the username and password options:
DataFrame dataFrame = spark.Read()
.Format("jdbc")
.Option("url","jdbc:sqlserver://localhost;databaseName=dName; integratedSecurity=true;")
.Option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
.Option("dbtable", "schema.table_name")
.Load();
The JDBC driver ships with a dll called “sqljdbc_auth.dll”, the driver has a 32-bit or 64-bit version, and hopefully you are using the 64-bit version of the JRE but whichever version you have you need to make sure that the folder that contains the dll is on your path environment variable.
For example, my “sqljdbc_auth.dll” is in this folder:
“C:\spark\drivers here\sqljdbc_7.0\chs\auth\x64”
So my path statement when I start spark (spark-submit or spark-shell) has to have “C:\spark\drivers here\sqljdbc_7.0\chs\auth\x64” on it.
SET PATH=C:\spark\drivers here\sqljdbc_7.0\chs\auth\x64;%PATH%
Once that is all done I can connect using windows authentication, I am not sure how it would work on Mac or Linux so submitting to a cluster might be hard (impossible to use windows auth??), be prepared to use SQL auth if you have to - the databricks docs for connecting to SQL server use a username and password so I doubt it is possible, would love to be corrected if anyone knows better 😊