tSQLt Test Adapter for Visual Studio 2022+

I have had a number of requests over the years to update the tSQLt Test Adapter so that it works with more recent versions of Visual Studio. I finally got around to doing this and I am pleased to say that the new version works with VS 2022 and 2025, and should continue to work in future releases.

Between Visual Studio 2017 and 2019 the changes required were minimal, but between 2017 and 2022 the changes required a rewrite of the test adapter’s Visual Studio integration, so it was not a simple change.

GitHub Copilot in SSMS can include data in its memory, simple prompt injections ahead

SQL Server Management Studio (SSMS) has recently added support for GitHub Copilot. This is a great feature that can help with writing SQL queries, scripts and general SQL development. However, there is a potential security risk, which means that I will only be using it for local development with a database that has no externally provided data.

When evaluating GitHub Copilot I think there are three core risks:

  1. Destructive commands being run for me without me asking - I don’t want to be the first person whose Copilot broke something.
  2. Sensitive/PII data being read and sent to GitHub or somewhere else.
  3. Prompts in the data causing malicious code to be generated.

The first one is the scariest: if I ask a question like “I need to speed up this database quickly”, what is to stop Copilot from deleting the data? That would fulfil the request quickly. I am told that the connection that Copilot uses is read-only so this cannot happen, but there is no visibility of this. What I really need here is to be able to configure the connection so I can force it to be read-only (likely using a lower-privileged user).

Delta Lake over Spark Connect

All Spark Connect Posts

I have just finished an update for the spark connect dotnet lib that contains the DeltaTable implementation, so we can now use .NET to maintain Delta tables over and above what we get out of the box by using DataFrame.Write.Format("delta"). This is an example of how to use the Delta API from .NET:

    // open an existing Delta table by path and look at its history
    var deltaTable = DeltaTable.ForPath(spark, deltaPath);
    deltaTable.History().Show(10, 1000);

    // update rows matching the condition, setting id to 0
    deltaTable.Update("id < 10", (Col("id"), Lit(0)));

    deltaTable.ToDF().Show(20);

    // merge a small source DataFrame into the table on id
    var source = spark.Range(5).WithColumn("name", Lit("teddy")).Alias("src");
    deltaTable
        .As("tgt")
        .Merge(source, "src.id = tgt.id")
        .WithSchemaEvolution()
        .WhenNotMatchedInsertAll()
        .WhenMatchedUpdateAll()
        .Execute(spark);

    deltaTable.ToDF().Show(50);

    // roll the table back to an earlier version
    deltaTable.RestoreToVersion(1);

    deltaTable.ToDF().Show(50);

    // delete every row (the condition is literally true), then show the table detail
    deltaTable.Delete(Lit(true));
    deltaTable.ToDF().Show(50);

    deltaTable.Detail().Show();

    // create (or replace) a managed Delta table with an explicit schema,
    // including a non-nullable generated column
    var newDeltaTable = DeltaTable
        .CreateOrReplace(spark)
        .TableName("deltatablefuntest")
        .AddColumn(
            new DeltaTableColumnBuilder("col_a")
                .DataType("int")
                .Build()
        )
        .AddColumn(
            new DeltaTableColumnBuilder("col_b")
                .DataType("int")
                .GeneratedAlwaysAs("1980")
                .Nullable(false)
                .Build()
        )
        .Execute();

    newDeltaTable.ToDF().Show();
    
    // build a source DataFrame with an explicit schema to merge into the new table
    var sourceDataFrame = spark.CreateDataFrame(
        (new List<(int, long)>()
        {
            (1, 1980), (2, 1980), (3, 1980), (4, 1980), (5, 1980), (6, 1980)
        }).Cast<ITuple>(),
        new StructType((List<StructField>)
        [
            new StructField("this_is_cola", new IntegerType(), false),
            new StructField("colb", new BigIntType(), false)
        ])).Alias("source");

    sourceDataFrame.Show();

    // merge on the key column, mapping source columns onto target columns
    newDeltaTable.As("target")
        .Merge(sourceDataFrame, "source.this_is_cola = target.col_a")
        .WhenMatchedUpdate(
            (Col("target.col_a"), Col("source.this_is_cola")),
            (Col("target.col_b"), Col("source.colb")))
        .WhenNotMatchedInsert(
            (Col("target.col_a"), Col("source.this_is_cola")),
            (Col("target.col_b"), Col("source.colb")))
        .WithSchemaEvolution()
        .Execute(spark);

    newDeltaTable.ToDF().Show();

    // compact small files, then vacuum old files that are no longer referenced
    newDeltaTable.Optimize().ExecuteCompaction();

    newDeltaTable.Vacuum();

There isn’t really too much to say; the only thing worth pointing out is how nicely the Delta extensions work with Spark. It was really easy to implement the functions once I had figured out the first one.

ADF: Publish suddenly includes everything where it used to include only the incremental changes since the last publish

I recently encountered an interesting issue with ADF where the publish feature suddenly attempted to republish every single object, claiming they were new, despite having incrementally published changed objects for some time.

We were using the publish feature where you work on a branch until you are happy, then you raise a PR to main, merge to main, and then switch back to ADF and click publish to push the changes to the adf_publish branch.

Spark Connect Dotnet November 2024: Where are we?

All Spark Connect Posts

Introduction

There have been quite a few changes in the last couple of months and I just wanted to give a quick update on the current state of the project. In terms of usage, I am starting to hear from people using the library and submitting PRs and requests, so although usage is pretty low (which is expected given that usage of the Microsoft-supported version wasn’t very high), it is growing, which is interesting.

Spark Connect Dotnet Variant Data Type

All Spark Connect Posts

I recently published the latest version of the Spark Connect Dotnet library, which includes support for the new Variant data type in Apache Spark 4.0, here. One of the new features of Spark 4.0 is the Variant data type, which is a faster way of processing JSON data (see here).

Sample data

For this post I used a copy of the sample data from Adobe (https://opensource.adobe.com/Spry/samples/data_region/JSONDataSetSample.html).

Creating the Variant Column

The first thing we need to do is get the data into Spark. Normally you would have a file and do a spark.read on the JSON file; in this example I will have a hard-coded string and use CreateDataFrame:
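
Something along these lines is the idea (a sketch only: the exact CreateDataFrame overload, the StringType name and the ParseJson function name are assumptions based on how the rest of the library looks, and the JSON is cut down from the Adobe sample):

    var json = "{ \"id\": \"0001\", \"type\": \"donut\", \"name\": \"Cake\" }";

    // one row, one string column holding the raw JSON document
    var dataFrame = spark.CreateDataFrame((new List<(int, string)>()
    {
        (1, json)
    }).Cast<ITuple>(), new StructType((List<StructField>)
    [
        new StructField("id", new IntegerType(), false),
        new StructField("json", new StringType(), false)
    ]));

    // parse the string column into a Variant column (Spark 4.0's parse_json)
    var withVariant = dataFrame.WithColumn("variant_col", ParseJson(Col("json")));
    withVariant.Show(20, 1000);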

Delta Lake from .NET in a Spark Connect gRPC world

UPDATE - I have implemented delta in spark-connect-dotnet

All Spark Connect Posts

Code

What to do?

At some point we will want to do something with Delta Lake and so I wanted to explore the options. Before we do that there is a little explaining to do about Delta Lake and Spark. There are two completely separate sides to this: the first is getting Spark to read and write in the delta format, and the second is performing operations on the actual files directly without using Spark, operations like Vacuum etc.
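
The first side is just the normal reader and writer with the format set to delta; roughly something like this (a sketch only - the exact reader/writer method shapes in spark-connect-dotnet may differ slightly from this, and the path is made up):

    // write a DataFrame out in the delta format, then read it back
    var df = spark.Range(100).WithColumn("name", Lit("teddy"));
    df.Write.Format("delta").Mode("overwrite").Save("/tmp/delta-demo");

    var readBack = spark.Read.Format("delta").Load("/tmp/delta-demo");
    readBack.Show(10);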

Apache Spark from PHP - it is not just a .NET thing

All Spark Connect Posts

Code

Goal of this post

I wanted to explore what the Spark Connect API looks like from other languages. I am not a PHP developer - I used it a long time ago and have read up on some of the modern changes, but apologies if I insult any PHP-ers! I will say that I quite like PHP.

Setup

The instructions are from https://grpc.io/docs/languages/php/quickstart/ with an extra step. There are a load of dependencies that we will need in the example repo, so I thought the simplest way to get this running was to get the current PHP gRPC example running, add my code as an extra example in-place, and then use the existing dependencies to build and run it.

Implementing functions and more fun in Spark Connect using gRPC and .NET

All Spark Connect Posts

Code

Goal of this post

The goal of this post is to look at creating a SparkSession and a DataFrame that wraps the Range relation, then using the WithColumn function to add a column to the DataFrame, and finally using the Show function to show the DataFrame.

We won’t have a builder but we are moving towards:

    // connect to a local Spark Connect endpoint
    var spark = SparkSession
        .Builder
        .Remote("http://localhost:15002")
        .GetOrCreate();

    // a DataFrame wrapping the Range relation
    var dataFrame = spark.Range(1000);
    dataFrame.Show();

    // add a literal column to the DataFrame
    var dataFrame2 = dataFrame
        .WithColumn("Hello",
            Lit("Hello From Spark, via Spark Connect"));
I’m pretty much going to leave the code as-is from the previous post but will move things about a bit and add a SparkSession and a DataFrame class. Also, instead of passing the session id and client around, I’m going to wrap them in the SparkSession so that we can pass a single object and also use it to construct the DataFrame, so we don’t even have to worry about passing it around.
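
As a rough illustration of that shape (these class outlines are hypothetical rather than the actual code; the Relation and Range types are the ones generated from the Spark Connect protos):

    using System;
    using Grpc.Net.Client;
    using Spark.Connect; // generated from the Spark Connect protobuf definitions

    // the session id and gRPC client live inside SparkSession, and a DataFrame
    // only needs the session plus the relation (logical plan) it wraps
    public sealed class SparkSession
    {
        public string SessionId { get; } = Guid.NewGuid().ToString();
        public SparkConnectService.SparkConnectServiceClient Client { get; }

        public SparkSession(string remote)
        {
            var channel = GrpcChannel.ForAddress(remote);
            Client = new SparkConnectService.SparkConnectServiceClient(channel);
        }

        public DataFrame Range(long end) =>
            new DataFrame(this, new Spark.Connect.Range { End = end, Step = 1 });
    }

    public sealed class DataFrame
    {
        private readonly SparkSession _session;   // one object instead of session id + client
        private readonly Relation _relation;      // the plan this DataFrame represents

        public DataFrame(SparkSession session, Relation relation)
        {
            _session = session;
            _relation = relation;
        }
    }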

Moving towards the DataFrame API using the Spark Connect gRPC API in .NET

All Spark Connect Posts

Code

Goal of this post

So there are two goals for this post: the first is to take a look at Apache Arrow and how we can do things like show the output from DataFrame.Show, and the second is to start to create objects that look more familiar to us, i.e. the DataFrame API.
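
On the Arrow side, the rows for Show come back from ExecutePlan as Arrow record batches; a minimal sketch of decoding one batch with the Apache.Arrow package (the surrounding response handling is simplified and the helper is just for illustration) looks roughly like this:

    using System;
    using System.IO;
    using Apache.Arrow;
    using Apache.Arrow.Ipc;

    // arrowBatchBytes: the bytes from an ExecutePlanResponse arrow batch
    static void DumpBatches(byte[] arrowBatchBytes)
    {
        using var stream = new MemoryStream(arrowBatchBytes);
        using var reader = new ArrowStreamReader(stream);

        RecordBatch batch;
        while ((batch = reader.ReadNextRecordBatch()) != null)
        {
            Console.WriteLine($"rows: {batch.Length}, columns: {batch.ColumnCount}");

            // each column is an IArrowArray; Range gives us a single int64 column
            if (batch.Column(0) is Int64Array ids)
            {
                for (var i = 0; i < ids.Length; i++)
                {
                    Console.WriteLine(ids.GetValue(i));
                }
            }
        }
    }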

I want to take it in small steps; I 100% know that this sort of syntax is possible: