Deploy SSDT INSERTS in Batches
I have added my first actual deployment contributor to my deployment contributor project:
https://github.com/DacFxDeploymentContributors/Contributors
The idea is basically for people to share ideas about writing deployment contributors for SSDT and help each other. If you don’t know what a deployment contributor is then you are missing a real treat. A deployment contributor lets you modify the t-sql that SSDT creates to upgrade a database from one version to the next.
What is the contributor?
In SSDT sometimes you get a table migration which is where you want to do something like add a column in the middle of a table, instead of SSDT just adding it at the end it creates a new table, inserts the data from the old table then drops the old table and renames the new table to match the old ones name.
I thought it would be interesting to write a contributor that searched for these table migrations and changed the insert portion into a batched insert that does the insert in batches so your transaction log growth is manageable on very large tables.
Why bother?
I think there is a use for this, I do not think that it is something that I would lose any sleep over if I did not have, it was more to see how to do it and give an example of writing a deployment contributor that did something quite advanced and show a few things like how I set up integration tests to make testing easy as unit / integration testing deployment contributors is quite hard if you are new to it. I do not know if what I have done is the best way but it works for me but I would love to hear suggestions on how to improve things so please either send some pointers or a pull request.
Solution overview
The first thing is to look at the solution in visual Studio, as you can see from the screen shot I have three projects:
AgileSqlClub.BatchedTableMigration - C# Dll Project
TestDacpacDeploy - SSDT Project, this will create the dacpac I will deploy from
IntegrationTests - C# Dll Project with NUnit via nuget
IntegrationTests
I’ll start with the integration tests as these are probably the most useful part of this whole thing. I have created a folder called “sqlpackage” and added the dlls and exe needed to run sqlpackage.exe. This at first seems like a bit of an odd thing to do but it makes testing easier, I add the files and set the msbuild property “Copy to Output Directory” to “Copy if newer” - this means in my tests I can copy the deployment contributor into .\sqlpackage and then run .\sqlpackage\sqlpackage.exe and know that my dll will be in the right place to be loaded for the tests.
If you don’t do this you will need to copy your dll into the program files directory which is bad because of a) permissions and b) you can only run a single build at a time (without using something like https://turbo.net!).
If you do this in your integration tests your life will be easier writing deployment contributors :)
I have also added two helper classes DacpacDeploy and Database. The Database class connects to a database and sets up the schema as I want it, it feels odd writing deploy scripts but it is only small. If I don’t do this when I run sqlpackage.exe to do the compare I won’t know what state it is in and might not get the table migration I need.
DacDeploy basically copies the dll that is built by the main c# project into the sqlpackage folder and runs sqlpackage.exe with the args needed to load the contributor and also passes the batch size to use as a parameter.
Finally I have the test itself which is pretty simple it uses DacpacDeploy to create a deployment script and then it checks to see whether the script has the new table migration script in it.
I should really add integration tests that actually run the deployment script and validate the data but it is just gone midnight so I think that is enough for now :)
AgileSqlClub.BatchedTableMigration
This is the actual contributor and there are a couple of things that help, in “DeploymentFilter.cs” you need to have an attribute that declares that you have a deployment contributor:
[ExportDeploymentPlanModifier(Name, Version)] public class DeploymentFilter : DeploymentPlanModifier { private const string Name = "AgileSqlClub.BatchedTableMigration"; private const string Version = "0.1.0.0"; ...
I use fields for the name and version as there is a way to output messages from your contributor and I like to make it clear where the message has come from so I re-use these fields for that.
The next part is the method that is actually called when a deployment plan has been generated which we can parse and modify if we like (we like):
protected override void OnExecute(DeploymentPlanContributorContext context) { if (context.Arguments.ContainsKey("EasyDebugBatchedTableMigration")) { MessageBox.Show("Breaking to let you attach a debugger"); } ....
This is really useful for debugging contributors, I have a parameter you can pass in which just shows a message box. You can run sqlpackage.exe and set this message box and when it appears attach visual studio, set your breakpoints and then press OK and you can step through your code. There are other ways to get to step into your code when running outside of a process you control but this always seems the simplest to me.
The “context” argument that is passed in is really useful it gives you access to all the options that were set for the comparison as well as the source and target models and any arguments that were passed in.
I then iterate through all the steps in the plan. The steps are typed as DeploymentStep but typically are other types that inherit from DeploymentStep so to find a table migration I just check whether the step is a SqlTableMigrationStep:
var next = context.PlanHandle.Head; while (next != null) { var current = next; next = current.Next;
if (current is SqlTableMigrationStep)
{
var batched = new MigrationStepBatcher(current as SqlTableMigrationStep, rowCount).BatchStep();
Remove(context.PlanHandle, current);
AddBefore(context.PlanHandle, next, batched);
}
}
The MigrationStepBatcher is what actually kicks off the process and Remove takes away the original table migration and AddBefore adds my new one - these methods are provided to us by inheriting from “DeploymentPlanModifier”.
The MigrationStepBatcher creates a DeploymentStep of our own called a BatchedSqlTableMigrationStep, to implement your own step you just need to inherit from DeploymentStep and override GenerateTSQL() which returns a list of strings which are each batch in your step. A step can contain one or more batches which are automatically separated for you using GO commands.
When we have a table migration to change into a batch there is a whole load of information we need like the table names, the column definitions etc so I ask the original table migration for it’s t-sql by calling GenerateTSQL on it and then parsing that with the ScriptDom - I then also use the ScriptDom to create the output t-sql.
Using the ScriptDom isn’t straight forward, it is quite complicated to know exactly what you need to do when so to help I wrote the ScriptDom visualizer which you give it some t-sql and it builds a tree of exactly what types of object the ScriptDom expects are which point and what properties you need to set - if you do anything with the ScriptDom try using this as it saved me a lot of time. If you do write stuff for the ScriptDom and you have something better, please share it!
The actual changes
The idea of this is to take this code that is generated by default in SSDT:
INSERT INTO [dbo].[tmp_ms_xx_ForcedTableMigration] SELECT [count] FROM [dbo].[ForcedTableMigration]
into:
BEGIN
WHILE (SELECT count()
FROM [dbo].[ForcedTableMigration]) > 0
BEGIN
WITH to_delete
AS (SELECT TOP 1480 [count]
FROM [dbo].[ForcedTableMigration])
DELETE to_delete
OUTPUT deleted. INTO [dbo].[tmp_ms_xx_ForcedTableMigration] ([count]);
END
Why not just use regex and a string builder?
You could certainly do that but if you have ever tried doing it that way you will know it is brittle. Do this sort of thing with the ScriptDom, it is more work up front but you can make changes easily and safely and the more people use and share stuff built using the ScriptDom the easier it will get for everyone.
The interesting thing here is the use of the OUTPUT INTO clause, without that I would have only really been able to do it to tables with a unique key as I could select a certain amount of rows but wouldn’t have been able to guarantee that I delete those same rows. Using DELETE and OUTPUT INTO means all I need is the column definitions - also deleting from a CTE deletes from the underlying tables, every day is a school day and one I was very pleased to attend today.
Finally
To use this contributor, build the project, take the dll and put it in your sqlpackage folder or wherever you need to for the tool you use to deploy and add these arguments:
/p:AdditionalDeploymentContributors=AgileSqlClub.BatchedTableMigration /p:AdditionalDeploymentContributorArguments=BatchedTableMigrationBatchSize=1480;
1480 is the batch size, you will probably want to make this bigger.
Enjoy, as always any questions please ask away :)
Comments:
Ken
November 4, 2015 - 19:17
Alternative to MessageBox.Show
Hi Ed,
Great post, I’m really enjoying reading about deployment contributors.
This comment is just an aside: an alternative to MessageBox.Show() is to use Debugger.Launch() as documented here https://msdn.microsoft.com/en-us/library/system.diagnostics.debugger.launch(v=vs.110).aspx the added advantage over using MessageBox.Show is that when Debugger.Launch() is hit a window comes up with a list of running instances of Visual Studio so you can attach directly to your open solution (you also have the option to start a new instance of Visual Studio).
Keep up the good work!
Ed Elliott
November 4, 2015 - 20:12
Thanks Ken, it was our tweets
Thanks Ken, it was our tweets recently that spurred me on so I should thank you!
I will add this when I get a chance.
ed