SQLServer

Learn how to unit test SQL Server T-SQL code

  • Posted on: 5 September 2017
  • By: Ed Elliott

A free email course on how to use tSQLt including the technical aspects of writing unit tests AND the art of writing repeatable, useful unit tests for even the most complicated T-SQL code

UPDATE: I thought that if I got a certain number of sign-ups by October 1st I would run the course, but within two days three times my initial target had subscribed, so I have closed the first course. Sign up if you want to join the wait list for the next course!

Unit testing helps us to write better code and make rapid changes to it, and has been generally seen as a good idea for about 10 years. Writing tests for T-SQL code is made much easier by using tSQLt, but there is quite a high barrier to entry, both in the technical skills needed to get tSQLt running and in knowing how to approach large code bases of sometimes unfriendly T-SQL and tame them with unit tests.

I have successfully unit tested T-SQL code in a number of different environments, from clean greenfield projects to legacy ones, and I have written this course to help people get started with unit testing and to turn it into a part of their development process that they can use every day to improve the quality of their work and the speed at which deployments can be made.

Are you any of these people?

  • An application developer experienced with other testing frameworks for testing application code?
  • A T-SQL developer with no testing experience?
  • A T-SQL developer with testing experience in other languages or frameworks?

If you are then you should sign up (https://www.getdrip.com/forms/317903840/submissions/new) and let me help you learn tSQLt unit testing for SQL Server.

Why an email course?

I thought it would be an interesting way to provide actionable information regularly and to allow a level of assistance and feedback that I don't think is possible with blogging or writing articles.

How do I sign up?

Run over to: https://www.getdrip.com/forms/317903840/submissions/new and pop in your details.

The course is going to start on the 1st of October and as it is the first one I am limiting the number of people who can start it. If the first one is a success then I will run it again, but it won't be until at least 2018.

What will be the format?

The course will be one email a week which will include an overview of the week's topic, some detail on the parts that need it and an exercise for the week which can be done on a demo database or any SQL Server database code you have.

And is it free?

Yep, gratis. I am not open sourcing it yet - maybe in the future - but the course itself is free, aka "no service charge".

SSIS ForEach Enumerator File Order

  • Posted on: 4 September 2017
  • By: Ed Elliott

I saw a question on Slack recently about how the SSIS file enumerator orders (or more specifically doesn't order) files. I have been thinking about SSIS quite a lot lately and while I am in no hurry to start using it day to day it is quite an interesting tool.

So anyway, I saw this question that went like:

"does anyone know in what order files are processed in 'Foreach File Enumerator'?
I used to think it was alphabetically but after some testing this is not always the case?
Second part is there anyway to specify the order by say size or date?"

So how does SSIS order files or doesn't order files?

The answer to this is pretty simple and I thought I knew it but wanted to confirm it. In my mind I thought, "how do they get a directory listing?", and my mind responded "probably using the Win32 FindFirstFile APIs etc.", then my mind wandered somewhere else before writing a quick package that:

  • 1. Has a ForEach loop and a breakpoint set at pre-execute
  • 2. Has a single task in the ForEach loop and a breakpoint set at pre-execute
  • 3. A variable to hold the file name

Pretty simple - the SSIS package was just the ForEach loop with the single task inside it.

Pretty simple hey :)

I set the file path for the enumerator to c:\ssisSearch and put a load of files and directories in it (because the Win32 find functions have a buffer and you need to call them multiple times - I wanted to make sure we covered cases where there were multiple find calls). Then I reached for my favourite tool of all, procmon.exe (I say favourite - it used to be, then I had a job where I used it literally every single day for hours and grew to hate it, so I stopped using it, but now I'm back with it!). In procmon I set a filter on the c:\ssisSearch folder and also DtsDebugHost.exe and ran my package - the files were returned in alphabetical order.

I then went into procmon and to the properties of the "QueryDirectory" operation on that folder and (when the symbols had loaded) I could see that the call SSIS was making came from ForEachFileEnumerator.dll (native, not .NET, so we can't grab Reflector) and that it calls "FindFirstFileW".

A quick hop, skip and jump to MSDN, where FindFirstFile discusses the ordering of files:

The FindFirstFile function opens a search handle and returns information about the first file that the file system finds with a name that matches the specified pattern. This may or may not be the first file or directory that appears in a directory-listing application (such as the dir command) when given the same file name string pattern. This is because FindFirstFile does no sorting of the search results. For additional information, see FindNextFile.

FindNextFile has this:

The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. If the data must be sorted, the application must do the ordering after obtaining all the results.

So basically NTFS is alphabetical and FAT is (roughly) date order, but don't rely on either.
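
For the second part of the question - ordering by name, date or size - the safest approach is to sort the file list yourself (for example in a Script Task that fills an object variable for a Foreach From Variable enumerator) rather than relying on what FindFirstFile happens to return. The sorting itself is trivial; a minimal C# sketch, with the folder path made up for illustration:

using System;
using System.IO;
using System.Linq;

class FileOrdering
{
    static void Main()
    {
        var folder = @"c:\ssisSearch"; // same folder the enumerator would point at

        // pick whichever ordering you actually need
        var byName = Directory.GetFiles(folder).OrderBy(f => Path.GetFileName(f), StringComparer.OrdinalIgnoreCase);
        var byDate = Directory.GetFiles(folder).OrderBy(f => File.GetLastWriteTimeUtc(f));
        var bySize = Directory.GetFiles(folder).OrderBy(f => new FileInfo(f).Length);

        foreach (var file in byName)
        {
            Console.WriteLine(file);
        }
    }
}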

Just a final thought: SSIS runs on Linux, so no idea what the order is there :)

TSQL Tuesday - Databases and DevOps

  • Posted on: 13 June 2017
  • By: Ed Elliott

DevOps isn't running SQL Server in a container and pushing code to it from Jenkins

When we talk about DevOps we envision that we have the ability to check-in code, spin up a new environment, deploy, test and push that code to production and be in the pub at 4.

We know that by having the right tooling in place we can make releases more reliable and more frequent, enabling us to deploy the changes that the business want when they want them rather than every x days/weeks/months/years/decades. This outcome is best for everyone: no one loses, and the path to fun and profit is just that, fun and profitable.

So what do we need to do - run SQL Server in containers and write and deploy our code using SSDT? Yes, do it, but you don't need to: you can do DevOps and work on frequent releases with a standard SQL Server instance and manually written deploy scripts that are emailed around.

So what is DevOps if you can do it without source control?

DevOps is about enabling frequent releases - that is the core element of it and to enable frequent releases we need:

  • A way to deploy code (a DBA wearing out the F5 key in SSMS is a way to deploy code)
  • A way to be confident about the changes we are about to make (hint: tests, lots of them)
  • A way to know when there is a problem with production (monitoring and alerting)
  • The ability to identify bottlenecks, work together and make improvements to the process

The last point is most important, for me it stems from kanban and the kaizen approach of identifying bottlenecks and working together to remove the bottlenecks.

If you look at your existing approach to making changes, what are your bottlenecks? How can these be improved? When you deploy changes and they go wrong, what stopped you finding out about those problems earlier? When you look at the different stages of a change, from business analysis to troubleshooting issues reported by customers, how many of those issues, and how much time and money, could have been saved by not having the issue at all or by identifying it in developer tests or at roll-out rather than when the user complained about it?

If you truly start looking at bottlenecks in your entire release process it will more than likely lead you to a DevOps culture and practices, including the tools required to do it. Without the underlying kaizen approach of continually removing bottlenecks in your processes, you will simply pay for tooling you don't need and cover your laptop with stickers, but not deliver the value that the business needs.


SSDT: Unable to connect to master or target server.

  • Posted on: 12 June 2017
  • By: Ed Elliott

Error in SSDT: "Unable to connect to master or target server" - the server displayed isn't the server but the database.

Every now and then I come across this error in SSDT, normally when trying to publish, and the odd thing is that the description never matches the actual cause (if you're desperate for the cause, it is because you can't connect). The thing I like about the description is the way it tries to tell you what is wrong and what server you are connecting to but it fails at both, and instead tells you about an unrelated error and a database name instead of a server name.

What we have is:

"Unable to connect to master or target server '{0}'. You must have a user with the same password in master or target server '{0}'.\"

Twice it tries to tell you the server name but both times it actually tells you the database name. I thought I would dig into it a little with Reflector to try and see where the error comes from and whether or not it would ever show the server name. So in Reflector I found the error and where it was used. If we look in what is surely everyone's favourite SSDT dll, Microsoft.Data.Tools.Schema.Sql.dll, we can see that the error message is used in SqlDeploymentEndpointServer.OnInit and we have something like:


catch (ExtensibilityException exception)
{
    Tracer.TraceException(TraceEventType.Verbose, TraceId.CoreServices, exception, "Error loading DSP families");
    throw new DeploymentFailedException(string.Format(CultureInfo.CurrentCulture, DeploymentResources.InvalidServerEndpoint, new object[] { targetDBName }), exception);
}

They are indeed passing targetDBName into the InvalidServerEndpoint error message, so yes, the error message will only ever show the database name.

I had a quick look at what can cause this and it is something to do with opening the SqlConnection, which is wrapped in lots of retry logic that is different for Azure compared to other types of SQL Server - lots of interesting stuff, maybe for another post, but basically SSDT wasn't able to open a connection. Check the server name, ports, database etc. (i.e. maybe your default database is not available to that user), connect via SSMS and when that works come back to SSDT.
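
If you want to take SSDT out of the picture entirely, a few lines of C# will tell you whether the raw connection works and give you the real error if it doesn't (a minimal sketch - swap in the same server, port and credentials you gave the publish profile):

using System;
using System.Data.SqlClient;

class ConnectionCheck
{
    static void Main()
    {
        // hypothetical connection string - use the one from your publish profile
        var connectionString = "Server=myserver,1433;Database=master;Integrated Security=true";

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open(); // throws with the real reason if it cannot connect
            Console.WriteLine("Connected to: " + connection.DataSource);
        }
    }
}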

Footnote: I did think about raising a Connect item but then couldn't be bothered - if anyone does, I'll be happy to vote for it!

SSDT Dev in Visual Studio Code

  • Posted on: 27 April 2017
  • By: Ed Elliott

I have been quite interested by VS Code and have been using it more and more recently. I use it for all my Go (#golang FTW) work and also PowerShell, and I have been toying with the SQL tools team's SQL extension, which is great. For a long time I have thought about bringing the SSDT experience to other IDEs like JetBrains IntelliJ, but because I have been using VS Code quite a lot recently, and separately have been doing more and more JavaScript and TypeScript, I thought it would be interesting to see how hard it would be to write a VS Code extension that lets me build dacpacs.

The general goal of this is not to re-create the SSDT experience in Visual Studio but to provide a lighter, faster way of developing SQL code. I want an extension that:

  • is fast
  • is lightweight - memory is important on dev machines and 2GB for a large db project is limiting
  • gives us the important things like pre/post deploy scripts, refactoring and the ability to generate a dacpac

I am not really interested in providing UIs like the schema compare - for that use SSDT or spend some money on the Redgate tools.

I am also not interested in replacing what the SQL tools team are doing; I am happy to leave them to do the harder, important but (to me) less interesting things like T-SQL formatting. With that in mind I have started a new project that is very hacky at the moment - more an experiment to see if it will work - a VS Code extension that builds dacpacs:

https://github.com/GoEddie/vscode-ssdt/

This is basically just a wrapper around the DacFx so there shouldn't be anything too hard in it, and it is Windows only for now (until the DacFx is cross platform it will only ever be Windows, but I hold out hope for a cross platform DacFx one day!).
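
At its core the wrapper is not much more than feeding .sql files into a TSqlModel and asking the DacFx to package it up - roughly this kind of thing (a minimal sketch rather than the actual extension code, with paths and version hard-coded for illustration):

using System;
using System.IO;
using Microsoft.SqlServer.Dac;
using Microsoft.SqlServer.Dac.Model;

class DacpacBuilder
{
    static void Main()
    {
        var sourceFolder = @"C:\dev\MyDatabase";      // hypothetical folder of .sql files
        var outputFile = @"C:\dev\MyDatabase.dacpac"; // hypothetical output path

        var model = new TSqlModel(SqlServerVersion.Sql130, new TSqlModelOptions());

        foreach (var file in Directory.EnumerateFiles(sourceFolder, "*.sql", SearchOption.AllDirectories))
        {
            // add each script to the in-memory model; errors surface when the package is built
            model.AddObjects(File.ReadAllText(file));
        }

        DacPackageExtensions.BuildPackage(outputFile, model, new PackageMetadata { Name = "MyDatabase" });
        Console.WriteLine("Built " + outputFile);
    }
}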

This works similarly to the SQL tools team extension in that there is a .NET app that is called by the VS Code extension (TypeScript running on node.js). So if you want to try this, download the repo and run the SSDTWrap exe (not under a debugger or you will face T-SQL parsing first chance exception performance hell). Then in VS Code open the folder "src\s2" and the "extension.ts" file and press F5 - this will open a new VS Code window - open a folder with your .sql files and it will create a T-SQL model and report any errors.

If you do ctrl+shift+p to open the command palette and then run "build dacpac" it will generate a dacpac for you from the sql files. You will need to put this ssdt.json file in the root of the directory you open in VS Code:


{
    "outputFile": "c:\\dev\\abc.dacpac",
    "SqlServerVersion": "Sql130",
    "references": [
        {
            "type": "same",
            "path": "C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\Common7\\IDE\\Extensions\\Microsoft\\SQLDB\\Extensions\\SqlServer\\140\\SQLSchemas\\master.dacpac"
        }
    ],
    "PreScript": "",
    "PostScript": "",
    "RefactorLog": "",
    "ignoreFilter": "*script*",
    "PublishProfilePath": "C:\\dev\\Nested2\\Nested2.publish.xml"
}

It doesn't really support a lot of things at the moment but I will add in the support needed to build a dacpac including refactorlog.xml, pre/post deploy scripts and references that we all know and love.

I tested with a folder of 2,000 procedures; I tried testing 10,000 but I couldn't get SSDT to load them into a project without crashing (on a top of the range i7, 32GB, SSD etc. laptop) - in the end I settled for 2,000 procs and the times to build the dacpac were:

App                   Time (milliseconds)
Visual Studio / SSDT  5630
VS Code               2051

So as well as handling larger projects it is also faster; a small project (one proc/table) was about 17 seconds to build the dacpac.

Anyway, it is all a bit of fun and pretty hacky at the moment but I like using VS Code and am finding it much more lightweight than Visual Studio, so I will likely invest some more time in it.

If you feel like trying it out, good luck :)

My SQL Server Development Team Maturity Levels

  • Posted on: 24 April 2017
  • By: Ed Elliott

A team's maturity shows in its choice of tools.

I have seen quite a few different development teams in wildly different environments and the single fact that really stands out is that you can tell how good a team is by the tools that they use. It isn't always the specific choice of which tools they use, although that can be important; it is the fact that they evaluate new tools and choose to either use or ignore them.

This is basically my personal maturity model for SQL Server developers. I think it is quite important because it is a measure of the effectiveness of a team. It should be pointed out that some teams have no need to be effective whereas other teams are vital to how an organisation runs.

If we take a few examples, the first shows where a team has no need to be mature at all:

Team one, no one cares

Team one supports a vendor supplied application and the database is SQL Server, the vendor supplies an upgrade script to run every 3 months and the team is not allowed to make any changes to the application. In this scenario, there isn't really any benefit in the customer having the database in source control - any problems are dealt with by the vendor and any scripts can equally be run in SSMS or sqlcmd.exe or a custom application. Even though the application is critical, the vendor supplies all support for the application and the team just need to keep it updated.

The second is a team that is important.

Team two, everyone cares

Team two develop and support an in-house electronic medical record application. It was first written in the late 90s and has evolved to include a number of SQL Server databases and .NET services. All the code, from the .NET services to the databases and the tooling that the developers use, is critical, and hacking together a release is not going to wash when a nurse might not be sure whether something needs to be administered to a patient or not.

Team three, not critical but the team care

Team three develop a website that generates revenue for the company. Revenue is the most important factor but the team has good people in it who care and want to do the right thing for the company, not at the loss of revenue but with the idea to increase revenue and deployments.

Maturity Levels

OK so this is pretty simple, we have these levels:

  • Low
  • Medium
  • High

Wow. Just WOW

That is an amazing list, how did you come up with it? Did it come from some PhD study on the effectiveness of lists in the internet age? No.

So a little more detail...

Low

The low maturity team is one that uses basic tools or no tools at all. They probably do all their development in SSMS without any of the refactoring or intellisense helpers that are available. If they do use source control then their database is likely not deployable in any form purely from source control.

This could well be perfect for team one, there is literally no need for them to spend any time on anything that other teams might require as an absolute basic. There is nothing wrong with this if the development and deployment is genuinely not important.

If you have a database and you don't care about it then the best maturity level for you is low. Don't feel bad about it, it is awesome in its own way and we love awesome.

Medium

This is where it starts to get a little bit exciting, the team probably has some tooling - they might have SQL Prompt for SSMS or use another IDE like JetBrains DataGrip. For deployments they could use a tool such as Redgate ReadyRoll or perhaps they have written their own deployment tool.

Why is ReadyRoll in the Medium maturity level?

The main reason is that in a medium maturity team deployments, both creating and managing them, are critical to how the team develops and deploys changes. ReadyRoll helps teams to generate deployments; it is a tool to manage changes and deployments.

Thinking about deployments and how you deploy changes is not necessarily a bad thing - if you are not team one then you should absolutely be thinking about deployments, and if you did have a low maturity it is a great step towards getting more mature in your SQL Server development.

Why is DataGrip in the Medium maturity level?

Ok, since you ask - if it was another database then I would put it in the advanced section, but the tooling for SQL Server is so good that unfortunately it goes into Medium. If there was a genuine reason why a team used it I would be tempted to throw them a High maturity level bone, but no guarantees.

High

For a high maturity team I want to see good IDEs being used - so SSDT or DataGrip or something similar. It doesn't have to be SSDT but if not then there needs to be a valid reason why. I also want to see code checked in (not changes checked in) and then the code being built, tested and deployed either straight to production or possibly prepared to be deployed later. If it is to be deployed later I want to see a plan in place that will get them to continuous deployment.

Where do teams two and three fit?

I would hope between Medium and High - if not you have to ask questions as to why not. It is 2017 and there are plenty of resources available out there.

Show me a chart

This is really begging for a cool matrix helping people show what all the criteria are (there must be more of course) and where they fit, but hey, this is a low-medium maturity blog so maybe one day in the future.

SQL Server Continuous Deployment "In a Box"

  • Posted on: 7 March 2017
  • By: Ed Elliott

What is this?

Well, if you read the name aloud - "SQL Server Continuous Deployment in a box" - then, if I have done my job correctly choosing the title for the blog, it should give you a hint :)

What is the big idea?

There is really some great tooling for SQL Server - second to none really when it comes to RDBMSs - and setting up Continuous Deployment pipelines is actually pretty simple once you know which parts to plug together. I even did it in 55 minutes once (https://www.youtube.com/watch?v=9YJQTx3bPek).

What I wanted to do was to allow people to spin up a VM, install SSDT (or Visual Studio with SSDT), install a local dev instance of SQL Server (or use LocalDB), run a script, add some parameters and have everything they need to be able to make changes to some code and have that code automatically deployed to a production database.

Now a little word about that word "production", you could set this up to point to your production database but what I would suggest for the demo is that you use a copy of your production database or something which you will call "production" - the tools here can all be used in a real life setup but you wouldn't normally host everything on your development machine.

How does it work?

The idea is that anyone who can download from the internet can do this, so set up the prerequisites (SSDT and SQL Server) and then either clone the repo (https://github.com/GoEddie/SQLServer-Continuous-Deployment-In-A-Box/) or download the latest zip from:

https://github.com/GoEddie/SQLServer-Continuous-Deployment-In-A-Box/arch...

Note: to get the zip you don't need to use git or sign up for an account or anything (other than clicking the link).

Once you get that then extract the folder, open powershell as an administrator, change to the src folder and run these two little commands:

Unblock-File *.ps1
.\ContinuousDeploymentFTW.ps1

What does this do?

Unblock-File *.ps1 - removes a flag that windows puts on files to stop them being run if they have been downloaded over the internet.
.\ContinuousDeploymentFTW.ps1 - runs the install script which actually:

  • Downloads chocolatey
  • Installs git
  • Installs Jenkins 2
  • Guides you how to configure Jenkins
  • Creates a local git repo
  • Creates a SSDT project which is configured with a test project and ssdt and all the references that normally cause people problems
  • Creates a local Jenkins build which monitors your local git repo for changes
  • When code is checked into the repo, the Jenkins job jumps into action and...

If you check into the default branch "master" then Jenkins:

  • Builds the SSDT project
  • Deploys the project to the unit test database
  • Runs the tSQLt unit tests
  • Generates a deployment script for the "production" database

And what you have there is continuous delivery in a box. Now I know that isn't what you were sold by the title, but I wanted to show a couple of different approaches to this, so if you use git to create a release branch and check in on it, change to the directory with the SSDT project in PowerShell and do:

git checkout -b release

Make a change and then...

git add .

git commit -m "a change that will go straight to production \o/"

You will see that a "Release" Jenkins build is created automatically because the job we set up initially is a "Jenkins Multi-branch pipeline" - don't worry about that, but what you see is Jenkins:

  • Builds the SSDT project
  • Deploys the project to the unit test database
  • Runs the tSQLt unit tests
  • Deploys the SSDT project to the "production" database

Nice hey?

Why the choice of technology?

SSDT - this doesn't need SSDT, you could do this with ReadyRoll, DbUp etc., anything
Git - aren't most people moving to git nowadays?
Jenkins 2 - for the multi-branch pipelines which means it automatically creates builds from the Jenkinsfile which is checked into source control

Sounds hard to setup?

It isn't: all you need to do is configure Jenkins, create a user and give my script the username and token, plus the connection details for the unit test and production databases. If you like, when you get the SSDT project you can import from your production database, which will then be deployed to your unit test database, or you can leave it empty, or add a couple of objects - whatever suits you!

Prerequisites

I would create a VM, install SSDT or Visual Studio with SSDT in it (2015 or 2017), install a local SQL Server 2008+ and restore a copy of your production database - that should be it.

I made a video to show the awesomeness of all of this:

https://the.agilesql.club/assets/videos/SQLCDINABOX.mp4

I made the video to see how much fun it was to make videos, it was very fun but this will be the only one ;)

Enjoy and good luck!

ScriptDom parsing and NoViableAltExceptions

  • Posted on: 2 March 2017
  • By: Ed Elliott

If you have ever tried to debug a program that uses the TSql ScriptDom to parse some T-SQL you will know that the process is extremely slow, and this is due to the volume of NoViableAltExceptions (and others) that are thrown and then caught. These are first chance exceptions that are handled, and it comes down to the way the ScriptDom interacts with ANTLR and the lexer that they use. When you debug a program what happens is you have two processes: process one is the debugger, which starts (or attaches to) process two, the debuggee.

The debugger calls a Windows function, WaitForDebugEvent, typically in a "while(true)" loop (everyone should write a Windows debugger at some point in their lives, you learn so much - in fact put down SSMS and go write your first debugger loop: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681675(v=vs.85).aspx). The debuggee app is then run and when something interesting happens, like an exception being thrown or a dll being loaded/unloaded, the debuggee is paused (i.e. all threads stopped), then WaitForDebugEvent returns and the debugger can look at the child process and either do something or call WaitForDebugEvent again. Even if the debugger doesn't care about the exceptions the debuggee is still paused, and when you parse T-SQL under a debugger, even if you tell Visual Studio to ignore the exceptions, the debuggee is still paused for every exception just so Visual Studio (or your home baked debugger) can decide that it wants to ignore that exception, and then the debuggee is started up again.

What this means for an app that throws lots of first chance exceptions is a constant start, stop, start, stop which is so painful for performance - it is basically impossible to debug a TSql ScriptDom parse on a large project. I typically debug a project with one table and one proc and hope it gives me everything I need, or do other tricks like letting the parsing happen without a debugger attached and then attaching a debugger at the right point after the parsing has happened - but then again I don't have to debug the TSql lexers!

So where is this leading?

I was wondering what effect these first chance exceptions have on parsing T-SQL in normal operation, where we don't have a debugger attached - is there something we can do to speed up the processing?

The first thing I wanted to do was to try to reproduce a NoViableAltException, I kind of thought it would take me a few goes but actually the first statement I wrote caused one:

"select 1;"

This got me curious so I tried just:

"select 1"

Guess what? No NoViableAltException the second time - this didn't look good; should we remove all the semi-colons from our code? (spoiler: no!)

Ok, so we have a reproducible query that causes a first chance exception - what if we parse it 1000 times and measure the time, and then another 1000 times with the semi-colon replaced with a space (so it is the same length)?
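
The timing harness was along these lines (a sketch, assuming the Microsoft.SqlServer.TransactSql.ScriptDom assembly - I am using TSql130Parser here but any parser version will do, and the first chance exceptions can be counted with the AppDomain event):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class ParseTiming
{
    static void Main()
    {
        var firstChanceCount = 0;
        AppDomain.CurrentDomain.FirstChanceException += (sender, args) => firstChanceCount++;

        Console.WriteLine("with semi-colon:    " + Time("select 1;") + "ms");
        Console.WriteLine("without semi-colon: " + Time("select 1 ") + "ms");
        Console.WriteLine("first chance exceptions seen: " + firstChanceCount);
    }

    static long Time(string sql)
    {
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < 1000; i++)
        {
            IList<ParseError> errors;
            new TSql130Parser(true).Parse(new StringReader(sql), out errors); // parse and throw the fragment away
        }
        return watch.ElapsedMilliseconds;
    }
}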

Guess what? The processing without the semi-colon took just over half the time of the queries with semi-colons: the average time to process the small query with a semi-colon was 700ms and without the semi-colon 420ms. So much faster, but who cares about 300 milliseconds? It is less than a second and really won't make much difference to the overall processing time to publish a dacpac.

I thought I would have one more go at validating a real life(ish) database so I grabbed the WideWorldImporters database, scripted out the objects and broke them into batches, splitting on GO and either leaving the semi-colons or removing all of them. With semi-colons the time it took to process was 620 ms and there were 2403 first chance exceptions. The second run, without semi-colons (which would likely create invalid SQL in some cases), took 550 ms and there were still 1323 first chance exceptions. I think if we could get rid of all the first chance exceptions the processing would be much faster but ho hum - to handle the first chance exceptions you just need a fast CPU and not to be a process that is being debugged.

SqlPackage Deploy Performance - IgnoreXX are not your friend!

  • Posted on: 2 March 2017
  • By: Ed Elliott

Following on from yesterday's blog I was wondering about the comparison of objects that are the same and how the IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemicolonBetweenStatements flags affect the comparison. To be fair I was only interested in IgnoreWhitespace but it turns out that those four are very closely related.

When the deploy happens and a script in the source and target is compared, the process is:

  • 1. Loads of things we will skip
  • 2. Any cmd variables in the scripts are replaced with their appropriate values
  • 3. If both the scripts are null - the comparison returns true. This has to be the best for performance but the worst for functionality ;)
  • 4. If one script is null but not the other then the comparison returns false. This actually has to be the best for comparison performance but worse for deploy performance!
  • 5. We then get a good old fashioned String.Equals, the standard .net compare goes: if both strings are not null and the lengths are the same do a check on each byte in the strings
  • 6. If the strings are equal we have a match, happy days no more action required

It is what happens when the strings do not match that starts to get a bit more interesting: if the strings are not equal and any of those four ignore options are true then we fall down into doing a further comparison, but only after the scripts have been normalized using the ScriptDom and ANTLR, which is an expensive operation in itself (this also happens to be my next topic!).

Once the normalization has been done we end up in the actual compare, which goes like this (there is a rough sketch in code after the list):

  • 1. Turn the script into a stream of tokens
  • 2. If the token is a comment and ignore comments is set, skip it
  • 3. If the token is whitespace and ignore whitespace is set, skip it
  • 4. If the token is a semi-colon and ignore semi-colons is set, skip it
  • 5. Then compare the tokens which itself does things like use IgnoreKeywordCasing and removes quotes around quoted identifiers - it isn't a straightforward String.Equals
  • 6. If any of the tokens don't match then it is a failure and the script needs to be changed
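
Very roughly, that token-level comparison amounts to something like this (a sketch of the idea using the public ScriptDom token stream, not the actual implementation):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.TransactSql.ScriptDom;

static class ScriptComparer
{
    // compare two scripts the way the ignore options suggest: skip comments,
    // whitespace and semi-colons, then compare the remaining token text
    public static bool AreEquivalent(string source, string target)
    {
        var sourceTokens = SignificantTokens(source).ToList();
        var targetTokens = SignificantTokens(target).ToList();

        return sourceTokens.Count == targetTokens.Count
            && sourceTokens.Zip(targetTokens,
                   (s, t) => string.Equals(s, t, StringComparison.OrdinalIgnoreCase)) // stands in for IgnoreKeywordCasing
               .All(match => match);
    }

    static IEnumerable<string> SignificantTokens(string script)
    {
        IList<ParseError> errors;
        var fragment = new TSql130Parser(true).Parse(new StringReader(script), out errors);

        foreach (var token in fragment.ScriptTokenStream)
        {
            if (token.TokenType == TSqlTokenType.WhiteSpace ||        // IgnoreWhitespace
                token.TokenType == TSqlTokenType.SingleLineComment || // IgnoreComments
                token.TokenType == TSqlTokenType.MultilineComment ||
                token.TokenType == TSqlTokenType.Semicolon ||         // IgnoreSemicolonBetweenStatements
                token.TokenType == TSqlTokenType.EndOfFile)
            {
                continue;
            }

            yield return token.Text;
        }
    }
}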

So what?

So blunt. Anyway, basically what this means is that the default options in sqlpackage.exe are set to allow things like differently cased keywords and whitespace, and to allow that we end up taking longer to do deployments where we actually make use of those defaults.

huh?

If you have a database with lots of code and you have the code in SSDT but you do things like change the comments when you deploy and rely on IgnoreComments (this is a real life scenario I have seen - someone adding a custom comment header), then you will have slower deployments, and as slower deployments are the opposite of what we want you should:

  • Have the same code in your database as you have in your project
  • Have the same code, including the same case of keywords, comments and whitespace in your database that you have in your project
  • Disable the defaults and set IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemicolonBetweenStatements all to false (see the sketch below)
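
If you drive the deploy through the DacFx rather than sqlpackage.exe, those flags live on DacDeployOptions - a minimal sketch (connection string and dacpac path made up; with sqlpackage.exe the same property names are passed as /p: arguments):

using Microsoft.SqlServer.Dac;

class DeployWithStrictCompare
{
    static void Main()
    {
        var options = new DacDeployOptions
        {
            // turn off the lenient comparisons so identical code short-circuits on the cheap String.Equals
            IgnoreWhitespace = false,
            IgnoreComments = false,
            IgnoreKeywordCasing = false,
            IgnoreSemicolonBetweenStatements = false
        };

        var services = new DacServices("Server=.;Database=master;Integrated Security=true"); // hypothetical
        using (var package = DacPackage.Load(@"C:\dev\MyDatabase.dacpac"))                   // hypothetical
        {
            services.Deploy(package, "MyDatabase", upgradeExisting: true, options: options);
        }
    }
}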

What effect does it have?

If your database and project code are exactly the same, then no effect - you neither gain nor lose anything.

If your database and code differ by comments, case, semi-colons etc. and you have lots of files that are different then you will gain quite a bit. On my machine here I created a database with 1,000 stored procedures like "select '----'" (I used replicate to make it large), I then imported the procs into SSDT and added a space between the select and the text and did a deploy using sqlpackage (in fact I did a few to get an average time). With the default IgnoreWhitespace=true the deploy took about 24 seconds (remember this is on a machine following yesterday's recommendations: lots of memory, fast CPU and an SSD). When I removed the defaults and set them to false, the first deploy took 34 seconds because naturally it had to deploy the procs, then re-running it took around 17 seconds - about 7 seconds off a 24 second deploy, which I'll take.

The thing that you will really gain is that your project code and database will be the same, which should really be the end goal. If you can honestly say that you have to have:

  • Different whitespace
  • Different keyword casing
  • Different semi-colons
  • Different comments

I would be inclined to find out why as it sounds like an interesting project :)

SSDT Deploy / Publish Performance

  • Posted on: 1 March 2017
  • By: Ed Elliott

Publishing dacpacs is a little bit of a pain when you have multiple databases - it can easily start to take minutes to hours to deploy changes depending on how many databases there are and the size of those databases. I wanted to understand more about the publish process and what we can do to speed it up as much as possible, so I did some digging and this is a randomish post about how it all works and what we can do to make it faster.

Process Overview

Roughly speaking the process for deploying dacpacs is:

  • 1. Open dacpac (zip file)
  • 2. Open database
  • 3. Compare dacpac to database
  • 4. Come up with a list of changes
  • 5. Generate the T-SQL for the changes
  • 6. Run the deploy script

1. Open the dacpac (zip file)

The dacpac contains the XML representation of the model of the database and the original source code. To be able to deploy changes we need the original source code - the model contains the definitions of the objects, such as which columns a table has and the parameters a stored procedure has, but not the body of the stored procedure including comments.

The dacpac is a zip file, which means that to read from it the deployment must create a file handle to the dacpac and uncompress anything it wants to read (or compress anything it wants to write) - you can see this for yourself with a few lines of code, sketched after the list below. The larger the dacpac, the longer it will logically take to decompress and read from disk, so there are our first two important things to note - the deploy will need:

  • CPU to decompress the contents of the dacpac
  • Fast disk as it has to read from disk
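
If you have never looked inside one, a dacpac is just a zip with the model and metadata in it - a quick sketch to see what is in there (path made up):

using System;
using System.IO.Compression;

class DacpacPeek
{
    static void Main()
    {
        // a dacpac is a plain zip - model.xml holds the model, Origin.xml the package metadata
        using (var dacpac = ZipFile.OpenRead(@"C:\dev\MyDatabase.dacpac")) // hypothetical path
        {
            foreach (var entry in dacpac.Entries)
            {
                Console.WriteLine(entry.FullName + " - " + entry.Length + " bytes");
            }
        }
    }
}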

When you run the publish, by default the model is read from disk into memory; however, for very large dacpacs you might find that you don't have enough memory for it - a Windows system low on memory is bad, two things happen:

  • The paging file is used
  • Applications are told to free memory

This is painful for an application and when you get paging, especially on non-SSD drives, you are going to find everything about the whole system slow.

If you have a large model in your dacpac and you don't have much memory then you will probably need to load the model from disk rather than into memory, but you'd best make sure you have an SSD to run it from!

For more details on how to do that see: https://redphoenix.me/2013/06/14/why-is-sqlpackage-using-all-the-build-s... (Richie Lee FTW!)

2. Open database

The database contains all sorts of goodies like the version that will be deployed to, the database settings and the definitions that will be compared to the dacpac to see what changes need to be made. If your SQL Server responds slowly then the publish will be affected by that so make sure your SQL Server box is up to scratch and expect slower deploy times on resource limited instances.

3. Compare dacpac to database

I had always wondered how they compared the two different objects so I busted out Reflector and used it to have a poke about. What happens is a model is built of the source and target - a model is a representation of each object; it has its name, its properties and relationships to other objects, for example a table has a relationship to its columns and each column has a relationship to its type (not a property of the type).

If you think about this it means that for each deploy we actually need to store the model of both the database and the dacpac, so doubling the size requirements - this isn't to say an exact double of the size of the source code, because what it compares is things like the object properties and relationships but also the tokens that make up the body of the object, so there are a lot more objects created around the code. In short, if your source code is 100MB you will need some multiple of 2 * 100MB to upgrade an existing database - I am not sure if there are any figures to show what that multiple is, but if you do a publish and you run low on memory or you get lots of garbage collection in your deploy then consider either storing the model on disk or throwing in some more memory.

There are a couple of interesting extra things that happen, in particular to do with the size of the code in objects and whether to store it in memory, store it in memory compressed or write it to disk. If, for example, you have a stored proc that is less than 500 characters long, the publish will store it in memory.

If the string representation of a procedure is over 500 characters but less than 16 thousand characters then it will be stored in memory but compressed first.

Finally, if the stored proc (I am using procs as an example but I think it is any code unit minus the create + name, so everything after "as") is over 16 thousand characters then a temp file is generated in your user temp folder and the contents are written to that.

So three different behaviours depending on the size of your objects:

  • Under 500 chars - you need memory
  • Over 500 chars, under 16,000 you need memory and CPU
  • Over 16,000 chars you need memory and fast disk for your temp folder

I created three demo databases and deployed them over and over, the first contains a couple of hundred procedures with slightly under 500 characters, the second with slightly over 500 characters and the last with slightly over 16,000 characters.

When I ran the publish this is what they looked like:

Under 500 char procs:

[perfmon trace: under 500 char procs - fastest to deploy]

Slightly over 500 char procs:

[perfmon trace: just over 500 char procs - similar to under 500 chars but includes some time to compress some strings]

What we can see here is that the under 500 char procs take the least amount of time, but the interesting thing is that the database with slightly over 500 char procs has a little bit of extra oomph from the CPU - I have shown this with the pink circle. Both deploys follow the same pattern for sqlpackage.exe CPU usage - it goes from around 90% CPU up to 100%, then drops down sharply to around 80% and then back up to 90% - but the second project, with the slightly larger procs, also has an extra splurge of CPU which I am putting down to the extra compression that those strings are getting. It certainly tallies up, even if it is not scientific :)

Aside from the extra CPU usage they are pretty similar. The black highlighted line is garbage collection, and the way this counter works is that every time it changes, some garbage collection is happening - the first version does I think 3 collections and the second does 4, which sort of makes sense as once the strings have been compressed the original strings can be thrown away.

It is when we start to look at the database with lots of objects over 16,000 characters long that we see some interesting things and some other forces come into play:

[perfmon trace: over 16,000 char procs - lots of disk activity]

The first thing to note is the garbage collection - there are about 8 changes to the line, so 8 sets of garbage collection, which is expensive for a .NET app. We get much more processor usage for a more prolonged time and we really start to see some disk activity; write activity just has one peak while the files are written but reading stays quite high throughout the whole process. It could be that it is SQL Server reading from disk (my demo was from a single SSD with a local SQL instance) but we can see that after the SQL CPU settles down the green disk read line stays high, so it is unlikely to be pure sqlserver.exe.

What does this tell us?

Well, if you have large objects - multi-thousand line stored procs - then you had better get yourself some serious SSDs to deploy from. If you have smaller objects then you'll need CPU and memory - don't scrimp here!

4. Come up with a list of changes

So when the properties and tokens in the source / destination object have been compared they are added to a list of modifications - this is fairly straightforward, but that is not the final list of modifications because there are lots of options to customize the deployment, such as ignoring certain things. What happens next is that some post-processing is done on the list to remove anything that the options say should be removed.

We then get the opportunity to interrupt the deploy process with a contributor and remove steps as we want - if we do that, anything we do is overhead so be warned: be fast in what you do or you will slow down the whole process!
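
For reference, a deployment plan contributor is just a class the DacFx loads that gets to walk (and prune) the plan - the shape is roughly this (a bare-bones sketch with a made-up name; the Remove call is commented out because this example keeps every step):

using Microsoft.SqlServer.Dac.Deployment;

// loaded by the DacFx when the deploy is run with this contributor enabled;
// anything slow in OnExecute slows every deploy that uses it
[ExportDeploymentPlanModifier("MyCompany.ExampleContributor", "1.0.0.0")]
public class ExampleContributor : DeploymentPlanModifier
{
    protected override void OnExecute(DeploymentPlanContributorContext context)
    {
        var step = context.PlanHandle.Head;
        while (step != null)
        {
            var next = step.Next;

            // decide here whether to keep the step; for example:
            // base.Remove(context.PlanHandle, step);

            step = next;
        }
    }
}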

There are some things like default constraints that are stored differently in SQL Server to how they are in our project - so when a difference is found, as part of the post-processing each constraint is "normalized" using the ScriptDom and a visitor to find out if the changes are actually different or just appear different. This means that we don't keep getting constraints deployed over and over (this was something that used to happen a lot - a great new feature added by the SSDT team BTW!), but it does mean that for every default constraint we need to go through this normalization using the ScriptDom, which isn't exactly lightning fast.

Takeaways from this are that the deploy does a lot of things to help us - if it was a straight string comparison then it would be faster but the deploy options would not be as configurable. The more objects you have, the longer the deploy will take!

5. Generate the T-SQL for the changes

6. Run the deploy script

I'm going to stop this post here - I think these are a bit more straightforward: the more things you have to change, the more time it will take :)

Summary

If the deploy is slow, use perfmon to measure the CPU, disk and garbage collection of sqlpackage.exe (or whatever is doing the deploy) - have a think about whether streaming the model from disk would be better; if you are CPU bound add a more powerful CPU, if you are low on memory add more memory, and if you are on an old slow disk put it on an SSD.

If you are low on CPU you might be tempted to add more CPUs - understand that a single deploy happens in serial so adding more CPUs won't make a deploy quicker if this is the only thing running on the box - if it is fighting for resources then more CPUs may help.
