Ed Elliott's blog

SQL Server Continuous Deployment "In a Box"

  • Posted on: 7 March 2017
  • By: Ed Elliott

What is this?

Well, if you read the name aloud, "SQL Server Continuous Deployment in a box", then, if I have done my job of choosing the title correctly, you should already have a hint :)

What is the big idea?

There is some really great tooling for SQL Server - second to none really when it comes to RDBMSs - and setting up Continuous Deployment pipelines is actually pretty simple once you know which parts to plug together. I even did it in 55 minutes once (https://www.youtube.com/watch?v=9YJQTx3bPek).

What I wanted to do was to allow people to spin up a VM, install SSDT (or Visual Studio with SSDT), install a local dev instance of SQL Server (or use LocalDB), run a script, add some parameters, and have everything they need to be able to make changes to some code and have that code automatically deployed to a production database.

Now, a little word about that word "production": you could set this up to point to your production database, but for the demo I would suggest you use a copy of your production database, or something that you will call "production" - the tools here can all be used in a real-life setup, but you wouldn't normally host everything on your development machine.

How does it work?

The idea is that anyone who can download from the internet can do this, so set up the pre-requisites (SSDT and SQL Server) and then either clone the repo (https://github.com/GoEddie/SQLServer-Continuous-Deployment-In-A-Box/) or download the latest zip from:

https://github.com/GoEddie/SQLServer-Continuous-Deployment-In-A-Box/arch...

Note: to get the zip you don't need to use git or sign up for an account or anything (other than clicking the link).

Once you have that, extract the folder, open PowerShell as an administrator, change to the src folder and run these two little commands:

Unblock-File *.ps1
.\ContinuousDeploymentFTW.ps1

What does this do?

Unblock-File *.ps1 - removes a flag that Windows puts on files downloaded from the internet to stop them being run.
.\ContinuousDeploymentFTW.ps1 - runs the install script, which actually:

  • Downloads chocolatey
  • Installs git
  • Installs Jenkins 2
  • Guides you through configuring Jenkins
  • Creates a local git repo
  • Creates an SSDT project which is configured with a tSQLt test project and all the references that normally cause people problems
  • Creates a local Jenkins build which monitors your local git repo for changes
  • When code is checked into the repo, the Jenkins job jumps into action and...

If you check into the default branch "master" then Jenkins:

  • Builds the SSDT project
  • Deploys the project to the unit test database
  • Runs the tSQLt unit tests
  • Generates a deployment script for the "production" database

and what you have there is continuous delivery in a box. Now, I know that isn't what you were sold by the title, but I wanted to show a couple of different approaches to this, so use git to create a release branch and check in on it, by changing to the directory with the SSDT project in PowerShell and doing:

git checkout -b release

Make a change and then...

git add .

git commit -m "a change that will go straight to production \o/"

You will see that a "Release" Jenkins build is created automatically, because the job we set up initially is a "Jenkins multi-branch pipeline" - don't worry about that, but what you see is Jenkins:

  • Builds the SSDT project
  • Deploys the project to the unit test database
  • Runs the tSQLt unit tests
  • Deploys the SSDT project to the "production" database

Nice hey?

Why the choice of technology?

SSDT - this doesn't need to be SSDT, you could do this with ReadyRoll, DbUp etc. - anything really
Git - aren't most people moving to git nowadays?
Jenkins 2 - for the multi-branch pipelines which means it automatically creates builds from the Jenkinsfile which is checked into source control

Sounds hard to setup?

It isn't. All you need to do is configure Jenkins, create a user and give my script the username and token, plus the connection details for the unit test and production databases. If you like, when you get the SSDT project you can import from your production database, which will then be deployed to your unit test database, or you can leave it empty, or add a couple of objects - whatever suits you!

Prerequisites

I would create a VM, install SSDT or Visual Studio (2015 or 2017) with SSDT in it, install a local SQL Server 2008+ and restore a copy of your production database - that should be it.

I made a video to show the awesomeness of all of this:

https://the.agilesql.club/assets/videos/SQLCDINABOX.mp4

I made the video to see how much fun it was to make videos, it was very fun but this will be the only one ;)

Enjoy and good luck!

ScriptDom parsing and NoViableAltExceptions

  • Posted on: 2 March 2017
  • By: Ed Elliott

If you have ever tried to debug a program that uses the TSql Script Dom to parse some T-SQL you will know that the process is extremely slow, and this is due to the volume of NoViableAltExceptions (and others) that are thrown and then caught - they are first chance exceptions that are handled internally, and it is just the way that the script dom interacts with Antlr and the lexer that it uses. When you debug a program you have two processes: process one is the debugger, which starts (or attaches to) process two, the debuggee.

The debugger calls a Windows function, WaitForDebugEvent, typically in a "while(true)" loop (everyone should write a Windows debugger at some point in their lives, you learn so much - in fact put down SSMS and go write your first debugger loop: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681675(v=vs.85).aspx). The debuggee app is then run, and when something interesting happens, like an exception being thrown or a dll being loaded/unloaded, the debuggee is paused (i.e. all threads are stopped), WaitForDebugEvent returns and the debugger can look at the child process and either do something or call WaitForDebugEvent again. Even if the debugger doesn't care about the exceptions the debuggee is still paused, and when you parse T-SQL under a debugger, even if you tell Visual Studio to ignore the exceptions, the debuggee is still paused for every exception just so Visual Studio (or your home-baked debugger) can decide that it wants to ignore that exception, and only then is the debuggee started up again.

What this means for an app that throws lots of first chance exceptions is a constant start, stop, start, stop, which is painful for performance - it is basically impossible to debug a TSql Script Dom parse on a large project. I typically debug a project with something like one table and one proc and hope it gives me everything I need, or use other tricks like letting the parsing happen without a debugger attached and then attaching a debugger at the right point after the parsing has happened - but then again I don't have to debug the TSql lexer itself!

So where is this leading?

I was wondering what effect these first chance exceptions have when parsing T-SQL in normal operation, where we don't have a debugger attached - is there something we can do to speed up the processing?

The first thing I wanted to do was to try to reproduce a NoViableAltException. I thought it would take me a few goes, but actually the first statement I wrote caused one:

"select 1;"

This got me curious so I tried just:

"select 1"

Guess what? No NoViableAltException the second time - this didn't look good. Should we remove all the semi-colons from our code? (spoiler: no!)

Ok, so we have a reproducible query that causes a first chance exception. What if we parse it 1000 times and record the time, and then another 1000 times with the semi-colon replaced with a space (so it is the same length)?

Guess what? The processing without the semi-colon took just over half the time of the queries with semi-colons - the average time to process the small query with a semi-colon was 700ms, and without the semi-colon it was 420ms. Much faster, but who cares about 300 milliseconds? It is less than 1 second and really won't make much difference to the overall time to publish a dacpac.
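
Something like this is all it takes to see the difference - a rough sketch of the kind of timing harness I mean, assuming a reference to the Microsoft.SqlServer.TransactSql.ScriptDom assembly (the parser version and iteration count are arbitrary choices of mine):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class ParseTimings
{
    static long TimeParses(string sql, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            // any NoViableAltExceptions are thrown and swallowed inside Parse
            var parser = new TSql130Parser(true);
            IList<ParseError> errors;
            using (var reader = new StringReader(sql))
            {
                parser.Parse(reader, out errors);
            }
        }
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("with semi-colon:    {0}ms", TimeParses("select 1;", 1000));
        Console.WriteLine("without semi-colon: {0}ms", TimeParses("select 1 ", 1000));
    }
}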

I thought I would have one more go at validating a real-life(ish) database, so I grabbed the WideWorldImporters database, scripted out the objects and broke the script into batches, splitting on GO and either leaving the semi-colons in or removing them all. With semi-colons the processing took 620 ms and there were 2403 first chance exceptions. The second run without semi-colons - which would likely create invalid SQL in some cases - took 550 ms and there were still 1323 first chance exceptions. I think if we could get rid of all the first chance exceptions the processing would be much faster, but ho hum - to handle the first chance exceptions you just need a fast CPU and not to be a process that is being debugged.

SqlPackage Deploy Performance - IgnoreXX are not your friend!

  • Posted on: 2 March 2017
  • By: Ed Elliott

Following on from yesterday's blog I was wondering about the comparison of objects that are the same, and how the IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemiColonsBetweenStatements flags affect the comparison. To be fair I was only interested in IgnoreWhitespace, but it turns out that those four are very closely related.

When the deploy happens, and a script in the source and target are compared, the process is:

  • 1. Loads of things we will skip
  • 2. Any cmd variables in the scripts are replaced with their appropriate values
  • 3. If both the scripts are null - the comparison returns true. This has to be the best for performance but the worst for functionality ;)
  • 4. If one script is null but not the other then the comparison returns false. This actually has to be the best for comparison performance but worse for deploy performance!
  • 5. We then get a good old fashioned String.Equals, the standard .net compare goes: if both strings are not null and the lengths are the same do a check on each byte in the strings
  • 6. If the strings are equal we have a match, happy days no more action required

It is when the strings do not match that things start to get a bit more interesting: if the strings are not equal and any of those four ignore options are true, then we fall down into doing a further comparison, but only after the scripts have been normalized using the script dom and Antlr, which is an expensive operation in itself (this also happens to be my next topic!).

Once the normalization has been done we end up in the actual compare which goes like this:

  • 1. Turn the script into a stream of tokens
  • 2. If the token is a comment and ignore comments is set, skip it
  • 3. If the token is whitespace and ignore whitespace is set, skip it
  • 4. If the token is a semi-colon and ignore semi-colons is set, skip it
  • 5. Then compare the tokens, which itself does things like use IgnoreKeywordCasing and remove quotes around quoted identifiers - it isn't a straightforward String.Equals
  • 6. If any of the tokens don't match then it is a failure and the script needs to be changed (there is a rough sketch of this token-skipping idea after the list)
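
To make that concrete, here is a rough sketch of the token-skipping idea using the public ScriptDom tokenizer - this is not the actual DacFx code, just an illustration of the kind of comparison described above (the class, method names and parser version are my own):

using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.TransactSql.ScriptDom;

static class ScriptComparer
{
    static IEnumerable<TSqlParserToken> SignificantTokens(string script)
    {
        var parser = new TSql130Parser(true);
        IList<ParseError> errors;
        var tokens = parser.GetTokenStream(new StringReader(script), out errors);

        // skip the token types that the Ignore* options let the compare ignore
        return tokens.Where(t => t.TokenType != TSqlTokenType.WhiteSpace
                              && t.TokenType != TSqlTokenType.SingleLineComment
                              && t.TokenType != TSqlTokenType.MultilineComment
                              && t.TokenType != TSqlTokenType.Semicolon);
    }

    public static bool RoughlyEqual(string source, string target)
    {
        // crude case-insensitive compare of whatever is left, a stand-in for
        // the more involved token comparison the DacFx actually does
        return SignificantTokens(source).Select(t => (t.Text ?? string.Empty).ToUpperInvariant())
            .SequenceEqual(SignificantTokens(target).Select(t => (t.Text ?? string.Empty).ToUpperInvariant()));
    }
}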

So what?

So blunt. Anyway, basically what this means is that the default options in sqlpackage.exe are set to allow things like differently cased keywords and whitespace, and to allow that we end up taking longer to do deployments when we actually make use of those default features.

huh?

If you have a database with lots of code, and you have the code in SSDT but you do things like change the comments when you deploy and rely on IgnoreComments (this is a real-life scenario I have seen - someone adding a custom comment header), then you will have slower deployments, and as slower deployments are the opposite of what we want you should:

  • Have the same code in your database as you have in your project
  • Have the same code, including the same case of keywords, comments and whitespace, in your database as you have in your project
  • Disable the defaults and set IgnoreWhitespace, IgnoreComments, IgnoreKeywordCasing and IgnoreSemiColonsBetweenStatements all to false (there is a sketch of setting these through the DacFx after this list)
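
If you drive the deploy through the DacFx rather than sqlpackage.exe, the equivalent (as far as I can tell) is to flip the flags on DacDeployOptions - a minimal sketch, with made-up paths, a made-up connection string and the property names as I remember them:

using Microsoft.SqlServer.Dac;

class DeployWithoutIgnores
{
    static void Main()
    {
        var options = new DacDeployOptions
        {
            // the defaults are true; set them to false so unchanged scripts
            // are matched by the cheap string compare and never normalized
            IgnoreWhitespace = false,
            IgnoreComments = false,
            IgnoreKeywordCasing = false,
            IgnoreSemicolonBetweenStatements = false
        };

        using (var package = DacPackage.Load(@"C:\build\MyDatabase.dacpac"))
        {
            var services = new DacServices("Server=.;Integrated Security=true");
            services.Deploy(package, "MyDatabase", upgradeExisting: true, options: options);
        }
    }
}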

What effect does it have?

If your database and project code are exactly the same, then there is no effect - you neither gain nor lose anything.

If your database and code differ by comments, case, semi-colons etc. and you have lots of files that are different, then you will gain quite a bit. On my machine here I created a database with 1,000 stored procedures like "select '----'" (I used replicate to make the string large), imported the procs into SSDT, added a space between the select and the text, and did a deploy using sqlpackage (in fact I did a few to get an average time). With the default IgnoreWhitespace=true the deploy took about 24 seconds (remember this is on a machine following yesterday's recommendations: lots of memory, fast CPU and SSD). When I removed the defaults and set them to false, the first deploy took 34 seconds because naturally it had to deploy the procs, then re-running it took around 17 seconds - about 7 seconds off a 24 second deploy, which I'll take.

The thing that you will really gain is that your project code and database will be the same, which should really be the end goal. If you can honestly say that you have to have:

  • Different whitespace
  • Different keyword casing
  • Different semi-colons
  • Different comments

I would be inclined to find out why as it sounds like an interesting project :)

SSDT Deploy / Publish Performance

  • Posted on: 1 March 2017
  • By: Ed Elliott

Publishing dacpacs is a little bit of a pain when you have multiple databases; it can easily start to take minutes to hours to deploy changes depending on how many databases there are and the size of those databases. I wanted to understand more about the publish process and what we can do to speed it up as much as possible, so I did some digging and this is a randomish post about how it all works and what we can do to make it faster.

Process Overview

Roughly speaking, the process for deploying dacpacs is (there is a short code sketch of driving this after the list):

  • 1. Open dacpac (zip file)
  • 2. Open database
  • 3. Compare dacpac to database
  • 4. Come up with a list of changes
  • 5. Generate the T-SQL for the changes
  • 6. Run the deploy script
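
All of these steps happen inside the DacFx whether you call it through sqlpackage.exe, Visual Studio or your own code - a minimal sketch of kicking the whole thing off yourself, with made-up paths and connection details, looks something like this:

using System;
using Microsoft.SqlServer.Dac;

class PublishDacpac
{
    static void Main()
    {
        // steps 1 and 2: open the dacpac and point at the target database
        using (var package = DacPackage.Load(@"C:\build\MyDatabase.dacpac"))
        {
            var services = new DacServices("Server=.;Integrated Security=true");

            // steps 3 to 5: compare, work out the changes and generate the T-SQL
            var script = services.GenerateDeployScript(package, "MyDatabase",
                                                       new DacDeployOptions());
            Console.WriteLine(script);

            // step 6 would be services.Deploy(...), which also runs the script
        }
    }
}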

1. Open the dacpac (zip file)

The dacpac contains the xml representation of the model of the database and the original source code. To be able to deploy changes we need the original source code, because the model contains the definitions of the objects, such as which columns a table has and the parameters a stored procedure has, but not the body of the stored procedure (including comments).

The dacpac is a zip file, which means that to read from it the deployment must create a file handle to the dacpac and uncompress anything it wants to read (or compress anything it wants to write). Logically, the larger the dacpac the longer it will take to decompress and read from disk, so there are our first two important things to note - the deploy will need:

  • CPU to decompress the contents of the dacpac
  • Fast disk as it has to read from disk

When you run the publish, by default the model is read from disk into memory; however, for very large dacpacs you might find that you don't have enough memory for it. A Windows system with low memory is bad - two things happen:

  • The paging file is used
  • Applications are told to free memory

This is painful for an application and when you get paging, especially on non-ssd drives, you are going to find everything about the whole system slow.

If you have a large model in your dacpac and you don't have much memory then you will probably need to load the model from disk rather than into memory, but you best make sure you have an ssd to run it from!

For more details on how to do that see: https://redphoenix.me/2013/06/14/why-is-sqlpackage-using-all-the-build-s... (Richie Lee FTW!)
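
If you are calling the DacFx yourself, there is also a storage-type flag on DacPackage.Load which, as far as I can tell, keeps the model file-backed rather than fully loaded into memory - a rough sketch with made-up paths and connection details:

using Microsoft.SqlServer.Dac;

class LoadModelFromDisk
{
    static void Main()
    {
        // DacSchemaModelStorageType.File trades memory for disk (and CPU),
        // so only reach for it when the model is too big to hold in memory
        using (var package = DacPackage.Load(@"C:\build\VeryLargeDatabase.dacpac",
                                             DacSchemaModelStorageType.File))
        {
            var services = new DacServices("Server=.;Integrated Security=true");
            services.Deploy(package, "VeryLargeDatabase", upgradeExisting: true);
        }
    }
}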

2. Open database

The database contains all sorts of goodies like the version that will be deployed to, the database settings and the definitions that will be compared to the dacpac to see what changes need to be made. If your SQL Server responds slowly then the publish will be affected by that so make sure your SQL Server box is up to scratch and expect slower deploy times on resource limited instances.

3. Compare dacpac to database

I had always wondered how they compared the two different objects, so I bust out reflector and used it to have a poke about. What happens is that a model is built of the source and the target - a model is a representation of each object: it has its name, its properties and its relationships to other objects. For example, a table has a relationship to its columns and each column has a relationship to its type (not a property of the type).

If you think about this it means that for each deploy we actually need to store the model of both the database and the dacpac, so doubling the size requirements - this isn't to say an exact double of the size of the source code, because what it compares is things like the object properties and relationships but also the tokens that make up the body of the object, so there are a lot more objects created around the code. In short, if your source code is 100 MB you will need some multiple of 2 * 100 MB to upgrade an existing database - I am not sure if there are any figures to show what that multiple is, but if you do a publish and you run low on memory, or you get lots of garbage collection in your deploy, then consider either storing the model on disk or throwing in some more memory.

There are a couple of interesting extra things that happen, in particular to do with the size of the code in objects and whether to store it in memory, store it in memory compressed, or write it to disk. If, for example, you had a stored proc that was less than 500 characters long, the publish will store it in memory.

If the string representation of a procedure is over 500 characters but less than 16 thousand characters, then it will be stored in memory but compressed first.

Finally, if the stored proc (I am using procs as an example, but I think it is any code unit minus the create + name, so everything after "as") is over 16 thousand characters then a temp file is generated in your user temp folder and the contents are written to that.

So three different behaviours depending on the size of your objects:

  • Under 500 chars - you need memory
  • Over 500 chars, under 16,000 you need memory and CPU
  • Over 16,000 chars you need memory and fast disk for your temp folder

I created three demo databases and deployed them over and over; the first contains a couple of hundred procedures slightly under 500 characters, the second slightly over 500 characters and the last slightly over 16,000 characters.

When I ran the publish this is what they looked like:

Under 500 char procs:

[Image: under 500 chars procs, fastest to deploy]

Slightly over 500 char procs:

[Image: just over 500 chars procs, similar to under 500 chars but includes some time to compress some strings]

What we can see here is that the database with under 500 char procs takes the least amount of time, but the interesting thing is that the database with slightly over 500 char procs needs a little bit of extra umpf from the CPU - I have shown this with the pink circle. Both deploys follow the same pattern for sqlpackage.exe CPU usage - it goes from around 90% CPU up to 100%, drops down sharply to around 80% and then goes back up to 90% - but the second project, with the slightly larger procs, also has an extra splurge of CPU, which I am putting down to the extra compression that those strings are getting - it certainly tallies up, even if it is not scientific :)

Aside from the extra CPU usage they are pretty similar. The black highlighted line is garbage collection, and the way this counter works is that every time it changes, some garbage collection is happening - the first version does, I think, 3 collections and the second does 4, which sort of makes sense as once the strings have been compressed the original strings can be thrown away.

It is when we start to look at the database with lots of objects over 16,000 characters long that we see some interesting things, and some other forces come into play:

[Image: over 16,000 chars procs, lots of disk activity]

The first thing to note is the garbage collection - there are something like 8 changes to the line, so 8 sets of garbage collection happen, which is expensive for a .net app. We get much more processor usage for a more prolonged time and we really start to see some disk activity; write activity just has one peak while the files are written, but reading stays quite high throughout the whole process. It could be that it is SQL Server reading from disk (my demo was from a single SSD with a local SQL instance), but we can see that after the SQL CPU settles down the green disk read line stays high, so it is unlikely to be purely sqlserver.exe.

What does this tell us?

Well, if you have large objects - multi-thousand line stored procs - then you had better get yourself some serious SSDs to deploy from. If you have smaller objects then you'll need CPU and memory - don't scrimp here!

4. Come up with a list of changes

So when the properties and tokens in the source / destination objects have been compared, the differences are added to a list of modifications - this is fairly straightforward, but that is not the final list of modifications, because there are lots of options to customize the deployment, such as ignoring certain things - what happens next is some post-processing on the list to remove anything that the options say should be removed.

We then get the opportunity to interrupt the deploy process with a deployment contributor and remove steps as we want - if we do that, anything we do is overhead, so be warned: be fast in what you do or you will slow down the whole process!
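
For example, a contributor is just a class in a dll that the DacFx loads at deploy time - a rough sketch below (the class and contributor names are my own, and the exact signatures are from memory, so treat this as a starting point rather than gospel):

using Microsoft.SqlServer.Dac.Deployment;

// registered at deploy time via the AdditionalDeploymentContributors property,
// with the dll dropped somewhere the DacFx can find it (e.g. an Extensions folder)
[ExportDeploymentPlanModifier("Example.AddStartMessage", "1.0.0.0")]
public class AddStartMessage : DeploymentPlanModifier
{
    protected override void OnExecute(DeploymentPlanContributorContext context)
    {
        // keep whatever you do in here fast - it runs on every deployment
        AddBefore(context.PlanHandle, context.PlanHandle.Head,
                  new DeploymentScriptStep("PRINT 'deployment started';"));
    }
}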

There are some things, like default constraints, that are stored differently in SQL Server to how they appear in our project - so when a difference is found, as part of the post-processing each constraint is "normalized" using the script dom and a visitor to find out whether the changes are actually different or just appear different. This means that we don't keep getting constraints deployed over and over (something that used to happen a lot - a great new feature added by the SSDT team BTW!) but it does mean that for every default constraint we need to go through this normalization using the script dom, which isn't exactly lightning fast.

Takeaways from this are that the deploy does a lot of things to help us - if it was a straight string comparison then it would be faster but the deploy options would not be as configurable. The more objects you have, the longer the deploy will take!

5. Generate the T-SQL for the changes

6. Run the deploy script

I'm going to stop this post here - I think these two are a bit more straightforward: the more things you have to change, the more time it will take :)

Summary

If the deploy is slow, use perfmon to measure the CPU, disk and garbage collection of sqlpackage.exe (or whatever is doing the deploy) - have a think about whether streaming the model from disk would be better, and if you are CPU bound add a more powerful CPU, if you are low on memory add more memory, and if you are on an old slow disk put it on an SSD.

If you are low on CPU you might be tempted to add more CPUs - understand that a single deploy happens serially, so adding more CPUs won't make a deploy quicker if it is the only thing running on a box - if it is fighting for resources then more CPUs may help.

SSDT and Friends - .net meetup video

  • Posted on: 10 February 2017
  • By: Ed Elliott

I did a talk at the London .NET meetup - if you want to get an overview of what SSDT is and how to get started then I would recommend it:

https://skillsmatter.com/skillscasts/9274-londondot-net-january-meetup

This was aimed at .NET developers rather than DBAs, so there isn't much talk about "why you should use source control" etc. as everyone in the room used source control already :)

Devops without management buy in?

  • Posted on: 10 February 2017
  • By: Ed Elliott

I was talking to someone at a meetup recently who was really keen on doing continuous deployment for their database, but they had a number of issues. The main one was that because management wasn't sold on the idea, and the DBAs had complete control to push back on each and every idea he had, there was no way he could deploy continuously.

The accepted route for devops is management buy-in: if you do not have management buy-in then you can't do devops. I agree totally with this - I have seen how an executive can set the direction and all of a sudden hurdles that were there are removed or being removed - but what do you do when you don't have management buy-in?

In short, you do as much as you can; for SQL databases that means you:

  • use source control
  • use an IDE
  • write unit tests
  • run your tests on check-in / merge
  • generate your deployment scripts

You do as much as you can. In an ideal world you would check into a shared source control repository and your build server would check out the code, build it, deploy it and test it before deploying to the prod server - but if you have zero resources you can still use git to create a local repo, install Jenkins locally and point it at your local git repo, and write tSQLt tests - plus you can do it all with the free version of SSDT. When you have done that there is nothing stopping you having a continuous delivery pipeline locally.

Is it ideal? No - does it work in practice? Yes.

If you are just one developer then you can do it all locally; if you are part of a team then find a server to host git and Jenkins - the phrase "do it first then ask for forgiveness later" springs to mind.

(warning: you know your environment, don't blame me for you actions) :)

SQLCover Fixes and Download location

  • Posted on: 30 September 2016
  • By: Ed Elliott

There have been a couple of fixes in SQLCover this week, kindly submitted by John Mclusky (https://github.com/jmclusky):

Code coverage not reported correctly for CTEs at the end of a stored procedure if the 'with' is immediately preceded with a semicolon

and

DeclareTableVariableStatement statements cannot be covered, so report falsely as missing coverage

I have also changed where the releases can be downloaded from to use the github releases:

https://github.com/GoEddie/SQLCover/releases

The previous version is still available but I would recommend the latest version.

Finally, I have started a separate page to bring all the links together and will add a FAQ at some point:

https://the.agilesql.club/Projects/SQLCover

Happy testing!

Refactoring in SQL Server Data Tools - SSDT

  • Posted on: 27 September 2016
  • By: Ed Elliott

In this post I will talk about the in-built refactoring support in SSDT - the language is slightly different from my normal style as originally it was going to be published elsewhere, but rest assured it is written by me.

What is refactoring?

In programming, the term 'refactoring' essentially means taking some code and improving it without adding features and without breaking it. When we refactor code we ideally want to make small improvements over time, using an IDE that automates as many of the tasks as possible for us.

While SQL Server Data Tools (SSDT) cannot guarantee that we do not break any code, it helps us make small improvements and, as an IDE, it offers us some refactoring abilities that do not exist in either SQL Server Management Studio (SSMS) or Notepad.

Just so we don’t get distracted by some of the third-party add-ins that give more comprehensive support for refactoring in SSDT or SSMS, this article will talk only about what is out-of-the-box with SSDT.

What refactoring support does SSDT have?

SSDT helps us to refactor code by automating the actions of:

  • Expanding wildcards
  • Fully qualifying object names
  • Moving objects to a different schema
  • Renaming objects

Aside from this list SSDT also, of course, helps us to refactor code manually with its general editing facilities.

Expand Wildcards

SSDT allows you to highlight a "*" in a SELECT statement and have it replaced with a comma-delimited list of the column names in the table. As an example, if we take these table definitions:


CREATE SCHEMA hr
GO
CREATE TABLE dbo.person
( person_id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
first_nmae VARCHAR(25) NOT NULL, --typo is on purpose!
last_name VARCHAR(25) NOT NULL
)

CREATE TABLE hr.departments
( id INT NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
funny_name_ha_ha_I_am_so_cool_my_last_day_is_tomorrow VARCHAR(25) NOT NULL
)

CREATE TABLE hr.department
( person_id INT,
department_id INT
)

And the following stored procedures:


CREATE PROCEDURE hr.get_employee ( @employee_id INT )
AS
SELECT *
FROM person p
JOIN hr.department dep ON p.person_id = dep.person_id
JOIN hr.departments deps ON dep.department_id = deps.id;
GO
CREATE PROCEDURE prod_test
AS
EXEC hr.get_employee 1480;
GO

To improve this code, the first thing we will do is change the "SELECT *" in the hr.get_employee procedure to specify just the columns that we need. If we open the stored procedure in SSDT and right-click the "*" we can choose 'Refactor' and then 'Expand Wildcards':

We are then given a preview of what will change, and we can either cancel or accept this:

Now that we have the actual columns, we can remove the ones we do not need, save and check in our changes. The procedure should look like:


CREATE PROCEDURE hr.get_employee ( @employee_id INT )
AS
SELECT [p].[first_nmae], [p].[last_name], [deps].[funny_name_ha_ha_I_am_so_cool_my_last_day_is_tomorrow]
FROM person p
JOIN hr.department dep ON p.person_id = dep.person_id
JOIN hr.departments deps ON dep.department_id = deps.id;
GO

Fully qualify object names

We need to make sure that our SQL code uses fully qualified object names, so that name resolution does not depend on the caller's default schema. In this example, the get_employee stored procedure references the person table without a schema, which means that the user must have dbo as their default schema. To fix this we right-click anywhere in the stored procedure and choose 'Refactor' and then 'Fully-qualify Names'; we could also use the default shortcut of ctrl+r and then q. Again we get a preview window:

If we accept the preview, then we end up with the following code:


CREATE PROCEDURE hr.get_employee ( @employee_id INT )
AS
SELECT [p].[first_nmae], [p].[last_name], [deps].[funny_name_ha_ha_I_am_so_cool_my_last_day_is_tomorrow]
FROM [dbo].person p
JOIN hr.department dep ON p.person_id = dep.person_id
JOIN hr.departments deps ON dep.department_id = deps.id;
GO

This doesn't seem like a massive deal as we could just have written 'dbo.', but if we had more than one reference to update, or a number of different tables, then it would have saved us more work.
The 'Fully-qualify Names' refactoring goes further than just table names: it will also qualify tables in join statements and columns where needed.

For example if I had the following query:


SELECT first_nmae
FROM [dbo].[person] p
JOIN hr.department d ON department_id = p.person_id

When using the refactoring we are offered the chance to fully qualify first_nmae and the department_id in the join:

If we decided we did not want to apply the refactoring to one or the other we could uncheck them in the dialog and only apply the ones that we actually required.

If we apply both of the ‘refactorings’, we end up with:

SELECT [p].first_nmae
FROM [dbo].[person] p
JOIN hr.department d ON [d].department_id = p.person_id

Move objects to a different schema

We can refactor the schema that an object belongs to. This is a three-stage process that:

  • 1. Changes the schema that the object belongs to
  • 2. Changes all references to the original object to specify the new schema
  • 3. Adds an entry to the project's refactorlog file to help the deployment process

If we want to move the person table from the dbo schema into the hr schema, we can simply right-click the table and then choose ‘refactor’ and then ‘move schema’:

If we look at the preview we can see that, as well as changing the schema on the object itself, it is also going to change the references to the table everywhere else in the database:

This refactoring also adds a new file to our SSDT project, the Refactor.refactorlog file in order to assist in a correct deployment of the change that preserves the data in the table. Inside the refactorlog is some xml:

<?xml version="1.0" encoding="utf-8"?>
<Operations Version="1.0" xmlns="http://schemas.microsoft.com/sqlserver/dac/Serialization/2012/02">
<Operation Name="Move Schema" Key="5df06a6f-936a-4111-a488-efa0c7f66576" ChangeDateTime="11/10/2015 07:10:39">
<Property Name="ElementName" Value="[dbo].[person]" />
<Property Name="ElementType" Value="SqlTable" />
<Property Name="NewSchema" Value="hr" />
<Property Name="IsNewSchemaExternal" Value="False" />
</Operation>
</Operations>

What this does is save the fact that an object has changed from one schema to another. SSDT reads the refactorlog when generating a deployment script. Without the refactorlog, SSDT would look at the source dacpac and the target database, drop the table 'dbo.person' (deleting all its data) and create a new, empty 'hr.person', as they are different objects. Because of the refactorlog, SSDT generates a schema transfer rather than a drop / create:

SSDT stops the change happening again by recording, in the target database, that the refactor key has been applied - so if you later created a new table called dbo.person it would not get transferred as well:

Renaming Objects

The final type of built-in refactoring is renaming objects. This works similarly to the move schema refactoring, but it allows us to rename any object such as a procedure, table, view or column. SSDT renames all the references for us:

SSDT also adds an entry into the refactorlog:

And finally it generates an "sp_rename" as part of the deployment rather than a drop / create:

Renaming objects becomes pretty simple and safe, so you can go through and correct small mistakes like spelling or consistency errors. Without SSDT or another IDE to do it for you, it is really difficult to rename objects as part of a quick refactoring session.

Other ways SSDT helps to refactor

Aside from the in-built refactorings, SSDT helps us to refactor because it allows us to find where we have references to an object. For example, if you wanted to add a column to a table but did not know whether there were stored procedures that did a "select *" from the table and then did something that would be broken by the new column, you could right-click on the table name and do "Find All References":

We can also do the same thing for column names and so we can really easily get a picture of how and where objects are used before we change them.

General advice on refactoring SQL Server databases

Refactoring SQL Server databases is really helped by using SSDT but there are two things that you can do which really give you the freedom to refactor your code as you go along:

  • Use stored procedures / views rather than access tables directly
  • Have a suite of unit/integration tests

Use stored procedures / views rather than access tables directly

If you access your data from your application using the base tables, it means you cannot change anything in your tables without also changing the application. It also makes it really hard to find references to your objects and to know where they are used. Instead you should use stored procedures and views as an API to expose your data. If you do this then, instead of having to manually find references, you simply need to find the places where the stored procedure is called.

Have a suite of unit/integration tests

If you have both unit and integration tests then, as well as having a tool that helps you to refactor your code, you also get the confidence of knowing that you can make changes without breaking anything else. Without these test suites it is hard to know whether you have broken that critical year-end task that is so easily forgotten about.

Conclusion

SQL Server Data Tools has the basics of refactoring tools, but it isn't really what one comes to expect from a SQL IDE. What about a tool to automatically insert a semi-colon after every statement if there is none? Why is there nothing that changes the case of keywords according to SQL conventions? One looks in vain for a way of reformatting code correctly. One could be more ambitious and ask for ways of finding variables that aren't used, splitting tables, encapsulating code within a stored procedure, or checking for invalid objects. Fortunately, there are a number of add-ins that fill the gap, and in the next article we take a look at SQL Prompt, which is the leading add-in of this kind.

Database Deployments in Uncontrolled Environments

  • Posted on: 20 September 2016
  • By: Ed Elliott

The ideal is to make a change and see that change deployed to production. In a perfect world we would be told to work on something, write the code + tests, deploy to a test environment, prove it works and deploy - this is the cycle time, and the faster you can get it the easier many things become.

The cycle time is easy to measure - it is the time from when the ticket arrives in the backlog to when it moves to the done column; if your issue tracking system can't tell you this easily then use something else! Tickets are moved into the "Done" state when they have been deployed into production - if you do nothing but investigate and try to reduce your cycle time, you will make a massive difference to your development process.

There have been a few discussions on Stack Overflow recently about how to manage deployments in uncontrolled environments, specifically data migrations. The questions were from an SSDT perspective; I don't think that SSDT is a great choice for these uncontrolled environments, and there are some additional requirements in these environments that need extra thought and care when creating release scripts (whether manually or using a tool).

What is an uncontrolled environment?

I define it as a database that is managed by a customer - typically a vendor sells a product and it includes a database; the database is on a customer server and the customer is sysadmin and can make changes. There is a difference between databases where customers are allowed to make changes and ones where they are not, but in either case you still need to take extra care, even if it is only to add additional logging and warnings to the output so any upgrade scripts help your support team diagnose issues rather than displaying an error like "DBUpdate Error" - yes, I have seen that with a vendor product once!

When you own and manage the deployment for your database application you can do these things, for these reasons:

  • Drop objects not in source code - if it isn't in your source it does not exist
  • Rely on scripts generated / created by your dev team - if someone wants to create an object called X they can see whether an object called X already exists
  • Ensure each deployment happens successfully - you run each deployment, run the tests and verify the results
  • Write data migration scripts using accurate data - you have the data
  • Use truncate on tables to clear data - the script author knows there are no foreign keys pointing to the table and that the data can be restored from a backup rather than a transaction

If you do not control the environment then you cannot do these things because:

  • Drop objects not in source code - who knows what the user has changed
  • Rely on scripts generated / created by your dev team - users may have made incompatible changes; you want to create a new view called "users_blah"? Well, it turns out they already have an audit stored procedure called users_blah
  • Ensure each deployment happens successfully - you are not there to run each deployment, run the tests and verify the results
  • Write data migration scripts using accurate data - you do not have the data
  • Use truncate on tables to clear data - the script author does not know whether there are any foreign keys pointing to the table or whether the data can be restored from a backup

So what can we do?

I really don't think that there is a one-size-fits-all solution here, so you will need to look at your database and what changes you need to make, but some randomish thoughts are:

  • Compare / merge type deployments will drop any changes the customer has made - that is bad
  • If you had each version of the source, you could verify whether there have been any changes before deploying - that sounds good but is potentially a support nightmare
  • The migrations approach sounds better, but you need to ensure that every change is actually deployed
  • Adding additional logging and verification code is a must - instead of just "truncate table", do "print truncating, check for things that will block this, truncate" - making sure that an irreversible command like truncate has forced the user to back up, or at least that the users understand they need a backup (even in 2016 this isn't guaranteed!)
  • Taking simple precautions like avoiding "select *" in favour of column lists, and using "order by column" rather than "order by ordinal", will help you in the long run with odd issues that would otherwise be hard to diagnose!

I guess the end answer is to offer a hosted solution and move to continuous deployment in a controlled environment as that actually makes a lot of these things simpler!

Controlling a dacpac deployment

  • Posted on: 21 August 2016
  • By: Ed Elliott

I have been thinking quite a lot recently (ok, not that much, but my thinking has changed) about how to deploy dacpacs - in particular how to control the deployment: what tool is used to deploy the dacpac, what connection details are passed in, which of the hundreds of parameters are passed in and how.

As I have configured and deployed databases using dacpacs in a number of different environments, my approaches have been:

In the beginning there was sqlpackage.exe

sqlpackage is great - it gives you a well documented command line and you can pass whatever arguments you need. The number of times I have typed sqlpackage /action:script or /action:publish... the a, c and t keys on my keyboard are a little bit faded (they aren't, but you get the picture!).

What I used to do was check a directory with sqlpackage and all the dependent dlls into my unit test SSDT projects, so I could run the deploy from any build agent without having to have the DacFx or SSDT installed. This worked really well, but there were a couple of downsides. Firstly, the version checked in would hardly ever be updated, and with an almost monthly (actually monthly?) release cycle for SSDT builds this means you get behind pretty quickly. The other main issue is that you have to build up a lot of command line arguments, so invariably I would end up with a powershell script or a piece of .net code to manage that complexity. This is made even more complex when you have to consider the pain that is spaces in windows paths and an automated deployment process. There is an alternative to passing arguments on the command line, which is to use a publish profile - I completely discounted these, I am not sure why but it was a long time ago - let's just agree that the important thing is that this approach to using sqlpackage.exe left me feeling a little uneasy in a few places.

After sqlpackage.exe there came code

The next approach was to write some code - either powershell or c#. Typically powershell would be used as part of a production ci/cd process: I would write a script to call the DacFx and then call the script from jenkins/vsts etc. I also found myself calling the DacFx from c#, but this was typically limited to integration tests for tools that I was writing.

I liked the scripted approach because it meant that I could still pass some arguments, like the server name and user credentials, on the command line (or encrypt them where necessary) and put a whole load of arguments in the script to be shared by each environment. There were still a couple of problems. Firstly, as with the sqlpackage.exe approach, the DacFx needed to be installed and available, so I would check the files into source control (or make them available etc). There was one additional problem that I did not foresee: when you use sqlpackage.exe you can load contributors from a sub folder called "Extensions", but when you use the DacFx yourself you have to install them into program files, which went against my (pretty strong) desire to be able to deploy from any build agent (windows for now, but I'm hoping!).

Then came the flip-flopping

For a while I meandered between the two approaches, until the ssdt team announced that they had released a nuget package with the DacFx in it, and I decided that I would move over to that as it meant that I no longer had to check the dlls into source control, which in itself is a big win. I also decided to fix the extensions thing, and so figured out a (slightly hacky) way to get the DacFx dlls in the nuget package to behave like sqlpackage and allow a sub-directory to be used to load dlls - I fixed that using this powershell module that wraps a .net dll (https://the.agilesql.club/blogs/Ed-Elliott/DacFxed-Nugetized-DacFx-Power...). Now I have the problem of not having to check in dlls, and still being able to load contributors without having to install into program files, sorted - BUT I still had the problem of lots of command line args which I was sharing in powershell scripts and passing in some custom bits like server/db names etc.

The final piece of the puzzle

<blink>Publish Profiles<blink> ta daaaa.

I had actually been using publish profiles already - I normally had at least one as part of my unit test project that I would use to deploy my whole database locally before running tests, before I did a git push (obviously during dev I would do a quick deploy rather than constant full deploys). My current grand scheme is to put everything I need in publish profiles and use those to deploy dacpacs. I realise that this isn't a great revelation, but I am happy with them and pleased with where I am now - who knows where I will end up!
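
If you want to drive the same thing from code rather than sqlpackage.exe /Profile:, the DacFx can load a publish profile directly - a minimal sketch with made-up file paths:

using Microsoft.SqlServer.Dac;

class DeployFromProfile
{
    static void Main()
    {
        // the profile carries the target connection, database name and all the options
        var profile = DacProfile.Load(@"C:\src\MyDatabase\Production.publish.xml");

        using (var package = DacPackage.Load(@"C:\build\MyDatabase.dacpac"))
        {
            var services = new DacServices(profile.TargetConnectionString);
            services.Deploy(package, profile.TargetDatabaseName,
                            upgradeExisting: true, options: profile.DeployOptions);
        }
    }
}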

*I know blink doesn't work :)
