Learn how to use tSQLt to professionally unit test T-SQL code. To enroll in the free self-paced course visit here

xSQLServer is dead long live SqlServerDsc

  • Posted on: 15 December 2017
  • By: Ed Elliott

I have had a problem with DSC in Azure Automation for a few months now, there was a change made to the xSqlServer DSC resource which meant that the paths internally were too long to be compiled on Azure Automation, I don't even really want to think why because the fact that path lengths are an issue in 2017 (almost 2018!) makes me want to cry, so if you want to know more look at:


The fix was to deploy the old version to Azure Automation and use that but yesterday when I tried to use it, it failed as well and I was about to start crying when I found out that a version 10 had been written but not deployed, except it has been deployed so I switched my dsc script form xSqlServer to SqlServerDsc and everything is now cushty.

In the changelog there is a list of breaking changes and I had to change my resources form the old style to the new doing things like changing:

xSqlServerLogin LoginThing{
SQLServer = ''
SQLInstanceName = ''


xSqlServerLogin LoginThing{
ServerName = ''
InstanceName = ''

but once that was done and a typo or two fixed then compiling on azure automation works again and my life is now complete.

What The Tool? Multiple choice quiz to help you choose a tool for SQL Server

  • Posted on: 14 December 2017
  • By: Ed Elliott

There are a few different choices for tools to use when it comes to SQL Server, while the "experienced" amongst us have used query analyzer, enterprise manager and probably ssms it isn't always clear what tool you should use. If you consider sql operations studio and vscode it is a literal minefield of possible options.

I was quite pleased with how my https://the.agilesql.club/WhatTheIO/ tool worked out and so I thought I would build a simple multiple choice quiz to help people decide between SSMS, SSDT, SOS, VSCode + mssql add-in:


I will probably add the command line tools and maybe some third parties if I feel so inclined.


Azure Virtual Machine + Premium Disk Throughput Calculator

  • Posted on: 2 December 2017
  • By: Ed Elliott

I keep having to refer to the virtual machine size page and the disks pricing page to work out how best to stripe disks for whichever type of virtual maching in Azure to work out what sort of throughput we can get so I thought I would save myself some ink and automate it, hopefully someone else finds this useful.

To get to the calculator see:


What does it do?

You choose how many volume you want, whether you want caching or not and then how many disks choosing the amount and type of disks and then the calculator will tell you what your max throughput is and also when you have exceeded the maximum throughput of the virtual machine.

What does it look like?

Like this:

You can switch between vm types, in this example we can see that a single P20 exceeds the total throttle limit of the E2s_v3 so striping the disk or using a faster disk is pointless (unless we are using it just to get more space rather than performance).

Let me know if you get any issues, I've only tested on chrome and will probably add to it in the future, maybe adding in non-premium disks but for the moment it is just premium disks and vm's that support premium disks.

The calculator itself is available:


The instructions:



Fixing SQL AG Routing "Cannot find host" using packet analysis (and a spell checker)

  • Posted on: 30 November 2017
  • By: Ed Elliott

I was setting up an availability group listener recently and when I tried to connect to the listener I got an error message to say "The host cannot be found" after the usual 15 seconds connection delay.

I checked the usual thing like TCP was enabled on the replicas and that routing was configured, but every time I tried I kept getting the "The host cannot be found" error. Now, I don't know about you, but I find error messages like this completely infuriating especially when I knew the hostname of the listener resolved because I could ping the damn thing (and then connect straight to the IP address.

Now with a listener, you connect and then either connected or you are routed to one of the other hosts. I knew I could connect so guessed that I was being redirected to somewhere else but I didn't know where. Of course, if the error had just said "The host (hostname) cannot be found" I wouldn't be writing this today :)

So I decided what any self-respecting lover of network packets would do, I opened a command prompt google'd "netsh trace start command line not rubbish thingy" to get to this blog: https://blogs.msdn.microsoft.com/canberrapfe/2012/03/30/capture-a-networ... and ran "netsh trace start capture=yes tracefile=.\path\to\trace.etl".

When the trace was running, I quickly reproduced the error and then ran "netsh trace stop" and copied the etl file to my laptop where I had "Microsoft Packet Analyzer" installed. A quick hop skip and a jump over to the etl file in the packet analyzer tool and I wanted to see if I could see a failed DNS request so I went to the "Filter" window and added the word "dns" then clicked "Apply":

This did show a request for the wrong machine in the wrong domain, but how did it get there? When I saw this I went to SQL and the read-only routing URL in sys.availability_read_only_routing_lists and of course it was totally wrong. That was the fix, but not the end of this story dear reader.

Now I had a network trace of routing in action I thought to myself, as I am sure everyone who has ever worked with SQL Server has thought at one stage or another: "I wonder if the routing is handled via a TDS ENVChange token?".

I changed my filter from "dns" to "tds" and then I could see all the tds traffic:

I also changed the columns to include source and destination ports so I could see which packets were from the client and which were from the server. Traffic to the server will have a destination port of 1433 (unless you are mental and change it or aren't mental but are using an instance), the source will be different for every connection.

Looking at the source / destination pairs I could see there was more than one set of connections intermingled so I guessed mine was the one with the source port of 52264 as 62084 was sending SQLBatch tds packets and I wasn't getting that far, so a change of the filter from "tds" to "tcp.port == 52264" and we have a trace with just our traffic:

So what is all this? Well, a SQL Server connection goes like this (forget about silly things like local connections and named pipes, this is just TCP connections obviously):

1 - Create a TCP connection:

Client - Hi Server, I want to create a TCP connection to port (probably) 1433
Server - OK, go on then
Client - Thanks, I will then

The "S" in the "Flags" part of the summary for each packet is short for SYNCHRONIZE and it is TCP's way of saying Hello, the client sends a S, the server responds to the S with an A for ACKNOWLEDGEMENT and sends its own S for SYNCHRONIZE, the client then hilariously replies to the server's S with its own A. To put it another way:

"The SYN, SYN ACK, ACK" TCP handshake.

2 - Create a TDS connection

Now this is where life jumps into the fact lane, go back to google grab the [MS-TDS] spec (https://msdn.microsoft.com/en-us/library/dd304523.aspx) and read it :). Done? Good so you know the sequence to create a TDS connection over the previous TCP connection is:

First the client sends a PRELOGIN packet, this is unencrypted and works with basically any decent version of TDS from 4.2 upwards.

After the PRELOGIN packet, we get some TLS traffic so SQL can encrypt the LOGIN7 packet. This can be "man in the middle'd" unless you have "Trust server certificate" set to false, so if your not on a LAN make sure you always do that.

We'll skip the TLS setup stuff and can't see the response to the LOGIN7 packet as it is encrypted but we do see the response to the LOGNI7 packet which includes some NTLM setup guff which we don't care about here.

Once the connection is validated and the user logged on, we get the response to the login:

Now what happens in a login response is SQL says lots of nice things to us like "your database is now XXX" and "your language is XXX". It does this by sending a series of tokens that are the ENVCHANGE (environment change) token and also an INFO token which is the message you see written to the "messages" window in SSMS.

If we look at some of the tokens we see our first token is an ENVCHANGE with a token type of 1 = DATABASE which is a change database context token:

I'll leave you to go through any other TOKENS you like and jump onto the last token in the packet which is again an ENVCHANGE but packet analyzer doesn't recognise the TOKEN which is of type 20 (0x14). Jump back to the [MS-TDS] docs for ENVCHANGE TOKEN 20 and we see:

HOOPLA!, token 20 is an ENVCHANGE token that routes the client from one host to another, if we examine it we see the name of the server we will be redirected to:

Anyway, now to put the correct domain name in the routing list!

PS, Just to finish off the connection we can see the F flag sent from the client, the server responding with its own F and an A and then the client sends its own A and then nothing else happens. I will leave you to figure what this last bit was about and for bonus points have a look at why people don't use R or RESET (or RST) to close connections!

You have your database in source control now what?

  • Posted on: 16 November 2017
  • By: Ed Elliott

This post is for a specific type of person if you are:

  • New to source control
  • Are getting started on your path to the continuous delivery nirvana
  • Have been able to get your database into some sort of source control system
  • Have more than one person checking in code to the database source
  • You are unsure what yo do next

Then this post is for you!

Choosing a source control system and tools to manage your database code is a great hurdle to have got past, well done!

Actually getting your database into that source control system is another difficult hurdle to get last, double well done!

Now you should be starting to think about the next steps, and the sort of things that are on the horizon is, not immediately but not too far away (1-6 months):

  • Generating deployment scripts
  • Provisioning test databases to deploy to
  • Writing some tests
  • Automating the running of those tests
  • Using a build server to do all this for you
  • Sacking your dba's (jokes ha ha)

But what about now?

The thing that I would do now is to:

  • Determine your branching and release strategy
  • Start checking in and merging changes
  • Generate/gather deploy scripts
  • Nothing else

These things are so important to get right early on. It is hard to choose a correct system until you are where you are now. Until you have chosen a process and started using it, it is hard to know whether it works for you.

Determine your branching and release strategy

How are you going to manage your code while changes are going on? Typically you will want to be able to:

  • Check new features into the code repository
  • Check-in hotfixes to the current production and not lose those changes when the latest features are checked in
  • Allow multiple developers to check code in at the same time

There are a few different approaches, and my favourite is the git approach of creating a branch for each feature then merging back to master when the feature is ready to go to production but this favours getting changes out to production pretty quickly so you may not be ready for this yet.

Have a look at different branching strategies that your source control system uses and decide on one that you and your team can understand and work with.

When you have decided on your branching strategy, stick to it and make sure everyone follows the system. If you have had no source control and suddenly have source control it takes quite a lot of discipline as a development team to ensure you follow the process. Someone will forget to check something in, and someone will skip the process. Keep on eye on check-ins, make sure everyone follows the process - it is a big change in how SQL developers work so understand that this in and of itself is a major change for your team.

Start checking in and merging changes

Checking in code is the next step and having multiple developers means that you will have to start merging each other’s changes. Let the developers check in code and merge their changes, try out different tools for merging changes. If you have TFS, you probably have visual studio which has a decent merge tool built into it. If you are using git look at SourceTree or git Kraken.

You will get problems when you merge your changes, do the first few merges together and see where doing things like reformatting long stored procedures causes extra changes that are more challenging to deal with when other smaller changes to the procedures are merged.

Generate/gather deploy scripts

The next thing you will want to do is start to see some value from all this work, and I’m not suggesting that you start pushing all your changes to production yet (you haven’t even any tests yet!). Whatever type of tool you change chosen (automatically generate scripts/manage migration scripts etc.) find a way to generate those or gather them together, so you no longer have to spend x hours every release “gathering the scripts”. This starts to show value and has the benefit that people can’t bypass the source control system easily.


I don’t mean actually nothing, get on and do some actual work (you lazy little...)! What I mean is nothing more on your ci/cd process for now. You have a good 1-3 months work to do to perfect what you have, to work and get this bit nailed :)

SSDT How To Fix Error SQL17502

  • Posted on: 6 November 2017
  • By: Ed Elliott


If you build an SSDT project you can get an error which says:

"SQL71502: Function: [XXX].[XXX] has an unresolved reference to object [XXX].[XXX]."

If the code that is failing is trying to use something in the "sys" schema or the "INFORMATION_SCHEMA" schema then you need to add a database reference to the master dacpac:

Add a database reference to master:

  • Under the project, right-click References.
  • Select Add database reference....
  • Select System database.
  • Ensure master is selected.
  • Press OK.

Note that it might take a while for VS to update.

(Note this was copied verbatim from the stack overflow question with my screenshots added: https://stackoverflow.com/questions/18096029/unresolved-reference-to-obj... - I will explain more if you get past the tldr but it is quite exciting! )


I like this question on stack overflow as it has a common issue that anyone who has a database project that they import into SSDT has faced. It might not affect everyone, but a high percentage of databases will have some piece of code that references something that doesn't exist.

The question has a few little gems in it that I would like to explore in a little more detail because I don't feel that a comment on stack overflow really does them justice.

If we look at the question it starts like this:

The first thing is the error:

unresolved reference to object

What does this mean? When you write some code in SSDT, a stored procedure, function, foreign key (admit it, you weren't expecting foreign key were you!) SSDT validates that when the code references another object that the other object exists and is usable in that scenario. So if you have a table called "dbo.abc" and you made a mistake and typed "select a from dbo.ab3" then SSDT will notice this and give a warning or an error (more on this later).

After the unresolved reference we have



What is the INFORMATION_SCHEMA? Well, one of the most exciting things about modern RDBMS's (I say modern but basically forever) is that the SQL language specification dictates that the language used by an RDBMS is capable of querying and interacting with the environment as well as the data. Think of an RDBMS as being like a DevOps dream decades before Gene Kim et al. sat down at a typewriter and spilt the Pheonix project onto paper. The INFORMATION_SCHEMA is a way to query things about the database environment. In the example in the question, they are using it to find the number of tables that exist that have a specific name (tip: it will either be 1 or 0).

In SQL Server, the INFORMATION_SCHEMA is in the master database and is special in that you can type "SELECT * FROM INFORMATION_SCHEMA.SOMETHING" and you don't have to type the name of the master database first like "SELECT * FROM master.INFORMATION_SCHEMA.SOMETHING". This is nice as it saves our little typing fingers but also a little hard for SSDT as it means that it has something else to handle.

Moving on we have:

Within Visual Studio, the schema and name for the view are both marked with a warning.

There are two possible outcomes in SSDT when it can't find a referenced object, the default is what has happened here there is a warning, and you get a squiggly line under the code:

The last piece of the question is interesting,

I also tried changing the implementation to use sys.objects instead of this view, but I was given the same warning for this view as well.

The sys schema is similar to the INFORMATION_SCHEMA schema except it is SQL Server (and Sybase ha ha) specific rather than cross-platform as INFORMATION_SCHEMA is supposed to be.

The final thing about the question is it is by someone called Sam with (at today's count) 17 thousand stack overflow points, that is some serious stack overflowing!

The Answers

If we look at the first answer we have:

The answer is spot on and fixes the error for the view Sam is talking about in the INFORMATION_SCHEMA. I like that stack overflow lets someone ask and then answer their own question. He was stuck, he asked, he found the answer and instead of leaving a page without an answer like DenverCoder9, he found the answer and left it for the world:


Answer 1 Comments

The first comment is by the great Peter Schott (B | T):

To give a bit of background on this, Peter has a team where some people have installed Visual Studio to the C drive and some to the E drive, and they were getting errors when referencing the master.dacpac.

What happens with database references is that you get a block of XML that looks something like this added to your .sqlproj file:

When SSDT tries to resolve a reference, it looks at the "Include" attribute on the "ArtifactReference", and if it doesn't find that then it looks in the "HintPath" and failing that I think it looks in the build bin directory, but we won't rely on that working.

This reference is from a Visual Studio 2017 project and you can see that the "HintPath" is not hardcoded to the C drive but instead uses a variable to get to the dacpac. This was not always the case, a while back this was hardcoded using ".." relative paths (WTF!), a full discussion on this pain was:


So this fixes the shared dacpac from different system dacpac's but what if the code we are referencing was in a user dacpac? How do we set the reference and allow people to map their source code folders or build servers to the same projects?

There are two approaches. The first is to put the dacpac's in a known location "c:\dacpacs", "\\dacpacserver\dacpacs" etc. and always use that path. The second option is to check the dacpac's into source control with the solution and referencing using relative paths.

Both ways work, do what is best for you.

The next two comments are:

Veysel chose to copy the dacpac's into the solution folder, yay to Veysel.

Now Orad seems to have got stuck as they have already referenced the master dacpac. As a guess, I would say Orad's problem is either that they are referencing an object in another schema and has the same error but not for the sys or INFORMATION_SCHEMA schemas or Orad has mistyped a referenced object and the warning or error is actually valid.

They should re-check the code is correct. It is possible that they are referencing an object in the INFORMATION_SCHEMA that isn't in the dacpac. If this was the case then Orad could declare the missing object themselves, and it would probably just work, but we would need to keep his reference to master, otherwise, Orad would lose their "create schema INFORMATION_SCHEMA" statement in his master.dacpac.

After Orad's comment we have:

Martin is happy :)

NYCdotnet less happy but they do mention something interesting that there is a setting under "Online Editing". SSDT has a couple of different usage scenarios. The first, and in my opinion, the main reason for using SSDT is the offline editing where you edit T-SQL code offline, build and validate before you get anywhere near any a database.

The second scenario is online where you connect to a database and change the code like an old school DBA with a live production database in SSMS. The setting in this comment talks about the online version and doesn't fix the warning or error in the offline scenario which is probably being mentioned in this question (if this isn't about the offline version then I am going to hang up my technology boots).

The final comment on the first answer is by Scarl:

Scarl I guess either doesn't have an SSDT database project open or the installation has failed, try re-installing SSDT and creating a new database project from scratch.

Answer 2

techfield has suggested this:

In Sam's original question, they mention that they get a warning and the project still deploys but techfield points out that this can cause an error and if you get an error you can't deploy a project as it won't build. This can result in much frustration.

The difference between this error being displayed as a warning which can be ignored (unless you have "treat all warnings as errors" which of course you should but don't) and an error which stops the dacpac from being built is this setting. You can either edit the .sqlproj file and change the XML, or you can go to the properties of the project and tick the box:

Jowen then correctly said you shouldn't turn off errors willy nilly - personally I always check this and enforce it unless I am working with an "inherited" database where it isn't possible but over time make it possible and then check it. When it is possible, I then work towards clearing the warnings if practical.

Final Answer

The final answer is by Russell:

What has happened in Russell's case is that he has a table in a project that is in a .sql file but the build properties of the file "Build Action" is not set to build so although the file exists and the table is valid it is invisible to SSDT. Set the "Build Action" on the file properties in Visual Studio to "Build" and it will be included and the reference can be validated.

Try for yourself

I have setup a GitHub repo to show this error in all of its glory in case you would like to play along at home.

If you would then get yourself a command prompt that can run git and do:

git clone https://github.com/GoEddie/SQL71502.git

I know I called the project the error number, this is like inception in real life. If you go ahead and open the SQL71502.sln solution when you build you should get a warning and also the reference to INFORMATION_SCHEMA should show a blue squiggly line:

Note about the warning, the warnings are only displayed when a file is compiled, if you build then build again without changing a file it won't get re-compiled so all your warnings will disappear! The warning is still there, add a new line or something to the file to cause a rebuild and the warning to be re-displayed. Because of this feature, I sometimes do a visual studio clean and re-build when using SSDT to get a full overview of the issues.

Let's go hardcore and set the project to treating T-SQL warnings as errors. Go and run:

git checkout 1-errors-as-warnings

You will probably need to reload the solution in Visual Studio and then when you build you will get an error instead of a warning, and the dacpac won't build.

To fix the error, we now need to add the database reference to the master database, follow the advice from the stack overflow question above to add the reference and then when you build you should get a successful build. If you didn't want to add it yourself you could run:

git checkout 2-fix-error

When running git checkout, if you have changed any files you will get an error running "git checkout -- ." will reset git and let you switch branches (it throws away any changes so don't do it unless you want exactly that).

With the database reference added we could rebuild and there should be no error or warning:

Now, if instead of a system reference we had a missing user reference, we would get a similar thing with the same error number. Switch to the next branch:

git checkout 3-broken-user-reference

you will see this error:

We fix it by creating the object that it was expecting:

and then it will build successfully:

Happy days.

I hope no one minds we copying and pasting from stack overflow, I guess it is the modern way :)

Where do you install tSQLt?

  • Posted on: 19 October 2017
  • By: Ed Elliott

The question of where to install tSQLt is probably the most common question I get when I talk about unit testing T-SQL, so much so that I thought that it would be a good topic for a blog. I mention tSQLt in the title, but this covers all unit test code for T-SQL.

So to be more specific:

Where do you install unit tests and unit tests frameworks for SQL Server, which databases should have that code in them?

If we look at what databases might exist in a development process:

  • Local developer database
  • CI database on a build server
  • Test database for testers
  • QA database for user acceptance testing
  • Prod mirror or Pre-Production database for a production standby database
  • Production database

If we have tests written in tSQLt and the tSQLt framework itself then where do you install that and those tests and execute those tests?

TLDR: The answer is quite simple, in these two database types:

  • Local developer database
  • CI database on a build server


The first thing to say is, what are unit tests?

There are two top goals for unit tests:

The first is that unit tests are small pieces of code that validate that other small pieces of code work successfully.

The second goal is that unit tests guard against other developers breaking the actual pieces of code covered by the tests - you might well be the other developer in this case so I would always recommend not getting too worked up about this :).

In the list of databases, these two types are the only database types that are private and the type of databases that can typically be thrown away on a whim.

A local developer database should sit on the developer's machine and allow a developer to deploy and test their code locally without affecting anyone else. I have worked on projects with this setup, and it is easily the best to work with as a developer.

If you aren't able to have your database locally but have to either use a remote database or even worse, a shared database for testing, then you will likely get hit by other peoples breaking changes or other people debugging their code in some way blocking you from working.

When you have a shared database you have to coordinate all sorts of things like the best time to deploy and that itself is a massive time sink.

The second type of database, one for a CI build is typically only used for a single build at a time. Because a SQL Server database is, mostly, free to create on an existing server, there is no need to share CI databases. When you think about the fact that you can use Local DB to quickly and cheaply spin up a new clean, empty, fresh, happy database, why wouldn't you use this unless there was an excellent reason not to?

It is in these private databases that we can do things like deleting all the data in a table, setting all the dates of birth to a developer's favourite date or create a long-running transaction that blocks out all of the other users.

Once we move further right in our list of environments, we are less likely to want to do things such as deleting all the data in a table, and once we hit production, it is critical we don't leave test code like that in the database to be accidentally called.

Tests for a production database should be a series of smoke tests which can operate in and around the existing live production calls and data so they do not take extra transactions than the standard database does and they certainly don't modify data, other than their own.

These smoke tests are also likely to be carried out via the application, so a deploy happens and a set of application tests are typically executed to validate that the deploy has been a success - this is a long way from where we use unit tests to validate and guard code against breaking changes.

Anyway, enjoy!

tSQLt Test Adapter for Visual Studio 2017

  • Posted on: 3 October 2017
  • By: Ed Elliott

So Visual Studio 2017 has been released a while and I had created a version of the test adapter (which lets you run tSQLt tests from within Visual Studio with SSDT, ReadyRoll or just plan sql scripts using the test window but I held off on pushing it to the marketplace as it didn't work with Visual Studio 2015.

I spent quite a lot of time trying to work out how to build a single vsix that worked with VS 2015 and 2017 and in the end gave up and now there are two versions, one for 2015 and one for 2017 - I am not putting any fixes into the 2015 version so I would upgrade to 2017 if you want any updates.

I also fixed a couple of issues and one nice little problem with my .net tSQLt client where if you passed in the name of a non-existant schema or test, the tests would return success:


These now report a failure and handle extra select's in the output from tests. Because the test adapter uses the tSQLt client, that gets the benefit of any fixes I make there (isn't hosting individual repos on github awesome for even sharing your own work!)

If you want to use visual studio test window to run your tSQLt tests and have the aweomsness of this:

Then grab it from inside visual studio "Extensions and "Updates" on the tools menu.

Learn how to unit test SQL Server T-SQL code

  • Posted on: 5 September 2017
  • By: Ed Elliott

A free email course on how to use tSQLt including the technical aspects of writing unit tests AND the art of writing repeatable, using unit tests for even the most complicated T-SQL code

UPDATE: I thought that if I got a certain number by October 1st I would run the course but in two days I had three times the number of people subscribe than my initial target so I have closed the first course, sign up if you want to join the waitlist or the next course!

NEW UPDATE: Enroll in the course here: https://courses.agilesql.club/p/unit-testing-sql-server-code-using-tsqlt

Unit testing helps us to write better code, make rapid changes to our code and has been generally seen as a good idea for about 10 years. Writing tests for T-SQL code is made much easier by using tSQLt but there is quite a high barrier to entry both in terms of the technical skills in getting tSQLt running and also how to approach large code bases of, sometimes, unfriendly T-SQL code and taming the code with unit tests.

I have successfully unit tested T-SQL code in a number of different environments including clean greenfield environments as well as legacy projects and I have written this course to help people get started with unit testing but also help them to turn unit testing into a part of their development process that they can use every day to improve the quality of their work and the speed at which deployments can be made.

Are you any of these people?

  • An application developer experienced with other testing frameworks for testing application code?
  • A T-SQL developer with no testing experience?
  • A -TSQL developer with testing experience in other languages or frameworks?

If you are then you should sign up (https://courses.agilesql.club/p/unit-testing-sql-server-code-using-tsqlt) and let me help you learn tSQLt unit testing for SQL Server.

Why an email course?

I thought it would be an interesting way to provide actionable information regularly and to allow a level of assistance and feedback that I don't think is possible with blogging or writing articles.

How do I sign up?

Run over to: https://courses.agilesql.club/p/unit-testing-sql-server-code-using-tsqlt and pop in your details.

The course is going to start on the 1st of October and as it is the first one I am limiting the amount of people who can start it. If the first one is a success then I will run it again but it won't be until at least 2018. available https://courses.agilesql.club/p/unit-testing-sql-server-code-using-tsqlt

What will be the format?

The course will be one email a week which will include an overview of the weeks topic, some detail into the parts that need it and an excercise for the week which can be done on a demo database or any SQL Server database code you have.

and it is free?

yep, gratis. I am not open sourcing yet - maybe in the future but the course itself is free, aka "no service charge", you can pay if you want to be able to just sit down and do the course but free if you are happy to wait

SSIS ForEach Enumerator File Order

  • Posted on: 4 September 2017
  • By: Ed Elliott

I saw on slack recently a question about how the ssis file enumeraror orders (or more specifically doesn't order) files. I have been thinking about ssis quite a lot lately and whil I am in no hurry to start using it day to day it is quite an interesting tool.

So anyway, I saw this question that went like:

"does anyone know in what order files are processed in 'Foreach File Enumerator'?
I used to think it was alphabetically but after some testing this is not always the case?
Second part is there anyway to specify the order by say size or date?"

So how does SSIS order files or doesn't order files?

The answer to this is pretty simple and I thouhgt I knew the answer but wanted to confirm it. In my mind I thought, "how do they get a directory listing?", and my mind responses "probably using the win32 api's find file etc", my mind then wondered somewhere else before writing a quick package that:

  • 1. Has a ForEach loop and a breakpoint set at pre-execute
  • 2. Has a single task in the ForEach loop and a breakpoint set at pre-execute
  • 3. A variable to hold the file name

Pretty simple, the ssis package looked like:

Pretty simple hey :)

I set the file path for the enumerator to c:\ssisSearch and put a load of files and directories in (because the win32 find functions have a buffer and you need to call it multiple times - I wanted to make sure we covered cases where there were multipl find calls). Then I reached for my favorite tool of all procmon.exe (I say favorite, it used to be then I had a job where I used it literally every single days for hours and hated it so stopped using it but now i'm back with it!). In procmon I set a filter on the c:\cssisSearch folder and also DtsDebugHost.exe and ran my package - the files were returned in alphabetical order.

I then went into procmon and to the properties of the "QueryDirectory" operation on that folder and (when the symbols had loaded) I could see that the call ssis was making was from the ForEachFileEnumerator.dll (native not .net so we can't grab reflector) and that calls "FindFirstFileW".

A quick hop skip and jump to msdn and FindFirstFile discusses the ordering of files here:

The FindFirstFile function opens a search handle and returns information about the first file that the file system finds with a name that matches the specified pattern. This may or may not be the first file or directory that appears in a directory-listing application (such as the dir command) when given the same file name string pattern. This is because FindFirstFile does no sorting of the search results. For additional information, see FindNextFile.

FindNextFile has this:

The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. If the data must be sorted, the application must do the ordering after obtaining all the results.

So basically ntfs is alphabetical, fat date but don't rely on either.

Just a final thought, ssis runs on linux so no idea the order there :)


Site Search with Duck Duck Go