ScriptDom

ScriptDom parsing and NoViableAltExceptions

  • Posted on: 2 March 2017
  • By: Ed Elliott

If you have ever tried to debug a program that uses the TSql ScriptDom to parse T-SQL you will know that the process is extremely slow, and this is due to the volume of NoViableAltExceptions (and others) that are thrown and then caught. These are first chance exceptions, handled internally, and they come from the way the ScriptDom interacts with Antlr and the Lexer it uses. When you debug a program you have two processes: process one is the debugger, which starts (or attaches to) process two, the debuggee.

The debugger calls the Windows function WaitForDebugEvent, typically in a "while(true)" loop (everyone should write a Windows debugger at some point in their lives, you learn so much - in fact, put down SSMS and go write your first debugger loop: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681675(v=vs.85).aspx). The debuggee app is then run, and when something interesting happens, like an exception being thrown or a DLL being loaded or unloaded, the debuggee is paused (i.e. all threads are stopped), WaitForDebugEvent returns, and the debugger can look at the child process and either do something or call WaitForDebugEvent again. Even if the debugger doesn't care about the exceptions, the debuggee is still paused for every single one, just so Visual Studio (or your home-baked debugger) can decide that it wants to ignore that exception before the debuggee is started up again.

What this means for an app that throws lots of first chance exceptions is a constant start, stop, start, stop which is painful for performance - it is basically impossible to debug a TSql ScriptDom parse on a large project. I typically debug a project with one table and one proc and hope it gives me everything I need, or use other tricks like letting the parsing happen without a debugger attached and then attaching a debugger at the right point after the parsing has happened - but then again, I don't have to debug the TSql Lexer!

So where is this leading?

I was wondering what effect these first chance exceptions have on parsing T-SQL even in normal operation, where we don't have a debugger attached - is there something we can do to speed up the processing?
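As a side note on method: one way to count first chance exceptions in-process, without any debugger attached, is the AppDomain.FirstChanceException event. A minimal sketch (the helper name is mine, not from the original post):

```csharp
using System;
using System.Runtime.ExceptionServices;

static class FirstChanceCounter
{
    // Counts every exception thrown inside `action`, including ones that
    // are caught and handled - exactly what a debugger sees as "first chance".
    public static int Count(Action action)
    {
        var count = 0;
        EventHandler<FirstChanceExceptionEventArgs> handler = (s, e) => count++;
        AppDomain.CurrentDomain.FirstChanceException += handler;
        try
        {
            action();
        }
        finally
        {
            AppDomain.CurrentDomain.FirstChanceException -= handler;
        }
        return count;
    }
}
```

Wrapping a parser.Parse call in Count shows how many NoViableAltExceptions a given script generates, which is how you can compare scripts like the ones below.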

The first thing I wanted to do was to try to reproduce a NoViableAltException. I thought it would take me a few goes, but actually the first statement I wrote caused one:

"select 1;"

This got me curious so I tried just:

"select 1"

Guess what? No NoViableAltException the second time. This didn't look good - should we remove all the semi-colons from our code? (Spoiler: no!)

Ok, so we have a reproducible query that causes a first chance exception. What if we parse it 1000 times and record the time, then another 1000 times with the semi-colon replaced with a space (so it is the same length)?

Guess what? The processing without the semi-colon took just over half the time of the queries with semi-colons: the runs with a semi-colon averaged 700ms and the runs without averaged 420ms. Much faster, but who cares about 300 milliseconds? It is less than 1 second and really won't make much difference to the overall time to publish a dacpac.
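A harness for this kind of measurement can be as simple as the sketch below (the Bench name is mine; the commented usage assumes the Microsoft.SqlServer.TransactSql.ScriptDom package is referenced):

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Runs `action` once to warm up (JIT), then times `iterations` runs.
    public static TimeSpan Time(Action action, int iterations)
    {
        action();
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed;
    }
}

// Usage sketch (assumes the ScriptDom package):
// var parser = new TSql120Parser(false);
// IList<ParseError> errors;
// var withSemi = Bench.Time(() => parser.Parse(new StringReader("select 1;"), out errors), 1000);
// var without  = Bench.Time(() => parser.Parse(new StringReader("select 1 "), out errors), 1000);
```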

I thought I would have one more go at validating a real-life(ish) database, so I grabbed the WideWorldImporters database, scripted out the objects and broke them into batches, splitting on GO and either leaving the semi-colons in or removing them all. With semi-colons the processing took 620 ms and there were 2403 first chance exceptions. The second run without semi-colons - which would likely create invalid sql in some cases - took 550 ms and there were still 1323 first chance exceptions. I think if we could get rid of all the first chance exceptions the processing would be much faster, but ho hum - to handle the first chance exceptions you just need a fast CPU and not to be a process that is being debugged.

Convert select statements in scripts into inline TVFs automatically

  • Posted on: 25 November 2015
  • By: Ed Elliott

Update: the new download location is https://visualstudiogallery.msdn.microsoft.com/435e7238-0e64-4667-8980-5...

Inline TVFs are awesome; they are a great way to reuse T-SQL code without the performance penalties of multi-statement TVFs or, horror of horrors, scalar functions. Taking select statements and turning them into inline TVFs is actually really simple using the TSql ScriptDom, so here is another tool for the SSDT Dev Pack that lets you highlight a select statement in SSDT, generate a TVF and replace the statement with a call to the TVF.

It isn't perfect: I totally ignore parameters, and if you have a join and only want to pull out one side of it you will need to do that manually, but this will help in other situations and is a good start, creating a TVF using the correct form (returns table as return select ...).

So for example if we have this code:

[screenshot: highlight a select statement in SSDT]

If you choose Tools --> SSDT Dev Pack --> Extract into TVF (or do the right thing and map CTRL+K, CTRL+E), you are given the following dialog, which lets you give the function a name and a location in SSDT for it to be created in:

[screenshot: choose a name and location for the new TVF]

Finally, the TVF is created and the original statement is replaced with a call to the function (whoop whoop):

[screenshot: the TVF is created and the original statement changed to call the TVF]

Hopefully someone finds this useful, I know it is good for me :)

To grab the devpack get it from:

https://the.agilesql.club/Projects/SSDT-Dev-Pack

ScriptDomVisualizer - Now Displays TSqlParserTokens

  • Posted on: 25 November 2015
  • By: Ed Elliott

The new ScriptDom visualizer now also shows the tokens that are found in a sql script. It also highlights the part of the query that is covered by each token, so it is really easy to see where you are in the script.

Honestly if you are doing anything with the T-SQL Script Dom this is really useful to help reverse engineer t-sql into ScriptDom objects :)

Grab it from:

https://github.com/GoEddie/ScriptDomVisualizer/tree/master/release

Enjoy!

Changing case of T-SQL Keywords

  • Posted on: 24 November 2015
  • By: Ed Elliott

Update: the new download location is https://visualstudiogallery.msdn.microsoft.com/435e7238-0e64-4667-8980-5...

Bert likes his T-SQL keywords in UPPERCASE, Ernie likes his in lowercase, which causes some internal struggle as they are actually one person, "Mr Bert Ernie". Anyway, I digress: whether you are working by yourself or in a team, you sometimes want to change the case of all the keywords in some code.

If you are using SSDT you can use my new SSDT-DevPack tool to do it for you:

If you grab the devpack, any T-SQL keywords in the active document will be made UPPERCASE when you do "Tools --> SSDT Dev Pack --> Upper Case Keywords", and I will leave it up to your imagination as to what happens when you do "Tools --> SSDT Dev Pack --> Lower Case Keywords".

As always, you can map these to keyboard shortcuts; I use CTRL+K, L for lowercase and CTRL+K, U for UPPERCASE.

Any problems please shout, grab the installer from:

https://github.com/GoEddie/SSDT-DevPack/blob/master/release/SSDTDevPack....

Enjoy :)

Using the ScriptDom without losing comments

  • Posted on: 24 November 2015
  • By: Ed Elliott

The ScriptDom is cool: it lets you parse T-SQL, play around with the AST (the tree of statements) and then generate T-SQL again (probably in a different format etc). This is really useful, but if you round-trip the T-SQL into an AST and back you lose some important information like comments. So, if you do something like:


var original = @"create procedure [dbo].[get_employee](@employee_id int)
as
--ssssss

select 100;

select * from a a1 where a1.a like '%fff%';
";
var parser = new TSql120Parser(false);

IList<ParseError> errors;
var fragment = parser.Parse(new StringReader(original), out errors);

var outputScript = new StringBuilder();

var generator = new Sql120ScriptGenerator();
generator.GenerateScript(fragment, new StringWriter(outputScript));

What you would end up with is instead of:


create procedure [dbo].[get_employee](@employee_id int)
as
--ssssss

select 100;

select * from a a1 where a1.a like '%fff%';

You get:


CREATE PROCEDURE [dbo].[get_employee]
@employee_id INT
AS
SELECT 100;
SELECT *
FROM a AS a1
WHERE a1.a LIKE '%fff%';

So if you wanted to do something like change all the keywords to upper-case it would be really simple but you would lose the comments which I know some people are quite attached to.

To get around this there are two ways that I can think of, if anyone has another way please shout!

Approach 1 - replace just the parts you want to

When you parse the t-sql you get back a TSqlFragment. All of the ScriptDom statements, like CreateProcedureStatement, inherit from TSqlFragment (by way of TSqlStatement and probably a whole load of other things). What this means is that as well as being able to manipulate the contents of the statement, we get the offset into the string that was parsed (StartOffset) and the length of the fragment (FragmentLength). So, for example, if we wanted to change the type of a parameter, we would use the ScriptDom to find the parameter, giving us the StartOffset and the fragment length, and using those two values we can replace that sub-string with whatever we want to put in its place.
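A sketch of that splice, using a visitor to collect data type references (the visitor and method names here are mine; this assumes the Microsoft.SqlServer.TransactSql.ScriptDom package is referenced):

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class DataTypeVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<SqlDataTypeReference> Types = new List<SqlDataTypeReference>();

    public override void Visit(SqlDataTypeReference node)
    {
        Types.Add(node);
    }
}

class SpliceExample
{
    static string WidenFirstDataType(string original)
    {
        IList<ParseError> errors;
        var fragment = new TSql120Parser(false).Parse(new StringReader(original), out errors);

        var visitor = new DataTypeVisitor();
        fragment.Accept(visitor);

        var type = visitor.Types[0]; // e.g. the "int" in "@employee_id int"

        // Splice the new type into the original string - comments and
        // formatting outside the replaced span are untouched.
        return original.Remove(type.StartOffset, type.FragmentLength)
                       .Insert(type.StartOffset, "bigint");
    }
}
```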

Approach 2 - modify the TSqlToken on the fly

The second approach is to get the parser to parse the script and create the TSqlFragment, but instead of using the AST and modifying the TSqlStatements, you iterate through all of the tokens and modify the ones you don't like. For instance, if you wanted to change:


create procedure abc
as
select 100;

into:


alter procedure abc
as
select 'ABC';

You could simply do a find and replace on "create procedure", but what about cases like "create /*frrrrrrr*/ procedure" or something else entirely? Manually parsing it isn't trivial, so instead we can use the second approach, which is to modify the tokens. In this case, to change the create into an alter, we can do something like:


bool haveCreate = false;
var output = new StringBuilder();

foreach (var token in fragment.ScriptTokenStream)
{
    if (!haveCreate && token.TokenType == TSqlTokenType.Create)
    {
        var alterToken = new TSqlParserToken(TSqlTokenType.Alter, "alter");
        output.Append(alterToken.Text);
        haveCreate = true;
        continue;
    }

    output.Append(token.Text);
}

return output.ToString();

So "round-tripping" t-sql via a ScriptDom parser and generator does lose some content which may be useful, but all is not lost - here are a couple of approaches, and if anyone has more please share!

How to get started with the ScriptDom

  • Posted on: 7 November 2015
  • By: Ed Elliott

What is the ScriptDom?

The ScriptDom is an api for taking t-sql scripts, converting them into an AST or taking an AST and generating t-sql.

What is an AST?

Read this: https://en.wikipedia.org/wiki/Abstract_syntax_tree

Think about this:

select col from tablea
select col from tablea
select col /*from not_table*/ from tablea
select --not_col
col from tablea

and then something more complicated like:


select col from (select a as col from something /*else*/) a

with ab as (select col from (select a as col from something /*else*/) a)
select top 1 * from ab

Now, in all cases, how do you parse the t-sql and pull out the name of the table that is being read from? Not so easy, is it - fun, obviously, but not straightforward.

If we want to reliably find the name of the table then we can use the ScriptDom to parse the text and retrieve a list of statements. Each statement is typed, so instead of some text we have a series of select statements, and if we want to know what the table is we can say "hey, on the SelectStatement what is the FromClause? Oh, it is a list of NamedTableReferences which have a SchemaObjectName which has a four part identifier, but in this case we just have the last part, the table name" - can you see how that is better than regexes or building your own lexer and parser?

So is it for me?

If you need to parse t-sql, modify it and spit out t-sql again then I would say yes.

How do I get started?

To take t-sql text and generate an AST you will need to use one of the parsers that implement TSqlParser, which correspond to the version of t-sql that you are parsing. The parser returns a TSqlFragment, which is where the fun really begins.

Let's look at the example above. We want to get the table name, so we create a parser and get a TSqlFragment that we can explore:

[screenshot: parse some text]
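The code in that screenshot is essentially this (reconstructed from the full listing later in the post; it assumes the Microsoft.SqlServer.TransactSql.ScriptDom package is referenced):

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

class ParseExample
{
    static TSqlFragment ParseQuery()
    {
        var query = "select col /*from not_table*/ from tablea";
        var parser = new TSql120Parser(false); // false = quoted identifiers off

        IList<ParseError> errors;
        return parser.Parse(new StringReader(query), out errors);
    }
}
```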

The "false" that I passed into TSql120Parser tells it whether you have quoted identifiers on or not; this, and the implicit version that we select by choosing the specific parser, is the only thing we need to supply.

What we get back is a TSqlFragment which doesn't actually tell us that much, what we need to do is to pass it an object that inherits from TSqlConcreteFragmentVisitor.

TSqlConcreteFragmentVisitor

So this is an interesting thing: you inherit from it and override one or more of a whole load of methods. There are two methods for each type of object the ScriptDom understands (SelectStatement, Procedure, StringLiteral, MergeActionClause etc). If we wanted a class that lets us retrieve the select statements from the string above, we need to create a class that looks something like this:

class SelectVisitor : TSqlConcreteFragmentVisitor
{
    public override void Visit(SelectStatement node)
    {
    }
}

When you create your visitor there are two methods for each statement, so you also have ExplicitVisit(SelectStatement node) - ExplicitVisit lets you decide whether or not to call the base method, so you can control whether child objects are visited or not. In this case, and every case I have personally seen, using Visit rather than ExplicitVisit has been the right choice, but I guess ExplicitVisit might speed up parsing in some cases.

So parsing t-sql into something you can use becomes a two phase thing: first you create the parser and get it to create the TSqlFragment, then you pass in a visitor that does something with each statement. In this case all we want our visitor to do is expose the SelectStatements, so I will create a public List and add each one to it, which I can then examine when the whole TSqlFragment has been enumerated (visited):

class SelectVisitor : TSqlConcreteFragmentVisitor
{
    public readonly List<SelectStatement> SelectStatements = new List<SelectStatement>();

    public override void Visit(SelectStatement node)
    {
        SelectStatements.Add(node);
    }
}

To get the visit method to be called, I create an instance of the SelectVisitor and pass it to TSqlFragment.Accept:

var visitor = new SelectVisitor();
fragment.Accept(visitor);

After the line "fragment.Accept(visitor)" completes, the visitor has added any select statements in the TSqlFragment to the list of statements, so we can just do:


var selectStatement = visitor.SelectStatements.FirstOrDefault();

Now we have a SelectStatement. To find the table we first need to look at the property on SelectStatement called QueryExpression, which in our case is actually a QuerySpecification, but what we get in QueryExpression is the base type QueryExpression, which doesn't give us what we want. This is extremely common with the ScriptDom, and in many ways is one of the harder things about creating or modifying objects: knowing what can and can't be used for each property.

As an aside to help with this, what I do is use my ScriptDom Visualizer to parse some t-sql and show the actual object types which really helps.

So I cast the QueryExpression to QuerySpecification (you will need to add your own defensive code here) and then we can grab the FromClause, which in our case is a list containing one NamedTableReference:


var selectStatement = visitor.SelectStatements.FirstOrDefault();
var specification = (selectStatement.QueryExpression) as QuerySpecification;

var tableReference = specification.FromClause.TableReferences.FirstOrDefault() as NamedTableReference;

The full listing is:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.TransactSql.ScriptDom;

namespace Table
{
    class TableNameFinder
    {
        static void Main(string[] args)
        {
            var query = "select col /*from not_table*/ from tablea";
            var parser = new TSql120Parser(false);

            IList<ParseError> errors;
            var fragment = parser.Parse(new StringReader(query), out errors);

            var visitor = new SelectVisitor();
            fragment.Accept(visitor);

            var selectStatement = visitor.SelectStatements.FirstOrDefault();
            var specification = selectStatement.QueryExpression as QuerySpecification;

            var tableReference = specification.FromClause.TableReferences.FirstOrDefault() as NamedTableReference;
            Console.WriteLine(tableReference.SchemaObject.BaseIdentifier.Value);
        }
    }

    class SelectVisitor : TSqlConcreteFragmentVisitor
    {
        public readonly List<SelectStatement> SelectStatements = new List<SelectStatement>();

        public override void Visit(SelectStatement node)
        {
            SelectStatements.Add(node);
        }
    }
}

One thing I find is that I keep expecting the ScriptDom not to have an object, or not to be able to parse table hints or options or something that isn't used very often (merge statements!), but it seems to be pretty complete.

Finally

Writing your own parsers and generators for t-sql is a fun thing to do, but do it as a hobby; for any code that matters use the ScriptDom - there are a few annoyances and a few things to get your head around, but it is the right thing to do :)

SQL Server Edition aware SSDT deployment scripts

  • Posted on: 5 November 2015
  • By: Ed Elliott

Another day another deployment contributor for the project: https://github.com/DacFxDeploymentContributors/Contributors.

This one came about because I was talking to Peter Schott (b|t) over at the deployment contributor gitter room; he has a dacpac he wants to deploy to different editions of sql server, enabling different options when deploying to enterprise edition.

One of the things I found out is that in a contributor, although you have the source and destination models, you do not have access to any information about the server other than the version. To get around this you can either pass in an argument to say which edition the compare is running against, or, if you leave that off, the create index statement is duplicated and wrapped in a check against @@version to determine the edition - which is more t-sql than I would like, but should work.

One other thing I have realised is that I have by default used the generators for the 120 version of SQL, but in a contributor I could get the version from the context and use the correct one - maybe a task for the next one.

Finally, I have changed the message box at the beginning to a Debugger.Launch, which works much better - thanks to Ken (blog).

After a random twitter conversation I have a great new contributor to write tomorrow - I can't wait :)

Deploy SSDT INSERTS in Batches

  • Posted on: 4 November 2015
  • By: Ed Elliott

I have added my first actual deployment contributor to my deployment contributor project:

https://github.com/DacFxDeploymentContributors/Contributors

The idea is basically for people to share ideas about writing deployment contributors for SSDT and help each other. If you don't know what a deployment contributor is then you are missing a real treat. A deployment contributor lets you modify the t-sql that SSDT creates to upgrade a database from one version to the next.

What is the contributor?

In SSDT you sometimes get a table migration, which is where you want to do something like add a column in the middle of a table; instead of SSDT just adding it at the end, it creates a new table, inserts the data from the old table, then drops the old table and renames the new table to match the old one's name.

I thought it would be interesting to write a contributor that searches for these table migrations and changes the insert portion into a batched insert, so your transaction log growth is manageable on very large tables.

Why bother?

I think there is a use for this, though it is not something I would lose any sleep over if I did not have it. It was more to see how to do it, give an example of writing a deployment contributor that does something quite advanced, and show a few things like how I set up integration tests to make testing easy - unit / integration testing deployment contributors is quite hard if you are new to it. I do not know if what I have done is the best way, but it works for me; I would love to hear suggestions on how to improve things, so please either send some pointers or a pull request.

Solution overview

The first thing is to look at the solution in Visual Studio. As you can see from the screenshot, I have three projects:

AgileSqlClub.BatchedTableMigration - C# Dll Project
TestDacpacDeploy - SSDT Project, this will create the dacpac I will deploy from
IntegrationTests - C# Dll Project with NUnit via nuget

[screenshot: 3 projects, 2 C# dlls, 1 SSDT]

IntegrationTests

I'll start with the integration tests, as these are probably the most useful part of this whole thing. I have created a folder called "sqlpackage" and added the dlls and exe needed to run sqlpackage.exe. This at first seems like a bit of an odd thing to do, but it makes testing easier: I add the files and set the msbuild property "Copy to Output Directory" to "Copy if newer", which means that in my tests I can copy the deployment contributor into .\sqlpackage and then run .\sqlpackage\sqlpackage.exe, knowing that my dll will be in the right place to be loaded for the tests.

If you don't do this you will need to copy your dll into the program files directory, which is bad because of a) permissions and b) you can only run a single build at a time (without using something like https://turbo.net!).

If you do this in your integration tests your life will be easier writing deployment contributors :)

I have also added two helper classes, DacpacDeploy and Database. The Database class connects to a database and sets up the schema as I want it; it feels odd writing deploy scripts, but it is only small. If I didn't do this, then when I run sqlpackage.exe to do the compare I wouldn't know what state the database was in, and might not get the table migration I need.

DacpacDeploy basically copies the dll that is built by the main C# project into the sqlpackage folder and runs sqlpackage.exe with the args needed to load the contributor, also passing in the batch size to use as a parameter.

Finally I have the test itself which is pretty simple it uses DacpacDeploy to create a deployment script and then it checks to see whether the script has the new table migration script in it.

I should really add integration tests that actually run the deployment script and validate the data but it is just gone midnight so I think that is enough for now :)

AgileSqlClub.BatchedTableMigration

This is the actual contributor, and there are a couple of things that help. In "DeploymentFilter.cs" you need an attribute that declares that you have a deployment contributor:


[ExportDeploymentPlanModifier(Name, Version)]
public class DeploymentFilter : DeploymentPlanModifier
{
    private const string Name = "AgileSqlClub.BatchedTableMigration";
    private const string Version = "0.1.0.0";
    ...

I use constants for the name and version because there is a way to output messages from your contributor, and I like to make it clear where the message has come from, so I re-use these for that.

The next part is the method that is actually called when a deployment plan has been generated which we can parse and modify if we like (we like):


protected override void OnExecute(DeploymentPlanContributorContext context)
{
    if (context.Arguments.ContainsKey("EasyDebugBatchedTableMigration"))
    {
        MessageBox.Show("Breaking to let you attach a debugger");
    }
    ....

This is really useful for debugging contributors: I have a parameter you can pass in which just shows a message box. You can run sqlpackage.exe with this set, and when the message box appears attach Visual Studio, set your breakpoints, then press OK and step through your code. There are other ways to step into your code when it runs inside a process you don't control, but this always seems the simplest to me.
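A variant of the same trick that avoids the UI dependency is Debugger.Launch, which prompts you to attach a just-in-time debugger. A sketch (the helper class is illustrative; "EasyDebugBatchedTableMigration" is the argument name used above):

```csharp
using System.Collections.Generic;
using System.Diagnostics;

static class ContributorDebugHelper
{
    // Call this at the top of OnExecute: if the magic argument was passed
    // to sqlpackage.exe, offer to attach a debugger before continuing.
    public static void BreakIfRequested(IDictionary<string, string> arguments, string key)
    {
        if (arguments.ContainsKey(key) && !Debugger.IsAttached)
        {
            Debugger.Launch();
        }
    }
}
```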

The "context" argument that is passed in is really useful it gives you access to all the options that were set for the comparison as well as the source and target models and any arguments that were passed in.

I then iterate through all the steps in the plan. The steps are typed as DeploymentStep but are typically other types that inherit from DeploymentStep, so to find a table migration I just check whether the step is a SqlTableMigrationStep:


var next = context.PlanHandle.Head;
while (next != null)
{
    var current = next;
    next = current.Next;

    if (current is SqlTableMigrationStep)
    {
        var batched = new MigrationStepBatcher(current as SqlTableMigrationStep, rowCount).BatchStep();

        Remove(context.PlanHandle, current);
        AddBefore(context.PlanHandle, next, batched);
    }
}

The MigrationStepBatcher is what actually kicks off the process and Remove takes away the original table migration and AddBefore adds my new one - these methods are provided to us by inheriting from "DeploymentPlanModifier".

The MigrationStepBatcher creates a DeploymentStep of our own called BatchedSqlTableMigrationStep. To implement your own step you just need to inherit from DeploymentStep and override GenerateTSQL(), which returns a list of strings, one per batch in your step. A step can contain one or more batches, which are automatically separated for you using GO commands.
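A minimal custom step might look like the sketch below. This is from memory of the DacFx API - I am assuming DeploymentStep lives in Microsoft.SqlServer.Dac.Deployment and that GenerateTSQL returns IList<string>, so check it against your DacFx version:

```csharp
using System.Collections.Generic;
using Microsoft.SqlServer.Dac.Deployment;

public class BatchedSqlTableMigrationStep : DeploymentStep
{
    private readonly IList<string> _batches;

    public BatchedSqlTableMigrationStep(IList<string> batches)
    {
        _batches = batches;
    }

    // Each string returned here becomes its own batch in the generated
    // deployment script, separated by GO.
    public override IList<string> GenerateTSQL()
    {
        return _batches;
    }
}
```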

When we have a table migration to change into batches there is a whole load of information we need, like the table names and the column definitions, so I ask the original table migration for its t-sql by calling GenerateTSQL on it and parsing that with the ScriptDom - I then also use the ScriptDom to create the output t-sql.

Using the ScriptDom isn't straightforward; it is quite complicated to know exactly what you need to do and when, so to help I wrote the ScriptDom visualizer: you give it some t-sql and it builds a tree of exactly what types of object the ScriptDom expects at which point and what properties you need to set - if you do anything with the ScriptDom try using it, as it saved me a lot of time. If you write stuff for the ScriptDom and you have something better, please share it!

The actual changes

The idea of this is to take this code that is generated by default in SSDT:


INSERT INTO [dbo].[tmp_ms_xx_ForcedTableMigration]
SELECT [count]
FROM [dbo].[ForcedTableMigration]

into:

BEGIN
    WHILE (SELECT count(*)
           FROM [dbo].[ForcedTableMigration]) > 0
    BEGIN
        WITH to_delete
        AS (SELECT TOP 1480 [count]
            FROM [dbo].[ForcedTableMigration])
        DELETE to_delete
        OUTPUT deleted.* INTO [dbo].[tmp_ms_xx_ForcedTableMigration] ([count]);
    END
END

Why not just use regex and a string builder?

You could certainly do that, but if you have ever tried doing it that way you will know it is brittle. Do this sort of thing with the ScriptDom: it is more work up front, but you can make changes easily and safely, and the more people use and share stuff built using the ScriptDom the easier it will get for everyone.

The interesting thing here is the use of the OUTPUT INTO clause; without that I would only really have been able to do this for tables with a unique key, as I could select a certain number of rows but wouldn't have been able to guarantee that I deleted those same rows. Using DELETE and OUTPUT INTO means all I need is the column definitions. Also, deleting from a CTE deletes from the underlying table - every day is a school day, and one I was very pleased to attend today.

Finally

To use this contributor, build the project, take the dll, put it in your sqlpackage folder (or wherever you need to for the tool you use to deploy) and add these arguments:

/p:AdditionalDeploymentContributors=AgileSqlClub.BatchedTableMigration /p:AdditionalDeploymentContributorArguments=BatchedTableMigrationBatchSize=1480;

1480 is the batch size, you will probably want to make this bigger.

Enjoy, as always any questions please ask away :)

ScriptDom Visualizer

  • Posted on: 3 November 2015
  • By: Ed Elliott

Writing code to generate t-sql using the ScriptDom is hard. There is a seemingly bewildering array of objects you can use, and where to use the different types is unclear. I think this is due to the complexity of the task: it really isn't simple, and dealing with t-sql gets complicated very quickly. To be clear, the ScriptDom is a great tool, and although it is difficult to use it is a million times better than writing custom parsers / generators for t-sql.

To help I have written the ScriptDom visualizer which parses some t-sql and produces a tree of the different objects and properties that the ScriptDom needs for you to generate the same query via the ScriptDom rather than via text.

Why bother?

The ScriptDom is cool, as it lets us generate valid t-sql as well as parse, modify and re-generate it, so we can start to build our own tools to help sql developers - wahoo.

If you have no need to use the ScriptDom then this is a useless tool, if you need to interact with the ScriptDom in any way and you need to know what properties and types to use then this will be useful for you.

I have written it for me to use, but I am sure others will find it useful. It is, as always, on GitHub; if you want to change anything just submit a pull request.

enjoy.

Look ma, pretty colours

[screenshot: script dom visualizer, pretty]

Grab it from:

https://github.com/GoEddie/ScriptDomVisualizer/tree/master/release
