Using the ScriptDom without losing comments

The ScriptDom is cool, it lets you parse T-SQL, play around with the AST (tree of statements) and then generate T-SQL again (probably in a different format etc). This is really cool but if you round-trip the T-SQL into an AST and back you lose some important information like comments so, if you do something like:

var original = @"create procedure [dbo].[get_employee](@employee_id int) as --ssssss

select 100;

select * from a a1 where a1.a like ‘%fff%’;
“;
var parser = new TSql120Parser(false);

IList errors;
var fragment = parser.Parse(new StringReader(original), out errors);

var outputScript = new StringBuilder();

var generator = new Sql120ScriptGenerator();
generator.GenerateScript(fragment, new StringWriter(outputScript));

What you would end up with is instead of:

create procedure [dbo].[get_employee](@employee_id int) as --ssssss

select 100;

select * from a a1 where a1.a like ‘%fff%’;

You get:

CREATE PROCEDURE [dbo].[get_employee] @employee_id INT AS SELECT 100; SELECT * FROM a AS a1 WHERE a1.a LIKE '%fff%';

So if you wanted to do something like change all the keywords to upper-case it would be really simple but you would lose the comments which I know some people are quite attached to.

To get around this there are two ways that I can think of, if anyone has another way please shout!

Approach 1 - replace just the parts you want to

When you parse the t-sql you get back a TSqlFragment, all of the ScriptDom statements like CreateProcedureStatement inherit from TSqlFragment (by way of a TSqlStatement and probably a whole load of other things) what this means is that as well as being able to maniplulate the contents of the statement, we get the offset into the start of the string that was parsed and the length of the fragment. So for example if we wanted to change the type of the parameter, we would use the ScriptDom to find the parameter, we would then have the StartOffset and the Fragment length. Using those two values we can do a replace on the sub-string with whatever we want to put in its place.

Approach 2 - modify the TSqlToken on the fly

The second approach is to get the parser to parse the script and create the TSqlFragment but instead of using the AST and modifying the TSqlStatements, you can iterate through all of the tokens and modify the ones you don’t like. For instance if you wanted to change:

create procedure abc as select 100;

into:

alter procedure abc as select 'ABC';

You could simply do a find and replace on “create procedure” but what about cases like “create /*frrrrrrr*/ procedure” or something else entirely, manually parsing it isn’t trivial so instead we can use my second approach which is to modify the tokens. In this case to change the create into an alter we can do something like:

bool haveCreate = false; var output = new StringBuilder();

foreach (var token in fragment.ScriptTokenStream)
{
if (!haveCreate && token.TokenType == TSqlTokenType.Create)
{
var alterToken = new TSqlParserToken(TSqlTokenType.Alter, “alter”);
output.Append(alterToken.Text);
haveCreate = true;
continue;
}
output.Append(token.Text);
}
return output.ToString();

So “round-tripping” t-sql via a ScriptDom parser and generator does lose some of the content which may be useful but all is not lost, here are a couple of approaches, if anyone has some more please share!