How to get started with the ScriptDom
What is the ScriptDom?
The ScriptDom is an api for taking t-sql scripts, converting them into an AST or taking an AST and generating t-sql.
What is an AST?
Read this: https://en.wikipedia.org/wiki/Abstract_syntax_tree
Think about this:
select col from tablea
select col from tablea
select col /*from not_table*/ from tablea
select –not_col
col from tablea
and then something more complicated like:
select col from (select a as col from something /*else*/) a
with ab as (select col from (select a as col from something /*else*/) a)
select top 1 * from ab
Now in all cases how do you parse the t-sql and pull out the name of the table that is being read from? not so easy is it - fun obviously but not straight forward.
If we want to reliably find the name of the table then we can use the ScriptDom to parse the text and retrieve a list of statements and each type of statement is typed so we know that instead of some text we have a series of select statements and if we want to know what the table is we can say “hey, on the SelectStatement what is the FromClause, oh it is a List of NamedTableReferences which have a SchemaObjectName which has a 4 part identifier but in this case we just have the last part the table name” - can you see how that is better than regex’s or building your own lexer & parser.
So is it for me?
If you need to parse t-sql, modify it and spit out t-sql again then I would say yes.
How do I get started?
To take t-sql text and generate an object you can do something with have a look at you will need to use one of the parsers that implement TSqlParser which corresponds to the version of t-sql that you are parsing and it returns a TSqlFragment which is where the fun really begins.
Lets look at the example above, we want to get the table name so we create a parser and get a TSqlFragment that we can explore:
The “false” that I passed into TSqlParser120 tells it whether you have quoted identifiers on or not, this and the implicit version that we select by choosing the specific parser is the only thing we need to include.
What we get back is a TSqlFragment which doesn’t actually tell us that much, what we need to do is to pass it an object that inherits from TSqlConcreteFragmentVisitor.
TSqlConcreteFragmentVisitor
So this is an interesting thing, what happens is you inherit from it and override one of a whole load of methods. There are 2 methods for each type of statement the ScriptDom understands (SelectStatement, Procedure, StringLiteral, MergeActionClause etc). If we wanted to have a class that lets us retrieve the select statement from the string above we need to create a class that looks something like this:
class SelectVisitor : TSqlConcreteFragmentVisitor
{
public override void Visit(SelectStatement node)
{
}
}
when you create your visitor there are two methods for each statement so you also have ExplicitVisit(SelectStatement node) - explicit visit lets you decide whether or not to bass the base method so you can control whether child objects are parsed or not. In this case and every case I have personally seen, using Vist rather than ExplicitVisit has been the right choice but I guess it might speed up parsing in some cases?
So parsing t-sql into something you can use becomes a sort of two phase thing, firstly you create the parser and get it to create the TSqlFragment which you then pass a visitor that does something with each statement, in this case all we want our visitor to do is to expose the SelectStatement so I will create a public List and add each one to that list which I can then examine when the whole TSqlFragment has been enumerated (visited):
class SelectVisitor : TSqlConcreteFragmentVisitor
{
public readonly List SelectStatements = new List();
public override void Visit(SelectStatement node)
{
SelectStatements.Add(node);
}
}
To get the visit method to be called, I create an instance of the SelectVisitor and pass it to TSqlFragment.Accept:
var visitor = new SelectVisitor();
fragment.Accept(visitor);
After the line “fragmment.Accept(visitor)” completes the visitor has added any select statements in the TSqlFragment to the list of statements so we can just do:
var selectStatement = visitor.SelectStatements.FirstOrDefault();
Now we have a SelectStatement to find the table we first need to look at the property on SelectStatement called QueryExpression which is of type QuerySpecification but what we get in QueryExpression is a the base type QueryExpression which doesn’t give us what we want. This is extremely common with the ScriptDom and in many ways is one of the harder things about creating or modifying objects, knowing what can and can’t be used for each property.
As an aside to help with this, what I do is use my ScriptDom Visualizer to parse some t-sql and show the actual object types which really helps.
So I do a cast of the QueryExpression to QuerySpecification (you will need to add your own defensive code here) and then we can grab the FromClause which in our case is a list of 1 NamedTableReferene:
var selectStatement = visitor.SelectStatements.FirstOrDefault(); var specification = (selectStatement.QueryExpression) as QuerySpecification;
var tableReference = specification.FromClause.TableReferences.FirstOrDefault() as NamedTableReference;
The full listing is:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.SqlServer.TransactSql.ScriptDom;
namespace Table
{
class TableNameFinder
{
static void Main(string[] args)
{
var query = “select col /*from not_table*/ from tablea”;
var parser = new TSql120Parser(false);
IList errors;
var fragment = parser.Parse(new StringReader(query), out errors);
var visitor = new SelectVisitor();
fragment.Accept(visitor);
var selectStatement = visitor.SelectStatements.FirstOrDefault();
var specification = (selectStatement.QueryExpression) as QuerySpecification;
var tableReference = specification.FromClause.TableReferences.FirstOrDefault() as NamedTableReference;
Console.WriteLine(tableReference.SchemaObject.BaseIdentifier.Value);
}
}
class SelectVisitor : TSqlConcreteFragmentVisitor
{
public readonly List SelectStatements = new List();
public override void Visit(SelectStatement node)
{
SelectStatements.Add(node);
}
}
}
One thing I find is that I keep expecting the ScriptDom to not have an object or be able to do something like parse table hints or options or something that isn’t used very often (merge statements!) but it seems to be pretty complete.
Finally
Writing your own parsers and generators for t-sql is a fun thing to do but do it as a hobby for any code that matters use the ScriptDom - there are a few annoyances and a few things to get your head around but it is the right thing to do :)