How to load test a SQL database

How do you know whether a database code change is going to be great, okay or awful? If you want to test a new version of SQL Server how do you know whether to leave on or disable the legacy cardinality estimator? Will the latest cumulative update destroy the performance of your application?

You don’t have to wait until production to know if the change you are going to do works or not and building a system to test the performance of your database is probably more straightforward than you realize.

TL;DR

Testing database performance is hard and takes a great deal of work to probably not even do particularly well. Instead of thinking about how you can load test a database think about how you can drive the application.

For instance, if you have a web app then use JMeter to simulate load. If you have a “fat app”, then you might need to write some code to call specific workflows through the application.

Approaches to load testing

There are two distinct approaches to load testing, and the first is to record some production queries and replay them or to drive the application. First, we will look at recording queries and its close friend generating fake queries to run through a tool.

Recording Queries and Replaying

On paper, this seems like a great plan, record a load of queries using either profiler or extended events and then later replay the queries and we will see whether the queries get faster, slower, or stay the same.

The problem with this approach is that it is highly likely that the database code cannot support this very well.

If you take a fake process, for example, creating a new user. If the application makes these database calls:

user.CreateUser(@FirstName = 'ed',  @LastName = 'elliott', @DoB = '1980-04-01')

CreateUser then returns a user id of 100, and the flow continues:

user.UpdateUser(@UserId = 100, @LastLogin = '2019-08-05 12:21:14.000')

Because the UserId is returned from one call and used in the next, UserId has to be the exact UserId that was returned by the first call. If, for example, CreateUser has returned 10 then the second UpdateUser call would have behaved differently, probably the load on the server would be less than in production.

We could think about working around this problem by backing up the database, recording traffic and restoring to that known point before we replay the traffic.

Restoring to a known place would work great if we only ever wanted to replay the exact traffic, but we often want to add on some multiplier to load testing so that we know that we have 2 times the traffic we will be fine.

We also need to be careful that we only ever record and replay the traffic in the same order, imagine we record these two concurrent set of queries:

--thread 1
CreateUser(@FirstName = 'ed',  @LastName = 'elliott', @DoB = '1980-04-01');

--thread 2
CreateUser(@FirstName = 'bert',  @LastName = 'bumblebee, @DoB = '2019-04-01');

Then we have a job that deletes everyone under a certain age:

DELETE from [user] WHERE DoB < '2001-04-01'

So, in this case, the second user doesn’t exist, but because when we replay the traffic, we are load testing, so we use multiple threads or computers to send the queries “bert bumblebee”, and the remaining queries for the older user no longer work.

Being able to say that every single call to the database is entirely deterministic is hard and probably doesn’t make for an exciting database application.

Another common pattern is to do something like:

AddToQueue @message = 'a really cool thing to do', @QueueTime = '1923-02-01 23:23:11.09'

Then have something that finds everything queued in the last hour and do something but unless you also set the clock of the server back to the time of the backup, which notably causes problems with Kerberos authentication, it is again a dark path to tread.

Then there are things completely out of your control, sometimes when we record the queries we don’t get the actual content of the query but something else, sp_cursor or sp_prepare.

API Cursors (no not the bad cursors)

Some applications make use of what is called API cursors to execute code, they are typically applications that use older frameworks, but there are still a lot out there today. These cursors aren’t the cursors you should tend to avoid with SQL Server, but API cursors and what you see in the recorded traffic are queries like sp_cursoropen, sp_cursorfetch 180060112,32. What happens is the call to sp_cursoropen includes the query to execute and then sp_cursorfetch executes the actual query passing in the handle that returned from sp_cursoropen and any parameters.

If you try replaying this traffic, it is almost guaranteed to fail in hilariously unpredictable ways.

sp_prepare (unless it it) sp_prepexec

sp_prepare is similar to sp_cursoropen in that you prepare a query to call later, the result of sp_prepare is a handle that can re-use later on. There is one slight variation on this is if the application uses sp_prepexec it creates a handle but also executes the query at the same time so this could potentially work if the application doesn’t go on later to use sp_exec on the handle returned by sp_prepexec.

The dreaded MARS

Of course, you might be lucky and not see any prepared statements or API cursors, but then you might come across something ugly and horrible like the dreaded MARS. MARS is a way to run multiple queries over the same connection, and I honestly am at a complete loss to understand how the hell this was ever included in SQL Server. With MARS you can do this:

Run query (don’t fetch results)
Run another query (do or don’t fetch results)
Fetch results from query 1 or 2 if you didn’t already fetch them

If you have this, then I don’t know of any replay tool that knows how to replicate this scenario.

Other issues

I am sure there are other issues with replaying traffic. I have only ever tried and failed on all but the most specific applications where we could do things like deal with generated id’s. Where I have been successful, it had to be very specific, or we had to test only specific parts of the application which sort of ruins the point of a real-life load test where you want to prove performance before you get to production.

Web Applications

If you have a web application then use something like JMeter, you will hit issues like session cookies being set incorrectly but JMeter can easily handle this, and you can do things like ramp up the number of simultaneous users and do some excellent load testing.

You don’t need to have the database in a known state although you might decide to, you don’t need to worry about id’s returned because the web app will naturally return enough information to JMeter that it can make successful calls later on.

Fat App

If you have an application where users click buttons and type things into boxes, a .net WinForms application or WPF application, then there are two paths you can go down.

The first path is that you drive the UI somehow - look at WinAppDriver https://github.com/Microsoft/WinAppDriver for example. I have had some success with versions of this. It is worth investigating but even if you do make this work you typically end up using one physical (or virtual!) client to replicate one real-world client so scaling this is hard.

The second approach is to write a console app or something that lets you call the critical processes in the application, possibly in a loop that you can control by passing arguments to the app. To scale these, you just run more consoles and more loops. Writing code is, almost always, what I recommend for load testing databases with fat desktop applications.

FIN

To load test the database, forget about the database and concentrate on driving the application somehow - that is where you will realize the most gains.

P.S.

If you are going to do load testing then here are some common things that people do wrong, get it right :)

Different hardware spec to production (use like for like, spend some money if you care about performance. If you don’t care about performance then you don’t need to load test as it doesn’t matter)
Different software versions (especially SQL Server), use “standard” in prod? Use standard in test and not “developer” as you’ll be getting all the enterprise features that won’t be there in “standard” (you don’t have to pay for a license if it is for dev)
Realistic data volumes and patterns - figure out how to have a good data set to start with that is realistic, too many things like statistics cause problems that you might not see
Collapsing the environment, if the prod environment is three tiers on three boxes or in different environments, replicate that correctly - don’t save money by having fewer boxes as you are not going to be replicating network latency and your tests are not accurate.
Concurrency - make sure you replicate the same concurrency as in production or you will not be testing things like SQL locking. These things take time to get right, but if performance is important, then you must do it.
Testing the hardware is different to testing the application, you can use things like sqliosim to get some great stats about the hardware
When you run tests also have montitoring in place, see what happens and then prove what you think happened, actually happened.
When you look at your monitoring have a theory in place, use the monitoring to see if your theory is correct or not.