If you were to ask someone why they’re using the NOLOCK hint in their queries what is the usual response?
I know I’ve heard it justified a few different ways saying it’s a Turbo button or SQL Rocket Fuel, but it usually comes down to the fact that the NOLOCK hint will make SQL queries go faster. Now, I’ll give people that and I’m not writing this to fight it because it is true; if you use the NOLOCK hint your queries will more than likely run faster
However, in the same manner, I’m fairly sure that I could probably make my car go faster by tearing off the doors, ripping out the seats, and getting rid of all possible, what I perceive as, excess weight. I might succeed in making my car faster from 0 – 60 but I’m not going to jump in that car to drive to work in the morning.
But why wouldn’t I drive my modified car to work each morning? Well, I’m no car guy by any means but even I know that by getting rid of those objects I perceived as ‘excess’ weight I’ve also now gotten rid of some safety features that I’ve become accustomed too. There will now be side effects because of those modifications when driving, like potentially exiting the car unexpectedly on a hard left turn, that I need to think about before jumping behind the wheel.
Well we have to give similar thought to the use of the NOLOCK hint because we’re ultimately telling SQL by using it that we don’t care about data consistency and it can throw away any safety features it sees fit. We only care about speed. And with the throwing away of these safety features we now have some possible nasty side effects that can occur with the data being returned.
You’re probably thinking: “Yup, I know what you’re going to say: Dirty Reads and Phantom Reads. I’ve heard and read all about it before but for my queries I’m not impacted.” You might be running a query that only cares about the state of the data 5 minutes ago. Dirty or Phantoms reads have no impact because you don’t care about the present, you’re perfectly safe to use the NOLOCK hint.
However, are even these queries still completely safe? Well one possible side effect of the NOLOCK hint that isn’t very well known is that it can make a query return duplicate records or miss records entirely. And this has nothing to do with locks, no modification needs to be happening to the records for this to occur.
Now you might be thinking: “Ah, that won’t really happen will it? That side effect is just DBA folklore that’s told to scare developers. That can’t even happen!”
Well, I’ve pasted three scripts below where we will actually make this happen.
The result is kind of sporadic time wise but it will happen at some point after running. All that needs to be done is running ‘Script 1 Setup’ first to get things ready. Then open another connection and run ‘Script 2’ Query which will just run a query in a loop breaking if it returns unexpected results. Finally opening a third connection to run ‘Script 3 Workload’ will generate a workload that causes a lot of index fragmentation/Page Splits and will ultimately cause Script 2 to break.
What happened here? How did we miss this data?
This couldn’t have had anything to do with bypassing locking as I made sure the data we reported on wasn’t being modified.
Well in my second post in this series I’ll take a very high level look into some SQL internals and the impact the NOLOCK hint has on them to figure out what happened.
Script 1 Setup
<br />-- Create a database to use USE [master]; GO IF DATABASEPROPERTYEX (N'NOLOCK_Test', N'Version') > 0 BEGIN ALTER DATABASE [NOLOCK_Test] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [NOLOCK_Test]; END GO -- Create the database to use CREATE DATABASE [NOLOCK_Test] ON PRIMARY ( NAME = N'NOLOCK_Test', FILENAME = N'D:\Data\NOLOCK_Test.mdf') LOG ON ( NAME = N'NOLOCK_Test_log', FILENAME = N'D:\Log\NOLOCK_Test_log.ldf', SIZE = 5MB, FILEGROWTH = 1MB); GO USE [NOLOCK_Test] -- Creating a table to hold our data -- Using the UNIQUEIDENTIIFIER to make sure we get some good Index fragmentation CREATE TABLE dbo.CallDetails ( CALLID UNIQUEIDENTIFIER PRIMARY KEY DEFAULT ( NEWID() ), AGENTID INT, CALLTIME INT, CALLDETAILS VARCHAR(1000) ) GO -- Inserting data to create some fragmentation INSERT INTO dbo.CallDetails (AGENTID, CALLTIME, CALLDETAILS) SELECT AGENTID, CALLTIME, CALLDETAILS FROM ( SELECT CAST(RAND() * 1000 AS INT) AS AGENTID, 10 AS CALLTIME, REPLICATE('CallGoesOn', 100) AS CALLDETAILS ) AS DETAILS WHERE AGENTID <> 123 GO 100 -- Checking data SELECT * FROM dbo.CallDetails -- Creating the Agent we'll run our report against. -- We want to make sure they have 500 minutes of CALLTIME INSERT INTO dbo.CallDetails ( AGENTID, CALLTIME, CALLDETAILS ) VALUES (123,25, REPLICATE('CallGoesOn', 100)) GO 20 -- Checking data again, no tricks up our sleeve SELECT * FROM dbo.CallDetails -- Checking our Agent has 500 minutes of CALLTIME for our billing report SELECT AGENTID, SUM(CALLTIME) AS TotalBillableTime FROM dbo.CallDetails WHERE AGENTID = 123 GROUP BY AGENTID
Script 2 Query
<br />DECLARE @BillableTime INT USE [NOLOCK_Test] WHILE 1 = 1 BEGIN SET @BillableTime = ( SELECT SUM(CALLTIME) FROM dbo.CallDetails WITH(NOLOCK) WHERE AGENTID = 123) PRINT 'Total Billable Time is: ' + CAST(@BillableTime AS NVARCHAR(10)) IF @BillableTime <> 500 BEGIN SELECT @BillableTime BREAK END END -- After this breaks run the query below to see the actual 500 minute value /* SELECT AGENTID, SUM(CALLTIME) AS TotalBillableTime FROM dbo.CallDetails WHERE AGENTID = 123 GROUP BY AGENTID */
Script 3 Workload
<br />USE [NOLOCK_Test] -- We're now going to randomly generate data for other agents -- to cause more index fragmentation WHILE 1 = 1 BEGIN INSERT INTO dbo.CallDetails (AGENTID, CALLTIME, CALLDETAILS) SELECT AGENTID, CALLTIME, CALLDETAILS FROM ( SELECT CAST(RAND() * 1000 AS INT) AS AGENTID, 10 AS CALLTIME, REPLICATE('CallGoesOn', 100) AS CALLDETAILS ) AS DETAILS WHERE AGENTID <> 123 END