Matt Velic

Data and Design

Looping Transactions

| 4 Comments

Note: after reading this post, please read the follow up where I correct a misstep by removing depreciated code.

The database that I look after has a fair amount of inherent problems. It’s virtualized, but choked for resources. It runs SQL Server, and it also hosts the IIS for the application. Its underlying storage system isn’t optimized for anything and if I had to guess, because the sysadmin has never told me, everything is being written to only a couple of drives in no particular RAID configuration. Needless to say, the system is a bit of a mess. And there’s nothing I can do to change it because it isn’t a business priority.

Trust me, I’m not bragging and I’m not proud. Unless there’s a Least Optimized System Award, in which case I am proud and bragging. But we work with what we’re given, and while it’s frustrating at times, it can also make for an interesting challenge. One challenge that comes up with regular frequency is bulk inserting, updating and deleting of records.

What happens is poor performance. Even touching a single field across a few thousands of records can send the system crawling. But it all has to do with how transactions work: REDACTED.

What just happened? Did the FBI get a hold of my explanation? No. The issue is that while I’ve done some research on the subject, I’m hesitant to post it as an explanation to this blog. It has partially to do with not wanting to look stupid but more to do with not wanting to pass out incorrect information. When we get to the end I’ll post what I think is happening, and then some real experts will hopefully chime in Yay or Nay.

One way that I’ve found to get around poor performance is by “batching” my transactions into smaller units. Perhaps I’ll need to update a field across 200,000 records. I know that running that simple update will take a long time, affecting performance for anyone requesting access to that data along the way. So instead, I use the following little modification to my script to get those transactions committed more quickly. I’ve included two scripts: my intent and the modification.

––Original
UPDATE dbo.bioInfo
SET dbo.bioInfo.alternate_ID = NULL
WHERE dbo.bioInfo.alternate_ID IS NOT NULL

––Modified
SET ROWCOUNT 500
DECLARE @toGo int
SET @toGo = 1

WHILE @toGo > 0
BEGIN
UPDATE dbo.bioInfo
SET dbo.bioInfo.alternate_ID = NULL
WHERE dbo.bioInfo.alternate_ID IS NOT NULL
SET @toGo = @@ROWCOUNT
END

So what’s happening? In the original UPDATE statement, I’m simply resetting an ID field. We’re working on a new integration project and this needed to be done before we could move forward. In the modified script though, there’s a bunch of additional code. The ROWCOUNT tells SQL Server to only return 500 rows for any query even if more than 500 exist. But this doesn’t have to be 500, it can be any number that works well in your environment. After that I’ve declared a variable, @toGo, for use in my loop. The loop is simple: while @toGo is greater than 0, run this UPDATE statement, set @toGo to the remaining @@ROWCOUNT and check again. As long as there are updates to be made, this will continue to take them in 500 piece chunks until none remain.

I can hear it now, “But Matt, I’ve got databases thousands of times bigger than yours and I’ve never run into an issue with slow transaction throughput.” As I mentioned before, my environment is a bit on the abnormal side and so we find ways of working within its tight strictures. But as proof to my pudding, I’ve run this update two ways, once using the original script and once using the modification. The original ran for nearly three full days before I simply cancelled it – don’t worry, it was on a test box and didn’t affect anyone. The second completed in roughly twelve minutes and made both my manager and our integration consultant fairly happy guys.

And yes, I know that I could make this script better with the use of BEGIN TRAN and COMMIT TRAN. I know I should use them more often than I do. I will get better.

What I Think Happens

As I mentioned before, don’t take this as Gospel. And if I’m wrong, please let me know.

When making a modification to the database, the SQL Server grabs the portion of data that is being modified and doesn’t let go until the modification is complete. In SQL Server speak, it’s locking and latching pages until the transaction is committed. And if you’re trying to make modifications over two records, it’s going to hold onto both until both are committed. And if you’re trying to make modifications across 100,000 records, it’s going to hold onto all 100,000 until all of them have been committed.

So combine this functionality with a poorly configured storage subsystem and you can see how the database can be brought to its knees during bulk data loads and updates. By limiting the size of the batch to a smaller, more manageable number, SQL Server can turn through a large set of modifications more quickly because it isn’t holding information in memory (or worse, needing to writing it to TempDB).

4 Comments

  1. I do something similar, but I actually write explicit loops that iterate over the PK values in the table instead of using SET ROWCOUNT. I had never thought about doing it that way.

    AFAIK, the problems are threefold: one is issuing the table level lock and the second is transaction log (our huge ETL scripts would blow through 200GB of log without a problem) and the third was running out of tempdb space (sorting in tempdb is awesome!).

    I also ended up seeing a tremendous performance improvement when I dropped the batch sizes down on my end of things. Again, this was probably because of tempdb + log + sunspots.

    Good stuff! Thanks for the contribution to the community and the excuse for me to use the word “threefold”.

  2. Matt, the looping idea is a good one, it’s not just your server that chokes on huge deletes and updates. Note that the SET ROWCOUNT feature has been deprecated, so you may want to think about using another method for limiting the rows impacted. Below are a couple of relevant links.

    http://msdn.microsoft.com/en-us/library/ms143729.aspx

    https://connect.microsoft.com/SQLServer/feedback/details/282528/do-not-remove-set-rowcount-for-insert-update-and-delete-only?wa=wsignin1.0

  3. Pingback: Tweets that mention Matt Velic » Blog Archive » Looping Transactions -- Topsy.com

  4. Pingback: Matt Velic » Blog Archive » Looping Transactions Revised

Leave a Reply

Required fields are marked *.

*