Friday, May 21, 2010

The Truth About Cursors: Part 2

In Part 1 of this series, we looked under the hood at how various types of cursors worked and we found that their performance (if DECLAREd correctly) is not as horrible as many seem to believe.

In this article, Part 2, we will look at some more performance statistics, discovering some interesting things along the way. And we’ll end with a little mystery that will hopefully whet your appetite for Part 3.

Processing in Clustered Index Key Order

As you recall, in Part 1, we created a million-row test table with a clustered index primary key on the ID column:

if object_id('TestTable','U') is not null drop table TestTable
go
create
table TestTable
(
ID int identity(1,1) primary key
,Column1 varchar(50)
,Column2 varchar(50)
)
go

with
L0
(c) as (select 0 from (values (0),(0),(0)) x(c)) /* 3 Rows */
,L1(c) as (select 0 from L0 a,L0 b,L0 c) /* 27 Rows=3x3x3 */
,L2(c) as (select 0 from L1 a,L1 b,L1 c) /* 19683 Rows=27x27x27 */
,L3(c) as (select 0 from L2 a,L2 b) /* 387,420,489 Rows=19683x19683 */
,NN(n) as (select row_number() over (order by (select 0)) from L3)
insert TestTable (Column1,Column2)
select newid(),newid()
from NN
where n<=1000000
And we ran some tests in processing this table one row at a time, in clustered index key (ID) order, via various cursors and via various non-cursor-based T-SQL Code approaches. And these were the results of those tests (running in SQL2008):

/*
Description of Method CPU Reads Writes Duration
---------------------------------------------------------------------------
CURSOR Approaches (LOCAL FORWARD_ONLY):
STATIC READ_ONLY 23906ms 6,061,368 11,817 25068ms <--Winner
KEYSET READ_ONLY 36578ms 8,787,165 2,110 38158ms
DYNAMIC READ_ONLY 24270ms 1,011,368 0 29539ms
FAST_FORWARD READ_ONLY 24328ms 1,011,368 0 29632ms
Non-CURSOR Approaches:
Emulating STATIC CURSOR 23849ms 3,031,241 12,376 33659ms
Emulating KEYSET CURSOR 34297ms 6,020,084 2,608 39571ms
TOP 1 [Kind of emulates DYNAMIC] 20286ms 3,011,368 0 27137ms
*/
(Note: All of my tests were recorded in SQL2008 with a Profiler server-side trace with Query Results discarded. In addition, before each test, the procedure cache was cleared (DBCC FREEPROCCACHE) and the data buffer pool was cleared (DBCC DROPCLEANBUFFERS) so that they were all run against a completely cold cache.)

The STATIC cursor was the winner among all the tests in terms of Duration. If you strictly based the results on minimum Reads and Writes, then the DYNAMIC and FAST_FORWARD cursors were the winners. But if you used CPU time as your only measure, then the non-cursor approach of doing a TOP 1 was the winner.

Let’s review the code for that TOP 1 approach, which is our attempt to emulate a DYNAMIC cursor by retrieving one row at a time directly from the table itself:

declare @ID int
,@Column1 varchar(50)
,@Column2 varchar(50)
declare @CurrID int
select
@CurrID = -2147483648 /* Initialize to lowest possible ID */
while 1=1
begin
select top 1 @ID=ID
,@Column1=Column1
,@Column2=Column2
from TestTable
where ID>@CurrID
order by ID
if @@rowcount=0 break
/* We would do something with the data here */
select @CurrID=@ID /* Save Current ID */
end
The reason this worked so well is because we were accessing the data in clustered index key order.

Processing in Non-Clustered Index Key Order

Now what if we created a non-clustered index on Column1 in our table…

create nonclustered index IX_TestTable_Column1 on TestTable(Column1)
…and we wanted to access the data in order of Column1 and ID?

Well, now our TOP 1 approach is not so easy to implement. Even though we populated Column1 with NEWID(), which is supposed to generate unique values, we still have to account for the fact that Column1 may, in fact, have potential duplicates. In fact, let’s just go ahead and introduce a few duplicates:

update TestTable
set Column1=replicate('0',36)
where ID%100000=0
So now, out of our million rows in the table, we now know there are 10 rows that have a common Column1 value.

How would we attempt to emulate a DYNAMIC cursor with T-SQL code now?

Well, first of all, we would have to have 2 variables to house our current value of Column1 and current value of ID. And we could just change the WHERE clause of our TOP 1 query to the following:

declare @ID int
,@Column1 varchar(50)
,@Column2 varchar(50)
declare @CurrColumn1 varchar(50)
,@CurrID int
select
@CurrColumn1='' /* Initialize to lowest possible Column1 value */
,@CurrID=-2147483648 /* Initialize to lowest possible ID */
while 1=1
begin
select top 1 @ID=ID
,@Column1=Column1
,@Column2=Column2
from TestTable
where Column1>@CurrColumn1
or Column1=@CurrColumn1 and ID>@CurrID
order by Column1,ID
if @@rowcount=0 break
/* We would do something with the data here */
select @CurrColumn1=@Column1
,@CurrID=@ID
end
We have to account for the fact that the next row we need to access could potentially have the same Column1 value as our most recent @CurrColumn1 value, so we are forced to have a WHERE clause with an OR predicate. And, unfortunately, that brings about a SCAN of the nonclustered index, because OR predicates are not SARGable, as you can see in the Estimated Execution Plan:

TOP 1 Query with OR in Predicate

This will do a SCAN of our nonclustered index, looking for the first row it encounters (because of the TOP 1) that satisfies the WHERE clause. Then, when it gets that row, it will do a Key Lookup into the clustered index to get the Column2 value.

Remember, the above plan reflects what we will be doing a million times. The SCAN will be quick for the first several hundred iterations (because the next row will always be toward the beginning of the index), but when we are processing the 900,000th iteration, for example, the SCAN will have to trudge through 900,000 rows just to find our next row that satisfies the query. This will take a loooonnnnggg time.

So we have to use a different approach. We will have to do it in two steps. First, we’ll look for the TOP 1 row that has the same Column1 value as our last one, but with a higher ID. If we fail to find a row that satisfies that, then we’ll look for the TOP 1 row that has a greater Column1 value as our last one. Here’s what that looks like:

declare @ID int
,@Column1 varchar(50)
,@Column2 varchar(50)
declare @CurrColumn1 varchar(50)
,@CurrID int
select
@CurrColumn1='' /* Initialize to lowest possible Column1 value */
,@CurrID=-2147483648 /* Initialize to lowest possible ID */
while 1=1
begin
select top 1 @ID=ID
,@Column1=Column1
,@Column2=Column2
from TestTable
where Column1=@CurrColumn1 and ID>@CurrID
order by Column1,ID
if @@rowcount=0
begin
select top 1 @ID=ID
,@Column1=Column1
,@Column2=Column2
from TestTable
where Column1>@CurrColumn1
order by Column1,ID
end
if @@rowcount=0 break
/* We would do something with the data here */
select @CurrColumn1=@Column1
,@CurrID=@ID
end
Both of those TOP 1 queries will bring about SEEKs rather than SCANs. It’s just unfortunate that we most likely have to do two SEEKs for all million rows in our loop. But hey, that’s life.

So let’s try that new TOP 1 code to process our million-row table in (Column1,ID) order. And let’s try our various other methods in (Column1,ID) order as well. The code to use the various types of cursors (STATIC, KEYSET, DYNAMIC, and FAST_FORWARD) and our T-SQL code that emulates KEYSET and STATIC cursors are exactly the same as in Part 1, except for changing the ORDER BY clause.

Here are the figures:

/*
Description of Method CPU Reads Writes Duration
---------------------------------------------------------------------------
CURSOR Approaches (LOCAL FORWARD_ONLY):
STATIC READ_ONLY 29437ms 6,069,727 11,817 33469ms <--Winner
KEYSET READ_ONLY 37406ms 8,782,014 2,110 63812ms
DYNAMIC READ_ONLY 37651ms 7,006,216 0 68462ms
FAST_FORWARD READ_ONLY 37734ms 7,006,216 0 69064ms
Non-CURSOR Approaches:
Emulating KEYSET CURSOR 35125ms 6,014,954 2,609 64764ms
Emulating STATIC CURSOR 28437ms 3,039,600 12,377 39221ms
TOP 1 [Kind of emulates DYNAMIC] 64453ms 9,223,658 0 90114ms
*/
Wow! Even though our TOP 1 approach was the CPU-based winner when we processed the table in clustered index key order, it’s the biggest loser (by far) when we process the table in a nonclustered key order. And it’s the biggest loser in terms of Reads and Duration as well. It’s because of the double-SEEK and the Key Lookup that we had to do a million times. Of course, you could INCLUDE Column2 in the nonclustered index to eliminate the Key Lookup, but the other approaches worked just fine without us having to change the index.

Once again, the STATIC cursor approach is the Duration winner. The CPU winner this time is the T-SQL code that we wrote to emulate a STATIC cursor. It also has the lowest number of Reads. Hmmmmm…

One important side note: You may recall from Part 1 of this series that the FAST_FORWARD cursor is not a special type of cursor at all, but rather it’s an indicator to the optimizer to choose what it thinks is the most cost-effective cursor type for us. In this case (as was the case with processing our table in clustered key order), the optimizer chose to run it as a DYNAMIC cursor… you can see the similarities in all the measurements between the two.

So processing the table in a nonclustered index key order changed the standings a bit, didn’t it?

Processing in Non-Indexed Order

Now what if we process the table in order of a column that has no index at all… by (Column2,ID)?

Well, that eliminates our TOP 1 approach. We can’t possibly access the table one row at a time unless we have an index to work with somehow. Well, it is possible, but it would mean doing a SCAN of the entire table looking for the next highest value… one million times! It would take forever and a day.

But, all our other approaches will process the table just fine… it’s just a matter of changing the ORDER BY clauses so they process the table in (Column2,ID) order. And here are the results:

/*
Description of Method CPU Reads Writes Duration
---------------------------------------------------------------------------
CURSOR Approaches (LOCAL FORWARD_ONLY):
STATIC READ_ONLY 29609ms 6,069,727 11,817 33063ms <--Winner
KEYSET READ_ONLY 43109ms 8,791,922 2,111 45806ms
DYNAMIC READ_ONLY 42828ms 8,791,924 2,111 45966ms
FAST_FORWARD READ_ONLY 29672ms 6,069,741 11,820 33073ms
Non-CURSOR Approaches:
Emulating STATIC CURSOR 28406ms 3,039,772 12,378 40183ms
Emulating KEYSET CURSOR 39796ms 6,024,860 2,605 45187ms
TOP 1 [Kind of emulates DYNAMIC] N/A N/A N/A N/A
*/
Once again, as always, the STATIC cursor is the Duration winner and the T-SQL Code that emulates the STATIC cursor is the Reads and CPU winner.

But here are a couple of really interesting things…

You may recall that a DYNAMIC cursor, unlike the KEYSET and STATIC cursors, will attempt to read from a table directly as opposed to populating a temp table up front. Well, since we were accessing the table in order of an unindexed column, it was impossible to access the table directly. And so the optimizer changed the DYNAMIC cursor to a KEYSET cursor… you can see the similarities in measurements between the two. You can also see evidence of this if you take the following command that DECLAREs the DYNAMIC CURSOR…

declare c cursor 
local forward_only dynamic read_only
for
select ID,Column1,Column2
from TestTable
order by Column2,ID
…and look at the Estimated Execution Plan for it:

DYNAMIC Cursor Changed to KEYSET Cursor

Another interesting thing to note about our figures this time is that in the case of the FAST_FORWARD cursor, the optimizer chose a STATIC cursor approach for it (as opposed to a DYNAMIC approach as it did in the last two tests). Again, look at the similarities of its figures and the STATIC cursor’s figures.

The Best Approach

So let’s summarize the winners in our various processing orders:

/*
Winner Clustered Index Order NonClustered Index Order Non-Indexed Order
------------------------------------------------------------------------------------
Duration STATIC Cursor STATIC Cursor STATIC Cursor
Reads DYNAMIC Cursor Emulating STATIC Emulating STATIC
CPU TOP 1 [Emulating DYNAMIC] Emulating STATIC Emulating STATIC
*/
The STATIC Cursor was the winner in all processing orders in terms of Duration. But that was on my particular standalone system. If you go strictly by CPU time, the TOP 1 approach was best, but only when processing the table in clustered index key order. Otherwise the code we wrote to emulate a STATIC cursor is the CPU winner, and also the winner in terms of Logical Reads.

But here’s the interesting part that I want you to think about. These tests were all conducted in SQL2008. If I were to conduct the same tests in SQL2005, the results would come out like this instead:

/*
Winner Clustered Index Order NonClustered Index Order Non-Indexed Order
------------------------------------------------------------------------------------
Duration STATIC Cursor STATIC Cursor STATIC Cursor
Reads DYNAMIC Cursor STATIC Cursor STATIC Cursor
CPU TOP 1 [Emulating DYNAMIC] STATIC Cursor STATIC Cursor
*/
Why is this? What changed in SQL2008?

You’ll have to wait for Part 3 to find out, but here’s a clue. Check out the Logical Reads and CPU time in SQL2008 and SQL2005 when processing the table in Clustered Index Key order:

/*
Description CPU Reads
----------------------------------------------
SQL2008:
STATIC Cursor 23906ms 6,061,368
Emulating STATIC Cursor 23849ms 3,031,241 (Why is this half the reads???)
SQL2005:
STATIC Cursor 22378ms 6,107,958
Emulating STATIC Cursor 25402ms 6,128,352 (Reads are on par, but CPU higher)
*/
I’ll let you chew on that until next time.

No comments:

Post a Comment