SQL Cursors are slow… except when they’re not

SQL cursors are bad, evil, wicked things… a scourge on performance and a pox on the skills of any good SQL author.  Any good SQL man worth his salt will scoff indignantly at the mere sight of the CURSOR keyword in any production code.  It’s clear to those experts that a set-based operation will always outperform a crude cursor-based approach.

… except when it doesn’t.

Let’s say I’ve got a table constructed as follows:

create table #test (id int identity(1, 1), value money)

declare @i int = 0
while @i < 10000
begin
 insert into #test values(100)
 set @i = @i + 1
end

… and then, summoning all my years of SQL skill, I compute a running total as follows:

select
 t1.id, sum(t2.value) as Total
from #test t1
inner join #test t2
 on t2.id <= t1.id
group by t1.id
order by t1.id

The magic is that little <= clause creating a sliding ‘window’ and yielding results like this:

id Total
1 100.00
2 200.00
3 300.00
4 400.00
5 500.00

Awesome, that totally works and I am a genius… but being the diligent SQL guru that I am, I take a little gander at the ‘ol Query Plan.

Ye Gods!

Yeah, that plan looks amazin… wait, what?!?

And I just crapped my pants… that innocent query against 10,000 rows just blew up with… wait for it… over 50 MILLION rows!

That’s your little friend known as an Eager Spool. For each row that you need a running total, sql server needs to pull that row and any row before it… over and over. So to get the first value you read one row… for the second value, you read two (for a total of three)… for the third, you read three (for a total of 6), and so on.

SQL Server helps you out here by just exploding the combinatorial results. This is really the sum of sequential integers (1+2+3=6)… a summation calculable by the formula ((N^2)+N)/2. Plug in 10,000 and you get… lo and behold… 50,005,000 rows. Exactly what our eager little buddy is showing.

You can imagine that, as the number of records goes up, your query performance will go dramatically down… and your imagination would be right. This little baby goes pear-shaped real quick.

The only solution is to calculate a result, store it, and then add the next record, store that again, and so on… in other words, don’t look back at every record before you, but rather store an accumulating result.

How can you do that? Well, you can wait for SQL Server 2012 and the new SUM() OVER (ORDER BY) capability which will do this exact optimization. This is an extension to the windowing functions added in SQL 2005. Prior to SQL 2012, however, the ORDER BY clause is only supported in non-aggregating functions.

In the meantime, you can use your old trusted friend the cursor:

set nocount on
create table #totals(id int, totals money)
declare @total money = 0, @id int, @value money
declare tCursor CURSOR STATIC READ_ONLY FORWARD_ONLY
for select id, value from #test
open tCursor
fetch next from tCursor into @id, @value
while @@fetch_status = 0
begin
	set @total = @total + @value
	insert into #totals values(@id, @total)
	fetch next from tCursor into @id, @value
end
close tCursor
deallocate tCursor
select * from #totals

On my development box, the original query took 19 seconds. This CURSOR version is sub-second… BAM! Genius re-established!

ALL HAIL THE CURSOR!