I have read so much lately about how bad it is to use a recursive CTE to do… well pretty much anything. It came up in a discussion about creating sequential number, and not in a positive sort of way.
I wrote about recursive CTEs when they were first added to SQL Server, and have been a fan for doing breadth-first searching of a hierarchy/graph, but never even thought to use one to generate a set of numbers. As I kept hearing how bad it this method was, so I figured, let’s see just how terrible it is. And of course, I needed something to compare to, so I decided to try all of the ways I could think of. So I will generate varying numbers of rows with the following methods:
- Recursive CTE – A SQL construct that will basically query itself for multiple iterations.
- Good old fashioned WHILE loop – Just a while loop adding one for each iteration
- A cool mathy type generator using
CROSS JOINoperators – Something I have used before, that generates digits with cross joins of varying mathematical equations. - Using
ROW_NUMBER()on a very large set of rows withCROSS JOIN– Basically, generate a lot of rows that start returning data quickly, then use theROW_NUMBERfunction on the rows. GENERATE_SERIES– The built-in function in SQL Server 2022 and beyond that simply solved the problem in a very nice way indeed.
I will present the code in each section, and then I will execute these functions, providing a listing of the time it took to do 10, 100, 1000, 10000, 100000, and 100000 rows for each method.
Which will be faster? Read on.
The methods
In these sections, I will briefly describe the methods I will be using in my tests.
Recursive CTE
I had never tried this, but honestly, it is a really simple method. The basics are that you start with a starting value (which I will always start with 1 for these tests), and for each recursion, it will add one to the value and add it to a result set.
WITH BaseRows AS ( SELECT 1 AS value UNION ALL SELECT value + 1 FROM BaseRows WHERE value BETWEEN 1 AND 9)SELECT ValueFROM BaseRowsOPTION (MAXRECURSION 0); --this makes the boundary data type based. --so over 2 billion for the int type that --that SELECT 1 as value would use.
All of the outputs for the code explanations will look like this:
-------------12345678910
For the performance tests, I will send the results to a temp table.
Good old-fashioned WHILE loop
This is the first algorithm we all used at one time. Just a loop and a counter. Not ideal with SQL, so we learned to create a table of numbers (also known as a sequence table or tally table).
CREATE TABLE #HoldValues (value int);DECLARE @counter INT = 1WHILE @counter <= 10BEGIN INSERT INTO #holdValues(value) SELECT @counter SET @counter = @counter + 1ENDSELECT * FROM #HoldValues;
### Using ROW_NUMBER() on a very large set of rows
I never personally used this method, but it is something that people do. For this, you find a table that has a good number of rows, typically a system view, like:
SELECT COUNT(*)FROM master.sys.indexes;
Then use:
SELECT COUNT(*)FROM master.sys.indexes CROSS JOIN master.sys.indexes as i2 CROSS JOIN master.sys.indexes as i3;
The output for this is: 10503459, which is good enough for our use, where I want to do at least 1 million values in the sequence. Then just do something like:
SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY i2.object_id) as valueFROM master.sys.indexes CROSS JOIN master.sys.indexes as i2 CROSS JOIN master.sys.indexes as i3;
This is something I saw more of before the advent of GENERATE_SERIES because it is very quick to write and works for even moderately large sets. Most of the time we don’t care about performance too much because we are loading a table with values. But when working ad hoc, sometimes you just don’t want to wait. Luckily, too, because there is no row filtering, it can start spooling results very fast.
This method and the next are really good, but you need to have control over the max number of rows you want to put out, and what the min and max are. They generally expect you start at 1 or zero, and if you want something else, you have to change the value with an offset. That offset won’t change the performance of the code, so not a big deal.
A cool mathy type generator
This code is something that someone smarter than me came up with. I have used it many times, and I understand how to use it, but it is more math than I like to think about 🙂
;WITH digits (I) AS (--set up a set of numbers from 0-9 SELECT I FROM (VALUES (0),(1),(2),(3),(4),(5), (6),(7),(8),(9)) AS digits (I)) --builds a set of data from from 0 to 999999 ,Integers (I) AS ( SELECT D1.I + (10<em>D2.I) + (100</em>D3.I) + (1000<em>D4.I) + (10000</em>D5.I) + (100000<em>D6.I) + (1000000</em>D7.I) FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3 CROSS JOIN digits AS D4 CROSS JOIN digits AS D5 CROSS JOIN digits AS D6 CROSS JOIN digits AS D7 )--insert into tableSELECT I + 1 --starts with 0FROM IntegersWHERE I + 1 <= 10;
The problem with this method is basically that you have to choose a size if you are writing it into code. And at these larger sizes, this can be pretty CPU-intensive. (My fans start cranking the second I run, even this 1-10 query because of it.
GENERATE_SERIES
And finally, the newest of them all, the built-in functionality. Which is fastest? That is coming up next.
SELECT valueFROM GENERATE_SERIES(1,10);
This can’t be the fastest, can it? Generally speaking, the built-in tools aren’t always the best performers… but maybe this time?
The results
Using the code in the Appendix that follows this section, the tests were run on my test machine that you can see listed here as my Test Machine (NUC9i7QNX). I used SQL Server 2025 Enterprise RTM for my tests.
In these tests, here is the thing. None of these methods performed so poorly that loading a table of numbers with them is particularly terrible. All of them produced a set of a million rows in less than 12 seconds. But the absolute best performer turned out to be:
SectionName \ Rows 10 100 1000 10000 100000 1000000 Rows--------------------- ----- ------ ------ ------ ------- ------------Recursive CTE 1 3 12 120 929 9292While Loop 1 2 13 112 1116 11661CROSS JOIN ROWNUMBER 59 63 95 97 397 3602CROSS JOIN Digits 5550 5505 5573 5423 5615 5952GENERATE_SERIES 2 1 2 12 103 1123Times in milliseconds.
GENERATE_SERIES! By quite some distance. Interesting to me was that the CROSS JOIN ROWNUMBER method was the second-best performer. I had expected that one to be much slower, honestly.
In any case, if you have a need to generate a series of integers, use GENERATE_SERIES. It is the fastest available method in SQL Server and is just plain easier. You can tailor your output to be what you need with parameters for start, stop, and a step value that can be int, bigint, or even numeric.
One thing to note, the only method that uses parallelism is the CROSS JOIN Digits method. And in order to create the million rows I want and use the same code, it cranked my machine hard. However, while it was over 5 seconds to generate 10 rows, it was only 5 seconds to generate a million rows. If I weren’t afraid my computer was going to melt into a puddle of silicone, I would see just how far I could go with it.
Appendix – The test code
This should work as is on any machine. It is set to create a table in your database that will hold the output and the times. It is also set to reset when you do 10 rows, so don’t forget that and lose your samples.. not that I did that or anything.
--Note: code isn't well formatted. SET NOCOUNT ON;IF OBJECT_ID('dbo.HoldTimes') is nullBEGINCREATE TABLE dbo.HoldTimes (sectionName varchar(30),starttime datetime2,endtime datetime2,sampleSize int,DurationMilliseconds as (DATEDIFF(millisecond,starttime, endtime)));END;DROP TABLE IF EXISTS #holdValues;CREATE TABLE #holdValues (value INT NOT NULL PRIMARY KEY (value));DECLARE @maxValue INT = 100000 --0000;IF @maxValue = 10TRUNCATE TABLE dbo.HoldTimes<hr />PRINT 'Recursive CTE';TRUNCATE TABLE #holdValues;INSERT INTO dbo.HoldTimes (SectionName,StartTime,SampleSize)VALUES ('Recursive CTE',SYSDATETIME(),@maxValue);WITH BaseRows AS (SELECT 1 AS valueUNION ALLSELECT value + 1FROM BaseRowsWHERE value BETWEEN 1 AND (@MaxValue - 1))INSERT INTO #holdValues (value)SELECT ValueFROM BaseRowsOPTION (MAXRECURSION 0);UPDATE dbo.HoldTimesSET EndTime = sysdatetime()WHERE sectionName = 'Recursive CTE'AND sampleSize = @maxValueIF (SELECT COUNT(*) from #holdValues) <> @MaxValueTHROW 50000,'ERROR Occurred',1;<hr />PRINT 'While Loop';TRUNCATE TABLE #holdValues;INSERT INTO dbo.HoldTimes (SectionName,StartTime,SampleSize)VALUES ('While Loop',SYSDATETIME(),@maxValue);DECLARE @counter INT = 1WHILE @counter <= @maxvalueBEGININSERT INTO #holdValues(value)SELECT @counter<pre><code>SET @counter = @counter + 1</code></pre>ENDUPDATE dbo.HoldTimesSET EndTime = sysdatetime()WHERE sectionName = 'While Loop'and sampleSize = @maxValueIF (SELECT COUNT(*) from #holdValues) <> @MaxValueTHROW 50000,'ERROR Occurred',1;<hr />PRINT 'CROSS JOIN ROWNUMBER';TRUNCATE TABLE #holdValuesINSERT INTO dbo.HoldTimes (SectionName,StartTime,SampleSize)VALUES ('CROSS JOIN ROWNUMBER',SYSDATETIME(),@maxValue);INSERT INTO #holdValues(value)SELECT TOP (@MaxValue) ROW_NUMBER() OVER (ORDER BY i2.object_id) as valueFROM master.sys.indexesCROSS JOIN master.sys.indexes as i2CROSS JOIN master.sys.indexes as i3;UPDATE dbo.HoldTimesSET EndTime = sysdatetime()WHERE sectionName = 'CROSS JOIN ROWNUMBER'and sampleSize = @maxValueIF (SELECT COUNT(*) from #holdValues) <> @MaxValueTHROW 50000,'ERROR Occurred',1;<hr />PRINT 'CROSS JOIN Digits';TRUNCATE TABLE #holdValuesINSERT INTO dbo.HoldTimes (SectionName,StartTime,SampleSize)VALUES ('CROSS JOIN Digits',SYSDATETIME(),@maxValue);;WITH digits (I) AS (--set up a set of numbers from 0-9SELECT IFROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I))--builds a set of data from from 0 to 999999,Integers (I) AS (SELECT D1.I + (10<em>D2.I) + (100</em>D3.I) +(1000<em>D4.I) + (10000</em>D5.I) + (100000<em>D6.I) +(1000000</em>D7.I) + (10000000*D8.I)FROM digits AS D1 CROSS JOIN digits AS D2CROSS JOIN digits AS D3CROSS JOIN digits AS D4CROSS JOIN digits AS D5CROSS JOIN digits AS D6CROSS JOIN digits AS D7CROSS JOIN digits AS D8 )--insert into tableINSERT INTO #holdValues (value)SELECT I + 1FROM IntegersWHERE I <= @MaxValue - 1;UPDATE dbo.HoldTimesSET EndTime = SYSDATETIME()WHERE sectionName = 'CROSS JOIN Digits'AND sampleSize = @maxValueIF (SELECT COUNT(*) FROM #holdValues) <> @MaxValueTHROW 50000,'ERROR Occurred',1;<hr />PRINT 'GENERATE_SERIES'TRUNCATE TABLE #holdValues;INSERT INTO dbo.HoldTimes (SectionName,StartTime,SampleSize)VALUES ('GENERATE_SERIES',SYSDATETIME(),@maxValue);INSERT INTO #holdValues (value)SELECT valueFROM GENERATE_SERIES(1,@MaxValue)UPDATE dbo.HoldTimesSET EndTime = SYSDATETIME()WHERE sectionName = 'GENERATE_SERIES'AND sampleSize = @maxValue;IF (SELECT COUNT(*) FROM #holdValues) <> @MaxValueTHROW 50000,'ERROR Occurred',1;<hr />PRINT 'That'' All'SELECT SectionName,MAX(CASE WHEN SampleSize = 10 THEN DurationMilliseconds ELSE -1 END) AS [10 Rows],MAX(CASE WHEN SampleSize = 100 THEN DurationMilliseconds ELSE -1 END) AS [100 Rows],MAX(CASE WHEN SampleSize = 1000 THEN DurationMilliseconds ELSE -1 END) AS [1000 Rows],MAX(CASE WHEN SampleSize = 10000 THEN DurationMilliseconds ELSE -1 END) AS [10000 Rows],MAX(CASE WHEN SampleSize = 100000 THEN DurationMilliseconds ELSE -1 END) AS [100000 Rows],MAX(CASE WHEN SampleSize = 1000000 THEN DurationMilliseconds ELSE -1 END) AS [1000000 Rows]FROM dbo.HoldTimesGROUP BY SectionName



Leave a Reply