Generating a set of sequential numbers, part 2

When I wrote the blog from last week (Performance test of generating a set of sequential numbers), I was mostly wanting to see how well I could do this using a recursive CTE. I wondered how terribly they might be. Turns out, they are terrible compared to other methods, but they also aren’t nearly as bad as I expected (on my my Test Machine (NUC9i7QNX)), at least.

I thought I was done, nice quick little throwaway piece, but I went a little more in depth than I planned. Then Aaron Bertrand messaged me about a post that I had forgotten (even if I did edit it :)), where he was introducing GENERATE_SERIES (GENERATE_SERIES: My new go-to to build sets). In it, he had included a method of doing this using a method that replicates digits and then uses STRING_SPLIT and ROW_NUMBER to generate more digits. He also noted that it was blistering fast.

Aaron (if you know him) is rarely wrong about SQL (at the very least).

I also realized there was one other thing I wanted to add to my tests, that being just selecting from a Numbers/Tally table that has a billion rows. This should be the fastest way to pull a set of numbers.

It became a full-on post when I made an improvement to a technique that I am actually a bit proud of.

The new/adjusted methods

I am not going cover anything other than these differences, and follow that with a summary that includes the same sort of chart I made last week, with times in milliseconds, for each of these methods and those created last time. The code for this article, along with a database create script that shows the choices I made for my database (though most inserts are into tempdb, in the script, it builds a 1 billion row table named dbo.Numbers.)

Just query from a table

The fastest method (that I am shocked I forgot to include), is just querying from a table with guaranteed sequential values. Not always possible, and if you only need short sequences of vary large/small values, probably not the most optimal, but was the fastest in my test:

CREATE TABLE dbo.Number
(
value INT NOT NULL PRIMARY KEY
);
--trying to get bulk inserts, took five minutes on
--"my machine", but it did insert a billion rows
-- with a clustered index
INSERT INTO dbo.Number WITH (TABLOCKX) (value)
SELECT value
FROM   GENERATE_SERIES(1,1000000000);

In my tests, this just gets turned into a simple query: SELECT value FROM dbo.Number WHERE value < @MaxValue. If you regularly need complex sets, especially where simple math won’t cut it, pre-calculation is never a bad way to go. It will take up space, but if you are paying for compute, this will also be the least compute.

STRING_SPLIT REPLICATE

This one is kind of brilliant, and definitely one of the fastest methods. Totally glad Aaron pointed it out.

SELECT TOP (@maxValue )
     ROW_NUMBER() OVER (ORDER BY @@SPID) AS Value
FROM STRING_SPLIT(REPLICATE(CAST(',' AS VARCHAR(MAX)), @maxValue - 1), ',')
ORDER BY value;

The idea is it replicates a string of commas, then lets STRING_SPLIT split them out into the number of empty strings that there are commas (plus one, hence the @maxValue - 1) that can then be output with ROW_NUMBER.

I made one slight change to Aaron’s code and made the comma an varchar(max) so it could be used for a lot larger size of data.

Enhancing the CROSS JOIN Digits method

There has been this de facto standard way of generating a fast set of numbers that I have used for many years. It is really cool because it uses math to create a large set of numbers very quickly. But it has never been great in terms of being “generic”. So when I was using the following code to try to generate small and large sets, there was a problem. It took five seconds to output 10 rows or 1000000 rows. And when I tried the vaguely insane 100000000 rows (that is 100 million), it wouldn’t work anymore. I needed another cross join to keep going. I only wanted code that could be generic for all cases, so I included it where it could handle up to 10 million rows I was trying to generate.

   ;WITH digits (I) AS (--set up a set of numbers from 0-9
            SELECT I
            FROM   (VALUES (0),(1),(2),(3),(4),(5),
                           (6),(7),(8),(9)) AS digits (I))
    --builds a set of data from from 0 to 999999
    ,Integers (I) AS (
            SELECT D1.I + (10*D2.I) + (100*D3.I) + 
                   (1000*D4.I) + (10000*D5.I) + (100000*D6.I) +
                   (1000000*D7.I) + (10000000*D8.I) 
            FROM digits AS D1 CROSS JOIN digits AS D2 
                 CROSS JOIN digits AS D3
                 CROSS JOIN digits AS D4 
                 CROSS JOIN digits AS D5
                 CROSS JOIN digits AS D6 
                 CROSS JOIN digits AS D7 
                 CROSS JOIN digits AS D8 )
    --insert into table
    INSERT INTO #holdValues WITH (TABLOCKX) (value)
    SELECT I + 1
    FROM   Integers
    WHERE  I <= @MaxValue - 1;

So, it being a snow day where they requested people not be out frivolously, I said, “What could I do to make this faster?” I knew the problem came in with those later cross joins. The typical usage is to choose how large your output would be; you could just choose the number of digits and comment out the others. I also didn’t want to do dynamic SQL, because that seemed messy.

I started thinking about how to remove those cross joins without dynamic SQL? You can see in the next listing that what I did was to eliminate them by making them derived tables that either return 10 rows with 0-9, or 1 row with 0. The @MaxValue variable does have to be put into all of those derived tables, but you need it to filter the output anyhow. So the output will always have (in my case) 10 columns, but any that aren’t needed are turned into 0.

;WITH digits (I) AS (--set up a set of numbers from 0-9
        SELECT I
        FROM   (VALUES (0),(1),(2),(3),(4),(5),
                       (6),(7),(8),(9)) AS digits (I))
--builds a set of data from from 0 to 999999
,Integers  AS (
     SELECT D1.I + 
            (10*D2.I) +
            (100*D3.I) + 
            (1000*D4.I) + 
            (10000*D5.I) + 
            (100000*D6.I) +
            (1000000*D7.I) + 
            (10000000*D8.I) + 
            (100000000*D9.I) +
            (1000000000*D10.I) AS I
    --the >= or i=0 effectively removes a cross join from the set, making small 
    --numbers (the normal) much faster, while allowing a lot higher digit count.
      FROM digits AS D1 
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue > 10 OR i = 0) AS D2 
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 100 OR i = 0) AS D3 
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 1000 OR i = 0) AS D4 
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 10000 OR i = 0) AS D5 
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 100000 OR i = 0) AS D6 
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 1000000 OR i = 0) AS D7 
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 10000000 OR i = 0) AS D8
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 100000000 OR i = 0) AS D9
         CROSS JOIN (SELECT * FROM digits 
                       WHERE @MaxValue >= 1000000000 OR i = 0) AS D10
                 )
   --insert into table
 SELECT I + 1
 FROM   Integers
 WHERE  I < @MaxValue ;

Note that I also went ahead and added a few more digit cross joins to the query, and now it could technically generate more 9.9 billion rows. (I did not try that as 100 million was my breaking point!). (I realize now, this technique probably should have been a blog of its own!)

Have I tested it for every single case? No. I did check for various numbers that weren’t 1 with zeros following, and I did add checks to every one of the cases to make sure that the set requested was the set I got. This checks to make sure the values are between 1 and @MaxValue, and that there are @MaxValue rows in the table:

IF (SELECT COUNT(*) from #holdValues) <> @MaxValue
OR EXISTS(SELECT * FROM #holdValues where value > @MaxValue)

Summary

So I ran (and reran and reran) these tests again using the script you can find here on my new GitHub repo, and there were a few things I noticed, which I will discuss in the following sections.

Generating up to 100000 rows

First, at these low levels, the output varied by milliseconds, likely showing some activity on my computer. This is not a perfect test bed, and frankly, in a test like this, a few milliseconds is not a big deal.

Second, up to 100000 rows, only one method really stank. And that was trying to make a generic CROSS JOIN Digits work. At just over 5 seconds each time, it was not awesome. The others were all from great to ok.

Sure 55 milliseconds to query direct from a table is way faster than 939 for the Recursive CTE (which was the worst at that point), but if you need 100000 numbers, are you going to notice? And you can do fancier things with a CTE if you are interested (since you don’t need to just + 1 every time)

Times stated in milliseconds.

SectionName               10 Rows   100 Rows    1000 Rows 10000 Rows  100000 Rows
------------------------- --------- ----------- --------- ----------- -----------
Recursive CTE             3         5           12        100         939
While Loop                3         6           55        42          267
CROSS JOIN ROWNUMBER      169       70          62        91          456
GENERATE_SERIES           3         2           4         15          239
CROSS JOIN Digits         5525      5711        5628      5504        5498

Modified CROSS JOIN Digit 34        38          37        46          137
STRING_SPLIT REPLICATE    4         5           6         30          290
Numbers Table             4         5           5         11          55

The cool thing is that the modified CROSS JOIN Digits method was, at this point, the fastest to get to 100000. Of course, for 10000, all but that original CROSS JOIN ROWNUMBER method were < .1 seconds. None too shabby.

To 100000000, and not beyond!

Here is where things get a lot more interesting. If you need to generate a lot of sequential numbers fast, what is the fastest way to do it? Well, recursion isn’t the way. Though let’s be clear, generating 10 million rows in 92 seconds on a computer with mobile processors from like 5+ years ago… not bad.

What is slightly interesting is that the fastest method (other than the numbers table) was the original CROSS JOIN Digits method. This stands to reason because at this point, we are in the zone where you need all the rows to be generated. And my modified method has to pay the price for the derived tables and filtering.

SectionName                    1000000 Rows 10000000 Rows 100000000 Rows
=----------------------------- ------------ ------------- --------------
Recursive CTE                  9250         92267         950370
While Loop                     13629        135228        -1
CROSS JOIN ROWNUMBER           3630         36005         -1
GENERATE_SERIES                1163         12137         125171
CROSS JOIN Digits              5936         9812          -1

Modified CROSS JOIN Digits     1103         10706         111282
STRING_SPLIT REPLICATE         6384         66191         680313
Numbers Table                  494          4551          31878

The -1 values are places where I couldn’t realistically finish the calculation.

I tried the While Loop method a few times at 100 million, and it kept crashing. I crashed my SQL Server one time, and a few times just the connection. CROSS JOIN ROWNUMBER failed because while this:

SELECT TOP (@maxValue) ROW_NUMBER() OVER (ORDER BY i2.object_id) as value
FROM   master.sys.indexes
        CROSS JOIN  master.sys.indexes as i2
        CROSS JOIN  master.sys.indexes as i3;

was fast, this following bit:

SELECT TOP (@maxValue) ROW_NUMBER() OVER (ORDER BY i2.object_id) as value
FROM   master.sys.indexes
        CROSS JOIN  master.sys.indexes as i2
        CROSS JOIN  master.sys.indexes as i3
        CROSS JOIN  master.sys.indexes as i4;

took FOREVER to process even a few rows. I love doing this, but at a certain point, you just have to give in.

Conclusion

Use GENERATE_SERIES in almost every case possible when you have it available. If you need it significantly faster or you need specific sequences, create a table. You might load that table with GENERATE_SERIES (as I did), Or you might want to use a recursive CTE to load it if you need some weird (to the rest of the world) series like:

WITH BaseRows AS (
SELECT 1 AS value
UNION ALL
SELECT value + value + 1 --not a standard need
FROM   BaseRows
WHERE  value + value + 1 BETWEEN 1 AND 10000
)
SELECT Value
FROM   BaseRows
OPTION (MAXRECURSION 0);

This output is:

Value
----------
1
3
7
15
31
63
127
255
511
1023
2047
4095
8191

But not for anything that GENERATE_SERIES or one of these other methods can provide.

I assume I am done with this for now, but who knows? Maybe someone else will have another method that I can try and alter for fun and education?

2 responses to “Generating a set of sequential numbers, part 2”

  1. Brent Ozar Avatar

    One very big catch: this conclusion is true if you need the sequential numbers in ASCENDING order, but if you need them in DESCENDING order, all hell breaks loose: https://www.brentozar.com/archive/2025/11/query-exercise-answer-generating-big-tempdb-spills/

    Liked by 1 person

    1. Louis Davidson Avatar

      Yeah, I was only covering the concept of generating the numbers. Not if you need to sort them. Sorting all the values from a bigint is equal parts awesome and terrifying :). Someday I might try to extend this to generating negative and positive values, but never going to try to sort that on my little test machine (and probably not paying for the compute to try it on a real powerhouse on Azure either!

      Thank you!

      Like

Leave a reply to Louis Davidson Cancel reply

I’m Louis

I have been at this database thing for a long long time, with no plans to stop.

This is my blog site, companion to drsql.org

Recents