I was trying to test out the RegEx capabilities in SQL Server 2025. I wanted to generate a really random, flexibly large set of data pretty quickly so I could just try to see if indexes are used ever with RegEx. So I asked ChatGPT, and then the fun kind of started.
So, like any good writer with plenty of other things to do, I decided to document my trials.
Try Number 1
So I went to ChatGPT to generate some data for me. I asked it:
“create a set of random string data with 1000 rows for a sql server table named RegExTest. It should insert data into a column named Value”
The output was:
-- Drop the table if it exists
IF OBJECT_ID('dbo.RegExTest', 'U') IS NOT NULL
DROP TABLE dbo.RegExTest;
-- Create the table
CREATE TABLE dbo.RegExTest (
ID INT IDENTITY(1,1) PRIMARY KEY,
Value NVARCHAR(100)
);
-- Helper function to generate random strings
-- Creates a function to return a random alphanumeric string
CREATE OR ALTER FUNCTION dbo.GetRandomString (@Length INT)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Chars NVARCHAR(100) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
DECLARE @Result NVARCHAR(MAX) = ''
DECLARE @i INT = 0
DECLARE @Index INT
WHILE @i < @Length
BEGIN
SET @Index = CAST(RAND(CHECKSUM(NEWID())) * LEN(@Chars) + 1 AS INT)
SET @Result = @Result + SUBSTRING(@Chars, @Index, 1)
SET @i += 1
END
RETURN @Result
END;
GO
-- Insert 1000 rows with random strings
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
INSERT INTO dbo.RegExTest (Value)
VALUES (dbo.GetRandomString(10 + ABS(CHECKSUM(NEWID())) % 41)); -- strings between 10 and 50 chars
SET @i += 1;
END;
But this was really not right at all. The function is the big issue, because while it looks sound, there are two big issues. You can’t used NEWID() in a function, and if you remove that, you also find you cannot use RAND() in a function.
Try Number 2
So I go back to ChatGPT and say
“this doesn’t work, you can’t use newid or rand in a function in sql server”
To which it replies:
“You’re absolutely right — in SQL Server, you cannot use non-deterministic functions like NEWID() or RAND() inside a user-defined function (UDF). Let’s fix that using a T-SQL loop with inline logic instead of a scalar function.
Here’s a fully working script that generates 1000 rows of random alphanumeric strings between 10 and 50 characters, without using NEWID() or RAND() inside a function:”
And the code now does an okay job of creating me a set of data with some random strings. I might tweak it when I come back to it, but mission accomplished, a table with 1000 rows of random data. But it is the whole “You’re absolutely right” thing that just feels like a bit of a backhanded comment, especially when it says “You are right”.
This was the next output of the data insert where it removed the function:
-- Declare variables
DECLARE @i INT = 1;
DECLARE @chars NVARCHAR(62) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
DECLARE @len INT;
DECLARE @str NVARCHAR(100);
DECLARE @pos INT;
-- Loop to insert 1000 rows
WHILE @i <= 1000
BEGIN
SET @len = 10 + ABS(CHECKSUM(NEWID())) % 41; -- Length between 10 and 50
SET @str = '';
WHILE LEN(@str) < @len
BEGIN
SET @pos = 1 + ABS(CHECKSUM(NEWID())) % LEN(@chars);
SET @str = @str + SUBSTRING(@chars, @pos, 1);
END
INSERT INTO dbo.RegExTest (Value)
VALUES (@str);
SET @i += 1;
END;
The point
In this case, all I really wanted was some data to play with, but what if I wanted real code. Syntax errors, using code in the wrong place are easy enough problems to catch/fix, but what about the performance of the code? If the output from a simple task like this can’t get the basics right (despite it apparently knowing you can’t use the offending functions in a user-defined function when prompted), How well will it perform when you need it to write more complex cases for you?
One final point
I realized that I probably should have asked it to just remove NEWID(), so I started a new chat and asked the same initial question:
“create a set of random string data with 1000 rows for a sql server table named RegExTest. It should insert data into a column named Value”
This time, it gave me a completely different answer (and started out telling me that I was right about something I did not ask in this chat, so I couldn’t see if it would have kept the RAND() in the output::
“You’re right that you can’t use NEWID() or RAND() in a function for inserting random data directly into a column via a UDF. Instead, for generating and inserting 1000 rows of random string data into a SQL Server table named RegExTest, you can use a T-SQL script with a numbers table and NEWID(), which is perfect for generating pseudo-random strings.
Here’s a full working script to do that:”
-- Insert 1000 rows with random string data
WITH Numbers AS (
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT INTO dbo.RegExTest (Value)
SELECT LEFT(CONVERT(VARCHAR(36), NEWID()), ABS(CHECKSUM(NEWID())) % 30 + 5)
FROM Numbers;
Completely different method. But it worked the first time. I probably favor the first method as I likely want to tweak the algorithm/data being generated for a lot of test cases, but this works.




Leave a comment