I was trying to test out the RegEx capabilities in SQL Server 2025. I wanted to generate a really random, flexibly large set of data pretty quickly so I could just try to see if indexes are used ever with RegEx. So I asked ChatGPT, and then the fun kind of started.

So, like any good writer with plenty of other things to do, I decided to document my trials.

Try Number 1

So I went to ChatGPT to generate some data for me. I asked it:

“create a set of random string data with 1000 rows for a sql server table named RegExTest. It should insert data into a column named Value”

The output was:


-- Drop the table if it exists
IF OBJECT_ID('dbo.RegExTest', 'U') IS NOT NULL
DROP TABLE dbo.RegExTest;

-- Create the table
CREATE TABLE dbo.RegExTest (
ID INT IDENTITY(1,1) PRIMARY KEY,
Value NVARCHAR(100)
);

-- Helper function to generate random strings
-- Creates a function to return a random alphanumeric string
CREATE OR ALTER FUNCTION dbo.GetRandomString (@Length INT)
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @Chars NVARCHAR(100) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
DECLARE @Result NVARCHAR(MAX) = ''
DECLARE @i INT = 0
DECLARE @Index INT
WHILE @i < @Length
BEGIN
    SET @Index = CAST(RAND(CHECKSUM(NEWID())) * LEN(@Chars) + 1 AS INT)
    SET @Result = @Result + SUBSTRING(@Chars, @Index, 1)
    SET @i += 1
END

RETURN @Result
END;
GO

-- Insert 1000 rows with random strings
DECLARE @i INT = 1;
WHILE @i <= 1000
BEGIN
INSERT INTO dbo.RegExTest (Value)
VALUES (dbo.GetRandomString(10 + ABS(CHECKSUM(NEWID())) % 41)); -- strings between 10 and 50 chars

SET @i += 1;

END;

But this was really not right at all. The function is the big issue, because while it looks sound, there are two big issues. You can’t used NEWID() in a function, and if you remove that, you also find you cannot use RAND() in a function.

Try Number 2

So I go back to ChatGPT and say

“this doesn’t work, you can’t use newid or rand in a function in sql server”

To which it replies:

“You’re absolutely right — in SQL Server, you cannot use non-deterministic functions like NEWID() or RAND() inside a user-defined function (UDF). Let’s fix that using a T-SQL loop with inline logic instead of a scalar function.

Here’s a fully working script that generates 1000 rows of random alphanumeric strings between 10 and 50 characters, without using NEWID() or RAND() inside a function:”

And the code now does an okay job of creating me a set of data with some random strings. I might tweak it when I come back to it, but mission accomplished, a table with 1000 rows of random data. But it is the whole “You’re absolutely right” thing that just feels like a bit of a backhanded comment, especially when it says “You are right”.

This was the next output of the data insert where it removed the function:

-- Declare variables
DECLARE @i INT = 1;
DECLARE @chars NVARCHAR(62) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
DECLARE @len INT;
DECLARE @str NVARCHAR(100);
DECLARE @pos INT;

-- Loop to insert 1000 rows
WHILE @i <= 1000
BEGIN
SET @len = 10 + ABS(CHECKSUM(NEWID())) % 41; -- Length between 10 and 50
SET @str = '';

WHILE LEN(@str) < @len
BEGIN
SET @pos = 1 + ABS(CHECKSUM(NEWID())) % LEN(@chars);
SET @str = @str + SUBSTRING(@chars, @pos, 1);
END

INSERT INTO dbo.RegExTest (Value)
VALUES (@str);

SET @i += 1;
END;

The point

In this case, all I really wanted was some data to play with, but what if I wanted real code. Syntax errors, using code in the wrong place are easy enough problems to catch/fix, but what about the performance of the code? If the output from a simple task like this can’t get the basics right (despite it apparently knowing you can’t use the offending functions in a user-defined function when prompted), How well will it perform when you need it to write more complex cases for you?

One final point

I realized that I probably should have asked it to just remove NEWID(), so I started a new chat and asked the same initial question:

“create a set of random string data with 1000 rows for a sql server table named RegExTest. It should insert data into a column named Value”

This time, it gave me a completely different answer (and started out telling me that I was right about something I did not ask in this chat, so I couldn’t see if it would have kept the RAND() in the output::

“You’re right that you can’t use NEWID() or RAND() in a function for inserting random data directly into a column via a UDF. Instead, for generating and inserting 1000 rows of random string data into a SQL Server table named RegExTest, you can use a T-SQL script with a numbers table and NEWID(), which is perfect for generating pseudo-random strings.

Here’s a full working script to do that:”

-- Insert 1000 rows with random string data
WITH Numbers AS (
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT INTO dbo.RegExTest (Value)
SELECT LEFT(CONVERT(VARCHAR(36), NEWID()), ABS(CHECKSUM(NEWID())) % 30 + 5)
FROM Numbers;

Completely different method. But it worked the first time. I probably favor the first method as I likely want to tweak the algorithm/data being generated for a lot of test cases, but this works.

3 responses to “ChatGPT’s Attitude is interesting”

  1. Generating Random Strings with GenAI | Voice of the DBA Avatar

    […] in AI is taking things other people do and seeing how well the AI works. In this case, I took a Prompt from Louis and tried to see how well it worked. This post describes the results from a few […]

    Like

  2. […] in AI is taking things other people do and seeing how well the AI works. In this case, I took a Prompt from Louis and tried to see how well it worked. This post describes the results from a few […]

    Like

Leave a reply to Generating Random Strings with GenAI | Voice of the DBA Cancel reply

I’m Louis

I have been at this database thing for a long long time, with no plans to stop.

This is my blog site, companion to drsql.org

Recents