SQL Server REGEXP_LIKE features augment, not replace, LIKE

So I expect most people (like myself), before they learned anything about the new RegEx features of SQL Server 2025 (and the Azure SQL DB too), assumed they would include a direct replacement for LIKE. Filtering data on a pattern has always been an important need when querying string data. For the most part, LIKE has always served our needs, but sometimes there are situations where it has been a bit too limiting.

As a community, we have regularly begged for RegEx capabilities to be imported into SQL Server (as it has been implemented in Oracle, PostgreSQL, and even MySQL, and they are finally here!

There are a lot of cool features in these new RegEx functions that I will explore in upcoming blogs, but I want to start by briefely discussing REGEXP_LIKE. and how it can be used alongside LIKE to get the most performance possible when filtering data.

SQL Server RegEx provides only functions

REGEXP_LIKE takes the concept of LIKE and expands it greatly, making a lot of the types of filters you might want to create either easier to create, or in some cases possible where LIKE may not. But to use it in the best performing way, it is important to realize these are functions, not operators.

Note that PostgreSQL does have an operator named SIMILAR TO that will do RegEx pattern matching, though I have not tested to see how it works with their indexes.

Basically using the code I created in my past blog, I created a set of data to play with. I have repeated the code in the Appendix of this article, and I added an index to the Value column. It is just random junk data.

Here is the structure of the table:

USE TempDb;

-- Create the table
CREATE TABLE dbo.RegExTest (
   RegExTestId INT IDENTITY(1,1) PRIMARY KEY,
   Value NVARCHAR(100) INDEX Value
);

Note that I have an index on the Value column. To this table, I am going to add a couple of known values so the random data won’t matter if you want to try this yourself:

INSERT INTO dbo.RegExTest(Value) 
VALUES ('This is my first known value'),
       ('This is my second known value');

Execute the following, and you can see in my table:

SELECT *
FROM RegExTest
ORDER BY RegExTestId;

There will be a bunch of rows with random characters returned, and my known value is at the bottom of the list (since typically a query like this will return rows in clustered index order, yours could be different, it doesn’t really matter what this data is, it is just to make sure that we can’t do LIKE '%' and get the same results and performance from an incorrect RegEx expression) :

RegExTestId Value
----------- -------------------------------------------------------
1 399s69FdlPXrMzqwil71gGYrXRMHOV65R45TvRxFYPsLqUIwnY
2 ZE0kbsGKqoqxcxQqoLV29kBzWTyVB3X5b4QxaymuH
3 2JxDD5PxoormnXYDiRfMcYiUYnKX
4 OeSBfdmlR0RFWAYPXG7hqIm8umpg
5 8CZyrUbsUZ
6 4nccu1NDuIzOn4NBOQ5jDAmklTViorGDs0DQ8
7 HNYWyxxnI9XcRXptwStyysdIvkeapHz9gXuQwNK
8 ez3M7SbDez09mqKpZQq
9 U5xZ8vk0EofjmDSp

1001 This is my first known value
1002 This is my second known value

(1002 rows affected)

Trying it out

Say you want to use a filter criteria that is only possible using a regular expression. I won’t pretend to know what one might be because learning regular expressions is a future project for me, but lets suppose that is the goal. The expression is not important to this first blog on Regular Expressions, and in fact I am going to stick with the easiest one of all: “Equality”.

Just like when you let a user type a LIKE expression, sometimes you end up with an equality operation.

So you write:

SELECT *
FROM RegExTest
WHERE REGEXP_LIKE(Value,'^This is my first known value$');

Note: A previous version of this blog used \b as a prefix and suffix. The basic message of the blog stands as originally written, but I have corrected this code to use ^ to denote a hard end to the filter, and $ to denote the end.

This expression requires the string to match exactly. The function does return a Boolean, so it might look a bit like this is an operator, but the parenthesis do indicate that this is a function that will need to be executed (more or less), row by row. (The optimizer could possibly see the same value and not need to execute every row perhaps, but the likelihood of that saving a lot of time would require a very non-selective data set.)

I am a bit confused about case sensitivity as the documentation seems to say that REGEXP_LIKE is case insensitive by default. This is not germane to this discussion but you can make it case sensitive for certain using, the ‘c’ flag:

SELECT *
FROM RegExTest
WHERE REGEXP_LIKE('^This is my first known value$',Value,'c');

The biggest thing to notice is the plan for these queries. For this simple equality example is a clustered index scan it is this:

This is because REGEXP_LIKE is a function, not an operator like LIKE is. It is a very similar thing to when you format a date column in an expression in a WHERE clause, that cannot be translated to an index.

However, when you do the same with a LIKE operator expression:

SELECT *
FROM RegExTest
WHERE Value LIKE 'This is my first known value';

The plan changes to something more tenable:

The plan now uses an Index Seek. but let’s say that our REGEXP_LIKE expression needed to be more complex than a LIKE could handle. In this case there are two possibilities.

  • Just like a complex LIKE expression, we bite the bullet and don’t use an index. This is often going to be the case when we know our expression doesn’t filter at all on the leading characters of the values.
  • Use both LIKE and REGEXP_LIKE when there is a leading reason.

For the example, lets just keep it simple. I will use LIKE to get initial set that can be formed by values that start with "This" and then REGEXP_LIKE to do the rest of the selecting.

SELECT *
FROM RegExTest
WHERE Value LIKE 'This%'
  AND REGEXP_LIKE(Value,'is my first known value$');

This still outputs a single row, but the plan is still using an Index Seek.

Obviously your mileage may vary based on your data and its selectivity, your requirements, and how many rows will match the LIKE expression, but that is true in any case.

It is kind of messy, but if you look at the XML of the plan, you will see that there are two steps internally. One is an index seek on the range of This to ThiT, or basically the results of this query:

SELECT *
FROM RegExTest
WHERE Value LIKE 'This%';

Then the intrinsic function REGEXP_LIKE is used to filter the items that matched the LIKE expression. It is cool that it didn’t need to stream to the results from the RegEx function into another operator that then filtered out rows that didn’t match the regular expression.

Summary

Adding regular expression functions to the SQL Server family of products is a big, exciting win for anyon who writes code in SQL Server. In future intalments of this series I will cover all of the functions and what they can do.

My first stop was to help others realze that one main thing I realized… that REGEXP_LIKE is going to be an exciting addition to the toolbox, but you are still going to need to keep LIKE around for the forseeable future, even if you are a pro at writing regular expressions.


Appendix

Code to create sample table and initial data set

USE TempDb;
GO

-- Drop the table if it exists
IF OBJECT_ID('dbo.RegExTest', 'U') IS NOT NULL
DROP TABLE dbo.RegExTest;

-- Create the table
CREATE TABLE dbo.RegExTest (
RegExTestId INT IDENTITY(1,1) PRIMARY KEY,
Value NVARCHAR(100) INDEX Value
);

-- Declare variables
DECLARE @i INT = 1;
DECLARE @chars NVARCHAR(62) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
DECLARE @len INT;
DECLARE @str NVARCHAR(100);
DECLARE @pos INT;

-- Loop to insert 1000 rows
WHILE @i <= 1000
BEGIN
SET @len = 10 + ABS(CHECKSUM(NEWID())) % 41; -- Length between 10 and 50
SET @str = '';

WHILE LEN(@str) < @len
BEGIN
    SET @pos = 1 + ABS(CHECKSUM(NEWID())) % LEN(@chars);
    SET @str = @str + SUBSTRING(@chars, @pos, 1);
END

INSERT INTO dbo.RegExTest (Value)
VALUES (@str);

SET @i += 1;

END;

2 responses to “SQL Server REGEXP_LIKE features augment, not replace, LIKE”

  1. Cardinality and REGEXP_LIKE – DrsqlBlog Avatar

    […] SQL Server REGEXP_LIKE features augment, not replace, LIKE […]

    Like

  2. Matching one of a set of characters with SQL Server Regular Expressions – Drsql's Blog Avatar

    […] Like normal, I will start with a few examples of using like expressions to achieve what it can achieve both naturally, and using some more complex coding. As I started out with my discussion of using regular expressions in SQL Server (SQL Server REGEXP_LIKE features augment, not replace, LIKE. SQL Server REGEXP_LIKE features augment, not replace, LIKE)/ […]

    Like

Leave a reply to Matching one of a set of characters with SQL Server Regular Expressions – Drsql's Blog Cancel reply

I’m Louis

I have been at this database thing for a long long time, with no plans to stop.

This is my blog site, companion to drsql.org

Recents