Part 5 of Learn RegEx With Louis

The goal of this week’s entry is specifically to show how to see how multiple matches can be viewed using SQL Server’s RegEx, specifically to make the examples clearer (especially in the upcoming entries).

There are several functions that you can use where multiple matches are used as part of the output:

  • REGEXP_REPLACE – Like the typical replace function, can replace multiple (or a specific) values that match the RegEx expression
  • REGEXP_SUBSTR – Can be used to fetch a substring based on a certain occurrence number
  • REGEXP_INSTR – Can find a specific occurrence of a string based on the RegEx expression
  • REGEXP_COUNT – Counts the number of matches in a string to a RegEx expression
  • REGEXP_SPLIT_TO_TABLE – Used like the typical SPLIT_TO_TABLE function, but the delimiter is a RegEx expression
  • REGEXP_MATCHES – Outputs a table of every character string that matches your RegEx expression.

Basically every RegEx function other than the one I have focused on, REGEXP_LIKE. In this entry, I want to expand the discussion about multiple matches by adding the REGEXP_MATCHES function to the toolbox so I can show more output (I guessed which function to use wrong, as REGEXP_SPLIT_TO_TABLE LOOKS like the right thing, even if the SPLIT_TO_TABLE should ahve been a clue.

In this article, we are going to stop looking at new ways to write a regular expression search pattern and add a tool to our toolbelt. Using REGEXP_MATCHES to see how many times a string matches a RegEx search pattern.

Multiple Matches

For example, consider a string like 1234567890. Using REGEXP_LIKE, when you query:

SELECT CASE WHEN REGEXP_LIKE('1234567890','\d') THEN 1 ELSE 0 END;

The output is 1. Because that string includes a digit. But it actually matches more than once, which the other functions would continue to process. Using REGEXP_MATCHES, we can see all the matches:

SELECT *
FROM   REGEXP_MATCHES('123456789','\d');

The output has several columns that give you details about the matches:

match_id  start_position end_position match_value substring_matches
--------- -------------- ------------ ----------- -----------------------
1         1              1            1           [{"value":"1","start":1,"length":1}]
2         2              2            2           [{"value":"2","start":2,"length":1}]
3         3              3            3           [{"value":"3","start":3,"length":1}]
4         4              4            4           [{"value":"4","start":4,"length":1}]
5         5              5            5           [{"value":"5","start":5,"length":1}]
6         6              6            6           [{"value":"6","start":6,"length":1}]
7         7              7            7           [{"value":"7","start":7,"length":1}]
8         8              8            8           [{"value":"8","start":8,"length":1}]
9         9              9            9           [{"value":"9","start":9,"length":1}]

The columns are:

  • match_id – Just a unique value for the data set.
  • start_position – the position in the character set
  • end_position – the position of the last character in the match
  • match_value – the vale that matched the pattern
  • substring_matches – the value, start, and length in a JSON format (for some reason that I will hopefully understand one day!)

From here on I will only include these three columns from the REGEXP_MATCHES output: start_position, end_position, match_value which should be enough to see (and less to format for the post!) So take a slightly more interesting expression like the following that looks for any combination of 2-3 numerical digits:

SELECT start_position, end_position, match_value
FROM   REGEXP_MATCHES('123456789','\d{2,3}');

Note that it doesn’t stop when it matches 2 characters, but matches all three before it stops and outputs a match before moving on:

start_position end_position match_value
-------------- ------------ -----------
1              3            123
4              6            456
7              9            789

And if you use \d+, it matches between 1 and infinity number of digits:

SELECT start_position, end_position, match_value
FROM   REGEXP_MATCHES('123456789','\d+');
--which would be the same for '\d{1,N} for any value 
--of N >= 9 in this case:

The output here is one row with all of the characters, because each additional character continues to match the pattern:

start_position end_position match_value
------------- ------------ -----------
1              9            123456789

Multiple matches, table of values

Now, instead of just looking at one row in the output, I will examine multiple rows in a table. Going back to our table example that has been used in all of the articles, let’s load in some rows. These are the rows I used in this blog. Like always, the table create is in the Appendix:

TRUNCATE TABLE RegExTest;

INSERT INTO RegExTest (Value)
VALUES ('aa'),('aaa'),('aaaa'),('aaaaa'),
('baaaaaa'),('aaaaab'),('baaaaab'),
('bababab'),('bababac'),('cbababac'),
('b1ab2ab3ab4a');

From an example in the aforementioned blog, I used this query:

SELECT Value
FROM RegExTest
WHERE REGEXP_LIKE(Value,'a{3,4}');

You can see that it matches these rows:

Value
-----------
aaa
aaaa
aaaaa
aaaaab
baaaaaa
baaaaab

Say we wanted to look at how a given pattern connected to a set of rows. We can use CROSS APPLY to send the values to the REGEXP_MATCHES function:

SELECT CAST(Value as nvarchar(15)) as Value,
       REGEXOUT.start_position, REGEXOUT.end_position,
       REGEXOUT.match_value
FROM RegExTest
        CROSS APPLY REGEXP_MATCHES(Value,'a{3,4}') as REGEXOUT;

The rows that are output are the same as previously, but now we can see the match and the length of the match.

Value           start_position end_position match_value
-------------- -------------- ------------ ---------------
aaa             1              3            aaa
aaaa            1              4            aaaa
aaaaa           1              4            aaaa
aaaaab          1              4            aaaa
baaaaaa         2              5            aaaa
baaaaab         2              5            aaaa

Change that RegEx expression to just a{2}:

SELECT cast(Value as nvarchar(15)) as Value,
       REGEXOUT.start_position, REGEXOUT.end_position,
       REGEXOUT.match_value
FROM RegExTest
       CROSS APPLY REGEXP_MATCHES(Value,'a{2}') as REGEXOUT;

Now the rows that are output from RegExTest are the same, but the number of matches is quite different (and you will see the RegExTest rows repeated):

Value           start_position end_position match_value
-------------- -------------- ------------ -----------
aa              1              2            aa
aaa             1              2            aa
aaaa            1              2            aa
aaaa            3              4            aa
aaaaa           1              2            aa
aaaaa           3              4            aa
aaaaab          1              2            aa
aaaaab          3              4            aa
baaaaaa         2              3            aa
baaaaaa         4              5            aa
baaaaaa         6              7            aa
baaaaab         2              3            aa
baaaaab         4              5            aa

Summary

In this post, we have established a new tool that can be used to see the multiple matches by outputting them to a table. This is a tool I will use more in future posts (especially as I start to veer away from pattern entries and cover the functions that I started the blog with. But now we have the tool to show all of the possible matches of a RegEx pattern expression to a value in a literal, variable, or even values in a table.

Appendix

USE TempDB;
GO

SET NOCOUNT ON;

-- Drop the table if it exists
IF OBJECT_ID('dbo.RegExTest', 'U') IS NOT NULL
DROP TABLE dbo.RegExTest;

-- Create the table
CREATE TABLE dbo.RegExTest (
RegExTestId INT IDENTITY(1,1) CONSTRAINT PK_RegExTest PRIMARY KEY,
Value NVARCHAR(100) CONSTRAINT AK_RegExTest UNIQUE
);

2 responses to “Viewing Multiple Matches in SQL Server Regular Expression Functions”

  1. […] Louis Davidson has popped and therefore cannot stop: […]

    Like

  2. REGEXP_ Functions in SQL Server 2025 – REGEXP_SUBSTR – Drsql's Blog Avatar

    […] my blog that was basically about REGEXP_MATCHES, I showed how you could see all the matches that an RegEx expression had with a string. […]

    Like

Leave a reply to REGEXP_ Functions in SQL Server 2025 – REGEXP_SUBSTR – Drsql's Blog Cancel reply

I’m Louis

I have been at this database thing for a long long time, with no plans to stop.

This is my blog site, companion to drsql.org

Recents