Part 5 of Learn RegEx With Louis
The goal of this week’s entry is specifically to show how to see how multiple matches can be viewed using SQL Server’s RegEx, specifically to make the examples clearer (especially in the upcoming entries).
There are several functions that you can use where multiple matches are used as part of the output:
REGEXP_REPLACE– Like the typical replace function, can replace multiple (or a specific) values that match the RegEx expressionREGEXP_SUBSTR– Can be used to fetch a substring based on a certain occurrence numberREGEXP_INSTR– Can find a specific occurrence of a string based on the RegEx expressionREGEXP_COUNT– Counts the number of matches in a string to a RegEx expressionREGEXP_SPLIT_TO_TABLE– Used like the typical SPLIT_TO_TABLE function, but the delimiter is a RegEx expressionREGEXP_MATCHES– Outputs a table of every character string that matches your RegEx expression.
Basically every RegEx function other than the one I have focused on, REGEXP_LIKE. In this entry, I want to expand the discussion about multiple matches by adding the REGEXP_MATCHES function to the toolbox so I can show more output (I guessed which function to use wrong, as REGEXP_SPLIT_TO_TABLE LOOKS like the right thing, even if the SPLIT_TO_TABLE should ahve been a clue.
In this article, we are going to stop looking at new ways to write a regular expression search pattern and add a tool to our toolbelt. Using REGEXP_MATCHES to see how many times a string matches a RegEx search pattern.
Multiple Matches
For example, consider a string like 1234567890. Using REGEXP_LIKE, when you query:
SELECT CASE WHEN REGEXP_LIKE('1234567890','\d') THEN 1 ELSE 0 END;
The output is 1. Because that string includes a digit. But it actually matches more than once, which the other functions would continue to process. Using REGEXP_MATCHES, we can see all the matches:
SELECT *
FROM REGEXP_MATCHES('123456789','\d');
The output has several columns that give you details about the matches:
match_id start_position end_position match_value substring_matches
--------- -------------- ------------ ----------- -----------------------
1 1 1 1 [{"value":"1","start":1,"length":1}]
2 2 2 2 [{"value":"2","start":2,"length":1}]
3 3 3 3 [{"value":"3","start":3,"length":1}]
4 4 4 4 [{"value":"4","start":4,"length":1}]
5 5 5 5 [{"value":"5","start":5,"length":1}]
6 6 6 6 [{"value":"6","start":6,"length":1}]
7 7 7 7 [{"value":"7","start":7,"length":1}]
8 8 8 8 [{"value":"8","start":8,"length":1}]
9 9 9 9 [{"value":"9","start":9,"length":1}]
The columns are:
match_id– Just a unique value for the data set.start_position– the position in the character setend_position– the position of the last character in the matchmatch_value– the vale that matched the patternsubstring_matches– the value, start, and length in a JSON format (for some reason that I will hopefully understand one day!)
From here on I will only include these three columns from the REGEXP_MATCHES output: start_position, end_position, match_value which should be enough to see (and less to format for the post!) So take a slightly more interesting expression like the following that looks for any combination of 2-3 numerical digits:
SELECT start_position, end_position, match_value
FROM REGEXP_MATCHES('123456789','\d{2,3}');
Note that it doesn’t stop when it matches 2 characters, but matches all three before it stops and outputs a match before moving on:
start_position end_position match_value -------------- ------------ ----------- 1 3 123 4 6 456 7 9 789
And if you use \d+, it matches between 1 and infinity number of digits:
SELECT start_position, end_position, match_value
FROM REGEXP_MATCHES('123456789','\d+');
--which would be the same for '\d{1,N} for any value
--of N >= 9 in this case:
The output here is one row with all of the characters, because each additional character continues to match the pattern:
start_position end_position match_value ------------- ------------ ----------- 1 9 123456789
Multiple matches, table of values
Now, instead of just looking at one row in the output, I will examine multiple rows in a table. Going back to our table example that has been used in all of the articles, let’s load in some rows. These are the rows I used in this blog. Like always, the table create is in the Appendix:
TRUNCATE TABLE RegExTest;
INSERT INTO RegExTest (Value)
VALUES ('aa'),('aaa'),('aaaa'),('aaaaa'),
('baaaaaa'),('aaaaab'),('baaaaab'),
('bababab'),('bababac'),('cbababac'),
('b1ab2ab3ab4a');
From an example in the aforementioned blog, I used this query:
SELECT Value
FROM RegExTest
WHERE REGEXP_LIKE(Value,'a{3,4}');
You can see that it matches these rows:
Value ----------- aaa aaaa aaaaa aaaaab baaaaaa baaaaab
Say we wanted to look at how a given pattern connected to a set of rows. We can use CROSS APPLY to send the values to the REGEXP_MATCHES function:
SELECT CAST(Value as nvarchar(15)) as Value,
REGEXOUT.start_position, REGEXOUT.end_position,
REGEXOUT.match_value
FROM RegExTest
CROSS APPLY REGEXP_MATCHES(Value,'a{3,4}') as REGEXOUT;
The rows that are output are the same as previously, but now we can see the match and the length of the match.
Value start_position end_position match_value -------------- -------------- ------------ --------------- aaa 1 3 aaa aaaa 1 4 aaaa aaaaa 1 4 aaaa aaaaab 1 4 aaaa baaaaaa 2 5 aaaa baaaaab 2 5 aaaa
Change that RegEx expression to just a{2}:
SELECT cast(Value as nvarchar(15)) as Value,
REGEXOUT.start_position, REGEXOUT.end_position,
REGEXOUT.match_value
FROM RegExTest
CROSS APPLY REGEXP_MATCHES(Value,'a{2}') as REGEXOUT;
Now the rows that are output from RegExTest are the same, but the number of matches is quite different (and you will see the RegExTest rows repeated):
Value start_position end_position match_value -------------- -------------- ------------ ----------- aa 1 2 aa aaa 1 2 aa aaaa 1 2 aa aaaa 3 4 aa aaaaa 1 2 aa aaaaa 3 4 aa aaaaab 1 2 aa aaaaab 3 4 aa baaaaaa 2 3 aa baaaaaa 4 5 aa baaaaaa 6 7 aa baaaaab 2 3 aa baaaaab 4 5 aa
Summary
In this post, we have established a new tool that can be used to see the multiple matches by outputting them to a table. This is a tool I will use more in future posts (especially as I start to veer away from pattern entries and cover the functions that I started the blog with. But now we have the tool to show all of the possible matches of a RegEx pattern expression to a value in a literal, variable, or even values in a table.
Appendix
USE TempDB;
GO
SET NOCOUNT ON;
-- Drop the table if it exists
IF OBJECT_ID('dbo.RegExTest', 'U') IS NOT NULL
DROP TABLE dbo.RegExTest;
-- Create the table
CREATE TABLE dbo.RegExTest (
RegExTestId INT IDENTITY(1,1) CONSTRAINT PK_RegExTest PRIMARY KEY,
Value NVARCHAR(100) CONSTRAINT AK_RegExTest UNIQUE
);




Leave a comment