Part 7 of Learn RegEx With Louis
In this week’s seventh entry of my learning RegEx series, I am going to do one last intro into expressions entry for a while. Next I will move into all of the functions that are available in SQL Server 2025 and Azure SQL (and I will come back if I learn any additional things that we need to cover.
There are currently only 4 flags that SQL Server supports and they are used to change some of the fundamental ways that the expressions are applied. These flags are:
i – insensitivec – case sensitivem – ^ and $ match end of line, not entire strings – single line, dot matches newline
In Part 6, I covered i and c, not let’s do m and s. These flags are not ones I expect to use all that often, but they are definitely useful to know.
Intro
These flags (m and s) deal with the cases where you are dealing with what will typically be longer strings that have multiple lines. I say typically, because a lot of these functions will be useful when dealing with messy data you are parsing through, trying to ignore certain characters. They deal with the case where there are strings that stretch across multiple lines, that is a string with multiple linefeed characters in them. There are a few cases where you will want to treat these strings differently when you are processing them, so let’s look at that.
Longer strings? In the documentation, Microsft states: “The
REGEXP_LIKE,REGEXP_COUNT, andREGEXP_INSTRfunctions support LOB types (varchar(max) and nvarchar(max)) up to 2 MB for the string_expression parameter.”
2MB expression? Holy moly, please don’t make me debug a nearly 2MB RegEx expression!
As normal, there is a table in the Appendix of this post, and it contains a table we will use. I will load that table with the following rows that we can work with.
TRUNCATE TABLE RegExTest;
INSERT INTO RegExTest
VALUES (
'12345
abcde
6789012'), --numbers then letters, then numbers
('abcdef
ghih
ijklm'), --three lines of letters
('abcdef
ghih
1234'), --letters, letters, numbers
(
'1234
5678
910'); --three lines of numbers
Flag m – multiline
This flag lets you start to treat the multiple line string as more of a multi-value string. Each of the lines in the string can then be processed individually.
As an example, we can look easily for a string that starts with one or more letters. This will only look for the rows where the entire string starts with a letter
--starts with a letter SELECT Value FROM RegExTest WHERE REGEXP_LIKE( Value,'^[a-z]+');
This returns (I added space between the items for clarity).:
Value ------------------------- abcdef ghih 1234 abcdef ghih ijklm
But if we want to get back this row too:
12345 abcde 6789012
Then we can add the m flag. Now it treats each linefeed as starting a new item.
--a line starts with a number SELECT Value FROM RegExTest WHERE REGEXP_LIKE( Value,'^[a-z]+','m');
This now returns:
Value -------------------------- 12345 abcde 6789012 abcdef ghih 1234 abcdef ghih ijklm
You can see all the matches using REGEP_MATCHES:
--show all of the matches in each of the rows.
SELECT RegExTest.Value, Matches.*
FROM RegExTest
CROSS APPLY REGEXP_MATCHES ( Value,'^[a-z]+','m') as Matches
ORDER BY Value, Match_id;
This returns a much harder to format string (so I will use an image in this case!) You can see the value (which is shown here without the line breaks by the grid), the match_id which appears as the second column is the position of the match in the string.

So finding strings in individual lines, from the start of the line onward is fairly straight forward and works as you might expect if you have gotten this far with regular expressions.
Matching complete lines
In this section I want to go to the next level and try to match the entire string. This will include any of the characters that make us a string, especially one with multiple lines in it.
To start this discussion, instead of using the table data, I will use a simpler example to demonstrate something that was really confusing to me. Take this following string in the @Value variable. Each line ends in a comma as far as we can see.
DECLARE @Value nvarchar(40) = 'world, hello, worldly' SELECT start_position, end_position, match_value FROM REGEXP_MATCHES ( @Value,'^(world.+|hello.+)','m');
When you look at the output in text view, you will see something interesting (the extra space between lines is part of the results):
start_position end_position match_value -------------- ------------ ------------------- 1 7 world, 9 15 hello, 17 23 worldly
Each match_value ends with a comma, but also there is space between each room. Also, you can see that there is a gap of 2 characters between the items in the list. This is because in typical text, lines end with a carriage return character, followed by a rine feed character. Respectively CHAR(13) and the CHAR(10) (or NCHAR(13) and NCHAR(10) for Unicode). The m flag splits the strings on the Line Feed character.
Note: Not all strings may have both carriage returns and linefeeds, but it is very typical for basic text editors to not include both, though I do see that you can remove them using notepad++. You should likely assume that text will always have carriage returns if the column allows it. (That’s a can of worms right there, because who generally stops carriage returns in their data? UI sure, but probably rare in a database column to have constraints to stop that.
So if you want to stop the parsing at the comma or the CHAR(13), you need to check for this. So now, instead of the + that said to repeat the previous character pattern (which I had used a +)
DECLARE @Value nvarchar(40) =
'world,
hello,
worldly'
SELECT start_position, end_position, match_value
FROM REGEXP_MATCHES (
@Value,'^(world[a-z]?[^\s,]?|hello[a-z]?[^\s,]?)','m');
This returns:
start_position end_position match_value **-------------- ------------ --------------- 1 5 world 9 13 hello 17 23 worldly
Breaking down the RegEx pattern:
^(world[a-z]?[^\s,]?|hello[a-z]?[^\s,]?)
^ – the start of the line
Then either:world[a-z]?[^\s,]?
Starts with world, then 0 or more other letters, and then not whitespace or , which ends at the endline. Or the same prefixed with hello:
hello[a-z]?[^\s,]?
Back to our table examples, this next query returns 0 rows, because the meaning of this filter is data that is all numbers, and the linefeed is not a number.
SELECT Value, match_value
FROM RegExTest
CROSS APPLY REGEXP_MATCHES(Value,'^[\d]+$');
However, this will match:
SELECT Value, match_id, match_value
FROM RegExTest
CROSS APPLY REGEXP_MATCHES(Value,'^[\d]+$','m');
But it only matched one line in each case. The last one (spaces added for clarity again):
Value match_id match_value ----------- ---------- -------------- 1234 5678 910 1 910 12345 abcde 6789012 1 6789012 abcdef ghih 1234 1 1234
The problem is that m breaks on a linefeed, not a carriage return. So you need to include that in your RegEx expression if you are going to process it.
SELECT Value, match_id, match_value FROM RegExTest CROSS APPLY REGEXP_MATCHES(Value,'^[\d]+\r?$','m');
But this output, while it will include all the matches, will still include the carriage return character (no spaces added to output):
Value match_id match_value ---------------- ---------- --------------- 1234 5678 910 1 1234 1234 5678 910 2 5678 1234 5678 910 3 910 12345 abcde 6789012 1 12345 12345 abcde 6789012 2 6789012 abcdef ghih 1234 1 1234
Frankly, as I try this more and more, I would likely suggest to either expect to either:
- Need to do some formatting on the output for very large strings
- Remove the
CHAR(13)or (NCHAR(13)for Unicode) characters from your strings if you need to break on the linefeed. So the following works in a straighforward manner to break on linefeed by adding:REPLACE(Value,NCHAR(13),'')to the code.
SELECT Value, match_id, match_value
FROM RegExTest
CROSS APPLY REGEXP_MATCHES(
REPLACE(Value,NCHAR(13),''),'^[\d]+$','m');
Now you get the previous output with out the extra whitespace. For really large inputs, probably easier to do the formatting post fetching data. Either way, this is really cool and likely useful occasionally.
Flag s – Single Line
Using the s flag, the purpose is still working with multiple line strings, but in this case, you want to work with it like it was a single line or value, instead of multiple values as with the m flag. The one use case I have in mind is when you want to look at the start and end of a long string. Basically all this flag does is to make the . include the linefeed character in its matches, which is the only character it doesn’t match to start with.
For this example, I will start with a simple string and the simplest of filters. My goal being to match anything, so I try this filter: '^.+$' From start to end of the string, any character:
--returns no rows, because . doesn't match LineFeed by default. DECLARE @Value nvarchar(40) set @Value = '12345 3504023 6789012' SELECT * FROM REGEXP_MATCHES ( @Value,'^.+$');
This returns 0 rows, because of the only one character it won’t match, the linefeed character.
Change this to use the s flag, then you will get a match the the entire value.
SELECT * FROM REGEXP_MATCHES ( @Value,'^.+$','s');
This now returns:
match_id match_value
--------- ------------ --------------
12345
3504023
6789012 1 12345
3504023
6789012
One of the hardest things when working on this stuff is seeing the output. I had to do quite a bit of formatting on the output all through this article, because the grid doesn’t show multi-line output, and the texh output doesn’t format it.
Finally, as I noted at the start of the blog, the main use case I can see for this is to to get the rows that start and end with something like a string of numbers. In my sample query, starts with 4 numbers. This filter will do that, but it fails to return data because of all of the multi-line data.
SELECT Value, match_id, match_value FROM RegExTest CROSS APPLY REGEXP_MATCHES( Value,'^[\d].+[\d]$');
But use the s flag with the same filter:
SELECT Value, match_id, match_value FROM RegExTest CROSS APPLY REGEXP_MATCHES( Value,'^[\d].+[\d]$','s');
This output is:
Value match_id match_value
---------- ---------- -------------
1234
5678
910 1 1234
5678
910
12345
abcde
6789012 1 12345
abcde
6789012
Summary
Whew, that was quite a trip, and while these flags are probably not something you or I will need all that often…they could very much be something that makes life easy one day when you need to look for certain kinds of data in a document or input file. Either treating lines in a column as individual values with the m flag, or treating data with carriage returns and linefeeds as a single line.
Appendix
USE TempDB;
GO
SET NOCOUNT ON;
-- Drop the table if it exists
IF OBJECT_ID('dbo.RegExTest', 'U') IS NOT NULL
DROP TABLE dbo.RegExTest;
-- Create the table
CREATE TABLE dbo.RegExTest (
RegExTestId INT IDENTITY(1,1) CONSTRAINT PK_RegExTest PRIMARY KEY,
Value NVARCHAR(100) CONSTRAINT AK_RegExTest UNIQUE
);




Leave a comment