Flags in SQL Server Regular Expressions – Multiline Processing

Part 7 of Learn RegEx With Louis

In this week’s seventh entry of my learning RegEx series, I am going to do one last intro into expressions entry for a while. Next I will move into all of the functions that are available in SQL Server 2025 and Azure SQL (and I will come back if I learn any additional things that we need to cover.

There are currently only 4 flags that SQL Server supports and they are used to change some of the fundamental ways that the expressions are applied. These flags are:

i – insensitive
c – case sensitive
m – ^ and $ match end of line, not entire string
s – single line, dot matches newline

In Part 6, I covered i and c, not let’s do m and s. These flags are not ones I expect to use all that often, but they are definitely useful to know.

Intro

These flags (m and s) deal with the cases where you are dealing with what will typically be longer strings that have multiple lines. I say typically, because a lot of these functions will be useful when dealing with messy data you are parsing through, trying to ignore certain characters. They deal with the case where there are strings that stretch across multiple lines, that is a string with multiple linefeed characters in them. There are a few cases where you will want to treat these strings differently when you are processing them, so let’s look at that.

Longer strings? In the documentation, Microsft states: “The REGEXP_LIKE, REGEXP_COUNT, and REGEXP_INSTR functions support LOB types (varchar(max) and nvarchar(max)) up to 2 MB for the string_expression parameter.”
2MB expression? Holy moly, please don’t make me debug a nearly 2MB RegEx expression!

As normal, there is a table in the Appendix of this post, and it contains a table we will use. I will load that table with the following rows that we can work with.

TRUNCATE TABLE RegExTest;
INSERT INTO RegExTest
VALUES (
'12345
abcde
6789012'), --numbers then letters, then numbers

('abcdef
ghih
ijklm'), --three lines of letters

('abcdef
ghih
1234'), --letters, letters, numbers

(
'1234
5678
910'); --three lines of numbers

Flag m – multiline

This flag lets you start to treat the multiple line string as more of a multi-value string. Each of the lines in the string can then be processed individually.

As an example, we can look easily for a string that starts with one or more letters. This will only look for the rows where the entire string starts with a letter

--starts with a letter
SELECT Value
FROM   RegExTest
WHERE REGEXP_LIKE( Value,'^[a-z]+');

This returns (I added space between the items for clarity).:

Value
-------------------------
abcdef
ghih
1234

abcdef
ghih
ijklm

But if we want to get back this row too:

12345
abcde
6789012

Then we can add the m flag. Now it treats each linefeed as starting a new item.

--a line starts with a number
SELECT Value
FROM   RegExTest
WHERE REGEXP_LIKE( Value,'^[a-z]+','m');

This now returns:

Value
--------------------------
12345
abcde
6789012

abcdef
ghih
1234

abcdef
ghih
ijklm

You can see all the matches using REGEP_MATCHES:

--show all of the matches in each of the rows.
SELECT RegExTest.Value, Matches.*
FROM   RegExTest
          CROSS APPLY REGEXP_MATCHES ( Value,'^[a-z]+','m') as Matches
ORDER BY Value, Match_id;

This returns a much harder to format string (so I will use an image in this case!) You can see the value (which is shown here without the line breaks by the grid), the match_id which appears as the second column is the position of the match in the string.

So finding strings in individual lines, from the start of the line onward is fairly straight forward and works as you might expect if you have gotten this far with regular expressions.

Matching complete lines

In this section I want to go to the next level and try to match the entire string. This will include any of the characters that make us a string, especially one with multiple lines in it.

To start this discussion, instead of using the table data, I will use a simpler example to demonstrate something that was really confusing to me. Take this following string in the @Value variable. Each line ends in a comma as far as we can see.

DECLARE @Value nvarchar(40) =
'world,
hello,
worldly'

SELECT  start_position, end_position, match_value
FROM    REGEXP_MATCHES ( @Value,'^(world.+|hello.+)','m');

When you look at the output in text view, you will see something interesting (the extra space between lines is part of the results):

start_position end_position match_value
-------------- ------------ -------------------
1              7            world,

9              15           hello,

17             23           worldly

Each match_value ends with a comma, but also there is space between each room. Also, you can see that there is a gap of 2 characters between the items in the list. This is because in typical text, lines end with a carriage return character, followed by a rine feed character. Respectively CHAR(13) and the CHAR(10) (or NCHAR(13) and NCHAR(10) for Unicode). The m flag splits the strings on the Line Feed character.

Note: Not all strings may have both carriage returns and linefeeds, but it is very typical for basic text editors to not include both, though I do see that you can remove them using notepad++. You should likely assume that text will always have carriage returns if the column allows it. (That’s a can of worms right there, because who generally stops carriage returns in their data? UI sure, but probably rare in a database column to have constraints to stop that.

So if you want to stop the parsing at the comma or the CHAR(13), you need to check for this. So now, instead of the + that said to repeat the previous character pattern (which I had used a +)

DECLARE @Value nvarchar(40) =
'world,
hello,
worldly'

SELECT  start_position, end_position, match_value
FROM    REGEXP_MATCHES (
            @Value,'^(world[a-z]?[^\s,]?|hello[a-z]?[^\s,]?)','m');

This returns:

start_position end_position match_value
**-------------- ------------ ---------------
1              5            world
9              13           hello
17             23           worldly

Breaking down the RegEx pattern:

^(world[a-z]?[^\s,]?|hello[a-z]?[^\s,]?)

^ – the start of the line

Then either:
world[a-z]?[^\s,]?

Starts with world, then 0 or more other letters, and then not whitespace or , which ends at the endline. Or the same prefixed with hello:

hello[a-z]?[^\s,]?

Back to our table examples, this next query returns 0 rows, because the meaning of this filter is data that is all numbers, and the linefeed is not a number.

SELECT Value, match_value
FROM   RegExTest
         CROSS APPLY REGEXP_MATCHES(Value,'^[\d]+$');

However, this will match:

SELECT Value, match_id, match_value
FROM   RegExTest
         CROSS APPLY REGEXP_MATCHES(Value,'^[\d]+$','m');

But it only matched one line in each case. The last one (spaces added for clarity again):

Value       match_id   match_value
----------- ---------- --------------
1234
5678
910         1          910

12345
abcde
6789012     1          6789012

abcdef
ghih
1234        1          1234

The problem is that m breaks on a linefeed, not a carriage return. So you need to include that in your RegEx expression if you are going to process it.

SELECT Value, match_id, match_value
FROM   RegExTest
CROSS APPLY REGEXP_MATCHES(Value,'^[\d]+\r?$','m');

But this output, while it will include all the matches, will still include the carriage return character (no spaces added to output):

Value            match_id   match_value
---------------- ---------- ---------------
1234
5678
910              1          1234

1234
5678
910              2          5678

1234
5678
910              3          910
12345
abcde
6789012          1          12345

12345
abcde
6789012          2          6789012
abcdef
ghih
1234             1          1234

Frankly, as I try this more and more, I would likely suggest to either expect to either:

Need to do some formatting on the output for very large strings
Remove the CHAR(13) or (NCHAR(13) for Unicode) characters from your strings if you need to break on the linefeed. So the following works in a straighforward manner to break on linefeed by adding: REPLACE(Value,NCHAR(13),'') to the code.

SELECT Value, match_id, match_value
FROM   RegExTest
     CROSS APPLY REGEXP_MATCHES(
                 REPLACE(Value,NCHAR(13),''),'^[\d]+$','m');

Now you get the previous output with out the extra whitespace. For really large inputs, probably easier to do the formatting post fetching data. Either way, this is really cool and likely useful occasionally.

Flag s – Single Line

Using the s flag, the purpose is still working with multiple line strings, but in this case, you want to work with it like it was a single line or value, instead of multiple values as with the m flag. The one use case I have in mind is when you want to look at the start and end of a long string. Basically all this flag does is to make the . include the linefeed character in its matches, which is the only character it doesn’t match to start with.

For this example, I will start with a simple string and the simplest of filters. My goal being to match anything, so I try this filter: '^.+$' From start to end of the string, any character:

--returns no rows, because . doesn't match LineFeed by default.
DECLARE @Value nvarchar(40)

set @Value =
'12345
3504023
6789012'

SELECT *
FROM    REGEXP_MATCHES ( @Value,'^.+$');

This returns 0 rows, because of the only one character it won’t match, the linefeed character.

Change this to use the s flag, then you will get a match the the entire value.

SELECT *
FROM    REGEXP_MATCHES ( @Value,'^.+$','s');

This now returns:

          match_id     match_value
--------- ------------ --------------
12345
3504023
6789012    1           12345
3504023
6789012

One of the hardest things when working on this stuff is seeing the output. I had to do quite a bit of formatting on the output all through this article, because the grid doesn’t show multi-line output, and the texh output doesn’t format it.

Finally, as I noted at the start of the blog, the main use case I can see for this is to to get the rows that start and end with something like a string of numbers. In my sample query, starts with 4 numbers. This filter will do that, but it fails to return data because of all of the multi-line data.

SELECT Value, match_id, match_value
FROM   RegExTest
CROSS APPLY REGEXP_MATCHES( Value,'^[\d].+[\d]$');

But use the s flag with the same filter:

SELECT Value, match_id, match_value
FROM   RegExTest
CROSS APPLY REGEXP_MATCHES( Value,'^[\d].+[\d]$','s');

This output is:

Value      match_id   match_value
---------- ---------- -------------
1234
5678
910        1          1234
                      5678
                      910
12345
abcde
6789012    1          12345
                      abcde
                      6789012

Summary

Whew, that was quite a trip, and while these flags are probably not something you or I will need all that often…they could very much be something that makes life easy one day when you need to look for certain kinds of data in a document or input file. Either treating lines in a column as individual values with the m flag, or treating data with carriage returns and linefeeds as a single line.

Appendix

USE TempDB;
GO

SET NOCOUNT ON;

-- Drop the table if it exists
IF OBJECT_ID('dbo.RegExTest', 'U') IS NOT NULL
DROP TABLE dbo.RegExTest;

-- Create the table
CREATE TABLE dbo.RegExTest (
RegExTestId INT IDENTITY(1,1) CONSTRAINT PK_RegExTest PRIMARY KEY,
Value NVARCHAR(100) CONSTRAINT AK_RegExTest UNIQUE
);

2 responses to “Flags in SQL Server Regular Expressions – Multiline Processing”

Multi- and Single-Line Regular Expression Processing in SQL Server – Curated SQL

2025-08-21

[…] Louis Davidson continues a series on regular expressions in SQL Server: […]

LikeLike

REGEXP_ Functions in SQL Server 2025 – REGEXP_SPLIT_TO_TABLE – Drsql's Blog

2025-09-03

[…] A big limitation to STRING_SPLIT is that you are limited to one character for the delimeter. You can’t split on c, or even carriage return/linefeed (you could replace the carriage return or linefeeds like we did back in the multi-line blog entry of this series): […]

LikeLike

Drsql's Database Musings