Sometimes along comes a feature that seems so obvious, so natural, that you wonder why it took so long for Microsoft to implement it. One of those features in SQL Server 2005 is the optional length parameter in the SUBSTRING function. It has long been one of those questions when you wrote a SUBSTRING expression when you wanted to go from the Nth character to the end of the string, how many characters do you want? And for the most part, it didn’t really matter.
But sometimes it did (especially when dealing with nvarchar(max) data.
Simple example
Consider this following statement:
SELECT SUBSTRING(CAST('1234567890' as char(30)),2,30);
What will the output be?
------------------------------234567890
From the output, it appears to be a 9 character string in a char(30) column (due to the 30 character output dashes for the column in the text output). But is it? (stay tuned because is it even a char?)
Let’s check this.
WITH Calculated AS (SELECT SUBSTRING(CAST('1234567890' AS char(30)),2,30) as test)SELECT '*' || test || '*' AS CheckTest, LEN(test) AS LEN_Test, DATALENGTH(test) AS DATALENGTH_TestFROM Calculated;
When in SQL Server 2025, as in Rome, use the new || concatenate operator.
This returns:
CheckTest LEN_Test DATALENGTH_Test-------------------------------- ----------- ---------------*234567890 * 9 29
You can see that the value is padded to 29 characters with spaces, not 30.
LEN is included just to show that that function does in fact, only count non-space characters, but DATALENGTH tells us that it is really a 29 character string. So let me change this to surround the test value with '*' characters, this time, using the optional length parameter instead of 30:
WITH Calculated AS (SELECT SUBSTRING(CAST('1234567890' AS char(30)),2) as test)SELECT test, LEN(test) AS LEN_Test, DATALENGTH(test) AS DATALENGTH_TestFROM Calculated;
As expected, it does the same thing as the previous statement did. But what if I don’t set the datatype in a CAST??
WITH Calculated AS (SELECT SUBSTRING('1234567890',2,30) as test)SELECT '*' || test || '*' AS CheckTest, LEN(test) AS LEN_Test, DATALENGTH(test) AS DATALENGTH_TestFROM Calculated;
The change is basically that the data type is now inferred as varchar(10), based on the source string.
CheckTest LEN_Test DATALENGTH_Test------------ ----------- ---------------*234567890* 9 9
What is the datatype?
When I was finished with the basics, I thought, this feels good, but maybe the blog is a bit short. So I decided to look around and see if I missed anything. The first post I found that was interesting was on SQL Server Central, but from Steve Jones (I did not go out looking for it, it was what I found :))
He noted 2 things. 1, if either of the length parameters are NULL, you get NULL as a return:
SELECT SUBSTRING('1234567890',2,NULL), SUBSTRING('1234567890',NULL,10);
This returns: NULL, NULL
Useful information, and something that comes up a good bit when using variable parameters or complex expressions.
The other point he made was about the output dataype. In the documentation, it claimed that the output type is varchar for a char/varchar, or text input. But, that did not match what I thought I saw in the first example.
I passed a char, and it was padded:
WITH Calculated AS (SELECT SUBSTRING(cast('1234567890' AS char(30)),2) as test)SELECT test, LEN(test) AS LEN_Test, DATALENGTH(test) AS DATALENGTH_TestFROM Calculated;
If that is varchar, this shouldn’t be padded, right?
test LEN_Test DATALENGTH_Test------------------------------ ----------- ---------------234567890 9 29
Can varchar values be padded? Of course they can. But I had actually forgotten that was the case (I can’t remember everything!) You can see the output pretty easily using the following table create statement (I am working in TempDb, be careful creating tables in a server that isn’t your personal server.):
WITH Calculated AS (SELECT SUBSTRING(cast('1234567890' AS char(30)),2) as test)SELECT test, cast(test as varchar(100)) as vartestINTO testTable FROM Calculated;
Then a quick look a the table structure:
EXEC sp_help 'dbo.testTable';
From which you can see:
Name Owner Type Created_datetime------------- ---------- ------------ -----------------------testTable dbo user table 2026-04-04 19:25:58.910Column_name Type Computed Length Prec Scale Nullable ------------- ---------- ---------------- ----------- ----- ----- -----------test varchar no 30 yes vartest varchar no 100 yes
In that output you can see a few other column settings that are interesting:
TrimTrailingBlanks FixedLenNullInSource-------------------- --------------------no yes no yes
TrimTrailingBlanks is based on the settings in ANSI_PADDING. No means that ANSI_PADDING is on, and my trailing blanks are preserved. FixedLenNullInSource is apparently a legacy value and meant that the column allows NULL values and was varbinary, varchar, binary, or char.
You can see that the data is still padded in both cases, even with the two different data types.:
SELECT DATALENGTH(test), DATALENGTH(vartest)FROM dbo.testTable;
Both are still 29 bytes.
----------- -----------29 29
--Cleanup the tableDROP TABLE dbo.testTable;
Where this counts most
LONG strings. It was always semi-obvious that we could just put in a number larger than the length of the string, and it would just return the remainder of the string. Most of the time one works with variable character lengths when doing string manipulation, but if you have a very long string, and want everything from position 2 on, you don’t need to calculate the length anymore.
So say you have a string like this:
DECLARE @LongString VARCHAR(max) = REPLICATE('A', 1000000);
Quite often the actual length might not be so simple, but even in this case it is not likely you just know the data size you have to deal with in a varchar(max) column. You used to have to do something like this:
SELECT SUBSTRING(@LongString, 2,1000000);
Or often people might think they needed do TO (I was always concerned about this too!):
SELECT SUBSTRING(@LongString, 2, LEN(@LongString)-2);
But not you can just leave it off and know you are getting all the data you expect. The output is varchar, but if the data ends with spaces, that data is not truncated.
SELECT SUBSTRING(@LongString, 2) AS Result
This returns 7999 characters, starting from position 2
DECLARE @LongString varchar(max) = REPLICATE(cast('A' as varchar(max)), 1000000); SELECT DATALENGTH(SUBSTRING(@LongString, 2,1000000));SELECT DATALENGTH(SUBSTRING(@LongString, 2, LEN(@LongString)-1));SELECT DATALENGTH(SUBSTRING(@LongString, 2)) AS Result;
The all return 999999, the last one we just didn’t need to know how long the string was.
Summary
Not requiring the length parameter on a SUBSTRING is a nice change, even if it doesn’t make that much difference in the long run most of the time. Where it usually became an problem was with a really large value, but I know I have many times just took brain cycles thinking about whether or not to specify the length of the string or the output, or the target, and this makes things a lot easier.



Leave a Reply