Comparing the contents of two values in a relational database can be complicated at times, particularly when the values may have a NULL value. What really makes this more troublesome is that this is not something that a typical language would have. Briefly, it comes down to the fact that in SQL, an uninitialized column (but also variable) is thought to have an UNKNOWN value. In most languages, an uninitialized variable is considered to have a value of NOTHING.
A bit of history
When you compare nothing to nothing, they match. And you can do math with these values. As Billy Preston once said, Nothing from Nothing leaves nothing. Nothing is a known value. But if you don’t know how much one has, you can’t compare that to any other value because an unknown value may be anything at all.
So in most SQL queries, when you compare NULL to NULL, you don’t get TRUE or FALSE, you get the mathematical equivalent of ¯_(ツ)_/¯ or UNKNOWN. And since comparisons are only treated as TRUE when they evaluate to TRUE, a NULL value compared to a NULL value in most comparisons (WHERE clause or CASE expression for example), we typically had to create messy comparisons to compare two values where one might be NULL and the other not.
There were some common ways to compare two values, like using ISNULL:
ISNULL(Value1,'Impossible Value') = ISNULL(Value2,'Impossible Value')
You can do the same with COALESCE too. Or this one that is safer unless you have a constraint on Value1 and Value2 that completely prevents that impossible value from being possible. This is safer, but a pain to write once, much less for 100 columns:
(Value1 = Value2 OR (Value1 IS NULL and Value2 IS NULL))
And I know I have overthought the negation of this last one and have written:
((Value1 <> Value2 OR (Value1 IS NULL and Value2 IS NOT NULL) OR (Value1 IS NOT NULL and Value2 IS NULL))
Now that 100 comparisons is 400 lines of code.
Enter IS DISTINCT FROM
IS DISTINCT FROM and IS NOT DISTINCT FROM is a command that is standard (I mean with a name like that, it had to be from a standards committee, right?) command that lets you do the comparison between two values that never returns UNKNOWN/NULL. It was added to SQL Server in the 2022 version and pertains to PostgreSQL as well.
The name can be a bit confusing, because the NOT version means that it is equal in value or NULL value. So, a TRUE value returned from the operator means:
IS DISTINCT FROM– The values are different from one anotherIS NOT DISTINCT FROM– The values are the same
To demonstrate:
DECLARE @Value1 int, @Value2 int;
SELECT CASE WHEN @Value1 IS NOT DISTINCT FROM @Value2
THEN 'Distinct Match'
WHEN @Value1 IS DISTINCT FROM @Value2
THEN 'Distinct Non Match'
END AS [IS [[NOT]] DISTINCT],
CASE WHEN @value1 = @value2 THEN 'Matches'
--if it doesn't match, it must be false. but if this doesn't
--return true since we negated the true value, then it won't
--match this, stating that we have one or more mull values
WHEN NOT(@value1 = @Value2) THEN 'Doesn''t Match'
ELSE 'One or More NULL values'
END AS [Equals];
For the two NULL values, it returns:
IS [[NOT] DISTINCT Equals ------------------ ----------- Distinct Match Matches
Change to a non-match:
DECLARE @Value1 int =0, @Value2 int --plus code
Now you know they are different somehow:
IS [[NOT] DISTINCT Equals ------------------ ------------------------ Distinct Non Match One or More NULL values
Next, a proper match:
DECLARE @Value1 int=0, @Value2 int=0; --plus code
Now you see that they match:
IS [[NOT] DISTINCT Equals ------------------ ------------------------ Distinct Match Matches
Finally, a typical non-match:
DECLARE @Value1 int=0, @Value2 int=1; --plus code
Now you see that they don’t match:
IS [[NOT] DISTINCT Equals ------------------ ------------------------ Distinct Non Match Doesn't Match
One quick example
Just to show this used in a table example. In this case, I am comparing to make sure that my identity values are the same, maybe for some demo I have created. Usually this wouldn’t make a lot of sense to compare on identity values for production code.:
CREATE TABLE #Table1
(
TableId int IDENTITY PRIMARY KEY,
Value varchar(10)
);
CREATE TABLE #Table2
(
TableId int IDENTITY PRIMARY KEY,
Value varchar(10)
);
Load some data:
INSERT INTO #Table1(Value)
VALUES ('Larry'),('Sam'),(NULL),('Bob'),('Doug');
INSERT INTO #Table2(Value)
VALUES ('NULL'),('Sam'),('Roger'),('Chick'),('Doug');
Look at how that data lines up:
SELECT #Table1.TableId,
#Table1.Value as T1Value, #Table2.Value as T2Value,
CASE WHEN #Table1.Value IS NOT DISTINCT FROM #Table2.Value
THEN 'Match'
WHEN #Table1.Value IS DISTINCT FROM #Table2.Value
THEN 'Non Match'
END AS [Data Match]
FROM #Table1
JOIN #Table2
ON #Table1.TableId = #Table2.TableId;
And you can see that the comparisons were made:
TableId T1Value T2Value Data Match ----------- ---------- ---------- ----------- 1 Larry NULL Non Match 2 Sam Sam Match 3 NULL Roger Non Match 4 Bob Chick Non Match 5 Doug Doug Match
Performance
While I haven’t spent a lot of time on the performance of these operators, using a copy of AdventureWorksDW on my test server, I got the same plan from:
USE AdventureworksDW; GO SELECT * FROM FactInternetSales WHERE [SalesOrderNumber] = 'SO65429' AND [SalesOrderLineNumber] = 1;
And:
SELECT * FROM FactInternetSales WHERE [SalesOrderNumber] IS NOT DISTINCT FROM 'SO65429' AND [SalesOrderLineNumber] IS NOT DISTINCT FROM 1;
A clustered index seek looking for the one primary key row. I may test some more and post later if I see any differences.
Please tell me I am wrong in a comment (and your own post but let me know too!) so I can learn any differences that exist that I (and anyone else using these functions need to know about. Most of the time you won’t be doing indexed type searches with IS [NOT] DISTINCT FROM anyhow. You will join two sets together via a key and then check for differences.
Summary
IS DISTINCT FROM and IS NOT DISTINCT from are operators that ETL developers need to know. Whether it is for transforming in place, or just comparing two data sets, it make the comparison of two data sets so much easier since you no longer need to care about NULL values for the comparison.




Leave a comment