Comparing the contents of two values in a relational database can be complicated at times, particularly when the values may have a NULL value. What really makes this more troublesome is that this is not something that a typical language would have. Briefly, it comes down to the fact that in SQL, an uninitialized column (but also variable) is thought to have an UNKNOWN value. In most languages, an uninitialized variable is considered to have a value of NOTHING.

A bit of history

When you compare nothing to nothing, they match. And you can do math with these values. As Billy Preston once said, Nothing from Nothing leaves nothing. Nothing is a known value. But if you don’t know how much one has, you can’t compare that to any other value because an unknown value may be anything at all.

So in most SQL queries, when you compare NULL to NULL, you don’t get TRUE or FALSE, you get the mathematical equivalent of ¯_(ツ)_/¯ or UNKNOWN. And since comparisons are only treated as TRUE when they evaluate to TRUE, a NULL value compared to a NULL value in most comparisons (WHERE clause or CASE expression for example), we typically had to create messy comparisons to compare two values where one might be NULL and the other not.

There were some common ways to compare two values, like using ISNULL:

ISNULL(Value1,'Impossible Value') = ISNULL(Value2,'Impossible Value')

You can do the same with COALESCE too. Or this one that is safer unless you have a constraint on Value1 and Value2 that completely prevents that impossible value from being possible. This is safer, but a pain to write once, much less for 100 columns:

(Value1 = Value2 OR (Value1 IS NULL and Value2 IS NULL))

And I know I have overthought the negation of this last one and have written:

((Value1 <> Value2 OR (Value1 IS NULL and Value2 IS NOT NULL) OR (Value1 IS NOT NULL and Value2 IS NULL))

Now that 100 comparisons is 400 lines of code.

Enter IS DISTINCT FROM

IS DISTINCT FROM and IS NOT DISTINCT FROM is a command that is standard (I mean with a name like that, it had to be from a standards committee, right?) command that lets you do the comparison between two values that never returns UNKNOWN/NULL. It was added to SQL Server in the 2022 version and pertains to PostgreSQL as well.

The name can be a bit confusing, because the NOT version means that it is equal in value or NULL value. So, a TRUE value returned from the operator means:

  • IS DISTINCT FROM – The values are different from one another
  • IS NOT DISTINCT FROM – The values are the same

To demonstrate:

DECLARE @Value1 int, @Value2 int;

SELECT CASE WHEN @Value1 IS NOT DISTINCT FROM @Value2 
              THEN 'Distinct Match'
            WHEN @Value1 IS DISTINCT FROM @Value2 
              THEN 'Distinct Non Match'
END AS [IS [[NOT]] DISTINCT],
   CASE WHEN @value1 = @value2 THEN 'Matches'
      --if it doesn't match, it must be false. but if this doesn't
      --return true since we negated the true value, then it won't
      --match this, stating that we have one or more mull values
       WHEN NOT(@value1 = @Value2) THEN 'Doesn''t Match'
         ELSE 'One or More NULL values'
   END AS [Equals];

For the two NULL values, it returns:

IS [[NOT] DISTINCT Equals
------------------ -----------
Distinct Match     Matches

Change to a non-match:

DECLARE @Value1 int =0, @Value2 int
--plus code

Now you know they are different somehow:

IS [[NOT] DISTINCT Equals
------------------ ------------------------
Distinct Non Match One or More NULL values

Next, a proper match:

DECLARE @Value1 int=0, @Value2 int=0;
--plus code

Now you see that they match:

IS [[NOT] DISTINCT Equals
------------------ ------------------------
Distinct Match     Matches

Finally, a typical non-match:

DECLARE @Value1 int=0, @Value2 int=1;
--plus code

Now you see that they don’t match:

IS [[NOT] DISTINCT Equals
------------------ ------------------------
Distinct Non Match Doesn't Match

One quick example

Just to show this used in a table example. In this case, I am comparing to make sure that my identity values are the same, maybe for some demo I have created. Usually this wouldn’t make a lot of sense to compare on identity values for production code.:

CREATE TABLE #Table1
(
    TableId int IDENTITY PRIMARY KEY,
    Value    varchar(10)
);
CREATE TABLE #Table2
(
    TableId int IDENTITY PRIMARY KEY,
    Value    varchar(10)
);

Load some data:

INSERT INTO #Table1(Value)
VALUES ('Larry'),('Sam'),(NULL),('Bob'),('Doug');

INSERT INTO #Table2(Value)
VALUES ('NULL'),('Sam'),('Roger'),('Chick'),('Doug');

Look at how that data lines up:

SELECT #Table1.TableId,
       #Table1.Value as T1Value, #Table2.Value as T2Value,
       CASE WHEN #Table1.Value IS NOT DISTINCT FROM #Table2.Value 
              THEN 'Match'
             WHEN #Table1.Value IS DISTINCT FROM #Table2.Value 
              THEN 'Non Match'
            END AS [Data Match]
FROM   #Table1
          JOIN #Table2
             ON #Table1.TableId = #Table2.TableId;

And you can see that the comparisons were made:

TableId     T1Value    T2Value    Data Match
----------- ---------- ---------- -----------
1           Larry      NULL       Non Match
2           Sam        Sam        Match
3           NULL       Roger      Non Match
4           Bob        Chick      Non Match
5           Doug       Doug       Match

Performance

While I haven’t spent a lot of time on the performance of these operators, using a copy of AdventureWorksDW on my test server, I got the same plan from:

USE AdventureworksDW;
GO

SELECT *
FROM   FactInternetSales
WHERE [SalesOrderNumber] = 'SO65429'
   AND  [SalesOrderLineNumber] = 1;

And:

SELECT *
FROM   FactInternetSales
WHERE  [SalesOrderNumber] IS NOT DISTINCT FROM 'SO65429'
  AND  [SalesOrderLineNumber] IS NOT DISTINCT FROM 1;

A clustered index seek looking for the one primary key row. I may test some more and post later if I see any differences.

Please tell me I am wrong in a comment (and your own post but let me know too!) so I can learn any differences that exist that I (and anyone else using these functions need to know about. Most of the time you won’t be doing indexed type searches with IS [NOT] DISTINCT FROM anyhow. You will join two sets together via a key and then check for differences.

Summary

IS DISTINCT FROM and IS NOT DISTINCT from are operators that ETL developers need to know. Whether it is for transforming in place, or just comparing two data sets, it make the comparison of two data sets so much easier since you no longer need to care about NULL values for the comparison.

One response to “Using IS [NOT] DISTINCT FROM to compare data with NULL values”

  1. SQL Strategies for Table Comparison – Drsql.link Avatar

    […] love IS DISTINCT FROM and one of my earliest blogs on this site was about it. The Value column is defined as allowing NULL values, so this makes sure if one is […]

    Like

Leave a reply to SQL Strategies for Table Comparison – Drsql.link Cancel reply

I’m Louis

I have been at this database thing for a long long time, with no plans to stop.

This is my blog site, companion to drsql.org

Recents