A few weeks ago, there was a LinkedIn post (I can’t find it anymore) that covered how indexes are used by COUNT in SQL. I think it may have been based on SQL Server, but I am not sure (it is rare that a SQL post on LinkedIn mentions a platform). At the time, I went and tried a few of the mentioned cases and realized this was an interesting question: how does the COUNT aggregate use indexes when you use different expressions?
I will try the following cases, which I present up front so you can guess which will or will not use an index:
- COUNT(*)
- COUNT(Literal Expression) (like COUNT(1) or COUNT(1+100))
- COUNT(PRIMARY KEY column)
- COUNT(Column that doesn't allow NULL values)
- COUNT(Column that allows NULL values)
Most of these will be counting columns or expressions that are not NULL, but there is one case where a NULL does make a difference.
Note: it is rare that you will actually need to execute a query just to count all the rows in a table. But it is useful to understand what occurs when you do, because it can help you understand what the engine may do in similar cases. Plus, it is kind of fun to do these sorts of out-there thought experiments that challenge you to understand how some process occurs in your database engine.
Picking my table to work with (and another technique)
To do this, I decided to find the largest tables in the AdventureWorks2025 database (AdventureWorks2025 Download). I will use one additional technique for finding the row count in a table, or in all tables in a database.
You can do this by querying the sys.partitions catalog view (grouping by object_id and index_id, since all partitions appear in this view, then filtering for index_id in (0,1), because that is either the heap or the clustered physical structure).
SELECT TOP 1 OBJECT_SCHEMA_NAME(p.object_id) AS SchemaName,
MAX(o.name) AS ObjectName,
SUM(p.rows) AS Rows
--there may be > 1 partition
FROM sys.partitions AS p
JOIN sys.objects AS o
ON o.object_id = p.object_id
WHERE p.index_id IN (0,1) -- heap or clustered
AND OBJECT_SCHEMA_NAME(o.object_id) != 'sys'
GROUP BY p.object_id, p.index_id
ORDER BY Rows DESC;
If you don’t need a filtered, concurrency-safe, or transactionally consistent count of all the rows in a table, this is easily the best way to do it.
From this database, I looked for the largest table, and then checked whether it supports the scenarios I am looking at well enough:
SchemaName         ObjectName          Rows
------------------ ------------------- ----------------
Sales              SalesOrderDetail    121317
Using this age-old system procedure, I can see that it is good enough for this discussion.
EXEC sp_help 'sales.salesorderdetail';
I won’t show the details, but I see that:
- There is a clustered compound primary key (PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID)
- There is a unique key on a guid (AK_SalesOrderDetail_rowguid)
- There is an index on ProductID (IX_SalesOrderDetail_ProductID)
- There is a nullable column: CarrierTrackingNumber nvarchar(50)
And from there I can create a demo table to fit each of my scenarios. In the appendix, I make a new copy of this table in a schema named Demo so I can alter it without affecting your copy of AdventureWorks.
For each scenario, I will include the plan that is generated, and discuss what that indicates occurred. Naturally, this being a cost-based optimizer, the plan could change, as long as it is guaranteed to return the correct answer. But for the tests I have done, it does pretty much what you probably ought to expect.
I tested this on some larger tables I have on a private server and got the same sort of plans/results (just larger numbers of rows!)
The method I will use
For each query, I will grab the estimated plan for the query using SET SHOWPLAN_TEXT ON; (you can use the graphical version, but the text version is a bit easier to copy/paste into an article and see the details). I will use a query that looks like this for each grouping.
SET SHOWPLAN_TEXT ON;
GO
--The query I am testing
GO
SET SHOWPLAN_TEXT OFF;
GO
SET STATISTICS IO ON;
GO
--The query I am testing
GO
SET STATISTICS IO OFF;
GO
This will give us enough information to see the basics of how each query is optimized, and then a bit of detail on how much IO is done. I won’t dig into the plans very deeply, because that would take a lot more typing, and my great friend Hugo Kornelis has done a better job than I could in this context. The query plans are pretty straightforward: SQL Server Execution Plan Reference
COUNT(*)
The classic, basic, count all the rows in a table method that is tried and true:
USE AdventureWorks2025;
GO
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(*) FROM Demo.SalesOrderDetail;
GO
SET SHOWPLAN_TEXT OFF;
GO
SET STATISTICS IO ON;
GO
SELECT COUNT(*) FROM Demo.SalesOrderDetail;
GO
SET STATISTICS IO OFF;
GO
The plan is perhaps a bit surprising. You should generally expect it to do an index scan, but if I were guessing, I would have guessed that it would scan the clustered index (just because that seems like the main index). It would if no other choice were available, but in this case, it finds the smallest index that suits its purposes:
|--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1003],0)))
|--Stream Aggregate(DEFINE:([Expr1003]=Count(*)))
|--Index Scan(OBJECT:([AdventureWorks2025].[Demo].[SalesOrderDetail].
[IX_SalesOrderDetail_ProductID]))
Just like you should always be thinking when you are writing queries: get the right answer, in the fastest way, even if it isn’t the most obvious to everyone.
Then the output of the query, which will be the same for every variation:
------------
121317
And then the output from STATISTICS IO:
Table 'SalesOrderDetail'. Scan count 1, logical reads 218, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Note: if you are using this on the actual Sales.SalesOrderDetail table, the logical reads will be higher, since the clustering key is larger (I used one integer column, the original uses two).
218 is the number of logical reads you get when you do:
SELECT SalesOrderDetailID, ProductID FROM Demo.SalesOrderDetail;
The three main things we will care about in this test are:
- Table: What tables are involved (will there be any additional worktable required? Doubt it, but who knows).
- Scan count: How many times did a physical structure get scanned
- Logical reads: One of the best indications of the cost of your query, because I will assume that the table I am working with will fit in memory and will hence be cached.
Once you have run the query, you won’t see any physical reads, and all the other values will most likely be 0. Since this should be the best-case scenario, we will expect that this is the cost to beat:
Table 'SalesOrderDetail'. Scan count 1, logical reads 218, physical reads 0,
COUNT(Literal)
There has been a debate for years and years about whether COUNT(*) or COUNT(1) is faster. The * throws people off because it makes them think it is going to expand the columns of the table and check each of them. Thinking back to the last section, this isn’t logical, because it scanned an index that does not include all the columns of the table:
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ProductID]
ON [Demo].[SalesOrderDetail]
(
[ProductID] ASC
);
It is possible that one is technically faster, but as many who have built database engines have noted, any difference would be measured in a handful of clock ticks. As you can see from the following output, there is no discernible difference in the plan.
--only showing the query now, you can use the code from the COUNT(*) section
SELECT COUNT(1) FROM Demo.SalesOrderDetail;
The plan and scans for this query and the first one are identical. In reality, this is probably the slower of the two by a few clock ticks, because the optimizer has to decide whether the expression can be NULL. In the case of 1, it cannot be, and the expression for the stream aggregate in the plan is:
|--Stream Aggregate(DEFINE:([Expr1003]=Count(*)))
Which, you will note, says Count(*), not Count(1). So these two forms are the same in terms of how the query processor handles them.
You may ask, why not SET STATISTICS TIME ON;? When the plans for two queries are identical, the differences in time should be negligible unless you are doing some sort of looping that doesn’t show up in the plan. So if the same scans and plans are reported, I will consider the queries equal in this case (this may not always be 100% true, depending on the cost of calculating whether the value to count is NULL or not).
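If you do want to compare raw timings anyway, a minimal sketch of a timing harness looks like this (assuming the Demo.SalesOrderDetail table from the appendix; expect the numbers to be noise-level close):

```sql
--Compare CPU and elapsed time of the two forms; with identical plans,
--any difference should be within measurement noise
SET STATISTICS TIME ON;
GO
SELECT COUNT(*) FROM Demo.SalesOrderDetail;
GO
SELECT COUNT(1) FROM Demo.SalesOrderDetail;
GO
SET STATISTICS TIME OFF;
GO
```

Run each query several times and ignore the first execution, since plan compilation and caching dominate a single cold run.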
COUNT(Literal Expression)
This is one of what I will call the “silly” paths that I love to throw in just to see what happens. For the next three queries, I will use more complex expressions in the COUNT call.
SELECT COUNT(1+1) FROM Demo.SalesOrderDetail;
SELECT COUNT(COALESCE(NULL,2)) FROM Demo.SalesOrderDetail;

DECLARE @value INT = 1;
SELECT COUNT(@value) FROM Demo.SalesOrderDetail;
Perhaps predictably, the plans and scans are the same, and the two expressions that can be evaluated as always being non-NULL produce the following:
|--Stream Aggregate(DEFINE:([Expr1003]=Count(*)))
But for the query that uses the variable, the engine does know it needs to verify whether the value is NULL:
|--Stream Aggregate(DEFINE:([Expr1003]=COUNT([@value])))
Even so, since these plans all use the Stream Aggregate operator and the same index, they all just count rows; the only difference is that each row requires the expression to be evaluated to see whether it is NULL or non-NULL:
DECLARE @value int = NULL;
SELECT COUNT(@value) FROM Demo.SalesOrderDetail;
But it still uses the same index, because we have still not required the query processor to evaluate any column data.
COUNT(Simple PRIMARY KEY column)
In this section, I want to name a column to count. I will start with a column from the primary key: first the single-column key on the table I created in the Demo schema, and then I will change it to a compound primary key like the one in the Sales schema.
SELECT COUNT(SalesOrderDetailId) FROM Demo.SalesOrderDetail;
A pattern was becoming pretty noticeable at this point in the process, as the plan again was:
|--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1003],0)))
|--Stream Aggregate(DEFINE:([Expr1003]=Count(*)))
|--Index Scan(OBJECT:([AdventureWorks2025].[Demo].
[SalesOrderDetail].[IX_SalesOrderDetail_ProductID]))
It knew that counting any column that doesn’t allow NULL values is the same as Count(*), so that is how it converted it. So even though the SalesOrderDetailID column is itself part of a unique index, it is faster to use a different, smaller index when one exists.
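If you want to confirm which columns the optimizer can treat this way, you can check nullability in the catalog; a quick sketch using sys.columns (assuming the Demo.SalesOrderDetail table from the appendix):

```sql
--Columns where is_nullable = 0 can have COUNT(column) rewritten as Count(*)
SELECT name, is_nullable
FROM sys.columns
WHERE object_id = OBJECT_ID('Demo.SalesOrderDetail')
ORDER BY column_id;
```

In this table, only CarrierTrackingNumber should show is_nullable = 1.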
ALTER TABLE Demo.SalesOrderDetail
DROP CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID];
ALTER TABLE Demo.[SalesOrderDetail]
ADD CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]
PRIMARY KEY CLUSTERED
(
[SalesOrderID] ASC,
[SalesOrderDetailID] ASC
);
Then I tested two queries using the first and last key columns of the primary key we just added.
SELECT COUNT(SalesOrderId) FROM Demo.SalesOrderDetail;
SELECT COUNT(SalesOrderDetailId) FROM Demo.SalesOrderDetail;
No surprise (at least not after doing this for a few hours now!), it again changes the expression to Count(*) in the plan, so you get the plan we have already seen. The only difference is that the logical reads have gone up:
Table 'SalesOrderDetail'. Scan count 1, logical reads 278, physical reads 0,
Which makes sense, because the nonclustered index now carries two integer clustering key columns instead of one, so there is a bit more physical data being scanned.
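You can see why the narrower index wins by comparing page counts per index; a sketch using sys.dm_db_partition_stats (page counts will vary with your copy of the database):

```sql
--Page count per index on the demo table; for Count(*) the optimizer
--scans whichever index has the fewest pages
SELECT i.name AS IndexName,
       SUM(ps.used_page_count) AS UsedPages
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID('Demo.SalesOrderDetail')
GROUP BY i.name
ORDER BY UsedPages;
```

The smallest UsedPages value should correspond to IX_SalesOrderDetail_ProductID, matching the index the plans keep choosing.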
COUNT(Column that doesn't allow NULL values)
To finish out this thread, let’s take a couple of columns in the table that are declared NOT NULL, one with and one without an index.
rowguid has an index, so I will choose that one.
This index already exists in the code in the appendix:
CREATE UNIQUE NONCLUSTERED INDEX
[AK_SalesOrderDetail_rowguid]
ON [Demo].[SalesOrderDetail]
(
[rowguid] ASC
);
And then UnitPrice, which has no index but is defined as NOT NULL.
SELECT COUNT(rowguid) FROM Demo.SalesOrderDetail;
SELECT COUNT(UnitPrice) FROM Demo.SalesOrderDetail;
They still optimize to Count(*), and the plan uses the same index we have used in every example so far, even though rowguid has its own index. Since the expression is converted to Count(*), the optimizer simply takes the best path for Count(*), whatever column you name:
|--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1003],0)))
|--Stream Aggregate(DEFINE:([Expr1003]=Count(*)))
|--Index Scan(
OBJECT:([AdventureWorks2025].[Demo].[SalesOrderDetail].
[IX_SalesOrderDetail_ProductID]))
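To convince yourself that the optimizer’s choice matters, you can force the wider guid index with a table hint; a sketch (expect more logical reads, since a uniqueidentifier is 16 bytes versus the 4-byte ProductID):

```sql
--Forcing the guid index; STATISTICS IO should report noticeably more
--logical reads than the ProductID index scan
SELECT COUNT(rowguid)
FROM Demo.SalesOrderDetail WITH (INDEX (AK_SalesOrderDetail_rowguid));
```

This is for experimentation only; in production code, index hints like this tend to age badly as data and indexes change.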
COUNT(Column that Allows NULL values)
And now for something (somewhat) completely different: a query that optimizes differently because the column or expression allows NULL values. Initially, for this demo, the column I will use has no index, so the query will need to scan some structure:
SELECT COUNT([CarrierTrackingNumber]) FROM Demo.SalesOrderDetail;
The output starts with a warning message when there are NULL values and you have ANSI_WARNINGS turned on, which it typically is by default:
Warning: Null value is eliminated by an aggregate or other SET operation.
And then you get the plan, where you can see that it scans the base structure (the clustered index here; on a heap it would scan the heap, which I won’t include as an example):
|--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1003],0)))
|--Stream Aggregate(DEFINE:([Expr1003]=
COUNT([AdventureWorks2022].[Demo].[SalesOrderDetail].[CarrierTrackingNumber])))
|--Clustered Index Scan(OBJECT:([AdventureWorks2022].[Demo].[SalesOrderDetail].
[PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]))
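This is also the practical difference between the two forms: COUNT(column) skips NULLs while COUNT(*) does not, so you can count the NULLs directly; a sketch:

```sql
--Total rows, rows with a tracking number, and the NULLs in between;
--this query also triggers the same ANSI_WARNINGS message
SELECT COUNT(*) AS TotalRows,
       COUNT(CarrierTrackingNumber) AS NonNullRows,
       COUNT(*) - COUNT(CarrierTrackingNumber) AS NullRows
FROM Demo.SalesOrderDetail;
```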
When it has to actually check whether the expression is NULL, it can only use an index that contains the column in question, if one exists. For example, add an index to this column:
CREATE INDEX CarrierTrackingNumber ON Demo.SalesOrderDetail (CarrierTrackingNumber);
Then execute the following:
SELECT COUNT([CarrierTrackingNumber]) FROM Demo.SalesOrderDetail;
You get the following plan (and the same warning), and you can see it does use the index I just created:
|--Compute Scalar(DEFINE:([Expr1002]=CONVERT_IMPLICIT(int,[Expr1003],0)))
|--Stream Aggregate(DEFINE:([Expr1003]=COUNT(
[AdventureWorks2022].[Demo].[SalesOrderDetail].[CarrierTrackingNumber])))
|--Index Scan(
OBJECT:([AdventureWorks2022].[Demo].[SalesOrderDetail].
[CarrierTrackingNumber]))
The logical reads are elevated:
Table 'SalesOrderDetail'. Scan count 1, logical reads 438, physical reads 0
But this is to be expected, because the average length of the data is 12 characters (24 bytes, since nvarchar stores two bytes per character) for each row that has a value. This is more than the 4 bytes per integer, and because it is a variable-length column with some NULL values, you can’t simply use (Number of Rows * Datatype Size) to estimate how much space the character data takes up.
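You can verify the average stored size yourself; a sketch using DATALENGTH (which returns bytes, two per character for nvarchar) and LEN (which returns characters):

```sql
--Average bytes and characters per non-NULL CarrierTrackingNumber value;
--NULLs are skipped by AVG, with the usual ANSI_WARNINGS message
SELECT AVG(DATALENGTH(CarrierTrackingNumber)) AS AvgBytes,
       AVG(LEN(CarrierTrackingNumber)) AS AvgChars
FROM Demo.SalesOrderDetail;
```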
Summary
In this article, I showed a few important things you should realize about the query optimizer. It will do the best thing it can to process your query as fast as possible, even if the plan no longer looks like the query you executed.
In the case of counting the number of rows in a table, unless what is in the COUNT() expression could be NULL, it always optimized to Count(*), which means that COUNT(1) and COUNT(*) are equivalent as far as SQL Server is concerned.
Appendix
This creates a schema and a table that looks like the SalesOrderDetail table in AdventureWorks (I am using the new 2025 version). The only difference is that I changed the PK to use the one identity column instead of two columns, so the table starts out a bit more like other typical tables. (It will be changed in some of the example code.)
CREATE SCHEMA Demo;
GO
DROP TABLE IF EXISTS Demo.SalesOrderDetail;
SELECT SalesOrderID,
SalesOrderDetailID,
CarrierTrackingNumber,
OrderQty,
ProductID,
SpecialOfferID,
UnitPrice,
UnitPriceDiscount,
rowguid,
ModifiedDate
INTO Demo.SalesOrderDetail
FROM Sales.SalesOrderDetail;
GO
ALTER TABLE Demo.[SalesOrderDetail]
ADD
[LineTotal] AS (isnull(([UnitPrice]*
((1.0)-[UnitPriceDiscount]))*[OrderQty],(0.0)));
--Different PK, since this is actually an identity column
ALTER TABLE Demo.[SalesOrderDetail]
ADD CONSTRAINT [PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID]
PRIMARY KEY CLUSTERED
(
[SalesOrderDetailID] ASC
);
CREATE UNIQUE NONCLUSTERED INDEX
[AK_SalesOrderDetail_rowguid] ON [Demo].[SalesOrderDetail]
(
[rowguid] ASC
);
CREATE NONCLUSTERED INDEX
[IX_SalesOrderDetail_ProductID] ON [Demo].[SalesOrderDetail]
(
[ProductID] ASC
);



