I was reading LinkedIn posts the other day when I saw this blog about what was apparently an interview question about some forms of a COUNT aggregate function
This was apparently asked in an interview. What will each of these constructs do in a SQL statement:
COUNT(*) = ? COUNT(1) = ? COUNT(-1) = ? COUNT(column) = ? COUNT(NULL) = ? COUNT() = ?
The correct answers the poster gave were not the same ones that are correct for SQL Server, but I will also note that there is one of these that are never going to be a simple answer without knowing the data either.
This is similar to some questions I might ask in an interview, though I should note, I never care if people get the answers wrong as long as I see the reason why they get them wrong makes sense. I would not have guessed that one of these would fail in SQL Server, but it did. And I would have forgotten about a warning. Does that make me less of a SQL expert? No. There are a just a few of these where if you didn’t know what the output would be that you could lose your “SQL EXPERT” badge. And not remembering esoteric syntax is not usually involved
The sample data
For the sample data, I am going to create a couple of temp tables. One I will leave empty, one I will load with some simple data.
DROP TABLE IF EXISTS #testCountEmpty, #testCountLoaded;
CREATE TABLE #testCountEmpty
(
value int NULL
)
CREATE TABLE #testCountLoaded
(
value int NULL
)
GO
--load 1000 rows into the temp table,
INSERT INTO #testCountLoaded
SELECT CASE--even numbers are left NULL
WHEN value % 2 = 0 THEN NULL
ELSE value END AS value
FROM GENERATE_SERIES(1, 10000);
GO
Empty table
Take a look at the following 4 queries. If you want to play along as you might with an interview, go through and think abou twhat each COUNT aggregate function call will do. Then I will try to execute the code and see what happens:
DECLARE @nullvalue INT,
@notNullValue INT = 1;
SELECT COUNT(*) AS count_star,
COUNT(value) AS count_columnName,
COUNT(NULL) AS count_null
FROM #testCountEmpty;
SELECT COUNT(1) AS count_1,
COUNT(-1) AS count_minus_1,
COUNT(CASE WHEN value IS NULL THEN 1
ELSE NULL
END ) AS count_expression
FROM #testCountEmpty;
SELECT COUNT('') AS count_emptystring,
COUNT(@notNullValue) AS count_notNullVariable,
COUNT(@nullvalue) AS count_nullVariable
FROM #testCountEmpty;
SELECT COUNT() AS count_no_parameter,
COUNT(COUNT(*))
FROM #testCountEmpty;
This space was left blank to give you a chance to look at the code.
.
.
.
.
.
.
.
.
.
.
.
.
.
First off, there were several basic syntax type errors. So when I tried to run the code as is: COUNT() threw an error:
Msg 174, Level 15, State 1, Line 36 The count function requires 1 argument(s).
Removing that and running the it again, COUNT(NULL) threw a different error:
Msg 8117, Level 16, State 1, Line 29 Operand data type NULL is invalid for count operator
This is the one error that I did not expect. I would have said this would probably return 0. Why not a literal NULL? In the flavor of SQL the LinkedIn poster used, that did not throw an error. The weird differences in SQL dialects can also throw you if you use more than one even semi regularly.
Then COUNT(COUNT(*)) was also not a happy operation:
Msg 130, Level 15, State 1, Line 41 Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
Ok, reformatting this to only the ones that will work because I don’t pay myself by the line count or the word count despite the extra space I added earlier. (I pretty much pay myself in knowledge, though I have to suplement sometimes that to pay for hardware, software, and travelling to conferences sometimes!)
Of the things I put in my test queries, I did not originally realize that COUNT(NULL) would fail! I was 99.9% sure COUNT() AND COUNT(COUNT(*)) would, hence they were in the last statement by themselves.
DECLARE @NULLVALUE int,
@NOTNULLVALUE int = 1;
SELECT COUNT(*) AS count_star,
COUNT(Value) AS count_columnName
FROM #testCountEmpty;
SELECT COUNT(1) AS count_1,
COUNT(-1) AS count_minus_1,
COUNT(CASE WHEN value IS NULL THEN 1 ELSE NULL END)
AS count_expression
FROM #testCountEmpty;
SELECT COUNT('') AS count_emptystring,
COUNT(@NOTNULLVALUE) AS count_notNullVariable,
COUNT(@NULLVALUE) AS count_nullVariable
FROM #testCountEmpty;
This returns, completely unsurprisingly, all zeros now that we have removed all the errors:
count_star count_columnName ----------- ---------------- 0 0 count_1 count_minus_1 count_expression ----------- ------------- ---------------- 0 0 0 count_emptystring count_notNullVariable count_nullVariable ----------------- --------------------- ------------------ 0 0 0
For a bit of explanation:
- count_star (
COUNT(*))- the typical way we do things. This counts all the rows in the table, no matter their value. - count_columnName (
COUNT(value)) – this counts all non-null values in the column - count_1 and count_minus_1 (
COUNT(1)andCOUNT(-1)) – functionally equivalent toCOUNT(*). Like*in aSELECTclause, this does represent the entire row. But it does not care if any values areNULL. Same with the literal values of 1 and -1. - count_expresion (
COUNT(CASE WHEN value IS NULL THEN 1 ELSE NULL END)) – no rows, hence a 0 output. - count_emptystring (
COUNT('')) – Pretty much the equivalent ofCOUNT(*)andCOUNT(1)orCOUNT(-1). Any scalar value will return the number of rows in the set that comes from theFROMandWHEREclause
I am going to leave off more discussion of the COUNT(@Variable) for the next section except to say, COUNT(@NULLVALUE) does work, whereas the literal NULL expression did not. In any case, when there are 0 rows in the set you are counting, every legal COUNT aggregate function call will return 0.
Table with data
Now, the exact same query as before, but this time we have data. Still no group by, and really that shouldn’t change the outcome for you. Aggregate functions like COUNT work on groups of data, and the most basic is all rows in a single group.
These functions would just output data per group without this, something I might write about sometime later, but for now, just think of the table as one group and focus on what will happen for the rows in the table. Consider whether there might be other stuff output or not too?
I feel bad for anyone who would have to answer this perfectly to impress someone while in an interview setting. There are a few nuances that I would have not realized when I was doing this!
As a reminder, there are 10,000 rows in the table, and 5000 non-null Value column values.
DECLARE @NULLVALUE int,
@NOTNULLVALUE int = 1;
SELECT COUNT(*) AS count_star,
COUNT(Value) AS count_columnName
FROM #testCountLoaded;
SELECT COUNT(1) AS count_1,
COUNT(-1) AS count_minus_1,
COUNT(CASE WHEN value IS NULL THEN 1 ELSE NULL END)
AS count_expression
FROM #testCountLoaded;
SELECT COUNT('') AS count_emptystring,
COUNT(@NOTNULLVALUE) AS count_notNullVariable,
COUNT(@NULLVALUE) AS count_nullVariable
FROM #testCountLoaded;
Some blank space, so you can work out the answers if you are so inclined. (Maybe I should pay myself by the line?)
.
.
.
.
.
.
.
.
.
.
.
.
.
The output is:
count_star count_columnName ----------- ---------------- 10000 5000 Warning: Null value is eliminated by an aggregate or other SET operation. count_1 count_minus_1 count_expression ----------- ------------- ---------------- 10000 10000 5000 Warning: Null value is eliminated by an aggregate or other SET operation. count_emptystring count_notNullVariable count_nullVariable ----------------- --------------------- ------------------ 10000 10000 0 Warning: Null value is eliminated by an aggregate or other SET operation.
The first thing you may notice, is there is a warning output. Some of our output was influenced by there being NULL values in the output. This is an ANSI warning to tell you that your totals could be something you did not expect, because NULL is not counted in the output. This little message used to cause us havoc in SSIS years ago!
The real trick to getting this question perfect would be to know the state of the ANSI_WARNING setting. If the connection has this turned OFF, or if you turn it off with:
SET ANSI_WARNINGS OFF;
Then you would not get the warnings. It is a great warning to get when you are doing ad-hoc queries, but good to turn off when you know it is the case and you are expecting some NULL data, especially when embedding code into an application.
Ok, this time lets go back through the results:
- count_star (
COUNT(*)) – This aggregate returns the number of rows in the results. In this case 10000. - count_columnName (
COUNT(value)) – this counts all non-null values in the column, so in our case, since I loaded 5000 non-null values in the rows: 5000. This is where the first warning came from. - count_1 and count_minus_1 (
COUNT(1)andCOUNT(-1)) – functionally equivalent toCOUNT(*). Sometimes people use a literal because they have heard that * is bad, soCOUNT(*)should be worse thanCOUNT(1). It really makes no difference in the query optimization. - count_expresion (
COUNT(CASE WHEN value IS NULL THEN 1 ELSE NULL END)) – This one is interesting, but since the result of the expression just flipsNULLto 1 and aNOT NULLvalue toNULLso hence we get 5000. If you weren’t paying attention, you might have thought it was 10000, because most of the time an expression like this is something like:SUM(CASE WHEN ...THEN 1 ELSE 0 END),and a common mistake is to useCOUNTinstead ofSUM. (Thankfully I can edit my work so there isn’t a panel of interviewers staring at me!) - count_emptystring (
COUNT('')) – Just like the other expressions where the parameter is a scalar, it returns the same asCOUNT(*) - count_notNullVariable – (
COUNT(@NOTNULLVALUE)) – When the expression is not a literal, it countsNOT NULLvalues as 1, andNULLvalues as 0 (plus the warning). Hence, this returned 10,000. - count_nullVariable – (
COUNT(@NULLVALUE)) – Since this expression results inNULLfor every row, it will return 0 (and cause the warning)
Summary
One of the most interesting parts of SQL is debugging. As a declarative language, we don’t go row by row and evaluate expression in code (the query processor may, but we don’t write that code, at least ideally! Some constructs, like the ones I put in this blog, can make you scratch your head. This is especially true when you have columns, CASE expressons, function calls, and even variables. So it can take a bit of focus to make sure your code is right, when it seems to be behaving wrong. Sure it might be a bug that millions of other users haven’t seen (it actually does happen), but to be fair, it is probably something we coded wrong.




Leave a comment