I was about to walk out the door to take a flight, when my phone rang. Our major software system we had just released last week was returning weird data. FirstName and LastName was being reversed. I missed my flight because someone wrote SELECT * instead of SELECT FirstName, LastName and a table structure was reorganized.
Ok, the story was a fabrication, but I wanted to start out with a story that could resonate with the reader. In this post, I want to say a few things about the use of SELECT * and I wanted to make sure it was clear that I am not encouraging more use of SELECT *. Not at all.
Well, okay, I do prefer SELECT * over SELECT 1 in a subquery. Same performance, easier to test. But that is all.I have changed my mind on this and have documented it here: Ok, SELECT * doesn’t belong in EXISTS either
But the real reasons you should get in the habit of never using it for reusable code are more subtle and about future-proofing your code than any immediate performance value.
Why is this post not only about SELECT *? It is mostly about one of the biggest problems with technical writing (and many other types of rhetoric, but I am not going to tackle the world today).
The reason I see this habit starting in the first place is that it is faster to type and mostly works fine. It doesn’t seem like a problem, so why take the time to even use a tool to create more characters to be compiled? Then comes the rhetoric.
“SELECT * is one of the worst habits in coding SQL, it is a crime against data!”
SELECT * is not a crime against data
There are far far worse things you can do in a relational database than execute a SELECT statement using the wildcard symbol * to mean all columns in the results. Far far worse. We all do it, and we should. If I had typed out every column from every SELECT statement I have ever written as I looked around databases and tested queries, I would have gotten about half of the work done in my life.
This isn’t to say it is a good practice to use in your compiled code, but the messaging around SELECT * elevates it to this level of incorrectness that seems catastrophic and something you shouldn’t do in your ad hoc queries. To do so even once can feel like you are somehow less of a SQL programmer when you type SELECT * from TableName just one time..
The problem is that the overzealous approach includes too many explanations of why it is not a good practice that don’t resonate because they can be easily debunked in the short term.
Overly debatable reasons
Here are a few of the reasons that are just way too easy to debunk for someone who disagrees (and you can’t respond to them. It isn’t like when you are talking to someone and can explain.
It is slower
As a headline, fine, but you can just say this (and so many do):
“SELECT * is slower than using a column list.”
But it isn’t true as stated. It CAN be true, but if someone needs to learn this lesson, they need to know when it is true, and when it isn’t.
If you need all the columns
Then it is wrong. If you need all the columns, it is the same. For example, if your table has these columns and the thousands of LargeValue column values in your table are all 1 GB each:
[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL,
LargeValue varchar(max) NOT NULL
);
[/code]
Whether you execute:
[code lang=”SQL”]
SELECT *
FROM Test;
[/code]
Or:
[code lang=”SQL”]
SELECT TestId, Value, LargeValue
FROM Test;
[/code]
It will take the same amount of processing, network usage, CPU, whatever.
If you don’t need all of the columns
This is where it changes, but even this can cause people to think “who cares?” because if you have no tables that have large columns, it won’t be noticeable. For example, say your table is this:
[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL,
Value2 varchar(10) NULL
);
[/code]
What will be the difference between:
[code lang=”SQL”]
SELECT *
FROM Test;
[/code]
And
[code lang=”SQL”]
SELECT TestId, Value
FROM Test;
[/code]
The latter will be better, but again, not be even somewhat noticeable in any typical situation. So now your reader is like, “whatever,” and moves on.
It is fragile
Now you are hitting on something that should matter to people. But if you have led with the slower argument (using an enormous column as your example), you probably have lost people.
Fragility IS something that people should care about, but even then, if it isn’t slower (but it IS slower to type, which matters to more people than you would think). Fragility levels are interesting because a lot depends on how the tools fetch data.
By name, things are less fragile because: SELECT TestId, Value will be no different than SELECT Value, TestId, SELECT Value, Value2, TestId, or SELECT *.
Positionally is far messier (and more likely to be a mistake they learn the hard way. Because even SELECT Value, TestId and SELECT Value, TestId are going to cause you issues.
I will note that there have been many times in my actual, non-fictional, non-exaggerated life when adding a new column broke code. They expected 3 columns, someone (or some process like replication added a new column, and thing broke and had to be removed.
And things like having:
[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL
);
[/code]
Change to:
[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL,
Notes nvarchar(max) NULL
);
[/code]
Have happened too. And when you first add that column, no big deal. The column wasn’t noticed by some code, but then, for some reason, everything slowed down.. Because data was being added in the notes column… to a list… and not being shown.
The moral of the story is
SELECT * isn’t a crime against data, it is a crime against the stability of your code.
SELECT * in your compiled code is bad for many reasons. It is important to make that clear to anyone who is writing SQL code. But you have to make it clear that you are playing a long game. The problems you are trying to avoid are not usually immediate, especially if you are actually testing your code.
And teaching that is, is a crime, if you don’t really hit hard the most realistic, reasonable problems people will hit.
This goes for everything you teach. Start with the most practical reasons. In my post last week on generating a sequence of numbers (which has a follow-up coming!), I tested using what I had heard was a terrible method for generating such numbers, “A recursive CTE”. Was it the worst method? Yes. Could it generate far more numbers than most of us need (100000) in under a second, also yes.
So if my only reason to not use a feature is that it is terrible, but it works for me, you need to give practical examples that your audience will believe. For SELECT *, it is that it will break your code, so if you want to save the code for later use, get rid of the SELECT *. But it harms no one to write SELECT * FROM Table and look at the output.



Leave a Reply