This post is not only about SELECT *

I was about to walk out the door to take a flight, when my phone rang. Our major software system we had just released last week was returning weird data. FirstName and LastName was being reversed. I missed my flight because someone wrote SELECT * instead of SELECT FirstName, LastName and a table structure was reorganized.

Ok, the story was a fabrication, but I wanted to start out with a story that could resonate with the reader. In this post, I want to say a few things about the use of SELECT * and I wanted to make sure it was clear that I am not encouraging more use of SELECT *. Not at all.

Well, okay, I do prefer SELECT * over SELECT 1 in a subquery. Same performance, easier to test. But that is all. I have changed my mind on this and have documented it here: Ok, SELECT * doesn’t belong in EXISTS either

But the real reasons you should get in the habit of never using it for reusable code are more subtle and about future-proofing your code than any immediate performance value.

Why is this post not only about SELECT *? It is mostly about one of the biggest problems with technical writing (and many other types of rhetoric, but I am not going to tackle the world today).

The reason I see this habit starting in the first place is that it is faster to type and mostly works fine. It doesn’t seem like a problem, so why take the time to even use a tool to create more characters to be compiled? Then comes the rhetoric.

“SELECT * is one of the worst habits in coding SQL, it is a crime against data!”

SELECT * is not a crime against data

There are far far worse things you can do in a relational database than execute a SELECT statement using the wildcard symbol * to mean all columns in the results. Far far worse. We all do it, and we should. If I had typed out every column from every SELECT statement I have ever written as I looked around databases and tested queries, I would have gotten about half of the work done in my life.

This isn’t to say it is a good practice to use in your compiled code, but the messaging around SELECT * elevates it to this level of incorrectness that seems catastrophic and something you shouldn’t do in your ad hoc queries. To do so even once can feel like you are somehow less of a SQL programmer when you type SELECT * from TableName just one time..

The problem is that the overzealous approach includes too many explanations of why it is not a good practice that don’t resonate because they can be easily debunked in the short term.

Overly debatable reasons

Here are a few of the reasons that are just way too easy to debunk for someone who disagrees (and you can’t respond to them. It isn’t like when you are talking to someone and can explain.

It is slower

As a headline, fine, but you can just say this (and so many do):

“SELECT * is slower than using a column list.”

But it isn’t true as stated. It CAN be true, but if someone needs to learn this lesson, they need to know when it is true, and when it isn’t.

If you need all the columns

Then it is wrong. If you need all the columns, it is the same. For example, if your table has these columns and the thousands of LargeValue column values in your table are all 1 GB each:

[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL,
LargeValue varchar(max) NOT NULL
);
[/code]

Whether you execute:

[code lang=”SQL”]
SELECT *
FROM Test;
[/code]

Or:

[code lang=”SQL”]
SELECT TestId, Value, LargeValue

FROM Test;
[/code]

It will take the same amount of processing, network usage, CPU, whatever.

If you don’t need all of the columns

This is where it changes, but even this can cause people to think “who cares?” because if you have no tables that have large columns, it won’t be noticeable. For example, say your table is this:

[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL,
Value2 varchar(10) NULL
);
[/code]

What will be the difference between:

[code lang=”SQL”]
SELECT *
FROM Test;
[/code]

And

[code lang=”SQL”]
SELECT TestId, Value
FROM Test;
[/code]

The latter will be better, but again, not be even somewhat noticeable in any typical situation. So now your reader is like, “whatever,” and moves on.

It is fragile

Now you are hitting on something that should matter to people. But if you have led with the slower argument (using an enormous column as your example), you probably have lost people.

Fragility IS something that people should care about, but even then, if it isn’t slower (but it IS slower to type, which matters to more people than you would think). Fragility levels are interesting because a lot depends on how the tools fetch data.

By name, things are less fragile because: SELECT TestId, Value will be no different than SELECT Value, TestId, SELECT Value, Value2, TestId, or SELECT *.

Positionally is far messier (and more likely to be a mistake they learn the hard way. Because even SELECT Value, TestId and SELECT Value, TestId are going to cause you issues.

I will note that there have been many times in my actual, non-fictional, non-exaggerated life when adding a new column broke code. They expected 3 columns, someone (or some process like replication added a new column, and thing broke and had to be removed.

And things like having:

[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL
);
[/code]

Change to:

[code lang=”SQL”]
CREATE TABLE Test
(
TestId int PRIMARY KEY,
Value varchar(20) NOT NULL,
Notes nvarchar(max) NULL
);
[/code]

Have happened too. And when you first add that column, no big deal. The column wasn’t noticed by some code, but then, for some reason, everything slowed down.. Because data was being added in the notes column… to a list… and not being shown.

The moral of the story is

SELECT * isn’t a crime against data, it is a crime against the stability of your code.

SELECT * in your compiled code is bad for many reasons. It is important to make that clear to anyone who is writing SQL code. But you have to make it clear that you are playing a long game. The problems you are trying to avoid are not usually immediate, especially if you are actually testing your code.

And teaching that is, is a crime, if you don’t really hit hard the most realistic, reasonable problems people will hit.

This goes for everything you teach. Start with the most practical reasons. In my post last week on generating a sequence of numbers (which has a follow-up coming!), I tested using what I had heard was a terrible method for generating such numbers, “A recursive CTE”. Was it the worst method? Yes. Could it generate far more numbers than most of us need (100000) in under a second, also yes.

So if my only reason to not use a feature is that it is terrible, but it works for me, you need to give practical examples that your audience will believe. For SELECT *, it is that it will break your code, so if you want to save the code for later use, get rid of the SELECT *. But it harms no one to write SELECT * FROM Table and look at the output.

10 responses to “This post is not only about SELECT *”

  1. […] Louis Davidson defends a slighted bit of syntax: […]

  2. Lauri Avatar
    Lauri

    Stopped reading at “Well, okay, I do prefer SELECT * over SELECT 1 in a subquery”

    1. Louis Davidson Avatar

      But why? Share, give your opinion, convince me.

      You realize that this query:

      SELECT c.CustomerID, p.FirstName, p.LastName, ea.EmailAddress
      FROM Sales.Customer c
      JOIN Person.Person p
      ON c.PersonID = p.BusinessEntityID
      JOIN Person.EmailAddress ea
      ON p.BusinessEntityID = ea.BusinessEntityID
      WHERE NOT EXISTS ( SELECT 1
      FROM Sales.SalesOrderHeader o
      WHERE o.CustomerID = c.CustomerID );

      has the EXACT same plan whether it is SELECT 1 or SELECT *. That construct is a semi-join either way.

      I do see the point that Kevin Feasel has here: Thoughts on SELECT * – Curated SQL, that it is harder to find bad SELECT * usage if you use it here. So I can see some points of view. My opinion is not that strong on the subject, but I would love to know your thinking. (Or I will use it to write another editorial :))

      1. Liver Avatar
        Liver

        he must ask the IA to answer you

    2. Brad Schulz Avatar

      Hi Lauri (and Louis)…

      You may want to read this, which I wrote (YIKES!) 17 years ago:

      https://bradsruminations.blogspot.com/2009/09/age-old-select-vs-select-1-debate.html

  3. samot-dwarf Avatar
    samot-dwarf

    Hard to track down, but we once had a view that used SELECT * instead of a column list.

    Lets say the table had
    col1, col2, col3, col4 (values in the row were 1, 2, 3, 4)

    and col2 was dropped and a col5 with value = 5 was added.

    Suddenly the view returned

    1 as col1, 3 as col2, 4 as col3 and 5 as col4
    instead of
    1 as col1, 3 as col3, 4 as col4 and 5 as col5

    reason: a view will not be automatical recompiled when you change the underlying tables and returns the old column names but uses the new values (as long the datatype matches).

    It caused several problems just imagine what happens when it returns the product_id in the ordered_amount column and you are suddenly sending out 328567 items of a product (or try to print this amount of copies)

    1. Louis Davidson Avatar

      This is 100% the story we need to remind people. Your failures may take years to show up, but when they do…oh boy

  4. ugrok xugl Avatar
    ugrok xugl

    From a support perspective, given a table with over 50 columns and 15 billion rows, telling a customer to SELECT * FROM Messages is not going to go down well.

    Much better to put down the data shotgun and get out the sniper rifle,; include a WHERE clause and a list of columns to target the relevant data to solve their problem.

    1. Louis Davidson Avatar

      True. And my support query process depending on my knowledge of a table/query is more SET TRANSACTION ISOLATION READ UNCOMMITTED (or SNAPSHOT) SEELCT TOP 10 from tableOrQuery, scan the column names, the data I see, and then write queries using more interesting filters and column lists.

      Obviously this depends on the server/criticality of the server/db I am working on

  5. […] few weeks ago, I wrote this post “This post is not only about SELECT *”. It was primarily about the various reasons to avoid using SELECT * in your reusable code, but it […]

Leave a Reply

I’m Louis

I have been at this database thing for a very long time, with no plans to stop.

Series: SQL Techniques You Should Know

Recents

Discover more from Drsql's Database Musings

Subscribe now to keep reading and get access to the full archive.

Continue reading