Define the question before writing the query

I am writing a presentation to do a couple of times this year on SQL techniques you ought to know. In the abstract, I defined the story as basically, you get an urgent request for a new report that does some aggregation and I am going to give you some techniques to help you out.

As I iterated through ideas (like I will definitely be covering a few ways to group data/use window functions), I realized that a number of problems we have with SQL come down to the fact that querying a database is just like the one type of problem we all hated in school math class.

O no, word problems

At the risk of sounding like an over-dramatic AI bot, oh those dastardly destructive to our minds word problems. Just knowing how to add, multiply, subtract, and divide was not enough. No, we had to decide what to use those operations on from a word salad of details that maybe or maybe not mattered (especially as you progressed into higher grade levels/university.)

Sure, it prepared us for the real world to have trick problems (not nearly as tricky as real life, of course). But digging through details of if we add the apples and oranges together, or not? Because just because you can’t compare them (also a big lie for another time), even though you can add them in the context of being fruit (for example), but not to decide if you have the seeds from 20 pieces of fruit and 33.3% are oranges and each orange produces 3 seeds, and the rest are apples and 1/2 of the apples produce 2 seeds, and the other ones 4. So, if you have that much fruit, how many trees can you plant from 1 of the oranges?

There’s the catch

So many details and so few of the details make a bit of difference. The number of apples or oranges, meaningless. The fact that you have 20 pieces of fruit? Nope. Not even the fact that splitting 20 pieces of fruit in 1/3 is impossible without a knife makes one bit of difference.

All we need are two pieces of information.

Each orange produces 3 seeds.

Yet, as you read that, you know you dreaded the questions that could be asked. You may have started taking notes. You may have even started the math needed. In school you probably would had argued strenuously about the fractional oranges. All for naught while your teacher had to fight back laughing at all of the completely unnecessary questions.

Oh, and how half of the students were finished in 30 seconds, and the others were 10 minutes in before they realized it, and then felt sad when some students completely missed the point. (And then their parents called in and complained that their child was sad they missed the question and how dare you make a question like this?)

When doing a task, especially answering a question, be sure that you really know the question. Then, just like the optimizer will do with your query, find the simplest way to ask and answer the question asked.

One big difference

While queries are just word problems with slightly more complex answers, there is one major difference. Word problems were meticulously created to have a specific answer. Someone else has solved it and knows the answer they want. Sure, they can be wrong occasionally, but most of the time, there is one, findable answer.

Word problems based on math can be complex enough, but is anything as complex as navigating the data of a 20-year-old computer system that has seen 2 mergers and 6 sell-offs all with no governance other than “do what we need to get it done.“

Now, ideally, you have implemented a data warehouse for your data, and it is all cleaned and neat as straightforward. But that bowl of fruit in the word problem looks straightforward too.

Map the question to the data and off you go

Once you understand the question, then you map the question to the data structures you have. Sometimes it is simple, sometimes it is hard because terms are sort of vague. This is where you clarify the details so you get the questions that the customer really wants:

  • “When you said fruit, what if the fruit is rotten?”
  • “Are there different kinds of oranges?”
  • “Is a nectarine an orange?”

Silly-sounding questions about word problems, but your database users may say things like “customers” and mean “customers who have purchased something in the past year and paid for it” and you need to exclude customers who also shoplifted for this specific report.

Over time, these questions become second nature to ask and then eventually to get bak in the specs. But as hard as it is to say, “sorry boss, I need these questions answered before I can write that urgent report,” it is also bad to say we had far too many or far too few sales last quarter because you didn’t ask the question the customer expected.

Summary

If you ask the wrong question, you get the wrong answer. Coding is fun, but you can easily get bogged down in unneeded details and end up missing the needed ones if you aren’t careful.

Leave a Reply

I’m Louis

I have been at this database thing for a very long time, with no plans to stop.

Series: SQL Techniques You Should Know

Recents

Discover more from Drsql's Database Musings

Subscribe now to keep reading and get access to the full archive.

Continue reading