Using TRANSLATE in SQL Server instead of REPLACE

So the other day I was doing some data cleanup tasks, trying to format various values in a first name column. Needless to say, not everyone takes the task of giving a first name as sensically as you might imagine. And sense you generally don’t want to stop anyone from having any name they want. I mean, what kid doesn’t remember when I first saw the Peanuts comic strip with the character 555 95472} or for a more recent reference, XKCD’s Little Bobby Tables.

The data I am working with sometimes has people with multiple parts to their name (Mary Jo, Cindy Lou) etc, or sometimes Fred/Joe, Mary and Jack, Mary & Jack, or what have you. My goal was to turn these names into little delimited lists that I could parse on a space character with STRING_SPLIT and there were a “few” of these cases. This was the code I had arrived at when I reached the “good enough” stage of my coding.

REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(FirstName,'.',' '),'&',''),'-',''),'''',''),'`',''),'_','')
,'\',' '),'(',' '),')',' '),' and ',' '),',',''),'/',' ')
,'|',' '),':',''),'"',''),'’',''),'“',''),'”',''),';',''),'.','') AS FirstName

This was only an experiment to see if I could match two names together better if one time they said Mary, one time Mary Lou, etc. I will probably write a whole lot more on this later, but for now, the story gets good when I went onto X and was grousing about not yet having Regular Expressions in SQL Server. To which the awesome Erik Darling replied:

Translate? TRANSLATE, must be something new in SQL Server 202.. nope. 2017. So I decided to give it a try and of course, write a blog about it for future reference.

What is TRANSLATE

It is a shorthand for a REPLACE expression that replaces multiple, single character expressions. Not only does it differ in this way, as the name implies, you are translating one character to another, which also says you can’t remove a character with it directly. (You could translate one character to a character that you want to replace with nothing and then do a final REPLACE.)

That single character is not a big problem as I really only needed to clean up one common phrase. In this next example, I will keep it simpler and just do a few characters, but you can see how easy it would be to replace a rather large number of characters.

To the code

In the Appendix, there is a temp table that you can use to get the following set of data:

SELECT *
FROM   #Names;

These are the rows:

NameValue
------------------------------
Mary/Jo
Mary-Jo
Mary and Jo
Mary&Jo
Mary,Jo
“Mary”“Jo”
Mary_Jo
Mary(Jo)
Mary+Jo

All of these should translate to Mary, some number of spaces, and then the value Jo. I could clean up the spaces, but it would require a loop with replace (or REGEXP_REPLACE(NameValue, '\s+', ' ')).

In order to use REPLACE to do this, you would need eleven nested REPLACE calls. Like this:

SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
NameValue
,'/',' '),'-',' '),' and ',' '),'&',' '),',',' ')
,'“',' '),'”',' '),'_',' '),')',' '),'(',' '),'+',' ')
FROM #Names;

The output of this query is now:

--------------
Mary Jo
Mary Jo
Mary Jo
Mary Jo
Mary Jo
 Mary  Jo 
Mary Jo
Mary Jo 
Mary Jo

The smart quoted values have extra spaces, so you could use TRIM and the REGEXP_REPLACE call mentioned earlier to clean that up very quickly.

We can change that crazy set of nested REPLACE function calls to TRANSLATE for all except one. The ‘ and ‘ one. We will need to use REPLACE for that.

SELECT TRANSLATE(NameValue,'/-&;,“”_,+()','            ')
FROM   #Names;

Now each character is translated to a space and the output is exactly the same (except for the and):

-------------
Mary Jo
Mary Jo
Mary and Jo
Mary Jo
Mary Jo
Mary  Jo
Mary Jo
Mary Jo
Mary Jo

And finish it off with three nested calls to clean up the spaces and the ‘and’:

SELECT DISTINCT REGEXP_REPLACE(
           TRIM(REPLACE(TRANSLATE(
                NameValue,
                '/-&,“”_,+()','           '),' and ',' ')),' +',' ')
FROM   #Names;

Just one row with:

---------
Mary Jo

Nice, and the worst part of using nested REPLACE calls (the tedium of getting all the nesting j u s t right… is now a snap. If you don’t get the number of characters straight in your call, like this:

--no translate to string included:
SELECT TRANSLATE(NameValue,'/-&,“”_,+()','')
FROM   #Names;

The sad trombone plays and you get to make sure you get that correct. It is not a syntax error like when you don’t quite get the REPLACE calls correct which can make debugging more complex.

Msg 9828, Level 16, State 3, Line 72 The second and third arguments of the TRANSLATE built-in function must contain an equal number of characters.


Note of course, that it could get tricky to get that translation right if you were using a very long string. For example, say you had the following:

SELECT TRANSLATE(NameValue,'abcdefghijklmnopqrstuvwxy',
                           'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
FROM   #Names;

On the first string I left out z. When I was writing the example, I did accidentally add an extra O, and that did take me a bit to fix, even when I could spot the issue by singing the alphabet song.

As you might expect, you can also pass the criteria in as variables:

DECLARE @from varchar(100) = 'abcdefghijklmnopqrstuvwxyz',
        @to   varchar(100) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

SELECT TRANSLATE(NameValue,@from,@to)
FROM   #Names;

The output from here is:

-------------
MARY/JO
MARY-JO
MARY AND JO
MARY&JO
MARY,JO
“MARY”“JO”
MARY_JO
MARY(JO)
MARY+JO

As you can see they are now uppercase.

A bit of fun

The name of this function is TRANSLATE and as such you could use a simple substitution cipher to encode messages to your friends if you are 13 and happen to have a SQL server along with you 🙂

DECLARE @from varchar(100) = 'abcdefghijklmnopqrstuvwxyz',
        @to   varchar(100) = 'BCDEFGHIJKLMNOPQRSTUVWXYZA'
        --all letters shifted one place for ease reading:

SELECT TRANSLATE('it was a dark and stormy night',@from,@to);

This returns the mysterious:

---------------------------------
JU XBT B EBSL BOE TUPSNZ OJHIU

But just swap around the from and to:

DECLARE @from varchar(100) = 'abcdefghijklmnopqrstuvwxyz',
        @to   varchar(100) = 'BCDEFGHIJKLMNOPQRSTUVWXYZA' 
        --all letters shifted one place:

SELECT TRANSLATE('JU XBT B EBSL BOE TUPSNZ OJHIU',@to,@from);

And you will find that it is a dark and stormy night.

-------------------------------
it was a dark and stormy night

Summary

You say complaining never does anything, but if I hadn’t taken a few minutes to go grouse on social media, I would never have learned of this function. This will save me a ton of time on these kinds of issues because writing nested REPLACE calls is a nightmare.

Honestly, I am not sure that even REGEXP_REPLACE would be as easy to work with as this in most cases.

Thank you Erik Darling for the inspiration!

Appendix

Initialization:

DROP TABLE IF EXISTS #Names;

CREATE TABLE #Names
(
NameValue varchar(30) NOT NULL
);

INSERT INTO #Names
VALUES ('Mary/Jo'),('Mary-Jo'),('Mary and Jo'),
('Mary&Jo'),('Mary,Jo'),('“Mary”“Jo”'),
('Mary_Jo'),('Mary(Jo)'),('Mary+Jo');

3 responses to “Using TRANSLATE in SQL Server instead of REPLACE”

  1. aaronbertrand Avatar
    aaronbertrand

    Nice. I had an early draft going for my own post showing the evolution from:

    20 nested REPLACE calls

    a mix of TRANSLATE (for single-character swaps) and REPLACE() and TRANSLATE()

    the much more flexible REGEXP_REPLACE().

    This variation isn’t much easier to read (like every RegEx I’ve ever met), but does avoid falling back to REPLACE and TRANSLATE.SELECT DISTINCT TRIM ( REGEXP_REPLACE ( NameValue,      N’s*(bandb|[^p{L}]+)s*’,      N’ ‘ ) ) AS NormalizedNameFROM #Names;

    Liked by 2 people

    1. Louis Davidson Avatar

      Got this one (thanks for keeping trying, sometimes I get tired of WordPress and trying to reply). Yeah, regular expressions still make my brain hurt, but so much nicer than what we have previously at times.

      Like

  2. […] Louis Davidson is lost in translation: […]

    Like

Leave a reply to aaronbertrand Cancel reply

I’m Louis

I have been at this database thing for a long long time, with no plans to stop.

This is my blog site, companion to drsql.org

Recents