SQL Anti Pattern Confessions #1

SQL Anti Patterns, by Bill Karwin, is a book I wish I’d bought years ago when I really needed it. In these posts I’m briefly going to name the pattern, confess if I’ve been guilty of it in the past and communicate the suffering I’ve endured because of it. If you want solutions for these crimes against normalization I strongly suggest you pick up a copy of Mr. Karwin’s book.

Some days you are sitting around a conference room table, trying to solve a problem with you colleagues and someone says, “You know what we can do…”, and two years later you find yourself coding around that “quick and easy solution” that has plagued you six months after it went into production. Learn the lessons of this book well, it will not only make your queries run faster, but your applications will be flexible and provide you with room to grow and maneuver.

Chapter 2: Jaywalking

Plead: Guilty

Anytime you think it’s OK to store a list or comma separated values in a single field, STOP! This is not OK! You will pay severely!

Many developers confuse fields with columns when designing their database. They might think, “I’ll store the available colors for my umbrellas in field called available_colors and I’ll separate them with a comma.”

create table umbrellas
    umbrealla_id int not null auto_increment primary key
    available_colors text default '' 

| id   | available_colors |
|------|------------------|
|123456|red,blue,green    |
|123457|wine,navy,lime    |

You think, when it comes time to list the available colors I’ll just split the field on the comma and have my colors! While you have simplified your SQL, you have limited the number or colors you can offer. When someone asks what is the maximum number of colors you can store, you don’t know! The available_colors column is not a list of values but an arbitrary string! You can fit many colors with short names but not as many long-named ones.

Besides accounting for size, that column now holds a unique value for that row. If I wanted to discontinue a color, instead of deleting one color from a join table, I’d have to examine and update every row in my umbrella table, hoping I pluck out the correct sub-string without trimming or concatenating any of the other values in the column.

In the example above I’m using English names of colors, what if I wanted to localize my catalog for Spanish users? If I wanted to display “roja,azure,verde” I’d have to translate each sub-string after it was fetched!

Storing multi-value strings in a single column saves you from simple joins but forces you to write complicated queries that tie you in knots whenever you need to select, filter or aggregate data based on such values.

People who know how to properly design a database will assume that you do not when they see you abuse this cardinal rule without a good reason. There are limits to how many columns you can have in a row. There are byte-size limits on how much data you can cram into a single row. If you have a column that represents a list, better to make it a separate table. Indexing the value column can help you find particular values faster. After all a table is a list of rows.

Also, you are going to duplicate a lot of data values. When you store a list of values as a string you are essentially transforming information into an anonymous blob. Instead of your database organizing and managing your data, you’re transferring that responsibility to your application. Your business logic must now fetch, parse and provide context for your values.

Finally, despite your best estimate, someday, someone is going to find your BIG text field of multiple values just a bit too small for all their need. For the sake of a single “cell”, you’ll be required to alter the whole table to make it big enough. This not only wastes space, it could incur downtime while the table is copied into its new schema.

REMEMBER: One column holds one and ONLY one value at a time! Lists should never be stored as values but expressed as a relationship!