Sparse Columns

The Developers did what? Why on Earth would you ever put a sparse column on a bit field?  I mean, seriously, who DOES that?

Sparse columns are exactly what they sound like.  They are columns which are anticipated to have a large percentage of NULL values and because of this, you can opt not to store the NULLS.  When a record does have a value for that column, it is tacked onto the end of the record with a piece of metadata that describes what to do with the value when a query touches that record.  On a basic level, this is somewhat similar to how data is stored in a JSON “document” in non-relational databases (if a record does not have a value for a particular attribute, that attribute is simply skipped).

The extra piece of metadata creates  some overhead, so there are suggestions for when to use/not use a sparse column, depending upon the data type in question.  For a uniqueidentifier, the break even point is around 43% NULL, for an int, it is closer to 64% and for a bit field it is 98%.  Imagine my surprise then when I found that one of the new .Net developers had create a table where nearly every column was sparse (other than the primary key, business key, and columns for when the records was created and by whom).

Eager to correct this before it was promoted to production, I went on a quest to Appdevlandia to find and educate the responsible party.  Once he had been identified, I approached cautiously and set the trap asked him to explain what problem he’d been trying to solve when he created that table and did he have a specific reason for designating bit fields as sparse columns.  He described the task that had been set before him and asked if he’d done something wrong.

It turns out that he was designing a process that would transmit data to a second system.  For the initial transaction, the entire record would be sent, but for subsequent transactions, he would only send the primary and business keys and any columns that had been changed (and thus most of the columns in the log table—including the bit field—would be NULL most of the time).  I explained what my concern had been and thanked him for the lesson.  Deep in thought, I wandered back to share with the rest of the Data Clan what I had just learned.

#sparse-columns, #sql-server