How I Learned to Stop Worrying and Love NUL

After the merger of Woodgrove Bank & Finance with Contoso Bank, each Division had its own silo of information.  Each bank (and several of their prior acquisitions) had slightly different ways of doing things.  Often, the same information was represented differently (Client vs. ClientID vs. ClientNumber) and sometimes one bank would use pieces of data that the other one didn’t.  Month-end financial reporting for the newly combined behemoth was nearly impossible with all of the legacy systems.  “Let’s build a data warehouse from scratch,” they said.  “We’ll stage data from the different systems in different staging databases and standardize the data as it is fed into a central data warehouse.”

They gathered requirements and carefully considered different designs and kept meetings frequent but mercifully short.  Soon they had a data warehouse built on SQL Server 2014 AlwaysOn Availability Groups.  All of the servers were virtual machines in beefy new hosts that used 100% SSD storage.  It was an engineering marvel.  And then it went live.  The staging databases were ginormous and several of the daily feeds were over 150 GB.  The system was able to keep up with all of the data and all of the reporting demands, but the network share for database backups could not handle the sheer size of transaction log backups.  After adding a TB a day for several days, Systems was begging for some relief.

“The flat files from all of the daily feeds take up less space than the databases to begin with, not to mention compressing much, much better after data deduplication,” they said.  “Let’s just set the databases to Simple Recovery, take nightly full backups, and reimport the flat files if we ever need to restore.  Easy peasy lemon squeezy.”

While it still seems to me a valid requirement to have a highly available database that does not require point-in-time recovery, Microsoft disagrees.

Checklist: Requirements (Availability Databases)
https://msdn.microsoft.com/en-us/library/ff878487(v=sql.120).aspx#PrerequisitesForDbs

If you attempt to set an Availability database to the Simple recovery model, this is what you will see:
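A sketch of the attempt (borrowing the staging database name from the log backup command below):

ALTER DATABASE [ODS_Contoso] SET RECOVERY SIMPLE;
GO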

[Screenshot (AOAG_Nul): the error returned when you try to change the recovery model of an Availability database]

Enter the Good-ish Really Bad Idea™.

Readers familiar with Unix may have used /dev/null.  Output ‘written’ to /dev/null simply ceases to exist as all of those ones and zeros pass into the ether, but the write operation is reported to have succeeded.  In DOS/Windows, the equivalent black hole is the NUL device (one ‘L’).  SQL Server can back up to this device, but it is almost always a Really Bad Idea™.  As often happens with screwy edge cases, though, a Really Bad Idea™ may be the last, best option at one’s disposal.

BACKUP LOG [ODS_Contoso] TO DISK = N'NUL'

SQL Server does what you tell it to do.  It does not care, nor will it prompt you, if you write your full backups to NUL.  A single transaction log backup written to NUL breaks the log chain, so if you really do need point-in-time recovery, you will be out of luck until the next full backup.  And don’t forget: this merely marks the inactive portion of the log as reusable; it doesn’t shrink the file.
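If you do resort to this, it is worth verifying afterward that the log really is being reused.  One quick sanity check, using standard commands (nothing here is specific to Availability Groups):

/* How full is each transaction log? */
DBCC SQLPERF(LOGSPACE);

/* What, if anything, is preventing log reuse? */
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'ODS_Contoso';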

This is such a bad idea that the previously documented way of doing this (BACKUP LOG WITH TRUNCATE_ONLY) was officially removed by Microsoft with the release of SQL Server 2008.  Now, Microsoft insists that if you don’t want to back up the transaction log, you should put the database in the Simple recovery model.  Except that you can’t when the database is part of an Availability Group.

#alwayson-availability-groups, #sql-server

Sparse Columns

The Developers did what? Why on Earth would you ever put a sparse column on a bit field?  I mean, seriously, who DOES that?

Sparse columns are exactly what they sound like.  They are columns which are anticipated to hold a large percentage of NULL values; because of this, you can opt not to store the NULLs at all.  When a record does have a value for that column, the value is tacked onto the end of the record along with a piece of metadata that describes what to do with it when a query touches that record.  On a basic level, this is somewhat similar to how data is stored in a JSON “document” in non-relational databases (if a record does not have a value for a particular attribute, that attribute is simply skipped).

The extra piece of metadata creates some overhead, so there are suggestions for when to use (or not use) a sparse column, depending upon the data type in question.  For a uniqueidentifier, the break-even point is around 43% NULL; for an int, it is closer to 64%; and for a bit field, it is 98%.  Imagine my surprise, then, when I found that one of the new .Net developers had created a table where nearly every column was sparse (other than the primary key, the business key, and columns recording when the record was created and by whom).
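For illustration, here is a stripped-down sketch of the sort of table I found (the names are invented, and the comments give the approximate break-even points mentioned above):

CREATE TABLE [dbo].[CustomerDelta] (
      CustomerDeltaID int IDENTITY(1,1) NOT NULL PRIMARY KEY
    , CustomerID      int                     NOT NULL  /* business key */
    , RowGuid         uniqueidentifier SPARSE NULL      /* worthwhile at roughly 43% NULL */
    , CreditLimit     int              SPARSE NULL      /* worthwhile at roughly 64% NULL */
    , IsPreferred     bit              SPARSE NULL      /* worthwhile only at roughly 98% NULL */
    , CreatedDate     datetime                NOT NULL
    , CreatedBy       sysname                 NOT NULL
);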

Eager to correct this before it was promoted to production, I went on a quest to Appdevlandia to find and educate the responsible party.  Once he had been identified, I approached cautiously and set the trap: I asked him to explain what problem he’d been trying to solve when he created that table, and whether he had a specific reason for designating bit fields as sparse columns.  He described the task that had been set before him and asked if he’d done something wrong.

It turns out that he was designing a process that would transmit data to a second system.  For the initial transaction, the entire record would be sent, but for subsequent transactions, he would only send the primary and business keys and any columns that had been changed (and thus most of the columns in the log table—including the bit field—would be NULL most of the time).  I explained what my concern had been and thanked him for the lesson.  Deep in thought, I wandered back to share with the rest of the Data Clan what I had just learned.
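A sketch of the pattern he described, using the invented table above: the initial transaction carries the entire record, while subsequent transactions carry only the keys and whatever changed, leaving the sparse columns NULL (and costing nothing to store):

/* Initial transaction: the entire record */
INSERT INTO [dbo].[CustomerDelta] (CustomerID, RowGuid, CreditLimit, IsPreferred, CreatedDate, CreatedBy)
VALUES (42, NEWID(), 50000, 1, GETDATE(), SUSER_SNAME());

/* Subsequent transaction: keys plus the one column that changed */
INSERT INTO [dbo].[CustomerDelta] (CustomerID, CreditLimit, CreatedDate, CreatedBy)
VALUES (42, 75000, GETDATE(), SUSER_SNAME());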

#sparse-columns, #sql-server

Data Masking

If terms like PCI DSS, HIPAA, and SOX cause you to wake up in a cold sweat, data masking might just be what the doctor ordered. Previews of SQL Server 2016 tout a new feature called dynamic data masking, which obscures sensitive data in query results; only users with the UNMASK permission can see the real values. As data breaches occur with increasing frequency, I would hope that this feature will be available across the board.  I have yet to find mention, however, of which edition(s) it will be available in, so I would prepare for the sticker shock of Enterprise Edition (just in case).
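Based on the preview documentation, the syntax should look something like this (a sketch only; the masking functions come from the CTP builds and could change before release, and [TrustedUser] is a stand-in for whichever principal you actually trust):

/* SQL Server 2016 (preview): mask all but the last four digits */
ALTER TABLE [Sales].[CreditCard]
    ALTER COLUMN [CardNumber] ADD MASKED WITH (FUNCTION = 'partial(0, "XXXXXXXXXX", 4)');

/* Only principals granted UNMASK see the real values */
GRANT UNMASK TO [TrustedUser];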

If you can’t wait for the general release of SQL Server 2016 or do not want your production environment to lose any fingers on the bleeding edge of technology, a quick search will turn up multiple reputable companies that offer persistent and/or dynamic data masking products.  You also have the option to add many of these features with functionality that has been around since at least SQL Server 2000 (Why are you still on SQL 2000?).

For this example, we will look at the Sales.CreditCard table in AdventureWorks2008R2.  The CardNumber field shows the full credit card number to anyone with basic SELECT privileges on the Sales schema.  You could create a view (named CreditCard_NonSensitiveFields or some such), but that only partially addresses the issue, since anyone could still query the base table directly.  What I propose is to create a Custom Database Role and use it to deny privileges to its members.

USE [AdventureWorks2008R2];
GO

/* Create a view for "limited" users */
CREATE VIEW [Sales].[CreditCard_NonSensitiveFields]
AS
SELECT CreditCardID
, CardType
, CardNumber = RIGHT(REPLICATE('X', LEN(CardNumber)) + RIGHT(CardNumber, 4), LEN(CardNumber)) /* mask all but the last four digits */
, ExpMonth = 1    /* dummy values replace the real expiration date */
, ExpYear = 1900
, ModifiedDate
FROM Sales.CreditCard;
GO

/* Create a Custom Database Role to deny access to specific columns */
CREATE ROLE [db_CUSTOMROLE_DenyPCI] AUTHORIZATION [dbo]
GO

/* You can deny all you want, but note that column-level permissions only exist
   for SELECT, UPDATE, and REFERENCES; INSERT and DELETE apply at the table level */
DENY SELECT ON [Sales].[CreditCard] ([CardNumber], [ExpMonth], [ExpYear]) TO [db_CUSTOMROLE_DenyPCI];
DENY UPDATE ON [Sales].[CreditCard] ([CardNumber], [ExpMonth], [ExpYear]) TO [db_CUSTOMROLE_DenyPCI];
GO

/* Adding Role members by Active Directory Security Groups */
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\SG_Application_Specialists];
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\SG_Benefits_Specialists];
GO

/* Adding Role members one at a time */
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\barbara1];
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\mike0];
GO
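One loose end: members of the deny role still need a way to read the non-sensitive data, which is where the view from earlier comes in.  A final grant ties it together:

/* Members of the deny role read through the masked view instead */
GRANT SELECT ON [Sales].[CreditCard_NonSensitiveFields] TO [db_CUSTOMROLE_DenyPCI];
GO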

In most cases, it is better to deal with groups when addressing security (database users, roles, etc.), but you can also add individuals.

If you don’t want to be in a position where you HAVE to trust your DBAs (for whatever reason), then hardware security modules or tokenization appliances might be a better fit.  Still, there are some fairly decent options available now and the future looks bright.

#custom-database-roles, #data-masking, #sql-server, #sql-server-2016

Custom Database Roles in SQL Server

In addition to the default Database-Level Roles in SQL Server (db_owner, db_securityadmin, db_accessadmin, db_backupoperator, db_ddladmin, db_datawriter, db_datareader, db_denydatawriter and db_denydatareader), you can create custom roles at the database level.  One reason you might want to do this is if you have different user accounts for the same functionality across multiple environments (e.g., Dev, QA, UAT, Production).  It might be easier for an application to use the same service user/password combination across all of these environments, but then your developers would know the production password.  If you store any sort of sensitive information in your database, this makes you an easy target.  It is much more secure to use a dedicated service user for each application in each environment.

Another reason to consider custom roles is if multiple users in the same department log directly into the database instead of going through a connection pool in the application layer.  If a new user is hired into accounting, or someone leaves the department, there is no need to search for which permissions must be added or removed.

With a little bit of work, you can assign all of the database permissions for a user (or group of users) to a custom role.  If you need to add/remove users, you only need to move them in and out of custom roles.  If you need a redacted copy of the Production database brought down to Dev, just swap out the Production service user for the Dev one.  In this example, the accounting application might have four service users (AcctApp_DEV, AcctApp_QA, AcctApp_UAT and AcctApp).  The same could hold true for the sales application, etc.
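A minimal sketch of that pattern for the accounting application (the database and schema names are invented for illustration):

USE [AccountingDB];
GO

CREATE ROLE [db_CUSTOMROLE_AcctApp] AUTHORIZATION [dbo];
GO

/* Grant the application's permissions to the role, never to the service user directly */
GRANT SELECT, INSERT, UPDATE ON SCHEMA::[Acct] TO [db_CUSTOMROLE_AcctApp];
GO

/* In each environment, only the role membership changes */
ALTER ROLE [db_CUSTOMROLE_AcctApp] ADD MEMBER [AcctApp_DEV];   /* Dev */
/* ALTER ROLE [db_CUSTOMROLE_AcctApp] ADD MEMBER [AcctApp]; */ /* Production */
GO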


#custom-database-roles, #sql-server