Drillthrough Reports: Easy

As the merger of Woodgrove Bank & Finance with Contoso Bank continues (much like the first eukaryotic cell swallowing up a bacterium that ended up being useful and then mandatory), there is a Realignment underway.  All of the database people in the combined company have come together as the Data tribe (with Prod DBA and SQL Dev sub-clans).  It is an extraordinarily exciting and busy time.  Prior to the Realignment, standards were sometimes loose and sometimes….  Now we are discussing new standards for new work (code that comes down to us from the Before Time will not be touched unless there is a compelling reason to do so).  It is always interesting to see how other people have attempted to solve issues similar to the ones you have faced; there is a lot of healthy learning and teaching on both sides.

In one such case, the discussion was about how best to do drillthrough reports.  In most cases, there is an easy way to do this that allows you to set a field in a parent SSRS report as a clickable link to a child SSRS report.  Here is a picture that shows how you would set it up.  In the Order Overview report, you open the properties of the text box that holds OrderID and point its action to the Order Detail report.  In this case, the OrderID is used as an input parameter for the child report, but that is configurable.

[Screenshot: Text Box Properties for OrderID, with the Action set to go to the Order Detail report]

And remember, Data wants to be free (and it wouldn’t mind a Reuben sandwich, if you’re offering).  It’s good to want things, but it’s not always good to get them.  Just be careful out there.

How I Learned to Stop Worrying and Love NUL

After the merger of Woodgrove Bank & Finance with Contoso Bank, each Division had its own silo of information.  Each bank (and several of their prior acquisitions) had slightly different ways of doing things.  Often, the same information was represented differently (Client vs. ClientID vs. ClientNumber) and sometimes one bank would use pieces of data that the other one didn’t.  Month-end financial reporting for the newly combined behemoth was nearly impossible with all of the legacy systems.  “Let’s build a data warehouse from scratch,” they said.  “We’ll stage data from the different systems in different staging databases and standardize the data as it is fed into a central data warehouse.”

They gathered requirements and carefully considered different designs and kept meetings frequent but mercifully short.  Soon they had a data warehouse built on SQL Server 2014 AlwaysOn Availability Groups.  All of the servers were virtual machines in beefy new hosts that used 100% SSD storage.  It was an engineering marvel.  And then it went live.  The staging databases were ginormous and several of the daily feeds were over 150 GB.  The system was able to keep up with all of the data and all of the reporting demands, but the network share for database backups could not handle the sheer size of transaction log backups.  After adding a TB a day for several days, Systems was begging for some relief.

“The flat files from all of the daily feeds take up less space than the databases to begin with, not to mention compressing much, much better after data deduplication,” they said.  “Let’s just set the databases to Simple Recovery, take nightly full backups, and reimport the flat files if we ever need to restore.  Easy peasy lemon squeezy.”

While it still seems to me a valid requirement to have a highly available database that does not require point-in-time recovery, Microsoft disagrees.

Checklist: Requirements (Availability Databases)
https://msdn.microsoft.com/en-us/library/ff878487(v=sql.120).aspx#PrerequisitesForDbs

If you attempt to set an Availability database to Simple recovery, this is what you will see:

[Screenshot: the error raised when attempting to set an availability database to the SIMPLE recovery model]

Enter the Good-ish Really Bad Idea™.

Readers familiar with Unix may have used /dev/null.  Output ‘written’ to /dev/null simply ceases to exist as all of those ones and zeros pass into the ether, but the write operation is reported to have succeeded.  In DOS/Windows, this black hole is the NUL device (one ‘L’).  SQL Server can back up to this device, but it is almost always a Really Bad Idea™.  As often happens with screwy edge cases, a Really Bad Idea™ may be the last, best option at one’s disposal.

BACKUP LOG [ODS_Contoso] TO DISK = N'NUL'

SQL Server does what you tell it to do.  It does not care, nor will it prompt you, if you write your full backups to NUL.  A single transaction log backup written to NUL leaves a hole in your restore chain, so if you really do need point-in-time recovery, you’ll be out of luck until the next full backup.  And don’t forget, a log backup merely marks the inactive portion of the log as reusable; it doesn’t shrink the file.
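
If you do go down this road, it is worth confirming that the log really is being reused.  Here is a minimal sketch using the ODS_Contoso database from the example above; the columns come from sys.databases.

/* Check the recovery model and what, if anything, is holding the log hostage */
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'ODS_Contoso';

/* The Really Bad Idea itself */
BACKUP LOG [ODS_Contoso] TO DISK = N'NUL';

/* Afterwards, log_reuse_wait_desc should drop back to NOTHING
   (assuming nothing else, such as an unsent AG log block, is holding the log) */
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'ODS_Contoso';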

This is such a bad idea that the previously documented way of doing it (BACKUP LOG WITH TRUNCATE_ONLY) was officially removed by Microsoft with the release of SQL Server 2008.  Now, Microsoft insists that if you don’t want to back up the transaction log, you put the database in the SIMPLE recovery model.  Except that you can’t when the database is part of an Availability Group.

#alwayson-availability-groups, #sql-server

Sparse Columns

The Developers did what? Why on Earth would you ever put a sparse column on a bit field?  I mean, seriously, who DOES that?

Sparse columns are exactly what they sound like.  They are columns which are anticipated to have a large percentage of NULL values, and because of this, you can opt not to store the NULLs.  When a record does have a value for that column, it is tacked onto the end of the record with a piece of metadata that describes what to do with the value when a query touches that record.  On a basic level, this is somewhat similar to how data is stored in a JSON “document” in non-relational databases (if a record does not have a value for a particular attribute, that attribute is simply skipped).

The extra piece of metadata creates some overhead, so there are suggestions for when to use (or not use) a sparse column, depending upon the data type in question.  For a uniqueidentifier, the break-even point is around 43% NULL; for an int, it is closer to 64%; and for a bit field it is 98%.  Imagine my surprise then when I found that one of the new .Net developers had created a table where nearly every column was sparse (other than the primary key, business key, and columns for when the record was created and by whom).
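
For illustration, here is a rough sketch of what such a table might look like (the actual table isn’t reproduced here, so the column names are hypothetical):

/* Hypothetical change-log table with nearly every column marked SPARSE */
CREATE TABLE dbo.CustomerChangeLog
(
  ChangeLogID     INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
, CustomerID      INT NOT NULL                     /* business key, always present */
, EmailAddress    VARCHAR(128) SPARSE NULL         /* sent only when it changes */
, PhoneNumber     VARCHAR(25) SPARSE NULL
, IsPreferredFlag BIT SPARSE NULL                  /* the offending sparse bit field */
, CreatedDate     DATETIME NOT NULL DEFAULT (GETDATE())
, CreatedBy       SYSNAME NOT NULL DEFAULT (SUSER_SNAME())
);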

Eager to correct this before it was promoted to production, I went on a quest to Appdevlandia to find and educate the responsible party.  Once he had been identified, I approached cautiously and set the trap: I asked him to explain what problem he’d been trying to solve when he created that table, and whether he had a specific reason for designating bit fields as sparse columns.  He described the task that had been set before him and asked if he’d done something wrong.

It turns out that he was designing a process that would transmit data to a second system.  For the initial transaction, the entire record would be sent, but for subsequent transactions, he would only send the primary and business keys and any columns that had been changed (and thus most of the columns in the log table—including the bit field—would be NULL most of the time).  I explained what my concern had been and thanked him for the lesson.  Deep in thought, I wandered back to share with the rest of the Data Clan what I had just learned.

#sparse-columns, #sql-server

Data Masking

If terms like these (PCI DSS, HIPAA, SOX, etc.) cause you to wake up in a cold sweat, data masking might just be what the doctor ordered. Previews of SQL Server 2016 tout a new feature called dynamic data masking, which will allow you to house and control sensitive data in your database; only users with UNMASK privileges can see the real values. As data breaches occur with increasing frequency, I would hope that this feature would be available across the board.  I have yet to find mention, however, of which edition(s) this feature will be available in, so I would prepare for the sticker shock of Enterprise Edition (just in case).
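
Based on the preview documentation, adding a mask looks something like the sketch below (using the Sales.CreditCard table that appears later in this post; the syntax and edition availability could still change before release):

/* Mask all but the last four characters of the card number for non-privileged users */
ALTER TABLE Sales.CreditCard
ALTER COLUMN CardNumber ADD MASKED WITH (FUNCTION = 'partial(0, "XXXX-XXXX-XXXX-", 4)');
GO

/* Only principals granted UNMASK see the real values; the user name here is hypothetical */
GRANT UNMASK TO [ReportingAnalyst];
GO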

If you can’t wait for the general release of SQL Server 2016 or do not want your production environment to lose any fingers on the bleeding edge of technology, a quick search will turn up multiple reputable companies that offer persistent and/or dynamic data masking products.  You also have the option to add many of these features with functionality that has been around since at least SQL Server 2000 (Why are you still on SQL 2000?).

For this example, we will look at the Sales.CreditCard table in AdventureWorks2008R2.  The CardNumber field shows the full credit card number to anyone with basic SELECT privileges on the Sales schema.  You could create a view (named CreditCard_NonSensitiveFields or some such), but that only partially addresses the issue.  What I propose is to create a Custom Database Role and use it to deny privileges to its members.

USE [AdventureWorks2008R2];
GO

/* Create a view for "limited" users */
CREATE VIEW [Sales].[CreditCard_NonSensitiveFields]
AS
SELECT CreditCardID
, CardType
, CardNumber = RIGHT(REPLICATE('X', LEN(CardNumber)) + RIGHT(CardNumber, 4), LEN(CardNumber))
, ExpMonth = 1
, ExpYear = 1900
, ModifiedDate
FROM Sales.CreditCard;
GO

/* Create a Custom Database Role to deny access to specific columns */
CREATE ROLE [db_CUSTOMROLE_DenyPCI] AUTHORIZATION [dbo]
GO

/* You can deny all you want (column-level permissions cover SELECT, UPDATE and REFERENCES;
   INSERT and DELETE can only be denied at the table level) */
DENY SELECT ON [Sales].[CreditCard] ([CardNumber]) TO [db_CUSTOMROLE_DenyPCI];
DENY UPDATE ON [Sales].[CreditCard] ([CardNumber]) TO [db_CUSTOMROLE_DenyPCI];
DENY SELECT ON [Sales].[CreditCard] ([ExpMonth]) TO [db_CUSTOMROLE_DenyPCI];
DENY UPDATE ON [Sales].[CreditCard] ([ExpMonth]) TO [db_CUSTOMROLE_DenyPCI];
DENY SELECT ON [Sales].[CreditCard] ([ExpYear]) TO [db_CUSTOMROLE_DenyPCI];
DENY UPDATE ON [Sales].[CreditCard] ([ExpYear]) TO [db_CUSTOMROLE_DenyPCI];
GO

/* Adding Role members by Active Directory Security Groups */
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\SG_Application_Specialists];
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\SG_Benefits_Specialists];
GO

/* Adding Role members one at a time */
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\barbara1];
ALTER ROLE [db_CUSTOMROLE_DenyPCI] ADD MEMBER [adventure-works\mike0];
GO

In most cases, it is better to deal with groups when addressing security (database users, roles, etc.), but you can also add individuals.
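
To round this out, you would presumably also grant the role SELECT on the masked view, so that its members still have something to query:

/* Members of the deny role can still read the non-sensitive view */
GRANT SELECT ON [Sales].[CreditCard_NonSensitiveFields] TO [db_CUSTOMROLE_DenyPCI];
GO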

If you don’t want to be in a position where you HAVE to trust your DBAs (for whatever reason), then hardware security modules or tokenization appliances might be a better fit.  Still, there are some fairly decent options available now and the future looks bright.

#custom-database-roles, #data-masking, #sql-server, #sql-server-2016

Global Temporary Tables

I was wondering the other day just how global global temp tables really are.  MSDN has this to say:

Temporary Tables
You can create local and global temporary tables. Local temporary tables are visible only in the current session, and global temporary tables are visible to all sessions. Temporary tables cannot be partitioned.
Prefix local temporary table names with single number sign (#table_name), and prefix global temporary table names with a double number sign (##table_name).

This is in line with what I remembered, but I still had a few questions, so I ran a short test and this is what I found.  First, we need to set up our initial session.

USE AdventureWorks;
GO

IF OBJECT_ID('tempdb..#Customer') IS NOT NULL
DROP TABLE #Customer;

CREATE TABLE #Customer
(
[CustomerID] [INT] IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED
, [TerritoryID] [INT] NULL
, [AccountNumber] [VARCHAR](10) NOT NULL
, [CustomerType] [NCHAR](1) NOT NULL
, [rowguid] [UNIQUEIDENTIFIER] NOT NULL
, [ModifiedDate] [DATETIME] NOT NULL
)
ON  [PRIMARY];

SET IDENTITY_INSERT #Customer ON;
INSERT  INTO #Customer
( CustomerID
, TerritoryID
, AccountNumber
, CustomerType
, rowguid
, ModifiedDate
)
SELECT *
FROM    Sales.Customer;
SET IDENTITY_INSERT #Customer OFF;

SELECT * FROM #Customer;

This yields the following results:

[Screenshot: results of the initial session, showing the rows from Sales.Customer returned by the SELECT against #Customer]

In a second session, I ran a select statement against the same local temp table.  As expected, the query failed.

USE AdventureWorks;
GO

SELECT * FROM #Customer;

Results:
Msg 208, Level 16, State 0, Line 2
Invalid object name ‘#Customer’.

Let’s try that again, except with a global temp table.  First, create the global temp table in the initial session.

USE AdventureWorks;
GO

IF OBJECT_ID('tempdb..##Customer') IS NOT NULL
DROP TABLE ##Customer;

CREATE TABLE ##Customer
(
[CustomerID] [INT] IDENTITY(1, 1) NOT NULL
PRIMARY KEY CLUSTERED
, [TerritoryID] [INT] NULL
, [AccountNumber] [VARCHAR](10) NOT NULL
, [CustomerType] [NCHAR](1) NOT NULL
, [rowguid] [UNIQUEIDENTIFIER] NOT NULL
, [ModifiedDate] [DATETIME] NOT NULL
)
ON  [PRIMARY];

SET IDENTITY_INSERT ##Customer ON;
INSERT  INTO ##Customer
( CustomerID
, TerritoryID
, AccountNumber
, CustomerType
, rowguid
, ModifiedDate
)
SELECT  *
FROM    Sales.Customer;
SET IDENTITY_INSERT ##Customer OFF;

SELECT * FROM ##Customer;

This query returns the same results as before (as expected).  Now for a couple of small twists.  In the second session, I will use Contoso instead of AdventureWorks.  Also, after testing whether the SELECT works, I will attempt to DROP the global temp table that was created in the initial session.

USE Contoso;
GO

SELECT * FROM ##Customer

DROP TABLE ##Customer

This time the SELECT statement returns the same results as in the initial session, even though we are running the query under a different database.  Note also that you can even DROP the global temp table in a session other than the one in which it was created.  If we now go back to the initial session and run the final SELECT again, you will see the “Invalid object” error below.
Msg 208, Level 16, State 0, Line 2
Invalid object name ‘##Customer’.
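
As a side note, because every temp table lives in tempdb, you can test for the global table’s existence from any session and any database context with the same OBJECT_ID pattern used in the setup scripts above:

IF OBJECT_ID('tempdb..##Customer') IS NOT NULL
    PRINT 'The global temp table ##Customer still exists.'
ELSE
    PRINT 'The global temp table ##Customer has been dropped.'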

Most things that can be done with global temporary tables can probably be done better with other methods, but it is good to understand how these things work should you choose to use this option.

Custom Database Roles in SQL Server

In addition to the default Database-Level Roles in SQL Server (db_owner, db_securityadmin, db_accessadmin, db_backupoperator, db_ddladmin, db_datawriter, db_datareader, db_denydatawriter and db_denydatareader), you can create custom roles at the database level.  One reason you might want to do this would be if you have different user accounts for the same functionality across multiple environments (e.g. Dev, QA, UAT, Production).  It might be easier for an application to use the same service user/password combination across all of these environments, but then your developers would know the production password.  If you store any sort of sensitive information in your database, this makes you an easy target.  It is much more secure to use a dedicated service user for each application in each environment.

Another reason you might want to consider using custom roles would be if multiple users in the same department log directly into the database instead of going through a connection pool in the application layer.  If a new user is hired for accounting, or someone leaves the department, there is no need to search for which permissions must be added or removed.

With a little bit of work, you can assign all of the database permissions for a user (or group of users) to a custom role.  If you need to add/remove users, you only need to move them in and out of custom roles.  If you need a redacted copy of the Production database brought down to Dev, just swap out the Production service user for the Dev one.  In this example, the accounting application might have four service users (AcctApp_DEV, AcctApp_QA, AcctApp_UAT and AcctApp).  The same could hold true for the sales application, etc.
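
A minimal sketch of that pattern follows; the role name, the [Accounting] schema, and the specific permissions are hypothetical, while the AcctApp_* service users are the ones named above.

/* One custom role per application; grant the application's permissions to the role */
CREATE ROLE [db_CUSTOMROLE_AcctApp] AUTHORIZATION [dbo];
GO
GRANT SELECT, INSERT, UPDATE ON SCHEMA::[Accounting] TO [db_CUSTOMROLE_AcctApp];
GRANT EXECUTE ON SCHEMA::[Accounting] TO [db_CUSTOMROLE_AcctApp];
GO

/* In Dev, the Dev service user is the member... */
ALTER ROLE [db_CUSTOMROLE_AcctApp] ADD MEMBER [AcctApp_DEV];
GO
/* ...and in Production you would swap in AcctApp instead:
ALTER ROLE [db_CUSTOMROLE_AcctApp] ADD MEMBER [AcctApp];
*/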


#custom-database-roles, #sql-server

Mapping a Drive on a Remote Share for SQL Server

I recently had to download a scrubbed copy of a large Production database for the developers to troubleshoot an issue. Normally, I would have purged much of the older data and run a shrink before handing it over, but they convinced me that all of the data was required in this specific instance. I had an 85 GB database and only 70 GB free space on the Dev machine. After a bit of research, I found an excellent article by Taylor Gibb over at MyTechMantra. While I normally shun XP_CMDSHELL like a stray vial of smallpox, I discussed its limited use with the rest of the data team and we decided it would be okay.

What I ended up doing was adding a mapped drive to SQL Server. The drive went to a share that had the scrubbed backup, and I both restored from there and created the MDF and LDF files there. I enabled XP_CMDSHELL only long enough to map the drive, and then again to unmap the drive when the database was no longer needed for troubleshooting. It ran slowly, but it was good enough to troubleshoot the issue at hand. I would only recommend this option for a Dev environment, and only when no other options are available.
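
Roughly, the steps looked like the sketch below; the share path, drive letter, database name, and logical file names are made up for illustration.

/* Enable xp_cmdshell just long enough to map the drive */
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

/* Map the share as a drive letter visible to the SQL Server service account */
EXEC xp_cmdshell 'net use Z: \\FileServer\SQLBackups';

/* Restore from the share and place the data/log files there as well */
RESTORE DATABASE [BigProdCopy]
FROM DISK = N'Z:\BigProdCopy_scrubbed.bak'
WITH MOVE N'BigProdCopy_Data' TO N'Z:\BigProdCopy.mdf'
, MOVE N'BigProdCopy_Log' TO N'Z:\BigProdCopy_log.ldf';

/* When troubleshooting is finished, unmap the drive and shut the door again */
EXEC xp_cmdshell 'net use Z: /delete';
EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;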

There are a few important things to note here:
1. The new drive will NOT survive a restart, even with the /persistent:yes switch.
2. The new drive will NOT show up in xp_fixeddrives.
3. XP_CMDSHELL is a HUGE security hole that should only be used sparingly and then *immediately* disabled when finished.
4. See # 3.