Corruption – The Denmark of SQL Instances

20150428-countrycorruption
Red = corruption, like your DBCC results

Those who follow government corruption indexes are used to seeing Denmark at the top. The places I work with these days, not so much. Somalia, I’m looking at you buddy. Along with being one of the happiest countries Denmark is also the least corrupt. In my decade and a half as a DBA and developer all of my SQL instances have been Danish. Literally as well as figuratively. Yes, some of them were actually on servers in Denmark, but I’m talking in terms of corruption. I’ve never had a case of corruption, except for purposely caused training examples. Have I been lucky? Yes. Will I hit corruption someday? Maybe. I’m I worried about it and protect against it? Oh hell yes.

20150428-sb
Backups!

First off, backups, or as Steve Ballmer might say, while sweating profusely, “Backups! Backups! Backups! Backups! Backups! Backups!….” With out good backups, corruption will make you hate being a DBA. So, now that you have your backups (fulls and diffs and transaction logs, oh my) taken care of how do you spot corruption? Alerts. What makes my instances, hardware and applications so amazballs that they have incorruptibility of a saint? I have no idea, but I 20150428-dbcccheck, recheck and alert the hell out of them. First step is a nightly DBCC CHECKDB. This of course is a standard across every SQL instance ever setup. Nobody has ever run a SQL instance without implementing this. Right? Good.

 

Next are my alerts. The alert CREATE script below assumes an Operator named DBA and Database Mail are setup.

USE [msdb]
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 016',
@message_id=0,
@severity=16,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 016', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 017',
@message_id=0,
@severity=17,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 017', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 018',
@message_id=0,
@severity=18,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 018', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 019',
@message_id=0,
@severity=19,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 019', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 020',
@message_id=0,
@severity=20,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 020', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 021',
@message_id=0,
@severity=21,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 021', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 022',
@message_id=0,
@severity=22,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 022', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 023',
@message_id=0,
@severity=23,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 023', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 024',
@message_id=0,
@severity=24,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 024', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 025',
@message_id=0,
@severity=25,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 025', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Error Number 823',
@message_id=823,
@severity=0,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Error Number 823', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Error Number 824',
@message_id=824,
@severity=0,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Error Number 824', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Error Number 825',
@message_id=825,
@severity=0,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Error Number 825', @operator_name=N'DBA', @notification_method = 1;
GO

Errors 823, 824 and 835 are caused by errors with Windows API (ReadFile, WriteFile, ReadFileScatter, WriteFileGather) performing I/O operations. These may be triggered by hardware failure and could indicate a need to increase your hardware budget.

Severity 16 through 25 cover a variety of issues, including corruption. In terms of corruption lets look at 22, 23 and 24. This is a terse version of the MSDN entries.
22: Indicates that the table or index specified in the message has been damaged by a software or hardware problem.
23: Indicates that the integrity of the entire database is in question because of a hardware or software problem.
24: Indicates a media failure. The system administrator may have to restore the database. You may also have to call your hardware vendor.

These errors rarely occur, but when/if they do run DBCC CHECKDB to find what objects are damaged.  There are several methods of getting your database back to a consistent state that range from easy and good (restoring from backups) to less than optimal (DBCC CHECKDB REPAIR_ALLOW_DATA_LOSS) to methods that border on black magic (DBCC WRITEPAGE). All of these require practice.

Most people learn best by doing. Reading a blog won’t prevent or fix corruption, but practicing on a corrupted database will give you the skillz so when your time comes you can be the DBA in shinning armor saving the day, rescuing the end users and protecting the organization with trumpets blaring and confetti flying. Or it could just mean keeping your job. Steve Stedman has launched a good series of corruption challenges. He corrupts the database, gives you some details and tells you to fix it. Kinda like your boss (telling you to fix it), but more helpful (giving you details). It’s a timed challenge with a scoreboard for those who like to keep track.

These challenges give you a chance to practice fixing corruption as well as testing your alerts. With week #2 challenge restored you can run a query against the dbo.Result table and have this niffty message appear via email or text.

Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: invalid protection option. It occurred during a read of page (1:244) in database ID 13 at offset 0x000000001e8000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\CorruptionChallenge2.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Challenge #3 set off alerts with severity 16 and 20 when trying to use an orphaned log file.

SQL Server Alert System: 'Severity 016'
DESCRIPTION: Cannot create file 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\CorruptionChallenge3.mdf' because it already exists. Change the file path or the file name, and retry the operation.
SQL Server Alert System: 'Severity 020'
DESCRIPTION: During upgrade, database raised exception 945, severity 14, state 2, address 0000000001385BE1. Use the exception number to determine the cause.

Hopefully my databases continue to reside happily uncorrupted, but I’ve probably jinxed myself with this post and some of those alerts will be firing off soon. Crap.

#SQLNewBlogger
Thanks again Ed Leighton-Dick (T|B)

Advertisements

2 thoughts on “Corruption – The Denmark of SQL Instances

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s