Terrific Trifecta of T-SQL on Tuesday

TSQLTuesday
Much Monitoring in the Month of May

 

Sometimes tooling can be easy, helpful and cheap. You usually only get to pick 2 legs of that triangle. So instead of a triangle you get a less than sign, as in less than awesome. This quick bit of code gets all legs of the Triforce.

I’m better at T-SQL than I ever was at Legend of Zelda

First leg – Easy
One create table script and one Agent job and you are ready to go with this. The table we will store the results in is a mirror of sp_Who2, plus an insertdate column with the default of GETDATE().

CREATE TABLE [dbo].[Who2](
    [spid] [int] NULL,
    [status] [varchar](50) NULL,
    [loginname] [varchar](100) NULL,
    [hostname] [varchar](100) NULL,
    [blkBy] [varchar](50) NULL,
    [dbname] [varchar](50) NULL,
    [cmd] [varchar](200) NULL,
    [CPUTime] [int] NULL,
    [DiskIO] [int] NULL,
    [LastBatch] [varchar](50) NULL,
    [ProgramName] [varchar](200) NULL,
    [spid2] [int] NULL,
    [request_id] [int] NULL,
    [insertdate] [datetime] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[Who2] ADD CONSTRAINT [DF_Who2_insertdate] DEFAULT (getdate()) FOR [insertdate]
GO

Pretty simple and easy there. Set this up in a utility/admin database and you’re off to the next step, the Agent job. The INSERT script below puts the results of sp_Who2 into your new table. All you need to do is determine the timing. I have it set up to run every 10 minutes.

INSERT INTO Who2(spid,[Status],[LoginName],hostname,blkBy,dbname,[cmd],CPUTime,DiskIO,LastBatch,Programname,spid2,request_id) EXEC sp_who2
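
If you would rather script the Agent job than click through the wizard, a rough sketch like this sets up the 10 minute schedule. The job name and the SQLAdministration database name are assumptions here; point @database_name at wherever you created the table.

USE msdb
GO
EXEC dbo.sp_add_job @job_name = N'Log sp_who2';
EXEC dbo.sp_add_jobstep @job_name = N'Log sp_who2',
@step_name = N'Insert snapshot',
@subsystem = N'TSQL',
@database_name = N'SQLAdministration', /*Your utility/admin database*/
@command = N'INSERT INTO Who2(spid,[Status],[LoginName],hostname,blkBy,dbname,[cmd],CPUTime,DiskIO,LastBatch,Programname,spid2,request_id) EXEC sp_who2';
EXEC dbo.sp_add_jobschedule @job_name = N'Log sp_who2',
@name = N'Every 10 minutes',
@freq_type = 4, /*Daily*/
@freq_interval = 1,
@freq_subday_type = 4, /*Units of minutes*/
@freq_subday_interval = 10;
EXEC dbo.sp_add_jobserver @job_name = N'Log sp_who2';
GO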

Second leg – Helpful
Not all homegrown or off the shelf apps have ways of measuring usage. This is how this came about for me. One application had a hard limit on the number of concurrent users, and we wanted warnings when we were close as well as the ability to look at usage over time.

DECLARE @maxLicensedUsers AS INT = 10 /*What's your app limit? I hope it is more than 10, but who am I to judge.*/
DECLARE @currentLicensedUsers AS INT
DECLARE @emailMessage AS VARCHAR(100)

SET @currentLicensedUsers = (
    SELECT COUNT(*)
    FROM Who2
    WHERE ProgramName IN ('') /*Add your app name(s) here*/
    AND DBName = '' /*Add your db name here*/
    AND insertdate = (SELECT MAX(insertdate) FROM Who2)
    )

IF @currentLicensedUsers >= @maxLicensedUsers
BEGIN
    SET @emailMessage = 'Your app is super popular and has ' + CAST(@currentLicensedUsers AS VARCHAR(4)) + ' sessions.'
    EXEC msdb.dbo.sp_send_dbmail @profile_name='', /*Go set up a db mail profile if you don't already have that*/
    @recipients='', /*Someone important should get these emails. DBAs are important, maybe one of those people.*/
    @subject='Super popular app license warning',
    @body= @emailMessage
END

This app also has an issue with users opening up duplicate sessions. That doesn’t help with the hard limit on concurrent users. An alert on duplicate sessions is also easy, helpful and cheap.
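
A quick way to spot those duplicates from the same logged data is sketched below, with the same empty placeholders for your app and database names as the script above.

SELECT loginname, COUNT(*) AS SessionCount
FROM Who2
WHERE ProgramName IN ('') /*Add your app name(s) here*/
AND DBName = '' /*Add your db name here*/
AND insertdate = (SELECT MAX(insertdate) FROM Who2)
GROUP BY loginname
HAVING COUNT(*) > 1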

Working for a global agency means you never sleep or maybe that is your app never sleeps. Logging this data regularly helps point out when the application is most used. This query gives me average and max session counts by hour for the last month. This helps plan maintenance windows, which are generally scheduled for Friday evenings. Ah, the exciting life of a DBA.


SELECT TOP 100 PERCENT hr
, ROUND(AVG(SessionCount),0) as S_AveCount
, MAX(SessionCount) AS S_MaxCount
FROM (
SELECT
DATEPART(YEAR,insertdate) AS yr,DATEPART(MONTH,insertdate) AS mn,DATEPART(DAY,insertdate) AS dy,DATEPART(HH,insertdate) as hr,
DATEPART(minute,insertdate) as 'minute'
, COUNT(DISTINCT loginname) as loginCount
, COUNT(DISTINCT loginname + ' ' + ProgramName) as programCount
, COUNT(spid) as SessionCount
FROM dbo.Who2
WHERE dbname = ''
and ProgramName IN ('')
AND insertdate > DATEADD(M,-1,GETDATE()) --Get data for previous month
AND DATEPART(WEEKDAY,insertdate) <> 7 --Saturday is a low activity day in most countries so we're filtering that out
GROUP BY DATEPART(YEAR,insertdate),DATEPART(MONTH,insertdate),DATEPART(DAY,insertdate),DATEPART(HH,insertdate), DATEPART (minute,insertdate)) AS a
GROUP BY hr
ORDER BY hr

Third leg – Cheap
This is super cheap. You found some code on the internets, you have a SQL Server (presumably) and there isn’t anything to license. I’m sure you could find a way to send me money if you really wanted to, but there are better things to do with your cash. Like helping out in Nepal.

Perfection is not one of the legs
This solution hits easy, helpful and cheap for me, but it is not perfect. It is a snapshot of activity every 10 minutes. It is not real time or complete monitoring. Are there users logging in and out between the 10 minute samples? Could be, but I’m not capturing that. This works as long as I and the business owners are clear about what is being measured and what is possibly being missed. Will this satisfy auditors who need to verify system access? I doubt it, but you would need to ask an auditor about that.

Red Gate would get my money, if I had any

This is how software companies should respond when a potential customer finds a problem with their product. I recently tested out Red Gate’s SQL Prompt and found some problems with it in my environment. The TL;DR is that it wasn’t auto-completing and I was getting timeout errors when trying to use it against a database with 161 thousand tables and 3.3 million columns. After my testing and uninstalling the demo, I left my comments and the URL to my post detailing my experience in the uninstall survey. Not everyone likes filling out surveys, but they can provide valuable feedback and possibly effect change. Not long after I submitted I received this email.

Hi Tim,

I’m one of the developers of Prompt and I just saw your uninstall feedback about running out of memory with a huge database.

Unfortunately at the moment I don’t have any solutions for you (a work around is to exclude the database from being loaded which will stop the exception but won’t get you suggestions). We do have plans in the future for solving this by optimising how and where we store our model of the database.

I can let you know when we have a new build with those optimisations in place if you’d be willing to try it?

Thanks,
Aaron.

They are listening and working to improve their product. Yeah, I’ll take some time to test out a new version. Will I end up buying it? I’m not so sure about that. Budgets can be tight in any organization, but working at a non-profit has caused me to scrutinize every purchase extra carefully. I’ve worked without SQL Prompt for a long time and can continue to do my job without it. Would I like to have it (assuming a new version that works for me)? Of course. Will I be able to justify the purchase and ROI? Hopefully.

Speaking of working for a non-profit. Mercy Corps is doing great work to help survivors in Nepal, please consider a donation.


Going for another month of #SQLNewBlogger

Corruption – The Denmark of SQL Instances

Red = corruption, like your DBCC results

Those who follow government corruption indexes are used to seeing Denmark at the top. The places I work with these days, not so much. Somalia, I’m looking at you, buddy. Along with being one of the happiest countries, Denmark is also the least corrupt. In my decade and a half as a DBA and developer all of my SQL instances have been Danish. Literally as well as figuratively. Yes, some of them were actually on servers in Denmark, but I’m talking in terms of corruption. I’ve never had a case of corruption, except for purposely caused training examples. Have I been lucky? Yes. Will I hit corruption someday? Maybe. Am I worried about it and do I protect against it? Oh hell yes.

Backups!

First off, backups, or as Steve Ballmer might say, while sweating profusely, “Backups! Backups! Backups! Backups! Backups! Backups!….” Without good backups, corruption will make you hate being a DBA. So, now that you have your backups (fulls and diffs and transaction logs, oh my) taken care of, how do you spot corruption? Alerts. What makes my instances, hardware and applications so amazballs that they have the incorruptibility of a saint? I have no idea, but I check, recheck and alert the hell out of them. First step is a nightly DBCC CHECKDB. This of course is a standard across every SQL instance ever set up. Nobody has ever run a SQL instance without implementing this. Right? Good.

 

Next are my alerts. The alert CREATE script below assumes an Operator named DBA and Database Mail are set up.
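
If you don’t have that operator yet, a minimal sketch along these lines creates one first; the email address is a placeholder.

USE [msdb]
GO
EXEC dbo.sp_add_operator @name = N'DBA',
@enabled = 1,
@email_address = N'dba@yourorg.example'; /*Placeholder address, use your own*/
GO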

USE [msdb]
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 016',
@message_id=0,
@severity=16,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 016', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 017',
@message_id=0,
@severity=17,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 017', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 018',
@message_id=0,
@severity=18,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 018', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 019',
@message_id=0,
@severity=19,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 019', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 020',
@message_id=0,
@severity=20,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 020', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 021',
@message_id=0,
@severity=21,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 021', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 022',
@message_id=0,
@severity=22,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 022', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 023',
@message_id=0,
@severity=23,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 023', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 024',
@message_id=0,
@severity=24,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 024', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Severity 025',
@message_id=0,
@severity=25,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000';
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Severity 025', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Error Number 823',
@message_id=823,
@severity=0,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Error Number 823', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Error Number 824',
@message_id=824,
@severity=0,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Error Number 824', @operator_name=N'DBA', @notification_method = 1;
GO
EXEC msdb.dbo.sp_add_alert @name=N'Error Number 825',
@message_id=825,
@severity=0,
@enabled=1,
@delay_between_responses=60,
@include_event_description_in=1,
@job_id=N'00000000-0000-0000-0000-000000000000'
GO
EXEC msdb.dbo.sp_add_notification @alert_name=N'Error Number 825', @operator_name=N'DBA', @notification_method = 1;
GO

Errors 823, 824 and 825 are raised when Windows API calls (ReadFile, WriteFile, ReadFileScatter, WriteFileGather) fail while performing I/O operations. These may be triggered by hardware failure and could indicate a need to increase your hardware budget.

Severities 16 through 25 cover a variety of issues, including corruption. In terms of corruption, let’s look at 22, 23 and 24. This is a terse version of the MSDN entries.
22: Indicates that the table or index specified in the message has been damaged by a software or hardware problem.
23: Indicates that the integrity of the entire database is in question because of a hardware or software problem.
24: Indicates a media failure. The system administrator may have to restore the database. You may also have to call your hardware vendor.

These errors rarely occur, but when/if they do, run DBCC CHECKDB to find which objects are damaged. There are several methods of getting your database back to a consistent state that range from easy and good (restoring from backups) to less than optimal (DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS) to methods that border on black magic (DBCC WRITEPAGE). All of these require practice.
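
As a reminder of the shape of those commands, here is a minimal sketch against a hypothetical CorruptDB. REPAIR_ALLOW_DATA_LOSS needs single user mode and really is the last resort it sounds like.

/*Find the damage first*/
DBCC CHECKDB (CorruptDB) WITH NO_INFOMSGS, ALL_ERRORMSGS;

/*Last resort when no good backup exists - it can throw data away to get consistent*/
ALTER DATABASE CorruptDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (CorruptDB, REPAIR_ALLOW_DATA_LOSS);
ALTER DATABASE CorruptDB SET MULTI_USER;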

Most people learn best by doing. Reading a blog won’t prevent or fix corruption, but practicing on a corrupted database will give you the skillz so when your time comes you can be the DBA in shining armor saving the day, rescuing the end users and protecting the organization with trumpets blaring and confetti flying. Or it could just mean keeping your job. Steve Stedman has launched a good series of corruption challenges. He corrupts the database, gives you some details and tells you to fix it. Kinda like your boss (telling you to fix it), but more helpful (giving you details). It’s a timed challenge with a scoreboard for those who like to keep track.

These challenges give you a chance to practice fixing corruption as well as testing your alerts. With the week #2 challenge database restored you can run a query against the dbo.Result table and have this nifty message appear via email or text.

Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: invalid protection option. It occurred during a read of page (1:244) in database ID 13 at offset 0x000000001e8000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\CorruptionChallenge2.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Challenge #3 set off alerts with severity 16 and 20 when trying to use an orphaned log file.

SQL Server Alert System: 'Severity 016'
DESCRIPTION: Cannot create file 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\CorruptionChallenge3.mdf' because it already exists. Change the file path or the file name, and retry the operation.
SQL Server Alert System: 'Severity 020'
DESCRIPTION: During upgrade, database raised exception 945, severity 14, state 2, address 0000000001385BE1. Use the exception number to determine the cause.

Hopefully my databases continue to reside happily uncorrupted, but I’ve probably jinxed myself with this post and some of those alerts will be firing off soon. Crap.

#SQLNewBlogger
Thanks again Ed Leighton-Dick (T|B)

Sir-Auto-Completes-A-Lot a.k.a. how to break IntelliSense, SQL Prompt and SQL Complete

I like big schemas and can not lie
You other DBAs can’t deny
That when studio opens up with a funky ass database
and a million objects in your face
It gets hung, your coding gets rough
’cause you notice your base was stuffed
Deep in the schema it’s searchin
I’m stuck and IntelliSense is lurchin
Oh baby, I wanna finish this query
But this lag is makin’ me weary
My co-workers tried to warn me
Cause this SQL rhyme got way too corny

So coders (Yeah!) Coders (Yeah!)
Can you code better than Hodor (Hell yeah!)
Then fire it up. Type it out.
Even SAN admins got to shout
DBA got back…….
to typing out queries because auto complete tools aren't really working out.

I have a love/hate relationship with some of my databases. I love hating them as they are great fodder for testing/breaking tools/procedures, blog posts and most importantly Sir Mix-A-Lot parodies. I’ve put up with IntelliSense stalling and even locking up SSMS when I’m writing a query against one of the 162 thousand tables containing 3.3 million columns. I know how to turn it off, but I don’t have the human RAM to hold the names of everything.

This wouldn’t fit in a SMALLINT, but maybe it was really important to save those 2 bytes and not go INT.

IntelliSense is far from perfect for me here so I tried out a few other options, Red Gate’s SQL Prompt and ApexSQL Complete. Both struggled in my setup and both had some very cool features. This is a hot take and not a review/recommendation, your mileage may vary. I haven’t reached out to support yet for either product or fully explored all of their features.

SQL Prompt
Downloading a trial version and installing was a snap. It shows up as a tool bar and menu item. When you connect to a database SQL Prompt loads objects, or at least it tries to. This is where I hit my first problem. The loading worked great on databases with a small number of objects, but once I switched to a large db I ran into the following 2 errors: OutOfMemoryException and timeout. The timeout period is around 1 minute and I haven’t found a way to adjust this.


After the errors, prompting was limited to the database level and it appeared that none of the tables had been loaded. I know the names of my databases and SSMS already has a cute little drop down to show this. I don’t think I need help at this level.

I need help on a deeper level

Promptless Prompt aside, I like the Tab History, formatting and wildcard expansion. Expanding out my SELECT * when the table has over 200 columns is pretty great. Especially when those columns often use spaces or odd characters. I kid you not, I have a column that has both an & and a ? IN THE NAME!!!

ApexSQL Complete
Complete had the same issues with loading a large number of objects. SSMS returned a busy message and objects weren’t loaded. Once I started to write out my query Complete threw the error, “An error occurred while executing batch. Error message is: Exception of type ‘System.OutOfMemoryException’ was thrown.”

Why load table names when you can load random code snippets

I uninstalled Apex without getting too deep into its features, but I did like Auto aliasing. On a small db Apex would automatically tack on an alias to your table. I once had a coworker who would alias their tables in alphabetical order. The first table would be “a”, the second “b” and so on and so forth. Apex would have been a vast improvement there.

#SQLNewBlogger
Thanks again Ed Leighton-Dick (T|B)

Why yes I can do table level restores

Table level restores are available in SQL Server 2008 R2 Standard…under the right circumstances.

Are you under these circumstances?

For better or for worse you probably aren’t under those circumstances, so sorry or good for you, take your pick. MS Dynamics NAV running on SQL Server has the feature to do a restore of a subset of tables without having to restore the entire database. This is especially great when your database has 160 thousand plus tables and you only need to restore some of them. My implementation of Dynamics NAV has the database split up into companies and each company has 1,600+ tables. End users come to me with requests to recover data in a company or refresh a testing environment with the latest information from production fairly often. I could restore production to a new location and script out the 1,600 tables, but frankly that sounds like a pain. Luckily for me Dynamics NAV has the option to back up and restore by company. I can delete and restore those 1,600+ tables with a few mouse clicks and a fraction of the time a full restore would take.

While connected to your backup target with Classic Client, select Tools > Backup. Select the Company you want to back up.

Connect to your restore target, select File > Company > Delete. Click Yes an appropriate number of times when you are really, really sure this is in fact the company you want to delete.

Are you absolutely sure? Cause it is going to ask you.

Click again on Tools > Restore. Browse to your backup and so on and so forth. Much faster than the full restore.

My second method of table level restores takes a little more of a custom setup. Sometimes when you are in a rush to get an application into production the application periodically and undesirably deletes all of the data in a table. You’ve all had this problem, right? I didn’t feel like restoring the whole database just to get back a few hundred rows. I also didn’t have the budget/authority/etc to get a sweet 3rd party backup solution. So here comes the never-before-seen combination of an INSERT script and SQL Agent.

Bill, “an INSERT script?” Ted, “and SQL Agent?” B&T, “Whoa”

In a utility database create a table with all the fields for your at risk table, plus in my case a company identifier, and a datestamp column with a GETDATE() default. This is where right clicking in SSMS on the table and selecting Script Table as > CREATE To comes in really handy. Name it something catchy like dbo.WhyInTheHellAmIConstantlyLossingDataFromThisStupidTable or if you’re not that creative just use a similar name as your problem table(s).


INSERT INTO [SQLAdministration].[dbo].[WhyInTheHellAmIConstantlyLossingDataFromThisStupidTable]
    (Company /*or other customer field*/
    , /*all other fields from your problem table*/)
SELECT 'Company$ProblemTable' AS Company
    , /*all of the fields in this table*/
FROM [ProductionDatabase].[dbo].[Company$ProblemTable]

Then set up a job to load the full contents of that table from production every X time units. Mine is running every 10 minutes. Don’t forget to include a trim step in your job. You don’t want these rows piling up forever. Or maybe you do. I don’t know. Maybe you like to impress others with the size of your tables.
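
The trim step can be as simple as the sketch below. The seven day window and the insertdate column name are assumptions; use whatever datestamp column you actually created.

DELETE FROM [SQLAdministration].[dbo].[WhyInTheHellAmIConstantlyLossingDataFromThisStupidTable]
WHERE insertdate < DATEADD(DAY, -7, GETDATE()) /*Keep a week of copies, adjust to taste*/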

Now when users come running to you like their data is on fire (remember deleted data is not the same as burning data, trust me on this one) you can smoothly tell them that you can get their data back and save the day like the DBA superhero you are. That cape is a good look for you. Tights? Maybe, maybe not.

#SQLNewBlogger

Thanks again Ed Leighton-Dick (T|B)

Maintenance on a few…hundred thousand…indexes

Some run big data demos on laptops. Some technologists say big data is data sets that are so large or complex that traditional methods and practices are no longer adequate. My databases fall very short of big, but when I start looking at the number of objects I venture into well above average and maybe even biggish. I don’t have published metrics to back this up, but in my decade plus time with SQL Server I haven’t seen anything like what I deal with these days. I support a MS Dynamics Nav database with a mind boggling number of objects. Maybe my mind boggles a bit easier than others, but when I see 162,525 tables and 371,629 indexes my brain spins a bit. The copious number of objects comes from a setup where each company in the financial system has its own set of tables, nearly 1,600 tables per company. Each country we work in and each legal entity has its own company, and more than 80 of those later (and growing…) BAM, we’re in the hundreds of thousands of objects.
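
If you want to boggle your own brain, a quick count along these lines pulls those numbers from the catalog views.

SELECT (SELECT COUNT(*) FROM sys.tables) AS TableCount,
(SELECT COUNT(*) FROM sys.indexes AS i
JOIN sys.tables AS t ON t.object_id = i.object_id
WHERE i.index_id > 0) AS IndexCount /*index_id 0 is a heap, not an index*/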

So what do you do with hundreds of thousands of indexes? DEFRAG THEM! Of course. Indexes are bound to be fragmented because data must be added and deleted and modified in those tables. Right? What’s the point of having 162 thousand tables if you aren’t constantly filling them with mission critical data? Time to throw a standard maintenance plan rebuild task at it. What could possibly go wrong with trying to rebuild 371 thousand indexes on a regular basis?

I don’t know, I’ve never tried that.

I skipped ahead and ran my own script against it which loaded the results of sys.dm_db_index_physical_stats into a temp table, declared a cursor (yikes!) and reorganized when avg_fragmentation_in_percent was between 10 and 30 and rebuilt when it was over 30. It ran OK, it was doing its job and it was running for around 45 minutes on a weekly basis. That was working and nobody was complaining, at least not about that part. Why mess with things?

Because. Reasons.
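
The shape of that homegrown script was roughly this; a reconstruction from the description above rather than the original code, skipping the temp table for brevity and assuming the LIMITED scan mode.

DECLARE @schemaName SYSNAME, @tableName SYSNAME, @indexName SYSNAME, @frag FLOAT, @sql NVARCHAR(MAX);

DECLARE frag_cursor CURSOR FAST_FORWARD FOR
SELECT s.name, t.name, i.name, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
JOIN sys.tables AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.avg_fragmentation_in_percent >= 10
AND i.name IS NOT NULL; /*skip heaps*/

OPEN frag_cursor;
FETCH NEXT FROM frag_cursor INTO @schemaName, @tableName, @indexName, @frag;
WHILE @@FETCH_STATUS = 0
BEGIN
    /*REORGANIZE between 10 and 30 percent, REBUILD above 30*/
    SET @sql = N'ALTER INDEX ' + QUOTENAME(@indexName) + N' ON '
        + QUOTENAME(@schemaName) + N'.' + QUOTENAME(@tableName)
        + CASE WHEN @frag > 30 THEN N' REBUILD;' ELSE N' REORGANIZE;' END;
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM frag_cursor INTO @schemaName, @tableName, @indexName, @frag;
END
CLOSE frag_cursor;
DEALLOCATE frag_cursor;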

Ola
My first go around with Ola Hallengren’s scripts did not go so well, but that was a while ago and I failed to take notes. I’m sure something very pressing was going on that prevented me. Yes, very pressing. This guy doesn’t get praise and accolades out the wazoo (or whatever the Swedish word for wazoo is) if he isn’t doing amazingly awesome things. I must have not tried hard enough. MOAR OLA! I gave Ola’s IndexOptimize script another shot. I’ve written a few things about my implementation of Ola’s scripts, but the TL;DR is that they are working. In the beginning this script worked on 1,600 indexes, a few less than the 371 thousand available. Now it floats between 400 and 800, usually taking less than 10 minutes. It’s not perfect yet, there are times when the job runs over an hour and I am ignoring an elephant in the room.
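
For reference, a call in this neighborhood uses IndexOptimize’s documented parameters. The thresholds shown are the script’s defaults, not necessarily what I run, and @LogToTable is what feeds the CommandLog table that shows up later.

EXECUTE dbo.IndexOptimize
@Databases = 'USER_DATABASES',
@FragmentationLow = NULL,
@FragmentationMedium = 'INDEX_REORGANIZE,INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',
@FragmentationLevel1 = 5,
@FragmentationLevel2 = 30,
@LogToTable = 'Y';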

Minion
With IndexOptimize running in production I can look into trying new things in development. Minion Reindex by Midnight SQL was introduced as an easy to install, easy to run and very customizable index maintenance tool kit. Setup is a snap and there are many ways to tweak and control how it runs. I wanted to see how this ran out of the box. Out of the .zip is more accurate, but doesn’t have the same consumerist snap to it.

[Minion].[IndexMaintMaster]
@IndexOption = 'All', -- Valid options: All, ONLINE, OFFLINE
@ReorgMode = 'REORG', -- Valid options: All, Reorg, Rebuild
@RunPrepped = 0, -- Valid options: 1, 0. If you've collected index frag stats ahead of time by running with @PrepOnly = 1, then you can use this option.
@PrepOnly = 0, -- Valid options: 1, 0. Allows you to take frag stats earlier in the day and save them for use at a later time.
@StmtOnly = 0, -- Print the reindex stmts w/o modifying the SP. This will NOT run any reindex stmts, only PRINT them.
@Include = NULL, -- Only do DBs listed here. Commas are used. ex: @Include = 'master, model, msdb'
@Exclude = NULL, -- Do all DBs except the ones listed here. Commas are used. ex: @Exclude = 'master, model, msdb'
@LogProgress = 0 -- Allows Live Insight through Log tables.

Minion has a few indexes to chew through. This is running on the dev box which has a slightly larger database than production.


sp_whoisactive is showing a run time of over 2 days, and 36 billion reads. Looking at the Status field of the Minion.IndexMaintLogDetails table at the same time shows me that it is almost complete, 383,484 of 386,105.


Minion has run a few times with varied success. Two runs completed and lasted nearly 2 and 1/2 days each, then a failure on the 3rd time. I haven’t dug in to figure out what Parameter 3 is, but apparently it was incorrect. I can appreciate the extra level of detail and control here, but I appreciate even more a job that finishes successfully and in less time.

What else do you do with indexes besides maintenance? Make them better! How? With sp_BlitzIndex of course. It takes a while to run, but it gave me nearly 300 thousand easy to implement suggestions on how I can improve the indexes I have to deal with.

Finding Count
Feature-Phobic Indexes: No indexes use includes 1
Feature-Phobic Indexes: Potential filtered index (based on column name) 2,544
Index Hoarder: Addicted to nulls 1,272
Index Hoarder: Addicted to strings 12,088
Index Hoarder: Borderline: Wide indexes (7 or more columns) 37,948
Index Hoarder: Many NC indexes on a single table 6,996
Index Hoarder: More than 5 percent NC indexes are unused 1
Index Hoarder: Unused NC index 1,573
Index Hoarder: Wide clustered index (> 3 columns OR > 16 bytes) 129,348
Index Hoarder: Wide tables: 35+ cols or > 2000 non-LOB bytes 23,957
Multiple Index Personalities: Borderline duplicate keys 76,852
Self Loathing Indexes: Active heap 1
Workaholics: Scan-a-lots (index_usage_stats) 5
Workaholics: Top recent accesses (index_op_stats) 5
Total 292,591

Unfortunately this is a 3rd party database and I can’t make any schema changes. Maybe that’s not so unfortunate, I’m not sure I’d want to be held responsible for the creation of this monstrosity.

 

#SQLNewBlogger
Thanks to Ed Leighton-Dick and others for getting this going.

DTS_E_PRIMEOUTPUTFAILED

This is a tricky and elusive error that is showing up now and then in a SSIS job. The full text of the error is, “Error: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on ODBC Source returned error code 0x80004005. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.”

A colleague pointed out this article, which came up thanks to the Google overlords. While it was written for Server 2003 it led me down the rabbit hole where I found a few posts indicating TOE/Chimney can be a problem with Server 2008.

That led me to this page covering TCP Chimney Offload. The feature attempts to offload TCP connections to the network adapter and, according to at least one MS guy, can be a bit dodgy. I’m looking to see if that offloading is happening during the SSIS job. I’m using netstat, run from an elevated command prompt, to write the state of the active connections to a file every 60 seconds.

netstat -t 60 > C:\temp\netstat.txt

Active Connections
Proto Local Address Foreign Address State Offload State
TCP 10.10.1.37:135 10.10.26.168:54637 ESTABLISHED InHost

Here’s hoping that something interesting is logged tonight that will help put a stop to DTS_E_PRIMEOUTPUTFAILED.

Update
It’s not a TCP Chimney offload problem, which is too bad for a few reasons. One, it’s a new problem to me and I like to solve new problems rather than figuring out I made the same mistake again. Second and more important, the problem is still out there.

The logging for netstat -t showed no offloading to the NIC. What would have been better and faster would have been to run netsh int tcp show chimneystats


This not only shows me that nothing was offloaded, PMAX = 0, but offloading isn’t even supported, Supp = No. That would have saved me a bit of time if I had used this first. Didn’t solve anything, but I did learn something so it’s not a total waste.

Time to catch up

I’m not quite a decade behind, but I’m most certainly late to the PowerShell game. The latest reminder came via the daily SQLServerCentral email. They were re-running the Stairway to SQL PowerShell Level 1. This is something I’ve been meaning to get to for a while, had a bit of time and an idea for a project. Having a project always makes it easier for me to learn a new technology. I need to put ideas into practice for them to stick the best. Here’s my situation. I have several servers running Dynamics NAV and MSSQL. There are a number of services and they need to be stopped and started in a certain order. For a while I’ve been doing this manually as needed. It’s not a hard job, but why not make it easier?

I went through some of the introduction to PowerShell on the SSC site and, feeling pretty good about it, I jumped into looking at services on a dev server. I have a new server that’s all mine (for the time being) and nobody will complain when services stop without warning. I like having a playground all to myself, though it’s kind of lonely on the teeter totter.

All by myself

To look into the services and stop them I used the following. The middle line, Get-Member -Type Method, returns the methods available on the service object.


$service = Get-WmiObject -ComputerName xxxxxx -Class Win32_Service -Filter "Name='MicrosoftDynamicsNavServer'"
$service | Get-Member -Type Method
$service.stopservice()

After running the .stopservice I double checked on the server to ensure the ReturnValue of 0 really meant the service was stopped and it was. I changed the -Filter to SQLSERVERAGENT, ran $service.stopservice(), saw the 0 and checked on the server. Boom stopped. Lastly I changed the -Filter to MSSQLSERVER, ran $service.stopservice(), saw the 0 and checked on the server. Yet another stopped service. This is quite a successful day. I usually don’t match up success with stopped services, but that’s the metric I was using.

This is the success you are looking for.

Getting the services running again was as easy as switching to .startservice and running in the reverse order.


$service = Get-WmiObject -ComputerName xxxxxx -Class Win32_Service -Filter "Name='MSSQLSERVER'"
$service.startservice()

This dev server is a bit stripped down and doesn’t have all the services that some other boxes (or however you want to refer to a VM) do. There are more services, and typing this all out each time wouldn’t be any faster than remoting in and clicking. Making it faster wasn’t the metric for today; learning something was. Making it faster will be a metric for another day and I’ll need to automate this.

Updating Statistics

Now that IndexOptimize is taking care of my indexes, what about the statistics? Is relying on ALTER INDEX and Auto Update Statistics enough? The CSS SQL Server Engineers recently pointed out that some ALTER INDEX statements update some statistics. They’ve put this lovely table together.


The last run of IndexOptimize worked on 653 indexes, using REBUILD on 100 and REORGANIZE on the remaining 553. Those 553 did not have their stats updated. So only 100 of the 371k indexes had statistics updated with the last maintenance. Is Auto Update Statistics taking care of the remaining? This query uses STATS_DATE and sys.sysindexes to find statistics that haven’t been updated in over a day, i.e. older than my last IndexOptimize job.


SELECT OBJECT_NAME(id),name,STATS_DATE(id, indid),rowmodctr
FROM sys.sysindexes
WHERE STATS_DATE(id, indid) < DATEADD(DAY,-1,GETDATE())
AND rowmodctr <> 0
AND id IN (SELECT object_id FROM sys.tables)

(15495 row(s) affected)

There is another DMV, sys.dm_db_stats_properties, that can be used here, but the instance I’m working on is not current enough (it needs 2008 R2 SP2 or later). Boo.
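
For anyone on a newer build, a sketch of the same check with that DMV might look like this; the one day window mirrors the query above.

SELECT OBJECT_NAME(s.object_id) AS TableName, s.name AS StatName,
sp.last_updated, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE sp.last_updated < DATEADD(DAY, -1, GETDATE())
AND sp.modification_counter > 0
AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1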

Looking at the 15,495 results, 74% were last updated on Saturday, 5 days ago, by the last remaining task in the old maintenance job.


What then is happening with the other hundreds of thousands of indexes? I’ve modified the above query a bit to remove the day old criteria, but even with that it’s only looking at 17 thousand indexes. That’s because of the rowmodctr clause. rowmodctr counts the total number of inserted, deleted, or updated rows since the last time statistics were updated for the table. rowmodctr has some remarkable features, hence the list in the Remarks section. While 15 thousand indexes are in use and important, the vast majority of the 161 thousand tables and 371 thousand indexes in this database are not updated regularly or even being used at all. Hooray for efficient design! I knew some of this data going into this project, but these exercises have given me a much better understanding of the scale.

Procedure timing

IndexOptimize has been running in production in its current state for about a week now. Time to look at some results using the following query against the CommandLog table.


SELECT DATEPART(YYYY,StartTime) AS 'Year'
,DATEPART(MM,StartTime) AS 'Month'
,DATEPART(d,StartTime) AS 'Day'
,DATENAME(dw,StartTime) AS 'Day of Week'
,COUNT(ID) AS 'Index Count'
,DATEDIFF(mi,MIN(StartTime),MAX(EndTime)) AS 'Job Time - minutes'
,DATEDIFF(ss,MIN(StartTime),MAX(EndTime)) AS 'Job Time - seconds'
FROM CommandLog
GROUP BY DATEPART(YYYY,StartTime)
,DATEPART(MM,StartTime)
,DATEPART(d,StartTime)
,DATENAME(dw,StartTime)
ORDER BY DATEPART(YYYY,StartTime)
,DATEPART(MM,StartTime)
,DATEPART(d,StartTime)


First off, what in the name of all that is normalized is going on with Friday? That will need some investigation. Next, run time (outside of Friday) is looking really good. IndexOptimize is powering through several hundred indexes in a few minutes. I’m pretty happy with these results. Looking back on my old indexing job, it was regularly running 30 to 45 minutes. I don’t know how many indexes it was working on as I didn’t have the detailed logging that IndexOptimize does.

The problem table is still sitting there, waiting, lurking and possibly fragmenting. It is being skipped by IndexOptimize for the time being. I’m still keeping an eye on it and logging stats with sys.dm_db_index_physical_stats on a daily basis. Well, an almost daily basis. Turns out there can be quite a few decimal places in the avg_fragmentation_in_percent column. I didn’t set that right the first time, but switching over to FLOAT got it back up and running. Yes, FLOAT is an approximate data type, but I’m not concerned about accuracy to 7 or even 15 digits out. The fragmentation of the indexes on the problem table has increased a bit, but not too much. There is some work on Fridays that happens on this table that I have a feeling may change the index stats. This Friday work also has something to do with the IndexOptimize run times. This is a good opportunity to better understand the workload on this database.
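
For completeness, that daily logging is nothing fancier than an INSERT from the DMV into a small table. The table, column names and 'dbo.ProblemTable' below are made up for illustration, with the fragmentation value stored as FLOAT per the lesson above.

INSERT INTO dbo.IndexFragLog (DatabaseName, ObjectId, IndexId, AvgFragmentationPercent, InsertDate)
SELECT DB_NAME(), ps.object_id, ps.index_id, ps.avg_fragmentation_in_percent, GETDATE()
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.ProblemTable'), NULL, NULL, 'LIMITED') AS ps;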