Statistics are used by SQL Server's query optimizer to help determine the most efficient execution plan for a query. When the option to automatically create statistics is enabled (which it is by default) SQL Server will create statistics on columns used in a query's predicate as necessary, which usually means when statistics don't already exist for the column in question.
Statistics are also created on the key columns of an index when the index is created. SQL Server understands the difference between auto created column statistics and index statistics and maintains both - you can see this by querying the sys.stats system view. As I found out firsthand not too long ago having both auto created statistics and index statistics on the same column caused the query optimizer to choose a different - and less than optimal - execution plan than when only the index statistics existed.
According to the MSDN article Statistics Used by the Query Optimizer in Microsoft SQL Server 2000 auto created statistics are automatically dropped over time if they are not used but that can take an undetermined amount of time. What if we're experiencing the kind of problem I previously wrote about? In most of the cases I've seen it makes sense to help SQL Server out by manually dropping the auto created statistics in favor of index statistics that exist for the same column.
Fortunately SQL Server contains everything we need to know to figure out when column statistics are overlapped by index statistics. The following query will identify overlapped\overlapping statistics and generate the statements you can use to drop the overlapped statistics. All the usual warnings apply here - although this has not caused any problems for me your mileage may vary so wield this with an appropriate degree of caution:
WITH autostats(object_id, stats_id, name, column_id) AS ( SELECT sys.stats.object_id , sys.stats.stats_id , sys.stats.name , sys.stats_columns.column_id FROM sys.stats INNER JOIN sys.stats_columns ON sys.stats.object_id = sys.stats_columns.object_id AND sys.stats.stats_id = sys.stats_columns.stats_id WHERE sys.stats.auto_created = 1 AND sys.stats_columns.stats_column_id = 1 ) SELECT OBJECT_NAME(sys.stats.object_id) AS [Table] , sys.columns.name AS [Column] , sys.stats.name AS [Overlapped] , autostats.name AS [Overlapping] , 'DROP STATISTICS [' + OBJECT_SCHEMA_NAME(sys.stats.object_id) + '].[' + OBJECT_NAME(sys.stats.object_id) + '].[' + autostats.name + ']' FROM sys.stats INNER JOIN sys.stats_columns ON sys.stats.object_id = sys.stats_columns.object_id AND sys.stats.stats_id = sys.stats_columns.stats_id INNER JOIN autostats ON sys.stats_columns.object_id = autostats.object_id AND sys.stats_columns.column_id = autostats.column_id INNER JOIN sys.columns ON sys.stats.object_id = sys.columns.object_id AND sys.stats_columns.column_id = sys.columns.column_id WHERE sys.stats.auto_created = 0 AND sys.stats_columns.stats_column_id = 1 AND sys.stats_columns.stats_id != autostats.stats_id AND OBJECTPROPERTY(sys.stats.object_id, 'IsMsShipped') = 0;
For more information on statistics see Using Statistics to Improve Query Performance in SQL Server Books Online. I also recommend watching Introduction to SQL Server Statistics, a presentation my friend Andy Warren (Blog | Twitter) recently gave for the PASS Performance Virtual Chapter which is available in their Presentation Archive.
5 comments
Now this is very interesting and not something I had previously considered I must admit.
Out of interest, is this something that you perform quite often as part of say your regular performance tuning effort, such as looking for un-used indexes and the like?
I found a couple of issues with the script.
1) The code returns duplicate DROP statements sometimes. This happens when the statistics is duplicated more than once. (this is not a big deal).
2) The other rare issue deals with dropping some autogenerated statistics, and SQL Server will autogenete it again thereafter.
This occurs when the order of the columns in the underlying index is not the same as the order of the columns in the related index statistics.
I encountered the problem in SQL Server 2005 EE SP3 (e.g table dbo.ExecutionCache in "ReportServerTempDB" SSRS database, index "IX_ExecutionCache").
Thanks for the useful script, one question...
why do you only check sys.stats_columns.stats_column_id = 1?
Jag
Good script. I'm going to update my stats presentation to reference it. Let me know if you make any changes/improvements!
Great post, much appreciate the time you took to write this.
Post a Comment