Background
Like a lot of other companies, we use imaging software to get new\repurposed machines up and running quickly. We recently switched to a new imaging product and it seemed to work well. We used it on a couple of SQL Servers that made their way into production and shortly after they went live we noticed intermittent errors with clients trying to connect using Windows authentication that looked like this:
Login failed for user ''. The user is not associated with a trusted SQL Server connection.
This usually happened a few times a day for a couple of minutes and then the system self-corrected. We also found that right before the login failures occurred was this entry in the system event log:
Event Type: Error
Event Source: NETLOGON
Event ID: 5719
Description:
This computer was not able to set up a secure session with a domain controller in domain [OUR DOMAIN] due to the following:
Not enough storage is available to process this command.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.
So the NETLOGON service was failing and killing SQL’s ability to authenticate users. While digging through the logs from each image looking for a possible clue one of my coworkers noticed a single line buried deep within that said something to the effect of “SID change failed”.
SID stands for “Security Identifier” and is a unique alpha character string that’s used in Windows and Active Directory as the basis for identifying computers and objects in the operating system (read more about SIDs here). Usually when you apply an image you change the SID to make your server unique, thereby avoiding potential security problems (see Microsoft KB article 162001), only in our case the imaging software failed to do so and as a result we had several machines on the network that had duplicate SIDs.
One Solution…And The Problem It Causes
One solution to the dupe SID problem is to use the NewSID utility from Sysinternals to create a new SID for the affected servers. But here’s the catch: Local accounts, groups, object names, etc. in the OS are based on the machine SID. Change the machine SID and every other SID gets changed too. During installation SQL Server creates a set of local machine groups (e.g. MACHINE\SQLServer2005MSSQLUser$MACHINE$MSSQLSERVER) that are granted rights throughout the OS and in SQL Server so that the services can run. SQL Server stores the logins - and SID - for each of those groups in master.dbo.syslogins
. NewSID wasn’t written with SQL awareness so after using it to change the machine SID (which in turn changes the SID for each local group) you’re going to be left with the old group SIDs in SQL, new group SIDs in the OS, a SQL service that won’t start, and probably a lot of people wondering “WTF when is the server going to be back up?”
A Solution To The Problem Caused By The Solution
I’ve been through the SID change exercise a few times now so I’ll share with you what to do if you need to change the SID on one of your SQL severs.
0. Find out what Service Pack and Cumulative Update your SQL service is running using the statement below. Get a copy of it and keep it handy because you’ll need it later.
SELECT SERVERPROPERTY('productversion') , SERVERPROPERTY('productlevel') , SERVERPROPERTY('edition');
1. Change SQL service and SQL Agent service to run as Local Service (or Local System) and manual startup.
2. Remove server from domain. Reboot.
3. Change SID using NewSID (note: record your old SID before changing it in case you have to roll back!). Reboot.
4. Your SQL service is now broken by the SID change. Here’s where it gets fun. The SQL service will start but the SQL Agent service won’t and a couple of other components like DB Mail will throw permissions errors that show up in the SQL error log. To get things back on track resynchronize what’s in master.dbo.syslogins
with the new local group SIDs by dropping and recreating the SQL logins (and associated permissions) for the groups. First, using Object Explorer script out the drop and create statements for each local group login. Then use this script I posted previously to script the statements which grant all the necessary server level permissions to each group. Cut and paste the statements that apply to the local groups into your working script. What you’re going to execute for each group will look something like this:
DROP LOGIN [MACHINE\SQLServer2005MSSQLUser$MACHINE$MSSQLSERVER]; GO CREATE LOGIN [MACHINE\SQLServer2005MSSQLUser$MACHINE$MSSQLSERVER] FROM WINDOWS WITH DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english]; GO EXEC sp_addsrvrolemember @rolename = 'sysadmin', @loginame = 'SQL2300\SQLServer2005MSSQLUser$MACHINE$MSSQLSERVER'; GO GRANT CONNECT SQL TO [MACHINE\SQLServer2005MSSQLUser$MACHINE$MSSQLSERVER]; GO
Execute your script and restart the SQL and SQL Agent services to get them back up and running.
5. Join the server back into the domain. Reboot.
6. When you use Configuration Manager to try and change your services to run under a domain account again you’ll get this WMI error:
To fix the WMI error simply reapply a Service Pack\Cumulative Update that is the same or newer than the SQL service is already running at. But…when you try to do the install it will fail. Dig through the install log in C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\LOG\Hotfix and you’ll find this error:
MSP Error: 29528 The setup has encountered an unexpected error while Setting Internal Properties. The error is: Fatal error during installation.
Basically the problem is that old SIDs are stored in the registry and need to be deleted. Check out Microsoft KB article 925976 for details on what to do. Once you’ve deleted the registry entries you can proceed with reinstalling the current SP\CU. Reboot. Now you should be able to use Configuration Manager to change your services to run under a different account.
At this point everything should be running again; the problems caused by changing the SID are resolved and the login errors from the dupe SID should also go away. *PHEW*!
Disclaimer
Although I was able to reproduce and fix the problem several times your installation might have some nuances that mine doesn’t, so consider these steps as more of a guide than a hard and fast rule. Although I used NewSID there are other SID changing utilities out there that might create the same problem. As always, test this out in a non-production environment first!
8 comments
Only a problem in SQL 2008.
SQL 2005 is okay with newsids.
Thanks for the post. We were able to fix a cloned SQL server using this information.
I appreciate the info here. We are sturggling with a server with this same issue. And it is indeed running SQL server 2005.
Same anonymous guy as above. I just want to point out two things different about our install. In our case, we used NewSid to change the value of Sids in an existing VM (using Vmware), so we weren't creating from an image so much as changing sids on a machine image that had SQL server manually installed on it after being initially deployed.
1) We didn't have to drop and re-create the local accounts. Just a heads-up.
2) For the MS link about fixing the registry values, on our VMware-based machines, renaming the sid values for each service's entry didn't help. We had to outright delete the values. When doing this, we found out that the 'ASGroup' key was located under another instance group (MSSQL.4). We had to delete it there.
Thanks for the details on problem and solution.
We run sql server in vms and experience problems with Full-Text Search (SQL server failed to communicate with Full-Text Service) after newsid'ing a VM.
I wonder why sql server stores sids in db tables. IMO, that's OS level detail and sql should be storing domain/user details.
In our case, we are going to remove sql server from our base image. Then clone, newsid, and install sql server from scratch. It seems like it will take about the same amount of time but may be less prone to subtle mistakes or missed steps.
or more importantly,
http://blogs.technet.com/b/markrussinovich/archive/2009/11/03/3291024.aspx
Hello Anonymous,
This was originally posted over 2 years ago in 2009 and was relevant at the time. As you've pointed out though, things have changed since then...but in case someone still has NewSID hanging around and decides to use it, I'll leave this post as-is for their benefit.
Hi, Kendal!
The times were changed maybe, but today I've struggled SQL 2005/2008 R2 Upgrade, so, the things are the same. And you are right! This workaround works fine, if you place your mind inside it.
Thank you very much!
Wlad
Post a Comment