Virtualizing Your Domain Controllers without getting fired!

Please, pretty please, do not just hit the button and P2V/ColdClone/HotClone/Copy your Windows Server Domain Controllers, regardless of whether they run Windows Server 2000/2003/2008, etc.

In the best case, you manage to virtualize your Domain Controllers, which you could have done just as easily, and without any danger, in a few simple steps.

In the worst case, you render your Domain Controllers useless and create several other problems and hiccups in your infrastructure, up to and including a complete production halt and at least several hours of pain and horror trying to get everything back up and running!

Personally I have nothing against virtual Domain Controllers. Best practice is usually not to run all kinds of other software or services on a Domain Controller, and the need for multiple Domain Controllers for redundancy quickly adds a lot of boxes doing very little. Virtualizing some or all of these Domain Controllers makes better use of resources and still keeps each one separate from other services. Don't forget to change the time synchronization settings in the w32time service, in VMware Tools and for the NTP servers on the ESX hosts, but that's another story.

One of the big problems with cloning a Domain Controller is that if you get problems, you will not notice them until it is too late. The Domain Controller will seem to function and work with clients, but it will actually have stopped replicating with all other Domain Controllers, because it has detected that it has been copied. The result is an inconsistent domain where client records are not updated; clients will slowly stop working, depending on which Domain Controller they contact, until everything goes dead. If you have virtualized ALL your Domain Controllers by then, you will be left with 1-3 months of changes going down the tube together with your damaged Domain Controllers. Don't forget to take a full backup of at least one Domain Controller before you start cloning!

So what happens when things go bad?

  • First off, you might get problems but no event log entries – aarrggh, try and detect that!
  • If a Domain Controller replicates data after being cloned, it will acknowledge what information it has replicated to the other Domain Controllers. In effect, they know what the cloned Domain Controller knows. If the cloned machine is then turned on with older information, the other Domain Controllers will refuse to give it the information – after all, they know it has already gotten it! This leaves a gap of missing information, potentially creating big problems. It is usually referred to as USN Rollback and is a common symptom of a Hot Clone, or of a Domain Controller that was cloned while the original got turned on again after the cloning. More info here: http://support.microsoft.com/kb/875495/
  • If a Domain Controller detects disk signature changes, it will put itself in isolation and refuse replication. Basically, it has detected that it has been copied, and to avoid replicating wrong information to others it isolates itself. It still keeps on running and serving users, but since it cannot replicate, it does not pass on important information like password changes, machine information, etc.
  • Microsoft does not support cloning of Domain Controllers – you're on your own!
  • VMware does not support cloning of Domain Controllers – you're still on your own!

VMware has more pain-and-death information about cloning an existing Domain Controller here: http://kb.vmware.com/kb/1006996
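
To make the USN Rollback problem from the list above a bit more concrete, here is a minimal toy model in Python. It is only a sketch under simplifying assumptions, not how Active Directory is actually implemented: the `DomainController` class, the `watermark` bookkeeping and the user names are all made up for illustration. The idea it shows is that each DC numbers its own changes with a rising counter (its USN), and a partner only asks for changes above the highest number it has already seen from that DC.

```python
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Toy DC (hypothetical model): a name, a local USN counter and a change log."""
    name: str
    usn: int = 0
    log: dict = field(default_factory=dict)        # usn -> change made locally
    watermark: dict = field(default_factory=dict)  # partner name -> highest USN pulled
    received: list = field(default_factory=list)   # changes pulled from partners

    def write(self, change):
        """Record a local change under the next USN."""
        self.usn += 1
        self.log[self.usn] = change

    def pull_from(self, source):
        """Pull only the changes above the last USN seen from this source."""
        seen = self.watermark.get(source.name, 0)
        new = [source.log[u] for u in sorted(source.log) if u > seen]
        self.received.extend(new)
        self.watermark[source.name] = max(seen, source.usn)
        return new


def clone(dc):
    """Image-level copy: same identity, same USN, same log."""
    return DomainController(dc.name, dc.usn, dict(dc.log))


dc1 = DomainController("DC1")
dc2 = DomainController("DC2")

dc1.write("create user alice")
dc1.write("create user bob")
dc2.pull_from(dc1)                 # DC2 has seen DC1 up to USN 2

image = clone(dc1)                 # clone taken at USN 2

dc1.write("reset bob's password")  # the original keeps running: USN 3
dc1.write("create user carol")     # USN 4
dc2.pull_from(dc1)                 # DC2 has now seen DC1 up to USN 4

# The original is switched off and the older image takes its place.
image.write("create user dave")    # reuses USN 3
image.write("create user erin")    # reuses USN 4
image.write("create user frank")   # USN 5

pulled = dc2.pull_from(image)
print(pulled)
```

In this model the last pull returns only the `frank` change. DC2 never learns about `dave` and `erin`, because it believes it already has USN 3 and 4 from DC1, and the image has no record of the password reset or of `carol`. Nothing raises an error anywhere, which is the nasty part.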

How do we avoid all this pain and death? Here are a few ways you can safely virtualize your Domain Controllers.

  • My preferred way. If it is just a Domain Controller (it should be), why not create a new virtual server from scratch, DCPROMO it up to a new Domain Controller, then DCPROMO down your old server and decommission it? This is the safest and easiest way of doing things. (Don't forget to move the FSMO and GC roles.)
  • Now imagine you have a server full of other services as well, and for some reason you feel it is just not worth building a new one from scratch (yes, you can copy DHCP databases, shares, DNS, etc. from one server to another!). Then do this: make sure you have another Domain Controller running, including a Global Catalog server; move any FSMO roles away from the Domain Controller to another server; then DCPROMO it down to a regular server. ColdClone the server. Turn off the physical server (never reintroduce it to the network!). Turn on your virtual server, DCPROMO it back up to a DC and move any FSMO/GC roles as needed. Done!
  • You only have one server, and it is full of stuff (e.g. SBS?). You could just clone it, hope for the best and cry if it fails… Or you could set up a temporary Domain Controller on a new (virtual?) server (yes, it is possible to have multiple Domain Controllers in a Small Business Server setup, but only one SBS), replicate the domain, create a full backup, back up and restore the database… It is up to you, but I would not recommend it. Whatever you choose here, make sure the physical server is never turned on after cloning, don't change disk sizes, and create a full backup before you start! Basically, your physical server will be your best backup, but it is not enough to ensure that no problems will happen!

I know some people say, "well, it worked when I did it." That is like saying, "I do not need RAID on my server's storage, I have never had a disk failure!" When you have the problem, it doesn't matter how many times it worked before: you have the problem!

So, a quick checklist of do’s and don’ts:

  • Do a full backup first (at least system state!)
  • Do NOT do HOT-CLONES!
  • Do have more than one Domain Controller
  • Do NOT turn on the physical server again – ever – after cloning it
  • Do clone your server while it is demoted, and promote it again after cloning
  • Do NOT clone ALL your Domain Controllers at the same time, leave at least one physical for 3 months
  • Do create new virtual Domain Controllers to replace old physical ones
  • Do NOT change disk sizes or types during a clone
  • Do check event logs after cloning to check for problems
  • Do NOT use normal time settings on virtual Domain Controllers
  • Do look up best practices for virtual Domain Controllers time settings
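
The items in this checklist that can be verified before the clone starts could also be written down as a small pre-flight check. The Python below is only a sketch: `check_plan` and every key in the plan dictionaries are hypothetical names invented for this example, and you would have to fill in the answers yourself. It does not query Active Directory or any hypervisor.

```python
def check_plan(plan):
    """Return the checklist items that a virtualization plan violates."""
    problems = []
    if not plan.get("full_backup_taken", False):
        problems.append("take a full backup first")
    if plan.get("clone_method") == "hot":
        problems.append("do NOT hot-clone")
    if plan.get("total_dcs", 0) < 2:
        problems.append("have more than one Domain Controller")
    if plan.get("physical_back_online_after_clone", False):
        problems.append("never turn the physical server on again after cloning")
    if plan.get("clone_method") in ("hot", "cold") and not plan.get("demoted_before_clone", False):
        problems.append("demote before cloning, promote again afterwards")
    if plan.get("physical_dcs_kept", 0) < 1:
        problems.append("leave at least one physical DC for 3 months")
    if plan.get("disk_size_or_type_changed", False):
        problems.append("do NOT change disk sizes or types during a clone")
    if not plan.get("time_sync_reviewed", False):
        problems.append("review time settings for virtual Domain Controllers")
    return problems


# A plan that breaks most of the rules (all values are made up).
risky = {
    "clone_method": "hot",
    "total_dcs": 1,
    "physical_dcs_kept": 0,
    "disk_size_or_type_changed": True,
}

# A plan that follows the demote / cold clone / promote route.
safe = {
    "full_backup_taken": True,
    "clone_method": "cold",
    "demoted_before_clone": True,
    "total_dcs": 3,
    "physical_dcs_kept": 1,
    "time_sync_reviewed": True,
}

for problem in check_plan(risky):
    print("-", problem)
print(check_plan(safe))
```

The `risky` plan trips seven of the checks, and the `safe` plan returns an empty list. Checking the event logs after cloning is left out on purpose, since that can only be done once the virtual server is running.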

I hope I saved someone a couple of hours of pain 🙂 If not, you at least read this far – leave a comment to encourage more info in the future!

– Sole Viktor

63 Responses to “Virtualizing Your Domain Controllers without getting fired!”

  • Sole:

    Hi Mike,

    Great information, much appreciated and definitely worth a read.

    However, my conclusion stands: I would not recommend P2V’ing a Domain Controller unless absolutely necessary. It is so much safer and easier to just demote / promote and, if needed, P2V while the server is not a Domain Controller.

    -Sole

  • Brian:

    This all makes domain controllers look rather fragile. Makes me want to run from the room screaming. Sure, this is more secure. But why would having my whole infrastructure crash because I did not read some obscure tech note be preferred to a small possibility of someone injecting a rogue DC?

    Brian

  • Diego:

    Nice information. Thank you very much!

  • Ken:

    I’m still getting my feet wet with hyper v. I’ve been able to make a bunch of virtual test servers that all authenticate with each other, yada yada. I was tasked with learning Hyper V because we have a client who has 3 physical servers (active domain controller, terminal server, file server) that he wants to virtualize. Can’t you just back up the server into a .vhd and then create a new VM, using the old .vhd as the hard disk source?

  • Ken:

    addendum to above: using sysprep to remove identifiers so that your system will be more willing to accept new hardware changes. Lord knows if you forget to strip the machine of everything that identifies it, all you get is a bsod when you import a machine drive like that.

  • Sole:

    If you use sysprep on a Domain Controller, you should end up with a pretty useless Domain Controller when done… Not sure I understand what you want to achieve.

  • Sole:

    If you want to convert a physical machine to a virtual one, you should use a tool for “P2V” (Physical to Virtual) conversion. But as this article suggests, it would be a better idea to make a new Active Directory Domain Controller and move the AD services to this new server than to do a P2V on it. So if you want to convert these machines, look for P2V tools.

  • EK:

    Hello – thanks for this information.

    I made the mistake of doing a hot clone, and within 24 hours I noticed some peculiar things, so I deleted the domain controller.

    I also did a test reverse snapshot backwards in time, and was immediately getting event logs. It looks like the replica partners immediately rejected the clone controller and would not replicate to it. I guess this is a safeguard in 2008 R2.

    I ended up deleting the cloned Domain controller, and replication on my remaining 3 seems ok. I’ve been adding test users from each and they appear on each other correctly.

    I’m going to follow your advice (and this is what I thought of immediately after I saw the errors) of just creating fresh new virtual machines and migrating the domain over to them.

  • EK:

    I forgot to ask this: Once a domain controller has been virtualized in Vmware, can it be vmotioned off to another host, or does that cause much of the same issues you raise above?

  • Sole:

    vMotion is not a problem. The only “issues” I would worry about after you have created your happy virtual DC are:

    • Time issues (important to keep time updated constantly)
    • Do not use snapshots (can do very bad stuff to your AD)
    • Do not make changes to virtual disks, e.g. changing their size.

  • EK:

    With virtual disks does it matter if it’s thin or thick provisioned? I did a 50GB thick partition just to be on the safe side.

    thanks for all the help so far this has been very valuable information.

  • Sole:

    As far as issues go, no, it should not matter. For performance, thick will always be better, but considering how little performance an AD DC requires, you will probably not notice the difference unless you have a high load. So, short answer: you should be fine.

  • GarethH:

    One thing you might also want to point out is that the computer SID will be the same unless the member server is sysprepped. This can also cause issues if you clone from the forest root DC. This is explained in detail here: http://blogs.technet.com/b/markrussinovich/archive/2009/11/03/3291024.aspx

    I wholeheartedly agree; I’d rather build a new member server and then promote it than clone or P2V an existing DC. P2V will not remove the old hardware drivers!

  • Demian:

    Great information. I actually lived through a situation like the one described above: I tried to virtualize from an image backup after a hardware disaster (yep, I only had one server… cuz it is a small network), and everything just went wrong. I had to create a new domain, migrate everything… a lot of work for a weekend… corporate admins just got very angry… but sh*t happens… btw, what would be the best strategy to back up one domain (repeat: in case you can only have one domain!… because of your budget or because you only want to have one or… whatever)…

  • Sole:

    Hi Demian,

    Always have a MS System State Backup, no matter what kind of backups you do. This should preserve your AD information, and will be the minimum requirement to get support from MS in case of disasters.
    An image backup may be usable if there is only 1 Domain Controller in existence and no changes to disks are made and no other system is aware of contents in AD. But I would only use a solution like this if the Domain Controller was virtual to begin with.
    The Microsoft solution would be to use Backup Software to keep backups 🙂 Image backups, snapshots, etc. are not Backups and should not be treated as such.

    -Sole

  • EK:

    I use 2008 R2, and I use the built in Windows Backup utility and use the default settings. I use the recommended settings which makes a system state and image to a network volume on a daily basis. Via network only keeps the latest backup, whereas backing up to disk lets you keep multiple backups.

    From there if you have a hardware issue, you would follow microsoft’s online articles to recover the server from the windows approved backup.

    Even though I run these backups if a DC were to fail, I’d probably just delete the failed DC, get 2008 r2 running on another functional system, and add the domain services again.

  • Sole:

    Thanks for good input. There should always (unless SBS small size) be more than one DC, so a failed DC should be replaced with a new instead of “restoring”. If all DC’s are gone, your backup of System State comes into play.

  • Gary:

    Wow, great post.

    I have a nearly identical issue. we are using Hyper-V on a 2008 R2 server and have our domain controller (2003) running as a VM. It was not cloned but set up from scratch and promoted properly. As a disaster plan (in case the server crashes), we have another 2008 R2 machine that nightly copies the VHD files from the production server (after shutting down the VM) and saves them on the backup server. Theory is that in case of a failure we can offline the production machine and bring the DC VM up on the backup server (less, after reading your blog any account changes made since the last backup). Seems like it should work, all things being equal.

    Well… We tried the backup plan over the weekend and everything seemed OK. Workstations could log on, etc. However, we started randomly getting errors on different workstations when attaching shares (Event log: “The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server host/rio.ca1.local. The target name used was cifs/RIO.CA1.local”). Some workstations were fine, some would not connect. We also have a SQL Server instance on the VM and encountered random connection problems (but using SQL Server security, not Windows, so it may not be related), again only on some workstations; others were fine.

    The only thing I can assume would be different between the VM’s might be the Virtual Network MAC address. Considering the machine setup is identical and the C drive is a duplicate, what am I missing? Unless there is some kind of flim-flam behind the scenes that makes them think they are not the same (but isn’t that the point of a VM?). Does the Virtual machine file have to be copied (import/export) too?

    After it failed the test we went back to the pre-test production VM and everything is back to normal. Total time on the backup VM was minimal and no AD changes were made so hopefully there are no AD problems.

    We’re trying to make the disaster recovery as seamless as possible without having to buy a bunch of other stuff and have hours of down time.

    Any thoughts would be appreciated.

  • Jeff:

    I am about to do sort of the same thing. I have purchased ESXi Essentials and am taking the company from SBS2003 to Virtualized 2008R2. I have many concerns about doing this one of the main ones is that I trigger the FSMO role transfer too early and then have a point of no return. The hardware on SBS is old and could fail. I have a R710 and an R720 that I plan to use for the Hypervisor infrastructure and make redundancy. The SBS has a few critical factory line of business apps running on it that it never should have had being the DC (Wonderware Historian Industrial SQL, Rockwell automation software). I know that 2008 R2 DC can perfectly co-exist as AD server replication receiver and a DC indefinitely as long as SBS retains FSMO roles.

    It would be great if I could P2V the SBS and do the migration from there, but it seems too risky. I’ll be bookmarking this article. This is the first time I have had to decommission a critical production server, take a business from SBS to Standard, and install hypervisor infrastructure. I’m getting all the birds with one stone here… Great article, thank you (any advice on a better or a tried and true way would be appreciated also).

  • CQ:

    How about moving from one ESX to ESXi on different hosts? I have done this in the past with other VMs, but not a DC. I shut down the VM, copy it down in vCenter and then upload it to the new ESXi host (local storage only). Then I add it to inventory, upgrade the tools, then upgrade the virtual hardware.

    Just moving the vm, no changes to it. There is another DC that it replicates to. Wouldn’t this be like shutting down a DC to add memory, etc?

  • Sole:

    Hi CQ,

    There should be no problems moving an existing virtual DC from one ESX to another ESX/ESXi. The issues I have described are linked to going from physical to virtual, cloning and disk changes.

    -Sole

  • Sole:

    Hi Jeff,

    Personally I would leave the SBS server on the physical hardware if possible, and then move the services and their data away from the SBS until it is empty and can be decommissioned.
    If you do decide to virtualize your SBS server (it should be possible), make sure you do not reintroduce the old hardware-based server to any DCs that have seen the virtualized SBS server.

    In my opinion based on my experiences, the best way to virtualize a DC, is by making a new server and promoting it to DC – this is very safe 🙂

    -Sole

  • Sole:

    Hi Gary,

    I am unsure whether you only have one DC server; if at all possible, you should have at least two for redundancy.

    Your “backup” is actually snap-shots in time, and starting the copy is like going back in time for the DC, but it is unaware that it happened.
    If you restore a backup (system-state) of a DC, it will be aware that it happened and try to sync with other DC’s all the information it is missing to be up to date.
    If you do have more DC’s and do a “snap-shot” restore/startup, it will cause problems, because the DC’s will no longer agree on what information they have synced, and they will not re-sync the information the “crashed” DC already received.

    The reason you’re getting the errors is that the Kerberos tickets are incorrect. Each computer account updates its account “password” with the domain once in a while.

    • So at time A you do a snapshot of the only DC.
    • At time B a workstation changes its password automatically from OLD to NEW.
    • At time C you restore the snap-shot from time A.

    The password for the workstation will no longer be correct, as the workstation knows it synced with the domain and has password NEW, but the DC is sure the password is OLD.
    To fix this, you can manually reset the computer account’s password, or go the long way and take the workstation out of the domain and join it again.

    -Sole

  • Mke:

    What if I want to virtualize DC’s for the sole purpose of a test environment? Right now everything from my knowledge is physical. The reason I ask is this company recently acquired another company and they are planning on eventually migrating to either an existing domain or possibly creating a new one. We need to be able to test these changes on a test environment and not on production.

  • Sole:

    You can copy away all you want, just make sure your copies never get in contact with production environment again, otherwise the copy can create the damages described.
    Sounds easy enough, but full isolation is not always easy.

  • Mke:

    Is the best way to create a new virtual server, restore the system state from the forest root DC, copy other services to an external drive that you can hook up to where the isolated vmware environment is and copy those services back on the new virtual DC? I assume I would want to start with the forest root DC and then any other DC’s I want to test with along with other potential servers that are required for testing. The test environment would be completely isolated from the current network.. so there will be no copying data over the network or restoring from backup etc.. all the data will need to be copied to some type of external device with drives in it.

  • damas:

    “Now imagine You have a server full of other services as well, and for some reason You feel it is just not worth it doing one from scratch (Yes you can copy DHCP databases, shares, DNS, etc. from one server to another!), well then do this – Make sure You have another Domain Controller running including a Global Catalog server, move any FSMO roles away from the domain controller to another server, then DCPROMO it down to a regular server. ColdClone the server. Turn off the physical server (never reintroduce it to the network!). Turn on your virtual server, DCPROMO it back to a DC and move any FSMO/GC roles as needed. Done!”

    (ColdClone?) Why not HotClone the server as member?

  • Sole:

    ColdCloning should have a higher success rate, since files and services will be shut down nicely. In this theoretical scenario the server is running lots of other important stuff; otherwise the whole need to copy it is pointless. So, assuming the server has databases and other stuff that might not be happy with a HotClone, do a ColdClone just to be safe.
    After all, you’re making the server a Domain Controller and want it to be stable 🙂

  • Mike:

    Hi ! – Wondering…we have vmware 5.1. My thoughts was to create a time within our environment to freeze all changes and conduct a P2V of our primary FSMO holder DC. During which we would check the box to replicate all changes. After the server has completed converting we would immediately shut down the physical box and bring up the virtual server. As long as we do this and take care of time issues do u think we are safe? OR am I just asking for problems :))

  • Erik K.:

    Mike I wouldn’t do that you’re going to run into problems because the disk signature changes.

    just do this: create a new virtual server, dcpromo it to the existing domain. then use ntdsutil to transfer the roles to the new virtual machine.

    once that is done, you can either keep the physical machine on as a backup, or run dcpromo to remove directory services, remove it from AD, and shut it off for good.

  • Mark:

    So… here’s a question, if a domain controller has been converted “Hot” what would be the best approach to reverse the process. I’m reading that one should never turn the old controller back on, can someone explain to me why I couldn’t just power down the VM and then turn the Old controller back on? What are the repercussions?

  • sai:

    @Mark

    How many dcs do you have you in your network? Have you checked replication if you have more than one?

    I personally wouldn’t turn the old DC on… ever. In my experience you are just asking for trouble with duplicate SIDs, a corrupt SYSVOL, etc.

    Do you have a system state backup?

  • Remy:

    I have a physical sbs 2003 server and another 2003 server running as a DC under Hyper-V, along with other 2008 and 2012 servers running under Hyper-V. The SBS server at this point is running just for Exchange 2003 and ISA as I have migrated everything else to new servers and I am looking at a Office 365 for an exchange replacement. I am taking occasional System Center P-to-V conversions of the SBS system and leaving them in an offline mode with the thought that if something horrible ever happened to the physical SBS server it may be quicker to bring up the virtual copy and mount a backup of the exchange server.

    1. Will this work? Will there be problems with the other domain controller not replicating with the older virtual SBS server? If so would doing a system state restore from the previous days physical SBS server on the virtual SBS server take care of the problem?

    2. Any suggestions?

    Thanks

  • Jim:

    I had a physical 2003 SBS that would run for 4 hours and not a moment more. This machine would shut down for no reason I could find. Not being the original IT guy setting this thing up. I quickly built a computer using UBUNTU 64bit 10.04 and install VMware workstation. Using VMware’s clone software I cloned the SBS Live in 3 hours and 47 minutes. 13 minutes left over before system shutdown.

    We have been running this Clone for 3 years now, few problems have come up and all problems have been hardware. Software RAID works Very well. I agree this is not the Ideal approach, worked this time. I will say that Migrating to a new machine the Microsoft way is VERY painful. You must go by the numbers no matter how many times You have Migrated. Microsoft really? You can not come up with a simple solution like Click and done? 🙂 yes I am lazy 🙂

  • Jeff:

    I have a physical 2008 Std SP2 server that is at EOL and running out of disk space. It is a “member” DC and holds other roles – DNS, DHCP, Print, File, IIS, CA, etc. (I inherited it – not my fault).The FSMO Master is already another VM (VMWare). I also already have an additional virtual “member” DC running in the same domain.

    If I decommision the ADDC role on the physical box using DCPromo and migrate the other roles to other servers before converting it, can this “all in one” box live on as a VM as fileserver with the same name/IP address in the same network? I really want to avoid changing the configurations of several apps that are using it for their file storage.

    Or will there be ghosts from the old DC that could cause problems?

    Thanks in advance.

  • Sole:

    If you decommission the DC role, it becomes a regular server like any other.
    Then there should be no problems virtualising it.

    There should not be any ghosts, at most there might be an AD object here or there that is empty, but won’t do any harm.

    Actually, if you really want a DC to become virtual, it should be decommissioned first, virtualized and then commissioned again. Of course, time issues, rollback, snapshots, etc. are still issues that should be considered once a DC is virtual.

  • Jeff:

    Thanks for the reply – I suspected that there would be few if any problems. Good to have confirmation. I have already created a replacement DC to eliminate role “congestion”. That server will soon live out its days as a simple fileserver…

  • Steve:

    I have several DC’s with some on Win 2003 and some WIndows 2008 and 2008 R2. I need to test some DFS functions in a test lab. We have a segmented Virtual lab already created. I have a physical DC running win 2003 (DC1) that is also the DFS root for our DFS namespace. I need to bring this DC1 into our Test lab to test some DFS upgrades. I would like to get DC1 in to our virtual test lab to test this. Would it cause issues if we cold clone DC1 and then bring it up in the LAB as virtual and then just turn the Physical DC1 back online in our production environment.

    Thanks in advance.

  • Sy:

    Hi Sole,

    Great article, thank you.

    How would you approach this P2V challenge?

    Windows Server 2000
    Active Directory (No FSMO roles held)
    Exchange 2000 (Only Exchange server)
    GC
    ISA 2000
    DNS
    WINS
    1 other local DC (2003 R2) with GC, DNS, WINS

    Server 2012 Hyper-V
    SC 2012 VMM for P2V conversion

    Any help would be much appreciated.

  • Sole:

    I am unsure whether all those services listed are placed on one machine?

    If I had to virtualize Domain Controllers today, I would prefer to have at least two, and to demote the machine before virtualizing it offline. After virtualization I would promote it back to a DC and ensure full synchronization before virtualizing any other DC. Ensuring that the “old” physical machine is never put online on the network again might also be a good idea.

    -Sole

  • Sole:

    if the clone is kept strictly out of reach to production environment, the original will never be the wiser or know it was cloned.
    So in that case it is fine.

    But be careful: if you accidentally put the copy online so it can talk to other DC’s, it can cause disturbing problems, where it receives updates that the original will then never get.
    If for some reason you accidentally get a clone online: demote the original, kill/demote the clone, and make a new DC.

  • Sy:

    Thank you Sole. Apologies for my lack of clarity.

    Server 1 (to be virtualised): Windows Server 2000, Active Directory (No FSMO roles held), Exchange 2000 (Only Exchange server), GC, ISA 2000, DNS, WINS
    Server 2: DC (2003 R2) with GC, DNS, WINS
    Server 3: Server 2012 Hyper-V, SC 2012 VMM for P2V conversion

    As I understand it, however, this clarification would not change your recommendation. I am going to propose a demotion and offline P2V.

    Again, I really appreciate you taking the time to reply.

  • Steve:

    Sole, nice article. Quick question: I am trying to stand up a lab environment of my company’s Active Directory domain. In production I have physical and virtual DC’s. I want to clone the virtual DC and P2V a physical DC and then move them into the LAB environment, which is totally isolated from production. I understand all the metadata cleanup I will need to do, since I am not moving all DC’s over to the lab. My question: do I need to worry about P2Ving the physical DC if I am just moving the virtual copy to the LAB and it will never touch the production environment?

    Steve

  • Steve:

    Sole, I see you answered my question in an earlier post. No need to reply.

    Thanks

  • MM:

    Hi Sole / All,
    Great article! But do you know if multiple DC’s can be demoted simultaneously? I have over a hundred 2003 DC’s which need to be taken down to fewer than 10. If I identify the FSMO roles first, can I demote, say, 20 DC’s that don’t have FSMO roles / GC assigned?
    MM

  • Sole:

    Yes, just make sure the remaining DC’s have routes to synchronize. Take care with manual site links.

  • Sandeep Agarwal:

    Hi Sole ! I have 2 DCs in VM and want to convert them from Thick to Thin. Are there any known issues / prerequisite / suggestion in doing so? Thanks.
