This post describes the process I followed to replace a physical VMware VirtualCenter box with a virtual equivalent running in its own cluster. No VMs running on the cluster suffered an outage; each ESX host kept merrily running its virtuals until everything was sorted out with the VC layer.
Note that this was done in a lab instance of ESX 3.5 and VirtualCenter 2.5; had this been production, I probably would have taken a little more care.
I don’t think there is any compelling reason not to run a virtual VC box. It would be hypocritical of VMware to suggest virtualizing your application servers while insisting that VirtualCenter itself stay physical. Having said that, this process could also be used in reverse, taking a virtual VirtualCenter instance physical should the need arise. It would also be useful in a VirtualCenter disaster recovery scenario.
The configuration before completing these steps
- Physical VirtualCenter 2.5 running on Windows Server 2003, called vc01, static IP
- Virtual Windows Server 2003 computer running in the cluster, called vc02, dynamic IP
- SQL 2005 database for VC, running on a separate SQL server.
- ESX 3.5 and VC 2.5, single network, iSCSI shared storage
Prerequisites
- The IP address of the VirtualCenter server
- The ESX host of the VM becoming the new VC server
- The path to your VMware FlexLM license file (assuming you’re using a license server)
- The logon details for the SQL connection between vpxd and the database
Steps Taken
On the physical VirtualCenter box that is going to be decommissioned:
- Take note of the physical ESX host running the VM becoming the new VC server
- Stop the vpxd service and change the startup to manual (sc stop %service%, sc config %service% start= demand)
- Stop the vmountVpx, vmware-ufad-vci, vmware-converter and webAccess services and change their startup to manual
- Stop the flexlm instance and set the startup to manual
- Take a backup of your .lic license file
- Change the IP address to dynamic (assuming it is static)
- Power off the physical machine
- Delete the computer account for vc01 from the domain
- On the SQL server hosting the VirtualCenter database, back up the VC database and log file with the following commands executed through Management Studio:
BACKUP DATABASE [VMVC]
TO DISK = N'c:\temp\VC_PreMoveToVM.bak'
WITH
DESCRIPTION = N'VirtualCenter Pre-move to VM backup'
, INIT
, NAME = N'VC Pre move to VM'
GO
RESTORE VERIFYONLY
FROM DISK = N'c:\temp\VC_PreMoveToVM.bak'
WITH FILE = 1
GO
- In the service console of one of the ESX hosts, back up the current certificates (just in case):
- mkdir /tmp/cert_backup
- cp /etc/vmware/ssl/* /tmp/cert_backup
On the VM becoming the new VirtualCenter server, running on a host in the cluster:
- Rename the server from vc02 to vc01, with the same static IP as the previous vc01
- Restart the virtual machine
- Install the SQL Native Client - required for VC 2.5 SQL connectivity on a non-SQL server
- Install VirtualCenter including FlexLM, connecting to the existing database and using the license file copied off the physical server
- In VirtualCenter, disconnect the first ESX host in the cluster. Maintenance mode is not possible at this stage because login errors occur with the incorrect certificate
- Copy the new VC certificates - I use pscp here, but whatever you normally use to copy files to ESX would be fine:
- cd "C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\ssl"
- pscp rui.crt vcadmin@esx01:/etc/vmware/ssl/
- pscp rui.key vcadmin@esx01:/etc/vmware/ssl/
- In the service console of the ESX node whose certificates you’ve just updated, restart the management interface. The ESX host did not seem to pick up the new certificate dynamically (maybe it would on a schedule without a restart):
- service mgmt-vmware restart
- Connect the ESX host to VC through the VI Client interface
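- Repeat the previous four steps (disconnect the host, copy the certificates, restart the management interface, reconnect the host) for each other ESX host in the cluster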
- In VirtualCenter, run 'Reconfigure for HA' on each ESX node
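The certificate backup step above can be made a little safer by confirming that every file really was copied before the new rui.crt and rui.key overwrite the originals. The following is only a minimal sketch, not the commands used in this lab: backup_certs is a hypothetical helper name, and it assumes a bash-style shell with cp and cmp available in the service console.

```shell
#!/bin/bash
# Sketch only: back up the host certificates and verify each copy.
# backup_certs is a hypothetical helper; the default paths are the
# ones used in the steps above.
backup_certs() {
    src="${1:-/etc/vmware/ssl}"
    dest="${2:-/tmp/cert_backup}"

    mkdir -p "$dest" || return 1

    for f in "$src"/*; do
        # Skip anything that is not a regular file (e.g. an unmatched glob).
        [ -f "$f" ] || continue
        cp -p "$f" "$dest/" || return 1
        # cmp -s is silent and returns non-zero if the two files differ.
        cmp -s "$f" "$dest/$(basename "$f")" || {
            echo "backup mismatch: $f" >&2
            return 1
        }
    done
    echo "backed up $src to $dest"
}

# Usage in the service console, before copying the new certificates:
#   backup_certs /etc/vmware/ssl /tmp/cert_backup
```

Because the function returns non-zero on any failed copy, it can be chained in front of the certificate copy so that nothing is overwritten without a good backup.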
Testing
- Ensure the vpxd.log file reports no problems with host connectivity or certificates (also check the VC/ESX logs)
- Ensure each ESX host is receiving licenses from the 'new' license server, either through the VI client or the FlexLM admin tool.
- Ensure you can perform simple tasks such as powering on a virtual machine
- Ensure VMotion/HA/DRS is working
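One way to approach the vpxd.log check is a quick case-insensitive search for lines that mention certificates or logins. This is a sketch under assumptions: scan_vc_log is a hypothetical helper, the search terms are guesses at useful keywords rather than exact vpxd messages, and the log path has to be supplied because it depends on the install. It also assumes the log has been copied somewhere with a shell and grep available.

```shell
#!/bin/bash
# Sketch only: flag log lines that mention certificates or logins.
# scan_vc_log is a hypothetical helper; the keywords are assumptions,
# not known vpxd message strings.
scan_vc_log() {
    log="$1"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }

    # -i ignore case, -n prefix line numbers, -E extended regex
    if grep -inE 'certificate|ssl|login' "$log"; then
        return 1    # matching lines were printed for review
    fi
    echo "no matching lines in $log"
    return 0
}

# Usage (the path is whatever your copy of the log is called):
#   scan_vc_log ./vpxd.log
```

A match is not necessarily an error, so treat the output as a shortlist of lines to read rather than a pass/fail result.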
The (untested) rollback plan if something goes wrong:
- Shut down the new VirtualCenter VM
- Restore the database from the backup taken
- Power on the physical VC box, change the IP to the static IP
- Restart the vpxd and VMware License Server services
- For each ESX host in the cluster, disconnect the host, restore the old certificates, restart the management service and connect the ESX host to the old VirtualCenter instance
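The certificate part of the rollback could look something like the sketch below. Like the rollback plan itself, it is untested against a real host. restore_certs is a hypothetical helper name, and it assumes a backup directory such as /tmp/cert_backup containing the old rui.crt and rui.key exists on the host being rolled back.

```shell
#!/bin/bash
# Sketch only: put the old certificates back from a backup directory.
# restore_certs is a hypothetical helper; the default paths are the
# ones used earlier in this post.
restore_certs() {
    backup="${1:-/tmp/cert_backup}"
    target="${2:-/etc/vmware/ssl}"

    # Refuse to touch the live certificates unless both files are present.
    for f in rui.crt rui.key; do
        [ -f "$backup/$f" ] || { echo "missing $backup/$f" >&2; return 1; }
    done

    cp -p "$backup/rui.crt" "$backup/rui.key" "$target/" || return 1
    echo "restored certificates to $target"
}

# Usage on each host, after disconnecting it from VirtualCenter:
#   restore_certs /tmp/cert_backup /etc/vmware/ssl && service mgmt-vmware restart
```

Checking for both files first avoids leaving a host with a certificate and key that do not belong together.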
Additional notes:
- User login errors were returned when the vpxd service tried to form the cluster, which points to the vpxuser account used by VC to manage ESX hosts. This was misleading, as the username and password are stored in the VC database (in the vpx_host table), which had not been modified. The next logical suspect was certificates, which led to the certificate update process used above.
- Manually copying the certificates may not be strictly required. When I went part-way to reconnecting a host without updating the certificates, I was warned that another VC instance was managing these servers and asked whether I would like to continue. Presumably it would have automatically updated the certificates as required.