Orphaned, disconnected or inaccessible?

I was asked the other day by one of my colleagues to explain what the difference was between each of these VM states so I figured I’d write a quick overview of each.

Orphaned VM

In a nutshell its a VM that vCenter still has a record of within the database, yet it either doesnt actually exist anymore, or isn’t on the host where vCenter expected to find it.

So how did it get into this mess? Well, quite simply really. Imagine you’re managing a two host cluster within vCenter and someone decides to administer one of the individual hosts directly through the vSphere client. They then proceed to remove one of the virtual machines from the inventory. As a result of this, the ESX host itself drops its record of the VM ever being there, but the vCenter DB still has record of its existence. As such vCenter marks this VM as “Orphaned”. To rectify this, either re-add the VM back to the standalone guest or remove the “Orphaned” entry from the inventory and re-add it. (note, if you do not remove the orpahned entry from vCenter server, you will not be able to re-add it as the same VM name).

Inaccessible

This is usually when a datastore or its associated folder/files on the datastore have gone walkabouts and the host can no longer see the VMX file it used to talk to in order to maintain visibility within the vCenter server. This can sometimes happen if someone decides to rename the folder the VM resides in, without removing it from the inventory, renaming the folder and then re-adding it.

To resolve the inaccessible message, either relocate where the underlying VM has gone or remove it from the inventory completely.

Disconnected

This is usually as a result of the host that last managed the VM losing communication with the vCenter server. Any VMs that were running at the time of the break in communication (or indeed a manual right click on the host and clicking disconnect), will render the VMs under its control as disconnected.

To resolve the disconnected state, Connect the ESX host back into the cluster.

Unresponsive guest – hung VM

I’ve encountered a few scenarios where virtual machines have just refused to power off and each time I find myself hunting down the best method to kill them indefinitely. Occassionally these “hung” virtual machines are as a result of losing sight of their storage – yet the memory thread still stays resident.

Firstly, it’s best to determine if the VM really is still running:-

vmware-cmd -l
(this lists the Virtual Machines on the host – on and off)

copy the full path to the VM that you wish to query i.e. /vmfs/volumes/4a69985-29b83f0c-5ee5-001b3432f0d0/vm.vmx

and insert it into

vmware-cmd (path) getstate

i.e. vmware-cmd /vmfs/volumes/4a69985-29b83f0c-5ee5-001b3432f0d0/vm.vmx getstate

if the host believes the virtual machine is still on, it will return
getstate() = on

if the machine is in fact off, it will return
getstate() = off

If it is still running and you are unable to shut it down using the vSphere/VI client, here are a couple of ways to kill off any unresponsive virtual machines:-

vmware-cmd (path) stop

validate whether this has been successful with another getstate command

vmware-cmd (path) getstate

if unsuccessful, try a stop hard request

vmware-cmd (path) stop hard

once again, checking to see if this has worked

vmware-cmd (path) getstate

—————————-

Alternatively, you could try:

vm-support -x
(this displays a list of running VMs and their associated World IDs)

vm-support -X <wid>
(this attempts to kill off the process with the World ID specified)

—————————-

Finally, and as a last resort:-

ps -g | grep <VMname>

This will show the following

649451      vmm0:VMname
649453      vmm1:VMname
649640 649448 mks:VMname       649448 649448  /bin/vmx
649641 649448 vcpu-0:VMname    649448 649448  /bin/vmx
649642 649448 vcpu-1:VMname    649448 649448  /bin/vmx

The first column is the World ID (WID), the second column is CID and the fourth column is the Process Group ID (PGID). The PGID is the relevant value required (649448).

kill -9 <PGID>
i.e. kill -9 649448

Using the kill command, the unique processes for this VM should now be terminated. I have found that whilst this works, it does sometimes reset the VM.