Unresponsive guest – hung VM

I’ve encountered a few scenarios where virtual machines have simply refused to power off, and each time I find myself hunting down the best method to kill them for good. Occasionally these “hung” virtual machines are the result of the host losing sight of their storage, yet the VM’s processes still stay resident in memory.

Firstly, it’s best to determine if the VM really is still running:-

vmware-cmd -l
(this lists all virtual machines registered on the host, whether powered on or off)

copy the full path to the .vmx file of the VM that you wish to query, e.g. /vmfs/volumes/4a69985-29b83f0c-5ee5-001b3432f0d0/vm.vmx

and insert it into

vmware-cmd (path) getstate

e.g. vmware-cmd /vmfs/volumes/4a69985-29b83f0c-5ee5-001b3432f0d0/vm.vmx getstate

if the host believes the virtual machine is still on, it will return
getstate() = on

if the machine is in fact off, it will return
getstate() = off
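
If you do this often, the lookup and the state check can be wrapped in a couple of small shell helpers. This is only a rough sketch: find_vmx and parse_state are names I have made up, not part of any VMware tool, and both simply filter the command output shown above from stdin.

```shell
# Hypothetical helpers; find_vmx and parse_state are illustrative names only.

# Filter `vmware-cmd -l` output (read from stdin) down to the .vmx paths
# whose file name matches the given VM name, ignoring case.
find_vmx() {
    grep -i -- "/$1\.vmx\$"
}

# Reduce `vmware-cmd <path> getstate` output (read from stdin)
# to the bare state word, e.g. "on" or "off".
parse_state() {
    sed -n 's/^getstate() = //p'
}

# Usage on the host (assumes the VM's config file is named vm.vmx):
#   vmx=$(vmware-cmd -l | find_vmx vm)
#   vmware-cmd "$vmx" getstate | parse_state
```

Reading from stdin keeps the helpers independent of the host, so they can be tried against saved output before being pointed at a live system.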

If it is still running and you are unable to shut it down using the vSphere/VI client, here are a couple of ways to kill off any unresponsive virtual machines:-

vmware-cmd (path) stop

validate whether this has been successful with another getstate command

vmware-cmd (path) getstate

if unsuccessful, try a hard stop, which powers the VM off without waiting for the guest to shut down

vmware-cmd (path) stop hard

once again, check whether this has worked

vmware-cmd (path) getstate
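
The stop, check, stop hard, check sequence above could be scripted along these lines. Treat it as a sketch under assumptions rather than a finished tool: force_off, is_off and the VMWARE_CMD variable are hypothetical names of my own, and it assumes getstate prints "getstate() = off" as shown earlier.

```shell
# Sketch only: force_off, is_off and VMWARE_CMD are hypothetical names.
# VMWARE_CMD can be overridden, e.g. to dry-run against a stub.
VMWARE_CMD=${VMWARE_CMD:-vmware-cmd}

# Succeeds if getstate reports the VM as off.
is_off() {
    "$VMWARE_CMD" "$1" getstate 2>/dev/null | grep -q '= off$'
}

# Try a normal stop first, then escalate to a hard stop.
# Returns 0 once the VM reports off, 1 if it is still on after both attempts.
force_off() {
    vmx=$1
    is_off "$vmx" && return 0
    "$VMWARE_CMD" "$vmx" stop >/dev/null 2>&1
    is_off "$vmx" && return 0
    "$VMWARE_CMD" "$vmx" stop hard >/dev/null 2>&1
    is_off "$vmx"
}

# Usage: force_off /path/to/vm.vmx || echo "still running, try another method"
```

On a real host it may be worth pausing briefly between the stop request and the state check; the sketch leaves that out to stay short.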

—————————-

Alternatively, you could try:

vm-support -x
(this displays a list of running VMs and their associated World IDs)

vm-support -X <wid>
(this attempts to kill off the process with the World ID specified)

—————————-

Finally, and as a last resort:-

ps -g | grep <VMname>

This will show output similar to the following:

649451      vmm0:VMname
649453      vmm1:VMname
649640 649448 mks:VMname       649448 649448  /bin/vmx
649641 649448 vcpu-0:VMname    649448 649448  /bin/vmx
649642 649448 vcpu-1:VMname    649448 649448  /bin/vmx

The first column is the World ID (WID), the second column is the CID and the fourth column is the Process Group ID (PGID). The PGID is the value you need (649448 in this example).

kill -9 <PGID>
e.g. kill -9 649448
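
Picking the PGID out by eye is error-prone, so here is a rough sketch of doing it with awk. vm_pgid is a hypothetical helper name, and it assumes the column layout of the sample output above (WID, CID, world name, PGID); rows that carry no PGID, such as the vmm0 and vmm1 lines, are skipped.

```shell
# Hypothetical helper: print the unique PGID(s) for a VM from `ps -g`
# output read on stdin. Assumes the columns are WID, CID, world name, PGID.
vm_pgid() {
    awk -v vm="$1" '
        NF >= 4 {
            # The world name looks like "vcpu-0:VMname"; compare the part after ":".
            n = index($3, ":")
            if (n > 0 && substr($3, n + 1) == vm && !seen[$4]++) print $4
        }'
}

# Usage on the host: ps -g | vm_pgid VMname
```

Printing the value first and running kill -9 by hand afterwards keeps a human check in the loop, which seems sensible for a last-resort command.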

After running the kill command, all of the processes belonging to this VM should be terminated. I have found that whilst this works, it does sometimes reset the VM.