As an eager VSAN user in my homelab I was very keen to get upgraded to VSAN 6.2 so that I could start to benefit from the new feature set. Following a successful upgrade of the VCSA and the associated Hosts (which I had planned on documenting and may well get round to doing so shortly), I was all prepared and duly pressed the “Upgrade” button on VSAN only to hit an immediate blocker:-
“General Virtual SAN Error” – Failed to realign following Virtual SAN objects: 7ef7a856-333c-7f40-4dcd-0c4de99aaae2 due to being locked or lack of vmdk descriptor file, which requires manual fix
Google produced nothing because the product was less than 24 hours into GA, so it was time to polish up my RVC (Ruby vSphere Console) skills.
I SSH’d across to my VCSA and, from within the Appliance Shell (not the Bash shell), logged into RVC using:-
I entered my local sso password and was presented with the RVC shell.
Having used RVC previously while troubleshooting other VSAN issues, I knew my way around (you can browse your vCenter inventory like a folder structure with ls and cd inside RVC), so I immediately changed directory to the Datacenter level:-
cd localhost/ACM Computers/computers
Once inside the datacenter and computers folder, I ran the following command, passing the Cluster name and the UUID provided in the error message “7ef7a856-333c-7f40-4dcd-0c4de99aaae2”
vsan.object_info BRAINS/ 7ef7a856-333c-7f40-4dcd-0c4de99aaae2
This returned the following helpful output:-
This showed me that the object blocking the upgrade was the ACM-ADC-V001 virtual swap (vswp) file.
I quickly ran a health check to make sure the VSAN cluster didn’t have any inaccessible objects, as there had been issues with vswp files in earlier VSAN releases.
but this returned healthy:-
I powered down the associated VM which appeared to remove the VSWP and the LCK file and re-ran the upgrade attempt. It failed again!
So, now to attempt manual object removal! (Please note, I do NOT recommend doing this without GSS; this is my home lab so I did it off my own bat.) It seemed that the vswp was stuck within the object-based file system, so I SSH’d across to an ESXi HOST (not the VCSA) and ran the following:-
/usr/lib/vmware/osfs/bin/objtool delete -u 7ef7a856-333c-7f40-4dcd-0c4de99aaae2 -f -v 10
Good news, the object deleted successfully, but OH NO! When I re-ran the Upgrade there was bad news: whilst that UUID didn’t appear in the failure list, the upgrade now failed with 3 other UUIDs. So I repeated the vsan.object_info query from earlier to determine what these objects were, and they all happened to point to a folder called “cloudvolumes”, within which there are a number of pre-created Template files:-
This folder and its files exist because I use AppVolumes, so for me I simply deleted these files directly from within the Datastore file browser and re-ran the VSAN Upgrade (I can recreate these later).
As soon as I completed this and re-ran the Upgrade process, it completed successfully!
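For anyone who hits a longer list of UUIDs than I did, retyping each one into RVC gets tedious. Below is a minimal Python sketch (not something I used at the time) that pulls the UUIDs out of the pasted error text and prints one vsan.object_info line per object, ready to paste into RVC. The helper names extract_uuids and rvc_commands are my own invention, the cluster path BRAINS/ is just my cluster, and I am assuming that when several UUIDs appear in the message each one still has the same 8-4-4-4-12 format as the one shown above.

```python
import re

# Matches the 8-4-4-4-12 hex UUID format seen in the upgrade error message.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def extract_uuids(error_text):
    """Return the unique object UUIDs in the text, in order of appearance."""
    seen = []
    for match in UUID_RE.findall(error_text):
        uuid = match.lower()
        if uuid not in seen:
            seen.append(uuid)
    return seen


def rvc_commands(error_text, cluster="BRAINS/"):
    """Build one vsan.object_info line per UUID (cluster path is an assumption)."""
    return [f"vsan.object_info {cluster} {u}" for u in extract_uuids(error_text)]


if __name__ == "__main__":
    # Error text copied from the failed upgrade attempt.
    sample = (
        "Failed to realign following Virtual SAN objects: "
        "7ef7a856-333c-7f40-4dcd-0c4de99aaae2 due to being locked or lack of "
        "vmdk descriptor file, which requires manual fix"
    )
    for line in rvc_commands(sample):
        print(line)
```

This only builds the read-only lookup commands; I would deliberately not script the objtool delete step, since each object should be checked by eye before anything is removed.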
I wonder if AppVolumes isn’t VSAN aware? I would imagine the real issue arises if you have created multiple AppStacks within the same folder structure, as they aren’t so easy to just go ahead and remove! Time for a ticket with VMware? Anyone in production with similar issues?