Cannot upgrade the cluster : Object(s) are inaccessible by Virtual SAN : No Policy Entry found in CMMDS : LSOM Object not found

I thought I’d see if I could break my environment again by starting from vSphere 6.0U1 and attempting a completely hassle-free upgrade to VSAN 6.2. My previous two attempts (covered in the posts below) both hit problems, so third time lucky, right?…

As before, I upgraded the VCSA (which runs as a VM within the environment being upgraded) and the hosts from 6.0U1 to 6.0U2, then rebooted everything to give myself a clean slate. I migrated a couple of VMs across from another environment and ran through some typical admin tasks: cloning, vMotioning, powering down and so on.

I hit the magic “Upgrade” button under VSAN Management and thankfully didn’t get any task-related errors. However, out of nowhere, the following error appeared: Cannot Upgrade the Cluster: Object(s) xxx.xxx.xxx are inaccessible in Virtual SAN.

[Screenshot: the “Cannot upgrade the cluster: Object(s) are inaccessible in Virtual SAN” error]

Wow – third time lucky and a different error! This VSAN upgrade process is starting to become frustrating (umm, how much testing took place before it became GA?).

Anyhow, having seen a similar error (albeit displayed in a different way) in this post, I figured I’d see which objects this particular failed upgrade attempt related to. As before, I SSH’d to the VCSA, opened RVC (the Ruby vSphere Console, the other post has fuller instructions) and then ran:-

vsan.object_info BRAINS/ fcfbcd56-5731-55b0-42bb-0c4de9cd75c8

This time around, the objects were complaining about a missing policy entry in CMMDS and “LSOM object not found”:-

[Screenshot: vsan.object_info output reporting “No Policy entry found in CMMDS” and “LSOM object not found”]
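Incidentally, if the error lists several UUIDs and you want a cluster-wide view of anything inaccessible or orphaned before chasing objects one at a time, RVC has a status report for this. A minimal sketch from the same RVC prompt (vsan.obj_status_report should be available in the 6.x RVC build, but its extra options vary, so treat this as an assumption to verify):-

vsan.obj_status_report BRAINS/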

As the objects were a) not found and b) reported as having a usage of 0.0 GB, I figured I’d get rid of them! So I SSH’d across to an ESXi HOST (yes, an ESXi host; people have tried this on the VCSA and it just doesn’t work) and ran the following command, replacing the object UUID where necessary (note this is a homelab, so proceed with caution):-

/usr/lib/vmware/osfs/bin/objtool delete -u fcfbcd56-5731-55b0-42bb-0c4de9cd75c8 -f -v 10

The output reported “Successfully deleted” so I re-ran the upgrade and everything completed successfully.
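Incidentally, if the error had listed several UUIDs, the same objtool command could simply be looped over them from the ESXi shell. A hedged sketch (home-lab only; the second UUID below is a placeholder, not a real object from my environment):-

# loop over the orphaned object UUIDs and force-delete each one (UUIDs shown are placeholders)
for uuid in fcfbcd56-5731-55b0-42bb-0c4de9cd75c8 00000000-0000-0000-0000-000000000000; do
  /usr/lib/vmware/osfs/bin/objtool delete -u "$uuid" -f -v 10
done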

While this battle to perform a seamless upgrade has been going on, I have been made aware of a bug, currently being written up into a KB article, around the deletion of objects that leaves orphaned objects behind and causes problems such as these. As soon as I have more information, I’ll link it from these posts.

A general system error occurred : Failed to evacuate disk : Out of resources to complete the operation : VSAN 6.2

As per my previous post, I’d had a few niggles getting VSAN 6.2 to work in my lab environment, so I rebuilt things from scratch to see whether a second attempt would be less troublesome.

I started with vSphere 6.0 U1, built the VCSA, added three hosts on 6.0U1 and installed Horizon View 6.2. During the configuration of Horizon I elected to allow the first desktop pool to use the vsanDatastore, which creates associated storage policies to match the requirements of the various View components (replicas, OS disks etc.).

After spinning up a couple of Windows 7 virtual machines, I embarked upon the same 6.2 upgrade. This time I didn’t experience any “failed to realign following object” errors when attempting to upgrade to VSAN 6.2 as per this post, but I did receive the error “A general system error occurred: Failed to evacuate data for disk uuid 522b9a6e-093b-a6c0-01b8-a963ac325bed with error : Out of resources to complete the operation”.

[Screenshot: the “Failed to evacuate data for disk” error]

I realised my mistake here: this is a 3-node cluster with FTT set to 1, so evacuating a disk was never going to work, as there is no spare host to rebuild the evacuated replicas on while still satisfying FTT=1. I subsequently went into the VSAN storage policy and dropped the vSAN default policy FTT to 0 (accepting that my environment would be at risk until I switched it back), then applied the newly defined storage policy, only to find that everything was suddenly reporting as “Not Applicable”. I checked the Resynchronize dashboard within VSAN and noticed it was still churning away applying the newly configured storage policy. Once the resync completed, I tried the upgrade again, but it failed! So I SSH’d over to the VCSA and into the Ruby Console:-

rvc administrator@vsphere.local@localhost

I navigated to the computers folder under my datacenter (cd /localhost/Datacenter/computers) and ran the following command (where BRAINS is the name of my cluster):-

vsan.ondisk_upgrade --allow-reduced-redundancy BRAINS/
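Incidentally, the resync progress I had been watching in the Web Client can also be followed from the same RVC prompt, which is handy while a long-running command like this churns away; a quick check (BRAINS again being my cluster):-

vsan.resync_dashboard BRAINS/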

The upgrade proceeded and eventually completed (wow, took 8 hours)

[Screenshot: vsan.ondisk_upgrade output completing successfully]
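With the upgrade finished, it is worth confirming that every disk actually landed on the new on-disk format (version 3 is the VSAN 6.2 format). A hedged check from the same RVC prompt; vsan.disks_stats exists, though whether your build includes the format version in its output is something to verify:-

vsan.disks_stats BRAINS/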

After completing the upgrade, I realised my mistake. As per the first section of this post, I’d installed Horizon View, which created a bunch of VSAN storage policies. Whilst I’d changed the default vSAN policy to an FTT of 0, I completely forgot to set the others to 0 as well!

Note to self – always check ALL storage policies before assuming that something else is broken – OR take the simple route and force the upgrade to run with --allow-reduced-redundancy, without needing to re-sync a shed-load of data between disks due to a storage policy change.
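For the policy check itself, RVC can also show which policy each of a VM’s objects is actually carrying (including the hostFailuresToTolerate value), which would have caught my mistake. A minimal sketch, assuming a VM called Win7-01 registered under the same datacenter (the VM name and relative path are purely illustrative):-

vsan.vm_object_info ../vms/Win7-01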

General Virtual SAN Error – Failed to realign following objects – VSAN

As an eager VSAN user in my homelab I was very keen to get upgraded to VSAN 6.2 so that I could start to benefit from the new feature set. Following a successful upgrade of the VCSA and the associated hosts (which I had planned to document and may well get round to shortly), I was all prepared and duly pressed the “Upgrade” button on VSAN, only to hit an immediate blocker:-

“General Virtual SAN Error” – Failed to realign following Virtual SAN objects: 7ef7a856-333c-7f40-4dcd-0c4de99aaae2 due to being locked or lack of vmdk descriptor file, which requires manual fix

[Screenshot: the “Failed to realign following Virtual SAN objects” error]

Google produced nothing, because the product was less than 24 hours into GA, so it was time to polish up my Ruby skills.

I SSH’d across to my VCSA and, from within the appliance shell (not the Bash shell), logged into RVC using:-

rvc administrator@vsphere.local@localhost

I entered my local SSO password and was presented with the RVC shell.

Having used the RVC console during some previous VSAN troubleshooting efforts, I knew my way around and immediately changed directory to the datacenter level (you can browse your vCenter tree like a folder structure with ls and cd inside RVC):-

cd localhost/ACM Computers/computers

Once inside the datacenter’s computers folder, I ran the following command, passing the cluster name and one of the UUIDs from the error message (“7ef7a856-333c-7f40-4dcd-0c4de99aaae2”):-

vsan.object_info BRAINS/ 7ef7a856-333c-7f40-4dcd-0c4de99aaae2

This returned the following helpful output:-

[Screenshot: vsan.object_info output for the offending object]

This showed me that the object blocking the upgrade was the virtual swap (.vswp) file of ACM-ADC-V001.

I quickly ran a health check to make sure the entire VSAN cluster didn’t have any inaccessible objects, as there have historically been issues with vswp files in earlier VSAN releases:-

vsan.check_state BRAINS/

but this returned healthy:-

[Screenshot: vsan.check_state output showing a healthy cluster]

I powered down the associated VM, which appeared to remove the .vswp and .lck files, and re-ran the upgrade attempt. It failed again!

So, now to attempt manual object removal! (Please note, I do NOT recommend doing this without GSS; this is my home lab, so I did it off my own bat.) It seems the vswp object was stuck within the object-based file system, so I SSH’d across to an ESXi HOST (not the VCSA) and ran the following:-

/usr/lib/vmware/osfs/bin/objtool delete -u 7ef7a856-333c-7f40-4dcd-0c4de99aaae2 -f -v 10

[Screenshot: objtool delete output]
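If you want some reassurance about what you are deleting (or confirmation afterwards that it has gone), objtool can also query an object’s attributes rather than delete it. A hedged sketch using the same UUID; objtool’s getAttr operation exists, but treat the exact output as something to verify on your build:-

/usr/lib/vmware/osfs/bin/objtool getAttr -u 7ef7a856-333c-7f40-4dcd-0c4de99aaae2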

Good news: the object deleted successfully. But oh no, bad news: when I re-ran the upgrade, whilst that UUID no longer appeared in the failure list, it failed with three other UUIDs. So I repeated the earlier vsan.object_info steps to determine what these objects were, and they all pointed to a folder called “cloudvolumes” containing a number of pre-created template files:-

template_uia_plus_profile.vmdk
template.vmdk
template_uia_only.vmdk

[Screenshot: the cloudvolumes folder contents in the datastore file browser]

This folder and its files exist because I use AppVolumes, so for me the fix was simply to delete these files directly from the datastore file browser and re-run the VSAN upgrade (I can recreate them later).
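For anyone who prefers the CLI, the same disks can be removed from an ESXi host’s shell instead of the datastore file browser. A hedged sketch, assuming the default vsanDatastore name and that the templates sit directly under cloudvolumes (adjust both to match what your datastore browser shows), and with the usual home-lab caution:-

# list the AppVolumes template disks on the vSAN datastore
ls /vmfs/volumes/vsanDatastore/cloudvolumes/
# delete a template disk properly (descriptor plus backing object) rather than rm-ing the descriptor
vmkfstools -U /vmfs/volumes/vsanDatastore/cloudvolumes/template.vmdk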

As soon as I completed this and re-ran the Upgrade process, it completed successfully!

I wonder whether AppVolumes isn’t VSAN-aware? The real issue, I’d imagine, arises if you have created multiple AppStacks within the same folder structure, as those aren’t so easy to just go ahead and remove! Time for a ticket with VMware? Anyone in production seeing similar issues?