General Virtual SAN Error – Failed to realign following objects – VSAN

As an eager VSAN user in my homelab, I was very keen to upgrade to VSAN 6.2 so that I could start to benefit from the new feature set. Following a successful upgrade of the VCSA and the associated hosts (which I had planned on documenting and may well get round to shortly), I was all prepared and duly pressed the “Upgrade” button on VSAN, only to hit an immediate blocker:-

“General Virtual SAN Error” – Failed to realign following Virtual SAN objects: 7ef7a856-333c-7f40-4dcd-0c4de99aaae2 due to being locked or lack of vmdk descriptor file, which requires manual fix


Google produced nothing because the product was less than 24 hours into GA, so it was time to polish up my Ruby skills.

I ssh’d across to my VCSA and, from within the Appliance Shell (not the Bash shell), logged into RVC using:-

rvc administrator@vsphere.local@localhost

I entered my local SSO password and was presented with the RVC shell.

Having used the RVC console during some previous VSAN troubleshooting efforts, I knew my way around and immediately changed directory to the datacenter level (you can browse your vCenter tree like a folder structure with ls inside RVC):-

cd localhost/ACM Computers/computers

Once inside the datacenter and computers folder, I ran the following command, including the cluster name and the UUID provided in the error message, “7ef7a856-333c-7f40-4dcd-0c4de99aaae2”:-

vsan.object_info BRAINS/ 7ef7a856-333c-7f40-4dcd-0c4de99aaae2

This returned the following helpful output:-


This showed me that the object blocking the upgrade was the ACM-ADC-V001 virtual swap file.

I quickly ran a health check to ensure that the entire VSAN cluster didn’t have any inaccessible objects, as there had been issues with vswp files in earlier VSAN releases:-

vsan.check_state BRAINS/

but this returned healthy:-


I powered down the associated VM, which appeared to remove the vswp and the lck files, and re-ran the upgrade attempt. It failed again!

So, now to attempt manual object removal! (Please note, I do NOT recommend doing this without GSS; this is my home lab, so I did it off my own bat.) It seems the vswp was stuck within the object-based file system, so I SSH’d across to an ESXi host (not the VCSA) and ran the following:-

/usr/lib/vmware/osfs/bin/objtool delete -u 7ef7a856-333c-7f40-4dcd-0c4de99aaae2 -f -v 10


Good news: the file deleted successfully. But on rerunning the upgrade, bad news: whilst that UUID no longer appeared in the failure list, it failed with 3 other UUIDs. So I repeated the first instructions to determine what these objects were, and they all pointed to a folder called “cloudvolumes”, within which there are a number of pre-created template files:-
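The repeat lookups can be batched from the appliance shell rather than typed interactively; a rough sketch, assuming RVC’s -c startup option and that vsan.object_info accepts several object UUIDs at once (the UUIDs below are placeholders, and the cluster path and name are from my lab):

```shell
# Sketch: query several blocking objects in one RVC invocation
# <uuid1>..<uuid3> are placeholders; substitute the UUIDs from your failure message
rvc -c 'cd localhost/ACM\ Computers/computers' \
    -c 'vsan.object_info BRAINS/ <uuid1> <uuid2> <uuid3>' \
    -c 'exit' \
    administrator@vsphere.local@localhost
```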



This folder and its files exist because I use AppVolumes, so I simply deleted the files directly from within the datastore file browser and re-ran the VSAN upgrade (I can recreate them later).

As soon as I completed this and re-ran the Upgrade process, it completed successfully!

I wonder if AppVolumes isn’t VSAN aware? The real issue, I imagine, arises if you have created multiple AppStacks placed within the same folder structure, as they aren’t so easy to just go ahead and remove! Time for a ticket with VMware? Anyone in production with similar issues?

Nostalgia in the Cloud – Prince of Persia in the sky (online)

vCloud Air is now proudly hosting my favourite Nostalgia VM. I recently decided to extend my homelab into the cloud by installing an on-premise vCloud Connector Server and a vCloud Connector Node. The entire setup process took literally no more than 20 minutes, and because the Nostalgia VM’s virtual disk is so small, the copy from the vCenter UI (note this is only accessible via the C# client) took barely any time, even on a consumer ISP.

The quick and dirty steps to create the hybrid integration involved:

  • Download and deploy vCloud Connector Server
  • Download and deploy vCloud Connector Node
  • Configure the on-premise vCloud node to register with my on-premise vCenter
  • Register the vCloud Connector Server with the on-premise node and the publicly available vCloud Air node via the IP address of the datacentre assigned to you as part of your subscription (found here), along with your credentials (note that the name you define for your VDC is the ID visible within vCloud Director from within vCloud Air, not the display name of your VDC)
  • I then enabled the vCloud connector plugin from within the vSphere client and could now see both my private and cloud based VMs within the plugin.

If you haven’t already created a catalog within your VDC, you need to do this before you can copy a VM but this is a three click process within vCloud Director.

Copying VMs to vCloud Air also requires the source VM to be switched off.

After the copy completed, I powered up the VM and opened the web embedded console. Even though I’d only recently reacquainted myself with the original game through the VMRC on my vSAN setup, running it from the public cloud gave it a whole new dimension: playing Prince of Persia online! Admittedly, my timings and responsiveness had to be slightly adjusted for keyboard inputs due to the greater-than-1ms latency between my homelab and the VM, but I was pleasantly surprised to find that it was still playable in its full glory.


Transparent Page Sharing – Reduce Your Memory Footprint (Homelab recommended)

In late 2014, VMware changed its default stance on the use of TPS to share memory between virtual machines as a consequence of a low-risk security threat. Given that VMware needed to remain squeaky clean on security compliance, it duly changed the out-of-the-box behaviour of TPS forever, which for some customers meant that memory budgets and allocations architected upon the old implementation needed to be reconsidered. Whether or not you should revert the default behaviour back to what is an extremely efficient way of sharing memory is a discussion to be had with corporate security officers, but for homelabs, where YOU wear the security hat (amongst many others), why not benefit from reclaiming some much-needed RAM?

Reasons to consider whether or not to do this can be found on numerous other blog posts, but as a quick and dirty guide, here are the steps you need to follow to restore the original mechanism:

On each host within your cluster, through the Web Client, click on Manage –> Settings –> Advanced System Settings. Locate Mem.ShareForceSalting and change the default value of 2 to 0 (zero).

vMotion the workloads off or power them off and on again for the change to take effect!
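For anyone who prefers the command line, the same change can be made from each host’s ESXi shell; a quick sketch (the option path is the CLI form of the Mem.ShareForceSalting setting above):

```shell
# Revert inter-VM page sharing (the pre-2014 TPS behaviour)
esxcli system settings advanced set -o /Mem/ShareForceSalting -i 0
# Read the value back to confirm the change
esxcli system settings advanced list -o /Mem/ShareForceSalting
```

The same vMotion-or-restart caveat applies: the host setting only influences VMs after they are power cycled or moved.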

To demonstrate the benefits of reverting TPS to the legacy way, on my 3 node VSAN cluster with 48GB of RAM, here are the memory stats before and after:





As you can see from the PSHARE/MB common: saving column, in total across all three hosts, with a little bit of maths, this is a memory reclamation of over 10GB, which is just short of a quarter of the RAM available to my entire cluster. Why not try it and see what effect it has, at least on your homelab!

Nested ESXi Problems running on vSAN – Stateless caching – Cannot apply the host configuration

I was trying to get AutoDeploy working in a nested ESXi configuration with stateless caching, but each time I ended up with less and less hair, as the Host Profile just would NOT apply and I’d see “Host does not appear to have a compliant stateless cache”.


It always came down to the application of the “System Image Cache Configuration” causing the profile to fail to commit. Frustrating as it was, I was almost at the point of giving up until I tried exhaustively hunting through the log files:

/var/log/syslog.log, which always returned: “filesystem was created but mount failed on device “mpx.vmhba1:C0:T0:L0:6”.: Not found”

I then started looking through

/var/log/vpxa.log for more detail and found:

“usr/sbin/vmkfstools -C vmfs5 -b 1m -S datastore1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:3′ exited with status 1048320”

I Googled the status 1048320 and, as if by magic, an article from William Lam back in 2013 was returned, which effectively states that whilst vSAN does not make use of SCSI-2 reservations, the LVM driver still requires them in order to create a VMFS datastore. Entering the following command on the ESXi hosts presenting vSAN (yes, the physical hosts, not the nested VMs) fakes the required SCSI reservations needed to proceed:-

esxcli system settings advanced set -o /VSAN/FakeSCSIReservations -i 1
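To check the setting took effect on each host (or to put it back once you are done testing), the same esxcli namespace can read the value back; a small sketch:

```shell
# Read the current value back (Int Value should now be 1)
esxcli system settings advanced list -o /VSAN/FakeSCSIReservations
# To revert once finished testing:
# esxcli system settings advanced set -o /VSAN/FakeSCSIReservations -i 0
```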

After running the command on the first of my three cluster nodes, compliance turned green and I had a smile back on my face.


Given that vSAN is becoming more and more prevalent, and given the existence of so many nested configurations, I figured I’d share and bring this important tip back to life.



Unable to establish an ssl connection with vcenter server – Mac OSX

I’ve been seeing some strange behaviour whilst performing deployment operations such as pushing out OVF templates: occasionally I would see “Unable to establish an SSL connection with vCenter Server”.

Initially I could get away with just ignoring it, and if I tried again it would sometimes work; however, for some reason it then began to fail every time.

I did some digging, and the logs suggested the certificates weren’t matching. As my Mac is not domain joined, I started digging around in the Keychain Access tool (Applications –> Utilities –> Keychain Access).

In here I found all sorts of historical certificates from my various lab environments, added over the months, and thought I’d better clear up some legacy certs as well as duplicate named certificates as a starting point. It was here that I found I had two certs for the same domain object (one from an environment I’d previously rebuilt), so removing the now no-longer-used duplicate seemed a good start.

The problem still existed, so I investigated further: whilst I had a trusted certificate for my VCSA, I couldn’t see the trusted Root CA. This isn’t visible by default, so I had to add the X509Anchors keychain by clicking “File –> Add Keychain” and locating it within /System/Library/Keychains/X509Anchors.

Once this was visible, I could again see a previously trusted Root CA from a legacy domain I no longer use at home, but not the Root CA of my Microsoft domain. I removed the invalid entry and went across to my internal Microsoft CA website to fetch the Root CA for reimport.

I downloaded the Root CA by clicking on “Install this CA certificate”, and it downloaded to my Mac.

I then attempted to double-click the file to import it and, when prompted, selected X509Anchors; however, I then received Error 100013.

A quick search on Microsoft led me to an article, so I followed these steps:-

cd ~/Desktop
# Take a working copy of the system keychain
cp /System/Library/Keychains/X509Anchors ~/Library/Keychains
# Import the downloaded certificate into the copied keychain
certtool i cert_filename k=X509Anchors
# Put the updated keychain back in place
sudo cp ~/Library/Keychains/X509Anchors /System/Library/Keychains

I refreshed the Keychain Access tool and lo and behold, the cert appeared.

I double-clicked the newly listed certificate, expanded “Trust”, set the top option of “When using this certificate” to “Always Trust”, and closed the certificate properties.
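On more recent OS X builds, the same trust can also be granted from Terminal with the security tool, which avoids the Keychain Access clicking entirely; a sketch, where rootca.cer is a placeholder filename for the cert downloaded from the Microsoft CA:

```shell
# Trust the downloaded Root CA from Terminal instead of Keychain Access
# -d adds it to the admin domain; -r trustRoot marks it as a trusted root
sudo security add-trusted-cert -d -r trustRoot \
  -k /Library/Keychains/System.keychain rootca.cer
```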

I then reloaded the vSphere Web Client, repeated the OVF deployment and everything worked!

Hope this helps somebody!

Bitdefender – GravityZone


Following on from a sponsored visit at LonVMUG, I decided to take a closer look at Bitdefender’s GravityZone product. It sparked my interest in locating the most optimal and performant AV solution for a virtualized environment that also offers relative deployment simplicity. I’ve previously deployed Trend Deep Security (agentless), and whilst that product performs well when compared with a full-fat agent deployment in a virtualized environment, I was displeased at having to perform a whole sequence of prerequisite changes in order to upgrade a core component of a previous client infrastructure (Trend Deep Security 9.5 didn’t work with vSphere 6, so a core virtualisation upgrade spawned off a project to upgrade to Trend 9.6). As you may or may not know, agentless AV deployments aren’t really fully agentless when they place a dependency on the vShield Endpoint driver installed inside the virtual machine. In effect this could be classified as an installation requirement, and therefore an agent of sorts, as it is not installed into virtual machines via VMware Tools by default.

With Bitdefender’s GravityZone, they offer a light-agent deployment that, to me, sits somewhere between a full-fat agent and a typical agentless deployment. The great news is that it takes away the dependency and complexity that vShield introduces to environments that just want to keep things simple. At the end of the day, AV needs to be easy to deploy, guaranteed to work and effective at doing its job.

I’ve successfully deployed Bitdefender into my homelab and will have a deeper look at how its feature set compares with competitor products.

Nostalgia for Nostalgia – Prince of Persia OVF still working within vSphere 6

Many years ago, I used to demo the capabilities of VMware using the freely accessible Nostalgia OVF from the VMware marketplace (I think it was available through vCenter 2.5 at the time). It was such a small and lightweight appliance, containing a simple set of well-known games, that it made demonstrating the power of a relatively new production-ready technology (it was 2006) all the easier. I remember sitting in various meetings with clients and decision makers, talking about and showing vMotion, Fault Tolerance and HA whilst playing Prince of Persia. I also remember using CPU Hog to force DRS activity as the icing on the cake, combining vMotion and intelligent resource placement. It was such a simple but effective way of getting the message across about what could be done, and how VMware was to be a game changer in server deployment, cost reduction and resource optimization.

Earlier this week, I had a nostalgic moment, wondering if I could still do the same thing today that I did all those years ago: re-performing some of the old tests but leveraging a number of other product features now available in the VMware portfolio (SRM, vSAN stretched cluster etc.).

I set out to find the Nostalgia OVF but, despite a search through the Virtual Appliance Marketplace (via Solution Exchange), I didn’t have any luck.

I then stumbled across an old VMware community post here that sent me in the right direction of the OVF.

After running through the typical OVF deployment process and entering the above URL, the VM appeared within vSphere 6, residing on my vSAN datastore and waiting to be powered on. The results can be seen below:-










Not quite sure when my next post will be; let’s see how long it takes me to relive some of my childhood gaming memories ;o)


Handling of problematic disks in vSAN 6.1 – HomeLab warning

Just a quick note of caution for any other home lab users considering vSAN 6.1. As part of the prep work for building the environment, it is important, if you are using consumer-grade disks and/or bypassing some of the other HCL requirements, to disable the device monitoring and unmounting process; under sustained periods of high latency (which can be expected, depending on how hard you push your kit), it could otherwise take your disk group offline. Whilst initially I thought this was the silver bullet for the problems I’ve been experiencing, in my scenario it’s only the consumer-grade SSD that disappears, not the entire disk group containing both the Samsung (consumer) and Intel (enterprise) SSDs.

I’ve copied the key commands below directly from Cormac’s blog, but I have applied *BOTH* settings in my environment.

  • Disable VSAN Device Monitoring (and subsequent unmounting of diskgroup):
    # esxcli system settings advanced set -o /LSOM/VSANDeviceMonitoring -i 0    <— default is “1”
  • Disable VSAN Slow Device Unmounting (continues monitoring):
    # esxcli system settings advanced set -o /LSOM/lsomSlowDeviceUnmount -i 0   <— default is “1”
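To confirm the new values afterwards (and see what you changed them from), both options can be read back in one go; a small sketch to run on each ESXi host:

```shell
# List current values of both LSOM options (run per host)
for opt in /LSOM/VSANDeviceMonitoring /LSOM/lsomSlowDeviceUnmount; do
  esxcli system settings advanced list -o "$opt"
done
```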

The official VMware article on this can be found here: KB2132079.

Cormac Hogan’s blog article can be found here.

The homelab rebuild.. vSAN Progress and initial VMs..

Further to my previous post regarding rebuilding my home lab with the Intel SSDs as the caching tier for an all-flash vSAN, unfortunately, within a day one of the ESXi hosts fell over with the usual Permanent Device Loss error, and I had a sad face. I rebooted the host, re-applied the storage policy to bring everything back into compliance, and thought I’d give it one last chance before reverting to the magnetic disks. Since then (3 days and counting), the environment has stayed up and online; in fact, I have pushed it harder than ever before by running multiple clones (at least 3 at a time) to properly kick the tyres, at the risk of building lots of VMs only to have to svMotion them over to my external array, which is time consuming.

On average, a 40GB Windows 2012 virtual machine takes no more than 7 minutes to clone, and as I’ve only got Gb connectivity between the hosts in the vSAN cluster, the network is actually the bottleneck here at 125MB/s (and that’s assuming it runs flat out with no overhead/transmit issues):-

1Gb/s = 125MB/s
125MB/s x 60 = 7,500MB per minute
40GB / 7,500MB per minute ≈ 5.5 minutes
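The same back-of-envelope sum can be checked with a couple of lines of shell arithmetic (integer maths, taking 1GB as 1,024MB):

```shell
# Rough clone-time estimate for a 40GB VM over a 1Gb/s link
link_mbps=125                 # 1Gb/s is roughly 125MB/s
vm_mb=$((40 * 1024))          # 40GB expressed in MB
secs=$((vm_mb / link_mbps))   # transfer time in seconds
echo "~$((secs / 60))m $((secs % 60))s"   # prints ~5m 27s
```

The roughly 5.5-minute theoretical floor against the observed 7 minutes suggests the link really is close to saturated during a clone.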

A quick breakdown of the VM build so far:

2 x Win2k12 Domain Controllers, running DNS and acting as a CA
1 x SQL 2014 VM – hosting the View Composer DB
2 x View Connection Servers
1 x View Composer for Horizon View
1 x AppVolumes Server

I’ve been particularly light on the customisation side, but have green lights where green lights need to exist on the solutions I’ve built thus far. The most time-consuming part was the certificate work, involving replacing the machine cert on the VCSA alongside working out how to reissue the certs for the View Connection and Composer servers after I’d already performed the installs. From experience, I’ve always had fun with certificates in Horizon View deployments, but this time round wasn’t as painful, as I knew most of the pitfalls and gotchas. For those that administer Horizon View, this is a joy to see post-installation:-




I used some of the following blogposts/links as reference for redeploying certificates:-

VMUG – first time participant

What an experience and a warm welcome I received during my first VMUG. The day kicked off with an entertaining start and a jam-packed agenda, with a great list of sponsors and valuable content. As a newbie, I was asked to introduce myself and, as a consequence, was rewarded for doing little more than saying my name: I was given a copy of Mastering vSphere 6, which is quite apt given my recent VCP6 certification achievement.

As I glanced around a very popular event, I felt somewhat starstruck to come face to face with some industry experts, most notably Mike Laverick, to whom I owe a lot of my career success as a result of being a regular follower of his passion for blogging in the early days of his rtfm-ed blog.

There were also three VCDXs in the room, again something I aspire to achieve over the next couple of years, work/life dependent of course!

Plenty of swag was also to be had, including hip flasks, Captain vSAN t-shirts, USB keys, portable chargers and the token notebooks and pens, courtesy of sponsors such as Tegile, Bitdefender and Velostrata.


I’ll try and include some of the content topics in subsequent blog posts very soon.