Nested ESXi Problems running on vSAN – Stateless caching – Cannot apply the host configuration

I was trying to get Auto Deploy working in a nested ESXi configuration with stateless caching configured, but each attempt left me with less and less hair: the Host Profile just would NOT apply, and I’d see “Host does not appear to have a compliant stateless cache”.

It always came down to applying the “System Image Cache Configuration”, which caused the profile to fail to commit. Frustrated, I was almost at the point of giving up until I started hunting exhaustively through the log files:

/var/log/syslog.log, which always returned: “filesystem was created but mount failed on device "mpx.vmhba1:C0:T0:L0:6".: Not found”

I then started looking through /var/log/vpxa.log for more detail and found:

“usr/sbin/vmkfstools -C vmfs5 -b 1m -S datastore1 /vmfs/devices/disks/mpx.vmhba1:C0:T0:L0:3′ exited with status 1048320”
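To save some scrolling, the two signatures quoted above can be pulled out of the logs with grep. This is a minimal sketch rather than anything official: the function name find_cache_errors is made up for illustration, and the patterns are simply the fragments quoted in this post.

```shell
#!/bin/sh
# Search log files for the two failure signatures quoted in this post.
# find_cache_errors is a hypothetical helper name, not a VMware tool.
find_cache_errors() {
    for log in "$@"; do
        # Skip logs that are missing or unreadable.
        [ -r "$log" ] || continue
        # -H prefixes each match with the file it came from.
        grep -H -E 'mount failed on device|vmkfstools .* exited with status' "$log"
    done
    return 0
}

# Log paths as named in this post.
find_cache_errors /var/log/syslog.log /var/log/vpxa.log
```

Run on the nested host, this prints only the matching lines, each prefixed with the log it came from.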

I Googled the status 1048320 and, as if by magic, up came the article below by William Lam from back in 2013. It explains that whilst vSAN does not make use of SCSI-2 reservations, the LVM driver still requires them in order to create a VMFS datastore. Entering the following command on the ESXi hosts presenting vSAN (yes, the physical hosts, not the nested VMs) fakes the SCSI reservations needed to proceed:

esxcli system settings advanced set -o /VSAN/FakeSCSIReservations -i 1
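With several physical hosts it can be handy to wrap the command above in a loop and read the value back afterwards. The sketch below is only an illustration: the host names are hypothetical, it assumes SSH is enabled on the hosts, and by default it only prints what it would run (set DRY_RUN=0 to execute).

```shell
#!/bin/sh
# Hypothetical host names: replace with the physical ESXi hosts presenting vSAN.
HOSTS="${HOSTS:-esx01.lab.local esx02.lab.local esx03.lab.local}"
OPTION="/VSAN/FakeSCSIReservations"

# Set the advanced option on every host, then list it to confirm the value.
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
apply_fake_reservations() {
    for host in $HOSTS; do
        for cmd in \
            "esxcli system settings advanced set -o $OPTION -i 1" \
            "esxcli system settings advanced list -o $OPTION"; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "ssh root@$host $cmd"
            else
                ssh "root@$host" "$cmd"
            fi
        done
    done
}

apply_fake_reservations
```

The list call is there purely as a sanity check that the integer value now reads 1 on each host.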

After running the command on the first of my three cluster hosts, compliance turned green and I had a smile back on my face.

Given that vSAN is becoming more and more prevalent, and that so many nested configurations exist, I figured I’d share this important tip and bring it back to life.

How to run Nested ESXi on top of a VSAN datastore?

Unable to establish an SSL connection with vCenter Server – Mac OS X

I’ve been seeing some strange behaviour whilst performing deployment operations such as pushing out OVF templates: occasionally I would see “Unable to establish an SSL connection with vCenter Server”.

Initially I could get away with ignoring it, as a retry would sometimes work. For some reason, however, it now failed every time.

I did some digging and the logs suggested something to do with the certificates not matching. As my Mac is not domain joined, I started digging around in the Keychain Access tool (Applications –> Utilities –> Keychain Access).

In here I found all sorts of historical certificates from the various lab environments I’ve added over the months, so I thought I’d better clear out some legacy certs, as well as any with duplicate names, as a starting point. It was here that I found two certs for the same domain object (one of which I’d previously rebuilt), so removing the no longer used duplicate seemed a good place to begin.

The problem still existed, so I investigated further: whilst I had a trusted certificate for my VCSA, I couldn’t see the Trusted Root CA. This isn’t visible by default, so I had to add the X509Anchors keychain by clicking “File –> Add Keychain” and locating it at /System/Library/Keychains/X509Anchors

Once this was visible, I could again see a previously trusted Root CA from a legacy domain I no longer use at home, but not the Root CA of my Microsoft domain. I removed the invalid entry and went across to my internal Microsoft CA website to get the Root CA to reimport.

http://domain.name/certsrv

I downloaded the Root CA by clicking “install this CA certificate”, which saved it to my Mac.

I then double clicked the file to import it and, when prompted, selected X509Anchors. However, I then received: Error 100013.

A quick search on Microsoft led me to the following article:

https://support.microsoft.com/en-us/kb/887413

so I followed these steps:-

cd ~/Desktop
cp /System/Library/Keychains/X509Anchors ~/Library/Keychains
certtool i cert_filename k=X509Anchors
sudo cp ~/Library/Keychains/X509Anchors /System/Library/Keychains
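On more recent macOS releases the same result can usually be reached with the built-in security tool instead of certtool and X509Anchors. This is an alternative sketch, not the procedure from the Microsoft article: the function name import_root_ca and the certificate file name are made up for illustration, and by default the command is only printed (set DRY_RUN=0 to run it).

```shell
#!/bin/sh
# Import a root CA into the macOS System keychain and mark it as a trusted root.
# import_root_ca is a hypothetical helper; DRY_RUN=1 (the default) only prints.
import_root_ca() {
    cert="$1"
    if [ ! -r "$cert" ]; then
        echo "cannot read certificate: $cert" >&2
        return 1
    fi
    # -d: admin trust settings, -r trustRoot: trust as a root CA, -k: target keychain.
    set -- sudo security add-trusted-cert -d -r trustRoot \
        -k /Library/Keychains/System.keychain "$cert"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example with a hypothetical file name:
#   import_root_ca ~/Desktop/rootca.cer
```

Because the trust setting is applied at import time, the manual “Always Trust” step in Keychain Access should not be needed with this route.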

I refreshed the Keychain Access tool and lo and behold, the cert appeared.

I double clicked on the newly listed certificate, expanded “Trust”, set the top option, “When using this certificate”, to “Always Trust” and closed the certificate properties.

I then reloaded the vSphere Web Client, repeated the OVF deployment and everything worked!

Hope this helps somebody!

Bitdefender – GravityZone

Following on from a sponsored visit at LonVMUG, I decided to take a closer look at Bitdefender’s GravityZone product. It sparked my interest in finding the most performant AV solution for a virtualized environment that still offers relative deployment simplicity. I’ve previously deployed Trend Deep Security (agentless), and whilst the product itself performs well when compared with a full fat agent deployment in a virtualized environment, I did not enjoy having to perform a whole sequence of prerequisite changes in order to upgrade a core component of a previous client’s infrastructure (Trend Deep Security 9.5 didn’t work with vSphere 6, so a core virtualisation upgrade spawned a separate project to upgrade to Trend 9.6). As you may or may not know, agentless AV deployments aren’t really fully agentless, as they depend on the vShield Endpoint driver being installed inside the virtual machine. In effect this could be classed as an installation requirement, and therefore an agent of sorts, as VMware Tools does not install it into virtual machines by default.

With GravityZone, Bitdefender offers a light agent deployment that, to me, sits somewhere between a full fat agent and a typical agentless deployment. The great news is that it takes away the dependency and complexity that vShield introduces to environments that just want to keep things simple. At the end of the day, AV needs to be easy to deploy, guaranteed to work and effective at doing its job.

I’ve successfully deployed Bitdefender into my homelab and will have a deeper look at how its feature set compares with competitor products.

VMUG – first time participant

What an experience, and what a warm welcome I received at my first VMUG. The day kicked off with an entertaining start and a jam-packed agenda, with a great list of sponsors and valuable content. As a newbie, I was asked to introduce myself, and for doing not a lot more than saying my name I was rewarded with a copy of Mastering vSphere 6, which is quite apt given my recent VCP6 certification.

As I glanced around a very popular event, I felt somewhat star struck to come face to face with some industry experts, most notably Mike Laverick, to whom I owe a lot of my career success, having been a regular follower of his blogging in the early days of his RTFM-ed blog.

There were also three VCDXs in the room, again something I aspire to achieve over the next couple of years, work/life dependent of course!

Plenty of swag was also there to be had, including hip flasks, Captain vSAN t-shirts, USB keys, portable chargers and the token notebooks and pens. These came courtesy of sponsors such as Tegile, Bitdefender and Velostrata.

I’ll try and include some of the content topics in subsequent blog posts very soon.

All Flash vSAN – Mac Mini Upgrade – Permanent Disk Failure Fix?

I’ve been experiencing the disappearing drive act, more commonly known as Permanent Disk Failure, whereby under duress the host marks the SSD as failed simply because it can’t keep up and goes walkabouts. I could reproduce this almost on demand by either committing a large snapshot or powering everything on at the same time (basically heavy IO).

After some research into what’s causing it (apart from my environment not being on any sort of HCL), it seems that the SATA AHCI controller on the Macs really can’t cope too well. Even though I thought I’d bought a decent SSD to complement vSAN (a Samsung 850 Pro), this actually appeared to be more of an Achilles heel than the controller. Rather than start replacing my lab with more power hungry, noisier hardware to work around the issue, I thought I’d give it one more roll of the dice: whilst again not technically on the HCL for an All Flash vSAN, I have purchased some Intel DC3700 SSDs to act as the vSAN cache tier in front of the pre-existing 850 Pro SSDs.

Goodbye Hitachi magnetic disk, hello Intel SSD.

If the 850s continue to give me problems, I’ll revert to SATA magnetic disk, although in theory I shouldn’t now be driving the 850s hard enough for their bottleneck to rear its ugly head. Having said that, in an All Flash vSAN all reads are directed to the capacity tier (gulp). Another option I considered was ROBO, but whilst vSAN 6.1 supports it, it doesn’t when using All Flash. For the time being I’ll be sticking with three Mac Mini hosts.

VMware Certification – 8 years a VCP

I’ve been VCP certified since 2008, starting on VI3, and continued to develop my interest in VMware by following up with vSphere 4 and 5.5. On Friday 15th Jan 2016 I’ll be attempting my fourth VCP accreditation in a bid to become a VCP in vSphere 6.

I’ll admit that this time round it has been far more of a study challenge, as I’ve not had as much exposure to vSphere 6 through the clients I’ve been working with. That is largely down to many companies’ inability to keep pace with VMware’s releases, and to the compliance challenges of the requirements matrix created by overlapping and dependent technologies. Take Trend Micro Deep Security, for example: vSphere 6 hit the market in March 2015, but Trend did not release v9.6, which is compatible with vSphere 6, until August 2015. As most companies go, they rightfully didn’t want to be the first to deploy a new product, and in this case Trend was the new product, requiring a typical three month wide berth until proven in the field by other, more willing audiences (I won’t mention their legacy Horizon View 5 implementation).

In order to prepare properly, I hit my home lab and bumped it up from 5.5 to 6.1, but not without challenge. I decided to jump straight into vSAN using a Mac Mini setup that closely matches Peter Bjork’s:-

A killer custom Apple Mac Mini setup running VSAN

but I had already invested in consumer grade SSDs (Samsung 850 Pros), and almost immediately hit performance issues with drives simply disappearing, not to mention very high read/write latency on the capacity magnetic disks. Long story short, the entire vSAN fell over and I lost a few VMs in the process, but this ultimately helped me to learn how vSAN works and how I could piece things back together again. It also taught me the importance of the HCL, and that a vsanDatastore needs to be treated with more respect than a typical VMFS datastore (i.e. don’t just place things on there using the datastore browser).

Anyhow, back to education. I found the following very useful study guide posts on VCP6-DCV (thanks Vladan) and have worked through them meticulously in my lab environment, alongside the reference/blueprint material from VMware, so fingers crossed for a successful outcome on Friday!

http://www.vladan.fr/vcp6-dcv/

Orphaned, disconnected or inaccessible?

I was asked the other day by one of my colleagues to explain what the difference was between each of these VM states so I figured I’d write a quick overview of each.

Orphaned VM

In a nutshell, it’s a VM that vCenter still has a record of within its database, yet it either doesn’t actually exist anymore or isn’t on the host where vCenter expected to find it.

So how did it get into this mess? Well, quite simply really. Imagine you’re managing a two host cluster within vCenter and someone decides to administer one of the individual hosts directly through the vSphere client. They then proceed to remove one of the virtual machines from the inventory. As a result, the ESX host itself drops its record of the VM ever being there, but the vCenter DB still has a record of its existence. As such, vCenter marks this VM as “Orphaned”. To rectify this, either re-add the VM to the inventory on the standalone host, or remove the “Orphaned” entry from the vCenter inventory and re-add the VM. (Note: if you do not remove the orphaned entry from vCenter Server, you will not be able to re-add the VM under the same name.)

Inaccessible

This is usually when a datastore, or the VM’s folder/files on the datastore, have gone walkabouts and the host can no longer see the VMX file it relied on to maintain visibility within vCenter Server. This can sometimes happen if someone renames the folder the VM resides in without first removing the VM from the inventory (the safe sequence being to remove it from the inventory, rename the folder and then re-add it).

To resolve the inaccessible message, either track down where the underlying VM files have gone or remove the VM from the inventory completely.

Disconnected

This is usually the result of the host that last managed the VM losing communication with vCenter Server. A break in communication (or indeed a manual right click on the host and clicking Disconnect) renders all VMs under that host’s control as disconnected.

To resolve the disconnected state, reconnect the ESX host in vCenter so that it rejoins the cluster.