Not all desktop pools were disabled due to an error – Horizon 7.0.2

I’ve come across this problem under a few scenarios before so thought I’d document the how and why of the problem here.

After configuring a new Desktop Pool which was designed to be a failover Pool for another Site, the first thing that was supposed to be done was to disable it. Unfortunately, when trying to do this basic operation, the following error message was displayed:-

“Error. Not all desktop pools were disabled due to an error”

Having seen this before, I know it comes about because 3D Rendering is enabled on the pool whilst “Allow users to choose protocol” is still set to Yes. I’m advised by VMware GSS that this is a regression bug (one that was fixed before but has crept back in).
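
The conflict between these two settings can be expressed as a simple pre-flight check. The sketch below is purely illustrative: `PoolSettings` and `validate` are names made up for this post and are not part of any Horizon API, and the rule encoded is only the one I ran into here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolSettings:
    """Hypothetical stand-in for the two pool settings discussed here."""
    renderer_3d_enabled: bool
    allow_users_to_choose_protocol: bool


def validate(settings: PoolSettings) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # The combination that triggered the error in this post.
    if settings.renderer_3d_enabled and settings.allow_users_to_choose_protocol:
        problems.append(
            '3D Rendering is enabled while "Allow users to choose protocol" '
            "is set to Yes"
        )
    return problems
```

Calling `validate(PoolSettings(True, True))` returns one problem string, which is the kind of message I would have liked the console to show.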


Technically speaking, this isn’t a valid configuration, as the documentation states it’s not allowed. However, the setup wizard lets you save the pool that way, and the error message gives no hint of the real root cause. To figure this out, I found the exact error in the logs on the Connection Server:

C:\ProgramData\VMware\VDM\Logs\log-xxxx-xx-xx.txt (where xxxx-xx-xx is the date)
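
If you would rather not scroll through those files by hand, a few lines of Python will pull out the error entries. This is a minimal sketch: the directory and the `log-*.txt` pattern come from the path above, but filtering on the text `ERROR` is my assumption about how severity appears in the lines, so adjust the marker to suit.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Log directory as given above; adjust if your install differs.
LOG_DIR = Path(r"C:\ProgramData\VMware\VDM\Logs")


def error_lines(lines: Iterable[str], marker: str = "ERROR") -> Iterator[str]:
    """Yield lines containing the marker (assumed severity label)."""
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


def scan_logs(log_dir: Path = LOG_DIR,
              pattern: str = "log-*.txt") -> Iterator[Tuple[str, str]]:
    """Yield (file name, line) for each matching line in every log file."""
    for log_file in sorted(log_dir.glob(pattern)):
        with log_file.open(encoding="utf-8", errors="replace") as handle:
            for line in error_lines(handle):
                yield log_file.name, line
```

Run on the Connection Server, `for name, line in scan_logs(): print(name, line)` lists every matching entry along with the file it came from.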


VMware Horizon View, Session in Session leads to poor performance

A client of mine wanted to dip their toe into the Cloud and what better way than to start delivering applications from the cloud into an existing on-prem VDI environment.

The on-prem setup comprised nothing more complicated than a Windows 7 VDI, and the proposed application was a more recent flavour of the Microsoft Office suite with Windows 2012 under the hood. The environment was spun up painlessly as an extension of the existing Horizon 7.0.2 deployment and integrated as a separate site using the Cloud Pod Architecture capability of VMware Horizon. Application delivery was initially tested by launching the app from a desktop PC running VMware Client 4.3, and it worked flawlessly as expected. Superb, or so I thought, until I tried to repeat the same application launch from the VDI platform.

The application started, but interactivity seemed sluggish, so I figured I’d misconfigured something on the VMware client: perhaps a protocol mismatch, or some sort of contention from the client on its journey into the cloud. Bearing in mind the VDI session was established using PCoIP and application publishing was also configured to use PCoIP, I tested every possible combination (BLAST to the VDI with PCoIP to the app, PCoIP to the VDI with BLAST to the app, and so on), but with no success. (Note that it is not possible to use RDP for application publishing. I’m not entirely sure why, so please leave me a comment if you know the reason, but at the time of writing RDP is only usable with desktop publishing.) I then RDP’d directly to the Windows 2012 server from within the VDI session using the native MSTSC client and performance was absolutely fine (I also tested with RemoteApp and it too was fine).

This left me no option but to log a ticket with VMware’s GSS, and their initial acknowledgement suggested this behaviour wasn’t unexpected. Whilst there is no definitive answer to the problem, it appears to be linked to running both the Horizon Agent and the Horizon Client on the same device. This does therefore seem to be a fairly big problem for companies looking to stagger their application migration into the cloud, certainly when using VMware as the core and sole EUC platform.

Update: 26/01/17

After setting up a Citrix bare bones environment and repeating the same application publishing exercise, it was identified that the application was still subject to poor interactive performance!!  This was now despite running a PCoIP session and running an HDX delivered published application within it! 

At that point I figured the problem must be down to the source VDI rather than the destination app. I subsequently configured a new test pool and started playing around with rendering options, including more VRAM and fewer monitors, until eventually the optimum configuration turned out to be dropping 3D rendering altogether.

Oh no! Turning this off means the loss of Aero for Windows 7, which was the sole reason it was enabled in the first place! Without GPU cards in the hosts (BL460 blades), software rendering is the only way to meet the client’s required configuration, so it seems that session-in-session publishing using anything other than RemoteApp simply isn’t a possibility for them.

Update Appstack – Failed 5 times

I got a call the other day advising that AppVolumes was “broken” when trying to update an existing AppStack. I quickly hopped onto the environment and noticed the Activity Log showed the error “Update Appstack xxx  – Failed 5 times” as can be seen in the image below.


Knowing my way around the log files pretty well, I took a look at the Server logs and noticed the following error:-

P7136DJ449121  INFO   RvSphere: Searching datastore “[pdesxappv_gen2_ssd] cloudvolumes/apps” non-recursively
[2017-01-06 11:49:04 UTC] P7136DJ449121  INFO        Cvo: Application volumes datastore folder already exists: [pdesxappv_gen2_ssd] cloudvolumes/apps/
[2017-01-06 11:49:04 UTC] P7136DJ449121 ERROR        Cvo: Unable to create AppStack because a volume at “[pdesxappv_gen2_ssd] cloudvolumes/apps/Core!20!Applications!20!(Enterprise!20!Trust)!20!-Live_30112016-update.vmdk” already exists
[2017-01-06 11:49:04 UTC] P7136DJ449121 ERROR        Cvo: Job error: Update AppStack #<Thread:0x9d5390> Unable to create AppStack because a volume at “[pdesxappv_gen2_ssd] cloudvolumes/apps/Core!20!Applications!20!(Enterprise!20!Trust)!20!-Live_30112016-update.vmdk” already exists

What this came down to was that the update was trying to use a filename that already existed on the underlying datastore, left over from a previous update of the same AppStack. To fix this, I changed the name of the AppStack as part of the update process. As soon as the name was changed, the error was no longer presented and the update became available to be provisioned for use.
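
To save trawling through the Server log by eye, the conflicting volume path can be pulled straight out of lines like the ones above. This is a rough sketch built only around the wording of the error in my log; the function name is made up, and the pattern accepts either straight or typographic quotes because the excerpt above was reformatted by the blog editor.

```python
import re
from typing import Iterable, List

# Matches the "already exists" error as it appears in the excerpt above.
# Accepts straight or typographic quotes around the datastore path.
_EXISTS = re.compile(
    r'Unable to create AppStack because a volume at '
    r'["“](?P<path>.+?)["”] already exists'
)


def conflicting_volumes(lines: Iterable[str]) -> List[str]:
    """Return each distinct datastore path reported as already existing."""
    seen: List[str] = []
    for line in lines:
        match = _EXISTS.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen
```

Feeding it the log file (for example `conflicting_volumes(open(path, encoding="utf-8"))`) gives the list of .vmdk names to avoid when choosing the name for the update.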

I’m going to raise a Feature Request with VMware and ask them to include a level of error checking to ensure that duplicate file names cannot be entered and at least produce an error message when trying to commit an updated stack rather than waiting for the process to fail and having to trawl through the Server logs.

VMware Horizon with Published Applications – No support for opening files from a client device when the file resides on UNC paths. The file may be missing or have misconfigured permissions.

VMware have come on leaps and bounds in a bid to compete with the long-standing success of Citrix in application delivery. However, as a product looking to challenge the market, there are still some significant limitations that affect its uptake in the Enterprise environment.

I was looking to deploy a basic published application so that the local device would automatically launch the remote application in preference to the older version of the same application installed locally. During the deployment process, the FTAs (File Type Associations) were updated in order to force the file to open in the published application. Initially this proved to be successful when the source file (an .xlsx file) originated on the local device’s C: drive.

It was only when I subsequently tested the opening of a file that was located on the Desktop or in My Documents that I was presented with the error:-

“There was an unexpected problem opening the file. The file may be missing or have misconfigured permissions. Contact your administrator for more information”


In the first instance, I thought there was something wrong in my configuration and deployment, but upon reviewing the log file on the Server hosting the published application, it turns out that the error translated into something more sinister:-

2017-01-03T16:56:59.537Z ERROR (0900-10AC) [ws_applaunchmgr] File ‘\\tsclient\(VMFR)Documents\Horizon7ProdMigrationPhase1.xlsx’ doesn’t exist {SESSION:541e_***_758e; SESSION:541e_***_758e}

As you can see from the log excerpt above, the process was trying to open the file using CDR (client drive redirection) by prefixing the path with \\tsclient, which would normally succeed if the file were located on the local client device. The catch here was that the file didn’t reside on the local device. It actually sat in a redirected folder, courtesy of standard Microsoft Folder Redirection Group Policy pointing to a DFS-hosted network share referenced by a UNC path.

I tried a few other tests to make sure it wasn’t just a problem with the folder redirection, so I opened a file directly via its UNC path on the DFS root, \\domain.name\DFS\FileShare\Text.xlsx, and the same problem occurred.
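
To work out in advance which users are likely to hit this, it helps to classify where a file really lives once any folder redirection has been resolved. The helper below is a hypothetical sketch of my own, not anything Horizon provides. It only looks at the shape of the path: in my tests a file on a drive letter opened fine, while a file behind a UNC path did not.

```python
def classify_source(path: str) -> str:
    """Classify a Windows path by its prefix only (no filesystem access)."""
    normalised = path.replace("/", "\\")
    # \\tsclient\... is how a client drive redirection path appears in the log.
    if normalised.lower().startswith("\\\\tsclient\\"):
        return "client-drive-redirection"
    # Any other \\server\share form is a UNC path (DFS roots included).
    if normalised.startswith("\\\\"):
        return "unc"
    # A leading "X:" means a drive letter, whether local or mapped.
    if len(normalised) >= 2 and normalised[0].isalpha() and normalised[1] == ":":
        return "drive-letter"
    return "unknown"
```

For example, `classify_source(r"\\domain.name\DFS\FileShare\Text.xlsx")` returns `"unc"`, which flags the file as one that failed to open in my testing.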

After a ticket with GSS, they suggested I map the DFS root to a drive letter and then permit the pass-through of that drive letter to the published application, which did of course work. However, this isn’t practical and is in fact inefficient, as the file is pulled from the network share to the client and then back from the client to the published application. Realistically, no Enterprise environment should be expected to change its drive letter standards to work around the absence of a feature that I believe should be native to the product. Citrix XenApp and Microsoft RemoteApp can handle this with relative ease, so I’m really surprised VMware have dropped the ball on something so simple, and for so long, as application publishing has been in their portfolio since 2014!

The latest news I’ve heard is that this is now in product development with the aim to deliver in Q2 of 2017. If you are considering VMware Application Publishing and rely on redirected files and folders with UNC paths, I would urge you to be extremely mindful that this could be a serious blocker for the uptake of this platform, both on-premises and in the cloud.

Migrating from VMware View 5 to Horizon View 7 (Part 2)

Building the Environment

As the client had sufficient swing space in terms of compute, a parallel implementation could be deployed, allowing a clean slate where required. When performing project work like this, it is always my preference to work this way so as to provide a rollback, allow sufficient testing time for environmental comparisons and avoid bringing in “customisations” that grew with the legacy environment but are no longer applicable in the new.

The underlying Hypervisor environment and swing space hosts were built from scratch and provided with sufficient storage to meet the needs of the deployment. The additional components were built out as follows:

Infrastructure:
vSphere 6 in order to meet the requirement for Instant Clones
vCenter using VCSA to reduce unnecessary Windows patch management
vShield Manager 5.x
Trend Deep Security 9.6 SP1

EUC:
AppVolumes 2.11 to meet the requirement for Instant Clones (2.10 doesn’t work with Instant Clones)
Horizon 7.0.1 (latest release at the time of deployment)
Access Points 2.5.2 (latest supported release at the time of writing)

Where supported and required, virtual KEMP Load balancers were used internally in front of the Connection Servers and AppVolumes Managers as well as externally in the DMZ for the Access Points.

I won’t go into the detail of the installation as there are many online tutorials on how this is done but I will elaborate on some of the challenges and compatibility issues faced on the way.

  1. Instant Clones is not compatible with VMware Persona management (Page 16, under restrictions). As the initial aim was to deploy Horizon 7 to ensure support compliance, there was no quicker way than to retain Persona as UEM would take far more planning and effort than timeframes would allow. UEM would follow as a later sub-project so this took Instant Clones off the table immediately.
  2. Scope creep resulting in Office activation prompts with certain AppStacks. During the deployment of Horizon 7, Microsoft Lync was introduced, which meant that in addition to running Microsoft Office 2010 in the base image, components of Office 2013 were now present too. Applications delivered by AppVolumes therefore needed to be re-provisioned where there was a KMS overlap, as the AppStacks were originally sealed with only an Office 2010 KMS key present. The OS would otherwise try to re-activate Office applications post logon, which caused no end of grief as we muddled through 100+ AppStacks to eliminate those that would present the activation prompt.
  3. Sessions would randomly drop and users were unable to reconnect when using PCoIP. One of the most bizarre problems was random session disconnects which left users unable to reconnect. If the user switched to HTML Access, they were able to reconnect to their session, save their work and log off to create a new PCoIP session. Eventually, after running Process Monitor against a faulting virtual machine, I traced the fault to excessive logging by the TPAutoconnect service, which resulted in the log files filling the non-persistent disk. This only happened with users who either worked remotely or had printers attached to their endpoint device (there were very few, as 90% of the user base were on Zero client devices). To date, there hasn’t been a resolution to the growth in logging levels, but as a workaround, persistent disks were removed and the OS disk size increased. It has been noted that there is an uplift in I/O and an SR is still active with GSS.
  4. AppVolumes 2.11 – bug with filter driver causing the error “Item not found” and “Could not find this item”. We have scripts that create and delete files to validate certain conditions for virtual desktops, but randomly after introducing AppVolumes 2.11, the log files would throw up an error. We could also reproduce this problem by manually creating a file on the C: drive, then deleting it, resulting in the same Windows error. VMware subsequently issued me with the AppVolumes 2.11.1 patch release to allow us to continue with deployment.
  5. Error occurred during vCenter operation. This error occurred when a default linked clone Desktop Pool was configured with 3D rendering (any option) and “Allow users to choose protocol” was changed to “Yes”. Bizarrely, the wizard permitted this configuration but any further action on the pool resulted in the error. A quick check of the logs on the Connection Server revealed an error stating that it was an invalid configuration (as per the documentation), but why this couldn’t be translated into a meaningful message in the console, I’m not quite sure.
  6. P25 Zero Clients: the primary monitor either does not correctly detect the resolution (going into 1024×768 rather than 1920×1200) or, during login, the screen goes black, returns, hangs on Welcome, stalls back to black and eventually returns to display the desktop. (I’m guessing this sequence of events is part of the sync that takes place as it detects the primary/secondary monitor resolution.) This was random, and upgrading the Horizon View Agent to 7.0.2 appeared to resolve the problem.
  7. Remote sessions would disconnect after 10 hours. This came down to a hardcoded and unchangeable value in VMware Access Point 2.5.2. It has since been resolved in VMware Access Point 2.7.2, which has now been deployed, as users would work for 12+ hours and the hardcoded value was not acceptable.
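
For the disk-filling problem in item 3, a small watchdog inside the desktop would have flagged the issue long before sessions started dropping. The sketch below is only an illustration: the 10% threshold is an arbitrary assumption, and you would need to point `directory_size` at wherever the offending logs are written in your image, which I haven’t stated here.

```python
import shutil
from pathlib import Path


def directory_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def is_low_on_space(free_bytes: int, total_bytes: int,
                    min_free_ratio: float = 0.10) -> bool:
    """True when the free fraction of the volume is below the threshold."""
    return total_bytes > 0 and free_bytes / total_bytes < min_free_ratio


def check_volume(path: Path, min_free_ratio: float = 0.10) -> bool:
    """Check the volume that holds path using the OS-reported usage."""
    usage = shutil.disk_usage(str(path))
    return is_low_on_space(usage.free, usage.total, min_free_ratio)
```

Scheduling `check_volume(Path("C:/"))` alongside `directory_size` on the log folder would show both that the disk is filling and which folder is responsible.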

In summary, here’s where we ended up:-

Horizon View 7.0.2 (due to screen refresh problem)
VMware access point 2.7.2 (due to uncontrollable session disconnects)
VMware AppVolumes 2.11.1 (due to filter driver bug)
VMware persona management (retained due to shortened deployment window)
VMware linked clones (as Instant clones are not supported with Persona management)
50% of the Appstacks being reprovisioned due to the introduction of multiple KMS keys.

If you would like more information or details on these problems, please feel free to reach out and I’ll try to fill in the gaps or update the post where required.

Happy Virtualising