

Topic: «Running multiple VMs with Workstation 16 alongside AWM causes GPU crash, seems related to their new GPU sandboxing» on forum: Technical Support   Views: 13671
 
Syrinx
Registered user
 
Posts: 28
Joined: 12/16/2016
Posted: 09/18/2020 05:43:52
 
 
Windows 10 x64 1809 (17763.1457)
VMware Workstation 16 https://www.vmware.com/go/tryworkstation-win
AWM 8.14.3

Scenario & results:
Running a single VM (Windows 10 x64 1809) works fine and never resulted in any issues.

After a second VM (also Windows 10 x64 1809) is launched while the first is still up, and it reaches the point where Windows DWM (i.e. the GUI/compositing handler) would be loaded, a sort of cascade failure occurs on the host.
The results are not always the same, meaning things are sometimes worse than others, but in the least problematic encounters most open programs fail or crash, some with alerts, some without. Some remain open with ghost outlines where the title bar and buttons should be, and stay unresponsive.
Sometimes all monitors except the primary one go blank, but this does not always happen.
Sometimes I was unable to start any new programs or even shut down the PC, and a hard reset was needed.

Other potentially relevant info:
AWM is running as Admin aka High Integrity
VMware Workstation is running as Admin aka High Integrity
When used, mksSandbox.exe runs as a Standard User aka Medium Integrity
There are no rules enabled for VMware Workstation or any of its processes in AWM
Windows 10 Exploit Protection settings do have a rule to 'Disable extension points' for vmware-vmx.exe, and I also tried adding one for mksSandbox.exe, but the errors still occurred
Couldn't look at anything with Procmon as it also crashed around the same time everything else did
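
In case anyone wants to replicate that mitigation, it can be set from an elevated PowerShell prompt with the built-in ProcessMitigations cmdlets. Just a sketch based on my setup; adjust the process names for yours:

```powershell
# Apply the 'Disable extension points' mitigation per executable name
Set-ProcessMitigation -Name vmware-vmx.exe -Enable ExtensionPointDisable
Set-ProcessMitigation -Name mksSandbox.exe -Enable ExtensionPointDisable

# Check what's currently applied
Get-ProcessMitigation -Name vmware-vmx.exe
```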

Initial assumption aka my best guess atm:
The new 'mksSandbox.exe' used for the GPU sandboxing is doing something funky when multiple VMs are running (each gets its own sub-process of it), and it's tripping AWM up somehow, to the point where it stops communicating with or handling things from other processes, so they get stuck and fail.

Workarounds found:
Killing all 3 AWM processes and then booting up the multiple VMs results in no errors or issues on the host (this leaves you without AWM functionality, obviously).
Preventing the new GPU sandbox process "mksSandbox.exe" in VMware Workstation 16 from being used results in no errors or issues (this leaves you without the GPU sandbox, and since the fallback to the old rendering path is likely to be removed in future revisions, this seems both less secure and unlikely to remain a valid workaround for long).
Currently I've added a rule to vmware-vmx.exe that prevents child processes, but simply renaming mksSandbox.exe should also be a valid way of making sure it doesn't get used (untested).
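
For anyone who'd rather script the rename approach, here's a rough, untested sketch. The install path is an assumption based on the default VMware Workstation 16 location on my machine; run it elevated while no VMs are up, and rename the file back to undo:

```python
# Untested sketch: rename mksSandbox.exe so vmware-vmx.exe can't launch it,
# which should force the legacy (non-sandboxed) render path. The directory
# layout assumed here is the default Workstation 16 install location.
from pathlib import Path

def disable_mks_sandbox(install_dir: str) -> Path:
    """Rename x64/mksSandbox.exe under install_dir; return the new path."""
    exe = Path(install_dir) / "x64" / "mksSandbox.exe"
    if not exe.exists():
        raise FileNotFoundError(f"mksSandbox.exe not found at {exe}")
    disabled = exe.with_name("mksSandbox.exe.disabled")
    exe.rename(disabled)
    return disabled
```

Calling disable_mks_sandbox(r"C:\Program Files (x86)\VMware\VMware Workstation") should be enough; renaming the file back re-enables the sandbox.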

---------------------
Update 200919: I played with this a bit more today and managed to open Device Manager 'once' after the existing programs crashed and the other monitors went blank, leaving only the primary display... and it turned out that the GPU drivers were having issues, so I've updated the thread title accordingly.

During another test I got a BSOD, so I decided to stop testing after that. Obviously AWM shouldn't really be able to crash the GPU drivers, so I suppose it's more likely that VMware is causing the actual kernel/driver issues when the multiple sandboxed GPU processes are used.

I still find it odd that it only happens when AWM is running so perhaps AWM is simply serving as a catalyst and exposing a larger bug that they need to fix?

More Info:
AMD R9 Series GPU on host
Only saw crashes and occasional hard lock-ups using Adrenalin 2020 Edition 20.4.2 Recommended (WHQL) drivers (5/26/2020)
BSOD occurred after updating to Adrenalin 2020 Edition 20.9.1 Optional (9/16/2020)
Have reverted to the WHQL drivers and am not using the new GPU sandbox for now.
During testing this time I didn't even use full-blown OS VMs. I found that simply loading up Windows install ISOs in 2 VMs (1909 & 2004) and reaching the setup screen was enough to trigger the issue, with sandboxes not blocked and AWM active.
 
Bogdan Polishchuk
Administrator
 
Posts: 4010
Joined: 04/04/2012
Posted: 09/29/2020 10:08:13
 
 
Hello, Syrinx

Thank you for contacting us.

I was unable to reproduce the problem, but maybe that's because there are no mksSandbox.exe processes on my system.

Where did you find the information about the mksSandbox.exe processes being related to the VMware Sandbox Renderer? I did a quick Google search and was unable to find official information regarding this.

Also, I was unable to find information about how to enable/disable the VMware Sandbox Renderer, which could presumably make mksSandbox.exe appear. Are you aware of any way to enable the Sandbox Renderer, or of the circumstances required for it to be enabled?

From the description, it seemed to me that the feature should always be enabled, but I don't see the mksSandbox.exe process.


Best regards.
 
Syrinx
Registered user
 
Posts: 28
Joined: 12/16/2016
Posted: 09/29/2020 21:47:50
 
 
I don't think there was any one place that said outright that this is where the sandbox resides. It was more just me paying attention, I guess.
The release notes for v16 said:
Quote

Sandboxed Graphics
Virtual machine security is enhanced by removing graphics render from vmx and running it as a separate sandbox process.

Then I saw a new process spawned by vmware-vmx.exe, "C:\Program Files (x86)\VMware\VMware Workstation\x64\mksSandbox.exe", running under Medium Integrity instead of High like vmware-vmx.exe.
Also, there is a new file in the VM folder while running, mksSandbox.log, in which I saw all kinds of graphics-related stuff when I opened it, e.g.:
Quote

MKS-RenderMain: PowerOn allowed MKSBasicOps DX11Renderer
cap[ 18]: 0x00000014 (MAX_SHADER_TEXTURES)

On another note, did you try running on a host with a dedicated GPU, and not in a VM?
When testing, I tried running VMware 16 inside a VM so I could reproduce and troubleshoot more safely, but I could not get mksSandbox.exe to run there. I suppose it detects that it's in a virtualized environment and doesn't even bother trying. This is good for my current workaround, because it means that unless they start supporting virtual environments for their GPU sandbox, the legacy mode without the sandbox may not be removed down the line after all.
 
Syrinx
Registered user
 
Posts: 28
Joined: 12/16/2016
Posted: 09/29/2020 23:47:42
 
 
After replying above I started approaching the issue differently and performed more tests. While doing so, I found that having AWM open does seem to speed up the occurrence of the issue quite a bit, but AWM is not actually at fault in any way.

I came to this conclusion because, despite having all AWM processes closed, killed, or stopped before launching the VMs and allowing the sandbox processes, I would still face crashes and blank windows (though no GPU driver failure or BSOD thus far) after running multiple VMs alongside each other for about 3-5 minutes.

I should have tested more thoroughly before posting but I'm not a fan of hard resets or BSOD! :P

Now I know for sure who to contact, at least! Please feel free to delete this thread. Thanks for trying to re-create the issue, and I'm sorry for wasting your time and dirtying up the forum with (as it turns out) unrelated stuff.
 
Bogdan Polishchuk
Administrator
 
Posts: 4010
Joined: 04/04/2012
Posted: 10/01/2020 12:48:33
 
 
Syrinx,

But the GPU driver failure or BSOD hasn't happened without AWM running, right?

Please let us know whether a GPU driver failure or BSOD happens while you continue using multiple VMs without AWM running.
 
Syrinx
Registered user
 
Posts: 28
Joined: 12/16/2016
Posted: 10/02/2020 03:32:39
 
 
It was always a mixed bag, so to speak. Sometimes it was just the app crashes; other times all monitors except the primary one stopped working (I assume this was when the GPU driver crashed); and other times I'd get a BSOD [unhandled kmode exception or something like that].

Being able to reproduce any of those issues without AWM active was enough for me to be sure it isn't involved after all. Granted, it takes closer to 30-60 seconds with AWM active versus 3-5 minutes without, but since it happens regardless, I'm satisfied you all shouldn't worry about it right now. I didn't test beyond that one instance because, as I said before, I'm not fond of hard resets or BSODs on the host, but I'm fairly sure that mixed bag would still be in play. For now I'm preventing use of their new GPU sandbox and in turn avoiding any issues. I'll likely re-test once they issue an update for v16, and I'll be sure to update here about how things go without AWM in those tests if you want.

Once again, I'm sorry for not testing without AWM 'long enough' before pointing the finger your way. =(
 



