We're starting to see cases where we start a VM on an ESXi host, but it then vanishes from the set of VMs returned by the VI API as well as from vCenter's "vm" view.
This is on ESXi 5.5, build 2143827.
I know the VM is still running, because its VMX worlds show up in ps:
/vmfs/volumes/52b48cea-6688cbe0-25f2-f0921c02d114/log # ps | egrep "44037504|vm-3197364-7320108"
44037504 44037504 vmx /bin/vmx
43382147 vmm0:vm-3197364-7320108
44037509 vmm1:vm-3197364-7320108
44037510 44037504 vmx-vthread-5:vm-3197364-7320108 /bin/vmx
44037511 44037504 vmx-vthread-6:vm-3197364-7320108 /bin/vmx
44037512 44037504 vmx-mks:vm-3197364-7320108 /bin/vmx
44037513 44037504 vmx-svga:vm-3197364-7320108 /bin/vmx
44037514 44037504 vmx-vcpu-0:vm-3197364-7320108 /bin/vmx
44037515 44037504 vmx-vcpu-1:vm-3197364-7320108 /bin/vmx
I also see open file handles on the VM's datastore, including one for the .vmx file.
However, we can't see or interact with this VM in any way. Whether I use the VI API, the MOB, or vCenter, there's no indication that this VM exists. The only clue I have that something went wrong is this in the log:
2015-10-23T08:12:24.725Z [330C1B70 info 'Vmsvc.vm:/vmfs/volumes/f0ed55bf-0a9d5396/session.vmx' opID=hostd-1d8f user=root] Couldn't find a device with key 4000
2015-10-23T08:12:24.725Z [330C1B70 info 'Vimsvc.TaskManager' opID=hostd-1d8f user=root] Task Completed : haTask-1627-vim.VirtualMachine.reconfigure-464107470 Status error
2015-10-23T08:12:24.725Z [330C1B70 warning 'Vmsvc.vm:/vmfs/volumes/f0ed55bf-0a9d5396/session.vmx' opID=hostd-1d8f user=root] Reconfigure worker failed to validate device spec
2015-10-23T08:12:24.726Z [330C1B70 info 'Vmsvc.vm:/vmfs/volumes/f0ed55bf-0a9d5396/session.vmx' opID=hostd-1d8f user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)
2015-10-23T08:12:24.727Z [330C1B70 info 'Vmsvc.vm:/vmfs/volumes/f0ed55bf-0a9d5396/session.vmx' opID=hostd-1d8f user=root] Marking VirtualMachine invalid
2015-10-23T08:12:24.727Z [330C1B70 info 'Vmsvc.vm:/vmfs/volumes/f0ed55bf-0a9d5396/session.vmx' opID=hostd-1d8f user=root] State Transition (VM_STATE_ON -> VM_STATE_INVALID_CONFIG)
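In case it helps anyone check their own hosts for the same thing, here is a rough Python sketch of how the log could be scanned for VMs that were moved to VM_STATE_INVALID_CONFIG. It only assumes the line layout shown in the excerpt above; the regular expression and the function name find_invalid_transitions are my own and not part of any VMware tool, and other builds may format these lines differently.

```python
import re

# Matches "State Transition" lines in the layout seen in the excerpt above.
# Assumption: timestamp, then [thread level 'Vmsvc.vm:<vmx path>' ...], then the message.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \[\S+ \w+ 'Vmsvc\.vm:(?P<vmx>[^']+)'[^\]]*\] "
    r"State Transition \((?P<old>\w+) -> (?P<new>\w+)\)"
)


def find_invalid_transitions(lines):
    """Return (timestamp, vmx_path, previous_state) for each transition
    into VM_STATE_INVALID_CONFIG found in the given log lines."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("new") == "VM_STATE_INVALID_CONFIG":
            hits.append((m.group("ts"), m.group("vmx"), m.group("old")))
    return hits


# Two of the lines from the excerpt, used here as sample input.
SAMPLE = """\
2015-10-23T08:12:24.726Z [330C1B70 info 'Vmsvc.vm:/vmfs/volumes/f0ed55bf-0a9d5396/session.vmx' opID=hostd-1d8f user=root] State Transition (VM_STATE_RECONFIGURING -> VM_STATE_ON)
2015-10-23T08:12:24.727Z [330C1B70 info 'Vmsvc.vm:/vmfs/volumes/f0ed55bf-0a9d5396/session.vmx' opID=hostd-1d8f user=root] State Transition (VM_STATE_ON -> VM_STATE_INVALID_CONFIG)
"""

hits = find_invalid_transitions(SAMPLE.splitlines())
for ts, vmx, old in hits:
    print(f"{ts}  {vmx}  was {old}")
```

The function accepts any iterable of lines, so an open log file can be passed in directly instead of the sample string.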
Can anybody shed some light on what I'm missing here?
Is there a particular VMX world not running that should be?
Is there some way to "kick" the ESXi box so that the VM re-appears when I query it from the API?